Optimising TCP for Citrix HDX/ICA including Netscaler

Marius Sandbu – NGCA (NVIDIA GRID Community Advisor)  aka Clever Viking!

The TCP implementation within the Citrix HDX/ICA protocol used by XenDesktop and XenApp, and also by Citrix NetScaler, stays fairly close to the original TCP/IP standards, and the out-of-the-box configuration usually does a good job on a LAN. However, for WAN scenarios, particularly those with higher latencies and certain kinds of data (file transfers), Citrix deployments can benefit greatly from some tuning.

 

One of our new NGCAs (NVIDIA GRID Community Advisors), Marius Sandbu, has written a must-read blog on how to optimize TCP with a Citrix NetScaler in the equation: http://msandbu.org/tag/netscaler-tcp-profile/. Marius highlights some of the configuration optimisations hidden away in the NetScaler documentation, and you’ll probably want to refer to that documentation too (https://docs.citrix.com/en-us/netscaler/11-1/system/TCP_Congestion_Control_and_Optimization_General.html).
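
As a purely illustrative example (the profile and virtual-server names are placeholders, and the exact parameters and values should be checked against the NetScaler documentation linked above for your release), a WAN-oriented TCP profile can be defined and bound on the NetScaler along these lines:

add ns tcpProfile wan_hdx_tcp -WS ENABLED -SACK ENABLED -nagle ENABLED -bufferSize 600000 -maxBurst 30 -initialCwnd 10 -flavor BIC -dynamicReceiveBuffering ENABLED
set vpn vserver gateway_vs -tcpProfileName wan_hdx_tcp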

Citrix HDX TCP is not optimized for many WAN scenarios, but at the moment it can be tuned manually following this advice: CTX125027 – How to Optimize HDX Bandwidth Over High Latency Connections. This is one configuration I’d love to see Citrix automate, as having to tune and configure the receiver is fiddly, and is also not possible in organisations/scenarios where the end-points and the server/network infrastructure are provided by different teams or even different companies (e.g. IaaS).
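
To see why this kind of tuning matters, recall the bandwidth-delay product: the TCP receive window must be at least bandwidth × round-trip time, or throughput is capped regardless of how much bandwidth is available. A rough, illustrative calculation (not taken from the CTX article):

# Bandwidth-delay product: the minimum TCP window needed to keep a link full.
def required_window_bytes(bandwidth_mbps, rtt_ms):
    bytes_per_second = bandwidth_mbps * 1000000 / 8
    return int(bytes_per_second * rtt_ms / 1000)

# Example: a 20 Mbit/s WAN link with 200 ms round-trip latency
print(required_window_bytes(20, 200))  # 500000 bytes, roughly 8x the classic 64 KB window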

 

For Citrix NVIDIA GRID vGPU customers looking at high network latency scenarios, it really is worth investigating the potential benefits of TCP window tuning. I’d be really interested to hear feedback if you have tried this, and what your experience and thoughts are!

 

Norwegian Marius Sandbu was recently awarded NGCA status by NVIDIA for his work with our community on NetScaler, remoting protocols, and technologies such as UDP and TCP/IP. You can follow him on Twitter @msandbu and, of course, do follow his excellent blog at http://msandbu.org/!

NVIDIA GRID – A Guide on GPU Metric Integration for Citrix XenServer

Just a quick blog aimed at those looking to develop GPU hypervisor monitoring products by integrating the NVIDIA GPU metrics exposed by XenServer via their APIs. Really it’s a bit of a guide as to where to find the information provided by Citrix.

[Image: GPU performance graph]

Background

Two NVIDIA GPU technologies are available on Citrix XenServer:

  • GPU (PCIe) pass-through (including GPU-sharing for XenApp and VDI passthrough)
  • vGPU (shared GPU technologies)

Owing to the nature of PCIe pass-through, whereby the hypervisor is bypassed and the VM itself obtains complete control of and sole access to the GPU, host- and hypervisor-level metrics are not available to the NVIDIA SDK and APIs on the host, nor to the hypervisor itself.

Developing a supported solution

Many Citrix customers insist on a monitoring solution being certified by the vendor via the Citrix Ready program. ISVs are advised to join the Citrix Ready program (the access level is free) to obtain advice on developing a supported product and to eventually certify and market their product. In particular, ISVs are recommended to evaluate the conditions of the vendor self-certification “kit” for supported products.

Whilst monitoring can be performed by inserting a kernel module or supplemental pack into XenServer’s dom0, this is a mechanism that Citrix generally will not support, and customers are rarely willing to compromise their support agreements to use such products. ISVs are strongly advised to consider using the XenServer APIs and SDK to access metrics in a supported manner. See: https://www.citrix.com/partner-programs/citrix-ready/test.html (under XenServer -> Citrix XenServer (6.x) Integrated ISV Self-Certification Kit).

XenServer SDK / API

The XenServer API provides bindings for five languages: C, C#, Java, Python and PowerShell.

XenServer maintains a landing page for ISV developers: http://xenserver.org/partners/developing-products-for-xenserver.html

Additionally there is a developer (SDK) support forum where many XenServer staff answer questions: http://discussions.citrix.com/forum/1276-xenserver-sdk/

XenServer Metrics

XenServer captures metrics in RRDs. Details of the RRDs, code examples and information on how the XenServer SDK can be used to access the metrics are given on this landing page: http://xenserver.org/partners/developing-products-for-xenserver/18-sdk-development/96-xs-dev-rrds.html

XenServer has integrated many of the metrics available from NVIDIA’s NVML interface into its RRDs. This means customers can access the metrics via the XenServer APIs in a supported manner, rather than inserting unsupported kernel modules to call NVML in the hypervisor’s host operating system (dom0). See https://www.citrix.com/blogs/2014/01/22/xenserverxendesktop-vgpu-new-metrics-available-to-monitor-nvidia-grid-gpus/
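
As a rough sketch of what this looks like in practice (the host address and credentials are placeholders, and it assumes a Python 3 environment with the XenAPI module that ships with the XenServer SDK), the host RRDs, including the NVML-derived GPU/vGPU counters on GRID-enabled hosts, can be pulled over the rrd_updates HTTP handler using an authenticated session:

import time
import urllib.request
import XenAPI  # Python binding shipped with the XenServer SDK

session = XenAPI.Session("http://xenserver.example.com")  # use https in production
session.xenapi.login_with_password("root", "password")
try:
    # rrd_updates returns, as XML, every metric updated since 'start';
    # on vGPU-enabled hosts the GPU counters appear alongside CPU, memory and disk.
    start = int(time.time()) - 300  # last five minutes
    url = ("http://xenserver.example.com/rrd_updates?session_id=%s&start=%d&host=true"
           % (session.handle, start))  # session.handle is the opaque session reference
    xml = urllib.request.urlopen(url).read()
    print(xml[:500])  # parse with xml.etree.ElementTree for real use
finally:
    session.xenapi.logout()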

XenServer APIs – querying GPU infrastructure:

For information on which VMs have vGPUs, the type of vGPU profile in use, etc., see http://nvidia.custhelp.com/app/answers/detail/a_id/4117/kw/citrix%20monitoring under “Checking your GPU configuration” for links to the appropriate XenServer documentation.
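
For illustration, a minimal sketch using the same XenAPI Python bindings (host and credentials are again placeholders) that lists each vGPU, the VM it belongs to and the vGPU type (profile) it uses:

import XenAPI  # Python binding shipped with the XenServer SDK

session = XenAPI.Session("http://xenserver.example.com")
session.xenapi.login_with_password("root", "password")
try:
    for ref, vgpu in session.xenapi.VGPU.get_all_records().items():
        vm_name = session.xenapi.VM.get_name_label(vgpu["VM"])
        profile = session.xenapi.VGPU_type.get_model_name(vgpu["type"])
        print("%s -> %s (attached: %s)" % (vm_name, profile, vgpu["currently_attached"]))
finally:
    session.xenapi.logout()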

 

Useful links:

 

NVIDIA GRID: Linux Guest OS support for Linux distributions on Citrix and VMware

I was recently involved in a support inquiry where a user wanted to know if NVIDIA GRID vGPU was available on Linux VDAs with OpenSUSE LEAP as the guest OS (the answer at the time of writing is that it’s NOT!). Finding the answer was a lot harder than I expected, as both the VMware and Citrix documentation took a bit of hunting around.

Much of the marketing around Linux VDAs mentions support for “SUSE”, “CentOS” or other genres of Linux, such as this blog. It is important that customers check the official support matrices of both their hypervisor and their VDI solution, as both Citrix and VMware only certify, QA and support specific versions of Linux guest OSs (usually only enterprise-supported versions). Customers may find themselves unsupported by the virtualization vendors if they fail to check that the OS and its specific version are supported by both their hypervisor and their VDI solution (especially if mixing vendors, such as Citrix XenDesktop on VMware ESXi).

Both vendors are evolving their Linux support rapidly and customers must check the documentation associated with the relevant versions of VMware/Citrix products they intend to use.

NVIDIA cannot provide support for guest OSs unsupported by the relevant virtualization vendor and as such customers are recommended to contact VMware/Citrix if they wish to use alternative versions/distributions. It is very likely many other varieties of Linux will “work” but customers should be aware that they will be unable to obtain hypervisor or VDI support in the event of an issue.

At the time of writing Horizon 7 on ESXi supports:

  • Ubuntu 12.04 and 14.04
  • Red Hat Enterprise Linux (RHEL) 6.6 and 7.1
  • CentOS 6.6
  • NeoKylin 6 Update 1 (Chinese)
  • SUSE Linux Enterprise Desktop 11 SP3

 

At the time of writing Citrix XenDesktop 7.9 on XenServer supports:

  • SUSE Linux Enterprise:
    • Desktop 11 Service Pack 4
    • Desktop 12 Service Pack 1
    • Server 11 Service Pack 4
    • Server 12 Service Pack 1
  • Red Hat Enterprise Linux
    • Workstation 6.7
    • Workstation 7.2
    • Server 6.7
    • Server 7.2
  • CentOS Linux
    • CentOS 6.7
    • CentOS 7.2

Going forward, if you want to check the OSs available for a Linux VDA you should follow the advice below.

Citrix

XenServer Support for Linux Guest OSs

This is documented in the “Citrix XenServer® Virtual Machine User’s Guide” for the relevant version of XenServer e.g. for 7.0, here: http://docs.citrix.com/content/dam/docs/en-us/xenserver/xenserver-7-0/downloads/xenserver-7-0-vm-users-guide.pdf

XenDesktop Guest OSs Supported by the Linux VDA

This can be found in the Linux VDA product documentation for the relevant version of XenDesktop, under the section “System Requirements”; e.g. for XenDesktop 7.9, please see http://docs.citrix.com/en-us/xenapp-and-xendesktop/7-9/install-configure/suse-linux-vda.html. (This is where I had to hunt around, as bizarrely Citrix detail the genres and versions of Linux supported under each supported OS rather than in a master list, so the SUSE documentation is where you can find RHEL and the other supported versions listed.)

VMware

ESXi/vSphere Support for Linux Guest OSs

Supported Linux OSs are listed in the “VMware Compatibility Guide”: https://www.vmware.com/resources/compatibility/search.php?deviceCategory=software

Horizon Support for Linux Guest OSs

The versions and distributions supported by Horizon are listed in the FAQ for the appropriate release e.g. for Horizon 7, here: http://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/products/horizon/vmware-horizon-for-linux-faq.pdf

New Cisco Validated Design featuring UCS B200 M4 with NVIDIA GRID M6 vGPU – available now!

It’s great to see a new validated design released by Cisco in recent weeks, particularly as this one features the NVIDIA GRID M6 options for blade servers to enable virtualized GPU acceleration (vGPU). This reference architecture joins others available for UCS, but in particular provides a reference blueprint for Citrix XenDesktop/XenApp 7.7 and VMware vSphere 6.0 for 5,000 seats. Key features include:

  • Citrix XenDesktop/XenApp 7.7
  • Built on Cisco UCS (including the Cisco B200 M4 Blade Server) and Cisco Nexus 9000 Series switches
  • NetApp AFF 8080EX storage
  • VMware vSphere ESXi 6.0 Update 1 hypervisor platform

Cisco have done a great job providing a comprehensive guide and reference for a full VDI/XenApp deployment that includes networking, storage and graphics acceleration considerations.

 

Cisco-NVIDIA Relationship

There are plenty of case studies, whitepapers and webinar recordings covering Cisco’s long-standing investment in NVIDIA GRID and vGPU too:

NVIDIA GRID GPUs perfect for keeping up with the Raspberry Pi and the next generation of end points


Citrix have been making a fair bit of noise about their end client (Receiver) being available and supported, in conjunction with partner ThinLinx, on the Raspberry Pi, which with peripherals is proving to be a sub-$100 thin client capable of handling demanding graphics at frame rates of 30 fps or more (YouTube is usually 30 fps).

The Raspberry Pi and other low-cost end-points such as the Intel NUC are capable because they support hardware decode of the H.264 and JPEG formats used by HDX/ICA; their SoC (system on a chip) hardware is designed to handle graphics very well.

There has been a lot of excitement in the industry and community: with traditional thin clients typically costing $300-$600, the Pi offers potential cost savings, but also the opportunity to use VDI/application remoting (XenDesktop or XenApp) in scenarios where it previously wouldn’t have made financial sense.

There have been some stunning videos demonstrating the potential of this new class of low-cost endpoint such as:

And what do all these videos have in common? THEY WERE ALL RECORDED USING SERVERS BACKED BY NVIDIA GRID virtualized GPUs (vGPU, or GPU-sharing via XenApp and GPU pass-through)! Because:

  • If you have effective hardware decode on the client, you need your server to be able to keep up and pump out high frame rates and visual quality
  • With low-cost clients, a virtualized GPU (shared vGPU, or XenApp GPU-sharing via pass-through) offers a cost-effective way to boost the graphical power of the server without the need for a dedicated GPU per user

The Citrix Pi project highlights the power of using hardware encode and decode on GPUs, with the GPU able to take the brunt of the workload, leading to:

  • Battery and power savings
  • The opportunity to offload the protocol encode from the server CPU, boosting scalability. Citrix currently only have hardware encode for their Linux VDA, while VMware Blast Extreme and NICE already offer it, so there is clear future value here for NVIDIA GRID customers as Citrix catches up. The Pi project and the VMware/NICE developments are vindication that this is where the industry is going – there are simply many tasks associated with virtualizing graphics that GPUs are best suited to.
  • Tests with Blast Extreme and NVENC show up to 51 ms lower latency on screen updates and lower CPU usage (http://blogs.vmware.com/euc/2016/02/vmware-horizon-blast-extreme-acceleration-with-nvidia-grid.html), so the potential for Citrix to emulate this is clear.

Savings on end-client costs are likely to change the balance of VDI deployment costs, allowing customers to invest more in the server data centre and freeing up budget for GPUs to unlock user-experience improvements, the benefits of consolidation and power savings. For many customers it really doesn’t make sense anymore to have high-end workstations or PCs dotted around remote locations and in use for only a few hours a day.

Once you get GPUs into a data centre, many sys admins report a reduction in costs associated with sluggish performance and helpdesk calls. It’s not just high-end graphics that benefit from GPU acceleration, but regular office applications, browsers and unified communications (Cisco Jabber, Skype, etc.). You can read more here: https://www.virtualexperience.no/2015/11/05/mythbusting-browser-gpu-usage-on-xenapp/ and also here: https://www.citrix.com/blogs/2015/12/03/gpu-for-the-masses-with-xenappxendesktop/.

Exciting times! I somehow suspect we’ll see Citrix make (quite rightly) quite a bit of fuss over the Pi at their upcoming Synergy event! A performant end-client though needs to be fed by a performant server and NVIDIA GRID is a great match.

More Info:

When less is more! Avoiding excessive web content!… I hate hi-res stock photography!

Last week a Twitter user made a comment that implied the reason some of our recent Thinwire enhancements have happened is that some of the HDX development and product management team are in the UK! Stefan made this comment

[Image: Stefan’s tweet]

After seeing a fellow tweeter struggling to get a decent mobile signal in central London, supposedly one of the world’s most developed cities!

Great real user feedback on thinwire compatibility mode (thinwire plus)!

My colleague, Muhammad, blogged a few weeks ago about a new optimised graphics mode that seems to be delighting users with significant ICA protocol innovations, particularly those users with constrained bandwidth (read the details – here). During its development and various private and public tech previews this feature has been known as Project Snowball/Thinwire Plus/Thinwire+/Enhanced Compatibility mode but in the documentation it is now “Thinwire Compatibility Mode” (read the documentation – here).

I was delighted to read a detailed review by a Dutch consultant (Patrick Kaak) who has been using this at a real customer deployment. In particular, it’s a good read because it contains really specific, detailed information on the configuration and the bandwidth levels achieved per session (<30 kbps). Unfortunately (if you aren’t Dutch) it is written in Dutch, so I had to pop it through Google Translate (which did an amazing job).

You can read the original article by Patrick here (if you know Dutch!): http://bitsofthoughts.com/2015/10/20/citrix-xenapp-thinwire-plus/

What I read and was delighted by is the Google-translated version below:

Since Windows 2012 R2, Microsoft make more use of DirectX for rendering the desktop, where they previously used GDI/GDI+ API calls. This was evident in the ICA protocol, which was heavily optimized for GDI and therefore used higher bandwidth on Windows 2012 R2.

[Image: 1. without tuning]

Halfway through this year we were engaged at one of our customers in a deployment of XenApp 7.6 on Windows 2012 R2. Unfortunately, this client had a number of low-bandwidth locations. The narrowest lines were 256 kbit/s, with about seven sessions running over them, which equates to approximately 35 kbit/s per session. We had already disabled the H.264 (Super Codec) compression because it caused a lot of bandwidth, and we had applied a lot of optimization in the policies, but we could not get below 150 kbit/s; on average we came out at around 170 kbit/s. The 35 kbit/s never seemed achievable.

After some phone calls with Citrix, we decided to embrace Project Snowball, a project focused on optimizing ThinWire within the ICA protocol, which since Feature Pack 3 is now called ThinWire Plus. This would reduce the bandwidth back to the level that was previously feasible on Windows 2008 R2.

After installing the beta on the test servers, it turned out that we had to force the server to choose the compatibility mode. A moment of choice, because to do so we had to turn off the Super Codec entirely for the server, for all users on it. This forces every session to use ThinWire, even where the lines have enough bandwidth and the Super Codec could be used. This is done by implementing the following registry key:

HKLM\Software\Citrix\Graphics
Name: Encoder
Type: REG_DWORD
Value: 0
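
As an aside, the same value can be applied with a short Python sketch using the standard-library winreg module (Windows only; run it elevated on the XenApp server):

import winreg  # Windows registry access from the Python standard library

key = winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE,
                         r"SOFTWARE\Citrix\Graphics",
                         0, winreg.KEY_SET_VALUE)
winreg.SetValueEx(key, "Encoder", 0, winreg.REG_DWORD, 0)  # 0 forces ThinWire (compatibility mode)
winreg.CloseKey(key)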

Furthermore, the Progressive Compression Level policy was set to Medium, as indicated in the guidelines for ThinWire Plus.

[Image: snowball active – ThinWire Plus without optimizations]

The first results were superb. Immediately after installing ThinWire Plus, the average bandwidth already dropped by 50%, to 83 kbit/s.

After further tuning of all the components it was possible to go down even further. For people on low bandwidth some fairly extreme measures were taken; the following settings were applied to reduce the bandwidth further. Notable are the target frame rate, which was set to 15 fps, and the switch to 16-bit colour. Finally, a maximum bandwidth limit of 150 kbps per session was imposed.

Maximum allowed color depth: 16 bits per pixel (reduction of 10-15% of bandwidth; can only be switched for the entire server)
Allow Visual Lossless Compression: Disabled
Audio over UDP: Disabled
Client audio redirection: Disabled
Client microphone redirection: Disabled
Desktop Composition Redirection: Disabled (prevents DCR is preferred over Enhanced ThinWire)
Desktop Wallpaper: Disabled (ensures uniform background color)
Extra color compression: Enabled (reduction of bandwidth, increased server CPU)
Extra color compression threshold: 8192 Kbps (default)
Heavyweight Compression: Enabled
Lossy Compression Level: High
Lossy compression threshold: 2147483647 Kbps (default)
Menu animation: Prohibited (reducing bandwidth by not using menu animations)
Minimum image quality: Low (always prefer additional compression over a sharper image)
Moving image compression: Enabled
Optimization for Windows Media redirection over WAN: Disabled (prevents WMV being sent over the WAN to the client)
Overall Session bandwidth limit: 150 Kbps (for non-GMP, maximum bandwidth per session)
Progressive compression level: medium (required for enhanced thin wire)
Progressive compression threshold: 2147483647 Kbps (default)
Target frame rate: 15 fps
Target minimum frame rate: 10 fps (default)

[Image: 3. snowball heavily tuned]

With this policy implemented, the average in the test environment came to 16 kbit/s, a value that at the start we absolutely did not think we could reach. The user tests revealed that the environment still worked well, despite all the limitations we had set in the policy.

After all the changes were made in the production environment, we see that an average session now uses around 30 kbit/s. Slightly more than in the test environment, but certainly not values to complain about. Users can work well and are happy.

Incidentally, during testing we discovered that with a pass-through application (where users first connect to a published desktop and then launch a published application on another server), the ThinWire Plus configuration must be active on both servers. If it was not, we saw the bandwidth usage towards the client increase significantly again.

(To all my colleagues: thank you for providing the performance measurements!)