Back in May 2015, at Citrix Synergy 2015, I was involved with an HDX demo. The remit was pretty simple: to show really good HDX 3D Pro frame rates, good enough to satisfy gamers, on low-cost hardware using ONLY existing, shipping-today, fully-supported, production-ready products. We also wanted to show that HDX 3D Pro can run really well on low-cost, entry-level thin clients, so we chose a single-core Intel NUC DE3815TYKHE Atom Processor. The upshot was that we showed 60fps gaming on a hardware device costing $150, running a $15 OS that supports System on a Chip (SoC) accelerated plugins.
Low-cost thin clients can be super performant if they have good hardware decode support and SoC technology. The Citrix Linux Receiver provides an SDK that thin-client vendors or third parties can use to write a software plugin connecting the thin client's hardware to HDX and the Receiver. The presence (or absence) of such a plugin, and its quality, is usually key to a performant thin client.
The Citrix Thin Client Model
This is the Citrix thin-client model: we provide a number of default software and hardware plugins to process various types of encoding (JPEG, H.264, etc.), plus stubs into which an OEM or third party can drop additional plugins optimised for the specific device.
The diagram shows that the Linux Receiver provides a number of graphics acceleration libraries, such as GStreamer and libjpeg-turbo (these may change over time), which are used if a third party does not supply alternatives. An OEM, such as an OS vendor or a thin-client manufacturer, can supply additional hardware-accelerated plugins such as CTXJPEG or CTXH264 to take advantage of on-chip decoding of JPEG or H.264 data on the thin client.
The Intel NUC DE3815TYKHE Atom Processor and ThinLinx OS
As far as I know, Intel doesn’t supply HDX SoC-accelerated plugins for the NUC, so we used our partner ThinLinx’s OS, which does include and fully support them for the NUC. ThinLinx OS includes not only such hardware acceleration (plugins for CTXH264 etc.) but also the tools and software to manage thin clients effectively, all for around $15 per client. You can read the details here on the ThinLinx website.
Gaming and Frame Rates
A lot of our HDX engineers quite like their gaming. There are genuine “business” uses, such as helicopter pilot training and health and safety simulation training (firefighters), but in general, gaming workloads and the user expectations of gamers are a good way to harden our technologies.
A lot of people think HDX (High Definition user eXperience) is just ICA and the graphics protocols, but it is far more. It’s about the whole user experience: audio, USB support for peripherals like gaming controllers, and general responsiveness beyond the visuals.
The human eye is generally considered to find 30fps pleasing (film is typically 24fps), and the incremental benefit between 30 and 60fps is minimal; but perception is a funny thing, and many users find 60fps more pleasant even though the visual differences are small. For gamers, a lot of this is about responsiveness (when they “pull the trigger”). At 30fps there are 0.0333s (33ms) between frames, and at 15fps 0.0667s (67ms), so if a mouse click just misses a frame at 30fps you pick up an extra 33ms of “latency” in the user feel. At 60fps that penalty halves to about 17ms, which is why gamers are obsessed with 60fps: above 30fps it’s not necessarily the visuals, but it can “feel” different.
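The frame arithmetic above can be sketched in a few lines of Python. This is a minimal illustration, assuming that an input which just misses a frame waits one extra frame interval before it can appear on screen; it ignores network, encode and decode time, and `frame_interval_ms` is a hypothetical helper name.

```python
def frame_interval_ms(fps: float) -> float:
    """Time between consecutive frames, in milliseconds."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Assumption: a click that just misses a frame waits roughly one extra
# frame interval (network, encode and decode time are ignored here).
for fps in (15, 24, 30, 60):
    print(f"{fps:>2} fps: {frame_interval_ms(fps):5.1f} ms between frames")

saving = frame_interval_ms(30) - frame_interval_ms(60)
print(f"Moving from 30 to 60 fps cuts the worst-case wait by {saving:.1f} ms")
```

Running it prints 66.7, 41.7, 33.3 and 16.7 ms for 15, 24, 30 and 60fps, and a saving of 16.7 ms when moving from 30 to 60fps.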
The Linux Receiver has built-in end-user instrumentation
A lot of those who tried the demo seemed slightly surprised to see an instrumentation window on the Linux Receiver. This is actually a feature I’d like to see expanded to all our receivers long term.
To enable the instrumentation, configure your client by passing the following option to wfica:
- “-rm Dcdf” to enable the instrumentation
As you won’t normally be calling wfica directly, you can set the environment variable WFICA_OPTS to pass the command-line options instead, for example:
- export WFICA_OPTS="-rm Dcdf"
Update (3rd Aug 2015): Grrr… my readers are too sharp! The instrumentation screenshot is indeed from a different system. Bonus points to the sharp-eyed who spotted from the CPU readings that it was a quad-core system; yes, this was the case. The NUC in the demo was single-core….
- Highly Graphical/Interactive Content
- Borderlands 2 Multiuser Game
- Up to 60fps 1080p (1920×1080 resolution) Graphics
Purpose of Demo
- Show Challenging near native UX
- Monitor the performance
- Ask questions
- Have fun
THE SHOPPING LIST TO DO THIS DEMO YOURSELF
Shipping Citrix Software Products
- XenServer 6.5
- XenDesktop 7.6 with 3DPro VDA
- Linux Receiver 13.1
Shipping Partner Software Products
- Windows 8.1
- ThinLinX OS (Linux) with HDX SoC Plugins
Shipping Partner Hardware
- Dell R720 and HP DL380 G8 servers
- NVIDIA GRID K2 Graphics Cards (K260Q vGPU profiles used to share GPUs)
- Intel NUC DE3815TYKHE Atom Processor
- Additional memory for the NUC: 2GB (widely available e.g. here and here for around $10-$15): Crucial 2GB Single DDR3 1600 MT/s (PC3-12800) CL11 SODIMM 204-Pin 1.35V/1.5V Notebook Memory Module CT25664BF160B
- ViewSonic 27” 1080p Monitors
- Xbox USB Game Controllers
- Shows the low cost performance capabilities of Citrix and Partner shipping technologies.
- The sum is greater than the parts. You can configure this yourself.
- Fun – “A World Where People Can Work and Play from Anywhere”.
- Things you don’t think of when planning a demo: Borderlands 2 requires you to buy ammunition, which resulted in our most senior engineering manager spending an awful lot of time running around the game buying bullets for the next demo – sorry, Joanna! We didn’t think about that one too well!
In normal production usage we would recommend that customers avoid fiddling with registry keys and policies, especially if they don’t understand what they are changing. I normally recommend that new HDX 3D Pro users start with the policy template from NVIDIA’s Jason Southern, available and explained here.
For the Synergy demo, we did do a bit of tuning. With hindsight I’m not sure we should have, as I doubt the performance gains justified the questions this fiddling raised. However, many of those who saw the demo took away the configuration and have asked about it, so in the interests of transparency I’ve included some uber-nerdy detail at the end of this blog.
The feedback from those who saw this demo at Synergy was superb:
- “Stunning” said one blogger, here
- CTPs Neil Spellings and Remko Weijnen visited the demo and Neil tweeted “it was awesome. Best kept secret@Synergy!”
- “Just finished up playing Borderlands 2 with NVidia GRID on the back end. HDX3D Pro has never looked better!!!”
- One visitor put their experience on YouTube so you can see for yourself: here
Demo Policies and Settings
- Overall session bandwidth limit (user setting, ICA\Bandwidth): 38000 Kbps (Default: 0)
- Target frame rate (user setting, ICA\Visual Display): 60 fps (Default: 30 fps)
- Client USB device redirection (user setting, ICA\USB Device): Allowed (Default: Prohibited)
- ini “[ClientAudio]” Section
- wfica options: “-rm Dcdf” to enable instrumentation
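To put the 38000 Kbps cap in per-frame terms, here is a quick sketch. It assumes Kbps means kilobits (1000 bits) per second and, as a simplification, that the whole cap is available to graphics, whereas in practice audio, USB and other channels share the same session bandwidth. The helper names are hypothetical.

```python
def per_frame_budget_kbit(cap_kbps: float, fps: float) -> float:
    """Kilobits available per frame if the whole cap went to graphics."""
    return cap_kbps / fps


def kbit_to_kbyte(kbit: float) -> float:
    """8 bits per byte."""
    return kbit / 8


cap_kbps = 38000  # Overall session bandwidth limit used in the demo
fps = 60          # Target frame rate used in the demo

budget = per_frame_budget_kbit(cap_kbps, fps)
print(f"{budget:.0f} kbit (~{kbit_to_kbyte(budget):.0f} KB) per frame at {fps} fps")
```

This works out at roughly 633 kbit, or about 79 KB, per 1080p frame at 60fps, an upper bound rather than what each frame actually used.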
HDX 3D Pro VDA Registry
- GfxProvider = 0x2
- TwumEnabled = 0x1
- Vd3dEnabled = 0x0
- EncodeSpeed = 0x2
- Encoder = 0x2
- MinFPS = 0x3c
- H264EncodedData = 0x0
- LowVisualQualityCRF = 0x17
- HighVisualQualityCRF = 0x14
- SystemFlowControl = 0x0 (a setting pertaining to the Linux Receiver that users can disregard)
- Remove ExtraSharpenCount (this is a legacy setting that users will be able to disregard in the future)
All these values are in HEX; there is a HEX to DEC (decimal) calculator available here. Some of these keys are set to their defaults. We are in the process of tidying up the registry keys, so please do not use the above as any kind of general reference. I’m publishing them to answer the questions from those who saw the demo about why we used them.
These settings are associated with the use of the HDX 3D Pro VDA and aren’t significant; they should be installed by default by the HDX 3D Pro VDA:
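If you’d rather not use an online calculator, Python’s built-in `int()` with base 16 does the same conversion. The dictionary below simply restates the demo values listed above (ExtraSharpenCount is omitted because it was removed); as noted, these are demo settings, not a general reference, and `to_decimal` is a hypothetical helper name.

```python
# Demo registry values exactly as listed above, kept as hex strings.
DEMO_REGISTRY_HEX = {
    "GfxProvider": "0x2",
    "TwumEnabled": "0x1",
    "Vd3dEnabled": "0x0",
    "EncodeSpeed": "0x2",
    "Encoder": "0x2",
    "MinFPS": "0x3c",
    "H264EncodedData": "0x0",
    "LowVisualQualityCRF": "0x17",
    "HighVisualQualityCRF": "0x14",
    "SystemFlowControl": "0x0",
}


def to_decimal(hex_value: str) -> int:
    """Parse a hex string such as '0x3c'; int() accepts the 0x prefix with base 16."""
    return int(hex_value, 16)


for name, hex_value in DEMO_REGISTRY_HEX.items():
    print(f"{name:<22} {hex_value:>5} = {to_decimal(hex_value)}")
```

The interesting ones are MinFPS (60), LowVisualQualityCRF (23) and HighVisualQualityCRF (20); the rest are 0, 1 or 2 in either base.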
- GfxProvider = 0x2
- TwumEnabled = 0x1
- Vd3dEnabled = 0x0
Encoder = 0x2 (in decimal, 2) means we are using the default HDX 3D Pro encoder, which is pure H.264 without lossless text. Encoder = 1 is used on the Standard VDA and is H.264 plus lossless text.
EncodeSpeed = 0x2 (in decimal, 2). The default for HDX 3D Pro is currently (XD7.6) EncodeSpeed = 1. For the Standard XenDesktop and XenApp VDAs, EncodeSpeed is set to 2 by default.
- What does EncodeSpeed do?
- Value set to 1: Better image quality, but needs more CPU on the client.
- Value set to 2: Image quality drops, but the client CPU can support higher resolutions, improving performance on thin clients. Favours performance.
So why did we tweak the EncodeSpeed setting? This was a gaming demo, i.e. not much text and lots of movement (transients), so the image-quality improvements were negligible, and the gamers’ desire for 60fps responsiveness is better met by this tweak.
These next two are quality settings we usually strongly discourage users from tweaking, as it is very easy to configure a system into a state where you turn off beneficial adaptive behaviour. In this case, it was a LAN gaming demo, and gamer engineers who knew exactly what they were doing got involved!
- LowVisualQualityCRF = 0x17 (=23 in decimal)
- HighVisualQualityCRF = 0x14 (=20 in decimal)
QualityCRF ranges between 18 and 45, with 18 = highest quality and 45 = lowest quality. It’s a measure of compression, so the lower the QualityCRF value, the less compression is applied and the higher the visual quality.
In this gaming demo scenario, with continuous full-screen updates, the system rarely, if ever, gets a chance to stop and rebalance so that HDX’s adaptive behaviour can fully kick in. The narrow min/max quality range was set such that:
- We lowered the max quality slightly because the screen is always changing, so converging a static screen to “near-perfect” isn’t a requirement, and we don’t want the system to raise quality so far that it has to “self-tune” back down again.
- We raised the min quality so that if there’s a momentary “blip” in performance and H.264 tries to drop the quality to the minimum, it doesn’t overshoot.
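The effect of a narrow quality window can be illustrated with a toy model. This is not HDX’s actual adaptive algorithm; it is a sketch assuming the encoder requests a CRF freely and the configured HighVisualQualityCRF/LowVisualQualityCRF values act as hard bounds on it. `clamp_crf` is a hypothetical helper.

```python
# Toy model only: NOT the HDX encoder's real algorithm.
# Assumption: adaptive logic requests a CRF, and the configured window
# [HighVisualQualityCRF, LowVisualQualityCRF] clamps it.

CRF_BEST = 18   # highest quality on the documented scale
CRF_WORST = 45  # lowest quality on the documented scale


def clamp_crf(requested: int, high_quality_crf: int, low_quality_crf: int) -> int:
    """Clamp a requested CRF into the configured quality window."""
    if not (CRF_BEST <= high_quality_crf <= low_quality_crf <= CRF_WORST):
        raise ValueError("expected 18 <= high-quality CRF <= low-quality CRF <= 45")
    # Lower CRF means higher quality, so the high-quality CRF is the lower bound.
    return max(high_quality_crf, min(requested, low_quality_crf))


# Demo window: HighVisualQualityCRF = 0x14 (20), LowVisualQualityCRF = 0x17 (23)
for requested in (18, 21, 45):
    print(f"requested CRF {requested} -> applied CRF {clamp_crf(requested, 0x14, 0x17)}")
```

In this model a quiet moment asking for the best quality (18) stops at 20, and a performance blip asking for the worst (45) only drops to 23, mirroring the two bullets above.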
Similarly, raising MinFPS to be close to (here, equal to) MaxFPS (the target frame rate set in the policies, raised to 60fps from the default 30fps) was done so that:
- In the event of a momentary blip in performance, H.264 doesn’t overshoot. (MinFPS = 0x3c corresponds to 60fps in decimal.)
In anything other than a gaming environment, when using business or graphical applications, you really shouldn’t have to consider such adjustments. However, our HDX engineers like their gaming and their perfect trigger-happy experience!
We did end up tuning down the demo and ran the game at 50-55fps in a 60fps session, because we were getting some unsightly screen tearing on the client when everything was running at 60fps. That’s one we are looking into!
We have been using thin clients for over a decade, and especially with the advances in SoC technology, these units are far more versatile now than they were then, with cost and power savings as factors and security a major consideration too. The estimated power saving of a thin client vs. a desktop computer is around $70 per year, depending of course on regional cost differences, and that easily pays for the cost of these units over their lifetimes.
What resolution were they playing at?
I’ve set up something similar and I get great fps at 1280×800, but performance seems to drop rapidly as resolution increases (probably to be expected).
we used 1920×1080…
Great article and a good inspiration.
I agree with the point about using gamers to harden the technology, as this user segment is usually very sensitive to even the smallest changes (especially ones that degrade the experience).
“Thank you for your detailed interpretation of thin clients. Here I am introducing a versatile thin client terminal, the RDP XL-500.
We can use it as a
1) high-end thin client device
2) Mini PC/Individual PC
3) Virtualization Ready”
Have you tried to recreate this demo (or similar game demo) using the Raspberry Pi3? I’d be curious how many FPS a $60 thin client can sustain… (ThinLinx has $10-licensed OS for RPi 2 and 3, which have h.264 HW decoders).
If using a high-end server, what are your thoughts on Zero Client gaming?
If the zero client has the decode capability to keep up with the server and the network latency is good enough… why not 😀
Hi there, would like to know more details about the hardware powering the clients, how many k2s are powering how many clients? Would love to set up one myself 🙂