Microsoft RDP: End-client H.264 (4:4:4) Hardware-decode Support on existing decoders 4:2:0! Elegant and Cool!

Great pictorial explanation on Microsoft Blogs!

Microsoft have just announced a series of tech preview enhancements to RDP, which you can read about here. I’ve blogged endlessly about the image quality problems that H.264 4:2:0 protocols cause for CAD line data and text; 4:2:0 has long been the established standard for older encoders and GPUs.

The Microsoft announcement gives a very good overview (with pictures) of some of the visual issues that can arise with H.264 4:2:0, and I recommend looking at it, here.

H.264 4:4:4 offers higher visual quality, but at the cost of more bandwidth and more work to encode and decode. With 4K monitors, multi-monitor setups and hi-res tablets becoming standard, software encoders can struggle. Many existing end-client devices, GPUs and SoCs support hardware decode of 4:2:0 but not 4:4:4. Newer hardware and GPUs are introducing 4:4:4 decoders, but much of what is already deployed can only decode 4:2:0.
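To put a rough number on that cost, here is a minimal sketch that counts the raw, uncompressed samples in one frame for each chroma format. The function name `samples_per_frame` is my own invention for illustration, and raw sample counts are only a proxy: the real bandwidth after H.264 compression depends heavily on content and encoder settings.

```python
def samples_per_frame(width, height, chroma="4:4:4"):
    """Raw sample count for one frame: one luma plane plus two chroma planes."""
    luma = width * height
    if chroma == "4:4:4":
        # Chroma is kept at full resolution.
        chroma_plane = luma
    elif chroma == "4:2:0":
        # Chroma is halved both horizontally and vertically.
        chroma_plane = (width // 2) * (height // 2)
    else:
        raise ValueError(f"unsupported chroma format: {chroma}")
    return luma + 2 * chroma_plane


if __name__ == "__main__":
    for name, (w, h) in {"1080p": (1920, 1080), "4K": (3840, 2160)}.items():
        full = samples_per_frame(w, h, "4:4:4")
        sub = samples_per_frame(w, h, "4:2:0")
        print(f"{name}: 4:4:4 = {full:,} samples, 4:2:0 = {sub:,} samples "
              f"({full / sub:.1f}x)")
```

A 4:4:4 frame carries exactly twice as many raw samples as a 4:2:0 frame of the same size, which is also why two 4:2:0 frames have just enough room to hold one 4:4:4 frame.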

Hardware encode and decode is really useful!

Offloading encoding and decoding from the CPU is very beneficial. On servers it frees up CPU and vCPU, allowing higher server scalability and lower costs, and doing the work in hardware is really performant, so it can offer a great user experience and higher frame rates. The same is true on the end-client, and on mobile devices there can be the additional benefit of saving battery life.

In the last few weeks VMware have announced their new Blast Extreme protocol, which offers hardware encode using NVIDIA’s NVENC capabilities for GPUs, so they seem to agree with this view, as do NICE Software, who have a similar implementation.

End-Client Decode

Many end clients have SoCs/GPUs capable of hardware decode for 4:2:0, and most vendors and protocols (VMware, Citrix, fra.me, NICE, RDP etc.) take advantage of this. However, if Microsoft have done what I suspect they have, that’s something I haven’t come across before and is probably new. There are various ways of plugging an extra bit of hardware into an end-client to add 4:4:4 decode if the original client doesn’t have it, but that’s hardly a mobile solution.

About a year ago I read a paper from Microsoft Research, Tunneling High-Resolution Color Content through 4:2:0 HEVC and AVC Video Coding Systems, describing how to remap 4:4:4 frames into a series of 4:2:0 frames. It’s a similar concept to mapping a 2D physics problem into a series of 1D problems, or to the algorithms that allow you to do double precision maths on a single precision GPU, i.e. super elegant and a fantastic idea. As soon as I saw it I thought it was fab and had legs. The documentation on the RDP tech preview suggests they may well have implemented this or something similar:

  • “As part of the AVC 444 mode in RDP 10 we solved the challenge to get 4:4:4 quality text with 4:2:0 hardware encoders / decoders.”
  • “With the Windows Remote Desktop Client (MSTSC.EXE) the AVC 444 mode automatically uses the AVC/H.264 Hardware decoder if available via the Windows DirectX Video Acceleration (DXVA) API”

It does sound like they have done this, and it would make H.264 4:4:4 a viable option for 4K and for a range of mobile end-clients where the CPU would struggle to give an acceptable frame rate and 4:2:0 simply isn’t good enough for CAD data or text. Citrix offers a CPU-based software 4:4:4 mode called “Visually Lossless”, which gives an idea of the quality 4:4:4 can offer. The “Build-to-visually-lossless” mode is recommended to avoid excessive CPU load, which I’ve previously written about (here).
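To make the remapping idea concrete, here is a toy sketch of how one 4:4:4 frame can be split into two 4:2:0-shaped frames and put back together. This is my own simplified packing layout, written purely for illustration; it is not taken from the Microsoft Research paper or the RDP documentation, and the function names and dictionary keys are hypothetical. Planes are plain lists of rows, and width and height are assumed to be even.

```python
def pack_444_to_two_420(y, u, v):
    """Split one 4:4:4 frame into a 'main' and an 'aux' 4:2:0-shaped frame."""
    h, w = len(y), len(y[0])
    assert h % 2 == 0 and w % 2 == 0, "toy example assumes even dimensions"
    even_rows, odd_rows = range(0, h, 2), range(1, h, 2)
    main = {
        "y": [row[:] for row in y],              # full-resolution luma
        "u": [u[r][0::2] for r in even_rows],    # chroma at even row, even column
        "v": [v[r][0::2] for r in even_rows],
    }
    aux = {
        # The aux "luma" plane carries the odd chroma rows: U on top, V below.
        "y": [u[r][:] for r in odd_rows] + [v[r][:] for r in odd_rows],
        "u": [u[r][1::2] for r in even_rows],    # chroma at even row, odd column
        "v": [v[r][1::2] for r in even_rows],
    }
    return main, aux


def unpack_two_420_to_444(main, aux):
    """Rebuild the full-resolution Y, U and V planes from the two frames."""
    y = [row[:] for row in main["y"]]
    h, w = len(y), len(y[0])
    half = h // 2
    u = [[0] * w for _ in range(h)]
    v = [[0] * w for _ in range(h)]
    for i in range(half):
        # Even rows: interleave even columns (main) with odd columns (aux).
        u[2 * i][0::2], u[2 * i][1::2] = main["u"][i], aux["u"][i]
        v[2 * i][0::2], v[2 * i][1::2] = main["v"][i], aux["v"][i]
        # Odd rows come straight out of the aux luma plane.
        u[2 * i + 1] = aux["y"][i][:]
        v[2 * i + 1] = aux["y"][half + i][:]
    return y, u, v


if __name__ == "__main__":
    h, w = 4, 6
    y = [[r * w + c for c in range(w)] for r in range(h)]
    u = [[100 + r * w + c for c in range(w)] for r in range(h)]
    v = [[200 + r * w + c for c in range(w)] for r in range(h)]
    main, aux = pack_444_to_two_420(y, u, v)
    print("round trip exact:", unpack_two_420_to_444(main, aux) == (y, u, v))
```

The point is that each of the two frames has the shape a stock 4:2:0 encoder or decoder expects, so existing hardware can process them and the client only has to do the cheap reassembly step. In a real system the codec is lossy, so the round trip would not be bit-exact as it is in this toy, and a practical scheme would most likely filter the chroma in the main frame rather than simply picking every other sample.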

I think Microsoft may have just added another big gun to the protocol arms race! It will be interesting to see if others follow. The RemoteFX and RDP enhancements are definitely worth reading about! As are obscure academic Microsoft Research papers; they’ve some interesting stuff in there.

2 thoughts on “Microsoft RDP: End-client H.264 (4:4:4) Hardware-decode Support on existing decoders 4:2:0! Elegant and Cool!”

  1. Many thanks for sharing this, Rachel. When progress is made in one area it drives development in others. While 64-bit emulation isn’t always so great if you look at some of the benchmarks on both some servers and GPUs, offloading heavy-duty pure computational work like this to a GPU as opposed to burdening the CPU with the task makes total sense. CPUs with built-in GPUs are looking better all the time!
