VMware announces High Availability (HA) for NVIDIA GRID vGPU VMs on ESXi with vSphere 6.5

I was very pleased to see Pat Lee from VMware’s PM team tweet about this yesterday…

It’s something we knew VMware had added to vSphere 2016, and vSphere 2016 is supported in the GRID 4.1 (Nov 2016) release. As this is a VMware-implemented feature, we at NVIDIA had to wait for them to announce it. I think there have been a few problems with the staging of the documentation update, which is why this has been a rather quiet feature release. I’ll update this blog with links to the documentation when it becomes available, which should be soon!

But since Pat has let the cat out of the bag, it’s probably best to answer a few basic questions straight away.

What is High Availability (HA)?

Basic HA is a feature to ensure VMs are up and running as soon as possible in the event of host failure. The VM will automatically restart on another host if one is available with sufficient resources. So for vGPU-enabled VMs that means a host with an appropriate GPU etc. Although the user will experience some downtime, this is minimized where possible, without the need for manual intervention by a system administrator.

Guaranteed High Availability…

This can be provided by HA features that allow resources such as RAM/CPU to be reserved on hosts, e.g. maybe 15% of a host’s capacity, which guarantees that resources will be available to restart VMs up to a certain number of host failures. I believe that VMware’s configuration does not extend to configuring GPU resource reservation, and so the support announced today will not offer guaranteed HA. It is a feature VMware could add in the future if they saw sufficient demand; it is not a feature engineered by NVIDIA.
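To make the reservation idea a little more concrete, here is a minimal Python sketch of the arithmetic behind a percentage-based reservation. It is not VMware’s admission control algorithm: the function name is hypothetical, and it assumes identical hosts, load spread evenly, and no fragmentation (in reality each VM has to fit on a single host). It only covers CPU/RAM-style resources, since as noted above GPU reservation is not configurable.

```python
def tolerated_host_failures(hosts: int, reserve_percent: int) -> int:
    """Estimate how many host failures a uniform cluster can absorb.

    Assumption: every host runs at most (100 - reserve_percent)% load,
    so reserve_percent% of each host is free for restarted VMs.
    """
    if hosts < 1 or not 0 <= reserve_percent < 100:
        raise ValueError("need hosts >= 1 and 0 <= reserve_percent < 100")
    # k failed hosts shed k * (100 - r) units of load, and the (hosts - k)
    # survivors have (hosts - k) * r units spare between them.
    # k * (100 - r) <= (hosts - k) * r simplifies to k <= hosts * r / 100.
    return (hosts * reserve_percent) // 100


if __name__ == "__main__":
    for cluster_size in (4, 10, 20):
        print(cluster_size, "hosts at 15% reserve ->",
              tolerated_host_failures(cluster_size, 15), "failure(s) covered")
```

Under these simplified assumptions, a 15% reservation only starts to cover a full host failure once the cluster has seven or more hosts.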

Can HA provide continual up-time?

No, not alone. Many hypervisors, though, offer Fault Tolerance (FT), which can provide such support. This is a very expensive feature to use, as it relies on running essentially a duplicate VM on mirrored hardware which is phase-locked to the original (i.e. milliseconds behind). In the event of failure the user is switched to the duplicate with only a momentary glitch in user experience. It’s a feature essentially only used in a few safety / mission critical use cases, as it’s so costly to implement.

So is Fault Tolerance (FT) supported for vGPU?

No, not today. The technology to continually snapshot a live GPU is not available. This is also a prerequisite for live migration/motion (e.g. vMotion) and for regular snapshots.

The Future

NVIDIA and all the partners such as Citrix and VMware appreciate that live motion and snapshotting are key enterprise datacenter needs so we continue to work towards making such technology happen (it’s very technically hard I’m told!). We all know what you want and what you want our priorities to be!!!

NVIDIA GRID is architected with a software model that lets us add support for new OSs on customers’ existing hardware, allowing them to pick up new features.

A few FAQs on Azure N-series, including: do Azure N-Series VMs include an NVIDIA GRID software license?

Last week, on December 1st 2016, I was at the UK Citrix User Group in London, the day Azure launched its new graphics-focused N-series VMs. There is an NVIDIA blog as well as a Microsoft one giving an overview of the technology. Whilst I was there CTP Thomas Poppelgaard and others asked a few questions that I couldn’t answer on the spot, so this blog is a bit of an update for them.

The interest in Azure N-Series which use NVIDIA GRID GPUs via DDA (Direct Device Assignment – essentially pass-through) has been incredible and quite a few questions have come up already.

What regions have N-Series availability?

Check at the time of reading, but initially on 1st Dec 2016 the N-series was launched in 4 regions: South Central US, East US, West Europe and South East Asia. You can check the availability and pricing here: https://azure.microsoft.com/en-us/pricing/details/virtual-machines/windows/

Are NVIDIA GRID Software Licenses included in the pricing?

I went back and asked the Product Manager for this release and I’m told that if using VDI (e.g. for Citrix XenDesktop) then yes they are included within the Azure pricing and there is no need to buy additional NVIDIA GRID software licenses.

However, if using an RDSH solution such as XenApp, end-users are required to purchase NVIDIA GRID vApps licenses per session. I believe this is because Azure doesn’t have a mechanism to bill for this in their current implementation; that may of course change in the future. You can find out more about NVIDIA GRID Licensing in the guides available under “Resources->Deployment Guides” on the NVIDIA GRID home page, here.

I have struggled to find documentation on this though, so if anyone has a link, that would be useful! I’ll update this blog when I find one!

So I can use XenApp with Azure N-series?

Yes, although note the requirement to purchase vApps Licenses. Citrix themselves are very enthusiastic about this option and have published some excellent blogs, deployment guides etc. See:

Can I use Teradici Cloud Access Software on Azure N-series?

I believe so and NGCA Marius Sandbu has been trialing this already http://msandbu.org/test-run-of-teradici-cloud-access-software-on-azure-n-series/; Marius works intensely with Azure and has been part of the N-series early access program so I’d highly recommend reading his other blogs.

Can I use fra.me with Azure N-Series?

Yes, fra.me, who specialize in CAD and graphical application delivery, have done some great work in conjunction with NVIDIA’s own performance engineering teams to investigate Esri ArcGIS and Autodesk Revit performance on N-Series. Read here: https://blog.fra.me/a-first-look-microsoft-azure-n-series-vs-aws-g2-instance-type-9930c5b1a644#.2a2yh15so

Autodesk Applications – Anywhere, Anytime Mobility on Any Device! Powered by NVIDIA GRID technologies.

NVIDIA GRID vGPU was launched in 2013 to virtualize GPU resources and share GPUs between VMs, enabling super-responsive graphics at a cost-effective price. Autodesk applications benefit greatly from GPU acceleration and over the last 3 years we’ve seen a huge number of customers deploy Autodesk applications such as Maya, Inventor, Moldflow, Revit and AutoCAD with our technologies.

This week NVIDIA is yet again at the amazing Autodesk University and I thought it would be nice to highlight some of the benefits of these technologies via a review of 4 of my personal favourite real customer implementations and stories and what they achieved virtualizing Autodesk applications.

 

  • Allow university students doing course work with Autodesk applications to do their coursework on their own laptops (Macs, Chromebooks, Windows laptops) or even their iPad from wherever they want without compromising security.

For many Autodesk users a choice of end-point is vital. Students want to remotely log in to use software for course work on their own Chromebooks, Apple Macs, tablets, laptops etc. BYOD is a huge driver for virtualizing Autodesk software. Istanbul Aydin University (IAU) have recently rolled out NVIDIA GRID to 1000 users using Citrix XenDesktop and VMware vCenter on Dell R720 servers. You can read more about the BYOD factors that drove the project in this Dell Case Study.

Professional AEC users and designers also often have their own preferences for iMacs or even Linux workstations and love the ability to run Windows applications on their workstation or laptop of choice. Many companies already have lots of hardware so the flexibility to repurpose it as an end-point has cost benefits too!

 

  • Architects (AEC) able to buy cloud services rather than run and maintain hardware so can quickly adapt to changing projects.

dwp|suters are an architectural firm headquartered in Australia who operate in 15 countries. With one option being a very costly hardware upgrade, this company decided instead to implement a Citrix XenApp DaaS solution from a service provider on HP servers. It’s worth reading the Citrix Case Study, as it covers how, as well as delivering savings, it allowed dwp|suters to work in a completely different way, freeing them from IT management and allowing them to scale their business, take on new projects rapidly and build a larger distributed team with enhanced security and data backup benefits too.

 

  • How Autodesk with NVIDIA GRID is simply the power behind amazing world renowned AEC and BIM projects including Olympic Stadiums!

Populous has created some of the most recognisable sporting venues, including Wembley Stadium in London, the new Yankee Stadium in New York, Soccer City in South Africa, and the Fisht Stadium in Sochi, which was the focal point of the 2014 Winter Olympics. The firm used VMware Horizon View and vSphere together with NVIDIA GRID K1 and K2 to virtualize Autodesk Revit alongside other AEC applications. You can read more here: NVIDIA Case Study or even watch this VMware Video.

 

 

  • Enables designers, engineers and users to work remotely, flexibly and differently.

At Citrix Synergy in the spring of 2014 (NVIDIA GRID’s been around a while!), I saw one of the most fascinating demos of NVIDIA GPUs running Citrix XenApp to deliver Autodesk Revit. XenApp (or similar RDSH solutions) can be a really cost-effective and user-satisfying way to deliver Autodesk applications on VMware vSphere, Citrix XenServer or physical servers using NVIDIA GPUs.

This demo wasn’t so much about the underlying cost of the technologies though but end-user experience and utility. NVIDIA commissioned a real CAD drafter to design a house over three days at Synergy using our GRID GPU technology; this meant that you could go back time-after-time and watch how it had progressed and the various stages of design from 2D plans to 3D rendering and ray-tracing. It was a fascinating insight into the design process and I was delighted to find a blog by the designer himself on the GreenDrafters blog, here.

Designers are by their nature demanding and sensitive to the user-experience and I was delighted to see his comments:

  • “The short answer is that I was blown away.”
  • “The bottom line is that I did not notice any performance degradation between my personal workstation and the NVIDIA GRID K2 VDI”

I was also excited to see how the designer himself started imagining working differently, better, with virtualisation technologies:

  • “What I did notice was the simple thin client on my desk that took up far less space, needed far fewer cables, and caused far less clutter than my workstation.”
  • “I could use my Macbook Air or possibly even an Ipad to continue the design work away from my office, maybe at a Starbucks.”
  • “I also realized the intriguing potential of client/designer collaboration at the job site. Tweaking the design with the client in Revit, on-site. Now that is cool.”

 

 

Interested? Come and talk to NVIDIA at Autodesk University this week

Learn how designers can create from any connected device and location with NVIDIA GRID, which will be running Autodesk Revit 2016, 123D Design, AutoCAD, Inventor and other applications on VMware Horizon with vSphere. See our side-by-side comparison and discover how GPU-accelerated cloud services such as Autodesk’s Fusion 360 can offer 3D CAD, CAM and CAE tools on a single cloud-based platform with twice the performance of the CPU.

We’re also helping lead some informative AU classes:

 

 

Visit the BOXX, Dell, HP and Lenovo booths to see how our partners use NVIDIA graphics technologies to deliver the best performance and visual experience for Autodesk users. And be sure to follow us on Twitter on @NVIDIAGRID and @NVIDIA_MFG for the latest on our activities at AU or follow #AU2016.

 

There will also be plenty of other NVIDIA technologies and exciting demos at #AU2016 including Photorealistic rendering, Virtual Reality as well as Quadro and workstation products.

 

Can’t make it? There’s plenty of other ways to learn more:

NVIDIA GRID vGPU: 512MB profiles, Win 10, framebuffer – new support article

I’ve been a bit quiet on the blogging front over the last month or so – a long vacation coupled with a slight role change – but also working on some internal support documentation around 512MB profiles which is now published.

As ever, the NVIDIA Enterprise Support KB articles are living documents and you should check the main document here: http://nvidia.custhelp.com/app/answers/detail/a_id/4238/ at the time of reading for up-to-date NVIDIA sanctioned advice.

However I thought it would be nice to highlight the availability and reproduce the article as it stands today. The search facility on the KB system is reasonably good and it’s always worth checking for new articles and searching for answers: http://nvidia.custhelp.com/app/home/.

The latest article gathers together some support and technical advice around 512MB profiles, including some additional advice on the use of small profiles with Win 10, as well as links to VMware and Citrix best practice on working with Win 10.

The KB article as published is below. This was the work of a number of people within NVIDIA but also incorporated feedback and work with our NGCA (NVIDIA GRID Community Advisors), especially Rasmus Raun-Nielsen, whose customer review and perspective were particularly helpful.

NVIDIA GRID vGPU: Memory exhaustion can occur with vGPU profiles that have 512 Mbytes or less of framebuffer

Symptom/Error

Symptoms and errors that may occur:

  • When playing video content (full screen 1080p) in a browser the session hangs and session reconnect fails on 512MB profiles.
  • This issue typically occurs when multiple display heads are used with Citrix XenDesktop or VMware Horizon on a Windows 10 guest VM.
    • When this error occurs, the NVIDIA host driver reports Xid error 31 and Xid error 43 in XenServer’s /var/log/messages file.
    • On VMware, when this error occurs, the NVIDIA host driver reports Xid error 31 and Xid error 43 in the VMware vSphere log file vmware.log in the guest VM’s storage directory.
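As a rough illustration of how you might check a log for these two Xid errors, here is a small Python sketch. It is not an NVIDIA tool. The line shape it matches (“Xid (…): 31, …”) is an assumption based on how the host driver commonly prints Xid messages, and the exact format in your /var/log/messages or vmware.log may differ, so adjust the pattern to what you actually see.

```python
import re
from collections import Counter

# Assumed (illustrative) line shape: "... NVRM: Xid (PCI:0000:05:00): 31, ..."
XID_PATTERN = re.compile(r"Xid \([^)]*\): (\d+)")

# The two Xid codes named in the symptoms above.
WATCHED_XIDS = {31, 43}


def count_xid_errors(lines):
    """Count occurrences of the watched Xid codes in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = XID_PATTERN.search(line)
        if match is None:
            continue
        code = int(match.group(1))
        if code in WATCHED_XIDS:
            counts[code] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "kernel: NVRM: Xid (PCI:0000:05:00): 31, Ch 00000010",
        "kernel: NVRM: Xid (PCI:0000:05:00): 43, Ch 00000010",
        "kernel: some unrelated message",
    ]
    print(count_xid_errors(sample))
```

To scan a real file, pass an open file object instead of the sample list, e.g. the XenServer /var/log/messages file or a VM’s vmware.log.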

Root Cause

There is a known issue associated with changes in the way recent Microsoft operating systems handle and allow access to overprovisioning messages and errors. NVIDIA is working closely with Microsoft to resolve these issues. Users with correctly provisioned systems should not encounter issues, so users need to take care to ensure there is sufficient frame buffer to support their uses.

512MB is a very small framebuffer and as such users should be aware that the multiple demands made in a virtualized environment can lead to memory exhaustion. Uses that place demand on the framebuffer:

  • Use of more recent Microsoft OSs that place more demand on the framebuffer e.g. using Windows 10 rather than Windows 7. Windows 10 demands far more resources
  • Use of multiple monitors
  • Use of higher resolution monitors
  • Use of the framebuffer for hardware protocol encode (NVENC) – to reduce the probability of users encountering issues NVENC has been disabled for 512MB in the GRID 4.0 (August 2016) release for protocols such as Blast Extreme (VMware) and Citrix HDX/ICA.
  • Frame buffer intensive applications

Documentation

This issue is documented in the driver release notes for the GRID 4.0 (August 2016) release. Customers are advised to always read the known and resolved issues lists contained within the driver release notes for each release for their hypervisor (links below). For Citrix XenServer the release notes state:

Memory exhaustion can occur with vGPU profiles that have 512 Mbytes or less of framebuffer. This issue typically occurs when multiple display heads are used with Citrix XenDesktop or VMware Horizon on a Windows 10 guest VM. When this error occurs, the NVIDIA host driver reports Xid error 31 and Xid error 43 in XenServer’s /var/log/messages file.

The following vGPU profiles have 512 Mbytes or less of frame buffer:

  • Tesla M6-0B, M6-0Q
  • Tesla M10-0B, M10-0Q
  • Tesla M60-0B, M60-0Q
  • GRID K100, K120Q
  • GRID K200, K220Q

GRID Release notes for Citrix XenServer: http://us.download.nvidia.com/Windows/Quadro_Certified/GRID/369.17/XenServer-6.5/367.43-369.17-nvidia-grid-vgpu-release-notes.pdf

GRID Release notes for VMware ESXi: http://us.download.nvidia.com/Windows/Quadro_Certified/GRID/369.17/ESXi-6.0/367.43-369.17-nvidia-grid-vgpu-release-notes.pdf

Verifying framebuffer usage

Users can follow the advice in this article on monitoring NVIDIA GRID framebuffer usage (http://nvidia.custhelp.com/app/answers/detail/a_id/4108/~/monitoring-the-framebuffer-for-nvidia-grid-vgpu-and-gpu-passthrough) to help size their environment correctly with respect to framebuffer usage and avoid memory exhaustion. The advice is also useful for assessing whether the issues users are encountering are actually caused by memory exhaustion.
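As one possible way of automating such a check (the KB article linked above remains the reference), here is a minimal Python sketch that parses “used, total” rows in MiB and flags anything close to full. It assumes you can obtain those numbers with nvidia-smi’s --query-gpu=memory.used,memory.total --format=csv,noheader,nounits options wherever nvidia-smi is available in your setup. The function name and the 90% warning threshold are arbitrary illustrative choices, not NVIDIA guidance.

```python
def framebuffer_usage(csv_text, warn_fraction=0.9):
    """Parse 'used, total' MiB rows and flag GPUs close to exhaustion.

    Returns a list of (gpu_index, used_mib, total_mib, near_full) tuples.
    warn_fraction is an arbitrary illustrative threshold.
    """
    report = []
    for index, row in enumerate(csv_text.strip().splitlines()):
        used_mib, total_mib = (int(field) for field in row.split(","))
        near_full = used_mib / total_mib >= warn_fraction
        report.append((index, used_mib, total_mib, near_full))
    return report


if __name__ == "__main__":
    # Example rows in the assumed "used, total" shape (values are made up).
    sample_output = "480, 512\n100, 1024\n"
    for gpu, used, total, near_full in framebuffer_usage(sample_output):
        status = "NEAR FULL" if near_full else "ok"
        print(f"GPU {gpu}: {used}/{total} MiB {status}")
```

Sampling this periodically while users run their real workloads gives a feel for whether a 512MB profile has any headroom left.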

Consequences

To minimize the risk of users encountering memory exhaustion, NVIDIA has disabled NVENC on 512MB profiles in the GRID 4.0 (August 2016) release. Application GPU acceleration remains fully supported and available with all profiles including 512MB. NVENC support from both Citrix and VMware is a recent feature, so the majority of users on older versions should see no change in functionality.

Workarounds and Solutions

Windows 10

Microsoft Windows 10 has significantly increased the demands upon graphical resources such as GPU framebuffer compared with older OS releases, as well as on other non-graphical system resources. As such both Citrix and VMware have published tools and configuration advice on how users can reduce resource usage. Customers using Windows 10 are encouraged to consider following advice from virtualization vendors.

  • VMware: VMware have provided an OS optimization tool for Horizon View which can make and apply optimization recommendations for Windows 10 and other OSs. Users of Citrix/other virtualization stacks may find this tool useful for the recommendations made even if they cannot then use the automated configuration tools. The tool can be found here: https://labs.vmware.com/flings/vmware-os-optimization-tool
  • Citrix: Citrix consultant Daniel Feller has published a number of articles on Windows 10 best practice and configuration many of which will also be relevant to VMware / other virtualization stacks. See: https://virtualfeller.com/?s=windows+10+optimization

Some users will find that a 512MB profile is inappropriate for their Windows 10 workload and that a 1GB profile is more appropriate.

Support

NVIDIA customers with support who believe they are encountering issues as a result of frame buffer memory exhaustion should raise a support case with NVIDIA Enterprise Support via https://nvidia-esp.custhelp.com and can reference issue #200130864.

Applicable products

NVIDIA GRID vGPU

GRID GPUs including M60, M6, M10, K1, K2

VMware Horizon and ESXi

Citrix XenDesktop and XenServer

Users are most likely to encounter this issue if using:

  • heavy graphical or video workloads
  • recent more graphically intensive Microsoft OSs e.g. Windows 10 rather than Windows 7
  • small framebuffers e.g. 512MB
  • Remoting protocols leveraging NVIDIA NVENC hardware encode e.g. recent versions of Citrix HDX/ICA or VMware Blast Extreme

Disclaimers

This Web site contains links to Web sites and third-party tools controlled by parties other than NVIDIA. NVIDIA is not responsible for and does not endorse or accept any responsibility for the contents or use of these third party Web sites or tools. NVIDIA is providing these links to you only as a convenience, and the inclusion of any link does not imply endorsement by NVIDIA of the linked Web site. It is your responsibility to take precautions to ensure that whatever tools or information you select for your use is free of viruses or other items of a destructive nature.

NVIDIA GRID: More info on vApps and VPC/vWS Licensing

Check out Luke Wignall’s blog on NVIDIA GRID licensing and other GRID topics!

I wrote a blog on RDSH (including XenApp) licensing and the options available with NVIDIA GRID vGPU and GPU-passthrough a few weeks ago, which you can read – here (including support for multi-monitor and resolutions). Since then my colleague Luke has added some more information in a blog where he outlines various case studies including many on vApps, which is worth a read here:

Luke answers how many licenses and what type you will need for various use cases, answering questions such as:

  • Q: I am deploying Citrix XenDesktop for 5000 global users, using two data centers, to meet a follow-the-sun productivity goal. The data centers are also backup sites to each other. I expect at most 1200 users at each of our three regional areas to be on during their workday, connecting to their closest data center, but there is some overlap (people working late or starting early) so I am architecting with a buffer for a total of 1500 virtual desktops. I need to be able to run all users from either data center if one should go down. My users are all engineers and their apps require Quadro.
  • Q:  I am deploying virtual desktops but using XenApp to do so, and am looking for improved end user experience, for 1000 users.  At any given time I expect no more than 850 users to be connected.  I have no other desktop delivery method.
  • Q: I chose to run XenApp on a bare metal host, so no hypervisor (I would question the decision to forgo the flexibility and manageability of virtualization), delivering three Microsoft Office applications. I have 500 users but expect no more than 350 of them to be connected at any given time. I have no virtual desktops for these users.
  • Q:  I have 250 engineers using CATIA and similar apps, they must have Quadro drivers, but usually only 200 of them are working at any given time.  I also have 1000 knowledge workers that range from sales to support, their apps do not need Quadro but perform much better with GPU (=happy users), of those I typically see 800 actively on their desktops.  I am deploying VMware Horizon.  We have a set of web apps that all 1250 employees use for time keeping, expenses, and safety training, these I am delivering with XenApp.

 

There is a lot of information on GRID licensing in our knowledge base – just search on “GRID licensing” on our KB home page here:

Highlights include:

Licensing Documentation:

Of course some of the best references are the official licensing guides on the GRID resources page (under deployment guides) here: http://www.nvidia.com/object/grid-enterprise-resources.html. In particular these two are useful:

 

Questions

Any questions – ask below or on the NVIDIA GRID support forums at https://gridforums.nvidia.com

NVIDIA GRID – Citrix offers multi-monitor NVENC hardware encode for both Linux and Windows

A few weeks ago I wrote a KB article outlining the support on various virtualization stacks for hardware encode using NVIDIA NVENC functionality. I’m now in the process of having the KB article updated, as today Citrix announced with XenDesktop/XenApp 7.11 that they have added support for Windows alongside their existing Linux support.

A few vendors, such as NICE DCV and HP RGS, already offered hardware encode for both Linux and Windows. However, prior to this release from Citrix:

  • Citrix only supported Linux
  • VMware only supported Windows (and then only single monitor)

This sees Citrix now offer multi-monitor support on both Linux and Windows. I expect VMware will now be under some pressure to match.

Many customers use both Windows and Linux so this parity makes life much easier for the system administrator.

VDI can use GPUs in many ways:

  • To accelerate the applications
  • For the protocol encoding and decoding, e.g. H.264 via NVENC
  • Allowing protocols access to the frame buffer directly

So solutions without NVENC also do still benefit from GPU availability.

You can read about the HDX/XenDesktop 7.11 enhancements here:  https://docs.citrix.com/en-us/xenapp-and-xendesktop/7-11/hdx/gpu-acceleration-desktop.html

There are also some other big leaps in HDX, such as the addition of hardware H.264 4:4:4 encode (which removes the chroma artefacts of standard H.264 4:2:0, at the cost of a bit more bandwidth).
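For anyone wondering why 4:4:4 costs more than 4:2:0, here is a small Python sketch of the raw (uncompressed) frame sizes. It only counts samples per pixel: 4:2:0 keeps full-resolution luma but quarter-resolution chroma, while 4:4:4 keeps every plane at full resolution. The function name is illustrative. Note that the bandwidth difference after H.264 compression is typically much smaller than this raw 2x ratio and depends heavily on content, so treat this as intuition rather than a sizing formula.

```python
# Average samples stored per pixel: one luma sample plus two chroma planes
# at the given subsampling.
SAMPLES_PER_PIXEL = {"4:4:4": 3.0, "4:2:2": 2.0, "4:2:0": 1.5}


def raw_frame_bytes(width, height, chroma="4:2:0", bits_per_sample=8):
    """Size in bytes of one uncompressed frame at the given chroma format."""
    samples = width * height * SAMPLES_PER_PIXEL[chroma]
    return int(samples * bits_per_sample / 8)


if __name__ == "__main__":
    for chroma in ("4:2:0", "4:4:4"):
        size = raw_frame_bytes(1920, 1080, chroma)
        print(f"1080p {chroma}: {size} bytes per raw frame")
```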

Nicely, any Citrix Receiver that supports H.264 decoding can be used with NVENC hardware encoding (H.264 4:2:0), which should make it easier for users to trial and adopt.

The protocol war rages on!