I’ve been a bit quiet on the blogging front over the last month or so – a long vacation coupled with a slight role change – but also working on some internal support documentation around 512MB profiles which is now published.
As ever, the NVIDIA Enterprise Support KB articles are living documents, and you should check the main document at http://nvidia.custhelp.com/app/answers/detail/a_id/4238/ at the time of reading for up-to-date NVIDIA-sanctioned advice.
However I thought it would be nice to highlight the availability and reproduce the article as it stands today. The search facility on the KB system is reasonably good and it’s always worth checking for new articles and searching for answers: http://nvidia.custhelp.com/app/home/.
The latest article gathers together support and technical advice around 512MB profiles, including some additional advice on the use of small profiles with Win 10, as well as links to VMware and Citrix best practice on working with Win 10.
The KB article as published is below. It was the work of a number of people within NVIDIA, and it also incorporated feedback from and work with our NGCA (NVIDIA GRID Community Advisors), especially Rasmus Raun-Nielsen, whose customer review and perspective were particularly helpful.
NVIDIA GRID vGPU: Memory exhaustion can occur with vGPU profiles that have 512 Mbytes or less of framebuffer
Symptom/Error
Symptoms and errors that may occur:
- When playing video content (full screen 1080p) in a browser the session hangs and session reconnect fails on 512MB profiles.
- This issue typically occurs when multiple display heads are used with Citrix XenDesktop or VMware Horizon on a Windows 10 guest VM.
- When this error occurs, the NVIDIA host driver reports Xid error 31 and Xid error 43 in XenServer’s /var/log/messages file.
- On VMware, when this error occurs, the NVIDIA host driver reports Xid error 31 and Xid error 43 in the VMware vSphere log file vmware.log in the guest VM’s storage directory.
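As a minimal sketch of how an administrator might check a host log for these two Xid codes, the Python below scans log lines for them. The line shape it matches ("NVRM: Xid (<PCI address>): <code>, ...") is an assumption about how the host driver writes Xid events, and the function names are hypothetical; adjust the pattern if your log lines differ.

```python
import re

# Assumed line shape: "... NVRM: Xid (PCI:0000:05:00): 31, ..."
# This pattern is an illustration, not a documented log format.
XID_PATTERN = re.compile(r"NVRM: Xid \([^)]*\): (\d+)")

# The two Xid codes named in the symptoms above.
WATCHED_XIDS = {31, 43}


def find_xid_errors(lines, watched=WATCHED_XIDS):
    """Return (xid, line) pairs for every line reporting a watched Xid."""
    hits = []
    for line in lines:
        match = XID_PATTERN.search(line)
        if match and int(match.group(1)) in watched:
            hits.append((int(match.group(1)), line.rstrip("\n")))
    return hits


def scan_log(path):
    """Scan a log file such as /var/log/messages or vmware.log."""
    with open(path, errors="replace") as handle:
        return find_xid_errors(handle)
```

On XenServer this could be called as scan_log("/var/log/messages"). On VMware, the path would be the vmware.log file in the guest VM’s storage directory.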
Root Cause
There is a known issue associated with changes in the way recent Microsoft operating systems handle and allow access to overprovisioning messages and errors. NVIDIA is working closely with Microsoft on an ongoing basis to resolve these issues. Users with correctly provisioned systems should not encounter issues, so users need to take care to ensure there is sufficient framebuffer to support their uses.
512MB is a very small framebuffer, and users should be aware that the multiple demands made in a virtualized environment can lead to memory exhaustion. Uses that place demand on the framebuffer:
- Use of more recent Microsoft OSs that place more demand on the framebuffer, e.g. using Windows 10 rather than Windows 7. Windows 10 demands far more resources.
- Use of multiple monitors
- Use of higher resolution monitors
- Use of the framebuffer for hardware protocol encode (NVENC) – to reduce the probability of users encountering issues NVENC has been disabled for 512MB in the GRID 4.0 (August 2016) release for protocols such as Blast Extreme (VMware) and Citrix HDX/ICA.
- Frame buffer intensive applications
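To give a rough feel for why monitor count and resolution matter, the sketch below computes the size of one uncompressed desktop surface per display. The 4 bytes per pixel (32-bit colour) and the single-surface-per-display model are illustrative assumptions, not NVIDIA figures. Real consumption is typically higher and depends on the OS, driver, remoting protocol and applications.

```python
# Assumption for illustration: 32-bit colour, one surface per display.
BYTES_PER_PIXEL = 4
BYTES_PER_MIB = 1024 * 1024


def surface_mib(width, height, bytes_per_pixel=BYTES_PER_PIXEL):
    """Size in MiB of a single uncompressed surface of the given resolution."""
    return width * height * bytes_per_pixel / BYTES_PER_MIB


def desktop_mib(displays):
    """Sum of one surface per display; displays is a list of (width, height)."""
    return sum(surface_mib(width, height) for width, height in displays)
```

Under these assumptions a single 1920x1080 display works out to about 7.9 MiB per surface and a 2560x1600 display to about 15.6 MiB, so each extra or larger monitor takes a visible share of a 512MB framebuffer before any application or video workload is counted.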
Documentation
This issue is documented in the driver release notes for the GRID 4.0 (August 2016) release. Customers are advised to always read the known and resolved issues lists contained within the driver release notes for each release for their hypervisor (links below). For Citrix XenServer, the release notes state:
Memory exhaustion can occur with vGPU profiles that have 512 Mbytes or less of framebuffer. This issue typically occurs when multiple display heads are used with Citrix XenDesktop or VMware Horizon on a Windows 10 guest VM. When this error occurs, the NVIDIA host driver reports Xid error 31 and Xid error 43 in XenServer’s /var/log/messages file.
The following vGPU profiles have 512 Mbytes or less of frame buffer:
- Tesla M6-0B, M6-0Q
- Tesla M10-0B, M10-0Q
- Tesla M60-0B, M60-0Q
- GRID K100, K120Q
- GRID K200, K220Q
GRID Release notes for Citrix XenServer: http://us.download.nvidia.com/Windows/Quadro_Certified/GRID/369.17/XenServer-6.5/367.43-369.17-nvidia-grid-vgpu-release-notes.pdf
GRID Release notes for VMware ESXi: http://us.download.nvidia.com/Windows/Quadro_Certified/GRID/369.17/ESXi-6.0/367.43-369.17-nvidia-grid-vgpu-release-notes.pdf
Verifying framebuffer usage
Users can follow the advice in this article on monitoring NVIDIA GRID framebuffer usage (http://nvidia.custhelp.com/app/answers/detail/a_id/4108/~/monitoring-the-framebuffer-for-nvidia-grid-vgpu-and-gpu-passthrough) to help them size their environment correctly with respect to framebuffer usage and so avoid memory exhaustion. The advice is also useful for assessing whether issues users are encountering are actually caused by memory exhaustion.
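As one possible way to sample framebuffer usage, the sketch below calls nvidia-smi with its CSV query options and flags any GPU whose used memory is close to the total. The 90% threshold and the function names are hypothetical choices for illustration, not NVIDIA guidance. Whether the figures describe a physical GPU or a vGPU depends on where the command is run.

```python
import subprocess

# nvidia-smi query for used and total framebuffer, in MiB, without header or units.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_memory_csv(text):
    """Parse lines such as '0, 480, 512' into dictionaries."""
    rows = []
    for line in text.strip().splitlines():
        index, used, total = (field.strip() for field in line.split(","))
        rows.append(
            {"index": int(index), "used_mib": int(used), "total_mib": int(total)}
        )
    return rows


def framebuffer_usage():
    """Run nvidia-smi and return the parsed framebuffer figures."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_memory_csv(result.stdout)


def near_exhaustion(rows, threshold=0.9):
    """Return rows whose used/total ratio meets the (hypothetical) threshold."""
    return [row for row in rows if row["used_mib"] / row["total_mib"] >= threshold]
```

Calling near_exhaustion(framebuffer_usage()) at intervals while users run their normal workloads gives a simple indication of whether a profile is undersized.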
Consequences
To minimize the risk of users encountering memory exhaustion, NVIDIA has disabled NVENC on 512MB profiles with the GRID 4.0 (August 2016) release. Application GPU acceleration remains fully supported and available with all profiles, including 512MB. NVENC support from both Citrix and VMware is a recent feature, so the majority of users on older versions should encounter no change in functionality.
Workarounds and Solutions
- Users can avoid memory exhaustion issues by ensuring the framebuffer supplied to a VM via the vGPU is adequate for their workloads (i.e. using an appropriately sized vGPU).
- Many Windows 10 users will find a 1GB profile vGPU is more suitable for their needs and may find a 512MB profile is insufficient.
- Users are advised to investigate their frame buffer usage by:
  - Monitoring their framebuffer, e.g. by following the advice in this article: http://nvidia.custhelp.com/app/answers/detail/a_id/4108/~/monitoring-the-framebuffer-for-nvidia-grid-vgpu-and-gpu-passthrough
  - Using a monitoring tool capable of monitoring framebuffer usage. There are a number of commercial and free tools available. One free third-party tool is “GPU Profiler”, available here: https://github.com/JeremyMain/GPUProfiler
  - Using the nvidia-smi functionality provided along with GRID vGPU software to investigate frame buffer usage; some information on nvidia-smi is given in our knowledge base:
Windows 10
Microsoft Windows 10 has significantly increased the demands upon graphical resources such as GPU framebuffer compared with older OS releases, as well as on other non-graphical system resources. As such, both Citrix and VMware have published tools and configuration advice on how users can reduce resource usage. Customers using Windows 10 are encouraged to consider following advice from virtualization vendors.
- VMware: VMware have provided an OS optimization tool for Horizon View which can make and apply optimization recommendations for Windows 10 and other OSs. Users of Citrix and other virtualization stacks may find this tool useful for the recommendations it makes even if they cannot then use the automated configuration tools. The tool can be found here: https://labs.vmware.com/flings/vmware-os-optimization-tool
- Citrix: Citrix consultant Daniel Feller has published a number of articles on Windows 10 best practice and configuration, many of which will also be relevant to VMware and other virtualization stacks. See: https://virtualfeller.com/?s=windows+10+optimization
Some users will find that a 512MB profile is inappropriate for their Windows 10 workload and that a 1GB profile is more appropriate.
Support
NVIDIA customers with support who believe they are encountering issues as a result of frame buffer memory exhaustion should raise a support case with NVIDIA Enterprise Support via https://nvidia-esp.custhelp.com and can reference issue #200130864.
Applicable products
NVIDIA GRID vGPU
GRID GPUs including M60, M6, M10, K1, K2
VMware Horizon and ESXi
Citrix XenDesktop and XenServer
Users are most likely to encounter this issue if using:
- heavy graphical or video workloads
- recent, more graphically intensive Microsoft OSs, e.g. Windows 10 rather than Windows 7
- small framebuffers e.g. 512MB
- Remoting protocols leveraging NVIDIA NVENC hardware encode e.g. recent versions of Citrix HDX/ICA or VMware Blast Extreme
Disclaimers
This Web site contains links to Web sites and third-party tools controlled by parties other than NVIDIA. NVIDIA is not responsible for and does not endorse or accept any responsibility for the contents or use of these third party Web sites or tools. NVIDIA is providing these links to you only as a convenience, and the inclusion of any link does not imply endorsement by NVIDIA of the linked Web site. It is your responsibility to take precautions to ensure that whatever tools or information you select for your use is free of viruses or other items of a destructive nature.
I’ve found that using a K120Q profile with Windows 10 (1607 LTSB) makes using 2 or more monitors impossible. Documentation says 2 display heads should be available, but it simply doesn’t work. Any thoughts?
Hi Hans,
Hope the guys I put you in touch with at NVIDIA sorted you out 🙂 A great team!
Best wishes,
Rachel
Definitely! Thanks so much!