Just a quick blog aimed at those looking to develop hypervisor GPU monitoring products by integrating the NVIDIA GPU metrics that XenServer exposes via its APIs. Really it’s a guide to where to find the relevant information provided by Citrix.
Two NVIDIA GPU technologies are available on Citrix XenServer:
- GPU (PCIe) pass-through (including GPU-sharing for XenApp and VDI passthrough)
- vGPU (shared GPU technologies)
Owing to the nature of PCIe pass-through, whereby the hypervisor is bypassed and the VM itself obtains complete control of, and sole access to, the GPU, host- and hypervisor-level metrics are not available to the NVIDIA SDK and APIs on the host, nor to the hypervisor itself.
Developing a supported solution
Many Citrix customers insist on a monitoring solution being certified by the vendor via the Citrix Ready program. ISVs are advised to join the Citrix Ready program (the access level is free) to obtain advice on developing a supported product and eventually to certify and market their product. In particular, ISVs are recommended to review the conditions of the vendor self-certification “kit” for supported products.
Whilst monitoring can be performed by inserting a kernel module or supplemental pack into XenServer’s dom0, this is a mechanism that Citrix generally will not support, and customers are rarely willing to compromise their support agreements to use such products. ISVs are strongly advised to use the XenServer APIs and SDK to access metrics in a supported manner. See: https://www.citrix.com/partner-programs/citrix-ready/test.html (under XenServer -> Citrix XenServer (6.x) Integrated ISV Self-Certification Kit).
XenServer SDK / API
The XenServer API provides bindings for five languages: C, C#, Java, Python and PowerShell.
XenServer maintains a landing page for ISV developers: http://xenserver.org/partners/developing-products-for-xenserver.html
Additionally, there is a developer (SDK) support forum where many XenServer staff answer questions: http://discussions.citrix.com/forum/1276-xenserver-sdk/
XenServer captures metrics in RRDs. Details of the RRDs, code examples and information on how the XenServer SDK can be used to access the metrics are given on this landing page: http://xenserver.org/partners/developing-products-for-xenserver/18-sdk-development/96-xs-dev-rrds.html
XenServer has integrated many of the metrics available from NVIDIA’s NVML interface into its RRDs. This means customers can access the metrics via the XenServer APIs in a supported manner, rather than inserting unsupported kernel modules to call NVML in the hypervisor’s host operating system (dom0). See https://www.citrix.com/blogs/2014/01/22/xenserverxendesktop-vgpu-new-metrics-available-to-monitor-nvidia-grid-gpus/
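As a rough illustration of consuming these RRD metrics, here is a minimal Python sketch. It assumes the host serves RRD updates over HTTP at /rrd_updates with start, cf and host query parameters, and returns XML with a legend of data-source names followed by rows of values, as described in the RRD documentation linked above. The host name, the UUID, the sample values and the gpu_ prefix filter are illustrative assumptions, and fetching the document (which needs authentication) is deliberately left out.

```python
import urllib.parse
import xml.etree.ElementTree as ET


def rrd_updates_url(host, start, cf="AVERAGE", include_host=True):
    """Build an rrd_updates URL (assumed endpoint; the request itself needs authentication)."""
    query = {"start": int(start), "cf": cf}
    if include_host:
        query["host"] = "true"  # ask for host-level data sources as well as VM ones
    return "https://{}/rrd_updates?{}".format(host, urllib.parse.urlencode(query))


def parse_gpu_metrics(xml_text, prefix="gpu_"):
    """Return {metric_name: [(timestamp, value), ...]} for data sources starting with prefix."""
    root = ET.fromstring(xml_text)
    legend = [entry.text.strip() for entry in root.findall("./meta/legend/entry")]
    series = {}
    for row in root.findall("./data/row"):
        timestamp = int(row.findtext("t"))
        values = [float(v.text) for v in row.findall("v")]
        for name, value in zip(legend, values):
            # Legend entries look like "CF:host|vm:uuid:metric"; the metric part
            # may itself contain colons (e.g. a PCI bus ID), so split only 3 times.
            metric = name.split(":", 3)[-1]
            if metric.startswith(prefix):
                series.setdefault(metric, []).append((timestamp, value))
    for points in series.values():
        points.sort()  # oldest sample first
    return series


# Illustrative sample only: the UUID, metric name and values are made up.
SAMPLE = """<xport>
  <meta>
    <start>1400000000</start><step>5</step><end>1400000005</end>
    <rows>2</rows><columns>2</columns>
    <legend>
      <entry>AVERAGE:host:hypothetical-uuid:gpu_utilisation_compute_0000:05:00.0</entry>
      <entry>AVERAGE:host:hypothetical-uuid:cpu0</entry>
    </legend>
  </meta>
  <data>
    <row><t>1400000005</t><v>0.25</v><v>0.10</v></row>
    <row><t>1400000000</t><v>0.50</v><v>0.20</v></row>
  </data>
</xport>"""

gpu_series = parse_gpu_metrics(SAMPLE)
```

Keeping the URL building and the parsing separate from the actual HTTP request means the parser can be exercised against saved output before it is pointed at a live host.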
XenServer APIs – querying GPU infrastructure:
For information on which VMs have vGPUs, the type of vGPU profile etc. see http://nvidia.custhelp.com/app/answers/detail/a_id/4117/kw/citrix%20monitoring under “Checking your GPU configuration” for links to appropriate XenServer documentation.
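For a flavour of what such a query might look like from the Python binding, the sketch below walks the VGPU records and reports the VM name and vGPU type model name for each. It assumes the XenAPI Python module from the XenServer SDK and the VGPU, VGPU_type and VM classes of the XenServer API. The helper names, host URL and credentials are hypothetical, and the appropriate XenServer documentation remains the authority on the exact fields.

```python
def list_vgpus(session):
    """Return one dict per vGPU with the owning VM's name and the vGPU type's model name."""
    rows = []
    for vgpu in session.xenapi.VGPU.get_all_records().values():
        rows.append({
            "vm": session.xenapi.VM.get_name_label(vgpu["VM"]),
            "vgpu_type": session.xenapi.VGPU_type.get_model_name(vgpu["type"]),
        })
    return sorted(rows, key=lambda row: row["vm"])


def print_vgpus(url, username, password):
    """Hypothetical helper: log in to a host (or pool master) and print its vGPUs."""
    import XenAPI  # Python binding from the XenServer SDK, imported lazily

    session = XenAPI.Session(url)
    session.xenapi.login_with_password(username, password)
    try:
        for row in list_vgpus(session):
            print("{vm}: {vgpu_type}".format(**row))
    finally:
        session.xenapi.session.logout()
```

A call such as print_vgpus("https://xs-host.example", "root", "secret") would then list each VM with a vGPU alongside its vGPU type, assuming those placeholder details are replaced with real ones.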
- Citrix’s XenCenter is open source and includes monitoring functionality, so its code is available in addition to the XenServer APIs. XenCenter is also extensible via the XenCenter Plugin model.
- XenServer provides a tool, rrd2csv, that converts RRDs to CSV format; it is documented in the end-user monitoring documentation. Chapter 9 of the Administrator’s Guide for the appropriate version (e.g. the “Citrix XenServer 7.0 Administrators Guide”) covers the XenServer metrics available via the XenServer APIs: https://docs.citrix.com/content/dam/docs/en-us/xenserver/xenserver-7-0/downloads/xenserver-7-0-administrators-guide.pdf
- Some historical advice on monitoring vGPU on XenServer is included in the NVIDIA Knowledge Base e.g. http://nvidia.custhelp.com/app/answers/detail/a_id/4117/kw/citrix%20monitoring