Limitations in monitoring shared NVIDIA GPU technologies

A physical GPU can be shared in a number of ways: NVIDIA vGPU with Citrix XenDesktop or XenApp, or GPU-sharing with XenApp (this is actually a form of pass-through, with the sharing done at the RDS layer). Alternatively it can be used 1:1 with VMs in VDI with GPU pass-through.

When using a pass-through mechanism, the GPU is given entirely to the VM, and a hypervisor such as XenServer or vSphere will be unable to access or query the physical GPU (pGPU).

When using vGPU, however, it is possible to monitor the pGPU (that’s the whole physical GPU) via the nvidia-smi utility within the hypervisor, or using the XenServer and XenCenter metrics that also utilise it. I’ve blogged before about how third-party monitoring products or end users can do this, here.
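As a rough illustration of how a monitoring script might collect those pGPU figures, here is a minimal Python sketch that wraps nvidia-smi's CSV query mode. It assumes a Python 3 interpreter is available wherever nvidia-smi runs and that the installed driver supports the `--query-gpu` fields used; the helper names `parse_gpu_csv` and `query_pgpus` and the sample values are my own invention for illustration, not output from a real host.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi's --query-gpu option.
# Assumption: the installed driver supports all of these fields.
QUERY_FIELDS = ["index", "name", "utilization.gpu", "memory.used", "memory.total"]


def parse_gpu_csv(text):
    """Parse 'csv,noheader,nounits' output into a list of dicts, one per pGPU.

    Values are kept as strings because nvidia-smi may report a field as
    unsupported rather than as a number on some boards.
    """
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        values = [field.strip() for field in record]
        rows.append(dict(zip(QUERY_FIELDS, values)))
    return rows


def query_pgpus():
    """Run nvidia-smi and return the parsed metrics for every pGPU it reports."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    output = subprocess.check_output(cmd, universal_newlines=True)
    return parse_gpu_csv(output)


# Hypothetical sample output, made up for illustration only.
SAMPLE = "0, GRID K2, 37, 1024, 4095\n1, GRID K2, 5, 512, 4095\n"
sample_rows = parse_gpu_csv(SAMPLE)
```

Calling `query_pgpus()` on the hypervisor would return one dictionary per physical GPU. The figures describe the whole pGPU, not any individual vGPU carved out of it.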

The nvidia-smi utility is also installed in-guest, i.e. within the Windows guest. However, it provides access to the metrics for the pGPU, i.e. the whole GPU. So when you are using vGPU with multiple VMs sharing the GPU, the results pertain to the whole GPU and NOT the vGPU. You will find that the results vary between VMs because of how the physical GPU is polled and accessed, but you should not be misled into thinking that they pertain only to the portion of the GPU allocated to that VM. Jason Southern from NVIDIA has produced a very good video explaining this, here. Jason also explains why you should expect to see perfmon reporting 0% utilisation. In another video, here, Jason expands on why you should not measure vGPU load within a VM.

If you are looking to monitor the pGPU or framerate within the VM there are a number of great tools available including:

  • nvidia-smi
  • GPU-Z
  • Fraps – a product that can measure framerates in guest
  • If working with Citrix technologies involving HDX, such as XenDesktop, you can use HDX Monitor 3.3 (Advanced thin wire graphics) to monitor many performance metrics including framerates. See http://support.citrix.com/article/CTX135817
  • A large number of others, alongside other benchmarking tools and applications, are listed on Ronald Grass’s blog, here. It is widely considered the best resource list for these specific technologies.
  • I’d also recommend Jason Samuel’s list of tools, which covers a number relevant to 3D graphics and GPUs as well as practically every other property of a virtualised environment. You can find this goldmine here.

At the time of writing it is not possible to monitor the vGPUs themselves (as opposed to the pGPU) for the currently available NVIDIA GRID K1 and K2 vGPU profiles, either from in-guest or from the hypervisor. GPU pass-through can only be monitored from within a guest. This may change in the future with new technologies, so the reader should check whether this information is still current.
