I had some fun at NVIDIA GTC 2016 taking part in a hands-on lab run by the SA (Solution Architecture) organisation, of which I am a part. These labs are proving really useful for walking users new to GRID through key operations on both the VMware and Citrix stacks. The guys running them mooted adding more on monitoring once you are set up, and I kind of volunteered to have a crack at a bonus chapter for the hands-on covering monitoring on Citrix.
Having worked at Citrix, this was an easy one. I may well set myself the bigger challenge of becoming more familiar with VMware metrics in the future… if this proves useful and depending on the feedback from you, the reader!
I often get asked how many vCPUs, how much RAM, etc. a VM should be provisioned with for a huge variety of applications, and even when I am familiar with the applications, many are used so differently by different users that it's hard to say. For example, AutoCAD and SolidWorks have a vast range of functions, from 2D to 3D to rendering.
However, users can tell for themselves whether they may have a provisioning problem by developing a little knowledge of the XenServer/XenCenter metrics, especially those not enabled by default! I've included some information below that I'm hoping will guide the reader to working out whether the number of vCPUs allocated is causing a problem… let's see how it goes… and do look out for those hands-on labs at GTC and other NVIDIA events.
Customers aren't always aware of all the metrics available on XenServer / within XenCenter, particularly those that help them assess whether they have provisioned resources such as vCPU and RAM optimally for the applications they are using.
This article is intended to help new users become more familiar with the available metrics and how to view them in XenCenter. I'm hoping it can be incorporated into a hands-on lab or user guide, so please add suggestions for improvements.
Citrix XenServer exposes a good number of metrics, which can be accessed from a command prompt in the hypervisor or from within the XenCenter management console. Many metrics are off by default to avoid unnecessary system load where they would not normally be needed. There is a very detailed guide to which metrics are available, how to configure thresholds for alerts and how to trigger email alerts in Chapter 9 of the XenServer Administrator's Guide. Always consult the version of the guide pertaining to the version of XenServer you are using, e.g. for XS6.5 – Citrix XenServer® 6.5 Administrator's Guide.
The metrics for monitoring GPU usage, though, are not documented in the Administrator's Guide, as this set of metrics is currently associated with the NVIDIA vGPU feature rather than available for any GPU vendor. The guide does, however, contain all the information on how to add graphs for metrics such as those for GPUs and how to set up alerts, etc. I've blogged about the availability of these metrics: https://www.citrix.com/blogs/2014/01/22/xenserverxendesktop-vgpu-new-metrics-available-to-monitor-nvidia-grid-gpus/
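Because most of these metrics are off by default, recording has to be switched on per data source. A minimal sketch using the `xe` data-source commands from the Administrator's Guide; the `<pci-bus-id>` suffix is a placeholder, so copy the exact data-source name from the list output on your own host:

```shell
# Show all data sources known to the host, including whether each is enabled
xe host-data-source-list

# Start recording a GPU metric (placeholder PCI bus ID - substitute the
# exact data-source name reported by host-data-source-list on your host)
xe host-data-source-record data-source=gpu_memory_used_<pci-bus-id>

# Read back the most recent value of that data source
xe host-data-source-query data-source=gpu_memory_used_<pci-bus-id>
```

The equivalent `xe vm-data-source-*` commands exist for per-VM metrics.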
For NVIDIA vGPU the main metrics of interest are:

| Class | Name | Units | Description | Enabled by default? | Condition for existence |
|-------|------|-------|-------------|---------------------|-------------------------|
| Host | gpu_memory_free_&lt;pci-bus-id&gt; | bytes | Unallocated framebuffer memory | No | A supported GPU is installed on the host |
| Host | gpu_memory_used_&lt;pci-bus-id&gt; | bytes | Allocated framebuffer memory | No | A supported GPU is installed on the host |
| Host | gpu_power_usage_&lt;pci-bus-id&gt; | mW | Power usage of this GPU | No | A supported GPU is installed on the host |
| Host | gpu_temperature_&lt;pci-bus-id&gt; | °C | Temperature of this GPU | No | A supported GPU is installed on the host |
| Host | gpu_utilisation_compute_&lt;pci-bus-id&gt; | (fraction) | Proportion of time over the past sample period during which one or more kernels was executing on this GPU | No | A supported GPU is installed on the host |
| Host | gpu_utilisation_memory_io_&lt;pci-bus-id&gt; | (fraction) | Proportion of time over the past sample period during which global (device) memory was being read or written on this GPU | No | A supported GPU is installed on the host |
Note: GPU metrics appear in XenCenter for GPU pass-through, but because of the nature of PCIe pass-through the hypervisor has no access to the actual data (pass-through means only the VM can see/access the GPU), so these graphs and metrics will read zero.
If you are troubleshooting a performance issue, it is important to identify which resource is the bottleneck. Often it may not be the GPU. Metrics that are particularly worth checking include:
- Those pertaining to CPU usage on the Host
| Class | Name | Description | Condition for existence | XenCenter Name |
|-------|------|-------------|-------------------------|----------------|
| Host | cpu&lt;cpu&gt;-C&lt;cstate&gt; | Time CPU &lt;cpu&gt; spent in C-state &lt;cstate&gt; in milliseconds. | C-state &lt;cstate&gt; exists on CPU &lt;cpu&gt; | CPU &lt;cpu&gt; C-state &lt;cstate&gt; |
| Host | cpu&lt;cpu&gt;-P&lt;pstate&gt; | Time CPU &lt;cpu&gt; spent in P-state &lt;pstate&gt; in milliseconds. | P-state &lt;pstate&gt; exists on CPU &lt;cpu&gt; | CPU &lt;cpu&gt; P-state &lt;pstate&gt; |
| Host | cpu&lt;cpu&gt; | Utilisation of physical CPU &lt;cpu&gt; (fraction). Enabled by default. | cpu&lt;cpu&gt; exists | CPU &lt;cpu&gt; |
| Host | cpu_avg | Mean utilisation of physical CPUs (fraction). Enabled by default. | None | Average CPU |
C-state and P-state information is particularly insightful in the context of bursty applications (as CAD applications often are), where peak vs. average usage can vary widely. Many servers are shipped in power-saving mode rather than configured for maximum performance; this needs to be changed in the BIOS to allow the hypervisor, and hence the application, to use the full range of P-/C-states. I wrote a guide to C-/P-states a long time ago: http://xenserver.org/partners/developing-products-for-xenserver/19-dev-help/138-xs-dev-perf-turbo.html I'm not sure whether the information on the XenServer commands to optimally configure a system is still correct, but the monitoring instructions should be.
Many CAD/3D applications are highly single-threaded and benefit from using turbo mode; Catia is one application that has often been like this. The highest P-state, P0, is traditionally used to indicate whether turbo is in use, but you must be very careful when reading XenCenter to note the convention: if turbo is available, P0 is turbo mode and P1 the highest non-turbo mode. The convention of labelling turbo mode with a frequency 1 MHz above the normal maximum frequency means that XenCenter does not reflect the true frequency of turbo mode, so users may wrongly conclude that turbo is not occurring. E.g. on a 3400 MHz Intel system, P0 will be logged as 3401 MHz, while the highest non-turbo mode is P1 at 3400 MHz.
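That labelling convention can be checked mechanically. A small sketch using hypothetical frequencies for a 3400 MHz part:

```shell
# Hypothetical P-state frequencies (MHz) as they might appear in XenCenter
max_nonturbo_mhz=3400   # P1: highest non-turbo frequency
p0_mhz=3401             # P0: as logged by XenCenter

# The +1 MHz labelling convention: a P0 exactly 1 MHz above the highest
# non-turbo frequency indicates turbo is available, even though the real
# turbo frequency is considerably higher than the logged value
if [ "$p0_mhz" -eq $((max_nonturbo_mhz + 1)) ]; then
  echo "P0 is turbo mode (true frequency is higher than ${p0_mhz} MHz)"
else
  echo "P0 appears to be an ordinary P-state"
fi
```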
- Those pertaining to CPU usage on the VM
| Class | Name | Description | Condition for existence | XenCenter Name |
|-------|------|-------------|-------------------------|----------------|
| VM | cpu&lt;cpu&gt; | Utilisation of vCPU &lt;cpu&gt; (fraction). Enabled by default. | vCPU &lt;cpu&gt; exists | CPU &lt;cpu&gt; |
| VM | memory | Memory currently allocated to VM (bytes). Enabled by default. | None | Total Memory |
| VM | memory_target | Target of VM balloon driver (bytes). Enabled by default. | None | Memory target |
| VM | memory_internal_free | Memory used as reported by the guest agent (KiB). Enabled by default. | None | Free Memory |
| VM | runstate_fullrun | Fraction of time that all VCPUs are running. | None | VCPUs full run |
| VM | runstate_full_contention | Fraction of time that all VCPUs are runnable (i.e. waiting for CPU). | None | VCPUs full contention |
| VM | runstate_concurrency_hazard | Fraction of time that some VCPUs are running and some are runnable. | None | VCPUs concurrency hazard |
| VM | runstate_blocked | Fraction of time that all VCPUs are blocked or offline. | None | VCPUs idle |
| VM | runstate_partial_run | Fraction of time that some VCPUs are running and some are blocked. | None | VCPUs partial run |
| VM | runstate_partial_contention | Fraction of time that some VCPUs are runnable and some are blocked. | None | VCPUs partial contention |
- The VM runstate_* metrics allow you to assess vCPU contention. This is especially worth monitoring if you are overprovisioning. The background needed to understand this, and details of how to do it, can be found in this blog: https://www.citrix.com/blogs/2014/03/11/citrix-xenserver-setting-more-than-one-vcpu-per-vm-to-improve-application-performance-and-server-consolidation-e-g-for-cad3-d-graphical-applications/
- If you are interested in measuring vCPU overprovisioning from the point of view of the host, you can use the host's cpu_avg metric to see if it's too close to 1.0 (rather than 0.8, i.e. 80%). If you are interested in measuring it from the point of view of a specific VM, you can use the VM's runstate_* metrics, especially those measuring runnable time, which should be less than 0.01 or so. These metrics can be investigated via the command line or XenCenter.
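As a sketch, the two rules of thumb above can be expressed as a quick check over sampled values (the readings below are hypothetical):

```shell
# Hypothetical readings: the host's cpu_avg and a VM's runnable fraction
# (e.g. runstate_full_contention + runstate_partial_contention)
host_cpu_avg=0.93
vm_runnable=0.04

# Rule of thumb from the text: host cpu_avg should stay below ~0.8 and a
# VM's runnable fraction below ~0.01 when over-provisioning vCPUs
awk -v a="$host_cpu_avg" -v r="$vm_runnable" 'BEGIN {
  if (a > 0.8)  print "host: cpu_avg above 0.8 - physical CPUs close to saturation";
  if (r > 0.01) print "vm: runnable fraction above 0.01 - vCPU contention likely";
}'
```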
XenServer metrics are stored in RRDs (round robin databases), which limits the stored data by degrading the granularity of historical data. E.g. the last 10 minutes of data can be accessed at the 5 s sample interval at which it was collected; older data is binned into larger samples and so becomes increasingly averaged. This means graphs in XenCenter become smoother over time and data on short-lived events is lost. Each archive in the database stores its metric at a specified granularity:
- Every 5 seconds for the past 10 minutes
- Every minute for the past two hours
- Every hour for the past week
- Every day for the past year
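If you want the raw samples rather than XenCenter's graphs, the host also exposes the RRD data over HTTP via the rrd_updates handler. A sketch (the hostname and credentials are placeholders; the returned granularity depends on how far back you ask, per the archive intervals above):

```shell
# Hypothetical host and credentials - substitute your own
HOST=xenserver.example.com

# Ask for everything recorded in the last 10 minutes, i.e. within the 5 s archive
START=$(( $(date +%s) - 600 ))

# Returns XML covering all host metrics recorded since START
curl -s -u root:password "http://${HOST}/rrd_updates?start=${START}&host=true"
```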
XenCenter provides a very generic interface to metric data, which means that any available metric can be graphed and plotted. Armed with the metric names above, you can add the GPU metrics (or any others) to XenCenter graphs, as the following exercises demonstrate.
Exercise: Adding P-state graphs to XenCenter
Find the section “Configuring Performance Graphs” within the XenServer Administrator's Guide and follow the steps:
To Add A New Graph
- On the Performance tab, click Actions and then New Graph. The New Graph dialog box will be displayed.
- In the Name field, enter a name for the graph.
- From the list of Datasources, select the check boxes for the data sources you want to include in the graph, i.e. those of the form CPU &lt;cpu&gt; P-state &lt;pstate&gt;:
- Add all available P-states for the first CPU
- What C-states are available?
- Click Save.
- Now view the graph:
- Is turbo boost in use? Can you tell? (Hover over the graph.)
Exercise: Check whether vCPU contention is occurring using XenCenter
- Hint: you may need to add a graph for certain runstate_* metrics
- Hint: you may also need to check a CPU metric, which one?
Checking your GPU configuration
The XenServer CLI (Command Line Interface) offers many commands to probe your XenServer environment. Again, these are documented in the Administrator's Guide, in an appendix sub-section titled “GPU Commands”. The CLI has good, if esoteric, tab completion.
Exercise: Check what vGPU types are used on each pGPU (physical GPU) in the system
- Run xe pgpu-list to get a list of the pGPUs.
- Use the output from this as input to a further xe command to find out what vGPUs have been configured on each pGPU.
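One possible pair of commands for this exercise (the UUID is a placeholder; this assumes a vGPU-capable XenServer, where resident-VGPUs is a parameter of the pgpu object):

```shell
# List all physical GPUs in the system along with their UUIDs
xe pgpu-list

# For a given pGPU (placeholder UUID), show which vGPUs are resident on it
xe pgpu-param-get uuid=<pgpu-uuid> param-name=resident-VGPUs

# Alternatively, list every vGPU and note which pGPU each one resides on
xe vgpu-list params=all
```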
- If you need to measure GPU framebuffer you should read this: https://virtuallyvisual.wordpress.com/2015/09/09/monitoring-nvidia-gpu-usage-of-the-framebuffer-for-vgpu-and-gpu-passthrough/
- You cannot measure GPU usage from the hypervisor (via either the command line or XenCenter) when using GPU pass-through. In XenCenter the GPU-associated metrics will read zero. You can read more here: https://virtuallyvisual.wordpress.com/2015/07/27/limitations-in-monitoring-shared-nvidia-gpu-technologies/
- Measuring vGPU usage from within the guest is misleading and should not be done – read this: https://virtuallyvisual.wordpress.com/2015/07/27/limitations-in-monitoring-shared-nvidia-gpu-technologies/