Troubleshooting with vRops part 3: Object Level Metric

9 Nov 2015 Michael Ryom


In this part I am going to show you how to use object-level metrics to quickly and easily assess whether, for example, a VM is having problems. They give you a quick indication of whether an object has problems that need to be addressed. Look at them before diving into the raw metrics, as they can point you in the right direction.

Blog series

vRops Object-level metrics

Object-level metrics are an excellent way of getting a fast overview of how a particular object is behaving. They fall into two groups: Boolean-based and value-based. The difference is simple: a Boolean-based metric can only be 0 or 1 (false or true, if you will). Having only two possible values makes these metrics easy to read and leaves very little room for misinterpretation. They are calculated by vRops, so as long as you trust vRops as a platform, they should not need any interpretation.

Value-based metrics, on the other hand, deserve more scrutiny. Not because they are incorrect, but because they might not turn out to be true. Capacity/Time Remaining is an example: the current trend might not continue, or it might come to a halt because of how the application is designed.
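To make the caveat concrete, here is a minimal sketch of a naive linear projection of "time remaining". This is an assumption for illustration only: vRops uses its own capacity models, which this does not reproduce, and the function names and numbers are made up.

```python
# Hypothetical illustration: NOT how vRops computes Time Remaining,
# just a simple linear projection to show why such values can mislead.

def linear_fit(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def days_remaining(daily_used_gb, capacity_gb):
    """Days until usage hits capacity if the fitted trend continues.

    Returns None when usage is flat or shrinking (no exhaustion in sight).
    """
    days = list(range(len(daily_used_gb)))
    slope, intercept = linear_fit(days, daily_used_gb)
    if slope <= 0:
        return None
    current = slope * days[-1] + intercept
    return (capacity_gb - current) / slope


# Datastore growing 10 GB/day: the projection says 30 days left...
growing = [100, 110, 120, 130, 140, 150, 160]
print(days_remaining(growing, 460))  # 30.0

# ...but if the application stops writing, the same model sees no exhaustion.
plateau = [160] * 7
print(days_remaining(plateau, 460))  # None
```

The "30 days" is only as good as the assumption that growth keeps its current pace, which is exactly why value-based metrics need a second look.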

Using Object-level metrics

I'm going slowly in this first part of the series. If you are familiar with how views are created, just skim the pictures quickly and note which metrics and filter are being used. I will explain my choices after the walkthrough of how the view was created.

As can be seen below, I have chosen a datacenter, but it could have been any object that is of value to you, in order to see how well the VMs below the selected object are running. I then clicked on the "Details" tab and on the "Plus sign" to create a new view.

First, give the view a name and a description, then click on "2. Presentation".

Choose "List" and click on "3. Subjects".

In the search field, type "virtual machine", click on it in the select box as shown below, and then click on "4. Data".

Now select the data that will be part of the view. Type "is" in the search field to narrow down the list of metrics, then click on "Summary" and choose the three metrics "is idle", "is stressed" and "is oversized".

I don't like the default names, so the first thing is to rename the metrics to something more useful. I'm going with "VM idle", "VM stressed" and "VM Oversized". Just click on each metric and change the "Metric label" as needed.

Now you are going to create a filter. A filter, as the name suggests, is a way of filtering the data. Here we are only interested in VMs that are powered on, as VMs that are powered off usually don't have a performance impact :)

Next, click on "Filter". Select "System" and "Powered ON".

Now select "is" and "1", where 1 means true; in this example, VMs that are powered on. Finish by clicking "Save".
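The view built in the steps above boils down to "keep powered-on VMs, show three Boolean columns under new labels". A minimal sketch of that logic, assuming hypothetical dictionary keys (`powered_on`, `is_idle`, and so on) and made-up VMs; this is not a vRops API, just the same filter and relabeling expressed in code:

```python
# Hypothetical stand-in for the view: each dict represents one VM with the
# Boolean metrics chosen in step 4 and the "Powered ON" system property.
vms = [
    {"name": "web01", "powered_on": 1, "is_idle": 0, "is_stressed": 1, "is_oversized": 0},
    {"name": "test07", "powered_on": 1, "is_idle": 1, "is_stressed": 0, "is_oversized": 1},
    {"name": "old-db", "powered_on": 0, "is_idle": 1, "is_stressed": 0, "is_oversized": 1},
]

# Rename the metrics to the labels used in the view.
LABELS = {"is_idle": "VM idle", "is_stressed": "VM stressed", "is_oversized": "VM Oversized"}


def build_view(rows):
    """Keep powered-on VMs only (filter: Powered ON is 1) and relabel columns."""
    view = []
    for row in rows:
        if row["powered_on"] != 1:
            continue  # powered-off VMs are filtered out
        entry = {"Name": row["name"]}
        for key, label in LABELS.items():
            entry[label] = bool(row[key])  # 0/1 shown as False/True
        view.append(entry)
    return view


for line in build_view(vms):
    print(line)
```

Here "old-db" is dropped by the filter, and only "web01" and "test07" remain in the view.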

Now behold the view we just created. "Name" is the name of the VM, and for each VM the three metrics we selected are shown with their Boolean values. These are calculated metrics, meaning that vRops has derived each Boolean value from a number of other metrics, so it can be read easily and, hopefully, also understood.

Why did I choose these three metrics? The VM idle metric speaks to how little utilization a given VM has. These could be VMs that aren't doing anything, or VMs whose application hasn't been deployed yet and which therefore aren't doing anything. If someone wants you to troubleshoot a VM that is idle, they should probably have a look at the application, as the VM isn't doing much of anything.

The "is idle" metric is calculated from the following metrics:

Average CPU < 100 MHz
Average Disk I/O < 20 KBps
Average Network I/O < 1 KBps
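As a rough illustration of the thresholds listed above, the check below flags a VM whose averages all fall under them. Combining the three conditions with AND is an assumption on my part; the text only says the metric is calculated from these values, and vRops also applies the time-percentage rule described next.

```python
# Thresholds as listed above; units: CPU in MHz, disk and network in KBps.
IDLE_CPU_MHZ = 100
IDLE_DISK_KBPS = 20
IDLE_NET_KBPS = 1


def below_idle_thresholds(avg_cpu_mhz, avg_disk_kbps, avg_net_kbps):
    """True when all three averages fall under the idle thresholds.

    ASSUMPTION: the three conditions are combined with AND.
    """
    return (
        avg_cpu_mhz < IDLE_CPU_MHZ
        and avg_disk_kbps < IDLE_DISK_KBPS
        and avg_net_kbps < IDLE_NET_KBPS
    )


print(below_idle_thresholds(45, 3.5, 0.2))   # True: nothing much going on
print(below_idle_thresholds(45, 3.5, 12.0))  # False: network is busy
```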

This is the definition of the idle CPU metric:

An object is considered to be idle when the object operates below the idle level for the defined percentage of time. For example, when the CPU idle level is set to 100 MHz for a virtual machine, and the flag for the idle level is set to 90%, the virtual machine is considered to be idle when the speed of its CPU drops below 100 MHz for 90% of the time.
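The definition above can be sketched as "the share of samples below the idle level must reach the idle percentage". A minimal sketch, assuming a plain list of CPU samples in MHz and treating "90% of the time" as at least 90% of samples; the function name and defaults are hypothetical and only mirror the example values in the definition.

```python
def is_idle_over_time(cpu_samples_mhz, idle_level_mhz=100, idle_pct=90):
    """Idle when CPU is below idle_level_mhz for at least idle_pct % of samples.

    ASSUMPTION: each sample covers an equal slice of time, and ">= idle_pct"
    is how "for 90% of the time" is interpreted.
    """
    if not cpu_samples_mhz:
        return False  # no data, no verdict
    below = sum(1 for s in cpu_samples_mhz if s < idle_level_mhz)
    # Integer arithmetic avoids floating-point edge cases at exactly 90%.
    return below * 100 >= idle_pct * len(cpu_samples_mhz)


print(is_idle_over_time([20] * 9 + [500]))       # True: 9 of 10 samples below 100 MHz
print(is_idle_over_time([20] * 8 + [500, 600]))  # False: only 8 of 10
```

Note how a single short burst does not stop a VM from being idle, while a few more can.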

So it is important to note that time is also a relevant factor when looking at these metrics. The default time period for a view is 7 days, but this can be changed either by editing the view or, when viewing it, by choosing a different time range from the toolbar. On a side note, this is also where you have the option to export the view to CSV, should you want to do some magic in Excel or whatever spreadsheet you are using.
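If you would rather script than spreadsheet, a CSV export can be summarized with Python's standard `csv` module. This is a sketch under assumptions: the column headers are assumed to match the renamed metric labels, and the Boolean values are assumed to be written as 0/1 (or true/false); check an actual export before relying on either. The sample data is made up.

```python
import csv
import io

# Hypothetical export: headers assume the renamed metric labels,
# and Boolean values are assumed to be written as 0/1.
exported = """Name,VM idle,VM stressed,VM Oversized
web01,0,1,0
test07,1,0,1
app03,0,0,1
"""


def summarize(csv_text):
    """Count how many VMs have each Boolean flag set."""
    reader = csv.DictReader(io.StringIO(csv_text))
    flags = [f for f in reader.fieldnames if f != "Name"]
    counts = {flag: 0 for flag in flags}
    for row in reader:
        for flag in flags:
            if row[flag].strip() in ("1", "true", "True"):
                counts[flag] += 1
    return counts


print(summarize(exported))
```

For a real export, replace the `io.StringIO` wrapper with `open(path, newline="")`.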

The next metric is VM Stressed, which can mean one of two things. Either the VM is undersized and therefore running at 100% for long periods of time, or the infrastructure is undersized, meaning that the host might not have enough CPU, for example, or there are too many vCPUs for the host to schedule while providing good service to all its VMs. The latter is a noisy neighbor situation. I'm going to look at different ways to easily see where the problem is in an upcoming blog post.
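One back-of-the-envelope way to gauge the "too many vCPUs" case is to compare the total vCPUs on a host with its physical cores. The sketch below is a hypothetical calculation, not a metric from this view, and the host and VM sizes are made up. What ratio is acceptable depends on the workloads, so no threshold is assumed here.

```python
def vcpu_to_pcpu_ratio(vm_vcpus, host_physical_cores):
    """Total vCPUs of the VMs on a host divided by the host's physical cores."""
    return sum(vm_vcpus) / host_physical_cores


# Hypothetical host: 16 physical cores running VMs with these vCPU counts.
ratio = vcpu_to_pcpu_ratio([8, 8, 4, 4, 4, 2, 2], 16)
print(f"{ratio:.2f} vCPUs per core")  # 2.00 vCPUs per core
```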

The last metric for today is "VM Oversized", which more or less speaks for itself. This is your rightsizing situation. It could also be the reason your VM is stressed: it might simply have too many vCPUs. Each vCPU needs its fair share of resources, which the host has to schedule for the VM, with a little overhead per vCPU. This can add up to a lot if VMs aren't rightsized. More on this in an upcoming blog post.

Wrap up

VMware has done a very good job with the documentation for this release, which can be found here:

vRops Documentation

The document to look at is "vRealize Operations Manager Customization and Administration Guide", which can be found at the bottom of the vRops Documentation page. Chapter five, "Metric Definitions in vRealize Operations Manager", contains what I assume is a complete list of metrics. This is a very welcome addition to the vRops documentation. It would be great if, at some point, it also described how these metrics are calculated or derived, and at what interval.

Next: Part 4: Standard Deviation