vRops - Super Metric enhancements

6 Jan 2017 Michael Ryom

Super Metric vRealize Operations Manager vRops

One of the areas where vRops 6.3 has seen a huge improvement is Super Metrics. You can read about the vRops 6.3 enhancements here: https://www.starwindsoftware.com/blog/vrops-6-3-walkthrough-new-features#SuperMetricenhancements

The enhancements might seem somewhat trivial: the addition of a few new operators. I guarantee you they are not; these new operators are a game changer for Super Metrics. Let's recap all the new operators.

Operators	Function	Example
[]	Array	[A, B, C]
==	Equal	1==1
!=	Not equal	1!=2
<	Less than	1<2
<=	Less than or equal	1<=2
>	Greater than	2>1
>=	Greater than or equal	2>=1
||	Or
&&	And
? :	If then else	A ? B : C
!	Not	!(1>2)
Where	Where	1==1 where = “==1”

I have not found a use case for each and every one of the operators, but as can be seen, it is now possible to compare two values, which is a huge improvement! The rest are normal operators which most people should be familiar with. Note that the where clause is somewhat cumbersome to work with: after "where" you need an equal sign and then your statement in quotes.

Now let's look at a use case. In vRops you can get VM uptime. This is nothing new. The problem with uptime in vRops is that it is an ever-growing number, until a reboot that is. So for any given time period you will end up with numbers which might be very high or very low, and as such it becomes very hard to make sense of them in terms of uptime statistics. Just look at the graph below.

Every time there is a drop in the chart, the VM's OS has been restarted for some reason. vRops does everything in 5-minute cycles, or 300 seconds if you will. It means that for every data point 300 seconds is added to the previous value, provided the VM's OS has been up for the entire period. If not, the number will be lower, as the metric is reset when the OS reboots. As can be seen in the graph, the lowest number is 86 and the highest is 1.631.929.
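To make the counter behaviour concrete, here is a minimal Python sketch of how such a series of samples could look. It is only a model of the description above, not anything vRops runs; the function name, the starting value and the reboot input are made up for illustration.

```python
CYCLE_SECONDS = 300  # collection interval described in the text


def simulate_uptime(start, cycles, reboots):
    """Return the uptime value sampled at the end of each cycle.

    reboots maps a cycle index to the seconds the OS has been up when
    that cycle's sample is taken (hypothetical input for illustration).
    """
    samples = []
    uptime = start
    for cycle in range(cycles):
        if cycle in reboots:
            uptime = reboots[cycle]   # counter restarted during this cycle
        else:
            uptime += CYCLE_SECONDS   # a full cycle of uptime is added
        samples.append(uptime)
    return samples


print(simulate_uptime(start=1200, cycles=5, reboots={2: 86}))
# [1500, 1800, 86, 386, 686]
```

The drop to 86 in the third sample is the kind of dip the graph shows at every OS restart.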

Super Metric

So how can we turn these data points into something useful in the context of uptime statistics which we can use for management or customer reports? Create a Super Metric of course.

(${this, metric=sys|uptime_latest}<=300)?((((${this, metric=sys|uptime_latest}<300))*(${this, metric=sys|uptime_latest}))):(${this, metric=sys|uptime_latest}>=300)*300

This simple example will for each data point give you the uptime in seconds, meaning that if every data point is 300, the VM's uptime is 100%. So how does it work? Let me break it down for you.

IF

Let's start with the first part, before the question mark. This is the "if" statement part of the Super Metric. If it is true, evaluation goes to the "then" statement that follows; otherwise it jumps to the "else" statement. As we are doing math, the true or false result comes in the form of a boolean value: a zero or a one. Zero means go to the else statement and one means go to the then statement. Very basic stuff.

(${this, metric=sys|uptime_latest}<=300)

It will be true only if the metric "sys|uptime_latest" is equal to or less than 300, which means the OS has been rebooted within the last cycle. If the value is more than 300, we know that the system has not been down, i.e. rebooted. Then it is safe to jump to the conclusion that the uptime of the OS in the time period is 300 seconds, or 100%.
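Read literally, the "if" part is just a comparison that yields one or zero. A small Python sketch of that reading (the function name is made up for illustration, and this models the expression rather than calling anything in vRops):

```python
def reboot_condition(uptime_latest):
    """Model of (${this, metric=sys|uptime_latest}<=300): 1 if true, else 0."""
    return int(uptime_latest <= 300)


for value in (86, 300, 386):
    print(value, reboot_condition(value))
# 86 1
# 300 1
# 386 0
```

So 86 and 300 send the evaluation to the "then" statement, while 386 sends it to the "else" statement.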

Then

The "then" statement is anything between the question mark (?) and the colon (:). This part of the Super Metric is only used if the first part (the if statement) is one as a boolean value. This means that the OS has had downtime in the period. How much is what this part of the Super Metric statement is going to tell us.

((((${this, metric=sys|uptime_latest}<300))*(${this, metric=sys|uptime_latest})))

The statement could have been shorter. I chose to have a way of validating the value of the statement. The first part, before the asterisk, looks at the uptime and validates that it is less than 300. If true, it returns a value of one, which is multiplied by the uptime in the time period. The result is what is returned to vRops as the uptime for that time period if the "if" statement was true. If not, read ahead to see what would happen else. Pun intended :)

Else

This is the last part of the Super Metric. It is the all-good scenario and as such, the last part could just have been 300 as a value, because this is what it will always be. The only reason for having the validation in there is so you can see what is going on. I also used it a lot when I tried to figure out how this could be used.

(${this, metric=sys|uptime_latest}>=300)*300

I am not going to touch on this again; it is quite simple.
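Putting the three parts together, the long form can be modelled in a few lines of Python. This is a sketch of how the expression reads, assuming comparisons evaluate to one or zero as described above; the function and variable names are hypothetical.

```python
def evaluate_long_form(uptime):
    """Return (condition, then_part, else_part, result) for one data point."""
    condition = int(uptime <= 300)           # the "if" part
    then_part = int(uptime < 300) * uptime   # validation flag times uptime
    else_part = int(uptime >= 300) * 300     # validation flag times 300
    result = then_part if condition == 1 else else_part
    return condition, then_part, else_part, result


print(evaluate_long_form(86))    # (1, 86, 0, 86)
print(evaluate_long_form(1800))  # (0, 0, 300, 300)
```

A data point of 86 comes back as 86 seconds of uptime for that cycle, and a data point of 1800 comes back as the full 300 seconds.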

Simplified

How much can the Super Metric be shortened, you ask? This is how little! Once you know how it works, it is actually simpler to read it this way.

(${this, metric=sys|uptime_latest}<=300)?(${this, metric=sys|uptime_latest}):300
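As a sanity check, the long and the simplified form can be compared in a small Python model. Both functions below are a literal reading of the two expressions, not vRops code, and the names are made up. In this model the two agree everywhere except at exactly 300, where the "<300" validation in the long form yields zero; whether vRops behaves the same way at that boundary is not verified here.

```python
def long_form(uptime):
    """Literal model of the long Super Metric."""
    if uptime <= 300:
        return int(uptime < 300) * uptime
    return int(uptime >= 300) * 300


def simplified(uptime):
    """Literal model of the simplified Super Metric."""
    return uptime if uptime <= 300 else 300


# Collect every whole-second uptime value where the two models disagree.
differences = [u for u in range(0, 1001) if long_form(u) != simplified(u)]
print(differences)  # [300]
```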

Everybody likes percentages

If you don't care for uptime in seconds, you can always use the below and get it as a percentage of the OS uptime over a given time period.

(${this, metric=sys|uptime_latest}<=300)?((((${this, metric=sys|uptime_latest}<300))*(${this, metric=sys|uptime_latest})))/300*100:(${this, metric=sys|uptime_latest}>=300)*300/300*100

All I have done is divide the formula by 300, which is the expected outcome, and then multiply it by 100 to get it as a percentage. Of course this can be shortened as well, but I'll let you do it yourself.

As every data point is now a percentage, all you have to do to get the uptime for a given period is take the average of the data points in that period, and you know the uptime for that VM. As easy as pie. Let me illustrate that.
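In numbers, the averaging works like this. The sketch below uses the simplified form of the Super Metric and a made-up series of five samples with one reboot; it only models the arithmetic and does not touch vRops.

```python
from statistics import mean


def uptime_percent(uptime_latest):
    """Per-cycle uptime as a percentage of the 300 second cycle."""
    seconds = uptime_latest if uptime_latest <= 300 else 300
    return seconds / 300 * 100


samples = [1500, 1800, 86, 386, 686]  # hypothetical data points, one reboot
percentages = [uptime_percent(s) for s in samples]
average = mean(percentages)
print(round(average, 2))  # 85.73
```

Four cycles at 100% and one at roughly 28.67% average out to about 85.73% OS uptime for the period.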

For this given VM, I am showing you three Super Metrics. The first two are OS uptime and the last is VM uptime. As you can see, only the OS was down and not the VM. This is one of the graphs which can be used to stop fingers being pointed at you if you are a VMware admin only. It clearly shows the VM has been up and only the OS was affected.

The only problem with graphs is that you need to read them and understand what they are showing you. Not a good thing to send to customers or to the management layer. Here reports and views are king. Look at the view below: I have taken the same three Super Metrics as above and changed the transformation from last to average. As the values have been standardized, it is now easy to read the uptime of the VM for the given time period. Again it can be seen that the OS has had downtime, but the VM itself has had none. That is just the way we like it.

I will leave you here, hopefully a little wiser about the new capabilities Super Metrics has to offer. Now go forth and create new Super Metrics at will.

Oh, just before you leave, I want to show you how you can make the Super Metric statement even simpler with an alias. Bye now.

(${this, metric=sys|uptime_latest} as uptime<=300)?uptime:300