
ESXi IOStats – vmkmgmt_keyval

Just wanted to do a short post about vmkmgmt_keyval, as I think most admins only use this tool for getting HBA driver/firmware versions, but you get so much more than just that. There are different kinds of statistics in there: per target, per LUN and per block size.

So first, what is vmkmgmt_keyval? It is an ESXi command which can be executed from ESXi’s Direct Console User Interface (DCUI) or via SSH; in both cases you get access to the ESXi Shell.

The way I’m going to use vmkmgmt_keyval is with “-a” after it, like this: “/usr/lib/vmware/vmkmgmt_keyval/vmkmgmt_keyval -a”. What this does is list all key-value instances. If you want help, “-h” can be used instead of “-a”.
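The output of “-a” is long, so it can help to capture it and split it per instance before searching through it. Below is a minimal Python sketch of that idea. The marker text “Key Value Instance:” is my assumption about how each instance starts in the output, and the function names are made up for this post, so check both against what your host actually prints.

```python
import subprocess

KEYVAL = "/usr/lib/vmware/vmkmgmt_keyval/vmkmgmt_keyval"
# Assumption: every instance begins with a line containing this marker.
INSTANCE_MARKER = "Key Value Instance:"


def dump_all():
    """Run 'vmkmgmt_keyval -a' on the ESXi host and return its output as text."""
    result = subprocess.run(
        [KEYVAL, "-a"],
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    return result.stdout


def split_instances(text, marker=INSTANCE_MARKER):
    """Group output lines under the instance name that follows the marker."""
    instances = {}
    current = None
    for line in text.splitlines():
        if marker in line:
            current = line.split(marker, 1)[1].strip()
            instances[current] = []
        elif current is not None:
            instances[current].append(line)
    return instances
```

On a host you would call split_instances(dump_all()) and then look only at the instance (HBA) you care about.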

So besides the driver/firmware versions of the HBAs, there’s some data about the HBA itself, like queue depth, link speed, etc. But this is not what this post is about. I think the IO stats are much more important.

What is it that you can get from this? First, let’s look at the data available on a per-target basis.

What you see above is an example of a target and its stats. The names of the stats can be a bit cryptic, but with a little storage know-how this shouldn’t be too hard. Suddenly we can easily see if a target is having problems, and which one.

Moving on to the per-LUN stats: here we see queue depth, FCP errors, aborts and LUN resets, which can be used to help troubleshoot.

The data we just looked at has one problem: it is aggregated, so we have no idea of when a problem occurred. This is why management tools like vRealize Operations Manager (vROps) and vRealize Log Insight exist. Both are tools which can help analyze these kinds of problems.
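If you don’t have those tools at hand, one rough way to get a time dimension is to sample the counters at intervals and compare two samples. Here is a small Python sketch of that comparison. It assumes the counters have already been parsed into dictionaries, and the counter names are invented for illustration; they are not the names vmkmgmt_keyval prints.

```python
def counter_deltas(previous, current):
    """Return the counters that increased between two samples, with the increase."""
    deltas = {}
    for name, value in current.items():
        change = value - previous.get(name, 0)
        # Negative changes (e.g. counters reset by a reboot) are ignored here.
        if change > 0:
            deltas[name] = change
    return deltas


# Hypothetical samples taken a few minutes apart.
earlier = {"aborts": 3, "lun_resets": 0, "fcp_errors": 12}
later = {"aborts": 5, "lun_resets": 0, "fcp_errors": 12}
print(counter_deltas(earlier, later))  # {'aborts': 2}
```

This only tells you that something happened between the two samples, so the shorter the interval, the better the picture.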

This brings me to the last part of the vmkmgmt_keyval output, which is block size and latency. Again, here comes an example.

The interesting part here is the histogram, which shows block size, count and avg. latency. So now for each block size interval you can see the latency, see what kind of IO this datastore is serving, and the impact it has on latency. Again, this is a good indication of how the storage/datastore is serving the VMs on top of it, but it can’t be used for profiling the VMs or knowing which VM might be impacted by “poor” storage performance.
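To make the histogram a bit easier to read, you can turn the counts into a share of total IOs per block size and work out a count-weighted average latency. The Python sketch below does that. The bucket labels and numbers are hypothetical, the tuple layout is my own choice, and the latency is in whatever unit the output reports.

```python
def summarize(buckets):
    """buckets: list of (label, io_count, avg_latency) tuples.

    Returns (rows, overall_avg), where rows holds (label, share_of_ios, avg_latency)
    and overall_avg is the latency weighted by IO count.
    """
    total = sum(count for _, count, _ in buckets)
    if total == 0:
        return [], 0.0
    rows = [(label, count / total, latency) for label, count, latency in buckets]
    overall = sum(count * latency for _, count, latency in buckets) / total
    return rows, overall


# Hypothetical histogram rows: block size bucket, IO count, average latency.
example = [("4K", 100, 1.0), ("64K", 300, 3.0)]
rows, overall = summarize(example)
for label, share, latency in rows:
    print(label, round(share * 100), "% of IOs, avg latency", latency)
print("weighted avg latency:", overall)
```

A bucket with high latency but a tiny share of the IOs matters a lot less than one that carries most of the traffic, which is why the share is worth looking at next to the latency.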

If there’s a need to understand storage better, there are better tools than vmkmgmt_keyval. One such tool is PernixData Architect. I’m not going to do a write-up about Architect, as others have already done that very well. Take a look at Pete Koehler’s blog post called Viewing the impact of block sizes with PernixData Architect.

 

That was all for now. I hope I made myself clear: vmkmgmt_keyval is a tool for getting better insight into your storage, but there are even better tools which do this with more precision and more granularity.
