
ESXi IOStats – vmkmgmt_keyval

Just wanted to do a short post about vmkmgmt_keyval, as I think most admins only use this tool for getting HBA driver/firmware versions, but you get so much more than just that. There are different kinds of statistics in there: per target, per LUN and per block size.

So first, what is vmkmgmt_keyval? vmkmgmt_keyval is an ESXi command which can be executed from ESXi’s direct console user interface (DCUI) or via SSH; in both cases you get access to the ESXi Shell.

The way I’m going to use vmkmgmt_keyval is with the “-a” option, just like in the example here: “/usr/lib/vmware/vmkmgmt_keyval/vmkmgmt_keyval -a”. What this does is list all key value instances. If you want help, “-h” can be used instead of “-a”.

So besides getting the driver/firmware versions of HBAs, there’s some data around the HBA, like queue depth, link speed, etc. But this is not what this post is about. I think the IO stats are much more important.

What is it that you can get from this? First, let’s look at the data available on a per-target basis.

Tgt00  WWNN 00:00:00:00:00:00:00:00  WWPN 00:00:00:00:00:00:00:00  Target path is ok
	IOStat:	  max  0090    pend    0000    txcnt 8317318393
	IOErr:	  busy 0000    retry   0000    seq_tmo    0000
	TMGMT:	  tgt_rst 0000    lun_rst   0000
	ABORT:	  issue 000000    IOcnt   000000
	Events:	  npr  0001    devloss 0000    no_connect 0000
	LCLRJT:	  nrsc 0000    inv_rpi 0000    lcl_rjt    0000
	FRAME:	  drop 0000    underrun 000    overrun    0000    scsidone   0000

What you see above is an example of a target and its stats. The names of the stats can be a bit cryptic, but with a little storage know-how they shouldn’t be too hard to read. Now we can easily see whether a target is having problems, and which one.
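If you want to work with these counters in a script rather than by eye, here is a minimal Python sketch that turns one target block into a dictionary. The function names (parse_target_stats, nonzero_counters) are my own, and the parsing assumes the layout is exactly as in the sample above (a group name followed by name/value pairs); other driver versions may print it differently.

```python
import re

# One target block, copied from the output shown in this post.
SAMPLE = """\
Tgt00  WWNN 00:00:00:00:00:00:00:00  WWPN 00:00:00:00:00:00:00:00  Target path is ok
    IOStat:   max  0090    pend    0000    txcnt 8317318393
    IOErr:    busy 0000    retry   0000    seq_tmo    0000
    TMGMT:    tgt_rst 0000    lun_rst   0000
    ABORT:    issue 000000    IOcnt   000000
    Events:   npr  0001    devloss 0000    no_connect 0000
    LCLRJT:   nrsc 0000    inv_rpi 0000    lcl_rjt    0000
    FRAME:    drop 0000    underrun 000    overrun    0000    scsidone   0000
"""


def parse_target_stats(text):
    """Return {group: {counter: int}} for one target block."""
    stats = {}
    for line in text.splitlines():
        # Stat lines look like "<Group>: name value name value ...".
        match = re.match(r"\s*(\w+):\s+(.*)", line)
        if not match:
            continue  # the "Tgt00 WWNN ..." header has no "<Group>:" prefix
        group, rest = match.groups()
        tokens = rest.split()
        # Pair every name with the value that follows it.
        stats[group] = {
            name: int(value) for name, value in zip(tokens[0::2], tokens[1::2])
        }
    return stats


def nonzero_counters(stats, ignore=("IOStat",)):
    """Return counters that are not zero, skipping the plain IO counters."""
    return {
        f"{group}.{name}": value
        for group, counters in stats.items()
        if group not in ignore
        for name, value in counters.items()
        if value
    }


stats = parse_target_stats(SAMPLE)
flagged = nonzero_counters(stats)
print(flagged)
```

For the sample above, the only counter that gets flagged is npr in the Events group; everything in the error groups is zero.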

Moving on to the per-LUN stats: here we see queue depth, FCP errors, aborts and LUN resets, which can be used to help troubleshoot.

LUN[0:0]  WWNN 00:00:00:00:00:00:00:00  WWPN 00:00:00:00:00:00:00:00  path is ok
   qdepth 30    fcperr 0005    abts issue 000000 cnt 000000    lun_rst 0000    tx_cnt 5892684093
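The LUN line can be picked apart the same way. This is only a sketch: the regular expression and the field names abts_issue and abts_cnt are my own naming, and the pattern assumes the fields always appear in the order shown above.

```python
import re

# The per-LUN stats line, copied from the output shown in this post.
LUN_LINE = (
    "   qdepth 30    fcperr 0005    abts issue 000000 cnt 000000"
    "    lun_rst 0000    tx_cnt 5892684093"
)

# Assumes the field order seen in the sample output.
LUN_RE = re.compile(
    r"qdepth\s+(?P<qdepth>\d+)\s+"
    r"fcperr\s+(?P<fcperr>\d+)\s+"
    r"abts\s+issue\s+(?P<abts_issue>\d+)\s+cnt\s+(?P<abts_cnt>\d+)\s+"
    r"lun_rst\s+(?P<lun_rst>\d+)\s+"
    r"tx_cnt\s+(?P<tx_cnt>\d+)"
)


def parse_lun_line(line):
    """Return the LUN counters as a dict of ints, or None if the line does not match."""
    match = LUN_RE.search(line)
    if match is None:
        return None
    return {name: int(value) for name, value in match.groupdict().items()}


lun = parse_lun_line(LUN_LINE)
print(lun)
```

Returning None for a line that does not match keeps the caller in control when the output format differs from what I saw on my hosts.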

The data we just looked at has one problem: it is aggregated, so we have no idea when a problem occurred. This is why management tools like vRealize Operations Manager (vRops) and vRealize Log Insight exist. Both are tools which can help analyze these kinds of problems.

This brings me to the last part of the vmkmgmt_keyval output, which is block size and latency. Here is another example.
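If you don’t have such a tool at hand, a crude workaround is to sample the counters twice and look at the difference. The sketch below only shows the arithmetic: counter_deltas is a hypothetical helper of mine, the first sample uses the LUN values from this post, and the second sample is made up purely for illustration.

```python
def counter_deltas(before, after, interval_s):
    """Per-counter increase and rate between two samples taken interval_s seconds apart."""
    deltas = {}
    for name, old in before.items():
        if name not in after:
            continue  # counter missing in the second sample
        diff = after[name] - old
        deltas[name] = {"delta": diff, "per_s": diff / interval_s}
    return deltas


# First sample: values from the LUN line shown in this post.
first = {"fcperr": 5, "lun_rst": 0, "tx_cnt": 5892684093}
# Second sample: made-up values, only here to illustrate the arithmetic.
second = {"fcperr": 7, "lun_rst": 0, "tx_cnt": 5892744093}

deltas = counter_deltas(first, second, interval_s=60)
for name, d in deltas.items():
    print(name, d["delta"], d["per_s"])
```

With samples like these you at least know that the two new FCP errors happened within that minute. I would treat a negative delta as a sign that the counters were reset in between, though that is my assumption and not something I have verified.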

lpfc IOStats Page:
									Snapshot				Total
									--------				-----
									IOPrd	IOPwr	 MBrd	 MBwr		IOPrd	IOPwr	 MBrd	 MBwr
Tgt00  WWNN 00:00:00:00:00:00:00:00  WWPN 00:00:00:00:00:00:00:00	3426.5	 53.1	3353.3	 29.3		571.2	 58.2	540.0	  8.3
	size[   512 -    512 ]	cnt 20272480	avg 1ms
	size[  1024 -   1536 ]	cnt 11490892	avg 0ms
	size[  2048 -   3584 ]	cnt 10001518	avg 0ms
	size[  4096 -   7680 ]	cnt 843186616	avg 0ms
	size[  8192 -  15872 ]	cnt 5342245988	avg 0ms
	size[ 16384 -  32256 ]	cnt 1130441743	avg 0ms
	size[ 32768 -  65024 ]	cnt 278804303	avg 1ms
	size[ 65536 - 130560 ]	cnt 200924556	avg 1ms
	size[131072 - 261632 ]	cnt 295706725	avg 1ms
	size[262144 - 523776 ]	cnt 181901841	avg 4ms
	size[524288 - 1048064 ]	cnt 02161818	avg 7ms
	size[1048576 - 2096640 ]	cnt 00011727	avg 5ms

The interesting part here is the histogram: it shows block size, count and average latency. So for each block size interval you can see the latency, what kind of IO this datastore is serving, and the impact that has on latency. Again, this is a good indication of how the storage/datastore is serving the VMs on top of it, but it can’t be used for profiling the VMs or knowing which VM might be impacted by “poor” storage performance.
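To get a quick feel for the block size mix, the histogram can be summarized with a few lines of Python. Again this is a sketch under my own assumptions: the function names are mine, the regular expression assumes the “size[ low - high ] cnt N avg Nms” layout shown above, and since the driver only prints whole milliseconds, the weighted average is a rough number at best.

```python
import re

# Histogram lines, copied from the output shown in this post.
SAMPLE = """\
    size[   512 -    512 ]  cnt 20272480    avg 1ms
    size[  1024 -   1536 ]  cnt 11490892    avg 0ms
    size[  2048 -   3584 ]  cnt 10001518    avg 0ms
    size[  4096 -   7680 ]  cnt 843186616   avg 0ms
    size[  8192 -  15872 ]  cnt 5342245988  avg 0ms
    size[ 16384 -  32256 ]  cnt 1130441743  avg 0ms
    size[ 32768 -  65024 ]  cnt 278804303   avg 1ms
    size[ 65536 - 130560 ]  cnt 200924556   avg 1ms
    size[131072 - 261632 ]  cnt 295706725   avg 1ms
    size[262144 - 523776 ]  cnt 181901841   avg 4ms
    size[524288 - 1048064 ] cnt 02161818    avg 7ms
    size[1048576 - 2096640 ]    cnt 00011727    avg 5ms
"""

BUCKET_RE = re.compile(
    r"size\[\s*(\d+)\s*-\s*(\d+)\s*\]\s+cnt\s+(\d+)\s+avg\s+(\d+)ms"
)


def parse_histogram(text):
    """Return a list of buckets with size range (bytes), IO count and average latency."""
    buckets = []
    for match in BUCKET_RE.finditer(text):
        low, high, cnt, avg_ms = map(int, match.groups())
        buckets.append({"low": low, "high": high, "cnt": cnt, "avg_ms": avg_ms})
    return buckets


def summarize(buckets):
    """Return total IO count, count-weighted average latency and the busiest bucket."""
    total = sum(b["cnt"] for b in buckets)
    weighted_ms = sum(b["cnt"] * b["avg_ms"] for b in buckets) / total
    busiest = max(buckets, key=lambda b: b["cnt"])
    return total, weighted_ms, busiest


buckets = parse_histogram(SAMPLE)
total, weighted_ms, busiest = summarize(buckets)
share = busiest["cnt"] / total

for b in buckets:
    print(f'{b["low"]:>8} - {b["high"]:>8}  {100 * b["cnt"] / total:6.2f}%  {b["avg_ms"]}ms')
print(f"busiest bucket starts at {busiest['low']} bytes, weighted avg {weighted_ms:.2f}ms")
```

For the sample in this post, the 8192 to 15872 byte bucket carries by far the most IO, while the highest latencies sit in the large block buckets that see comparatively little traffic.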

If there’s a need to understand storage better, there are better tools than vmkmgmt_keyval. One such tool is PernixData Architect. I’m not going to do a write-up about Architect, as others have already done that very well. Take a look at Pete Koehler’s blog post called Viewing the impact of block sizes with PernixData Architect.

That was all for now. I hope I made myself clear: vmkmgmt_keyval is a tool for getting better insight into your storage, but there are even better tools which do this with more precision and finer granularity.
