It has been some time since the last blog post, and I apologize for that, but it is only because of happy circumstances: I became a father for the second time, to a healthy boy 🙂 So I have had some time off to be with my family. Well, with that sorted, let's move on to the topic of the day.
Doing COW math
Hope you have been doing your VMware COW math… And here I’m of course talking about copy-on-write snapshots. Some time ago I got a few service requests stating that some services were down for some VMs (50+). There was nothing obviously wrong: the VMs were running, no BSOD, the host was online, the OS was running, etc. So I started looking at the host log files, since this problem was host-wide. By the way, I did it with my new favorite tool, Log Insight.
After some digging I found an odd warning in the log. This led me to think of physical memory, of which there was plenty, and then of heap… I’ve seen similar problems with heap before, mostly with third-party software, where the heap space had run full, which had some unwanted impact on service uptime. After some more digging I confirmed that this was the problem and started mitigating it. In this case it was simply a matter of getting all the VMs off the host with vMotion; as there were quite a few VMs running on the host, I chose to put the host into maintenance mode.
Copy on write snapshot
So where does the COW math come in, you might ask? After seeing the problem I realized I didn’t know the first thing about how snapshots affect the COW heap, let alone that there is something called COW heap, which is a limitation at the host level. Googling for documentation on how COW heap works didn’t give me anything, so I turned to Twitter, and Cormac Hogan answered my question with a KB article, which was just what I needed. The link can be found at the bottom of this post. The article explains that the COW heap is 192MB by default and can be configured to a maximum of 256MB; it also tries to give you an understanding of how COW heap is consumed.
So my initial thought was that I needed to make a COW calculator in PowerShell/PowerCLI, so I could see whether this was just a one-time issue or a problem that will occur again and again, and whether it could be mitigated by changing the heap size. But as always, work catches up with me and I’m forced to leave it at this. So this is what I have done so far. It is pretty basic: it lists the values and shows how you can calculate the different aspects of COW heap and how one parameter affects the others. This is what I leave you with; I hope to get the time to finish the job, but until then the calculations have to be done somewhat manually.
#Data is in bytes
#Z = Number of virtual machines
#Y = Number of disks
#X = Number of snapshots that each disk has
#W = Size (in bytes)
#COW = COW heap size (in bytes)
#Inputs are plain numbers (not strings) so the arithmetic below behaves as expected
$Z = 364.68
$Y = 5
$X = 2
$W = 53687091200

#HOSTNAME is a placeholder for the ESXi host; the setting is in MB, so convert it to bytes
$COW = [int64](Get-AdvancedSetting -Entity HOSTNAME -Name COW.COWMaxHeapSizeMB).Value * 1024 * 1024

#Each line solves the same formula for one value, given the other three
$Z = (75 / 100 * $COW) / (($W / (2 * 1048576) * 4 * $X) * $Y)
$Y = (75 / 100 * $COW) / (($W / (2 * 1048576) * 4 * $X)) / $Z
$X = (75 / 100 * $COW) / (($W / (2 * 1048576) * 4) * $Y) / $Z
$W = (75 / 100 * $COW) / (($Z / (2 * 1048576) * 4 * $X) * $Y)
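To get a feel for the numbers without connecting to a host, the same formula can be wrapped in a small function. This is only a sketch: Get-CowMaxVMs and its parameter names are made up for this example and are not PowerCLI cmdlets, and the heap size is typed in by hand rather than read from the host.

```powershell
# Hypothetical helper (not part of PowerCLI): applies the formula from the
# script above and returns how many VMs fit, given the other three values.
function Get-CowMaxVMs {
    param(
        [double]$HeapSizeMB,       # COW heap size in MB (192 default, 256 maximum)
        [double]$DiskSizeBytes,    # size of each disk in bytes (W)
        [int]$SnapshotsPerDisk,    # snapshots that each disk has (X)
        [int]$Disks                # number of disks (Y)
    )
    # The formula only counts 75/100 of the configured heap
    $budget      = 75 / 100 * $HeapSizeMB * 1MB
    # Heap consumed by one disk and its snapshots, per the formula
    $heapPerDisk = $DiskSizeBytes / (2 * 1MB) * 4 * $SnapshotsPerDisk
    return $budget / ($heapPerDisk * $Disks)
}

# Same inputs as the script above: 50 GB disks, 2 snapshots each, 5 disks
$default = Get-CowMaxVMs -HeapSizeMB 192 -DiskSizeBytes 50GB -SnapshotsPerDisk 2 -Disks 5
$maximum = Get-CowMaxVMs -HeapSizeMB 256 -DiskSizeBytes 50GB -SnapshotsPerDisk 2 -Disks 5
"{0:N1} VMs at 192 MB, {1:N1} VMs at 256 MB" -f $default, $maximum
```

With these inputs the formula gives roughly 147 VMs at the default heap size and roughly 197 at the maximum, so raising the heap buys about a third more headroom, but large or numerous snapshots can still eat through it.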
In the event that you want to change the COW heap, this KB does a great job of explaining how it works.
3 thoughts on “Doing COW math – Copy on write snapshot heap”
Hi Ryom
Did you ever find the root cause for running out of COW heap? Was it a third party thing or was it just a coincidence (that could well happen again)?
Regards
Monberg
No, I never did a proper RCA. I concluded that it was due to larger-than-normal snapshots; there were quite a few 100 GB+ snapshots on the host. No third-party tools were installed on the host, and the VMs started responding as soon as I moved them off the host, i.e. freeing up heap.