2012 Cluster Stability Hotfix

Recently I was on a customer’s site migrating the storage for a three-node Hyper-V 2012 cluster. As well as moving from the older storage onto a 3PAR, we also moved from iSCSI to FC.

One of the reasons for the move was the stability of the old array: they had previously lost volumes and had to rebuild them. I therefore found it a little concerning that, after all of the storage migrations, we were seeing CSV timeouts. When these timeouts occurred, about once a week, you could not even browse the CSV directory on some, but not all, hosts; you’d just be stuck with an hourglass.

The logs were not really pointing to anything in particular. The 3PAR was fine and the FC switches were fine; after all, some hosts could still see all of the volumes. We patched the servers up to the hilt and updated all of the HW to the latest and greatest, but we were still seeing the issue. The only way to resolve the situation when it happened was to reboot one of the hosts. It was as if something was locked and would not release, resulting in a “broken” cluster.

After further investigation I came across quite an old hotfix, KB2870270, a cluster stability hotfix. This has now been superseded by KB2878635. Don’t let the title put you off (“Update is available that improves the resiliency of the cloud service provider in Windows: December 2013”); this fix has the KB2870270 stability hotfix rolled up into it. It’s not the newest of hotfixes either:

https://support.microsoft.com/en-us/kb/2878635

It’s now over 3 weeks since the hotfix was installed on the three hosts and there have been no more issues. In addition, they had a Windows 2000 server running in the cluster as a VM that would crash every hour…! This issue has also now gone away (so Windows 2000 “will work” on a 2012 cluster 🙂 ).
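
If you want to confirm that every node really has the fix, a quick check along these lines might help. This is only a rough Python sketch: the host names and the inventory dictionary are made up for illustration, and it assumes you have already collected the installed hotfix IDs for each host by some other means (for example from `Get-HotFix` on each node). It treats either KB2878635 or the older KB2870270 as covering the stability fix.

```python
# Hotfix IDs that carry the cluster stability fix, per the post:
# KB2870270 is the original, KB2878635 is the rollup that supersedes it.
ACCEPTED_IDS = frozenset({"KB2878635", "KB2870270"})


def hosts_missing_fix(inventory, accepted_ids=ACCEPTED_IDS):
    """Return a sorted list of hosts with none of the accepted hotfix IDs.

    `inventory` maps a host name to an iterable of installed hotfix IDs.
    IDs are compared case-insensitively and surrounding spaces are ignored.
    """
    missing = []
    for host, hotfix_ids in inventory.items():
        normalised = {kb.strip().upper() for kb in hotfix_ids}
        if not (normalised & accepted_ids):
            missing.append(host)
    return sorted(missing)


# Hypothetical inventory: these host names are invented for the example.
example_inventory = {
    "HV-NODE1": ["KB2878635"],
    "HV-NODE2": ["kb2870270"],
    "HV-NODE3": [],
}

print(hosts_missing_fix(example_inventory))  # prints ['HV-NODE3']
```

Any host that turns up in the list still needs the update before you can call the cluster patched.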

It brings me back to the question of when you should install a hotfix: before you have a known issue, or after? Microsoft always warn that you should not install a hotfix if you are not having the exact problem. KB2878635, for example, is titled “Update is available that improves the resiliency of the cloud service provider in Windows” and does not talk about CSV timeouts, but it does include the stability hotfix KB2870270. In my opinion, having worked on several 2012 cluster issues, I’d say if there is a hotfix out there, get it installed before you see the issue; it may save you pain later.

Of course, the best advice here would really be to upgrade to 2012 R2!
