Just a few points I picked up from watching a MS engineer do his thing.
Gather a list of the VM guests affected by the issue, their VM hosts, the cluster shared volumes their disks are located on, and msinfo32 output from each VM.
Start -> Administrative Tools -> Failover Cluster Manager shows the cluster. You can see which VMs are highly available under Services and Applications.
On the affected VMs, Start -> Run -> eventvwr. Filter the current log and look for event IDs 1001, 6008 and 41. You should be able to find the time the VM shut down or crashed.
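If you export the log to CSV from Event Viewer, the same filter can be scripted. This is only a rough sketch: the column names ("Event ID", "Date and Time", "Source") are my assumption about the export format, and the sample rows are made up for illustration.

```python
import csv
import io

# Event IDs named in the notes above: 41, 1001 and 6008.
CRASH_EVENT_IDS = {41, 1001, 6008}


def find_crash_events(csv_text, id_column="Event ID"):
    """Return the rows whose event ID is one of the crash-related IDs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if int(row[id_column]) in CRASH_EVENT_IDS]


# Hypothetical sample export, not taken from a real system.
SAMPLE = """Level,Date and Time,Source,Event ID
Information,2012-01-10 08:00:00,Service Control Manager,7036
Critical,2012-01-10 09:15:02,Microsoft-Windows-Kernel-Power,41
Error,2012-01-10 09:15:10,EventLog,6008
"""

crashes = find_crash_events(SAMPLE)
for row in crashes:
    # Print when each crash-related event was logged and by which source.
    print(row["Date and Time"], row["Source"], row["Event ID"])
```

For a real export you would read the file contents into a string and pass that in instead of SAMPLE.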
Check C:\Windows\ for MEMORY.DMP. If the date on it is recent you can look into opening it and reading it. I didn't have a memory dump.
Run a validate on the cluster. Choose "Run only tests I select" and uncheck the whole Storage section. If you leave the storage tests checked you can take the cluster offline.
Collect C:\Windows\Cluster\Reports\cluster.log from each VM host.
Start -> Run -> cmd. Run "cluster . log /gen"; this generates the cluster log on all nodes in the cluster.
Start -> Run -> cmd, then type fltmc.exe and a list of filter drivers appears. Filter drivers are often the cause of blue screens. Not sure how to use this command beyond that.
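As far as I can tell, running fltmc with no arguments from an elevated prompt just lists the loaded filter drivers. If you save that listing from each host, a few lines of script can pull out the driver names so you can compare hosts. A minimal sketch: the sample listing below is hypothetical and only mimics the column layout I saw (name, instances, altitude, frame).

```python
def filter_names(fltmc_output):
    """Return the filter driver names from a saved fltmc listing."""
    names = []
    for line in fltmc_output.splitlines():
        parts = line.split()
        if not parts:
            continue  # blank line
        if line.lstrip().startswith("Filter Name"):
            continue  # column header
        if set(parts[0]) == {"-"}:
            continue  # dashed separator row under the header
        names.append(parts[0])  # first column is the filter name
    return names


# Hypothetical listing for illustration, not captured from a real host.
SAMPLE = """
Filter Name                     Num Instances    Altitude    Frame
------------------------------  -------------  ------------  -----
luafv                                   1       135000         0
FileInfo                                9        45000         0
"""

names = filter_names(SAMPLE)
print(names)
```

Comparing the list from a crashing host against a healthy one is a quick way to spot a driver (antivirus, backup agent and so on) that is only loaded on the problem host.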
He noted there was a file server running on the cluster resources and that it should not be there. Only the cluster IP and the quorum should appear.
Networks should be named: Heartbeat, Public, iSCSI 1 and iSCSI 2.
You should have a dedicated NIC for live migration.
You should set up the preferred owners so that in the event of an issue VMs will migrate to the VM hosts you select; otherwise they will choose for themselves.
Network binding order should be host, then heartbeat.
Any network connections not in use should be disabled.