Another new feature in vSphere 5 is the way the HA process is handled.
The AAM agent known from vSphere 4.1 is gone. Instead, a new agent has been introduced: FDM (Fault Domain Manager). The Primary/Secondary concept with 5 primary nodes known from vSphere 4 is gone too. You no longer need to worry about losing all 5 primary nodes at the same time … and with them the HA functionality for the rest of the cluster. Now only one agent in the cluster plays the role of Master: the FDM agent on one host takes that role, while the FDM agents on the other hosts act only as Slaves and can become Master if the Master fails.
The Master monitors the availability of the ESXi 5 hosts and also the availability of the VMs. The master agent monitors all slave hosts, and if a slave host fails, all VMs on that host are restarted on another host. On each individual host the state of every protected VM is monitored, and if a protected VM fails, the Master restarts it. The FDM Master keeps a list of protected VMs, which is updated after every user-initiated power on or power off. The FDM Master also keeps track of all hosts that are members of the cluster; adding or removing a host refreshes this list as well.
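The bookkeeping described above can be modelled with a minimal sketch. FdmMaster and its methods are hypothetical names invented for illustration; the real agent tracks far more state, and the round-robin restart placement is an assumption of this sketch, not how HA actually places VMs.

```python
from dataclasses import dataclass, field


@dataclass
class FdmMaster:
    """Hypothetical model of the two lists an FDM master maintains."""

    hosts: set = field(default_factory=set)        # hosts that are cluster members
    protected: dict = field(default_factory=dict)  # protected VM name -> host running it

    def add_host(self, host: str) -> None:
        self.hosts.add(host)  # adding a host refreshes the membership list

    def remove_host(self, host: str) -> None:
        self.hosts.discard(host)  # so does removing one

    def user_power_on(self, vm: str, host: str) -> None:
        # A user-initiated power-on puts the VM on the protected list.
        self.protected[vm] = host

    def user_power_off(self, vm: str) -> None:
        # A user-initiated power-off takes it off, so HA will not restart it.
        self.protected.pop(vm, None)

    def host_failed(self, failed: str) -> dict:
        """Restart the failed host's protected VMs on the surviving members."""
        survivors = sorted(h for h in self.hosts if h != failed)
        if not survivors:
            return {}
        victims = sorted(vm for vm, h in self.protected.items() if h == failed)
        restarted = {}
        for i, vm in enumerate(victims):
            # Round-robin placement is an assumption of this sketch only.
            target = survivors[i % len(survivors)]
            self.protected[vm] = target
            restarted[vm] = target
        return restarted
```

Note that a VM the user powered off is not restarted when its host fails, because the power-off removed it from the protected list.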
Now you might be thinking: what if the Master fails? In that case there is a re-election process (this was not the case in vSphere 4), and the host that has access to the greatest number of datastores is elected as the new Master. Why that criterion? Because the secondary communication channel runs through the datastores. There are other considerations for a Slave to be elected as Master as well.
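The election criterion can be sketched in a few lines of Python. This is an illustration under assumptions, not FDM's code: elect_master and its input format are hypothetical, and the tie-breaker (lexically highest host name) is only a stand-in for the "other considerations" mentioned above.

```python
def elect_master(datastores_by_host: dict) -> str:
    """Pick the host that sees the most datastores.

    datastores_by_host maps a host name to the set of datastores it can access.
    Ties are broken by the lexically highest host name; that tie-breaker is an
    assumption of this sketch, not a documented HA rule.
    """
    if not datastores_by_host:
        raise ValueError("no hosts take part in the election")
    return max(
        datastores_by_host,
        key=lambda host: (len(datastores_by_host[host]), host),
    )
```

For example, with esx1 seeing two datastores, esx2 seeing three and esx3 seeing one, the sketch elects esx2.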
Slave hosts maintain a direct, encrypted point-to-point TCP connection with the Master (no broadcasts). The election itself is done via UDP; once it is over, communication between the Master and the Slaves again runs only over SSL-encrypted TCP.
The host with the Master role periodically reports state to vCenter, and the Slaves are informed that the Master is alive via heartbeats. The Slaves monitor the state of their locally running VMs and transmit any changes to the Master. The Slaves also send heartbeats to the Master, and if the Master fails, the re-election process takes place. vCenter knows that a new Master has been elected because the new Master contacts vCenter once the re-election is finished.
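The heartbeat exchange can be illustrated with a minimal sketch of failure detection by timeout. HeartbeatMonitor, slave_should_start_election and the 10-second timeout are hypothetical illustrations, not FDM internals or HA settings; the real agent's timers and message formats are not covered in this post. The clock is injectable so the behaviour can be tested without waiting.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat seen from each peer (hypothetical sketch).

    The timeout value is an assumption for illustration, not an HA setting.
    """

    def __init__(self, timeout_s: float = 10.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen = {}

    def beat(self, peer: str) -> None:
        # Record a heartbeat received from `peer` at the current time.
        self.last_seen[peer] = self.clock()

    def silent_peers(self) -> list:
        # Peers whose last heartbeat is older than the timeout.
        now = self.clock()
        return sorted(p for p, t in self.last_seen.items() if now - t > self.timeout_s)


def slave_should_start_election(monitor: HeartbeatMonitor, master_id: str) -> bool:
    # A slave that stops hearing the master triggers a re-election.
    return master_id in monitor.silent_peers()
```

A slave would keep one monitor for the Master's heartbeats and start an election once slave_should_start_election() returns True; the Master could use the same bookkeeping, one entry per Slave, to spot hosts that have gone quiet.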
The secondary channel through datastores is known as Heartbeat Datastores. This secondary channel is not used in normal situations, only when the primary network goes down. It lets the Master stay aware of all Slave hosts and of the VMs running on them. The heartbeat datastores also let HA determine whether a host has become isolated or network-partitioned, that is, whether the host has actually failed (PSOD) or is just isolated.
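The decision above can be made concrete with a minimal sketch of how a Master could combine the two channels. It is a simplification under stated assumptions: classify_host and the HostState values are hypothetical, each channel is reduced to a single boolean, and the sketch does not try to tell isolation and partition apart, which needs more information than the post describes.

```python
from enum import Enum


class HostState(Enum):
    ALIVE = "alive"
    ISOLATED_OR_PARTITIONED = "isolated or partitioned"
    FAILED = "failed"


def classify_host(network_heartbeat_ok: bool, datastore_heartbeat_ok: bool) -> HostState:
    """Hypothetical classification of a slave from the Master's point of view."""
    if network_heartbeat_ok:
        # The primary (network) channel is working: nothing to decide.
        return HostState.ALIVE
    if datastore_heartbeat_ok:
        # Still updating its heartbeat datastore: the host runs but is cut off
        # from the network, so its VMs should not be restarted elsewhere.
        return HostState.ISOLATED_OR_PARTITIONED
    # Silent on both channels: treated as failed (e.g. PSOD), VMs get restarted.
    return HostState.FAILED
```

This is why the datastore channel matters: without it, an isolated host and a crashed host look identical to the Master.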
And as I read elsewhere, to configure HA you'll need at least 2 shared datastores…
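The 2-datastore requirement can be expressed as a simple check. pick_heartbeat_datastores is a hypothetical helper; treating "shared" as "mounted by every host" and picking datastores alphabetically are assumptions of this sketch, not how vCenter actually chooses heartbeat datastores.

```python
def pick_heartbeat_datastores(datastores_by_host: dict, count: int = 2) -> list:
    """Return `count` datastores visible to every host, or raise if too few.

    Hypothetical helper: the "shared by all hosts" rule and the alphabetical
    choice are assumptions of this sketch, not vCenter's selection algorithm.
    """
    if not datastores_by_host:
        raise ValueError("cluster has no hosts")
    # Datastores mounted by every host in the cluster.
    shared = set.intersection(*(set(ds) for ds in datastores_by_host.values()))
    if len(shared) < count:
        raise ValueError(f"only {len(shared)} shared datastore(s); need {count}")
    return sorted(shared)[:count]
```

Local datastores drop out of the intersection, so two hosts that each have only one shared datastore plus local storage fail the check.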
A quick quote from Chad Sakac's blog:
The other major change is the use of BOTH networking AND storage as a mode for communication and maintaining state. This is likely the first thing people will see that has them saying “huh?” – during the vSphere beta, it did for me, as you need to have 2 shared datastores to configure VM HA – and the first time I saw that I knew something had changed.
One more thing: HA no longer uses DNS, which means there is no dependency on DNS or hosts files.
Update: A quick quote from the Uptime blog concerning DNS:
Ever had DNS resolution cause you issues when using vSphere HA? With 5.0, all dependency on DNS for vSphere HA has been removed!
Source: SlideShare presentation by Eric Sloof, The Master. -:)
hari says
Nice article
mhdganji says
Hi,
sorry, but it is not true.
I spent 2 days figuring out a problem with the HA agent installation.
It only said "unknown error",
and the whole problem was because of DNS,
so VMware is lying:
DNS is still a must in ESXi 5.
Vladan SEGET says
Marketing? Yes, I know, DNS is a must… Without DNS there is no name resolution… But it’s not only for ESXi, it’s for the whole local network.
Best
Vladan
mhdganji says
Hi Vladan,
I know that,
and my networks have always had DNS, from 2000 until now,
but the guys at VMware say the HA agent does not depend on it.
I had a problem:
3 hosts, and one did not join the cluster.
The error was ridiculous! It just said “unknown installer error”.
I spent 2 days and finally found that the hosts had to be added to vCenter by name (I had added them by IP and didn't think that would be a problem), but it was.
This is what I meant.
Thanks
Vladan SEGET says
I see… Well, that’s why I always add ESXi hosts to vCenter via FQDN… Just because it looks sexier… -:)
Sometimes we all learn the hard way…
best
Vladan
Chipsnt says
Hi mhdganji,
I’m also getting the same error, and I have tried adding the hosts using FQDN to no avail. One host joins the HA cluster without a problem, but the other one pops up the “unknown installer error”.
saurabh says
Dear Sir, I have a quick question and would appreciate it if you could reply to it by email. What will happen if vCenter is the appliance, running as a VM on the host that is also the master host, and then the master host fails? In that case the only host that runs both vCenter and the master FDM agent has failed. I know the re-election would occur; however, the new master would not be able to contact vCenter until the vCenter VM is restarted on some other host. Also, suppose that at that point in time the user powers on some machines that are supposed to be protected. The list maintained by the previous master does not contain these VMs that are now being powered on. What happens in this scenario? I know this is a very rare scenario, but I would be grateful if you could shed some light on it, as it would help me understand the concept better.
mhdganji says
Excuse me, Vladan! Correct me if I am wrong!
As far as I know:
Nothing happens! The FDM agents take care of the re-election. Be aware that the HA agents on the hosts do not depend on vCenter to do their jobs.
saurabh says
Thanks for your response. However, I wanted to know the actual process flow, how it really happens, not just that it happens. Also, kindly let me know the answer to the second part as well. Thanks in advance.
mhdganji says
If you want to dive deep into that, I recommend this book:
VMware vSphere 5 Clustering Technical Deepdive
http://www.amazon.com/vSphere-Clustering-Technical-Deepdive-ebook/dp/B005C1SARM
You can also find a lot here:
http://www.yellow-bricks.com/vmware-high-availability-deepdiv/