VMware Fault Tolerance (FT) is an emblematic feature that provides the ultimate VM protection, with no downtime for the application(s) running in the VM. You can lose the underlying host, but the VM, whose identical copy runs on another host, stays fully available. This post, VCP-DCV on vSphere 8.x Objective 1.4.5 – Identify use cases for fault tolerance, is part of a community study guide – VCP8-DCV Study Guide Page.
When a Secondary VM is called upon to replace its Primary VM because of a failure, the Secondary VM immediately takes over the Primary VM’s role with the entire state of the virtual machine preserved. Applications are already running, and data stored in memory does not need to be re-entered or reloaded. By contrast, failover provided by vSphere HA restarts the virtual machines affected by a failure, so their applications are down while the VMs reboot.
The protected virtual machine is called the Primary VM. The duplicate virtual machine, the Secondary VM, is created and runs on another host. The primary VM is continuously replicated to the secondary VM so that the secondary VM can take over at any point, thereby providing Fault Tolerant protection.
The Primary and Secondary VMs continuously monitor the status of one another to ensure that Fault Tolerance is maintained. A transparent failover occurs if the host running the Primary VM fails, or encounters an uncorrectable hardware error in the memory of the Primary VM, in which case the Secondary VM is immediately activated to replace the Primary VM. A new Secondary VM is started and Fault Tolerance redundancy is reestablished automatically. If the host running the Secondary VM fails, it is also immediately replaced. In either case, users experience no interruption in service and no loss of data.
A fault tolerant virtual machine and its secondary copy are not allowed to run on the same host. This restriction ensures that a host failure cannot result in the loss of both VMs.
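To make the failover and placement behaviour described above concrete, here is a minimal Python model. It is only a sketch under stated assumptions: `FtPair`, `pick_secondary_host` and `handle_host_failure` are hypothetical names, not part of any vSphere API, and real placement also weighs host compatibility and load. It shows the two rules from the text: the surviving copy keeps running with its state intact, and a new Secondary is always placed on a different host than the Primary.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FtPair:
    """Hypothetical model of an FT-protected VM: one Primary and one Secondary."""
    vm: str
    primary_host: str
    secondary_host: Optional[str]  # None while redundancy is being re-established


def pick_secondary_host(primary_host: str, candidates: List[str]) -> Optional[str]:
    """Pick a host for the Secondary VM that is NOT the Primary's host."""
    for host in candidates:
        if host != primary_host:  # anti-affinity: never share a host
            return host
    return None  # no compatible host, so the VM runs without redundancy


def handle_host_failure(pair: FtPair, failed_host: str,
                        healthy_hosts: List[str]) -> FtPair:
    """Model an FT pair's reaction when one of its hosts fails."""
    if failed_host == pair.primary_host:
        if pair.secondary_host is None:
            raise RuntimeError(f"{pair.vm} had no Secondary; FT cannot help")
        # The Secondary takes over with the full VM state already in memory.
        survivor = pair.secondary_host
    elif failed_host == pair.secondary_host:
        survivor = pair.primary_host  # the Primary keeps running untouched
    else:
        return pair  # this failure does not involve the VM
    # Start a new Secondary on another healthy host to restore redundancy.
    remaining = [h for h in healthy_hosts if h != failed_host]
    return FtPair(pair.vm, survivor, pick_secondary_host(survivor, remaining))
```

For example, if the host running the Primary fails, the old Secondary becomes the Primary and a new Secondary lands on a third host. The VM never runs both copies on one host.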
Use Cases for FT
- Always-on apps – Applications that must always be available, especially applications with long-lasting client connections that users want to keep during a hardware failure.
- Custom Apps – Custom applications that have no other way of doing clustering.
- Too complicated solutions – Cases where high availability might be provided through custom clustering solutions, which are too complicated to configure and maintain.
- On-Demand FT – Another key use case for protecting a virtual machine with Fault Tolerance can be described as On-Demand Fault Tolerance. In this case, a virtual machine is adequately protected with vSphere HA during normal operation. During certain critical periods, you might want to enhance the protection of the virtual machine. For example, you might be running a quarter-end report which, if interrupted, might delay the availability of critical information. With vSphere Fault Tolerance, you can protect this virtual machine before running this report and then turn off or suspend Fault Tolerance after the report has been produced. You can use On-Demand Fault Tolerance to protect the virtual machine during a critical time period and return the resources to normal during non-critical operation.
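The On-Demand FT pattern is essentially "turn protection on, run the critical job, always turn it off again". The minimal Python sketch below captures that pattern with a context manager. The `turn_on_ft` and `turn_off_ft` callbacks are hypothetical placeholders for however you actually toggle FT (vSphere Client, PowerCLI or the vSphere API). They are not real library functions.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def on_demand_ft(vm: str,
                 turn_on_ft: Callable[[str], None],
                 turn_off_ft: Callable[[str], None]) -> Iterator[None]:
    """Protect `vm` with FT only for the duration of a critical job."""
    turn_on_ft(vm)       # enhance protection before the critical period starts
    try:
        yield            # run the critical job, e.g. a quarter-end report
    finally:
        turn_off_ft(vm)  # return resources to normal, even if the job fails
```

The `try`/`finally` ensures FT is turned off again even when the report job raises an error, so resources are not left reserved after the critical window.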
Limits of FT
In a cluster configured to use Fault Tolerance, two limits are enforced independently.
das.maxftvmsperhost
The maximum number of fault tolerant VMs allowed on a host in the cluster. The default value is 4. There is no hard maximum for FT VMs per host; you can use larger values if the workload performs well in FT VMs. You can deactivate this check by setting the value to 0.
das.maxftvcpusperhost
The maximum number of vCPUs aggregated across all fault tolerant VMs on a host. The default value is 8. There is no hard maximum for FT vCPUs per host; you can use larger values if the workload performs well. You can deactivate this check by setting the value to 0.
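The following minimal Python sketch shows how the two independent limits combine for a single host. It assumes the defaults above (4 VMs, 8 vCPUs) and that 0 switches a check off. The function name is hypothetical, and it simply counts every FT VM placed on the host, Primary or Secondary. It is an illustration of the rule, not how vSphere HA implements it internally.

```python
from typing import List

DEFAULT_MAX_FT_VMS_PER_HOST = 4    # das.maxftvmsperhost default
DEFAULT_MAX_FT_VCPUS_PER_HOST = 8  # das.maxftvcpusperhost default


def ft_limits_ok(ft_vm_vcpus: List[int],
                 max_vms: int = DEFAULT_MAX_FT_VMS_PER_HOST,
                 max_vcpus: int = DEFAULT_MAX_FT_VCPUS_PER_HOST) -> bool:
    """Check one host's FT VMs (vCPU count per VM) against both limits."""
    # A limit of 0 means the corresponding check is deactivated.
    vm_count_ok = max_vms == 0 or len(ft_vm_vcpus) <= max_vms
    vcpu_total_ok = max_vcpus == 0 or sum(ft_vm_vcpus) <= max_vcpus
    # The limits are enforced independently, so both must pass.
    return vm_count_ok and vcpu_total_ok
```

For example, three FT VMs with 4, 4 and 1 vCPUs stay under the VM-count limit but exceed the 8-vCPU aggregate, so the host fails the check.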
Licensing
The number of vCPUs supported by a single fault tolerant VM is limited by the level of licensing that you have purchased for vSphere. Fault Tolerance is supported as follows:
- vSphere Standard and Enterprise – Allows up to 2 vCPUs
- vSphere Enterprise Plus – Allows up to 8 vCPUs
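The licensing rule above reduces to a lookup table. Here is a minimal Python sketch of such an eligibility check. The edition names and limits come from the list above, while the function name is hypothetical and unknown editions are simply rejected.

```python
from typing import Dict

# Maximum vCPUs per fault tolerant VM, by vSphere edition (from the list above).
FT_MAX_VCPUS_BY_EDITION: Dict[str, int] = {
    "Standard": 2,
    "Enterprise": 2,
    "Enterprise Plus": 8,
}


def can_enable_ft(vm_vcpus: int, edition: str) -> bool:
    """Return True if a VM with `vm_vcpus` vCPUs fits the edition's FT limit."""
    limit = FT_MAX_VCPUS_BY_EDITION.get(edition)
    if limit is None:
        raise ValueError(f"unknown edition or no FT support: {edition}")
    return vm_vcpus <= limit
```

So a 4-vCPU VM can be protected on Enterprise Plus but not on Standard or Enterprise.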
vSphere FT requires a 10-Gbit network between the ESXi hosts in the cluster; a dedicated 10-Gbit network used exclusively for FT is recommended. vSphere FT supports up to 8 vCPUs on a single VM, so vCenter Server instances of size Large or greater (16 or more vCPUs) cannot be protected with vSphere FT.
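A short worked check shows where the vCenter cut-off comes from. The vCPU counts per appliance size below are an assumption taken from the vCenter Server Appliance sizing table for vSphere 7/8 as I understand it, so double-check them against your release. The 8-vCPU FT limit comes from the paragraph above.

```python
from typing import Dict, List

FT_MAX_VCPUS = 8  # upper FT limit per VM (Enterprise Plus)

# vCenter Server Appliance deployment sizes and vCPU counts
# (assumed from the vSphere 7/8 sizing table; verify for your version).
VCSA_VCPUS: Dict[str, int] = {
    "Tiny": 2,
    "Small": 4,
    "Medium": 8,
    "Large": 16,
    "X-Large": 24,
}


def ft_protectable_vcsa_sizes() -> List[str]:
    """Return the appliance sizes whose vCPU count fits under the FT limit."""
    return [size for size, vcpus in VCSA_VCPUS.items() if vcpus <= FT_MAX_VCPUS]
```

Only Tiny, Small and Medium fit. Large (16 vCPUs) and X-Large (24 vCPUs) exceed the 8-vCPU limit.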
Unsupported Features of FT
I thought it would be useful to list the features that are not supported with VMware FT.
- Snapshots – Snapshots must be removed or committed before Fault Tolerance can be enabled on a virtual machine. In addition, it is not possible to take snapshots of virtual machines on which Fault Tolerance is enabled.
Note: Disk-only snapshots created for vStorage APIs – Data Protection (VADP) backups are supported with Fault Tolerance. However, legacy FT does not support VADP.
- Storage vMotion – You cannot invoke Storage vMotion for virtual machines with Fault Tolerance turned on. To migrate the storage, you should temporarily turn off Fault Tolerance, and perform the storage vMotion action. When this is complete, you can turn Fault Tolerance back on.
- Linked clones – You cannot use Fault Tolerance on a virtual machine that is a linked clone, nor can you create a linked clone from an FT-enabled virtual machine.
- Virtual Volume datastores.
- Storage-based policy management – Storage policies are supported for vSAN storage.
- I/O filters.
- TPM.
- VBS-enabled VMs.
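Before turning on FT, it can help to run a quick pre-flight check against the list above. The minimal Python sketch below does that against a hypothetical `VmInfo` record. It is not a vSphere API object, and in practice you would fill it from your inventory tooling. For simplicity it treats any snapshot as a blocker and does not model the disk-only VADP snapshot exception mentioned in the note.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class VmInfo:
    """Hypothetical summary of a VM's configuration (not a vSphere API object)."""
    name: str
    snapshot_count: int = 0
    is_linked_clone: bool = False
    on_vvol_datastore: bool = False
    has_io_filters: bool = False
    has_vtpm: bool = False
    vbs_enabled: bool = False


def ft_blockers(vm: VmInfo) -> List[str]:
    """List the unsupported features that prevent FT from being turned on."""
    checks = [
        (vm.snapshot_count > 0, "remove or commit snapshots first"),
        (vm.is_linked_clone, "linked clones cannot use FT"),
        (vm.on_vvol_datastore, "Virtual Volume datastores are not supported"),
        (vm.has_io_filters, "I/O filters are not supported"),
        (vm.has_vtpm, "TPM is not supported"),
        (vm.vbs_enabled, "VBS-enabled VMs are not supported"),
    ]
    return [reason for blocked, reason in checks if blocked]
```

An empty list means none of the listed blockers apply. It does not guarantee that FT will turn on, because host compatibility and network requirements still matter.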
Some Fault Tolerance configuration and failover issues
The following are common FT problems you may need to troubleshoot.
- Hardware Virtualization Not Enabled – You must enable Hardware Virtualization (HV) before you use vSphere Fault Tolerance.
- Compatible Hosts Not Available for Secondary VM – If you power on a virtual machine with Fault Tolerance enabled and no compatible hosts are available for its Secondary VM, you might receive an error message.
- Secondary VM on Overcommitted Host Degrades Performance of Primary VM – If a Primary VM appears to be executing slowly, even though its host is lightly loaded and retains idle CPU time, check the host where the Secondary VM is running to see if it is heavily loaded.
- Increased Network Latency Observed in FT Virtual Machines – If your FT network is not optimally configured, you might experience latency problems with the FT VMs.
- Some Hosts Are Overloaded with FT Virtual Machines – You might encounter performance problems if your cluster’s hosts have an imbalanced distribution of FT VMs.
- Losing Access to FT Metadata Datastore – Access to the Fault Tolerance metadata datastore is essential for the proper functioning of an FT VM. Loss of this access can cause a variety of problems.
- Turning On vSphere FT for Powered-On VM Fails – If you try to turn on vSphere Fault Tolerance for a powered-on VM, this operation can fail.
- FT Virtual Machines not Placed or Evacuated by vSphere DRS – FT virtual machines in a cluster that is enabled with vSphere DRS do not function correctly if Enhanced vMotion Compatibility (EVC) is currently disabled.
- Fault-Tolerant Virtual Machine Failovers – A Primary or Secondary VM can fail over even though its ESXi host has not crashed. In such cases, virtual machine execution is not interrupted, but redundancy is temporarily lost. To prevent this type of failover, learn the situations in which it can occur and take steps to avoid them.
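Several of these issues, in particular an overloaded Secondary host and an imbalanced distribution of FT VMs, come down to how FT VMs are spread across hosts. The minimal Python sketch below flags hosts that carry noticeably more FT VMs than the cluster average. The input mapping (host name to number of Primary plus Secondary FT VMs) and the `tolerance` threshold are hypothetical choices for illustration, not a VMware-defined metric.

```python
from typing import Dict, List


def overloaded_ft_hosts(ft_vms_per_host: Dict[str, int],
                        tolerance: int = 1) -> List[str]:
    """Return hosts carrying more FT VMs than the cluster average + tolerance."""
    if not ft_vms_per_host:
        return []
    average = sum(ft_vms_per_host.values()) / len(ft_vms_per_host)
    return sorted(h for h, n in ft_vms_per_host.items() if n > average + tolerance)
```

With 6, 1 and 2 FT VMs on three hosts, the average is 3. Only the first host exceeds 3 + 1, so it is the one to rebalance or to check for a slow Secondary.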
Find other chapters on the main page of the guide – VCP8-DCV Study Guide Page.
Goran says
“Note: Disk-only snapshots created for vStorage APIs – Data Protection (VADP) backups are supported with Fault Tolerance. However, legacy FT does not support VADP.”
What does it mean!?
Vladan SEGET says
It means that you can’t back up some older (legacy) FT-protected VMs with Veeam or other traditional backup software, only via storage snapshots of your array.
https://kb.vmware.com/s/article/1016619
Hope that helps
Goran says
It is not so clear.
According to the Veeam KB, it is possible: https://www.veeam.com/kb1178
Also, one Veeam customer opened a ticket, and here is the answer:
https://forums.veeam.com/vmware-vsphere-f24/veeam-and-ft-t54670.html
The VMware KB states that snapshots are not supported, but the Veeam KB claims that they are?!
Vladan SEGET says
It seems that VMware’s KB has not been updated for vSphere 8. It states that “For vSphere 6.5 & 6.7: Disk-only snapshots created for vStorage APIs – Data Protection (VADP) backups are supported with Fault Tolerance”, so in my understanding this applies from 6.5-6.7 onward, and it is supported in later versions too (including vSphere 8).
And Veeam’s KB is right about that. So yes, to me it should be supported now. I’ll update the post with a quick note.
Thanks for pointing this out.