A fundamental feature of StarWind Virtual SAN is High Availability (HA), which keeps data in sync between hosts. In this post, we'll talk about what StarWind HA is and how it works. StarWind HA needs only two hosts to stay in sync, and they can be connected directly without a switch (in supported scenarios), which makes it a perfect solution for ROBO.
Coupled with the fact that StarWind for vSphere now runs on Linux, you're saving not only on physical hardware by eliminating shared storage, but also on OS licenses. With minimal hardware, you can build a resilient storage solution protected by StarWind HA.
What Is StarWind HA and How Does It Work?
StarWind HA is basically a function that, during the configuration phase, creates an HA device on the first node and then on the second node. (Note: for StarWind Free, you can follow this QuickStart Guide at StarWind – Creating HA Device with StarWind Virtual SAN Free.)
StarWind HA relies on redundant network links between the StarWind hosts to maintain storage resilience. Those links are used for monitoring and for detecting failures. If any of the nodes fails or stops processing requests properly, failover is instantly initiated from the client OS/hypervisor side. StarWind also has an internal heartbeat mechanism, which ensures proper storage path isolation in the event of synchronization network failures and prevents so-called storage “split-brain”.
StarWind also offers another mechanism to avoid “split-brain”: the Node Majority failover strategy, which is used when Heartbeat is not available. (However, for 2-node scenarios, you'll need a third node acting as a witness.)
Here is the overview screen from the replication wizard.
Quote From StarWind:
Heartbeat – The Heartbeat failover strategy allows avoiding the “split-brain” scenario when the HA cluster nodes are unable to synchronize but continue to accept write commands from the initiators independently. It can occur when all synchronization and heartbeat channels disconnect simultaneously, and the partner nodes do not respond to the node’s requests. As a result, StarWind service assumes the partner nodes to be offline and continues operations on a single-node mode using data written to it.
If at least one heartbeat link is online, StarWind services can communicate with each other via this link. The device with the lowest priority will be marked as not synchronized and get subsequently blocked for the further read and write operations until the synchronization channel resumption. At the same time, the partner device on the synchronized node flushes data from the cache to the disk to preserve data integrity in case the node goes down unexpectedly. It is recommended to assign more independent heartbeat channels during the replica creation to improve system stability and avoid the “split-brain” issue. With the heartbeat failover strategy, the storage cluster will continue working with only one StarWind node available.
Node Majority – The Node Majority failover strategy ensures the synchronization connection without any additional heartbeat links. The failure-handling process occurs when the node has detected the absence of the connection with the partner. The main requirement for keeping the node operational is an active connection with more than half of the HA device’s nodes. Calculation of the available partners is based on their “votes”.
In case of a two-node HA storage, all nodes will be disconnected if there is a problem on the node itself, or in communication between them. Therefore, the Node Majority failover strategy requires the addition of the third Witness node which participates in the nodes count for the majority, but neither contains data on it nor is involved in processing clients’ requests. In case an HA device is replicated between 3 nodes, no Witness node is required. With Node Majority failover strategy, failure of only one node can be tolerated. If two nodes fail, the third node will also become unavailable to clients’ requests.
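The Node Majority rule quoted above can be illustrated with a few lines of Python. This is only a sketch of the vote arithmetic, not StarWind's implementation: the `Node` class, the `has_majority` function, and the assumptions that every node (including the witness) carries exactly one vote and that a node counts its own vote are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Node:
    name: str
    votes: int = 1            # assumption: one vote per node, witness included
    is_witness: bool = False  # a witness votes but stores no data


def has_majority(local: Node, reachable: Iterable[Node], all_nodes: Iterable[Node]) -> bool:
    """True if the local node plus the partners it can reach hold more than half of all votes."""
    total = sum(n.votes for n in all_nodes)
    visible = local.votes + sum(n.votes for n in reachable)
    return visible * 2 > total


node_a = Node("node-a")
node_b = Node("node-b")
witness = Node("witness", is_witness=True)

# Two data nodes plus a witness: node-a loses node-b but still sees the witness.
survives_with_witness = has_majority(node_a, [witness], [node_a, node_b, witness])

# Same cluster, but node-a is fully isolated: it must stop serving clients.
survives_isolated = has_majority(node_a, [], [node_a, node_b, witness])

# Two nodes and no witness: a lone node can never hold more than half of the votes.
survives_without_witness = has_majority(node_a, [], [node_a, node_b])
```

Under these assumptions, a node stays available only in the first case. That matches the quote: a two-node setup needs a witness, and only a single node failure can be tolerated.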
What Happens when One Node fails?
If a node stops responding, which can happen for various reasons (hardware failure, power failure, etc.), the failover is immediate. Depending on the replication strategy you have configured, the running VMs placed on the remaining host keep running.
Note: VMs that “died” with the failed host can be manually re-registered on the remaining host and powered back on if you do not have VMware HA to power them on automatically.
If you want automatic VM restart, then you'd need vSphere Essentials Plus instead of vSphere Essentials.
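To make the Heartbeat failover behaviour described in the StarWind quote more concrete, here is a small decision sketch from the point of view of one storage node. It is a simplified model, not StarWind code: the `Action` names, the `heartbeat_decision` function, and the convention that a larger number means a higher priority (with distinct priorities on the two nodes) are assumptions made for illustration.

```python
from enum import Enum


class Action(Enum):
    KEEP_SERVING = "keep serving I/O"
    BLOCK_UNTIL_RESYNC = "mark as not synchronized and block I/O"
    SINGLE_NODE_MODE = "assume partner is offline and continue alone"


def heartbeat_decision(sync_up: bool, heartbeat_up: bool,
                       local_priority: int, partner_priority: int) -> Action:
    """Decide what one node does, given the state of its links to the partner."""
    if sync_up:
        # Synchronization channel works: nothing to fail over.
        return Action.KEEP_SERVING
    if heartbeat_up:
        # Partner is alive but cannot be synchronized with:
        # the lower-priority device steps back until the sync channel returns.
        if local_priority < partner_priority:
            return Action.BLOCK_UNTIL_RESYNC
        return Action.KEEP_SERVING
    # No link at all: the partner is assumed to be down.
    return Action.SINGLE_NODE_MODE


# Sync link lost, heartbeat link still up (node A has the higher priority).
node_a_action = heartbeat_decision(False, True, local_priority=2, partner_priority=1)
node_b_action = heartbeat_decision(False, True, local_priority=1, partner_priority=2)

# Every link lost at once: each node independently decides to carry on alone.
isolated_action = heartbeat_decision(False, False, local_priority=2, partner_priority=1)
```

The last case shows why more independent heartbeat channels are recommended: if both nodes are actually alive but every link between them is down, each one reaches `SINGLE_NODE_MODE`, which is exactly the “split-brain” situation.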
StarWind HA Architecture
From the architecture standpoint, it is best to have a look at this image, which shows the individual physical NIC layout that is crucial for HA to be reliable.
StarWind VSAN Free comes with free PowerShell scripts and supports both hyper-converged and “compute and storage separated” configurations. It is fully functional and may be used in production.
StarWind HA uses redundant network links between the StarWind hosts to ensure storage resilience. This way, you get a fully fault-tolerant storage cluster with just two hosts.
It completely eliminates the need for physical shared storage, since StarWind Virtual SAN mirrors the internal resources (internal disks, RAM or SSDs) between the servers. Once StarWind iSCSI targets are connected to all cluster nodes, the HA devices are treated as local storage by both hypervisors and clustered applications. Fault tolerance is achieved by providing multipath access to all storage nodes.
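The idea behind multipath access can be sketched as follows. This is a deliberately simplified, failover-only model of what an initiator does; real hypervisors use their own multipathing policies, and the `pick_path` function, the path labels, and the health-check callback are hypothetical names used only for this example.

```python
from typing import Callable, Optional, Sequence


def pick_path(paths: Sequence[str], is_alive: Callable[[str], bool]) -> Optional[str]:
    """Return the first path whose storage node still answers, or None if all are down."""
    for path in paths:
        if is_alive(path):
            return path
    return None


# Hypothetical labels for the iSCSI paths to the two StarWind nodes.
paths = ["path-to-node-a", "path-to-node-b"]

# Simulated health state: node A has just failed, node B is healthy.
health = {"path-to-node-a": False, "path-to-node-b": True}

active_path = pick_path(paths, lambda p: health[p])
```

Because the same HA device is reachable through both nodes, the loss of one path leaves the datastore reachable through the other one.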
Final words
StarWind Virtual SAN eliminates the single point of failure (SPOF) for storage in virtualized infrastructure by using duplication (or triplication) of data, caches and I/O controllers. All resources are “mirrored” between different physical hosts.
The shared storage becomes fault-tolerant (because it is backed by two or more hosts) and provides high availability with higher performance at a low cost.
More posts about StarWind on ESX Virtualization:
- Free StarWind iSCSI accelerator download
- VMware ESXi Free and StarWind – Two node setup for remote offices
- VMware vSphere and HyperConverged 2-Node Scenario from StarWind – Step By Step
- StarWind Storage Gateway for Wasabi Released
- How To Create NVMe-Of Target With StarWind VSAN
- Veeam 3-2-1 Backup Rule Now With Starwind VTL
- StarWind and Highly Available NFS
- StarWind VVOLS Support and details of integration with VMware vSphere
- StarWind VSAN on 3 ESXi Nodes detailed setup
- VMware VSAN Ready Nodes in StarWind HyperConverged Appliance