Multiple vCPU Fault Tolerance on vSphere 6.0

It’s been a long wait, but it is finally over: VMware’s Fault Tolerance (FT) now officially supports multiple vCPUs per guest. This feature provides zero downtime, zero data loss, and continuous availability for any application.

A little background on the history of FT with VMware.

VMware officially introduced this feature in vSphere 4.0 back in 2009, and it was instantly hailed as a breakthrough in zero downtime for guest operating systems and applications. This, however, came at a price: only a limited number of VMs could be protected per host, and the overhead on the network and hosts was demanding.

I think back to when this feature became available and remember thinking to myself – wow, there are going to be a lot of business lines that will want in on that magic. I was always sad when I had to explain the memory and vCPU limitations, and even those that did meet the guidelines were turned away once we exceeded the number of protected guests per cluster.

The number of virtual machines that can be protected is now based on how many vCPUs are protected per host. The maximum is 4 FT-protected guests or 8 FT-protected vCPUs per host, whichever comes first; these counts include both primary and secondary virtual machines and their vCPUs. There is some overhead involved, depending on the workload and the number of FT-protected virtual machines – generally a 10–30% increase, primarily on the network, with a minimal CPU hit on each cluster node.
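The per-host limits above can be sketched as a simple admission check. This is a hypothetical helper for illustration only – `can_protect` is not a VMware API – but the constants come straight from the limits stated above:

```python
# Per-host SMP-FT limits described above (counting both primary and
# secondary VMs/vCPUs on the host). Illustrative only, not a VMware API.
MAX_FT_VMS_PER_HOST = 4
MAX_FT_VCPUS_PER_HOST = 8

def can_protect(existing_ft_vcpus, new_vm_vcpus):
    """existing_ft_vcpus: vCPU counts of FT VMs already on this host.
    Returns True if one more FT VM with new_vm_vcpus vCPUs still fits."""
    if len(existing_ft_vcpus) + 1 > MAX_FT_VMS_PER_HOST:
        return False  # would exceed the 4-VM-per-host cap
    return sum(existing_ft_vcpus) + new_vm_vcpus <= MAX_FT_VCPUS_PER_HOST
```

For example, a host already protecting one 4-vCPU VM can still take a second 4-vCPU FT VM (8 vCPUs total), but adding a 4-vCPU VM to a host already carrying 6 FT vCPUs would breach the vCPU cap, and a fifth FT VM is rejected regardless of its size.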

Why it’s been stuck at one vCPU for a while now

The limitation on how many vCPUs could be protected lay in the lockstep mechanism used to keep the dormant node up to date and ready for an immediate takeover. This was known as the “Record-Replay” method, and it has been replaced with a new technology called “Fast Checkpointing”. The new mechanism enables multiple-vCPU protection through continuous copying/checkpointing of the virtual machine’s state to the secondary.
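To make the checkpointing idea concrete, here is a toy sketch – emphatically not VMware’s implementation – of the general pattern: the primary tracks which “pages” of state were dirtied since the last checkpoint and ships only that delta to the secondary at each interval, so the secondary always holds a consistent recent copy it could resume from:

```python
# Toy model of continuous checkpointing. The Primary/Secondary classes
# and page-based state are illustrative assumptions, not VMware's design.

class Primary:
    def __init__(self):
        self.pages = {}      # page id -> contents
        self.dirty = set()   # pages modified since the last checkpoint

    def write(self, page, value):
        self.pages[page] = value
        self.dirty.add(page)

    def checkpoint(self, secondary):
        # Ship only the pages dirtied since the last checkpoint.
        delta = {p: self.pages[p] for p in self.dirty}
        secondary.apply(delta)
        self.dirty.clear()

class Secondary:
    def __init__(self):
        self.pages = {}

    def apply(self, delta):
        self.pages.update(delta)

primary, secondary = Primary(), Secondary()
primary.write("a", 1)
primary.write("b", 2)
primary.checkpoint(secondary)   # secondary now mirrors pages a and b
primary.write("a", 3)           # only page "a" is dirtied
primary.checkpoint(secondary)   # only the delta {"a": 3} is shipped
```

The point of the sketch is why this scales to multiple vCPUs where lockstep record-replay did not: instead of deterministically replaying every instruction on the secondary, only the changed state has to cross the wire between checkpoints.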

Some of the same rules apply:

To ensure a successful protection of virtual machines, you still need to abide by some of the basic rules for vSphere Fault Tolerance.

  • You still need to ensure that all protected machines run on hosts with the same version of vSphere – in this case, version 6.0 (of course).
  • A dedicated VMkernel portgroup must be configured for FT logging.
  • 10Gb network links must be used.
  • As mentioned above, expect a 10–30% network overhead increase (depending on the number of FT-protected machines and their workload).
  • vMotion is supported for both the primary and secondary nodes of a protected VM.

Additional Protection and options at the Storage Layer

In addition to the increased processor limit, the new Fault Tolerance also deviates from the old method of a single storage point for both primary and secondary virtual machines. The new version can place the virtual machine files of the primary and secondary on different storage volumes, which further protects the machine from storage failures.

Fault Tolerance on vSphere 6.0 now supports both thick- and thin-provisioned disk types. The previous version supported only eager-zeroed thick disks.

Another great feature is the re-protect mechanism for storage volumes that run out of space: vCenter monitors the FT replica and spins up a new secondary VM on a new datastore.

Remaining Points

  • As before, svMotion is not possible for FT virtual machines running multiple vCPUs.
  • Virtual machines in vCloud Director, on VSAN/VVOLs, or using vSphere Replication are not supported with SMP-FT.
  • VADP (vStorage APIs for Data Protection) and snapshots are now supported with FT on vSphere 6.0!

Next-Generation Storage Symposium Recap

I got the chance to attend the Next-Generation Storage Symposium in San Jose yesterday, organized by Tech Field Day and hosted by Stephen Foskett. A number of vendors spoke at the event, and if I had to put a theme on it, it would center around flash-based storage. I am going to summarize what each vendor brought to the table in order to give you an overview of what was discussed.

Nexsan – A hybrid storage platform spanning SAN, NAS, and unified storage systems, including the NS model they referenced in their presentation.

Nimbus Data – The presentation revolved around their memory-based protocol storage systems; they discussed how simple the platform is and emphasized a 10-year warranty.

Permabit – This company focuses on enterprise flash arrays and cache solutions through their Albireo deduplication technology for SSDs.

Pure Storage – This is a pure flash-based storage platform that reads and writes to flash disks in a fundamentally different way to maximize performance and sustain longevity.

SolidFire – They are focused on cloud-scale computing and the performance requirements behind it; their point is that traditional storage platforms are not designed for this level of storage. SolidFire’s presentation focused on taking advantage of the unique properties of solid-state drives.

Starboard – This company fell in line with the other presenters in that they offer a hybrid storage platform focused on delivery at a certain price point. They also noted the cost savings from consolidating and mixing workloads with dynamic storage pooling.

Tegile – This presentation focused on their hybrid storage system, which utilizes spinning disk as well as flash-based storage. They also touched on their management capabilities for this technology around VDI.


Towards the end of the day, some of the Tech Field Day delegates hosted sessions in which panelists discussed a range of topics pertaining to the architecture of these next-generation storage platforms.

Flash storage overview – Comments from the moderator and panelists centered on the delivery mechanisms of each company focused on solid state as the primary tier at all levels. One of the attendees made a strong argument that some of these problems have already been addressed at a higher layer, and questioned whether these technologies keep those same issues in mind when products are delivered.

Scaling storage for the future – Comments from the panelists revolved around how large-scale environments can effectively be taken to the next level while maintaining the service levels the original deployment was designed to deliver.

Storage for the Virtual Infrastructure – Questions and statements in this panel addressed how these new storage platforms will affect how data is accessed and stored. One of the main highlights of the discussion centered on how the storage layer becomes aware of the type of I/O being passed to it. VMware’s VAAI was called out specifically, as well as other APIs coming to market.

Tech Field Day – Storage Field Day 2 Starts this Week!

Very excited that Storage Field Day 2 is upon us – this week is shaping up to be a good one!

I will be heading to beautiful San Jose / Silicon Valley this week for Storage Field Day 2 on November 8th and 9th. I will be there with 10 other delegates, and you can read more about it on the Tech Field Day page. I am really looking forward to getting into some deep technical discussions with the following companies we will be visiting on Thursday and Friday:

  • Asigra
  • Nexgen Storage
  • Nimble Storage
  • Nimbus Data
  • Nutanix
  • Riverbed
  • Tintri
  • Virsto
  • Zerto

One of the great things about Tech Field Day is that these companies have engineers deliver the content, so we really get to understand the product from a technical perspective rather than just being presented with high-level information.

Look for updated posts throughout the week and tune in at the link above for live video streaming from the meetings!