I recently had the pleasure of drinking beers (virtually, and sober!) and talking tech with the lead of IT infrastructure at a Fortune 100 insurance fund. He told me about his upcoming trial of Zerto alongside the Rubrik he had already bought. Straight away I knew this was someone I could hang out with. This person gets it!
After discussing the benefits of using both solutions together, I asked whether he knew he could tweak Zerto to shorten its Recovery Time Objective (RTO). I must admit, it was a loaded question. Like many of you reading this, he answered, “I didn’t know you could do that.” The good news is that you can, and it’s very easy to do. If I could take your RTO from hours to minutes, would that be worth testing? If so, read on.
Before I give you the instructions for changing the RTO, let me give you a disclaimer and explain how it works. If you raise an issue with Zerto support, they might ask you to remove this tweak to rule it out as the cause, so use it at your own risk.
By default, these settings govern Zerto’s RTO when replicating on-premises:
- 5 VMs powered on every 5 minutes per ESXi host
- 10 VMs processed concurrently
- 30 volumes processed concurrently
Using this calculation, if I have 500 VMs replicating to 10 ESXi hosts, that’s 50 VMs per host. Booting 50 VMs in batches of 5 every 5 minutes takes 10 batches, which gives a 50-minute RTO, except that Zerto only performs operations on 10 VMs and 30 volumes at a time. So 50 minutes might not be attainable, depending on how quickly the Zerto Virtual Manager (ZVM) and vCenter can respond to requests to create, edit and power on VMs, and on how many VMDKs each VM has. The result depends so heavily on the performance and size of your environment that usually the only way to know accurately is to perform a failover test and find out for yourself.
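The arithmetic above can be written as a small Python helper. This is only a sketch of the back-of-the-envelope estimate used in this post; the function name and parameters are hypothetical, and it deliberately ignores the VM and volume concurrency limits and ZVM/vCenter response times, so treat its output as a best case rather than a measured RTO.

```python
def estimate_rto_minutes(total_vms: int, hosts: int, vms_per_batch: int, delay_minutes: float) -> float:
    """Rough boot-time estimate using the post's linear arithmetic.

    Ignores the VM/volume concurrency caps and ZVM/vCenter response times,
    so the result is a best case, not a prediction.
    """
    vms_per_host = total_vms / hosts          # assumes VMs are spread evenly across hosts
    batches = vms_per_host / vms_per_batch    # number of boot batches per host
    return batches * delay_minutes            # each batch costs one power-on interval


if __name__ == "__main__":
    # Defaults from the post: 5 VMs per host every 5 minutes, 500 VMs on 10 hosts.
    print(estimate_rto_minutes(500, 10, 5, 5))  # 50.0
```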
This could still be a good RTO for many customers, and you might not have any reason to change it. But what if your all-flash array or hyperconverged environment can run faster? What if your ZVM and vCenter have enough CPU and RAM to cope with more simultaneous operations? What if you just want a better RTO?
When Zerto first went GA in 2011, these defaults were about right; some customers even had to dial them down on older hardware, which is why the settings were made configurable in the first place. However, in 2017 I’d argue that the defaults might be underutilizing the performance available to you.
Take the same example, raise the concurrency so that Zerto can process all the VMs required in each 5-minute interval, and then set it to recover 10 VMs every 5 minutes. The RTO drops to 25 minutes. Want to go faster? At 20 VMs every 5 minutes, that 50-minute RTO becomes 12.5 minutes. Big difference!
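To compare batch sizes, the Python sketch below runs the estimate for 50 VMs per host at 5, 10 and 20 VMs per 5-minute interval. The rto_estimates helper is hypothetical, not part of Zerto. The “linear” figure matches the arithmetic in this post; the “whole-batch” figure rests on an untested assumption that a partially filled final batch still costs a full interval, which would turn 12.5 minutes into 15.

```python
import math


def rto_estimates(vms_per_host: int, vms_per_batch: int, delay_minutes: float) -> tuple[float, float]:
    """Return (linear, whole_batch) RTO estimates in minutes for one host."""
    # Linear: the arithmetic used in this post (a partial batch counts fractionally).
    linear = vms_per_host / vms_per_batch * delay_minutes
    # Whole-batch: assumes a partial final batch still waits a full interval.
    whole_batch = math.ceil(vms_per_host / vms_per_batch) * delay_minutes
    return linear, whole_batch


if __name__ == "__main__":
    for batch in (5, 10, 20):
        linear, whole = rto_estimates(50, batch, 5)
        print(f"{batch:>2} VMs per batch: ~{linear} min (linear), {whole} min (whole batches)")
```

Either way, the concurrency settings still need to be high enough for Zerto to keep up with the larger batches.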
So how do you change these settings? The answer is hidden in the “C:\Program Files\Zerto\” folder on each ZVM. Within that folder is a file called tweaks.txt. Simply add the 4 lines below to tweaks.txt on both ZVMs, save it, restart the ZVM service, and that’s it! I strongly recommend raising the values in small increments and running a failover test after each change to benchmark the result without impacting production:
t_zertoMaxVMsPerHostBootBatch=15
t_zertoPowerOnDelaySec=300
t_ZvmMaximalDegreeOfConcurrency=10
t_ZvmMaximalDegreeOfConcurrencyVolumes=30
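If you manage several ZVMs, a small script can make this edit repeatable. The Python sketch below rests on an assumption: it treats tweaks.txt as one key=value pair per line, which matches the settings above but is not a documented format, and the apply_tweaks function is hypothetical. It updates existing keys in place, appends missing ones, and leaves every other line alone. It does not restart the ZVM service; do that yourself afterwards.

```python
import sys
from pathlib import Path

# Values from the first example in this post; adjust to suit your environment.
TWEAKS = {
    "t_zertoMaxVMsPerHostBootBatch": "15",
    "t_zertoPowerOnDelaySec": "300",
    "t_ZvmMaximalDegreeOfConcurrency": "10",
    "t_ZvmMaximalDegreeOfConcurrencyVolumes": "30",
}


def apply_tweaks(path: Path, tweaks: dict[str, str]) -> None:
    """Set each key=value in the file (assumed one pair per line)."""
    lines = path.read_text().splitlines() if path.exists() else []
    updated: list[str] = []
    written: set[str] = set()
    for line in lines:
        key = line.split("=", 1)[0].strip()
        if key in tweaks:
            if key not in written:
                updated.append(f"{key}={tweaks[key]}")  # replace the existing value
                written.add(key)
            # later duplicates of a managed key are dropped
        else:
            updated.append(line)  # keep unrelated lines untouched
    # Append any settings that were not already present.
    updated.extend(f"{k}={v}" for k, v in tweaks.items() if k not in written)
    path.write_text("\n".join(updated) + "\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    apply_tweaks(Path(sys.argv[1]), TWEAKS)
```

For example, save it under any name and run it with the full path to tweaks.txt on each ZVM as the only argument, adjusting the path if your installation differs.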
Using the tweaks below, I reduced the RTO between two 4-node Nutanix clusters running vSphere 6 with 200 VMs from 50 minutes to 12 minutes:
t_zertoMaxVMsPerHostBootBatch=50
t_zertoPowerOnDelaySec=300
t_ZvmMaximalDegreeOfConcurrency=50
t_ZvmMaximalDegreeOfConcurrencyVolumes=100
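To follow the advice about raising the values in small increments, you could generate intermediate configurations between the first example and these values, and run a failover test at each step. This is a hypothetical Python sketch; the step count is arbitrary, and integer floor division keeps every value whole while guaranteeing that the final step lands exactly on the target.

```python
# First example from this post (start) and the values used in the Nutanix test (target).
START = {
    "t_zertoMaxVMsPerHostBootBatch": 15,
    "t_zertoPowerOnDelaySec": 300,
    "t_ZvmMaximalDegreeOfConcurrency": 10,
    "t_ZvmMaximalDegreeOfConcurrencyVolumes": 30,
}
TARGET = {
    "t_zertoMaxVMsPerHostBootBatch": 50,
    "t_zertoPowerOnDelaySec": 300,
    "t_ZvmMaximalDegreeOfConcurrency": 50,
    "t_ZvmMaximalDegreeOfConcurrencyVolumes": 100,
}


def stepped_configs(start: dict[str, int], target: dict[str, int], steps: int) -> list[dict[str, int]]:
    """Return `steps` configs moving from start toward target; the last equals target."""
    if steps < 1:
        raise ValueError("steps must be at least 1")
    configs = []
    for i in range(1, steps + 1):
        # Floor division keeps values whole; at i == steps it yields exactly target.
        configs.append({k: start[k] + (target[k] - start[k]) * i // steps for k in start})
    return configs


if __name__ == "__main__":
    for n, config in enumerate(stepped_configs(START, TARGET, 4), start=1):
        print(f"Step {n}:")
        for key, value in config.items():
            print(f"{key}={value}")
```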
Hopefully the name of each setting is self-explanatory, but if you have any questions, let me know in the comments section. Thanks for reading, and have fun tweaking!
Joshua