Scripting a Rubrik Recovery Plan using REST APIs & PowerShell

Joshua Stenhouse

Note: this post has been superseded by v2.0 of my Recovery Plan script:
https://virtuallysober.com/2018/01/24/testing-dr-scripting-a-rubrik-recovery-plan-v2-0/

Following hot on the heels of my first post, “Introduction to Rubrik REST APIs using PowerShell & Swagger”, I’d now like to show you how to easily automate the recovery and boot ordering of VMs as a Recovery Plan.

In the Rubrik HTML5 interface, you can easily recover any VM in just a few clicks, with the VMs running on a whopping 30,000 IOPS per brik (Rubrik appliance), giving you a sub-1-minute RTO. However, at scale, clicking on each VM to recover becomes tedious and hard to manage, and it always requires human interaction. This is where using PowerShell to interact with REST APIs is going to make your life easier by automating the entire process for anything from 1 to 10,000+ VMs. Use cases include:

  • Disaster recovery and failover testing of VMs replicated between Rubrik clusters
  • Recovery from production storage outages in a controlled manner
  • Bring multi-VM applications online in a working state with pre-configured time delays between VMs
  • Automatically create temporary dev/test VMs on any frequency required
  • Interactive user-driven recovery with warning prompts or fully automated end to end
  • Pre/post recovery scripting along with VM name customization

The two Rubrik operations we will be using in the script are “LiveMount” and “InstantRecover”. LiveMount should typically be used for testing the recovery of VMs, which should not be attached to production port groups or have any networking at all. The primary use case of InstantRecover is to connect the recovered VMs directly to production port groups, with the original VM (if still in the inventory) automatically powered off, deprecated and renamed by Rubrik. If you want InstantRecover without the existing VM being deprecated, you can use a LiveMount with the right combination of parameters to achieve this.

To start, you’re going to need a list of VMs to recover. For this I’m using a simple CSV with each VM listed in the order it is to be booted/recovered with the following fields (mandatory fields in bold):

  • VMName (assumes unique VM names)
  • Action (LiveMount or InstantRecover)
  • DisableNetwork (TRUE or FALSE)
  • RemoveNetworkDevices (TRUE or FALSE)
  • PowerOn (TRUE or FALSE)
  • RunScriptsinLiveMount (TRUE or FALSE)
  • PreFailoverScript (leave empty or specify script)
  • PostFailoverScriptDelay (0-x seconds)
  • PostFailoverScript (leave empty or specify script)
  • NextVMFailoverDelay (0-x seconds, leave as 0 for no delay between VM boot requests)
  • PreFailoverUserPrompt (leave empty for no prompt, or enter custom text for user prompt)
  • PostFailoverUserPrompt (leave empty for no prompt, or enter custom text for user prompt)
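
As a rough illustration of how a plan file with these columns might be loaded and sanity-checked before any recovery starts, here is a minimal sketch. The sample rows, the VM names and the Test-RecoveryPlanRow helper are assumptions made for illustration and are not taken from the downloadable script:

```powershell
# Sketch only: sample data and helper name are illustrative assumptions.
# In a real run you would load the file instead, e.g. Import-Csv -Path <your CSV>
$SampleCsv = @'
VMName,Action,DisableNetwork,RemoveNetworkDevices,PowerOn,RunScriptsinLiveMount,PreFailoverScript,PostFailoverScriptDelay,PostFailoverScript,NextVMFailoverDelay,PreFailoverUserPrompt,PostFailoverUserPrompt
DemoDC1,LiveMount,TRUE,FALSE,TRUE,FALSE,,0,,30,,
DemoSQL1,InstantRecover,FALSE,FALSE,TRUE,FALSE,,0,,0,Continue with DemoSQL1?,
'@
$RecoveryPlan = @($SampleCsv | ConvertFrom-Csv)

function Test-RecoveryPlanRow {
    param($Row)
    # Collect every problem found in one row; an empty list means the row looks usable
    $Problems = @()
    if ([string]::IsNullOrWhiteSpace($Row.VMName)) {
        $Problems += "VMName is empty"
    }
    if ($Row.Action -notin @("LiveMount", "InstantRecover")) {
        $Problems += "Unknown Action '$($Row.Action)'"
    }
    $Delay = 0
    if (-not [int]::TryParse($Row.NextVMFailoverDelay, [ref]$Delay) -or $Delay -lt 0) {
        $Problems += "NextVMFailoverDelay must be 0 or more seconds"
    }
    # The leading comma stops PowerShell from unrolling an empty array into $null
    return ,$Problems
}

foreach ($Row in $RecoveryPlan) {
    $Problems = Test-RecoveryPlanRow -Row $Row
    if ($Problems.Count -gt 0) {
        Write-Host "Row for '$($Row.VMName)' has problems: $($Problems -join '; ')"
    }
}
```

Checking the whole plan up front means a typo in row 40 is caught before the first VM is touched, rather than halfway through a failover.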

[Image: RubrikRecoveryPlan]

Don’t worry about creating this by hand as you can download a ready-made example at the end of this post. Once you’ve authenticated with the REST API using the example from my first post, the key commands you need are:

# $baseURL, $RubrikSessionHeader and $TypeJSON are assumed to be set already by your authentication code
# $VMName is the name of the VM to recover, taken from the current CSV row
# Getting list of VMs
$VMListURL = $baseURL+"vmware/vm?limit=5000"
Try
{
    $VMListJSON = Invoke-RestMethod -Uri $VMListURL -TimeoutSec 100 -Headers $RubrikSessionHeader -ContentType $TypeJSON
    $VMList = $VMListJSON.data
}
Catch
{
    Write-Host $_.Exception.ToString()
    $error[0] | Format-List -Force
}
# Getting VM ID
$VMID = $VMList | Where-Object {($_.name -eq $VMName)} | Select-Object -ExpandProperty id
# Getting VM snapshot ID
$VMSnapshotURL = $baseURL+"vmware/vm/"+$VMID+"/snapshot"
Try
{
    $VMSnapshotJSON = Invoke-RestMethod -Uri $VMSnapshotURL -TimeoutSec 100 -Headers $RubrikSessionHeader -ContentType $TypeJSON
    $VMSnapshot = $VMSnapshotJSON.data
}
Catch
{
    Write-Host $_.Exception.ToString()
    $error[0] | Format-List -Force
}
# Selecting most recent VM snapshot to use for recovery operation
$VMSnapshotID = $VMSnapshot | Sort-Object -Descending date | Select-Object -ExpandProperty id -First 1
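
Because the lookup assumes unique VM names, it can be worth failing loudly when a name matches no VM or more than one. The following is a minimal sketch of that idea, run against mock data rather than a live cluster. The helper names Resolve-RubrikVMID and Get-LatestSnapshotID and all of the mock IDs are hypothetical; only the name, id and date properties come from the commands above:

```powershell
# Sketch only: helper names and mock values are illustrative assumptions
function Resolve-RubrikVMID {
    param(
        [object[]] $VMList,
        [Parameter(Mandatory = $true)] [string] $VMName
    )
    # Force an array so .Count works for zero, one or many matches
    $Found = @($VMList | Where-Object { $_.name -eq $VMName })
    if ($Found.Count -eq 0) { throw "No VM named '$VMName' was found" }
    if ($Found.Count -gt 1) { throw "VM name '$VMName' is not unique ($($Found.Count) matches)" }
    return $Found[0].id
}

function Get-LatestSnapshotID {
    param([object[]] $Snapshots)
    if (-not $Snapshots) { throw "No snapshots available" }
    # Same selection as above: newest date first, then take the first id
    $Snapshots | Sort-Object -Property date -Descending | Select-Object -ExpandProperty id -First 1
}

# Mock data standing in for $VMListJSON.data and $VMSnapshotJSON.data
$MockVMList = @(
    [pscustomobject]@{ name = "DemoDC1";  id = "mock-vm-1" },
    [pscustomobject]@{ name = "DemoSQL1"; id = "mock-vm-2" }
)
$MockSnapshots = @(
    [pscustomobject]@{ id = "mock-snap-old"; date = "2017-06-01T10:00:00Z" },
    [pscustomobject]@{ id = "mock-snap-new"; date = "2017-06-02T10:00:00Z" }
)

$DemoVMID = Resolve-RubrikVMID -VMList $MockVMList -VMName "DemoSQL1"
$DemoSnapshotID = Get-LatestSnapshotID -Snapshots $MockSnapshots
Write-Host "VM ID: $DemoVMID, snapshot ID: $DemoSnapshotID"
```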

Now that we have the VM ID and snapshot ID, you can perform either operation. Let’s start by looking at how to perform a LiveMount:

# Performing Live Mount by first specifying JSON parameters and URL required
$VMLMJSON =
"{
  ""vmName"": ""$VMName - LiveMount"",
  ""disableNetwork"": true,
  ""removeNetworkDevices"": false,
  ""powerOn"": true
}"
$VMLiveMountURL = $baseURL+"vmware/vm/snapshot/"+$VMSnapshotID+"/mount"
# POST to REST API URL with VM JSON
Try
{
    Write-Host "Starting LiveMount for VM:$VMName"
    $VMLiveMountPOST = Invoke-RestMethod -Method Post -Uri $VMLiveMountURL -Body $VMLMJSON -TimeoutSec 100 -Headers $RubrikSessionHeader -ContentType $TypeJSON
}
Catch
{
    Write-Host $_.Exception.ToString()
    $error[0] | Format-List -Force
}
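
The JSON above hard-codes the three flags, whereas the CSV carries them per VM as the text TRUE or FALSE. One way to bridge the two, sketched here under the assumption that the row object comes from Import-Csv, is to convert the text to real booleans and let ConvertTo-Json build the body. New-LiveMountBody is a hypothetical helper name, not something from the downloadable script:

```powershell
# Sketch only: helper name and sample row are illustrative assumptions
function New-LiveMountBody {
    param(
        [Parameter(Mandatory = $true)] $Row
    )
    # CSV values arrive as strings, so convert "TRUE"/"FALSE" to booleans.
    # ToBoolean throws on an empty or misspelled value, which surfaces bad rows early.
    $Body = [ordered]@{
        vmName               = "$($Row.VMName) - LiveMount"
        disableNetwork       = [System.Convert]::ToBoolean($Row.DisableNetwork)
        removeNetworkDevices = [System.Convert]::ToBoolean($Row.RemoveNetworkDevices)
        powerOn              = [System.Convert]::ToBoolean($Row.PowerOn)
    }
    return ($Body | ConvertTo-Json)
}

# Sample row shaped like one line of the recovery plan CSV
$DemoRow = [pscustomobject]@{
    VMName               = "DemoDC1"
    DisableNetwork       = "TRUE"
    RemoveNetworkDevices = "FALSE"
    PowerOn              = "TRUE"
}
$DemoBody = New-LiveMountBody -Row $DemoRow
Write-Host $DemoBody
```

The resulting string can be passed as -Body in place of $VMLMJSON. Building it with ConvertTo-Json also avoids escaping problems if a VM name ever contains a quote character.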

Here you can see how to take the same information to perform an InstantRecover operation:

# Performing Instant Recovery by first specifying JSON parameters and URL required
$VMIRJSON =
"{
  ""vmName"": ""$VMName"",
  ""removeNetworkDevices"": false
}"
$VMInstantRecoverURL = $baseURL+"vmware/vm/snapshot/"+$VMSnapshotID+"/instant_recover"
# POST to REST API URL with VM JSON
# Warning: connects the VM to the production network, and shuts down and renames the original VM (if it exists) as "Deprecated VMName Date Time"
Try
{
    Write-Host "Starting InstantRecover for VM:$VMName"
    $VMInstantRecoverPOST = Invoke-RestMethod -Method Post -Uri $VMInstantRecoverURL -Body $VMIRJSON -TimeoutSec 100 -Headers $RubrikSessionHeader -ContentType $TypeJSON
}
Catch
{
    Write-Host $_.Exception.ToString()
    $error[0] | Format-List -Force
}

If we then take these commands and wrap them up in a simple script that combines the ability to prompt the user, run separate scripts, and wait time delays, you have a very powerful recovery plan! To hit the ground running you can download my example here:

RubrikRecoveryPlanv1.zip
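
To give a feel for the shape of such a wrapper, here is a minimal sketch of the loop: prompt, pre-script, recover, wait, post-script, prompt, then the delay before the next VM. It is not the contents of the zip. Invoke-RecoveryPlan, its -RecoverAction and -NoPrompt parameters and the demo rows are all hypothetical, and the RunScriptsinLiveMount column is left out for brevity:

```powershell
# Sketch only: function, parameters and demo rows are illustrative assumptions
function Invoke-RecoveryPlan {
    param(
        [Parameter(Mandatory = $true)] [object[]] $Plan,
        # Script block that performs the LiveMount or InstantRecover for one row
        [Parameter(Mandatory = $true)] [scriptblock] $RecoverAction,
        # Skip all user prompts for a fully automated run
        [switch] $NoPrompt
    )
    foreach ($Row in $Plan) {
        if (-not $NoPrompt -and $Row.PreFailoverUserPrompt) {
            Read-Host -Prompt $Row.PreFailoverUserPrompt | Out-Null
        }
        if ($Row.PreFailoverScript) { & $Row.PreFailoverScript }

        # Hand the row to the caller-supplied recovery logic
        & $RecoverAction $Row

        if ($Row.PostFailoverScript) {
            Start-Sleep -Seconds ([int]$Row.PostFailoverScriptDelay)
            & $Row.PostFailoverScript
        }
        if (-not $NoPrompt -and $Row.PostFailoverUserPrompt) {
            Read-Host -Prompt $Row.PostFailoverUserPrompt | Out-Null
        }
        # Wait before requesting the next VM in the boot order
        Start-Sleep -Seconds ([int]$Row.NextVMFailoverDelay)
    }
}

# Demo run with a stand-in action that only reports what it would do
$DemoPlan = @(
    [pscustomobject]@{ VMName = "DemoDC1";  Action = "LiveMount";      NextVMFailoverDelay = "0" },
    [pscustomobject]@{ VMName = "DemoSQL1"; Action = "InstantRecover"; NextVMFailoverDelay = "0" }
)
$DemoResult = @(Invoke-RecoveryPlan -Plan $DemoPlan -NoPrompt -RecoverAction {
    param($Row)
    "$($Row.Action) requested for $($Row.VMName)"
})
$DemoResult | ForEach-Object { Write-Host $_ }
```

Passing the recovery step in as a script block keeps the ordering and delay logic separate from the REST calls, so the plan can be dry-run with a harmless action like the one above before pointing it at a real cluster.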

To run this fully automated with no user interaction, simply remove the prompts for user credentials at the start, along with PreFailoverUserPrompt and PostFailoverUserPrompt, and you’re good to go! If you found this script useful, please like and share. Happy scripting,

Joshua

  1. Hi Virtually Sober, thank you for sharing great tutorials with the community.

    Can Rubrik offer near-zero RPO? Does it do journal-based replication? I will be grateful if you can share your expert insight on this in a blog post or in a comment reply. Thank you

    best regards
    Ali

    • Joshua Stenhouse

      Hey Ali!

      Having worked at both Zerto and now Rubrik I can definitely answer your question. Rubrik offers best case RPOs of hourly for VMs using VADP/VM snaps and every 15 minutes for SQL databases using transaction log backups. The backups can then be replicated asynchronously as part of the backup policy (SLA Domain) to get the data offsite to another Rubrik cluster (physical or in AWS/Azure). You can recover to multiple points in time, but these are limited to the frequency on which you can take the backups. Once you have the backup offsite you can then leverage scripts such as the example I provide to orchestrate the recovery operation. I haven’t extended it to SQL databases yet but I’m thinking of adding it in.

      So Rubrik definitely isn’t near 0 RPO or journal based replication, in my opinion. If you need that then you most definitely should look at Zerto. Nobody does near 0 RPOs and journal based point in time better! For me it comes down to the use case. If the primary use case is backup with a copy of the backup offsite for DR then I’d use Rubrik. If the primary use case is replication and orchestration then I’d use Zerto. If I need both use cases then use both technologies together! They complement each other very well in my testing: self-managing backups and self-managing replication (at Zerto I really didn’t emphasize enough how important it is that you don’t schedule Zerto replication, you set the priority/QOS and it manages itself). At Rubrik this is a key tenet of the platform that is explained on every demo because it’s such a shift change from classic backup scheduling, just as Zerto is a shift change from replication scheduling.

      Any further questions let me know. Thanks,

      Joshua

  2. Rob Longdon

    Firstly thanks for getting me started with Rubrik automation, sharing this is helpful. My question is with regard to a live mount config. I would like to set the network to an isolated DR test LAN. Is it possible to achieve this with the config parameters?

    If I look at a config output from a given snapshot detail I can see entries such as switchUuid and portgroupkey, which suggests you could set a network switch and portgroup. Have you tried this, and are there any gotchas?

    Thanks.

    • Joshua Stenhouse

      Hey Rob. You’re most welcome! Sorry for my tardy response, been away on honeymoon…

      I agree it looks like it should be there, but sadly it’s not possible in the Rubrik API today. You need to take the VM and customize it using PowerCLI to edit the port groups etc. I was going to write an example but there are some internal developments coming that might make this redundant. I do actually have a new version of this script anyway which contains both stop and start scripts to help you undo the test. Would this be useful as a start? Secondly, if you did want the script to edit NICs would you want it simple and just 1 port group for all NICs on the VM or more complex and configurable per VMNIC? Simple means I’d just add 1 column to the existing csv, complex means we’d need a whole separate csv just for VMNICs configs.

  3. Greg Schmidt

    Joshua, Do you have any scripts that help identify which servers/virtual machines are protected by Rubrik? Is there a way to identify servers/virtual machines that need to be added to the Rubrik protection ? Your guidance is appreciated. Cheers, Greg

    • Joshua Stenhouse

      Your wish is my command! I just put this together for you:
      https://virtuallysober.com/wp-content/uploads/2018/06/RubrikVMListv1.zip
      The script needs Rubrik CDM 4.1.1+ (as it expects mandatory TLS 1.2). If you’re on an earlier Rubrik build remove line 63. Extract to C:\RubrikVMListv1\, edit the .ps1 $RubrikCluster variable with any Rubrik node. Run it once and you’ll be prompted for creds which are saved securely. It will output the list to CSV, but you can do whatever you want with the info. Any questions/issues let me know.
