Rubrik Color-Coded VM Backup Reports

Joshua Stenhouse

They say imitation is the sincerest form of flattery, and in this post I’m going to share with you my scripts for Veeam-style color-coded emails from Rubrik!

For the record, I love Veeam. Back in 2010 there was no better solution for backing up VMware environments, and it was my standard go-to for any VMware environment I deployed. Veeam has grown exponentially since because of its VM-level simplicity, but it still hasn’t made much headway into enterprise IT and it’s now old tech: no HTML5, bolted-on APIs, Windows management consoles everywhere, no deduplicated storage, not cloud-native. Even when I see it on a larger scale, it’s typically alongside another backup product or process handling everything Veeam can’t.

So how do you get all the goodness of a modern solution like Rubrik, managing not just VM backups but SQL, Oracle, NAS, and physical, while keeping a nice VM-level report with green, orange, and red to clearly show whether a VM backup needs further investigation? Rubrik has some great built-in email reporting capabilities, but these don’t yet stretch to color coding of emails. The answer is PowerShell, some basic HTML code, and the REST API-first architecture of Rubrik, which allows us to pull any data we need.

The request for this report came from 5 customers who loved Rubrik but missed their Veeam email reports. Here’s the example they sent me to emulate:

Pretty colors! But it’s still missing some really important information like:

  • What was the backup success %?
  • Was the backup app or crash consistent?
  • Was the VM even powered on?
  • How many VMDKs were on the VM?
  • If the job is still on-going when do I get the email?
  • Do I have to look at 1 email per job?
  • What is the status of the backup infrastructure itself?

If you think about the above questions, the basic color-coded email from Veeam doesn’t really tell me if the backup was of any use. To play devil’s advocate: what if the backup was crash consistent and I didn’t force app consistency, somebody powered off the VM or removed all the VMDKs, the job overran so I didn’t get the email when I expected, I had to look at 20 emails to find this 1 VM, or my repositories are now 99% full and the next backups might fail? The report could be green while the backup is useless or subsequent backups are about to fail.

Even worse, someone will typically take the report and then manually start a backup on each VM, one by one, to remediate it being out of compliance. We can certainly do better than that!

Using PowerShell’s Invoke-RestMethod and Send-MailMessage, some simple HTML, and the Rubrik REST APIs, here is the equivalent from Rubrik:

Color-coded with way more useful information! I decided to throw everything into the table so you can reel it in by deleting the table columns of your choosing from the HTML code. It has the following features:

  • Pre-built PowerShell script ready to run on a schedule in your environment today
  • Each VM goes green, orange, or red (configurable) depending on the outcome of the last backup and SLA compliance
  • Table headers and outcome change color if there are any warnings, failures, or VMs not meeting SLA
  • Generates 1 report across all protected VMs with the SLA assigned (no more 1 email per job)
  • Supports SMTP authentication and SSL, or straight SMTP relay, with multiple recipients
  • Specify separate email address for a consolidated list of all failures (so you can use 1 email for all reports, another just for failures to open a helpdesk ticket)
  • Shows VM consistency, tools, power status, VMDK count, total backups and OS
  • Includes the failure or warning message if a backup wasn’t successful
  • Total backup success % and other useful summary statistics
  • Exclude SLA domains if required
  • Includes Rubrik cluster health, node health, total space and utilization
  • Automatic remediation of non-compliant VMs with an on-demand backup (disabled by default), removing the manual process of remediation altogether
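The color-coding idea behind these features can be sketched in a few lines of PowerShell. This is a minimal illustration and not the code from the downloadable script: the sample VM objects, the `Get-StatusColor` function, the property names, and the hex colors are all hypothetical, and the real report pulls its rows from the Rubrik REST API rather than a hard-coded list.

```powershell
# Hypothetical sample data; the real report would build these rows from Rubrik REST API responses.
$VMs = @(
    [pscustomobject]@{ Name = 'SQL01';  LastBackup = 'Succeeded'; InCompliance = $true  }
    [pscustomobject]@{ Name = 'WEB01';  LastBackup = 'Warning';   InCompliance = $true  }
    [pscustomobject]@{ Name = 'FILE01'; LastBackup = 'Failed';    InCompliance = $false }
)

# Hypothetical helper: map the backup outcome and SLA compliance to a row color.
function Get-StatusColor {
    param(
        [string]$LastBackup,
        [bool]$InCompliance
    )
    if ($LastBackup -eq 'Failed' -or -not $InCompliance) { return '#E74C3C' }  # red
    if ($LastBackup -eq 'Warning')                       { return '#F39C12' }  # orange
    return '#2ECC71'                                                           # green
}

# Build one colored table row per VM.
$Rows = foreach ($VM in $VMs) {
    $Color = Get-StatusColor -LastBackup $VM.LastBackup -InCompliance $VM.InCompliance
    "<tr style=`"background-color:$Color`"><td>$($VM.Name)</td><td>$($VM.LastBackup)</td><td>$($VM.InCompliance)</td></tr>"
}

# Summary statistic: percentage of VMs whose last backup succeeded.
$Succeeded  = @($VMs | Where-Object { $_.LastBackup -eq 'Succeeded' }).Count
$SuccessPct = [math]::Round(100 * $Succeeded / $VMs.Count, 1)

# Assemble the HTML body; delete a <th>/<td> pair to drop a column.
$RowHtml = $Rows -join "`n"
$Html = @"
<p>Backup success: $SuccessPct%</p>
<table border="1" cellpadding="4">
<tr><th>VM</th><th>Last Backup</th><th>In Compliance</th></tr>
$RowHtml
</table>
"@
```

A string like `$Html` can then be passed to `Send-MailMessage` with the `-BodyAsHtml` switch. Keeping the color decision in one small function means the thresholds can be changed without touching the table markup.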

 To download your copy simply click on the zip file below:

Extract the script to C:\RubrikAdvancedReportingv1\ (or change the $ScriptDirectory variable in each script), then edit the -Settings.ps1 file with your defaults. On first run the script will prompt for Rubrik credentials and store them securely in an XML file for subsequent headless runs. There are 2 versions of the email report within the zip file. Here is how they differ:

VMwareVMs-BusinessSLA.ps1

  • Uses the $BusinessSLAInHours variable in the settings (default 24 hours) to determine compliance by checking if each VM has a backup within the period specified
  • It doesn’t matter if the VM has multiple backups within the period or if a backup failed last week; it just checks the last backup
  • Allows you to have lower RPOs on an SLA domain but not be held to that frequency for compliance/reporting purposes
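The business SLA check described above boils down to comparing the age of the last backup with a fixed window. Here is a minimal sketch, assuming the last backup time is already available as a `[datetime]`. Only `$BusinessSLAInHours` and its 24-hour default come from the settings file; the `Test-BusinessSLA` function and its parameters are hypothetical names used for illustration.

```powershell
# Variable name and default taken from the settings file described above.
$BusinessSLAInHours = 24

# Hypothetical helper: a VM is compliant if its last backup is within the window.
function Test-BusinessSLA {
    param(
        [Nullable[datetime]]$LastBackup,   # $null means the VM has no backup at all
        [int]$SLAInHours,
        [datetime]$Now = (Get-Date)
    )
    if ($null -eq $LastBackup) { return $false }
    return (($Now - $LastBackup).TotalHours -le $SLAInHours)
}

# Fixed timestamps so the example is repeatable.
$Now = [datetime]'2024-01-02T12:00:00'
$Recent = Test-BusinessSLA -LastBackup ([datetime]'2024-01-02T02:00:00') -SLAInHours $BusinessSLAInHours -Now $Now  # 10 hours old
$Stale  = Test-BusinessSLA -LastBackup ([datetime]'2024-01-01T06:00:00') -SLAInHours $BusinessSLAInHours -Now $Now  # 30 hours old
$Never  = Test-BusinessSLA -LastBackup $null -SLAInHours $BusinessSLAInHours -Now $Now
```

Only the most recent backup matters here, so a VM on a 4-hour SLA domain that last backed up 10 hours ago still counts as compliant against the 24-hour business window.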

VMwareVMs-ByActualSLA.ps1

  • Bypasses $BusinessSLAInHours from the settings file and instead gets the frequency in hours from the SLA assigned to the VM, i.e. an SLA backing up every 5 hours means it looks for a backup within the last 5 hours
  • A VM is compliant if its last backup falls within that frequency
  • Allows you to have VMs backing up hourly, daily, weekly etc, and determine individual compliance
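The per-SLA variant can be sketched the same way, with the compliance window looked up per VM instead of read from a single setting. The SLA names, frequencies, and VM objects below are made-up sample data, and the hashtable stands in for whatever the script actually retrieves from the SLA domain; treat it as an illustration of the logic only.

```powershell
# Hypothetical lookup of SLA domain name -> backup frequency in hours.
$SLAFrequencyHours = @{ 'Gold' = 5; 'Silver' = 24; 'Bronze' = 168 }

# Fixed reference time and sample VMs so the example is repeatable.
$Now = [datetime]'2024-01-02T12:00:00'
$VMs = @(
    [pscustomobject]@{ Name = 'SQL01';  SLA = 'Gold';   LastBackup = [datetime]'2024-01-02T09:00:00' }  # 3 hours old
    [pscustomobject]@{ Name = 'WEB01';  SLA = 'Gold';   LastBackup = [datetime]'2024-01-02T04:00:00' }  # 8 hours old
    [pscustomobject]@{ Name = 'FILE01'; SLA = 'Bronze'; LastBackup = [datetime]'2023-12-30T12:00:00' }  # 72 hours old
)

# Each VM is judged against the frequency of its own SLA domain.
$Report = foreach ($VM in $VMs) {
    $Frequency = $SLAFrequencyHours[$VM.SLA]
    $AgeHours  = ($Now - $VM.LastBackup).TotalHours
    [pscustomobject]@{
        Name             = $VM.Name
        SLA              = $VM.SLA
        FrequencyHours   = $Frequency
        HoursSinceBackup = [math]::Round($AgeHours, 1)
        InCompliance     = ($AgeHours -le $Frequency)
    }
}

$Report | Format-Table -AutoSize
```

With this sample data WEB01 is out of compliance (8 hours against a 5-hour frequency), while FILE01 is fine at 72 hours because its weekly SLA allows up to 168.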

I created both due to different customer requirements. All the columns and their ordering can of course be removed or changed to your wishes by simply editing the HTML in the script.
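For anyone curious how the prompt-once, run-headless credential handling mentioned in the setup steps can work, a common PowerShell pattern is `Export-Clixml` and `Import-Clixml`. The sketch below is an assumption about the general approach and not a copy of the script: the `Get-StoredCredential` function name and the file path are hypothetical.

```powershell
# Hypothetical helper: prompt on first run, then reuse the saved credential.
function Get-StoredCredential {
    param(
        [Parameter(Mandatory)][string]$Path
    )
    if (-not (Test-Path -Path $Path)) {
        # First run: ask interactively and persist the PSCredential object.
        # On Windows, Export-Clixml encrypts the password with DPAPI, so the
        # file can only be read by the same user on the same machine.
        $Credential = Get-Credential -Message 'Enter Rubrik credentials'
        $Credential | Export-Clixml -Path $Path
    }
    # Subsequent (headless) runs just load the file.
    Import-Clixml -Path $Path
}

# Example usage (hypothetical path, commented out because the first run prompts):
# $RubrikCredential = Get-StoredCredential -Path 'C:\RubrikAdvancedReportingv1\RubrikCred.xml'
```

Because the file is tied to the account that created it, a scheduled task has to run under the same user that answered the first prompt.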

All feedback welcome and I hope you found this useful in your pursuit of simplifying backup with Rubrik. Happy scripting,

@JoshuaStenhouse

  1. Blake

    I’m looking forward to this when I get back from vacation. I’m finding Rubrik Reporting whetting my appetite but still not completely helpful due to odd intricacies. I’m not wanting to get too deep into writing scripts, but it looks like I may have to. One typical report we set up with Commvault was a daily report across all jobs at all sites that shows any backups that failed or had warnings. The warnings in Rubrik aren’t usually actionable and I can’t filter out certain warnings, so the report just isn’t useful yet with Rubrik. But perhaps after mastering how to write reports with PowerShell I can be more specific about what I’m looking for. First, don’t show a warning when it couldn’t contact RBS or install it, since I don’t have this on most VMs and don’t have auto-install turned on. These aren’t warnings to me but show up as warnings. Second, filter out cases where pieces of jobs failed or were canceled by Rubrik but the re-attempt then succeeded. I have no idea what to do with errors like the one below, which we frequently get. I’m confused by support’s response, but they assure me these ‘cancelations’ are not to be worried about. I’d love to filter them out of my reports:

    “Internal server error ‘Took running job instance CREATE_VMWARE_SNAPSHOT_794xxxx6-axx9-4xxb-8xx3-34xxxxxxxx03-vm-1105:::3 in state ACQUIRING away from node Some(cluster:::RVMHMxxxxxxxx6). Node Some(cluster:::RVMHMxxxxxxxx6) may have been in bad shape (disconnected, rebooted, crashed process, etc).’”
