Back when I last managed a virtual infrastructure, one of my key responsibilities was managing VM backups. My first daily task was to verify backups from the previous night and resolve any failures. Some days this was easy, some days it was a pain, but the one constant was that the process always needed my attention, which meant I had less time for fun, innovative projects. I was carrying out manual processes just to keep my job and the business safe, and I only had a couple hundred VMs. Imagine the burden when you have thousands of VMs, SQL DBs, and hosts. This is when you have whole teams for the task. Some organizations even have backup teams on night shifts!
That was 2011, and sadly this daily ritual continues in many IT organizations. It's now 2018: we have self-driving cars and strategic shifts in IT towards automation and the public cloud, yet we still manually click, click, click.
If you're stuck in this rut, then I have good news for you. There's a better way! First, you need to switch to Rubrik and ditch those legacy backup technologies that have been around since the 90s, or that still use a legacy architecture of proxies, management servers, catalog databases, non-HTML5 interfaces, bolt-on APIs, single points of failure, bottlenecks, and multiple silos of monolithic dedupe storage, and that generally just suck.
Next, leverage the Rubrik REST APIs and start consuming infrastructure as code to fully automate this once-manual task. Yes, you can create pretty reports in the Rubrik GUI, but you still have to read them. So to help you get started, I'm going to share with you a pre-written PowerShell script library that automates the following:
- Queries the status of each protected object (VM, DB, NAS, physical host, etc.), applies your business SLA (e.g. 24 hours), then determines whether the object is compliant by verifying a backup exists within that timeframe (see the sketch after this list)
- Automatically remediates the object by taking an on-demand backup (optional, disabled by default)
- Sends an email to your IT helpdesk to create a ticket for each group of objects not meeting the business SLA (e.g. one ticket for all VMs not meeting SLA, one for DBs, etc.)
- Sends an HTML report to a different email showing all objects
- Outputs a CSV for logging the results
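To give you a feel for the approach, here's a simplified sketch of the core compliance check. It's deliberately stripped down: the full scripts in the zip handle certificates, paging, and errors properly, and the latestSnapshotDate field name below is illustrative and may differ by CDM version, so check the API Explorer on your cluster for the exact schema.

# Simplified sketch of the core compliance check - the full scripts in the zip
# handle certificates, paging, and errors properly. The latestSnapshotDate
# field name is illustrative; check the API Explorer on your cluster.
$RubrikIP = "192.168.1.100"      # your Rubrik cluster IP/FQDN
$BusinessSLAInHours = 24         # your business SLA window in hours
$Credential = Get-Credential     # Rubrik API user

# Build a Basic auth header from the credential
$Pair = "{0}:{1}" -f $Credential.UserName, $Credential.GetNetworkCredential().Password
$AuthHeader = @{Authorization = "Basic " + [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes($Pair))}

# Pull all current (non-relic) VMware VMs known to the cluster
$VMListURL = "https://$RubrikIP/api/v1/vmware/vm?is_relic=false"
$VMList = (Invoke-RestMethod -Uri $VMListURL -Headers $AuthHeader -Method Get).data

# Flag any VM whose newest snapshot is older than the SLA window
$NonCompliantVMs = foreach ($VM in $VMList)
{
    if (-not $VM.latestSnapshotDate) { continue }    # skip VMs never backed up
    $HoursSince = ((Get-Date) - [datetime]$VM.latestSnapshotDate).TotalHours
    if ($HoursSince -gt $BusinessSLAInHours)
    {
        [pscustomobject]@{VM = $VM.name; HoursSinceLastBackup = [math]::Round($HoursSince, 1)}
    }
}
$NonCompliantVMs | Format-Table -AutoSize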
How cool is that? No more manually checking a report, creating a ticket, and fixing the backup. Whether you have 100 VMs or 5,000, this is exactly the kind of task that should be automated in 2018 to increase your value and that of your business. It even tells you how long it's been since the last backup, the total number of backups, and lots of other useful info. So where do you get this script library? Download it here:
RubrikAutoTicketingv1.zip (updated for CDM 4.1.1)
Unzip the file to "C:\RubrikAutoTicketingv1\" so you don't have to edit the file location variables. No modules are required to run the scripts, as they leverage the Rubrik REST APIs natively. In the folder you'll find the following files:
- RubrikAutoTicketingv1-Auth.ps1 – Run this first to securely store your Rubrik and SMTP credentials (cancel the SMTP prompt if creds are not required).
- RubrikAutoTicketingv1-Settings.ps1 – Configure the variables in this script as per your environment, including Rubrik IP, email addresses, SMTP server, authorization, enabling/disabling email, CSV output, and auto on-demand snapshots.
- RubrikAutoTicketingV1-xxxx.ps1 – Configure the $ScriptDirectory variable at the start of each script (if you didn't unzip to "C:\RubrikAutoTicketingv1\"), then schedule/run on the frequency required. Each script only reports on the object type identified in its name, e.g. VMwareVMs, NAS, SQL. If the Settings.ps1 file isn't found, the script cancels. If you want to override any setting from the global file, copy it into the individual script after the settings import section (see the sketch below).
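For reference, the guard at the top of each object script looks conceptually like this. Only $ScriptDirectory and the file names are real; the override variable at the end is illustrative:

# Sketch of the settings-import guard described above
$ScriptDirectory = "C:\RubrikAutoTicketingv1\"
$SettingsFile = Join-Path $ScriptDirectory "RubrikAutoTicketingv1-Settings.ps1"

# Cancel the script if the global settings file isn't found
if (-not (Test-Path $SettingsFile))
{
    Write-Host "Settings file not found at $SettingsFile - exiting." -ForegroundColor Red
    exit
}

# Dot-source the settings so its variables are available in this scope
. $SettingsFile

# Per-script overrides go here, after the settings import section, e.g.:
# $EnableEmail = $false    # illustrative variable name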
Here you can see a VM report:

[VM report screenshot]

And a failure report on my physical host backups (due to the recent power outage in Boston, where the host didn't come back on):

[Physical host failure report screenshot]
Let me know if you found this useful, and if you haven't switched to Rubrik yet: don't backup, go forward.
Joshua
Hey Joshua,
First off, thanks for the code! This is something I was looking for. Second, is it possible to combine two different Rubrik clusters into one report? Also, what about adding VM replication to another Rubrik to the report? I know, demanding, right? ;o)
I did find one issue with the RubrikAutoTicketingV1-VMwareVMs.ps1 script: the lines where you parse the date weren't working correctly. It appears part of the time was being cut off.
Error:
Exception calling "ParseExact" with "3" argument(s): "String was not recognized as a valid DateTime."
I just removed the -4 at the end of lines 208 and 212 and it’s working now.
Old:
$VMLatestSnapshot3 = $VMLatestSnapshot2.Substring(0,$VMLatestSnapshot2.Length-4)
New:
$VMLatestSnapshot3 = $VMLatestSnapshot2.Substring(0,$VMLatestSnapshot2.Length)
Now the script works without error and the dates/times show up in the report. (Note that Substring(0, Length) returns the whole string, so effectively I'm no longer trimming anything.)
Thanks again,
Dustin
Thanks Dustin, both are good ideas. I'll tackle the replication one first. I did think about multiple clusters, but decided the complexity wouldn't be worth the effort, considering you'd probably want one ticket per cluster anyway.
As for the DateTime string, I'm confused as to why this change was needed, or indeed why it worked. Please can you do me a favor and run it again, but output the below variables (before and after your change) to host, and send them to me either in the comments or to joshua@rubrik.com?
$VMLatestSnapshot1
$VMLatestSnapshot2
$VMLatestSnapshot3
$VMLatestSnapshot
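A few Write-Host lines around your change will do it, something like:

# Debug output - add around lines 208 and 212, before and after your change
Write-Host "VMLatestSnapshot1: $VMLatestSnapshot1"
Write-Host "VMLatestSnapshot2: $VMLatestSnapshot2"
Write-Host "VMLatestSnapshot3: $VMLatestSnapshot3"
Write-Host "VMLatestSnapshot:  $VMLatestSnapshot"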
I'm confused as to what the source date/time string returned from the API must look like for removing the last 4 characters to break the conversion. On my Rubrik 4.1 cluster I have to trim them for the parse to work. What version are you on? Thanks!
Having thought about it some more, now that I can't be sure I always have to trim 4 characters from the end of the date/time string, I'm going to make it more robust. My plan is to count the total characters beyond the first 19 needed for the conversion, then trim that many from the string. This should compensate for whatever is different in your environment.
Dustin, please can you try replacing the # Converting Snapshots section with the below and see if this also fixes the error?
##################################
# Converting Snapshots
##################################
# Converting latest into PowerShell datetime object, counting characters past 19 (the required amount for conversion) and subtracting the difference to ensure the parse works
$VMLatestSnapshot2 = $VMLatestSnapshot1.Replace("T"," ").Replace("Z"," ").TrimEnd()
$VMSnapshotCharCount = $VMLatestSnapshot2 | Measure-Object -Character | Select-Object -ExpandProperty Characters
$VMSnapshotCharSubtract = $VMSnapshotCharCount - 19
$VMLatestSnapshot3 = $VMLatestSnapshot2.Substring(0,$VMLatestSnapshot2.Length-$VMSnapshotCharSubtract)
$VMLatestSnapshot = ([datetime]::ParseExact($VMLatestSnapshot3,"yyyy-MM-dd HH:mm:ss",$null))
# Converting oldest into PowerShell datetime object, counting characters past 19 (the required amount for conversion) and subtracting the difference to ensure the parse works
$VMOldestSnapshot2 = $VMOldestSnapshot1.Replace("T"," ").Replace("Z"," ").TrimEnd()
$VMOldestSnapshotCharCount = $VMOldestSnapshot2 | Measure-Object -Character | Select-Object -ExpandProperty Characters
$VMOldestSnapshotCharSubtract = $VMOldestSnapshotCharCount - 19
$VMOldestSnapshot3 = $VMOldestSnapshot2.Substring(0,$VMOldestSnapshot2.Length-$VMOldestSnapshotCharSubtract)
$VMOldestSnapshot = ([datetime]::ParseExact($VMOldestSnapshot3,"yyyy-MM-dd HH:mm:ss",$null))
I've added the above to the v1 scripts zip as a failsafe, along with a few other bug fixes.
I am away this week. I will test the new code next week when I get back.
Thanks,
Dustin
Hey Joshua,
That new code works with no issues. Thanks!
Hi Joshua, thanks for this code, it’s brilliant.
I am wondering how I could change the date format in the output (console or email) to the UK format.
Also, how could I tackle the problem of an SLA with a week between snapshots, so that "Out of SLA" shows correctly? I am thinking of inserting a check for "15d" in the SLA name, as our daily SLAs all have this part in the name, and then modifying $BusinessSLAInHours to 168 for those items.
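Something like this rough, untested check is what I have in mind ($VMSLADomain is just a stand-in for whatever variable holds the SLA domain name in the script):

# Rough, untested idea: widen the SLA window for items whose SLA domain
# name contains "15d" ($VMSLADomain is a stand-in variable name)
if ($VMSLADomain -like "*15d*")
{
    $BusinessSLAInHours = 168    # one week between snapshots
}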