This morning I woke up to Diana saying “there was a power cut on the street last night due to the wind, but don’t worry it’s back on”. I immediately jumped out of bed because this kind of power cut sounds lengthy and my UPS doesn’t go beyond 30 minutes.
The Hyperconverged Home Lab 2.0 wasn’t in a happy state so it must’ve rebooted, but the ESXi hosts were online so I could triage the VMs. However, the 2 main ESXi hosts in The Hyperconverged Home Lab 1.0 were not in a good place. No ping, totally dead. FYI all lab hosts are on ESXi 6.5.0 build 5969303 as of this post.
Time to break out the IPMI/iLO to see what was going on. Both motherboards were powered on, but ESXi was failing to boot with “Error loading /a.b00 Fatal error: 15 (Not found)” on both hosts:
Google searches suggested I’d either created vSphere support bundles (never) or been midway through a VUM upgrade (I wasn’t), and none of the results related to ESXi 6.5. The other possibility was a corrupt USB stick, but 2 hosts complaining about the exact same file? That’s not USB corruption, that’s a VMware bug.
I took the USB stick from one of the dead hosts, plugged it into my laptop and found the A.B00 file mentioned in the error. But it was only in 1 partition and not the other! So I copied it across, rebooted the ESXi host and got a new error:
Now ata_liba.v00 is missing? Something fishy is going on here. I repeated the process to copy the file, and got a 3rd error:
Sensing a trend here I copied the remaining ata_pata.v0x files. I tried another reboot and voila:
For the 2nd host I looked at the 2 folders side by side and you can see it’s missing the first 8 files (definitely not deleted by me):
I copied the following files from D to E:
And the 2nd ESXi host booted too:
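If I have to do this again, the side-by-side comparison could be scripted rather than done by eye. Below is a minimal Python sketch, assuming both USB partitions are mounted on the laptop (D: and E: in my case) and that the fix really is just "copy whatever exists in one partition but not in the other". The function names `find_missing` and `copy_missing` are my own for illustration, not anything from VMware, and it defaults to a dry run so nothing is written until you've checked the list.

```python
from __future__ import annotations

import shutil
from pathlib import Path


def find_missing(source: Path, target: Path) -> list[str]:
    """Return names of files that exist in source but not in target.

    Names are compared case-insensitively, since the boot partitions
    show up as FAT volumes (A.B00 vs a.b00).
    """
    target_names = {p.name.lower() for p in target.iterdir() if p.is_file()}
    return sorted(
        p.name
        for p in source.iterdir()
        if p.is_file() and p.name.lower() not in target_names
    )


def copy_missing(source: Path, target: Path, dry_run: bool = True) -> list[str]:
    """Copy files missing from target out of source; return their names.

    With dry_run=True (the default) nothing is copied, the list is
    only reported so it can be checked first.
    """
    missing = find_missing(source, target)
    for name in missing:
        if not dry_run:
            # copy2 keeps timestamps alongside the file contents
            shutil.copy2(source / name, target / name)
    return missing
```

Calling `copy_missing(Path("D:/"), Path("E:/"))` would only list the candidates; passing `dry_run=False` performs the copy. It never overwrites a file that already exists on the target, and I'd still image the USB stick before touching it.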
Not sure what is going on with ESXi here but I’m happy to just have them running again. I plan on upgrading the entire lab to vSphere 6.7 this xmas so I hope to never see this issue again, but if I do I’ll be ready!
Same problem occurred on vSphere 6.7.