Thursday, October 31, 2013

vSphere Snapshots Choke in Backup Exec, IDE vs SCSI

We've been fighting with Backup Exec 2012 and our vSphere environment for the last month or two. Most VMs would back up just fine, but a few would randomly choke, and the job would end with cryptic errors like:
Final error: 0xe0009578 - Unable to copy the virtual machine disk using the VMware VixDiskLib.
Final error category: Resource Errors
Followed by:
V-79-57344-38264 - Unable to copy the virtual machine disk using the VMware VixDiskLib. VixDiskLib_Read() reported the error: Unknown error
Ugh! Unknown error, how I hate thee! After much screwing around with different settings, migrations, updates, etc., etc., I finally, and hesitantly, called Symantec. In the end I was connected with Jason, who is a first-rate tech and obviously knows his stuff. I'm very grateful, because I haven't had good luck with Symantec support in the past.

He determined that the errors were definitely not a Symantec problem. Backup Exec relies on vCenter to snapshot the VM and then present the snapshot to BE to slurp up. Afterward, vCenter removes the snapshot, and BE verifies its data with vCenter at the end of the job. The Backup Exec debug logs showed no sign of any problem; everything looked normal. But here's where we figured out something was amiss:
WARNING: "VMVCB::\\<machine path>.vmdk" is a corrupt file. This file cannot verify.
Ah. Everything now pointed at vSphere. The data vCenter presented to BE looked corrupt to BE, and BE was reporting this in the Exceptions section of the job log.
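When you're hunting for a pattern across many jobs, a small script can tally which disks got flagged and how often. This is a minimal Python sketch, not a Backup Exec feature: it assumes the job logs have been saved as plain text and that every warning follows the exact format of the single line quoted above. The function names are hypothetical.

```python
import re
from collections import Counter

# Modeled on the one warning line quoted above; real Backup Exec job logs
# may format this differently (assumption).
CORRUPT_RE = re.compile(r'WARNING: "VMVCB::(?P<path>.+?\.vmdk)" is a corrupt file\.')


def find_corrupt_vmdks(log_text: str) -> list:
    """Return the VMDK paths flagged as corrupt in one job log, first-seen order, no duplicates."""
    seen = {}
    for match in CORRUPT_RE.finditer(log_text):
        seen.setdefault(match.group("path"), None)
    return list(seen)


def count_corrupt_by_job(job_logs) -> Counter:
    """Count how many job logs flagged each VMDK (each job counts at most once per disk)."""
    counts = Counter()
    for log_text in job_logs:
        counts.update(find_corrupt_vmdks(log_text))
    return counts
```

A disk whose count equals the number of jobs fails every time; lower counts are the "random" failures.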

There was one machine that came up corrupted on every backup job, and a few others that would randomly show up corrupted as well. The common factors were:

  1. All were converted machines from an old Hyper-V environment
  2. All had IDE hard drives rather than SCSI
  3. All were important enough for me to be losing sleep over!
It didn't seem like IDE hard drives should cause this problem. It's all virtual, so why would vSphere care? But to test it out, I followed VMware's knowledge base article about editing the VMDK files to convert from IDE to SCSI. It's a simple, if hair-raising, task.
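The core of that edit is one line in the VMDK descriptor (the small text .vmdk file, not the large -flat.vmdk data file): `ddb.adapterType = "ide"` is changed to a SCSI adapter type such as `"lsilogic"`. Below is a minimal Python sketch of just that text change, assuming the descriptor contains exactly one such line. The function names are hypothetical, and the KB's other steps (likely including powering off the VM and re-adding the disk on a SCSI controller) are not covered, so follow the KB for your vSphere version.

```python
import re
import shutil
from pathlib import Path

# Matches the adapter-type line in a VMDK descriptor, e.g.  ddb.adapterType = "ide"
# Only the quoted value is replaced; the rest of the line and its line ending are kept.
ADAPTER_RE = re.compile(r'^([ \t]*ddb\.adapterType[ \t]*=[ \t]*)"ide"', re.MULTILINE)


def ide_to_scsi_descriptor(text: str, adapter: str = "lsilogic") -> str:
    """Return descriptor text with the IDE adapter type swapped for a SCSI one."""
    new_text, n = ADAPTER_RE.subn(rf'\1"{adapter}"', text)
    if n != 1:
        raise ValueError(f"expected exactly one IDE ddb.adapterType line, found {n}")
    return new_text


def convert_descriptor_file(path: str, adapter: str = "lsilogic") -> Path:
    """Back up a descriptor file, rewrite it in place, and return the backup path."""
    src = Path(path)
    with open(src, encoding="utf-8", newline="") as f:
        original = f.read()
    # Compute the new text first so a ValueError never leaves a truncated file.
    converted = ide_to_scsi_descriptor(original, adapter)
    backup = src.with_name(src.name + ".bak")
    shutil.copy2(src, backup)  # keep an untouched copy before editing
    with open(src, "w", encoding="utf-8", newline="") as f:
        f.write(converted)
    return backup
```

Refusing to proceed unless exactly one IDE line is found is deliberate: a descriptor that is already SCSI, or one that looks unexpected, is left alone.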

Then I ran a full backup of the virtual environment. This morning I came in to find that one of the machines I had not converted to SCSI had choked. The VM that had failed every time for the last month backed up flawlessly.

Tonight I'll convert the rest over to SCSI and, hopefully, finally put this issue to bed!