Check for consistency in the VHD metadata

Today we will look at how to check VHD metadata consistency in VHD chains.



The following script will check for consistency in the VHD metadata. Big thanks to Keith Petley!


XenServer 5.0 and above


Run the following script, passing the volume group (VG) name as its first argument:

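#!/bin/bash
# Usage: pass the volume group name as the first argument
VGNAME=${1:?"No VG name supplied"}

# Check every VHD logical volume, activating inactive LVs only for the check
lvs "$VGNAME" --noheadings -o lv_name,lv_attr | grep VHD | while read -r LV ATTR
do
   echo "Checking $LV"
   # The fifth lv_attr character is 'a' when the LV is active
   ACTIVE=$(echo "$ATTR" | grep -c '^....a.')
   [ "$ACTIVE" -eq 0 ] && lvchange -ay "$VGNAME/$LV"
   vhd-util check -n "/dev/$VGNAME/$LV"
   [ "$ACTIVE" -eq 0 ] && lvchange -an "$VGNAME/$LV"
done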
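
The script needs a VG name, so it can help to list the candidates first. The following is a minimal sketch, not part of the original script. It assumes that LVM-based SRs use VG names starting with VG_XenStorage- (as in the device paths quoted in this post) and that the script above was saved as ./check_vhd.sh, which is a hypothetical file name.

```shell
# Keep only VG names that look like XenServer LVM storage repositories.
# Reads "vgs" output on stdin, one VG name per line (leading blanks allowed).
filter_xen_vgs() {
    awk '$1 ~ /^VG_XenStorage-/ { print $1 }'
}

# Run the checker against every matching VG on this host.
# ./check_vhd.sh is an assumed name for the script shown above.
check_all_srs() {
    vgs --noheadings -o vg_name | filter_xen_vgs | while read -r VG
    do
        echo "=== $VG ==="
        ./check_vhd.sh "$VG"
    done
}
```

After sourcing these functions on the host, calling check_all_srs walks every SR in turn. Redirecting its output to a file makes it easier to review the messages afterwards.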


The results will fall into 4 categories

1. The primary footer is corrupt – seen as a message like the following

primary and backup footers do not match
/dev/VG_XenStorage-429666ec-8d46-5f66-0f24-3ff4b5e6f93a/VHD-b95b1c36-5be2-4c03-9956-8823cb24ab90 appears invalid; dumping metadata
VHD Footer Summary:

This suggests that some of the VM's data before the footer may also have been corrupted. If the VHD was inflated when this occurred, the corruption may have affected only the "empty" blocks that the VM had not yet written. If the VHD was deflated, the VM's data is probably corrupt. The VHD may be worth salvaging, especially if it was inflated. Look for corruption in the VHD which physically follows this one on the PV (the Physical Volume which holds the VG, more on this later).

2. The BAT shows corrupt entries

block 185984 (offset 0x242bd01) clobbers block 188032 (offset 0x242be01)
/dev/VG_XenStorage-429666ec-8d46-5f66-0f24-3ff4b5e6f93a/VHD-ba92e768-3277-48d9-a365-eaecb37a4b67 appears invalid; dumping metadata
VHD Footer Summary:

This VHD should be considered lost. Without the BAT, the 2MB blocks are in a "random" order. The corruption is also likely to have spread to the first 2MB block. Most VMs write their filesystem metadata at the front of the disk, and because it is the first thing they write, it will probably sit in the first 2MB extent. Recovering the VHD will therefore still not get the data back. Look for corruption in the VHD which precedes this one on the PV.

3. vhd-util doesn't recognise the VHD format at all

The corruption has written over the entire VHD, destroying all VHD formatting. This VHD is completely lost. Look for corruption both before and after this VHD on the PV.

4. vhd-util check finds nothing wrong with the VHD metadata

VM-specific tests (e.g. fdisk) should be run on the VDI after the SR is reattached. Pay special attention to any VHD where the underlying LV is made from more than one extent. Only the first and last extents will contain VHD metadata; those in the middle hold only VM data. If a "middle" extent is next to a corrupt VHD on the PV, it too should be considered suspect.
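
Several of the categories above ask which VHD sits before or after a given one on the PV, and which LVs are split into more than one piece. The following is a rough sketch of how that could be looked up. It assumes that the LVM version on the host supports "pvs --segments" with the pvseg_start and pvseg_size fields, and that an "extent" in the sense used above corresponds to what LVM reports as a segment. The function names are made up for illustration.

```shell
# Print the PV layout, one segment per line, ordered by physical position:
#   PV  first-extent  extent-count  LV-name
# The LV name column is empty for free space.
pv_layout() {
    pvs --segments --noheadings -o pv_name,pvseg_start,pvseg_size,lv_name |
        sort -k1,1 -k2,2n
}

# Read pv_layout output on stdin. For every segment of the LV named $1,
# print what lies immediately before and after it on the same PV.
pv_neighbours() {
    awk -v target="$1" '
        { pv[NR] = $1; start[NR] = $2; lv[NR] = (NF >= 4 ? $4 : "(free)") }
        END {
            for (i = 1; i <= NR; i++) {
                if (lv[i] != target) continue
                before = (i > 1 && pv[i-1] == pv[i]) ? lv[i-1] : "(start of PV)"
                after = (i < NR && pv[i+1] == pv[i]) ? lv[i+1] : "(end of PV)"
                printf "%s extent %s: before=%s after=%s\n", pv[i], start[i], before, after
            }
        }'
}

# Read pv_layout output on stdin and print LVs that occupy more than one segment.
multi_segment_lvs() {
    awk 'NF >= 4 { n[$4]++ } END { for (name in n) if (n[name] > 1) print name }' | sort
}
```

For example, "pv_layout | pv_neighbours VHD-b95b1c36-5be2-4c03-9956-8823cb24ab90" would list the neighbours of the VHD from the first message above, and "pv_layout | multi_segment_lvs" would list the LVs whose middle pieces deserve a closer look.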
