VMware ESX: How to recover your VMFS partition table

You might wake up to a bad day after a power outage or a storage failure. You thought it was over once you had brought all your machines back up and running, but a few minutes later you found out that some of your virtual machines are missing. After a little investigation you discovered that your VMware datastores (VMFS) appear empty when you browse them from VMware vCenter or the VI Client. If you face this problem, there is a good chance that the VMFS partition table for these LUNs or disks is missing.

To check the VMFS partition table on your VMware ESX server, follow this procedure:

1- Connect using SSH to the VMware ESX server where the missing datastore (VMFS) was connected. Make sure you have root access.

2- Run the following command to find out your SAN devices: esxcfg-vmhbadevs

The output will look something like below:
vmhba0:0:0 /dev/cciss/c0d0
vmhba1:0:1 /dev/sda
vmhba1:0:2 /dev/sdb
vmhba1:4:2 /dev/sdc

3- If you know which SAN device holds the missing datastore (VMFS), run the following command on that device to check its partition table; otherwise run it on all the devices and check them one by one. (Hint: The command to show the partition table for all the devices is ‘fdisk -lu’)

fdisk -lu /dev/sda <== run this if you know that sda is the device holding the missing datastore (VMFS)
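If you do not know which device is affected, the device paths can be pulled out of the esxcfg-vmhbadevs listing and checked one by one. Below is a minimal sketch, assuming the two-column output format shown in step 2; the function name list_device_paths is made up for illustration, and the loop only prints the read-only fdisk -lu commands rather than running them.

```shell
#!/bin/sh
# Print the device path (second column) from esxcfg-vmhbadevs-style
# output read on stdin, e.g. "vmhba1:0:1 /dev/sda" -> "/dev/sda".
# list_device_paths is a hypothetical helper name, not a VMware tool.
list_device_paths() {
    awk 'NF >= 2 && $2 ~ /^\/dev\// { print $2 }'
}

# Sample input copied from the listing in step 2; on a real host you
# would pipe the output of esxcfg-vmhbadevs in instead.
sample='vmhba0:0:0 /dev/cciss/c0d0
vmhba1:0:1 /dev/sda
vmhba1:0:2 /dev/sdb
vmhba1:4:2 /dev/sdc'

# Dry run: only print the check that would be run for each device.
printf '%s\n' "$sample" | list_device_paths | while read -r dev; do
    echo "fdisk -lu $dev"
done
```

Printing the commands first lets you review the device list before touching any disk.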

The output should look something like below for a LUN whose VMFS partition table is missing:

Disk /dev/sda: 322.1 GB, 322122547200 bytes
255 heads, 63 sectors/track, 39162 cylinders, total 629145600 sectors
Units = sectors of 1 * 512 = 512 bytes
Disk /dev/sda doesn’t contain a valid partition table

or it could look something like below on some versions

Disk /dev/sde: 214.7 GB, 214748364800 bytes
255 heads, 63 sectors/track, 26108 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Device Boot Start End Blocks Id System

Where a normal working entry with a valid partition table will look something like below:
Disk /dev/sdb: 16.1 GB, 16106127360 bytes
255 heads, 63 sectors/track, 1958 cylinders, total 31457280 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
/dev/sdb1 128 31455269 15727571 fb Unknown <== This is your partition table entry

Notice the partition table line in the last example:

‘/dev/sdb1 128 31455269 15727571 fb Unknown’ <== This means the partition table exists.
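The check above boils down to looking for a row that starts with a partition device name such as /dev/sdb1. Here is a minimal sketch of that test, assuming the fdisk listing has been captured as text; the function name has_partition_entry is made up for illustration, and the sample strings are abbreviated from the listings above.

```shell
#!/bin/sh
# Succeeds (exit 0) if the fdisk listing on stdin contains at least one
# partition row, i.e. a line starting with a device name that ends in a
# partition number, such as "/dev/sdb1".
# has_partition_entry is a hypothetical helper name.
has_partition_entry() {
    grep -Eq '^/dev/[^[:space:]]*[0-9][[:space:]]'
}

# Abbreviated samples taken from the fdisk output shown above.
missing='Disk /dev/sda: 322.1 GB, 322122547200 bytes
255 heads, 63 sectors/track, 39162 cylinders, total 629145600 sectors
Units = sectors of 1 * 512 = 512 bytes'

healthy='Disk /dev/sdb: 16.1 GB, 16106127360 bytes
Device Boot Start End Blocks Id System
/dev/sdb1 128 31455269 15727571 fb Unknown'

if printf '%s\n' "$healthy" | has_partition_entry; then
    echo "healthy sample: partition entry found"
fi
if ! printf '%s\n' "$missing" | has_partition_entry; then
    echo "missing sample: no partition entry"
fi
```

On a live host the same function could read from fdisk -lu on the device in question, but treat the result as a hint only and still read the listing yourself.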

If you have figured out that your VMFS partition table is missing, follow the steps below. If your partition table exists, just as in the last sample, then this is not the solution for your case. If the table is missing and you have VMware support, I highly recommend you call them to help you recover your partition table, as any mistake with the procedure provided here can make you lose your data permanently. If your only option is to recover on your own, then the procedure below should do the trick for you, as I have tried it three times before :).

VMware ESX VMFS Recovery Procedure steps:

Once you have found the affected device using the procedure above, it is time to fix it. The procedure below assumes /dev/sda is the defective device; please make sure to replace it with whatever device is failing in your environment when executing the commands below. Also make sure you are connected over SSH as root. The text after each <== explains the command entered on that line.

[root@vmwaretest vmhba2]# fdisk /dev/sda <== To start the fdisk (partitioning utility)

Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won’t be recoverable.

The number of cylinders for this disk is set to 39162.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): n <== Add a new partition

Command action

e extended

p primary partition (1-4)

p <== Select a primary partition

Partition number (1-4): 1

First cylinder (1-39162, default 1): Hit Enter <== Take default

Using default value 1

Last cylinder or +size or +sizeM or +sizeK (1-39162, default 39162): Hit Enter <== Take default

Using default value 39162

Command (m for help): t <== Change a partition type

Selected partition 1

Hex code (type L to list codes): fb <== VMFS partition type

Changed system type of partition 1 to fb (Unknown)

Command (m for help): x <== Expert mode

Expert command (m for help): b <== Move beginning of data in a partition

Partition number (1-4): 1

New beginning of data (63-629137529, default 63): 128 <== The partition offset used for VMFS

Expert command (m for help): w <== Write table to disk and exit

The partition table has been altered!
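As a recap, the keystrokes from the session above can be written down as a short sequence, and the sector offset can be turned into bytes. This is only a sketch under stated assumptions: 512-byte sectors as reported by fdisk above, and an fdisk that asks for the primary/extended choice. The function name vmfs_fdisk_keys is made up for illustration, and nothing here is sent to fdisk.

```shell
#!/bin/sh
# Emit the fdisk keystrokes used in the session above, one per line.
# The two empty lines accept the default first and last cylinder.
# vmfs_fdisk_keys is a hypothetical helper name.
vmfs_fdisk_keys() {
    printf '%s\n' n p 1 '' '' t fb x b 1 128 w
}

# Byte offset implied by starting the partition data at sector 128,
# assuming 512-byte sectors as shown in the fdisk listings.
echo "VMFS data offset in bytes: $((128 * 512))"

# Dry run: show the keystrokes with line numbers; nothing is written
# to any disk.
vmfs_fdisk_keys | nl -ba
```

Feeding such a sequence to fdisk non-interactively is riskier than typing it, because you cannot react to an unexpected prompt, so the interactive session remains the safer route.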

Wohooo, if that worked for you, then you are almost done. To check your work, follow the steps below.

In VMware vCenter go to Configuration -> Storage (SCSI, SAN and NFS) and hit refresh. If all went well, the storage volume should reappear and the data should be accessible.

Yay, you are done. I hope not many of you face this problem, but this is your savior if you do. I had seen these recovery steps documented nowhere else on the web when I had the problem for the first time. I had promised myself to post about it for a while, but I had been slacking off. Here it is, finally posted. Please let me know if this method helped you, and let me know if you had any problem with it. Thanks.

March 23, 2009

Eiad Al-Aqqad

Server Virtualization

VMware VI3

71 responses to “VMWare ESX: How to recover your VMFS partition table”

philler says:

August 13, 2009 at 5:01 pm

works like a charm…lifesaver.
PJ says:

August 19, 2009 at 5:49 pm

I had this exact issue happen with SAN storage LUNs showing invalid partition tables. This worked perfectly. Thanks.
DB says:

August 26, 2009 at 10:58 pm

Hey, that is great, I will remember this when I run into this problem. I have one that is hopefully easier. I have a VMFS volume that is missing an extent because the LUN that contained it was deleted by a storage admin in error. The LUN and extent did not contain any data as it was brand new. It still shows up in our VMFS table of extents as bad or missing. I would like to remove it from the list. Is there any way that I can do this?
Jack in de box stift says:

September 2, 2009 at 9:59 am

This is awesome man. You saved us today. Now i can go home and eat chips and drink beer instead of pizza at work. This just saves me a headache, a lot of work and my marriage 🙂
Eiad says:

September 4, 2009 at 8:57 am

I am glad I helped you, but I hope the pizza store next to your work will not sue me :).
guy says:

September 21, 2009 at 6:42 am

You made my day. damn data ‘s b…b..back!
wohoo!
Ray says:

October 6, 2009 at 8:23 am

Well, I have no idea if this is going to work. All I know is that I have to reboot now, and see if many hours of work lie ahead of me or not.
I have not seen any instantaneous results, but an error instead (error 16) Device or resource busy.
The kernel will still use the old table
The new table will be used at next reboot
syncing disks.

here goes

PS: don’t email me,as our mail server is on a VM on this system. 🙁
Ray says:

October 6, 2009 at 8:33 am

Rebooted now…
System came back for the moment (actual hw is dying I believe) want to get the VM’s off it ASAP.
Still seems to have corruption – which is the machine install = getting past that.

Nope, it’s frakked. hm. Not sure if the VM’s exist anymore after following the instructions, but if they are – anyone got any better ideas to post? I just want to get the VM’s off this unstable system prior to blowing it away.

There goes MS Exchange, 1x DC, Print Svr & Web Svr… 🙁
eiad says:

October 9, 2009 at 2:35 am

Hi Ray,

I am sorry that I did not see your comment, as I was busy with a project. I hope things worked out well for you. Please post back if you still have the same problem, and at least mention how the VMs disappeared. If you have valid VMware support you might want to give them a call as well, as recovering a VMware VMFS partition is usually kept as one of their secrets :). If you are running on active/passive storage, make sure you are not in a trespassing situation, and try to get your LUNs back to where they originally were when they disappeared.

I am sorry I can’t give you really sound advice at the moment, as you have not provided any info about your exact setup or how you ended up with the problem, not even the error you are getting.

If you post back your result that will be appreciated.
Costi says:

November 15, 2009 at 6:04 pm

Man….Thanx a lot….
You saved my life too…..
We love you !!
Eiad says:

November 17, 2009 at 2:28 am

Hi Costi,

I am glad it helped, and I am still waiting for my free lunch. Just kidding!!

Enjoy,
Eiad
stef says:

December 17, 2009 at 1:28 pm

hi,

thanks! it worked! on esx 4 you have to use “esxcfg-scsidevs -c” instead of “esxcfg-vmhbadevs”.

greetz
stef
Harry says:

December 29, 2009 at 4:22 pm

Hi Eiad,
worked, saved the day 🙂 Thanx
admin says:

December 30, 2009 at 3:28 pm

Hi Harry,

I am glad it did save your day.

Eiad
clem says:

January 3, 2010 at 4:38 am

Hi Brother,
from Cali, Colombia…you save my life and my company’s support contract…
admin says:

January 5, 2010 at 1:38 pm

Hi Clem,

I am glad I did & appreciate you taking the time to share it, as that inspires me to share more & more.
Seb says:

January 7, 2010 at 5:41 pm

Thank you guy. Work like a charm. You save my night. Thanks…
Gianmaria Righele says:

February 28, 2010 at 10:21 am

I can’t believe it, but it worked
I must therefore assume VMware has something missing: an emergency option.
Many thanks from S. Bertilla Parish
Eiad says:

February 28, 2010 at 12:04 pm

Hi Bertilla,

I am glad it helped. I feel quite good saving one more person’s job :).

Enjoy,
Eiad
Erwin says:

April 15, 2010 at 9:28 pm

Thank you for providing this information. I was at my wits’ end when disaster struck, and your information helped me fix the problem. For ESX 4.0, the procedure is slightly different, by the way. Documented below for those in need…

[root@esxhost000001] esxcfg-scsidevs -l | grep 1108
naa.60060e8005643b000000643b00001108
Display Name: HP Fibre Channel Disk (naa.60060e8005643b000000643b00001108)
Devfs Path: /vmfs/devices/disks/naa.60060e8005643b000000643b00001108
vml.02006c000060060e8005643b000000643b000011084f50454e2d56
[root@s02esxhyp000001 sbin]# fdisk /vmfs/devices/disks/naa.60060e8005643b000000643b00001108
last_lba(): I don’t know how to handle files with mode 8180

The number of cylinders for this disk is set to 37596.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)

Command (m for help): p

Disk /vmfs/devices/disks/naa.60060e8005643b000000643b00001108: 309.2 GB, 309237841920 bytes
255 heads, 63 sectors/track, 37596 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System

Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-37596, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-37596, default 37596):
Using default value 37596

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): fb
Changed system type of partition 1 to fb (VMware VMFS)

Command (m for help): x

Expert command (m for help): b
Partition number (1-4): 1
New beginning of data (63-603979739, default 63): 128

Expert command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
Armin says:

May 16, 2010 at 5:19 pm

Thank You, my adrenalin output is back on normal level! Before I asked myself how to tell it to my customer…

In my case I clicked on Datastore > remove a datastore and didn’t notice that this removes the datastore from all hosts on this LUN; I thought it would remove it only from the current host.
Or in German: “Datenspeicher entfernen” removes access to the LUN for all hosts.

P.S.: I needed to reboot one of the SAN-connected ESX 3.5 hosts before I could see the datastore again.

Armin
Mark says:

May 20, 2010 at 10:56 am

Will this work for ESXi 3.5?
admin says:

June 3, 2010 at 6:10 pm

Hi Mark,

To be honest I have not tried it on ESXi, but it should work if you run these commands from the unsupported mode.

Though, as the name says, it is unsupported, so I would contact VMware support to execute it on your behalf :).

Enjoy,
Eiad
Nika says:

July 7, 2010 at 11:39 am

Dude
U R a true genius. thanks a lot. U just added min 3 people to the “Saved By Me” list. You Rock. Thanks a lot from Georgia.
admin says:

July 17, 2010 at 5:56 am

Hi Nika,

I am glad that my “Saved by Me” list has grown. Welcome to the list & glad I was able to help.

I have been quiet lately, but I have a few tough reasons behind it. Though I am doing my best to keep good posts coming.

Enjoy,
Eiad
Jaime says:

July 18, 2010 at 9:16 pm

Hi. You save my life yesterday. It works! Thanks a lot.
admin says:

July 19, 2010 at 1:28 pm

Hi Jaime,

You and other readers of my blog make me feel just great being able to help out.

Enjoy,
Eiad
saif says:

July 20, 2010 at 3:18 am

Excellent article.
“Good work guys :Eiad & Erwin”.

I have a question: why did it lose the partition, and how can we avoid this in the future?

please advise.

saif.
Eiad says:

July 23, 2010 at 11:47 am

Hi Saif,

Unfortunately, a missing VMFS partition table can be caused by many things, but from my own experience it is mostly caused by a problem with the storage configuration or a storage outage. I have seen it whenever a customer faces a power outage and their storage and VMware servers shut down suddenly in an improper manner. I have seen it as well whenever active/passive storage has been set up incorrectly, causing a lot of path thrashing. In another case, one of my customers had changed the VMware multipath policy to Fixed while his storage was an EMC CLARiiON (active/passive storage), where MRU would have been the proper multipath policy for that setup.

If you have not faced a power or storage problem lately, I would definitely check the storage configuration & design.

I hope this helps,
Eiad
Manish says:

December 6, 2010 at 2:32 am

thanks man this solution worked for me…
admin says:

December 13, 2010 at 4:07 pm

Hi Manish,

You are welcome.
Jason says:

December 17, 2010 at 8:47 am

Phew – such a life saver!

Thanks.
mgnu says:

January 20, 2011 at 9:45 am

Thank you very much!
Alex says:

January 20, 2011 at 10:53 am

Awesome article, saved me today.

Thanks so much!
admin says:

January 29, 2011 at 4:27 pm

You’re welcome Alex,
Eiad
admin says:

January 29, 2011 at 4:28 pm

You’re welcome Mgnu,
Eiad
admin says:

January 29, 2011 at 4:35 pm

Hi Jason,

It’s just a great feeling when you know you have saved a life. Thanks for your feedback :).

Enjoy,
Eiad
John Cheng says:

February 5, 2011 at 1:30 pm

FWIW here’s a one liner to set the partition type to fb:

echo -e "n\np\n1\n\n\nt\nfb\nw" | fdisk /dev/sdb

here’s one adjusted (not tested!) to the recovery procedure described in the post:

echo -e "x\nb\n1\n128\nw" | fdisk /dev/sdb
John Cheng says:

February 5, 2011 at 1:39 pm

Another thing I wonder – if this happens, what is the corresponding log entry in /var/log/messages? This would also make a good nagios check too.
Randal says:

February 16, 2011 at 11:53 pm

Wow you pulled my bacon out of the fire with this.

I’ll verify it works in ESX 4.1 as well.

Now I have to go change my shorts.

And get Symantec Backup Exec 2010 running to back up Virtual Machine files.

Thx.
Eiad says:

February 23, 2011 at 3:15 pm

Hi Randal,

I am glad it was able to help; this is what this blog is here for.

Haha, I know how it feels when your bacon is near the fire :).

Enjoy,
Eiad
Marek says:

April 16, 2011 at 9:04 am

wow! mann, you’ve saved my life! it worked, thanks!
admin says:

April 17, 2011 at 1:26 pm

Glad I was able to help Marek.

Regards,
Eiad
webvirgin says:

April 30, 2011 at 3:52 pm

After I tried this solution, I went down on my knees (really!) and thanked God for sending me to Eiad!

You saved my life also!

All my best wishes are with you,
Thank you!

Best Regards,
webvirgin!
Harry says:

May 3, 2011 at 2:25 pm

Hi,

Harry again 🙂
this saved another company’s ESXi farm today … so it can be done on ESXi machines (this was 4.1) – all you need to do is to enable the Remote TechSupport (SSH) service under Configuration/Security profile
Never expected I would need this TWICE in so short time interval …
Eiad says:

May 5, 2011 at 5:29 am

Hi Webvirgin,

Your kind words have made my heart melt, & encouraged me to keep blogging :).

Regards,
Eiad
Patrick says:

May 11, 2011 at 6:30 pm

Hi,

thanks a bunch for your post,
it saved an entire company this evening.

For future reference we saved a copy of every LUN’s partition table on a USB stick with
sfdisk /dev/sdX -d > /mnt/usb/sdX-part.bak

So the next failure will be much less tricky.

Thanks again and best regards,
Patrick
Noel Moss says:

May 26, 2011 at 8:31 pm

This procedure saved my butt. Well done, sir.

Now I have to figure out how the partition tables got clobbered in the first place.
Guenter says:

September 4, 2011 at 1:41 pm

Had this problem today. Only wanted to remove an iSCSI device from the inventory and used delete instead. I wonder why there was no warning like “destroying all data – are you really sure?”.

Anyway – your fix even works with ESXi 5.0.0

If you ever come to Germany, there’s a box of wine waiting for you!

Thanks a lot.
Alex says:

September 10, 2011 at 4:56 pm

Hi,

thanks a lot. That was a lifesaver! God bless you! A desperate move, but it worked like a charm.

Alex
