Public bug reported:

Binary package hint: lvm2

We have a script that automates the creation and removal of LVM
snapshots on our VMware servers. Three times now we have had machines go
down when the snapshot was removed. I have logs showing the machine
going down immediately after the snapshot removal script ran (triggered
and logged by our backup software).
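For reference, the automation is essentially of this shape. This is a minimal sketch, not our literal script: the VG/LV names match the lvscan output below, but the 5G snapshot size, the /mnt/snap mount point, and the DRY_RUN guard are illustrative additions.

```shell
#!/bin/sh
# Sketch of the snapshot-backup cycle described above. DRY_RUN defaults
# to on, so each command is printed rather than executed; the size,
# mount point, and guard are assumptions, not from the report.
DRY_RUN=${DRY_RUN:-1}
VG=vg_sys
LV=lv_vmware
SNAP=${LV}_snap

run() { if [ "$DRY_RUN" = 1 ]; then echo "would run: $*"; else "$@"; fi; }

run lvcreate --snapshot --size 5G --name "$SNAP" "/dev/$VG/$LV"
run mount -o ro "/dev/$VG/$SNAP" /mnt/snap
# ... backup software reads /mnt/snap here ...
run umount /mnt/snap
run lvremove -f "/dev/$VG/$SNAP"   # the machines went down right after this step
```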

This has occurred on both an HP Pavilion Desktop (uniprocessor, single
disk) and a Sun Sunfire V60X (SMP, md raid1).

The crash leaves the LV in an inconsistent state with device nodes and
snapshot names completely out of sync. On all occasions I have been able
to recover the volume by following the steps below:

Ubuntu 6.06 LTS, LVM Hard Crash repair
--------------------------------------

observe kernel oops.
perform hard reset.
machine comes back up with md2 array dirty, starting background reconstruction.
fails to mount partitions, boots to single-user.
login at console.
mount /usr
vi /etc/fstab
comment out snapshotted lvm partition (/vmware)
exit. System boots to multi-user.
open ssh shell to system

**** some info before we begin

[EMAIL PROTECTED]:~# lvscan
  ACTIVE            '/dev/vg_sys/lv_tmp' [4.00 GB] inherit
  ACTIVE            '/dev/vg_sys/lv_swap' [4.00 GB] inherit
  ACTIVE            '/dev/vg_sys/lv_var' [1.00 GB] inherit
  ACTIVE            '/dev/vg_sys/lv_usr' [1.00 GB] inherit
  inactive Original '/dev/vg_sys/lv_vmware' [52.00 GB] contiguous
  inactive Snapshot '/dev/vg_sys/lv_vmware_snap' [5.00 GB] inherit

[EMAIL PROTECTED]:~# pvscan
  PV /dev/md2   VG vg_sys   lvm2 [67.33 GB / 340.00 MB free]
  Total: 1 [67.33 GB] / in use: 1 [67.33 GB] / in no VG: 0 [0   ]

[EMAIL PROTECTED]:~# cat /proc/mdstat
Personalities : [raid1]
md2 : active raid1 sda3[0] sdb3[1]
      70605568 blocks [2/2] [UU]
      [=====>...............]  resync = 26.8% (18967232/70605568) finish=18.0min speed=47688K/sec

md1 : active raid1 sda2[0] sdb2[1]
      979840 blocks [2/2] [UU]

md0 : active raid1 sda1[0] sdb1[1]
      96256 blocks [2/2] [UU]

unused devices: <none>

[EMAIL PROTECTED]:~# uname -a
Linux anvil 2.6.15-27-server #1 SMP Sat Sep 16 02:57:21 UTC 2006 i686 GNU/Linux

[EMAIL PROTECTED]:~# ls /dev/mapper/
control  vg_sys-lv_swap  vg_sys-lv_tmp  vg_sys-lv_usr  vg_sys-lv_var  vg_sys-lv_vmware  vg_sys-lv_vmware-real

[EMAIL PROTECTED]:~# ls /dev/vg_sys/
lv_swap  lv_tmp  lv_usr  lv_var

** let's repair the system

* create some missing device nodes
[EMAIL PROTECTED]:~# vgmknodes

* fix up the device mapper mess
[EMAIL PROTECTED]:~# mv /dev/mapper/vg_sys-lv_vmware /dev/mapper/vg_sys-lv_vmware_snap
[EMAIL PROTECTED]:~# mv /dev/mapper/vg_sys-lv_vmware-real /dev/mapper/vg_sys-lv_vmware

* check that our fs still exists
[EMAIL PROTECTED]:~# fsck /dev/mapper/vg_sys-lv_vmware
fsck 1.38 (30-Jun-2005)
e2fsck 1.38 (30-Jun-2005)
/dev/mapper/vg_sys-lv_vmware: recovering journal
/dev/mapper/vg_sys-lv_vmware: clean, 75/6815744 files, 10826547/13631488 blocks

* remove the snapshot
[EMAIL PROTECTED]:~# lvremove /dev/vg_sys/lv_vmware_snap
  Logical volume "lv_vmware_snap" successfully removed

* re-enable vmware lvm partition
[EMAIL PROTECTED]:~# vi /etc/fstab
[EMAIL PROTECTED]:~# touch /forcefsck
[EMAIL PROTECTED]:~# reboot
** system fscks, and boots normally.
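The walkthrough above can be collapsed into a single script for future incidents. The commands and their order are exactly those from the steps above; the RECOVER guard is an addition of mine so nothing destructive runs by accident, and it assumes the stale snapshot left the /dev/mapper entries swapped in the same way.

```shell
#!/bin/sh
# Condensed form of the manual recovery above. Set RECOVER=1 to actually
# execute the steps; otherwise each one is only printed for review.
step() { if [ "$RECOVER" = 1 ]; then "$@"; else echo "would run: $*"; fi; }

step vgmknodes                                        # recreate missing /dev/vg_sys nodes
step mv /dev/mapper/vg_sys-lv_vmware /dev/mapper/vg_sys-lv_vmware_snap
step mv /dev/mapper/vg_sys-lv_vmware-real /dev/mapper/vg_sys-lv_vmware
step fsck /dev/mapper/vg_sys-lv_vmware                # check the fs survived
step lvremove /dev/vg_sys/lv_vmware_snap              # drop the stale snapshot
step touch /forcefsck                                 # force a full fsck on next boot
```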

** Affects: lvm2 (Ubuntu)
     Importance: Undecided
         Status: Unconfirmed

-- 
LVM Snapshot removal causes intermittent kernel oops
https://launchpad.net/bugs/71567

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
