** Description changed:

  Re-attaching parts of an array that have been running degraded
  separately and contain the same amount of changes, but conflicting
  ones, results in the assembly of a corrupt array.

  ----

  Using the latest beta-2 server ISO and following
  http://testcases.qa.ubuntu.com/Install/ServerRAID1

  Booting an out-of-sync RAID1 array fails with ext3: it comes up as
  synced, but is corrupted. (According to comment #18: ext3 vs ext4
  seems to be mere happenstance.)

  Steps to reproduce:
  1. In a kvm virtual machine, using 2 virtio qcow2 disks each 1768M in
     size, 768M RAM and 2 VCPUs, create the md devices in the installer:
       /dev/md0: 1.5G, ext3, /
       /dev/md1: ~350M, swap
     Choose to boot in degraded mode. All other installer options are
     defaults.
  2. Reboot into the Lucid install and check /proc/mdstat:
     ok, both disks show up and are in sync.
  3. Shut down the VM, remove the 2nd disk, power on the VM and check
     /proc/mdstat: ok, boots degraded and mdstat shows the disk.
  4. Shut down the VM, reconnect the 2nd disk and remove the 1st disk,
     power on the VM and check /proc/mdstat: ok, boots degraded and
     mdstat shows the disk.
  5. Shut down the VM, reconnect the 1st disk (so now both disks are
     connected, but out of sync), power on the VM.

  Expected results:
  At this point it should boot degraded, with /proc/mdstat showing that
  it is syncing (recovering). This is how it works with ext4. Note that
  in the past one would have to run 'sudo mdadm -a /dev/md0
  /dev/MISSING-DEVICE' before syncing would occur. This no longer seems
  to be required.

  Actual results:
  The array comes up with both disks in the array and in sync. Sometimes
  there are error messages saying there are disk errors, and the boot
  continues to login, but root is mounted read-only and /proc/mdstat
  shows we are in sync. Sometimes fsck notices this and complains a
  *lot*:

    /dev/md0 contains a filesystem with errors
    Duplicate or bad block in use
    Multiply-claimed block(s) in inode...
    ...
    /dev/md0: File /var/log/boot.log (inode #68710, mod time
      Wed Apr  7 11:35:59 2010) has multiply-claimed block(s), shared
      with 1 file(s):
    /dev/md0: /var/log/udev (inode #69925, mod time Wed Apr  7 11:35:59 2010)
    /dev/md0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.

  The boot loops infinitely on this, because mountall reports that fsck
  terminated with status 4, then reports that '/' is a filesystem with
  errors, then tries again (and again, and again). See:
  http://iso.qa.ubuntu.com/qatracker/result/3918/286

  I filed this against 'linux'; please adjust as necessary.

  -----

- Fixing this would support safe hot-pluggable segmentation of arrays:
- (arrays are only --run degraded manually, or if required and incomplete
- during boot)
+ Fixing:
- * --incremental should stop auto re-adding "removed" members (so that
-   --remove provides a manual means to turn hot-plugging off)
- * When arrays are --run degraded, missing members should be marked
-   "failed" but not "removed".
- * Always check for conflicting "failed" states in superblocks, to
-   detect conflicting changes.
-   + always report (console and --monitor event) if conflicting changes
-     are detected
-   + require --force with --add for a manual re-sync of conflicting
-     changes (unlike with resyncing an outdated device, in this case
-     changes will get lost)
- * To facilitate inspection, --incremental should assemble array
-   components with conflicting changes into auxiliary devices with
-   mangled UUIDs (safe and easy diffing, merging, etc. even on desktop
-   level)
+ * When assembling, mdadm could check for conflicting "failed" states in
+   the superblocks of members to detect conflicting changes. On
+   conflicts, i.e. if an additional member claims an already running
+   member has failed:
+   + that member should not be added to the array
+   + report (console and --monitor event) that an alternative version
+     with conflicting changes has been detected: "mdadm: not re-adding
+     /dev/<member> to /dev/<array> because it constitutes an alternative
+     version containing conflicting changes"
+   + require and support --force with --add for manual re-syncing of
+     alternative versions (because unlike with re-syncing outdated
+     devices/versions, in this case changes will get lost).
+
+ Enhancement 1)
+ To facilitate easy inspection of alternative versions (i.e. for safe
+ and easy diffing, merging, etc.), --incremental could assemble array
+ components that contain alternative versions into temporary auxiliary
+ devices. (This would require temporarily mangling the fs UUID to
+ ensure there are no duplicates in the system.)
+
+ Enhancement 2)
+ Those who want to be able to disable hot-plugging of segments with
+ conflicting changes/alternative versions (after an incident with
+ multiple versions connected at the same time has occurred) will need
+ some additional enhancements:
+ + A way to mark some raid members (segments) as containing known
+   alternative versions, and to mark them as such when an incident
+   occurs in which they come up after another segment of the array is
+   already running degraded. (Possibly a superblock marking itself as
+   failed.)
+ + An option like
+   "AUTO -SINGLE_SEGMENTS_WITH_KNOWN_ALTERNATIVE_VERSIONS"
+   to disable hotplug support for alternative versions once they came
+   up after some other version and got marked as containing an
+   alternative version.
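The proposed assembly-time check can be sketched in shell. Two RAID1 halves that ran degraded separately each record, in their own superblock, that the *other* slot has failed; spotting that mutual "failed" claim is what distinguishes conflicting changes from a merely outdated member. The slot-state strings below are invented stand-ins for `mdadm --examine` output, the device names are hypothetical, and the check itself is this report's proposal, not current mdadm behaviour:

```shell
#!/bin/sh
# Sketch of the proposed conflicting-"failed"-states check (NOT existing
# mdadm behaviour). The views below are invented stand-ins for the slot
# states that `mdadm --examine <device>` reports, encoded "slot:state".
member0_view='0:active 1:faulty'   # disk 0 believes disk 1 failed
member1_view='0:faulty 1:active'   # disk 1 believes disk 0 failed

# state_of "<view>" <slot>  -> prints the state that view records for slot
state_of() {
    printf '%s\n' $1 | awk -F: -v slot="$2" '$1 == slot { print $2 }'
}

# Conflict: each half claims the *other* half has failed, so both ran
# degraded separately and neither is a strict ancestor of the other.
if [ "$(state_of "$member0_view" 1)" = faulty ] &&
   [ "$(state_of "$member1_view" 0)" = faulty ]; then
    # /dev/vdb1 and /dev/md0 are hypothetical names for the message only.
    echo "mdadm: not re-adding /dev/vdb1 to /dev/md0 because it" \
         "constitutes an alternative version containing conflicting changes"
fi
```

A merely outdated member would instead show an older event count and no "failed" claim against the running half, which is why re-syncing it is safe while re-syncing an alternative version loses changes.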
** Description changed:

  Re-attaching parts of an array that have been running degraded
  separately and contain the same amount of changes, but conflicting
- ones, results in the assembly of a corrupt array.
+ ones, or use a write intent bitmap, results in the assembly of a
+ corrupt array.

-- 
array with conflicting changes is assembled with data corruption/silent loss
https://bugs.launchpad.net/bugs/557429
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
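The opt-out described in Enhancement 2 would fit mdadm.conf's existing AUTO line, which already accepts +/- prefixed keywords controlling incremental auto-assembly (e.g. "AUTO +1.x -all"). A sketch, where the long keyword is the report's proposal and not an option current mdadm parses:

```
# /etc/mdadm/mdadm.conf (sketch; the last keyword is proposed, not
# current mdadm syntax)
# Keep normal hotplug assembly for v1.x-metadata arrays, but refuse to
# auto-assemble single segments previously marked as carrying a known
# alternative version:
AUTO +1.x -SINGLE_SEGMENTS_WITH_KNOWN_ALTERNATIVE_VERSIONS
```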