Public bug reported:

[Impact]

 * Users with disks/LUNs previously used for AIX operating system
   installations, whose partition table may have been overwritten or
   corrupted, might hit kernel failures during a partition scan of
   such a disk/LUN, possibly hanging the system (seen with retries).

 * The Linux kernel should be robust to corrupted disk data, performing
   better sanitization/checks instead of failing.

 * The fix consists of a couple of simple logic changes to make the
   AIX partition table parser more robust.
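As an illustration of the kind of check meant here (a minimal userspace sketch, not the actual kernel patch): a fixed-size name field read from disk is not guaranteed to be NUL-terminated, so printing it with "%s" can read past the end of the field. The sketch copies the field into a buffer one byte larger and always terminates it. DISK_NAME_LEN, copy_disk_name() and report_noncontiguous() are hypothetical names, and the 64-byte field width is an assumption.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical width of a fixed-size name field read from disk.
 * The on-disk bytes are NOT guaranteed to contain a NUL terminator. */
#define DISK_NAME_LEN 64

/*
 * Copy a fixed-width on-disk name into dst and always NUL-terminate it,
 * truncating if dst is too small. Returns the number of name bytes copied
 * (excluding the terminator).
 */
static size_t copy_disk_name(char *dst, size_t dst_size, const char *src)
{
    assert(dst_size > 0);

    /* Look for a terminator only inside the field, never beyond it. */
    const char *nul = memchr(src, '\0', DISK_NAME_LEN);
    size_t len = nul ? (size_t)(nul - src) : DISK_NAME_LEN;

    if (len > dst_size - 1)
        len = dst_size - 1;     /* leave room for the terminator */
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}

/* Print a diagnostic using a sanitized copy of the raw on-disk name. */
static void report_noncontiguous(const char *raw_name, unsigned pps_found)
{
    char name[DISK_NAME_LEN + 1];   /* one extra byte for the NUL */

    copy_disk_name(name, sizeof(name), raw_name);
    printf("partition %s (%u pp's found) is not contiguous\n",
           name, pps_found);
}
```

Sizing the local buffer as DISK_NAME_LEN + 1 means a name that fills the whole field is still printed in full, rather than silently losing its last byte.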

[Test Case]

 * Run the partition scan on the (trimmed) disk image of the AIX LUN
   (not provided here, since it contains customer data) with this
   command:

   $ sudo losetup --find --show --partscan rlv_grkgld.1mb

 * On failure, the command hangs, and messages like the following are
   printed to the console, depending on the kernel version (see tests below):

   [  270.506420] partition (null) (3 pp's found) is not contiguous

   [  270.597428] BUG: unable to handle kernel paging request at 0000000000001000
   [  270.599525] IP: [<ffffffff81379d4d>] strnlen+0xd/0x40

 * On success, the command prints a loop device name, for example:

   /dev/loop0

[Regression Potential]

 * Low. Both changes are simple improvements in logic.

 * This affects users who mount disks/LUNs from the AIX OS; it
   should only change behavior for users who relied on uninitialized
   variables to work correctly during the partition scan of those
   disks/LUNs. That should be rare, as the code is likely to fail,
   as observed in this scenario.
   
 * This has been tested on Cosmic, Bionic, Xenial, and Trusty.
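A minimal userspace sketch of guarding against the uninitialized-variable case (hypothetical names; not the actual kernel code): if per-volume metadata is allocated zeroed and marked valid only after its descriptor has actually been read, the scan can skip entries it never filled in instead of trusting leftover values. struct lv_info, its fields, alloc_lv_info() and count_complete_lvs() are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical per-logical-volume bookkeeping; names are illustrative. */
struct lv_info {
    bool valid;             /* set only after the LV descriptor was read */
    unsigned pps_per_lv;    /* physical partitions the LV should have */
    unsigned pps_found;     /* physical partitions actually seen */
};

/* Allocate bookkeeping zeroed, so unread entries default to !valid. */
static struct lv_info *alloc_lv_info(size_t n)
{
    return calloc(n, sizeof(struct lv_info));
}

/*
 * Count LVs that can be reported as complete partitions, ignoring
 * entries whose descriptor was never read.
 */
static unsigned count_complete_lvs(const struct lv_info *lvs, size_t n)
{
    unsigned complete = 0;

    assert(lvs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!lvs[i].valid || lvs[i].pps_found == 0)
            continue;       /* never trust fields that were not filled in */
        if (lvs[i].pps_found == lvs[i].pps_per_lv)
            complete++;
        else
            fprintf(stderr, "lv %zu: %u of %u pp's found, skipping\n",
                    i, lvs[i].pps_found, lvs[i].pps_per_lv);
    }
    return complete;
}
```

Zeroing the array makes "not read" the default state, so an entry that was never populated fails safe instead of feeding garbage into the partition setup.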


[Other Info]

 * Patches will be sent to the kernel-team mailing list.

Bug Description:
---------------

We recently received a disk image of an AIX LUN that, when
attached on Linux, displayed errors on the console and eventually
hung the system (especially if the SCSI bus was re-scanned,
leading to another partition scan).

Apparently, AIX was originally installed on the LUN, which was
later exercised with I/O stress/overwrites that left certain bits
wrong in just the right way for Linux to hit a NULL pointer and
invalid data.

This is the test case used ('--partscan' is the important bit):
  $ sudo losetup --show --find --partscan aix-lun.img

Since the original code is old, it affects several releases.
It's worth fixing this on 14.04 and later, the release on which
IBM Power servers were initially supported (these servers can
also run AIX, and could hit this with a previously used disk/LUN).

** Affects: linux (Ubuntu)
     Importance: Undecided
         Status: Incomplete

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1787281

Title:
  errors when scanning partition table of corrupted AIX disk