Test on Xenial

Before:

Last login: Wed Aug 15 19:29:38 2018 from 192.168.122.1
$ uname -a
Linux bionic 4.4.0-133-generic #159-Ubuntu SMP Fri Aug 10 07:31:43 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ echo 9 | sudo tee /proc/sys/kernel/printk
9
$ sudo losetup --find --show --partscan rlv_grkgld.1mb
<hung>
[  309.251832] partition (null) (21 pp's found) is not contiguous
[  309.254557] partition (null) (2 pp's found) is not contiguous              
[  309.256893] partition (null) (1 pp's found) is not contiguous
...
[  309.335870] partition (null) (2 pp's found) is not contiguous
[  309.338133] partition (null) (64 pp's found) is not contiguous
[  309.339719] partition (null) (1 pp's found) is not contiguous
[  309.345218] BUG: unable to handle kernel paging request at 0000000000001040
[  309.347776] IP: [<ffffffff8140bb29>] strnlen+0x9/0x40
[  309.349813] PGD 0
[  309.350719] Oops: 0000 [#1] SMP
[  309.351987] Modules linked in: isofs kvm_intel kvm irqbypass input_leds joydev serio_raw sch_fq_codel ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear psmouse floppy
[  309.367248] CPU: 1 PID: 979 Comm: losetup Not tainted 4.4.0-133-generic #159-Ubuntu
[  309.369461] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1ubuntu1 04/01/2014
[  309.372403] task: ffff880039a78000 ti: ffff88003b2f0000 task.ti: ffff88003b2f0000
[  309.375267] RIP: 0010:[<ffffffff8140bb29>]  [<ffffffff8140bb29>] strnlen+0x9/0x40
[  309.377806] RSP: 0018:ffff88003b2f3830  EFLAGS: 00010086
[  309.379278] RAX: ffffffff81cd2b01 RBX: ffffffff8211eb6c RCX: 0000000000000000
[  309.381360] RDX: 0000000000001040 RSI: ffffffffffffffff RDI: 0000000000001040
[  309.384074] RBP: ffff88003b2f3830 R08: 000000000000ffff R09: 000000000000ffff
[  309.385894] R10: ffff88003b2f39e8 R11: 0000000000000251 R12: 0000000000001040
[  309.387906] R13: ffffffff8211ef40 R14: 00000000ffffffff R15: 0000000000000000
[  309.389937] FS:  00007f7e0163c740(0000) GS:ffff88003fc80000(0000) knlGS:0000000000000000
[  309.392496] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  309.394040] CR2: 0000000000001040 CR3: 0000000039a48000 CR4: 0000000000000670
[  309.396505] Stack:
[  309.397306]  ffff88003b2f3868 ffffffff8140d8cb ffffffff8211eb6c ffffffff8211ef40
[  309.400370]  ffff88003b2f3980 ffffffff81d00816 ffffffff81d00816 ffff88003b2f38d8
[  309.403210]  ffffffff8140f385 0000000000000092 0000000000000092 000000000000000c
[  309.406231] Call Trace:
[  309.407114]  [<ffffffff8140d8cb>] string.isra.4+0x3b/0xd0
[  309.408818]  [<ffffffff8140f385>] vsnprintf+0x135/0x530
[  309.410869]  [<ffffffff8140f78d>] vscnprintf+0xd/0x30
[  309.412283]  [<ffffffff810dd5e3>] vprintk_emit+0x123/0x520
[  309.413983]  [<ffffffff810ddb69>] vprintk_default+0x29/0x40
[  309.415495]  [<ffffffff811933c0>] printk+0x5a/0x76
[  309.417463]  [<ffffffff813e9571>] aix_partition+0x5d1/0x600
[  309.419595]  [<ffffffff811956ad>] ? pagecache_get_page+0x2d/0x1c0
[  309.421218]  [<ffffffff813ecefb>] msdos_partition+0x86b/0x880
[  309.423173]  [<ffffffff81252ef0>] ? blkdev_readpages+0x20/0x20
[  309.424963]  [<ffffffff81195a39>] ? read_cache_page+0x19/0x20
[  309.426515]  [<ffffffff813e6ced>] ? read_dev_sector+0x2d/0x90
[  309.428280]  [<ffffffff8140f819>] ? snprintf+0x49/0x60
[  309.429707]  [<ffffffff813ec690>] ? parse_solaris_x86+0x210/0x210
[  309.432081]  [<ffffffff813e80d0>] check_partition+0x140/0x220
[  309.433845]  [<ffffffff813e75cb>] rescan_partitions+0xbb/0x2b0
[  309.435421]  [<ffffffff8135118e>] ? security_capable+0x4e/0x70
[  309.437006]  [<ffffffff813e2545>] __blkdev_reread_part+0x65/0x70
[  309.438598]  [<ffffffff813e2573>] blkdev_reread_part+0x23/0x40
[  309.440173]  [<ffffffff81591a98>] loop_reread_partitions+0x28/0x50
[  309.442811]  [<ffffffff81591e5a>] loop_set_status+0x39a/0x3d0
[  309.444381]  [<ffffffff81592150>] loop_set_status64+0x50/0x70
[  309.446146]  [<ffffffff815934d1>] lo_ioctl+0xf1/0x8b0        
[  309.447547]  [<ffffffff813e2dde>] blkdev_ioctl+0x25e/0x910    
[  309.449056]  [<ffffffff81235663>] ? __fd_install+0x33/0xe0   
[  309.451395]  [<ffffffff81253a7d>] block_ioctl+0x3d/0x50          
[  309.453275]  [<ffffffff8122aaff>] do_vfs_ioctl+0x2af/0x4b0    
[  309.455019]  [<ffffffff812267e4>] ? putname+0x54/0x60         
[  309.456434]  [<ffffffff81215d0f>] ? do_sys_open+0x1bf/0x2a0  
[  309.457943]  [<ffffffff8122ad79>] SyS_ioctl+0x79/0x90        
[  309.459886]  [<ffffffff8185424e>] entry_SYSCALL_64_fastpath+0x22/0xc1
[  309.462722] Code: 00 00 80 3f 00 55 48 89 e5 74 11 48 89 f8 48 83 c0 01 80 38 00 75 f7 48 29 f8 5d c3 31 c0 5d c3 66 90 55 48 85 f6 48 89 e5 74 2e <80> 3f 00 74 29 48 8d 57 01 48 8d 04 37 eb 0d 48 8d 4a 01 80 79
[  309.475971] RIP  [<ffffffff8140bb29>] strnlen+0x9/0x40
[  309.477537]  RSP <ffff88003b2f3830>
[  309.478624] CR2: 0000000000001040
[  309.479703] ---[ end trace 4fe99281b613609c ]---


After:

$ uname -a
Linux bionic 4.4.0-133-generic #159+sf181954.1 SMP Wed Aug 15 14:18:05 -03 2018 x86_64 x86_64 x86_64 GNU/Linux
$ echo 9 | sudo tee /proc/sys/kernel/printk
9
$ sudo losetup --find --show --partscan rlv_grkgld.1mb
/dev/loop0

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1787281

Title:
  errors when scanning partition table of corrupted AIX disk

Status in linux package in Ubuntu:
  Incomplete

Bug description:
  [Impact]

   * Users with disks/LUNs previously used for AIX operating system
     installations, whose partition table has possibly been
     overwritten/corrupted, might hit kernel failures during the
     partition scan of such a disk/LUN, and possibly hang the system
     (seen with retries).

   * The Linux kernel should be robust to corrupted disk data, performing
     better sanitization/checks instead of failing.

   * The fix is a couple of simple logic changes to make the code
     of the AIX partition table parser more robust.

  [Test Case]

   * Run the partition scan on the (trimmed) disk image of the AIX LUN
     (not provided here since it contains customer data) with this
     command:

     $ sudo losetup --find --show --partscan rlv_grkgld.1mb

   * On failure, the command hangs, and messages like these are printed
     to the console, depending on the kernel version (see tests below):

     [  270.506420] partition (null) (3 pp's found) is not contiguous

     [  270.597428] BUG: unable to handle kernel paging request at 0000000000001000
     [  270.599525] IP: [<ffffffff81379d4d>] strnlen+0xd/0x40

   * On success, the command prints a loop device name, for example:

     /dev/loop0
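
The pass/fail criteria above could be checked mechanically. The following is
only a minimal sketch and is not part of the original bug report: the function
name classify_partscan is hypothetical, and it inspects nothing but the text
passed to it (the losetup output and a kernel log excerpt), so it can be tried
without the customer image.

```shell
#!/bin/sh
# Hypothetical helper (not from the bug report): classify one run of the
# partition-scan test case.
#   $1: text printed by losetup (empty if the command hung or failed)
#   $2: kernel log text captured after the attempt
classify_partscan() {
    losetup_out=$1
    kernel_log=$2

    # A kernel oops in the log means failure, whatever losetup printed.
    case $kernel_log in
        *"BUG: unable to handle kernel paging request"*)
            echo FAIL
            return 1
            ;;
    esac

    # Success: losetup printed a loop device name such as /dev/loop0.
    case $losetup_out in
        /dev/loop[0-9]*)
            echo PASS
            return 0
            ;;
    esac

    # Anything else (for example, no output at all) counts as failure.
    echo FAIL
    return 1
}

# Possible usage on a test machine (assumes the image file is present):
#   out=$(sudo losetup --find --show --partscan rlv_grkgld.1mb)
#   classify_partscan "$out" "$(dmesg | tail -n 200)"
```

Since the failing command hangs, the usage shown in the comments may never
return on an affected kernel; in that case the console messages remain the
reliable indicator.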
   
  [Regression Potential] 

   * Low. Both changes are simple improvements in logic.

   * This affects users who mount disks/LUNs from the AIX OS;
     it should only change behavior for users who relied on
     uninitialized variables to work correctly during the partition
     scan of those disks/LUNs, which should be rare, as the code
     is likely to fail as observed in this scenario.
     
   * This has been tested on Cosmic, Bionic, Xenial, and Trusty.

  
  [Other Info]

   * Patches will be sent to the kernel-team mailing list.

  Bug Description:
  ---------------

  We've recently received a disk image from an AIX LUN that, when
  attached on Linux, displayed errors on the console and eventually
  hung the system (especially if the SCSI bus was re-scanned,
  leading to another partition scan).

  Apparently the LUN was originally installed with AIX and later
  exercised with some I/O stress/overwrites which caused certain
  bits to be wrong in just the right way for Linux to get a NULL
  pointer and invalid data.

  This is the test-case used ('--partscan' is the important bit).
    $ sudo losetup --show --find --partscan aix-lun.img

  Since the original code is old, it affects several releases.
  It is worth fixing this on 14.04 and up, the releases on which IBM
  Power servers were initially supported (since they can run
  AIX too, and possibly hit this due to an already used disk/LUN).

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1787281/+subscriptions

