[Kernel-packages] [Bug 1641078] Re: System cannot be booted up when root filesystem is on an LVM on two disks

bugproxy Fri, 11 Nov 2016 05:41:24 -0800

------- Comment From ma...@de.ibm.com 2016-11-11 08:24 EDT-------
(In reply to comment #2)
> (In reply to comment #1)
> The installation was on FCP SCSI SAN volumes, each with two active paths.
> Multipath was involved.  The system IPLed fine up to the point that we
> expanded the /root filesystem to span volumes.  At boot time, the system
> was unable to locate the second segment of the /root filesystem.  The error
> message indicated this was due to lvmetad not being active.


For the zfcp case, did you use the chzdev tool to activate the paths of your 
new additional LVM physical volume (PV)?
This is the only supported post-install method to (dynamically and) 
persistently activate zfcp-attached FCP LUNs. See also 
http://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.ludd/ludd_t_fcp_wrk_addu.html.
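
As a rough sketch of what such an activation looks like, assuming the
zfcp-lun device ID format <FCP device bus-ID>:<target WWPN>:<FCP LUN> that
chzdev takes: the helper below only prints the commands for review instead
of running them. The function names are made up for illustration, and the
WWPN and the LUN of the new PV must be supplied by you, since neither
appears in this bug.

```shell
#!/bin/sh
# Compose a zfcp-lun device ID: <FCP device bus-ID>:<target WWPN>:<FCP LUN>.
zfcp_lun_id() {
    printf '%s:%s:%s\n' "$1" "$2" "$3"
}

# Print (do not run) one "chzdev --enable" command per FCP device,
# that is, one command per path to the LUN.
# Usage: print_activation WWPN LUN FCP_DEVICE...
print_activation() {
    wwpn=$1
    lun=$2
    shift 2
    for fcp in "$@"; do
        printf 'chzdev --enable zfcp-lun %s\n' \
            "$(zfcp_lun_id "$fcp" "$wwpn" "$lun")"
    done
}
```

Calling print_activation with your WWPN, your LUN, and the FCP devices
0.0.e300 and 0.0.e100 prints two commands, one per path. Review them and run
them as root once they match your SAN configuration.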

> PV Volume information:
> physical_volumes {
>
>                pv0 {
>                        device = "/dev/sdb5"        # Hint only

>                pv1 {
>                        device = "/dev/sda"        # Hint only

This does not look good: LVM references single-path SCSI disk devices.
With zfcp-attached SCSI disks, LVM must sit on top of multipathing. Could
you please double-check that your installation with LVM and multipathing
does the correct layering? If not, this would be an independent bug. See
also [1, slide 28 "Multipathing for Disks - LVM on Top"].
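
One hedged way to spot-check the layering is to classify each PV device by
its name. This is only a name-based heuristic, assuming multipath maps show
up as /dev/mapper/* or /dev/dm-*; other device-mapper targets use the same
names, so "multipath -ll" remains the authoritative check. pv_layering is a
hypothetical helper, not part of LVM.

```shell
#!/bin/sh
# Classify a PV device path by name only (heuristic, see the text above).
pv_layering() {
    case "$1" in
        /dev/mapper/*|/dev/dm-*) echo "device-mapper" ;;    # possibly multipath
        /dev/sd*)                echo "single-path-scsi" ;; # LVM directly on one path
        *)                       echo "other" ;;            # for example DASD
    esac
}
```

On a running system this could be fed from LVM itself:
$ pvs --noheadings -o pv_name | while read -r pv; do echo "$pv $(pv_layering "$pv")"; done
For the hints quoted above, /dev/sdb5 and /dev/sda both classify as
single-path-scsi.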

> Additional testing has been done with CKD volumes and we see the same
> behavior.
> Because of this behavior, I do not
> believe the problem is related to SAN disk or multipath.   I think it is due
> to the system not being able to read the UUID on any PV in the VG other than
> the IPL disk.

For any disk device type, the initrd must contain all the information needed 
to enable/activate all paths of the entire block device dependency tree 
required to mount the root file system. An example of a dependency tree is in 
[1, slide 37]; that example is independent of any particular Linux distribution.
I don't know how much automatic dependency tracking Ubuntu does for the user, 
especially regarding additional z-specific device activation steps ("setting 
online" as for DASD or zFCP). The user may have to take care of the dependency 
tree themselves and ensure the necessary information lands in the initrd.

Once the dependency tree of the root-fs has changed (such as adding a PV to an 
LVM containing the root-fs as in your case), you must re-create the initrd with 
the following command before any reboot:
$ update-initramfs -u

On z Systems, this command also performs the necessary step of rewriting the 
boot record (using the zipl bootloader management tool) so that it correctly 
points to the new initrd.
See also 
http://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.ludd/ludd_t_fcp_wrk_on.html.
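
After re-creating the initrd, a quick sanity check of its contents can help
before you reboot. The sketch below reads a file listing on stdin (for
example from lsinitramfs) and reports which expected path fragments are
absent. The helper name and the fragments sbin/lvm and sbin/multipath are
assumptions for illustration; finding the tools in the initrd does not by
itself prove that the device activation information is there as well.

```shell
#!/bin/sh
# Read a file listing on stdin and print "missing: <fragment>" for every
# argument that does not occur in it. Returns non-zero if any is missing.
missing_from_listing() {
    listing=$(cat)
    rc=0
    for want in "$@"; do
        if ! printf '%s\n' "$listing" | grep -qF -- "$want"; then
            printf 'missing: %s\n' "$want"
            rc=1
        fi
    done
    return $rc
}
```

Possible usage:
$ lsinitramfs /boot/initrd.img-$(uname -r) | missing_from_listing sbin/lvm sbin/multipath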

In your case on reboot, it only activated 2 paths to FCP LUN 0x4000400e00000000 
(I cannot determine the target port WWPN(s) from below output because it does 
not convey this info) from two different FCP devices 0.0.e300 and 0.0.e100.
From attachment 113696:
[    6.666977] scsi host0: zfcp
[    6.671670] random: nonblocking pool is initialized
[    6.672622] qdio: 0.0.e300 ZFCP on SC 2cc5 using AI:1 QEBSM:0 PRI:1 TDD:1 SIGA: W AP
[    6.722312] scsi host1: zfcp
[    6.724547] scsi 0:0:0:1074675712: Direct-Access     IBM      2107900          1.69 PQ: 0 ANSI: 5
[    6.725159] sd 0:0:0:1074675712: alua: supports implicit TPGS
[    6.725164] sd 0:0:0:1074675712: alua: device naa.6005076306ffd700000000000000000e port group 0 rel port 303
[    6.725287] sd 0:0:0:1074675712: Attached scsi generic sg0 type 0
[    6.728234] qdio: 0.0.e100 ZFCP on SC 2c85 using AI:1 QEBSM:0 PRI:1 TDD:1 SIGA: W AP
[    6.747662] sd 0:0:0:1074675712: alua: transition timeout set to 60 seconds
[    6.747667] sd 0:0:0:1074675712: alua: port group 00 state A preferred supports tolusnA
[    6.747801] sd 0:0:0:1074675712: [sda] 209715200 512-byte logical blocks: (107 GB/100 GiB)
[    6.748652] sd 0:0:0:1074675712: [sda] Write Protect is off
[    6.749024] sd 0:0:0:1074675712: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    6.752076]  sda: sda1 sda2 < sda5 >
[    6.754107] sd 0:0:0:1074675712: [sda] Attached SCSI disk
[    6.760935] scsi 1:0:0:1074675712: Direct-Access     IBM      2107900          1.69 PQ: 0 ANSI: 5
[    6.761444] sd 1:0:0:1074675712: alua: supports implicit TPGS
[    6.761448] sd 1:0:0:1074675712: alua: device naa.6005076306ffd700000000000000000e port group 0 rel port 231
[    6.761514] sd 1:0:0:1074675712: Attached scsi generic sg1 type 0
[    6.787710] sd 1:0:0:1074675712: [sdb] 209715200 512-byte logical blocks: (107 GB/100 GiB)
[    6.787770] sd 1:0:0:1074675712: alua: port group 00 state A preferred supports tolusnA
[    6.788464] sd 1:0:0:1074675712: [sdb] Write Protect is off
[    6.788728] sd 1:0:0:1074675712: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    6.790829]  sdb: sdb1 sdb2 < sdb5 >
[    6.792535] sd 1:0:0:1074675712: [sdb] Attached SCSI disk
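
For reference, the FCP LUN mentioned above can be derived from the decimal
LUN field of the H:C:T:L address in the log. The sketch below assumes the
usual Linux mapping, in which each 16-bit word of the 8-byte LUN occupies
successive 16-bit slices of the integer, lowest slice first. Both function
names are hypothetical helpers.

```shell
#!/bin/sh
# Read kernel log text on stdin and print "<H:C:T:L> <sd name>" for every
# "Attached SCSI disk" message.
attached_disks() {
    sed -n 's/.*sd \([0-9:]*\): \[\(sd[a-z]*\)] Attached SCSI disk.*/\1 \2/p'
}

# Convert the decimal LUN from an H:C:T:L address into the 64-bit
# hexadecimal FCP LUN notation used by zfcp.
scsi_lun_to_fcp_lun() {
    lun=$1
    printf '0x%04x%04x%04x%04x\n' \
        $(( $lun & 0xffff )) \
        $(( ($lun >> 16) & 0xffff )) \
        $(( ($lun >> 32) & 0xffff )) \
        $(( ($lun >> 48) & 0xffff ))
}
```

"scsi_lun_to_fcp_lun 1074675712" prints 0x4000400e00000000, and piping the
log above through attached_disks lists only sda and sdb, both with that same
LUN. So the two disks are two paths to one volume, and no path to a second
volume was activated.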

REFERENCE

[1] http://www-05.ibm.com/de/events/linux-on-z/pdf/day2/4_Steffen_Maier_zfcp-best-practices-2015.pdf

** Tags removed: bugnameltc-148452 severity-critical

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1641078

Title:
  System cannot be booted up when root filesystem is on an LVM on two
  disks

Status in linux package in Ubuntu:
  New

Bug description:
  ---Problem Description---
  A root file system on an LVM volume spanning multiple disks cannot be booted.
    
  ---uname output---
  Linux ntc170 4.4.0-38-generic #57-Ubuntu SMP Tue Sep 6 15:47:15 UTC 2016 s390x s390x s390x GNU/Linux
   
  ---Patches Installed---
  n/a
   
  Machine Type = z13 
   
  ---System Hang---
   The system cannot be booted after a shutdown or reboot.
   
  ---Debugger---
  A debugger is not configured
   
  ---Steps to Reproduce---
   Created a root file system on an LVM logical volume that spans two disks.
   After a shutdown or reboot, the system cannot be brought up.
   
  Stack trace output:
   no
   
  Oops output:
   no
   
  System Dump Info:
    The system is not configured to capture a system dump.
   
  Device driver error code:
   Begin: Mounting root file system ... Begin: Running /scripts/local-top ...   lvmetad is not active yet, using direct activation during sysinit
    Couldn't find device with uuid 7PC3sg-i5Dc-iSqq-AvU1-XYv2-M90B-M0kO8V. 
   
  -Attach sysctl -a output to the bug.

  More detailed installation description:

  The installation was on FCP SCSI SAN volumes, each with two active
  paths.  Multipath was involved.  The system IPLed fine up to the point
  that we expanded the /root filesystem to span volumes.  At boot time,
  the system was unable to locate the second segment of the /root
  filesystem.  The error message indicated this was due to lvmetad not
  being active.

  Error message:   
         Begin: Running /scripts/local-block ...   lvmetad is not active yet, using direct activation during sysinit
         Couldn't find device with uuid 7PC3sg-i5Dc-iSqq-AvU1-XYv2-M90B-M0kO8V 
          Failed to find logical volume "ub01-vg/root" 
          
  PV Volume information: 
  physical_volumes { 

                 pv0 { 
                         id = "L2qixM-SKkF-rQsp-ddao-gagl-LwKV-7Bw1Dz" 
                         device = "/dev/sdb5"        # Hint only 

                         status = ["ALLOCATABLE"] 
                         flags = [] 
                         dev_size = 208713728        # 99.5225 Gigabytes 
                         pe_start = 2048 
                         pe_count = 25477        # 99.5195 Gigabytes 
                 } 

                 pv1 { 
                         id = "7PC3sg-i5Dc-iSqq-AvU1-XYv2-M90B-M0kO8V" 
                         device = "/dev/sda"        # Hint only 

                         status = ["ALLOCATABLE"] 
                         flags = [] 
                         dev_size = 209715200        # 100 Gigabytes 
                         pe_start = 2048 
                         pe_count = 25599        # 99.9961 Gigabytes 

  
  LV Volume Information: 
  logical_volumes { 

                 root { 
                         id = "qWuZeJ-Libv-DrEs-9b1a-p0QF-2Fj0-qgGsL8" 
                         status = ["READ", "WRITE", "VISIBLE"] 
                         flags = [] 
                         creation_host = "ub01" 
                         creation_time = 1477515033        # 2016-10-26 16:50:33 -0400
                         segment_count = 2 

                         segment1 { 
                                 start_extent = 0 
                                 extent_count = 921        # 3.59766 Gigabytes 

                                 type = "striped" 
                                 stripe_count = 1        # linear 

                                 stripes = [ 
                                         "pv0", 0 
                                 ] 
                         } 
                         segment2 { 
                                 start_extent = 921 
                                 extent_count = 25344        # 99 Gigabytes 

                                 type = "striped" 
                                 stripe_count = 1        # linear 

                                 stripes = [ 
                                         "pv1", 0 
                                 ] 
                         } 
                 } 

  
  Additional testing has been done with CKD volumes and we see the same
  behavior.  Only the UUID of the first volume in the VG can be located at
  boot, and the same messages are displayed for CKD disks, just with a
  different UUID:
    lvmetad is not active yet, using direct activation during sysinit
    Couldn't find device with uuid xxxxxxxxxxxxxxxxx
  If the root file system has only one segment, located on the first volume,
  the system will IPL with either CKD or SCSI volumes.  Because of this
  behavior, I do not believe the problem is related to SAN disk or multipath.
  I think it is due to the system not being able to read the UUID on any PV
  in the VG other than the IPL disk.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1641078/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
