------- Comment From ma...@de.ibm.com 2016-11-11 08:24 EDT-------
(In reply to comment #2)
> (In reply to comment #1)
> The installation was on FCP SCSI SAN volumes, each with two active paths.
> Multipath was involved. The system IPLed fine up to the point that we
> expanded the /root filesystem to span volumes. At boot time, the system
> was unable to locate the second segment of the /root filesystem. The
> error message indicated this was due to lvmetad not being active.
For the zfcp case, did you use the chzdev tool to activate the paths of
your new additional LVM physical volume (PV)? This is the only supported
post-install method to (dynamically and) persistently activate
zfcp-attached FCP LUNs. See also
http://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.ludd/ludd_t_fcp_wrk_addu.html.

> PV Volume information:
> physical_volumes {
>
>     pv0 {
>         device = "/dev/sdb5"   # Hint only
>
>     pv1 {
>         device = "/dev/sda"    # Hint only

This does not look good: LVM refers to single-path SCSI disk devices.
With zfcp-attached SCSI disks, LVM must be layered on top of
multipathing. Could you please double-check that your installation
layers LVM on top of multipathing correctly? If not, that would be an
independent bug. See also [1, slide 28 "Multipathing for Disks - LVM on
Top"].

> Additional testing has been done with CKD volumes and we see the same
> behavior.
> Because of this behavior, I do not
> believe the problem is related to SAN disk or multipath. I think it is
> due to the system not being able to read the UUID on any PV in the VG
> other than the IPL disk.

For any disk device type, the initrd must contain all the information
needed to enable/activate every path of the entire block device
dependency tree required to mount the root file system. An example of
such a dependency tree is in [1, slide 37], and that example is
independent of any particular Linux distribution. I don't know how much
automatic dependency tracking Ubuntu does for the user, especially
regarding the additional z-specific device activation steps ("setting
online", as for DASD or zFCP). The user may have to take care of the
dependency tree themselves and ensure the necessary information lands in
the initrd.

Once the dependency tree of the root-fs has changed (such as adding a PV
to the LVM volume group containing the root-fs, as in your case), you
must re-create the initrd with the following command before any reboot:

$ update-initramfs -u

On z Systems, this also performs the necessary step of re-writing the
boot record (using the zipl bootloader management tool) so that it
correctly points to the new initrd. See also
http://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.ludd/ludd_t_fcp_wrk_on.html.
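To illustrate the complete sequence, here is a minimal sketch of the
post-install steps described above, assuming the root VG is "ub01-vg" as
in your output. The target port WWPN (written as 0x<wwpn> below) and the
multipath alias "mpatha" are placeholders; substitute your actual
values:

# 1. Persistently activate both paths to the new FCP LUN with chzdev
#    (argument format: <FCP device>:<target WWPN>:<FCP LUN>):
chzdev -e zfcp-lun 0.0.e100:0x<wwpn>:0x4000400e00000000
chzdev -e zfcp-lun 0.0.e300:0x<wwpn>:0x4000400e00000000

# 2. Verify the layering: the PV must sit on the dm-multipath device,
#    never on a single-path /dev/sdX node:
multipath -ll
lsblk

# 3. Extend the root VG/LV using the multipath device as the new PV:
pvcreate /dev/mapper/mpatha
vgextend ub01-vg /dev/mapper/mpatha
lvextend -l +100%FREE --resizefs /dev/ub01-vg/root

# 4. Re-create the initrd (on z Systems this also re-runs zipl)
#    BEFORE any reboot:
update-initramfs -u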
In your case, on reboot it only activated 2 paths to FCP LUN
0x4000400e00000000 (I cannot determine the target port WWPN(s) from the
output below because it does not convey this info) from two different
FCP devices, 0.0.e300 and 0.0.e100.

From attachment 113696:

[    6.666977] scsi host0: zfcp
[    6.671670] random: nonblocking pool is initialized
[    6.672622] qdio: 0.0.e300 ZFCP on SC 2cc5 using AI:1 QEBSM:0 PRI:1 TDD:1 SIGA: W AP
[    6.722312] scsi host1: zfcp
[    6.724547] scsi 0:0:0:1074675712: Direct-Access     IBM      2107900          1.69 PQ: 0 ANSI: 5
[    6.725159] sd 0:0:0:1074675712: alua: supports implicit TPGS
[    6.725164] sd 0:0:0:1074675712: alua: device naa.6005076306ffd700000000000000000e port group 0 rel port 303
[    6.725287] sd 0:0:0:1074675712: Attached scsi generic sg0 type 0
[    6.728234] qdio: 0.0.e100 ZFCP on SC 2c85 using AI:1 QEBSM:0 PRI:1 TDD:1 SIGA: W AP
[    6.747662] sd 0:0:0:1074675712: alua: transition timeout set to 60 seconds
[    6.747667] sd 0:0:0:1074675712: alua: port group 00 state A preferred supports tolusnA
[    6.747801] sd 0:0:0:1074675712: [sda] 209715200 512-byte logical blocks: (107 GB/100 GiB)
[    6.748652] sd 0:0:0:1074675712: [sda] Write Protect is off
[    6.749024] sd 0:0:0:1074675712: [sda] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    6.752076] sda: sda1 sda2 < sda5 >
[    6.754107] sd 0:0:0:1074675712: [sda] Attached SCSI disk
[    6.760935] scsi 1:0:0:1074675712: Direct-Access     IBM      2107900          1.69 PQ: 0 ANSI: 5
[    6.761444] sd 1:0:0:1074675712: alua: supports implicit TPGS
[    6.761448] sd 1:0:0:1074675712: alua: device naa.6005076306ffd700000000000000000e port group 0 rel port 231
[    6.761514] sd 1:0:0:1074675712: Attached scsi generic sg1 type 0
[    6.787710] sd 1:0:0:1074675712: [sdb] 209715200 512-byte logical blocks: (107 GB/100 GiB)
[    6.787770] sd 1:0:0:1074675712: alua: port group 00 state A preferred supports tolusnA
[    6.788464] sd 1:0:0:1074675712: [sdb] Write Protect is off
[    6.788728] sd 1:0:0:1074675712: [sdb] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
[    6.790829] sdb: sdb1 sdb2 < sdb5 >
[    6.792535] sd 1:0:0:1074675712: [sdb] Attached SCSI disk

REFERENCE

[1] http://www-05.ibm.com/de/events/linux-on-z/pdf/day2/4_Steffen_Maier_zfcp-best-practices-2015.pdf

** Tags removed: bugnameltc-148452 severity-critical

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1641078

Title:
  System cannot be booted up when root filesystem is on an LVM on two
  disks

Status in linux package in Ubuntu:
  New

Bug description:
  ---Problem Description---
  LVMed root file system spanning multiple disks cannot be booted up

  ---uname output---
  Linux ntc170 4.4.0-38-generic #57-Ubuntu SMP Tue Sep 6 15:47:15 UTC 2016 s390x s390x s390x GNU/Linux

  ---Patches Installed---
  n/a

  Machine Type = z13

  ---System Hang---
  cannot boot up the system after shutdown or reboot

  ---Debugger---
  A debugger is not configured

  ---Steps to Reproduce---
  Created the root file system on an LVM logical volume that crosses two
  disks. After shutting down or rebooting, the system cannot come up.

  Stack trace output: no
  Oops output: no
  System Dump Info: The system is not configured to capture a system dump.

  Device driver error code:
  Begin: Mounting root file system ... Begin: Running /scripts/local-top ...
  lvmetad is not active yet, using direct activation during sysinit
  Couldn't find device with uuid 7PC3sg-i5Dc-iSqq-AvU1-XYv2-M90B-M0kO8V.

  -Attach sysctl -a output to the bug.

  More detailed installation description:
  The installation was on FCP SCSI SAN volumes, each with two active
  paths. Multipath was involved. The system IPLed fine up to the point
  that we expanded the /root filesystem to span volumes.
  At boot time, the system was unable to locate the second segment of
  the /root filesystem. The error message indicated this was due to
  lvmetad not being active.

  Error message:
  Begin: Running /scripts/local-block ... lvmetad is not active yet,
  using direct activation during sysinit
  Couldn't find device with uuid 7PC3sg-i5Dc-iSqq-AvU1-XYv2-M90B-M0kO8V
  Failed to find logical volume "ub01-vg/root"

  PV Volume information:

  physical_volumes {

      pv0 {
          id = "L2qixM-SKkF-rQsp-ddao-gagl-LwKV-7Bw1Dz"
          device = "/dev/sdb5"  # Hint only
          status = ["ALLOCATABLE"]
          flags = []
          dev_size = 208713728  # 99.5225 Gigabytes
          pe_start = 2048
          pe_count = 25477      # 99.5195 Gigabytes
      }

      pv1 {
          id = "7PC3sg-i5Dc-iSqq-AvU1-XYv2-M90B-M0kO8V"
          device = "/dev/sda"   # Hint only
          status = ["ALLOCATABLE"]
          flags = []
          dev_size = 209715200  # 100 Gigabytes
          pe_start = 2048
          pe_count = 25599      # 99.9961 Gigabytes
      }
  }

  LV Volume Information:

  logical_volumes {

      root {
          id = "qWuZeJ-Libv-DrEs-9b1a-p0QF-2Fj0-qgGsL8"
          status = ["READ", "WRITE", "VISIBLE"]
          flags = []
          creation_host = "ub01"
          creation_time = 1477515033  # 2016-10-26 16:50:33 -0400
          segment_count = 2

          segment1 {
              start_extent = 0
              extent_count = 921    # 3.59766 Gigabytes
              type = "striped"
              stripe_count = 1      # linear
              stripes = [
                  "pv0", 0
              ]
          }
          segment2 {
              start_extent = 921
              extent_count = 25344  # 99 Gigabytes
              type = "striped"
              stripe_count = 1      # linear
              stripes = [
                  "pv1", 0
              ]
          }
      }
  }

  Additional testing has been done with CKD volumes and we see the same
  behavior. Only the UUID of the first volume in the VG can be located
  at boot, and the same message:

  lvmetad is not active yet, using direct activation during sysinit
  Couldn't find device with uuid xxxxxxxxxxxxxxxxx

  is displayed for CKD disks, just with a different UUID listed. If the
  /root file system only has one segment on the first volume, whether
  CKD or SCSI, the system will IPL.

  Because of this behavior, I do not believe the problem is related to
  SAN disk or multipath. I think it is due to the system not being able
  to read the UUID on any PV in the VG other than the IPL disk.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1641078/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp