Hi there! Cc:ing Wolfgang Karall, author of the 00-wait4hdds script. I am sorry for the long email.

On Mon, 01 Oct 2012 22:37:51 +0200, Martin Michlmayr wrote:
> * Luca Capello <l...@pca.it> [2012-09-30 15:43]:
>> Unfortunately, after the installation finished successfully, the
>> machine did not reboot: the system light keeps blinking red/green
>> and network does not work.
> ...
>> I am know lost and IMHO the only way to know what is going on is
>> through the serial console, something I could probably do in the
>> second week of October (the machine is not with me).
>
> I believe you ran into this issue:
> http://comments.gmane.org/gmane.linux.ide/47799
[...]
> http://forum.qnap.com/viewtopic.php?p=284721#p284592
>
> Once you get serial console access, it would be great if you could
> confirm if it's this issue.

The serial console output pointed me to an mdadm problem:

--8<---------------cut here---------------start------------->8---
[ 5.437504] sd 1:0:0:0: [sdb] Attached SCSI disk
[ 5.453383] sd 0:0:0:0: [sda] Attached SCSI disk
[ 5.465525] sd 2:0:0:0: [sdc] Attached SCSI disk
[ 5.478253] sd 3:0:0:0: [sdd] Attached SCSI disk
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... Begin: Loading[ 5.95505)
[ 5.969412] xor: measuring software checksum speed
[ 6.016317] arm4regs : 429.200 MB/sec
[ 6.066322] 8regs : 377.200 MB/sec
[ 6.116319] 32regs : 416.800 MB/sec
[ 6.120494] xor: using function: arm4regs (429.200 MB/sec)
[ 6.317494] raid6: int32x1 24 MB/s
[ 6.486969] raid6: int32x2 35 MB/s
[ 6.657528] raid6: int32x4 47 MB/s
[ 6.827174] raid6: int32x8 42 MB/s
[ 6.830917] raid6: using algorithm int32x4 (47 MB/s)
[ 6.962270] md: raid6 personality registered for level 6
[ 6.967632] md: raid5 personality registered for level 5
[ 6.972932] md: raid4 personality registered for level 4
Success: loaded module raid456.
done.
Begin: Assembling all MD arrays ...
[ 7.014576] mdadm: sending ioctl 1261 to a partition!
[ 7.019684] mdadm: sending ioctl 1261 to a partition!
[ 7.026122] mdadm: sending ioctl 1261 to a partition!
[ 7.031254] mdadm: sending ioctl 1261 to a partition!
[ 7.037407] mdadm: sending ioctl 1261 to a partition!
[ 7.042452] mdadm: sending ioctl 1261 to a partition!
[ 7.048782] mdadm: sending ioctl 1261 to a partition!
[ 7.053831] mdadm: sending ioctl 1261 to a partition!
*** glibc detected *** /sbin/mdadm: double free or corruption (out): 0x00089400 ***
Aborted
Failure: failed to assemble all arrays.
done.
[ 7.168150] device-mapper: uevent: version 1.0.3
[ 7.177894] device-mapper: ioctl: 4.15.0-ioctl (2009-04-01) initialised: dm-de...@redhat.com
Volume group "jem" not found
Skipping volume group jem
Unable to find LVM volume jem/root
Volume group "jem" not found
Skipping volume group jem
Unable to find LVM volume jem/swap
done.
Begin: Waiting for root file system ... done.
Gave up waiting for root device. Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait lnong enough?)
   - Check root= (did the system wait for the right device?)
 - Missing modules (cat /proc/modules; ls /dev)
ALERT! /dev/mapper/jem-root does not exist. Dropping to a shell!

BusyBox v1.17.1 (Debian 1:1.17.1-8) built-in shell (ash)
Enter 'help' for a list of built-in commands.

/bin/sh: can't access tty; job control turned off
(initramfs)
--8<---------------cut here---------------end--------------->8---

I flashed d-i again and upgraded to 6.0.6, which brought:

=====
commit d0adc2b4c63970093c8fc89b35d414313bbd23c5
Author: root <r...@jem.pca.it>
Date:   Mon Oct 8 18:25:08 2012 +0200

    committing changes in /etc after apt run

    Package changes:
    -base-files 6.0squeeze5
    +base-files 6.0squeeze6
    -debian-archive-keyring 2010.08.28
    +debian-archive-keyring 2010.08.28+squeeze1
    -dpkg 1.15.8.12
    +dpkg 1.15.8.13
    -libc-bin 2.11.3-3
    -libc6 2.11.3-3
    +libc-bin 2.11.3-4
    +libc6 2.11.3-4
    -libgc1c2 1:6.8-1.2
    +libgc1c2 1:6.8-2
    -linux-base 2.6.32-45
    +linux-base 2.6.32-46
    -linux-image-2.6.32-5-orion5x 2.6.32-45
    -locales 2.11.3-3
    +linux-image-2.6.32-5-orion5x 2.6.32-46
    +locales 2.11.3-4
=====

At reboot, however, mdadm still segfaults...

Going back to d-i, I hit another segfault while chrooting to install
mdadm_3.2.5-3~bpo60+1:

=====
~ # mkdir /target
~ # mount /dev/mapper/jem-root /target/
~ # mount /dev /target/dev -o bind
~ # mount /sys /target/sys -o bind
~ # mount /proc /target/proc -o bind
~ # chroot /target/ /bin/bash
/bin/bash: symbol lookup error: /lib/libc.so.6: undefined symbol: , version GLIBC_2.4
~ # chroot /target /bin/bash
root@debian:/# ls
Segmentation fault
root@debian:/# ls
bin   dev  home  lost+found  mnt  proc  sbin     srv  tmp  var
boot  etc  lib   media       opt  root  selinux  sys  usr
root@debian:/#
=====

At the next reboot mdadm no longer segfaults, so at least this problem
is fixed in 3.2.5-3~bpo60+1; it could be related to:

  <http://bugs.debian.org/621786>

However, the array is still not assembled:

--8<---------------cut here---------------start------------->8---
[ 5.444198] sd 1:0:0:0: [sdb] Attached SCSI disk
[ 5.467753] sd 2:0:0:0: [sdc] Attached SCSI disk
[ 5.472906] sd 0:0:0:0: [sda] Attached SCSI disk
[ 5.484032] sd 3:0:0:0: [sdd] Attached SCSI disk
Begin: Loading essential drivers ... done.
Begin: Running /scripts/init-premount ... done.
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... Begin: Assembling all MD a!
[ 6.003614] mdadm: sending ioctl 800c0910 to a partition!
[ 6.009143] mdadm: sending ioctl 1261 to a partition!
[ 6.014188] mdadm: sending ioctl 1261 to a partition!
[ 6.020870] mdadm: sending ioctl 800c0910 to a partition!
[ 6.026383] mdadm: sending ioctl 800c0910 to a partition!
[ 6.031806] mdadm: sending ioctl 1261 to a partition!
[ 6.036977] mdadm: sending ioctl 1261 to a partition!
[ 6.043398] mdadm: sending ioctl 800c0910 to a partition!
[ 6.048869] mdadm: sending ioctl 800c0910 to a partition!
Failure: failed to assemble all arrays.
done.
[ 6.169818] device-mapper: uevent: version 1.0.3
[ 6.179502] device-mapper: ioctl: 4.15.0-ioctl (2009-04-01) initialised: dm-de...@redhat.com
Volume group "jem" not found
Skipping volume group jem
Unable to find LVM volume jem/root
--8<---------------cut here---------------end--------------->8---

I thus added Wolfgang's script, again experiencing random segfaults
(and mdadm keeps segfaulting in d-i):

=====
~ # mkdir /target
~ # mount /dev/mapper/jem-root /target
~ # mount /dev /target/dev -o bind
~ # mount /sys /target/sys -o bind
~ # mount /proc /target/proc -o bind
~ # chroot /target /bin/bash
root@debian:/# find /etc/initramfs-tools/scripts/
Segmentation fault
root@debian:/# find /etc/initramfs-tools/scripts/
/etc/initramfs-tools/scripts/
/etc/initramfs-tools/scripts/local-bottom
/etc/initramfs-tools/scripts/local-top
/etc/initramfs-tools/scripts/nfs-top
/etc/initramfs-tools/scripts/init-top
/etc/initramfs-tools/scripts/init-top/00-wait4hdds
/etc/initramfs-tools/scripts/init-premount
/etc/initramfs-tools/scripts/local-premount
/etc/initramfs-tools/scripts/nfs-bottom
/etc/initramfs-tools/scripts/nfs-premount
/etc/initramfs-tools/scripts/init-bottom
root@debian:/# mkdir test
root@debian:/# cd test/
root@debian:/test# zcat /boot/initrd.img-2.6.32-5-orion5x | cpio -i
15814 blocks
root@debian:/test# cat etc/mdadm/mdadm.conf
DEVICE partitions
HOMEHOST <system>
ARRAY /dev/md/0 metadata=1.2 UUID=a79d76f0:d98fecfb:5375bcd6:5fd506f0 name=jem:0
root@debian:/test# cat scripts/init-top/00-wait4hdds
#!/bin/sh
#
# thanks to Wolfgang Karall <lists+debian-secur...@karall-edv.at>
#  <http://bugs.debian.org/689221>
#  <http://comments.gmane.org/gmane.linux.ide/47799>
#  <http://forum.qnap.com/viewtopic.php?p=284721#p284592>

PREREQ=""

prereqs()
{
        echo "$PREREQ"
}

case $1 in
# get pre-requisites
prereqs)
        prereqs
        exit 0
        ;;
esac

# wait 30 seconds for HDDs to spin up
echo -n "START, waiting for HDDs"
max=35
i=0
while [ "$i" -lt "$max" ]; do
        sleep 1
        i=$((i+1))
        echo -n ", $i"
done
echo ", DONE."
root@debian:/test# [reboot]
=====

Nothing changed, so it does not seem to be the same disk spin-up issue
Wolfgang experienced.

One last try, the backports kernel:

  initramfs-tools_0.99~bpo60+1
  linux-base_3.4~bpo60+1
  linux-image-3.2.0-0.bpo.3-orion5x_3.2.23-1~bpo60+2

All the mdadm errors are gone, but the VG is still not found: is this
thus an LVM problem? I find it very strange that within d-i everything
seems to work, though.

Anyway, enough for today; I will have access to the machine again on
the night between Sunday 14th and Monday 15th.

> I've no idea regarding the DHCP issue you mentioned.

It now actually works from time to time, so it could be due to the
network here (two DHCP servers and one not completely configurable).

Thx, bye,
Gismo / Luca
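
As a side note on the 00-wait4hdds script quoted above: its loop always sleeps for the full count, whether or not the disks are already there. A minimal sketch of a polling variant follows, assuming the four disks show up as /dev/sda to /dev/sdd as in the console logs. `wait_for_paths` is a hypothetical helper name, not something shipped by initramfs-tools. Since the logs show the disks attached at about 5.4 s, before mdadm runs, this would most likely not change the outcome here; it only avoids waiting longer than needed.

```shell
#!/bin/sh
# Hypothetical helper (not part of initramfs-tools): wait until every
# given path exists, or give up after a number of seconds.
# Usage: wait_for_paths SECONDS PATH...
# Returns 0 once all paths exist, 1 if the limit is reached first.
wait_for_paths() {
    wfp_limit=$1
    shift
    wfp_elapsed=0
    while :; do
        wfp_missing=0
        for wfp_path in "$@"; do
            # -e is true for any existing file, including device nodes
            [ -e "$wfp_path" ] || wfp_missing=1
        done
        [ "$wfp_missing" -eq 0 ] && return 0
        [ "$wfp_elapsed" -ge "$wfp_limit" ] && return 1
        sleep 1
        wfp_elapsed=$((wfp_elapsed + 1))
    done
}
```

In the script body, the fixed loop could then become something like `wait_for_paths 35 /dev/sda /dev/sdb /dev/sdc /dev/sdd || echo "HDDs still missing"`, where the device names and the 35 second limit are taken from the logs and the script above and would need adjusting on other hardware.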
serial-console_2.6.32-45.log.gz
Description: Binary data
serial-console_2.6.32-46_00-wait4hdds.log.gz
Description: Binary data
serial-console_3.2.23-1_bpo60+2_00-wait4hdds.log.gz
Description: Binary data