Thanks for your reply.

TL;DR: not a debirf issue. Maybe a kernel issue, maybe a hardware
problem. Two patches attached. The first one just helped me debug the
problem. The second one may be useful for debirf.

On Thu, Apr 12, 2018 at 08:30:23PM -0400, Daniel Kahn Gillmor wrote:
> On Thu 2018-04-12 13:28:34 +0200, Tzafrir Cohen wrote:
> > Update: a new version of the patch. It now works and supports all
> > compressors supported by busybox (I tried bzip2, gzip, lzma, lzop and
> > xz). lzma and xz fail. Others work.
>
> thanks for this work, Tzafrir! I'd be willing to incorporate this as a
> workaround, but i'm also always leery of introducing new control knobs
> in any software (we have to educate the users that they're there -- and
> we have to maintain the knobs!)
>
> I'd feel more comfortable incorporating this workaround as a temporary
> workaround if i knew that busybox was aiming to fix this. have you
> reported the problem to busybox upstream?

From what I see, the problem seems to be that the initramfs extracted by
the kernel (by the kernel, right?) is corrupted.

When I compress it with xz or lzma, I get a corrupted rootfs.cxz (as
verified by its md5sum). With the other compressors the archive (which
is larger) was extracted properly. But when I added an md5 checksum (see
attachment), I got an error about "junk in compressed archive" and the
test failed because the md5sum file was missing.

My current rootfs is roughly 100MB (xz compressed). I tried repeating
this with a minimal configuration with a Sid target system. As I needed
a Sid version to build it I used the default "minimal" configuration. I
likewise failed to boot due to a corrupt rootfs.cxz (which resulted in
missing files).

Upon further testing we managed to mount the USB device itself (by
insmod-ing the required kernel modules of the partially-extracted
image), and then compared the extracted rootfs.cxz to the original one.
The md5 checksums were different. cmp -l, however, showed that the two
files differed in only 48 bytes.

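For anyone who wants to repeat that comparison, here is a minimal sketch of it on throwaway files. The file names and the three injected bytes are made up for the demo. On the real system the two inputs would be the original rootfs.cxz on the USB device and the copy extracted at boot.

```shell
#!/bin/sh
# Sketch only: compare two copies of a file by checksum and then byte by byte.
set -eu
workdir=$(mktemp -d)
orig="$workdir/rootfs.orig"        # hypothetical stand-in for the original
copy="$workdir/rootfs.extracted"   # hypothetical stand-in for the extracted copy

# Deterministic sample data, then a copy with three bytes overwritten.
yes abcdefg | head -c 4096 > "$orig"
cp "$orig" "$copy"
printf '\002\000\000' | dd of="$copy" bs=1 seek=1024 conv=notrunc 2>/dev/null

# Step 1: the checksums tell us whether the files differ at all.
sum_orig=$(md5sum < "$orig")
sum_copy=$(md5sum < "$copy")

# Step 2: cmp -l prints one line per differing byte (offset and both values),
# so counting its lines gives the number of damaged bytes.
diff_count=$(cmp -l "$orig" "$copy" | wc -l | tr -d ' ')

if [ "$sum_orig" != "$sum_copy" ]; then
    echo "checksums differ, $diff_count byte(s) changed:"
    cmp -l "$orig" "$copy" || true   # cmp exits non-zero when files differ
fi

rm -rf "$workdir"
```

A small, localized difference like this points at damage to a few bytes in transit rather than at a decompressor that produces wrong output throughout.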
The rootfs.cxz as extracted on the system had the following written to
it at some point roughly 3/4 of the way in (at around the 82,000,000th
byte):

  2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

When we tried to extract rootfs.cgz (from the USB device) using
busybox's gunzip and cpio, while still on the stage 1 system, there
was no problem.

So it's down to either a kernel problem or a hardware problem. My bet is
on the latter.

Kernels in question:
* linux-image-4.9.0-6-amd64 4.9.82-1+deb9u3
* linux-image-4.15.0-2-amd64 4.15.11-1

Attached patches:

* debirf-md5sum-check-for-rootfs.cxz.patch
  Create an md5 checksum of the (second-stage) rootfs.cxz at nested
  initramfs build time and check it at run time.

* debirf-run-shell-in-case-of-an-error-in-nested-init.patch
  If you have any error in the nested init, spawn a shell to see the
  error message rather than let it hide behind a lengthy kernel panic
  message. And yes, busybox ash supports functions and trap.

--
               Tzafrir Cohen         | VIM is
http://tzafrir.org.il                | a Mutt's
tzaf...@cohens.org.il                | best
tzaf...@debian.org                   | friend

From 159df2f77b35916a206264e548f1d1b14e432263 Mon Sep 17 00:00:00 2001
From: Tzafrir Cohen <tzafrir.co...@xorcom.com>
Date: Wed, 11 Apr 2018 17:03:24 +0300
Subject: [PATCH] debirf: md5sum check for rootfs.cxz

---
 debirf | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/debirf b/debirf
index bf380a3..1a48a3b 100755
--- a/debirf
+++ b/debirf
@@ -48,6 +48,9 @@ ROOT_WARNING=true
 # location of devices.tar.gz file
 export DEVICE_ARCHIVE=${DEVICE_ARCHIVE:-/usr/share/debirf/devices.tar.gz}
 
+# If set, create a checksum of rootfs.cxz, and verify it at boot:
+DEBIRF_VERIFY_ROOTFS_CXZ=${DEBIRF_VERIFY_ROOTFS_CXZ:-false}
+
 # default package include/excludes
 DEBIRF_DEFAULT_PACKAGES=${DEBIRF_DEFAULT_PACKAGES:-/usr/share/debirf/packages}
 
@@ -218,6 +221,13 @@ if (grep -q break=preunpack /proc/cmdline); then
     /bin/sh
 fi
 cd /newroot
+if $DEBIRF_VERIFY_ROOTFS_CXZ; then
+    echo verifying rootfs
+    if ! (cd ..; md5sum -cs rootfs.cxz.md5); then
+        echo "Error: invalid checksum for rootfs.cxz."
+        /bin/sh
+    fi
+fi
 echo unpacking rootfs...
 $DEBIRF_COMPRESS -d - < /rootfs.cxz | cpio -i
 if (grep -q break=bottom /proc/cmdline); then
@@ -233,6 +243,10 @@ EOF
     msg "creating rootfs.cxz..."
     fakeroot_if_needed ln -sf /sbin/init "$DEBIRF_ROOT/init"
     pack_rootfs "$NEST_ROOT"/rootfs.cxz
+    if $DEBIRF_VERIFY_ROOTFS_CXZ; then
+        msg "creating rootfs.cxz checksum..."
+        (cd "$NEST_ROOT"; md5sum rootfs.cxz >rootfs.cxz.md5)
+    fi
 
     msg "creating wrapper cgz..."
     fakeroot_if_needed sh -c "cd $NEST_ROOT && find * | cpio --create -H newc" | gzip -9 > "$INITRD"
-- 
2.11.0

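The round trip this patch adds (record a checksum at build time, verify it before unpacking) can be tried outside debirf with a rough sketch like the one below. The sample file is a stand-in, not a real archive. The patch itself calls md5sum -cs, which is the busybox spelling; GNU md5sum uses --status for the same purpose, so this sketch discards the output instead and relies only on the exit status.

```shell
#!/bin/sh
# Sketch only: checksum round trip on a throwaway file.
workdir=$(mktemp -d)

# Stand-in for the packed root filesystem (not a real cpio archive).
yes abcdefg | head -c 4096 > "$workdir/rootfs.cxz"

# "Build time": record the checksum next to the archive.
(cd "$workdir" && md5sum rootfs.cxz > rootfs.cxz.md5)

# "Boot time": verify from the directory that holds both files,
# because the checksum file stores a relative file name.
verify() {
    (cd "$workdir" && md5sum -c rootfs.cxz.md5 >/dev/null 2>&1)
}

if verify; then before=ok; else before=bad; fi

# Overwrite one byte in place and verify again.
printf '\002' | dd of="$workdir/rootfs.cxz" bs=1 seek=100 conv=notrunc 2>/dev/null
if verify; then after=ok; else after=bad; fi

echo "before corruption: $before, after corruption: $after"
rm -rf "$workdir"
```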
From 0729272c98d7ec74af50f9738b810f012ef28d17 Mon Sep 17 00:00:00 2001
From: Tzafrir Cohen <tzafrir.co...@xorcom.com>
Date: Mon, 16 Apr 2018 13:37:39 +0300
Subject: [PATCH] debirf: run shell in case of an error in nested init

If there is any error in the init script of the first stage of the
nested debirf, there is no error handling, and you typically get a
panic screen after the kernel failed to run the second stage's init.

This patch adds a simple error handler. It traps exit from the init
script, as the only way the init script will exit is through an error
(when run properly, it execs /sbin/init of the real system and thus
does not exit).

FIXME: $? is always 0, so the error message is still not good enough.
---
 debirf | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/debirf b/debirf
index 1a48a3b..3b8bede 100755
--- a/debirf
+++ b/debirf
@@ -207,6 +207,14 @@ create_initrd_nested() {
     # create nest init
     cat > "$NEST_ROOT"/init <<EOF
 #!/bin/sh
+
+error_handler() {
+    echo "$0: Failed with status $?. Entering a debug shell."
+    /bin/sh
+}
+trap error_handler 0
+set -e
+
 mkdir /proc
 mount -t proc proc /proc
 if (grep -q break=top /proc/cmdline); then
-- 
2.11.0
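
One possible explanation for the FIXME, offered as an untested guess as far as debirf itself is concerned: the nested init is written through an unquoted heredoc (the <<EOF visible in the hunk context), so $0 and $? would be expanded by the shell running debirf at build time, not by the init script at boot. Escaping them as \$0 and \$? defers the expansion. The sketch below shows the effect with a hypothetical demo script. It prints a message where the real handler would start /bin/sh.

```shell
#!/bin/sh
# Sketch only: generate a script from an unquoted heredoc, as debirf does,
# and keep the exit status available to the generated EXIT trap.
workdir=$(mktemp -d)
gen="$workdir/init-demo"   # hypothetical name for the generated script

cat > "$gen" <<EOF
#!/bin/sh
error_handler() {
    # The dollar signs below were escaped in the generator, so they are
    # expanded when this script runs, not when it was written out.
    status=\$?
    echo "demo: failed with status \$status"
    # The real handler would start /bin/sh here.
}
trap error_handler 0
set -e
sh -c 'exit 3'
echo "not reached"
EOF

# Run the generated script; it exits non-zero by design.
output=$(sh "$gen" 2>&1) || true
echo "$output"

rm -rf "$workdir"
```

Saving $? into a variable as the first command of the handler matters too, because any later command in the handler overwrites it.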