Thanks for your reply,

TL;DR: not a debirf issue. Maybe a kernel issue, maybe a hardware
problem. Two patches attached: the first just helped me debug the
problem; the second may be useful for debirf.

On Thu, Apr 12, 2018 at 08:30:23PM -0400, Daniel Kahn Gillmor wrote:
> On Thu 2018-04-12 13:28:34 +0200, Tzafrir Cohen wrote:
> > Update: a new version of the patch. It now works and supports all
> > compressors supported by busybox (I tried bzip2, gzip, lzma, lzop and
> > xz). lzma and xz fail. Others work.
> 
> thanks for this work, Tzafrir!  I'd be willing to incorporate this as a
> workaround, but i'm also always leery of introducing new control knobs
> in any software (we have to educate the users that they're there -- and
> we have to maintain the knobs!)
> 
> I'd feel more comfortable incorporating this workaround as a temporary
> workaround if i knew that busybox was aiming to fix this.  have you
> reported the problem to busybox upstream?

From what I see, the problem seems to be that the initramfs extracted by
the kernel (by the kernel, right?) is corrupted. When I compress it with
xz or lzma, I get a corrupted rootfs.cxz (as verified by its md5sum).
With the other compressors the archive (which is larger) seemed to be
extracted properly. But when I added an md5 checksum (see attachment), I
got an error about "junk in compressed archive" and the test failed
because the md5sum file was missing.
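
For reference, the check in the attached md5sum patch boils down to
something like the minimal sketch below. The file contents and the
temporary directory are made up for illustration. The patch itself calls
md5sum -cs; the sketch redirects the output instead, so it does not
depend on the -s flag (which, as far as I know, not every md5sum has).

```shell
#!/bin/sh
# Sketch only: checksum a file at "build time", verify it at "run time".
# The payload below is a hypothetical stand-in for the real rootfs.cxz.
workdir=$(mktemp -d)
cd "$workdir"
printf 'example payload\n' > rootfs.cxz

# Build time: record the checksum next to the archive.
md5sum rootfs.cxz > rootfs.cxz.md5

# Run time: verify the archive before unpacking it.
verify() {
    if md5sum -c rootfs.cxz.md5 >/dev/null 2>&1; then
        echo ok
    else
        echo corrupt
    fi
}
intact_result=$(verify)

# Simulate corruption by overwriting one byte in place, then verify again.
printf 'X' | dd of=rootfs.cxz bs=1 seek=3 conv=notrunc 2>/dev/null
corrupted_result=$(verify)

echo "intact copy:    $intact_result"
echo "corrupted copy: $corrupted_result"
```

Keeping the .md5 file in the same directory as the archive means the
relative file name stored in it still resolves at verification time.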

My current rootfs is roughly 100MB (xz compressed). I tried repeating
this with a minimal configuration with a Sid target system. As I needed
a Sid version to build it I used the default "minimal" configuration. I
likewise failed to boot due to a corrupt rootfs.cxz (which resulted in
missing files).

Upon further testing we managed to mount the USB device itself (by
insmod-ing the required kernel modules from the partially-extracted
image). We then compared the extracted rootfs.cxz to the original one.
The md5 checksum was different.

cmp -l, however, showed that the two differed in only 48 bytes.
rootfs.cxz as extracted on the system had the following written to it
roughly 3/4 of the way in (at around the 82,000,000th byte):

  2 0 0 0  0 0 0 0  0 0 0 0  0 0 0 0
  2 0 0 0  0 0 0 0  0 0 0 0  0 0 0 0
  2 0 0 0  0 0 0 0  0 0 0 0  0 0 0 0
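
To illustrate how such a difference can be narrowed down, here is a
minimal sketch. good.bin and bad.bin are hypothetical stand-ins for the
two copies of rootfs.cxz, and the 16 bytes written into bad.bin only
mimic one row of the pattern above; they are not taken from the real
file.

```shell
#!/bin/sh
# Sketch only: count and locate differing bytes between two copies.
workdir=$(mktemp -d)
cd "$workdir"

# A 4 KiB reference file filled with 'A', and a copy to damage.
head -c 4096 /dev/zero | tr '\0' 'A' > good.bin
cp good.bin bad.bin

# Overwrite 16 bytes at offset 1024 with 0x02 followed by 15 zero bytes.
{ printf '\002'; head -c 15 /dev/zero; } |
    dd of=bad.bin bs=1 seek=1024 conv=notrunc 2>/dev/null

# cmp -l prints one line per differing byte: the 1-based byte number in
# decimal, then the two byte values in octal.
diff_count=$(cmp -l good.bin bad.bin | wc -l | tr -d ' ')
first_offset=$(cmp -l good.bin bad.bin | head -n 1 | awk '{print $1}')

echo "differing bytes: $diff_count"
echo "first difference at byte: $first_offset"
```

The byte numbers reported by cmp -l start at 1, so a write at offset
1024 shows up as byte 1025.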

When we tried to extract rootfs.cgz (from the USB device) using
busybox's gunzip and cpio, while still on the stage 1 system, there
was no problem.

So it's down to either a kernel problem or a hardware problem. My bet is
on the latter.

Kernels in question:
* linux-image-4.9.0-6-amd64  4.9.82-1+deb9u3
* linux-image-4.15.0-2-amd64 4.15.11-1

Attached patches:
* debirf-md5sum-check-for-rootfs.cxz.patch

  Create an md5 checksum of the (second-stage) rootfs.cxz at nested
  initramfs build time and check it at run time.


* debirf-run-shell-in-case-of-an-error-in-nested-init.patch

  If you have any error in the nested init, spawn a shell to see the
  error message rather than let it hide behind a lengthy kernel panic
  message. And yes, busybox ash supports functions and trap.
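
A possible explanation for the FIXME in the second patch ($? is always
0), offered as a guess rather than a verified fix: the nested init is
written through an unquoted here-document (<<EOF), so an unescaped $? is
expanded when the file is generated, not when the trap fires. The sketch
below shows the difference. unescaped.sh and escaped.sh are hypothetical
file names, and the handler is a cut-down version of the one in the
patch.

```shell
#!/bin/sh
# Sketch only: "$?" inside an unquoted here-document is expanded by the
# shell that writes the file, unless the dollar sign is escaped.
workdir=$(mktemp -d)
cd "$workdir"

# Unescaped: $? is replaced right now by the status of the previous
# command, so the generated script contains a fixed number.
cat > unescaped.sh <<EOF
#!/bin/sh
error_handler() { echo "Failed with status $?"; }
trap error_handler 0
set -e
false
EOF

# Escaped: \$? is written out literally and expanded when the trap runs.
cat > escaped.sh <<EOF
#!/bin/sh
error_handler() { echo "Failed with status \$?"; }
trap error_handler 0
set -e
false
EOF

unescaped_out=$(sh unescaped.sh || true)
escaped_out=$(sh escaped.sh || true)

echo "unescaped: $unescaped_out"
echo "escaped:   $escaped_out"
```

If this is indeed the cause, escaping $0 and $? in the handler (as \$0
and \$?) should make the message report the real exit status.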

-- 
Tzafrir Cohen         | VIM is
http://tzafrir.org.il | a Mutt's
tzaf...@cohens.org.il |  best
tzaf...@debian.org    | friend
From 159df2f77b35916a206264e548f1d1b14e432263 Mon Sep 17 00:00:00 2001
From: Tzafrir Cohen <tzafrir.co...@xorcom.com>
Date: Wed, 11 Apr 2018 17:03:24 +0300
Subject: [PATCH] debirf: md5sum check for rootfs.cxz

---
 debirf | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/debirf b/debirf
index bf380a3..1a48a3b 100755
--- a/debirf
+++ b/debirf
@@ -48,6 +48,9 @@ ROOT_WARNING=true
 # location of devices.tar.gz file
 export DEVICE_ARCHIVE=${DEVICE_ARCHIVE:-/usr/share/debirf/devices.tar.gz}
 
+# If set, create a checksum of rootfs.cxz, and verify it at boot:
+DEBIRF_VERIFY_ROOTFS_CXZ=${DEBIRF_VERIFY_ROOTFS_CXZ:-false}
+
 # default package include/excludes
 DEBIRF_DEFAULT_PACKAGES=${DEBIRF_DEFAULT_PACKAGES:-/usr/share/debirf/packages}
 
@@ -218,6 +221,13 @@ if (grep -q break=preunpack /proc/cmdline); then
   /bin/sh
 fi
 cd /newroot
+if $DEBIRF_VERIFY_ROOTFS_CXZ; then
+  echo verifying rootfs
+  if ! (cd ..; md5sum -cs rootfs.cxz.md5); then
+    echo "Error: invalid checksum for rootfs.cxz."
+    /bin/sh
+  fi
+fi
 echo unpacking rootfs...
 $DEBIRF_COMPRESS -d - < /rootfs.cxz | cpio -i
 if (grep -q break=bottom /proc/cmdline); then
@@ -233,6 +243,10 @@ EOF
     msg "creating rootfs.cxz..."
     fakeroot_if_needed ln -sf /sbin/init "$DEBIRF_ROOT/init"
     pack_rootfs "$NEST_ROOT"/rootfs.cxz
+    if $DEBIRF_VERIFY_ROOTFS_CXZ; then
+        msg "creating rootfs.cxz checksum..."
+        (cd "$NEST_ROOT"; md5sum rootfs.cxz >rootfs.cxz.md5)
+    fi
 
     msg "creating wrapper cgz..."
     fakeroot_if_needed sh -c "cd $NEST_ROOT && find * | cpio --create -H newc" | gzip -9 > "$INITRD"
-- 
2.11.0

From 0729272c98d7ec74af50f9738b810f012ef28d17 Mon Sep 17 00:00:00 2001
From: Tzafrir Cohen <tzafrir.co...@xorcom.com>
Date: Mon, 16 Apr 2018 13:37:39 +0300
Subject: [PATCH] debirf: run shell in case of an error in nested init

If there is any error in the init script of the first stage of the
nested debirf, there is no error handling, and you typically get a
panic screen after the kernel failed to run the second stage's init.

This patch adds a simple error handler. It traps exit from the init
script, as the only way the init script will exit is through an error
(when run properly, it execs /sbin/init of the real system and thus does
not exit).

FIXME: $? is always 0, so the error message is still not good enough.
---
 debirf | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/debirf b/debirf
index 1a48a3b..3b8bede 100755
--- a/debirf
+++ b/debirf
@@ -207,6 +207,14 @@ create_initrd_nested() {
     # create nest init
     cat > "$NEST_ROOT"/init <<EOF
 #!/bin/sh
+
+error_handler() {
+	echo "$0: Failed with status $?. Entering a debug shell."
+	/bin/sh
+}
+trap error_handler 0
+set -e
+
 mkdir /proc
 mount -t proc proc /proc
 if (grep -q break=top /proc/cmdline); then
-- 
2.11.0
