It appears that Xen sometimes takes longer to even start booting, mostly
due to firmware and grub fetching large boot files. In some jobs the
current timeout is very close to the time actually needed, and occasionally
(rarely, for now) a test fails because the timeout expires in the middle of
dom0 booting. This will happen more often as the initramfs grows (and with
more complex tests). This has been observed on some dom0pvh-hvm jobs, at
least on runners hw3 and hw11.

Switch to using expect (console.exp) for more robust test output handling.
This allows waiting separately for Xen starting to boot and then for the
test to complete. For now, set both of those to 120s, which pessimistically
bumps timeout for the whole test to 240s (from 120s).

Some messages use regex, use 'expect -re' for all of them for consistency,
even though not all strictly need that (yet).

Signed-off-by: Marek Marczykowski-Górecki <marma...@invisiblethingslab.com>
---
Changes in v2:
- replace previous "ci: increase timeout for hw tests" with changing how
  console is interacted with

This needs a containers rebuild.
---
 automation/build/alpine/3.18-arm64v8.dockerfile |  1 +-
 automation/scripts/console.exp                  | 23 ++++++--
 automation/scripts/qubes-x86-64.sh              | 52 ++++--------------
 3 files changed, 32 insertions(+), 44 deletions(-)

diff --git a/automation/build/alpine/3.18-arm64v8.dockerfile b/automation/build/alpine/3.18-arm64v8.dockerfile
index 19fe46f8418f..b8482d5bf43f 100644
--- a/automation/build/alpine/3.18-arm64v8.dockerfile
+++ b/automation/build/alpine/3.18-arm64v8.dockerfile
@@ -48,3 +48,4 @@ RUN apk --no-cache add \
   # qubes test deps
   openssh-client \
   fakeroot \
+  expect \
diff --git a/automation/scripts/console.exp b/automation/scripts/console.exp
index 31ce97b91b63..d1689fa5bf7f 100755
--- a/automation/scripts/console.exp
+++ b/automation/scripts/console.exp
@@ -28,21 +28,34 @@ if {[info exists env(UBOOT_CMD)]} {
     send "$env(UBOOT_CMD)\r"
 }
 
+if {[info exists env(BOOT_MSG)]} {
+    expect -re "$env(BOOT_MSG)"
+}
+
+if {[info exists env(WAKEUP_CMD)]} {
+    expect -re "$env(SUSPEND_MSG)"
+
+    # keep it suspended a bit, then wakeup
+    sleep 30
+
+    system "$env(WAKEUP_CMD)"
+}
+
 if {[info exists env(LOG_MSG)]} {
     expect {
-        "$env(PASSED)" {
-            expect "$env(LOG_MSG)"
+        -re "$env(PASSED)" {
+            expect -re "$env(LOG_MSG)"
             exit 0
         }
-        "$env(LOG_MSG)" {
-            expect "$env(PASSED)"
+        -re "$env(LOG_MSG)" {
+            expect -re "$env(PASSED)"
             exit 0
         }
     }
 }
 
 expect {
-    "$env(PASSED)" {
+    -re "$env(PASSED)" {
         exit 0
     }
 }
diff --git a/automation/scripts/qubes-x86-64.sh b/automation/scripts/qubes-x86-64.sh
index 8e78b7984e98..0eac410f4168 100755
--- a/automation/scripts/qubes-x86-64.sh
+++ b/automation/scripts/qubes-x86-64.sh
@@ -1,6 +1,6 @@
 #!/bin/sh
 
-set -ex
+set -ex -o pipefail
 
 # One of:
 # - "" PV dom0, PVH domU
@@ -263,52 +263,26 @@ cp -f binaries/xen $TFTP/xen
 cp -f binaries/bzImage $TFTP/vmlinuz
 cp -f binaries/dom0-rootfs.cpio.gz $TFTP/initrd-dom0
 
-# start logging the serial; this gives interactive console, don't close its
-# stdin to not close it; the 'cat' is important, plain redirection would hang
-# until somebody opens the pipe; opening and closing the pipe is used to close
-# the console
-mkfifo /tmp/console-stdin
-cat /tmp/console-stdin |\
-ssh $CONTROLLER console | tee smoke.serial | sed 's/\r//' &
-
 # start the system pointing at gitlab-ci predefined config
 ssh $CONTROLLER gitlabci poweron
-trap "ssh $CONTROLLER poweroff; : > /tmp/console-stdin" EXIT
+trap "ssh $CONTROLLER poweroff" EXIT
 
 if [ -n "$wait_and_wakeup" ]; then
-    # wait for suspend or a timeout
-    until grep "$wait_and_wakeup" smoke.serial || [ $timeout -le 0 ]; do
-        sleep 1;
-        : $((--timeout))
-    done
-    if [ $timeout -le 0 ]; then
-        echo "ERROR: suspend timeout, aborting"
-        exit 1
-    fi
-    # keep it suspended a bit, then wakeup
-    sleep 30
-    ssh $CONTROLLER wake
+    export SUSPEND_MSG="$wait_and_wakeup"
+    export WAKEUP_CMD="ssh $CONTROLLER wake"
 fi
 
-set +x
-until grep "^Welcome to Alpine Linux" smoke.serial || [ $timeout -le 0 ]; do
-    sleep 1;
-    : $((--timeout))
-done
-set -x
-
-tail -n 100 smoke.serial
-
-if [ $timeout -le 0 ]; then
-    echo "ERROR: test timeout, aborting"
-    exit 1
-fi
+export PASSED="${passed}"
+export BOOT_MSG="Latest ChangeSet: "
+export LOG_MSG="\nWelcome to Alpine Linux"
+export TEST_CMD="ssh $CONTROLLER console"
+export TEST_LOG="smoke.serial"
+export TEST_TIMEOUT="$timeout"
+./automation/scripts/console.exp | sed 's/\r\+$//'
+TEST_RESULT=$?
 
 if [ -n "$retrieve_xml" ]; then
     nc -w 10 "$SUT_ADDR" 8080 > tests-junit.xml </dev/null
 fi
 
-sleep 1
-
-(grep -q "^Welcome to Alpine Linux" smoke.serial && grep -q "${passed}" smoke.serial) || exit 1
-exit 0
+exit "$TEST_RESULT"
-- 
git-series 0.9.1
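
For illustration only, here is a minimal standalone sketch of why the
switch to `set -ex -o pipefail` in qubes-x86-64.sh matters once the test
result comes from `console.exp | sed ...`. Without pipefail, the pipeline's
status is that of sed alone, so a failure on the expect side would be
hidden. The `fake_console` and `pipeline_status` functions are hypothetical
helpers, not part of the patch, and the sketch assumes a shell with
pipefail support (such as bash) and GNU-style sed.

```shell
#!/bin/bash
# Hypothetical stand-in for console.exp: prints one console line ending in
# CR LF and then fails, roughly the way an expect timeout would.
fake_console() {
    printf 'Latest ChangeSet: \r\n'
    return 1
}

# Run the pipeline in a subshell with pipefail on or off and print the
# pipeline's exit status. The filtered console output is discarded here.
pipeline_status() {
    (
        if [ "$1" = "pipefail" ]; then
            set -o pipefail
        else
            set +o pipefail
        fi
        # Reading the status inside 'if' keeps this working under 'set -e'.
        if fake_console | sed 's/\r\+$//' >/dev/null; then
            echo 0
        else
            echo $?
        fi
    )
}

echo "without pipefail: $(pipeline_status plain)"
echo "with pipefail:    $(pipeline_status pipefail)"
```

The sketch reads the status inside an `if` because, with `set -e` active, a
failing pipeline evaluated as a plain command would terminate the shell at
that point.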