I forgot to reply to one important part below.

On Wed, 2 Oct 2024, Stefano Stabellini wrote:
> On Wed, 2 Oct 2024, Marek Marczykowski-Górecki wrote:
> > Check if xen.efi is bootable with an XTF dom0.
> > 
> > The TEST_TIMEOUT is set in the script to override project-global value.
> > Setting it in the gitlab yaml file doesn't work, as yaml variables
> > have too low a precedence
> > (https://docs.gitlab.com/ee/ci/variables/#cicd-variable-precedence).
> > 
> > The multiboot2+EFI path is already covered by the hardware tests.
> > 
> > Signed-off-by: Marek Marczykowski-Górecki <marma...@invisiblethingslab.com>
> > ---
> > This requires rebuilding debian:bookworm container.
> > 
> > The TEST_TIMEOUT issue mentioned above applies to xilinx-* jobs too. It's
> > not clear to me why the default TEST_TIMEOUT is set at the group level
> > instead of in the yaml file, so I'm not adjusting the other places.
> 
> Let me preface this by noting that, now that we use "expect", all
> successful tests terminate as soon as the success condition is met,
> without waiting for the test timeout to expire.
> 
> There is a CI/CD variable called TEST_TIMEOUT set at the
> gitlab.com/xen-project level. (console.exp also checks whether
> TEST_TIMEOUT is set, so that nothing breaks if the CI/CD variable is
> accidentally removed.) The global TEST_TIMEOUT is meant to be a high
> value to account for slow QEMU tests potentially running on our
> slowest cloud runners.
> 
> However, for hardware-based tests such as the xilinx-* jobs, we know
> that a lower timeout is sufficient: the test runs on real hardware,
> which is considerably faster than QEMU on our slowest runners. In
> other words, the right timeout depends more on the runner than on the
> test. So we override the TEST_TIMEOUT variable for the xilinx-* jobs
> with a lower value.
> 
> The global TEST_TIMEOUT is set to 1500.
> The xilinx-* timeout is set to 120 for ARM and 1000 for x86.
> 
> You are welcome to override the TEST_TIMEOUT value for the
> hardware-based QubesOS tests. At the same time, given that on success
> the timeout is not really used, it is also OK to leave it like this.
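To illustrate why the timeout only matters on failure, here is a rough bash sketch of what a console watcher does: return as soon as the success string appears in the log, and wait the full TEST_TIMEOUT only when it never does. This is not the actual console.exp (which is an expect script); the wait_for_pass helper and the 1500 fallback default are assumptions made for illustration.

```shell
#!/bin/bash
# Sketch only: mimics the idea behind console.exp in plain bash.
# It is NOT the real console.exp; wait_for_pass is a hypothetical helper.
set -u

# Fall back to a default if the CI/CD variable is missing.
# (1500 is assumed here to mirror the group-level value.)
: "${TEST_TIMEOUT:=1500}"

# wait_for_pass LOGFILE PATTERN
# Returns 0 as soon as PATTERN appears in LOGFILE,
# or 1 once TEST_TIMEOUT seconds have elapsed without a match.
wait_for_pass() {
    local log=$1 pattern=$2
    local deadline=$(( SECONDS + TEST_TIMEOUT ))
    while (( SECONDS < deadline )); do
        # -F: fixed string, -q: quiet; ignore errors if the log doesn't exist yet
        if grep -qF -- "$pattern" "$log" 2>/dev/null; then
            return 0
        fi
        sleep 1
    done
    return 1
}
```

For example, `wait_for_pass smoke.serial "Test result: SUCCESS"` returns within about a second of the success line being written, so a large TEST_TIMEOUT only lengthens failing runs.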
 
 
> > ---
> >  automation/build/debian/bookworm.dockerfile |  1 +
> >  automation/gitlab-ci/test.yaml              |  7 ++++
> >  automation/scripts/qemu-smoke-x86-64-efi.sh | 44 +++++++++++++++++++++
> >  3 files changed, 52 insertions(+)
> >  create mode 100755 automation/scripts/qemu-smoke-x86-64-efi.sh
> > 
> > diff --git a/automation/build/debian/bookworm.dockerfile b/automation/build/debian/bookworm.dockerfile
> > index 3dd70cb6b2e3..061114ba522d 100644
> > --- a/automation/build/debian/bookworm.dockerfile
> > +++ b/automation/build/debian/bookworm.dockerfile
> > @@ -46,6 +46,7 @@ RUN apt-get update && \
> >          # for test phase, qemu-smoke-* jobs
> >          qemu-system-x86 \
> >          expect \
> > +        ovmf \
> >          # for test phase, qemu-alpine-* jobs
> >          cpio \
> >          busybox-static \
> > diff --git a/automation/gitlab-ci/test.yaml b/automation/gitlab-ci/test.yaml
> > index 8675016b6a37..74fd3f3109ae 100644
> > --- a/automation/gitlab-ci/test.yaml
> > +++ b/automation/gitlab-ci/test.yaml
> > @@ -463,6 +463,13 @@ qemu-smoke-x86-64-clang-pvh:
> >    needs:
> >      - debian-bookworm-clang-debug
> >  
> > +qemu-smoke-x86-64-gcc-efi:
> > +  extends: .qemu-x86-64
> > +  script:
> > +    - ./automation/scripts/qemu-smoke-x86-64-efi.sh pv 2>&1 | tee ${LOGFILE}
> > +  needs:
> > +    - debian-bookworm-gcc-debug
> 
> Given that the script you wrote (thank you!) can also handle pvh, can we
> directly add a pvh job to test.yaml too?
> 
> 
> >  qemu-smoke-riscv64-gcc:
> >    extends: .qemu-riscv64
> >    script:
> > diff --git a/automation/scripts/qemu-smoke-x86-64-efi.sh b/automation/scripts/qemu-smoke-x86-64-efi.sh
> > new file mode 100755
> > index 000000000000..e053cfa995ba
> > --- /dev/null
> > +++ b/automation/scripts/qemu-smoke-x86-64-efi.sh
> > @@ -0,0 +1,44 @@
> > +#!/bin/bash
> > +
> > +set -ex -o pipefail
> > +
> > +# variant should be either pv or pvh
> > +variant=$1
> > +
> > +# Clone and build XTF
> > +git clone https://xenbits.xen.org/git-http/xtf.git
> > +cd xtf && make -j$(nproc) && cd -
> > +
> > +case $variant in
> > +    pvh) k=test-hvm64-example    extra="dom0-iommu=none dom0=pvh" ;;
> > +    *)   k=test-pv64-example     extra= ;;
> > +esac
> > +
> > +mkdir -p boot-esp/EFI/BOOT
> > +cp binaries/xen.efi boot-esp/EFI/BOOT/BOOTX64.EFI
> > +cp xtf/tests/example/$k boot-esp/EFI/BOOT/kernel
> > +
> > +cat > boot-esp/EFI/BOOT/BOOTX64.cfg <<EOF
> > +[global]
> > +default=test
> > +
> > +[test]
> > +options=loglvl=all console=com1 noreboot console_timestamps=boot $extra
> > +kernel=kernel
> > +EOF
> > +
> > +cp /usr/share/OVMF/OVMF_CODE.fd OVMF_CODE.fd
> > +cp /usr/share/OVMF/OVMF_VARS.fd OVMF_VARS.fd
> > +
> > +rm -f smoke.serial
> > +export TEST_CMD="qemu-system-x86_64 -nographic -M q35,kernel-irqchip=split \
> > +        -drive if=pflash,format=raw,readonly=on,file=OVMF_CODE.fd \
> > +        -drive if=pflash,format=raw,file=OVMF_VARS.fd \
> > +        -drive file=fat:rw:boot-esp,media=disk,index=0,format=raw \
> > +        -m 512 -monitor none -serial stdio"
> > +
> > +export TEST_LOG="smoke.serial"
> > +export PASSED="Test result: SUCCESS"
> > +export TEST_TIMEOUT=120

Although this works, I would prefer keeping the TEST_TIMEOUT overrides
in test.yaml for consistency. However, it might be better not to
override it at all (or to override it with a higher value), since
successful tests terminate immediately anyway. We need to be careful
not to set TEST_TIMEOUT too low: on a slow runner (such as a small,
busy cloud instance) a low value can cause spurious failures. This
happened frequently with the ARM tests when we temporarily moved from
a fast ARM server to slower ARM cloud instances a couple of months ago.
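If a job does need its own value, one way to follow the "only override to a higher value" option is to let the script raise TEST_TIMEOUT to a job-specific minimum but never lower an inherited (e.g. group-level) value. Because the script runs after GitLab has populated the job environment, an export here takes effect regardless of variable precedence. A minimal bash sketch; raise_timeout is a hypothetical helper, not something that exists in the Xen tree.

```shell
#!/bin/bash
# Sketch: raise TEST_TIMEOUT to at least a minimum, never lower it.
# raise_timeout is a hypothetical helper for illustration only.

# raise_timeout MIN_SECONDS
raise_timeout() {
    local min=$1
    # An unset or empty TEST_TIMEOUT is treated as 0, so MIN_SECONDS applies.
    if (( ${TEST_TIMEOUT:-0} < min )); then
        TEST_TIMEOUT=$min
    fi
    # Export so child processes (e.g. the expect script) see the value.
    export TEST_TIMEOUT
}
```

For example, with the group-level value of 1500 in the environment, `raise_timeout 120` leaves TEST_TIMEOUT at 1500; if the variable is missing, it becomes 120.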

On the other hand, adjusting TEST_TIMEOUT for non-QEMU hardware-based
tests is acceptable, since those tests run on specific real hardware,
which is unlikely to suddenly become slower.
