On Wed, Mar 29, 2017 at 07:01:38PM -0700, Ed Swierk wrote:
> Parts of qemu's block code have changed a lot in recent months but are
> not well exercised by current tests.
>
> Subtle bugs have crept in causing assertion failures, hangs and other
> crashes in a variety of situations: immediately on start, on first
> guest activity, on external snapshot create or commit, on qmp quit
> command.
>
> Reproducing these bugs has proved tricky, as each may occur only with
> a specific combination of qemu version, block device type (virtio-blk
> or virtio-scsi) and iothread enabled or not. In some cases the bug
> occurs only after several external snapshot operations. And in some
> cases the bug only manifests when a guest is accessing the block
> device simultaneously.
>
> I've written an iotest (number 176, for now) that attempts to cover
> many of these configurations. Currently it only exercises the external
> snapshot create and commit lifted from iotest 118. The new iotest does
> this repeatedly in each of 16 combinations:
> - no guest / guest
> - virtio-blk / virtio-scsi
> - no iothread / iothread
> - single / repeated external snapshot create+commit
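
For reference, the 16 combinations quoted above are the product of four two-way choices. A minimal Python sketch of enumerating them follows; the names AXES and test_matrix are hypothetical and are not taken from the iotest itself:

```python
import itertools

# Hypothetical axis names; the values mirror the four two-way choices
# listed in the quoted text above.
AXES = {
    "guest": ("no guest", "guest"),
    "device": ("virtio-blk", "virtio-scsi"),
    "iothread": ("no iothread", "iothread"),
    "snapshots": ("single", "repeated"),
}


def test_matrix():
    """Return every combination of the four axes as a list of dicts."""
    names = list(AXES)
    return [
        dict(zip(names, values))
        for values in itertools.product(*AXES.values())
    ]


if __name__ == "__main__":
    # Print one line per test configuration.
    for case in test_matrix():
        print(", ".join(case.values()))
```

Keeping the axes in one table would make it straightforward to add a further dimension later without touching the loop.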

Thanks Ed! This has a lot of potential.

I see four different issues that can be discussed separately:

1. Urgent 2.9 bug fix for `ctx->external_disable_cnt > 0' failed
   assertion. I believe you've already started a separate email thread
   about it called "Assertion failure taking external snapshot with
   virtio drive + iothread".

2. QEMU 2.8 stable hang. Less urgent but worth understanding, perhaps
   via git-bisect against QEMU 2.9.

3. Minor iotest enhancements. Please send a separate patch series.

4. How to automate tests with real Linux guests? This is a complex
   topic and probably what we should discuss in this email thread.

The buildroot + busybox approach is good for a small set of sanity
tests. There was a similar attempt here:
https://github.com/stsquad/qemu-jeos

Building from source becomes a challenge when other people want to add
software to test other areas of QEMU. The process also requires
attention to maintain the image over time (e.g. as host build
environments change).

There are image builder tools like virt-builder and mkosi for building
bootable virtual machine images based on standard Linux distros:
http://libguestfs.org/virt-builder.1.html
https://github.com/systemd/mkosi

This eliminates the build-from-source hassles and gives us a full Linux
guest environment. Booting is very fast with mkosi, so the advantage of
custom-building a minimal image is negligible.

My suggestion is:

Let's pick an image builder tool like virt-builder and keep a single
build script per guest architecture (e.g. build-test-os-x86_64.sh). All
tests for that architecture run against the same disk image. It's easy
to add additional software to the disk image by modifying the build
script. A Makefile ensures that the image file gets rebuilt if the
build script has changed.

>
> I made some minor changes to the test infrastructure so the new iotest
> can deal gracefully with qemu hanging--the test script itself
> shouldn't hang.
> And in all failure modes the test needs to expose
> enough console output and other information to diagnose the problem.
>
> The main departure from existing iotests is running a real guest. I
> used buildroot to generate a small (~4 MB) Linux kernel with built-in
> initrd containing a busybox-based userland. After the iotest launches
> qemu, the guest loops writing to the block device, while the test
> performs snapshot operations.
>
> I ran the new iotest on 3 qemu versions: 2.7.1, stable-2.8-staging and
> 2.9.0-rc2. The latter two fail several test cases, all
> iothread-enabled. Only 2.7.1 passes all the cases.
>
> Here is the code for the new iotest (I didn't dare email patches with
> a 4 MB blob):
> https://github.com/skyportsystems/qemu-1/commits/eswierk-iotests-2.7
> https://github.com/skyportsystems/qemu-1/commits/eswierk-iotests-2.8
> https://github.com/skyportsystems/qemu-1/commits/eswierk-iotests-2.9
>
> And here is the buildroot I used to generate the guest Linux kernel+initrd:
> https://github.com/skyportsystems/buildroot-1/commits/qemu-iotests
>
> Please check out the code and try the new test--particularly anyone
> who can also help figure out these failures. (Note that since half the
> test cases use an iothread, /dev/kvm must be readable and writable.)
>
> * stable-2.8-staging
> - guest, virtio-blk, iothread, single snapshot create+commit: hang on
>   quit (intermittent)
> - guest, virtio-blk, iothread, repeated snapshot create+commit: hang
>   after 1 iteration
> - guest, virtio-scsi, iothread, single snapshot create+commit: hang on
>   quit (intermittent)
> - guest, virtio-scsi, iothread, repeated snapshot create+commit: hang
>   after 1 iteration
>
> * 2.9.0-rc2
> - guest, virtio-blk, iothread, single snapshot create+commit:
>   "include/block/aio.h:457: aio_enable_external: Assertion
>   `ctx->external_disable_cnt > 0' failed." after snapshot create
> - guest, virtio-blk, iothread, repeated snapshot create+commit: same as above
> - guest, virtio-scsi, iothread, single snapshot create+commit: same as above
> - guest, virtio-scsi, iothread, repeated snapshot create+commit: same as above
> - no guest, virtio-blk, iothread, repeated snapshot create+commit: same as
>   above
> - no guest, virtio-scsi, iothread, single snapshot create+commit: same as
>   above
> - no guest, virtio-scsi, iothread, repeated snapshot create+commit:
>   same as above
>
> --Ed
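
To make the rebuild rule from my suggestion concrete (the image is regenerated whenever its build script changes), here is a minimal Python sketch of the timestamp comparison a Makefile dependency would perform. The function name image_is_stale and the image file name are hypothetical; only build-test-os-x86_64.sh comes from the example above.

```python
import os


def image_is_stale(image_path, build_script_path):
    """Return True if the image is missing or older than its build script.

    This mirrors the Make-style rule "rebuild the target when its
    prerequisite is newer"; both paths are supplied by the caller.
    """
    if not os.path.exists(image_path):
        return True
    return os.path.getmtime(image_path) < os.path.getmtime(build_script_path)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    script = "build-test-os-x86_64.sh"
    image = "test-os-x86_64.img"
    if os.path.exists(script):
        print("rebuild needed" if image_is_stale(image, script) else "up to date")
```

A test runner could call such a check before launching any guest-based test and invoke the build script only when it returns True, so an unchanged image is reused across runs.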