> The community block I/O test suite is qemu-iotests:
> http://git.kernel.org/?p=linux/kernel/git/hch/qemu-iotests.git;a=summary
> If you have tests that you'd like to contribute, please put them into
> that framework so other developers can run them as part of their
> regular testing.

Hi Stefan,

What I described is not a qemu-io test case. I also use qemu-io, which is very helpful, but I observed that qemu-io has several limitations in discovering elusive bugs:

B1) qemu-io cannot trigger many race condition bugs, because it does not fully control the timing of events. For example, qemu-io cannot test this scenario: three concurrent writes a, b, and c are processed by bdrv_aio_writev() in the order Pa, Pb, Pc; their writes are actually persisted on disk in a different order, Wc, Wa, Wb; and finally their callbacks are invoked in yet another order, Vb, Vc, Va. Race condition bugs (e.g., inappropriate locking) may exist in the code because it does not anticipate that these orders of events are possible. This is just one example. In theory, there can be 100 concurrent reads or writes, and their events can happen in an arbitrary permutation. It is nearly impossible to manually generate test cases for all of them.

B2) Even if a race condition bug is triggered by chance, its behavior depends on subtle event timing that is hard to repeat and hence hard to debug.

B3) With qemu-io, it is hard to test code paths that handle I/O failures. For example, a disk write may fail due to a disk media error. Because these errors are rare, the failure handling code paths may never be tested. These paths may contain, for example, a null pointer bug that crashes the entire VM, or incomplete cleanup that gradually leaks resources (e.g., memory).

B4) qemu-io requires manually creating test cases, which is not only time consuming but also leads to low test coverage. This is because many bugs happen in scenarios that the developers do not anticipate, and hence they do not know how to create test cases for them in the first place.

The FVD patch includes a new testing framework that addresses the above issues. This testing framework is orthogonal to FVD and can be used to test other block device drivers as well.

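To give a rough sense of scale for B1, the number of possible event orderings can be computed directly. The sketch below is illustrative Python, not part of QEMU or the FVD patch. The function name count_orderings is hypothetical, and it assumes that each request's own three events always occur in the order P, W, V while events of different requests may interleave freely.

```python
from math import factorial


def count_orderings(n_requests, events_per_request=3):
    """Count interleavings of all events in which every request keeps
    its own events in order (P before W before V).

    This is the multinomial coefficient (k*n)! / (k!)^n.
    """
    total_events = n_requests * events_per_request
    return factorial(total_events) // factorial(events_per_request) ** n_requests


# Three concurrent writes, 9 events in total.
print(count_orderings(3))  # 1680

# 100 concurrent requests: print only the number of decimal digits.
print(len(str(count_orderings(100))))
```

Even the three-write example already has 1680 valid orderings under this assumption, which is why enumerating them by hand is impractical.
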
This testing framework includes two components that can be used either separately or in combination:

T1) To address problems B1-B3, I implemented an emulated disk in block/sim.c, which allows full control of event timing, either manually or automatically. In the example of three concurrent writes above, the 9 events (Pa, Pb, Pc, Wa, Wb, Wc, Va, Vb, and Vc) can be precisely controlled to execute in any given order. Moreover, the emulated disk can inject disk I/O errors in a controlled manner. For example, it can fail a specific read or write to test how the code handles that, or it can even fail as many as 90% of the reads/writes to test whether the code has resource leaks. qemu-io is extended with a module, qemu-io-sim.c, to work with the emulated disk block/sim.c, so that the tester can use the qemu-io console to manually control the order of events or to fail disk reads or writes.

T2) The solution in T1 still does not address problem B4, i.e., manually generating test cases is time consuming and yields low coverage. This problem is solved by a new testing tool called qemu-test. qemu-test can 1) automatically generate an unlimited number of randomized test cases that, e.g., execute 1,000 concurrent disk reads or writes on overlapping disk regions; and 2) automatically generate the corresponding expected results, run the tests, and compare the actual results with the expected results. Once it discovers a difference, which indicates a bug, it halts testing and waits for the developer to debug. The randomized test cases created by qemu-test are driven by a pseudo random number generator, and hence the behavior is completely repeatable. Therefore, once a bug is triggered, it can be reproduced precisely as many times as needed to facilitate debugging, even if the bug occurs extremely rarely in real runs of a VM.

qemu-test is fully automated.
Once started, it can run continuously, e.g., for months, to exercise an enormous number of test cases.

The implementation of qemu-test is actually not that complicated. It opens two virtual disks, the so-called truth image and test image. The truth image is served by a trivial synchronous block device driver, so its behavior is guaranteed to be correct. The test image is served by a real block device driver (e.g., FVD or QCOW2) that we want to test. qemu-test submits the same randomly generated sequence of disk I/O requests to the truth image and the test image, and expects that the two images’ contents never diverge. If they do diverge, it indicates a bug in the test image’s block device driver. qemu-test works with the emulated disk block/sim.c so that it can randomize event timings in a controlled manner and can inject disk I/O errors randomly.

I found qemu-test extremely powerful in discovering elusive bugs that I never anticipated, and using qemu-test is effortless. Whenever I completed a major code change, I simply started qemu-test in the evening and came back in the morning to collect bugs, if any. Debugging them is also easy because the bugs are precisely repeatable even if they are hard to trigger.

As for the QCOW2 bug I mentioned previously, it can be triggered by test-qcow2.sh.
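
The truth-image/test-image comparison described above can be sketched in a few lines. This is illustrative Python, not the actual qemu-test code: the names TruthDisk, QueuedDisk, random_round, and run are hypothetical, the disks are in-memory byte arrays, and the sketch assumes that writes issued concurrently within one round do not overlap, so that the expected result does not depend on completion order. A deliberately planted bug stands in for a defect in the driver under test.

```python
import random

SECTOR = 512  # hypothetical sector size for this sketch


class TruthDisk:
    """Trivial synchronous disk: every write lands immediately."""

    def __init__(self, sectors):
        self.data = bytearray(sectors * SECTOR)

    def write(self, sector, payload):
        start = sector * SECTOR
        self.data[start:start + len(payload)] = payload


class QueuedDisk(TruthDisk):
    """Stand-in for the driver under test: writes are queued, then
    completed in an order chosen by the seeded PRNG."""

    def __init__(self, sectors, rng, buggy=False):
        super().__init__(sectors)
        self.rng = rng
        self.buggy = buggy
        self.pending = []

    def submit(self, sector, payload):
        self.pending.append((sector, payload))

    def complete_all(self):
        self.rng.shuffle(self.pending)  # randomized completion order
        for sector, payload in self.pending:
            if self.buggy and len(payload) > 2 * SECTOR:
                payload = payload[:-SECTOR]  # planted bug: drop the tail sector
            self.write(sector, payload)
        self.pending.clear()


def random_round(rng, sectors, max_len=4):
    """Generate one round of non-overlapping writes with random data."""
    writes, sector = [], 0
    while True:
        sector += rng.randrange(0, 8)  # random gap before the next write
        length = rng.randint(1, max_len)
        if sector + length > sectors:
            return writes
        data = bytes(rng.getrandbits(8) for _ in range(length * SECTOR))
        writes.append((sector, data))
        sector += length


def run(seed, rounds=50, sectors=64, buggy=False):
    """Return the first round in which the images diverge, or None."""
    rng = random.Random(seed)  # a single seed fixes the whole run
    truth = TruthDisk(sectors)
    test = QueuedDisk(sectors, rng, buggy)
    for rnd in range(rounds):
        for sector, payload in random_round(rng, sectors):
            truth.write(sector, payload)
            test.submit(sector, payload)
        test.complete_all()
        if truth.data != test.data:
            return rnd
    return None


print(run(seed=42))              # correct driver: no divergence
print(run(seed=42, buggy=True))  # buggy driver: first diverging round
```

Because one seed drives both the generated requests and the completion order, rerunning with the same seed reproduces the same failing round, which is the property that makes the bugs easy to debug.
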
A faster way to trigger the QCOW2 bug is to skip the test runs that pass and execute the commands below directly:

    dd if=/dev/zero of=/var/ramdisk/truth.raw count=0 bs=1 seek=1155683840
    dd if=/dev/zero of=/var/ramdisk/zero-500M.raw count=0 bs=1 seek=609064448
    ./qemu-img create -f qcow2 -b /var/ramdisk/zero-500M.raw /var/ramdisk/test.qcow2 1155683840
    ./qemu-test --seed=116579177 --truth=/var/ramdisk/truth.raw \
        --test=/var/ramdisk/test.qcow2 --verify_write=true \
        --compare_before=false --compare_after=true --round=100000 \
        --parallel=100 --io_size=10485760 --fail_prob=0 --cancel_prob=0 \
        --instant_qemubh=true

As for the FVD patch that includes the new testing framework, I tried to post it on the mailing list twice, but it bounced both times, either because the message is too big or because of a Notes client configuration issue. Until I figure it out, please download the FVD patch from
https://researcher.ibm.com/researcher/files/us-ctang/FVD-01-14-2011.patch

Best regards,
ChunQiang (CQ) Tang, Ph.D.
Homepage: http://www.research.ibm.com/people/c/ctang