Resending because qemu-devel was dropped from CC. Thanks for pointing it out, Kevin.

On Sat, Jan 15, 2011 at 12:25 PM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
> On Sat, Jan 15, 2011 at 3:28 AM, Chunqiang Tang <ct...@us.ibm.com> wrote:
>> T1) To address the problems of B1-B3, I implemented an emulated disk in
>> block/sim.c, which allows full control of event timings, either manually
>> or automatically. Given the three concurrent writes example above, their 9
>> events (Pa, Pb, Pc, Wa, Wb, Wc, Va, Vb, and Vc) can be precisely
>> controlled to be executed in any given order. Moreover, the emulated disk
>> can inject disk I/O errors in a controlled manner. For example, it can
>> fail a specific read or write to test how the code handles that, or it can
>> even fail as many as 90% of the reads/writes to test if the code has
>> resource leaks. qemu-io is extended with a module qemu-io-sim.c to work
>> with the emulated disk block/sim.c, so that the tester can use the qemu-io
>> console to manually control the order of events or fail disk reads or
>> writes.
>
> block/blkdebug.c already provides fault injection and is used in
> qemu-iotests test 026. Using blkdebug it is possible to test specific
> error paths in image formats. We should look at merging random
> failures ("fail as many as 90% of the reads/writes") into blkdebug.
>
>> T2) The solution in T1 still does not address the problem of B3, i.e.,
>> manually generating test cases is time-consuming and has low coverage.
>> This problem is solved by a new testing tool called qemu-test. qemu-test
>> can 1) automatically generate an unlimited number of randomized test cases
>> that, e.g., execute 1,000 concurrent disk reads or writes on overlapping
>> disk regions; 2) automatically generate the corresponding anticipated
>> correct results, automatically run the tests, and automatically compare
>> the actual test results with the anticipated correct results. Once it
>> discovers a difference, which indicates a bug, it halts testing and waits
>> for the developer to debug.
>> The randomized test cases created by
>> qemu-test are controlled by a pseudo random number generator, and hence
>> the behavior is completely repeatable. Therefore, once a bug is triggered,
>> it can be precisely repeated an unlimited number of times to
>> facilitate debugging, even if this bug happens extremely rarely in real runs
>> of a VM. qemu-test is fully automated. Once started, it can continuously
>> run, e.g., for months to test an enormous number of test cases.
>>
>> The implementation of qemu-test is actually not that complicated. It opens
>> two virtual disks, the so-called truth image and test image, respectively.
>> The truth image is served by a trivial synchronous block device driver so
>> that its behavior is guaranteed to be correct. The test image is served by a
>> real block device driver (e.g., FVD or QCOW2) that we want to test.
>> qemu-test submits the same sequence of disk I/O requests (which is
>> randomly generated) to the truth image and the test image, and expects that
>> the two images' contents never diverge. Otherwise, it indicates a bug in
>> the test image's block device driver. qemu-test works with the emulated
>> disk block/sim.c so that it can randomize event timings in a controlled
>> manner and can inject disk I/O errors randomly.
>
> block/blkverify.c already provides I/O verification. It mirrors
> writes to a raw file and compares the contents of read blocks to
> detect data integrity issues. That's the same approach you have
> described.
>
>> I found qemu-test extremely powerful in discovering elusive bugs that I
>> never anticipated, and using qemu-test is effortless. Whenever I completed
>> some major code upgrade, I simply started qemu-test in the evening and
>> came back in the morning to collect bugs, if any. Debugging them is also
>> easy because the bugs are precisely repeatable even if they are hard to
>> trigger.
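
For illustration only, here is a minimal Python sketch of the truth-image/test-image comparison quoted above. It is not taken from qemu-test, blkverify, or the FVD patch. The names TruthDisk, ChunkedDisk, and run_random_test, and the deliberate "buggy" switch, are all hypothetical. The point is just that a seeded pseudo random generator makes a diverging request reproducible.

```python
import random


class TruthDisk:
    """Trivial reference disk: a flat byte array written synchronously."""

    def __init__(self, size):
        self.data = bytearray(size)

    def write(self, offset, payload):
        self.data[offset:offset + len(payload)] = payload

    def read(self, offset, length):
        return bytes(self.data[offset:offset + length])


class ChunkedDisk:
    """Hypothetical driver under test: chunks are allocated on first write."""

    def __init__(self, chunk_size=64, buggy=False):
        self.chunk_size = chunk_size
        self.chunks = {}
        self.buggy = buggy

    def write(self, offset, payload):
        for i, byte in enumerate(payload):
            index, within = divmod(offset + i, self.chunk_size)
            chunk = self.chunks.setdefault(index, bytearray(self.chunk_size))
            if self.buggy and within == self.chunk_size - 1:
                continue  # deliberate bug: the last byte of a chunk is dropped
            chunk[within] = byte

    def read(self, offset, length):
        out = bytearray(length)
        for i in range(length):
            index, within = divmod(offset + i, self.chunk_size)
            chunk = self.chunks.get(index)
            if chunk is not None:
                out[i] = chunk[within]
        return bytes(out)


def run_random_test(test_disk, size, seed, num_requests=1000):
    """Return the index of the first diverging request, or None."""
    rng = random.Random(seed)  # same seed, same request sequence
    truth = TruthDisk(size)
    for n in range(num_requests):
        offset = rng.randrange(size)
        length = rng.randint(1, min(128, size - offset))
        # Non-zero bytes only, so a dropped write is always visible.
        payload = bytes(rng.randrange(1, 256) for _ in range(length))
        truth.write(offset, payload)
        test_disk.write(offset, payload)
        if test_disk.read(offset, length) != truth.read(offset, length):
            return n
    return None
```

Running run_random_test twice with the same seed against the buggy variant stops at the same request index, which is the repeatability property described in the quoted text.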
>
> Here are the unique features you've described beyond what qemu-io,
> blkdebug, and blkverify do:
>
> 1. New functionality:
>  * Control over ordering of I/O request submission and completion.
>  * Random I/O generator (probably as new qemu-io command).
>
> 2. Enhancements to existing code:
>  * Random chance of failing I/O in blkdebug.
>
> Do you agree with this or are there other unique features which are
> beyond small enhancements to existing code?
>
> I think the best strategy is to consolidate these as incremental
> patches that can be reviewed and merged independently.
>
>> As for the FVD patch that includes the new testing framework, I tried to
>> post it on the mailing list twice but it always got bounced back, either
>> because the message is too big or because of a Notes client configuration
>> issue. Until I figure it out, please download the FVD patch from
>> https://researcher.ibm.com/researcher/files/us-ctang/FVD-01-14-2011.patch
>
> I'll send you my git-send-email config off-list.
>
> Stefan
>
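
To make the "random chance of failing I/O" item concrete, here is a rough Python sketch of a wrapper that fails a configurable fraction of requests. This is not blkdebug code and does not reflect its configuration syntax. MemoryDisk, FlakyDisk, and failure_pattern are hypothetical names, and the use of EIO is an assumption. The failure decisions come from a seeded generator, so a given seed always fails the same requests.

```python
import errno
import random


class MemoryDisk:
    """Hypothetical in-memory backend used only for this sketch."""

    def __init__(self, size):
        self.data = bytearray(size)

    def write(self, offset, payload):
        self.data[offset:offset + len(payload)] = payload

    def read(self, offset, length):
        return bytes(self.data[offset:offset + length])


class FlakyDisk:
    """Wraps a backend and fails a seeded-random fraction of requests."""

    def __init__(self, backend, fail_probability, seed):
        self.backend = backend
        self.fail_probability = fail_probability
        self.rng = random.Random(seed)

    def _maybe_fail(self, op):
        # random() is in [0.0, 1.0), so 0.0 never fails and 1.0 always fails.
        if self.rng.random() < self.fail_probability:
            raise OSError(errno.EIO, "injected %s failure" % op)

    def write(self, offset, payload):
        self._maybe_fail("write")
        self.backend.write(offset, payload)

    def read(self, offset, length):
        self._maybe_fail("read")
        return self.backend.read(offset, length)


def failure_pattern(seed, fail_probability=0.9, num_requests=100):
    """Issue one-byte writes and record which of them were failed."""
    disk = FlakyDisk(MemoryDisk(1024), fail_probability, seed)
    pattern = []
    for n in range(num_requests):
        try:
            disk.write(n, b"x")
            pattern.append(False)
        except OSError:
            pattern.append(True)
    return pattern
```

A caller under test would then be checked for leaks or inconsistent state after each injected failure; rerunning with the same seed replays exactly the same failures.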