On 18/10/2017 19:27, Jeff Cody wrote: > On Wed, Oct 18, 2017 at 06:39:26PM +0200, Paolo Bonzini wrote: >> On 18/10/2017 18:19, Jeff Cody wrote: >>>>> Here is what we need from common.rc for this series: >>>>> >>>>> _rm_test_img >>>>> _cleanup_nbd >>>>> _cleanup_vxhs >>>>> _cleanup_rbd >>>>> _cleanup_sheepdog >>>>> _cleanup_protocols >>>>> _cleanup_test_img >>>>> >>>>> They all have a common theme (cleanup), so I could move them all to a >>>>> common.cleanup (naming suggestion?) file (which would need to be included >>>>> by >>>>> common.rc, as well). >>>>> >>>>> Would this be a strong enough delineation to overcome your concerns? >>>> >>>> A great start. Which of these are actually needed by the tests (and >>>> hence by common.rc) and why? >>> >>> Some tests are written such that they do intermediate cleanups between >>> multiple internal sub-tests for varying reasons, and so use those cleanup >>> functions as part of their testing. The function _cleanup_test_img >>> effectively calls all the other functions I listed, so they are really all >>> required for the tests, if they choose to call _cleanup_test_img. >>> >>> And for 'check' to tear everything down to a clean state, it also needs to >>> use the cleanup functions for everything that is not just a file/directory. >> >> Do these tests really need the "cleanup protocols" part, because the few >> that have more than one _cleanup_test_img (059, 066, 070, 084, 146, 171) >> are either file-only or non-raw, so they should be able to just rebuild >> the format on top of the same image. >> > > Maybe not, but that could just be the nature of what bugs we are testing for > at this time. These specific tests may not need to, but it is not > inconceivable to have a test that involves bringing up and tearing down > a protocol based server some arbitrary number times, to test that specific > behavior.
Right, but such a test should probably be protocol-specific; it can just use "file" and invoke QEMU_NBD on its own for example, similar to what tests 058 and 162 already do. For example, it's very different if you delete and recreate a raw image _and_ rerun qemu-nbd, or only do the latter. >> Maybe that's where the separation lies---protocol vs. format, where >> cleaning up the "file" protocol need not do anything because it's done >> when removing the test directory. If that's the case, it'd be nice >> because it might also make it much easier to tackle the issue with >> parallel tests. > > On final exit, yes, a test needs not remember to remove all of its mouse > droppings. But as far as not needing to remove images in intermediate > stages of a given test, I think that assumes too much. For instance, > qemu-img _should_ be able to rebuild a format on top of the same image. But > maybe a test wants to see if that specific functionality actually works as > intended, and compares removing and creating an image to rebuilding on top > of an image, etc. Right, but let's draw a line, does such a test need to support multiple protocols? For example: - 083 and 143 for example are very much NBD-specific; 083 uses "_supported_proto nbd", while 143 probably should be using either "file" or "nbd". - only 058 and 162 are running qemu-nbd manually, and they actually start a _new_ NBD protocol server In addition not many tests do not use _make_test_img, as seen with "git grep -LE '_make_test_img|bin/env python' [0-9][0-9][0-9]". They are: - 064, 070 and 135 which use the sample image and thus _should_ be using file - 049, 082, 111 which is creating images with "qemu-img create"; 049 and 111 might actually use nfs too. 150 is using "qemu-img convert", which is also creation more or less - 083 is NBD-specific because it uses the fault injector - 113 probably should be more generic than just NBD - 128 is pretty unique in that it creates a Linux device mapper device (!) - 143 probably should be using either "file" or "nbd". - 162 is also a bit unique in that it tests null-co:// and disregards IMGFMT/IMGPROTO So it does seem that handling the protocol in "check" is a good approximation. And by the way, the only existing uses of "_supported_proto" are _supported_proto file _supported_proto file nfs _supported_proto file sheepdog rbd nfs _supported_proto generic _supported_proto nbd _supported_proto nfs so another thing to do might be to change _supported_proto to a feature-based test: - file/sheepdog/rbd/nfs are those that support resizing (aka "truncate") - file/nfs are generally those that support "qemu-img create" (plus others that should be "generic" actually, including 090, 103, 107); 150 is "file" because it needs "map" in addition to "create". - nfs is used in test 173 to test protocol-based images with relative filename backing strings; it probably can run on file too, anyway that's a third important discriminating feature - nbd is used in a bunch of random tests that can be made more generic (094, 113, 119, 123). Only 083 is NBD-specific because it uses the NBD fault injector, and it actually doesn't use the protocol that is created by the caller. Since you are doing a lot of whole-directory moves, "_supported" tags could become a separate configuration file, e.g. ---- # all generic, no need to include it #[001] [025] fmt=raw qcow2 qed # specify feature proto=+resize [143] fmt=generic proto=nbd start_protocol=no [150] fmt=raw qcow2 proto=+create +map ---- The few tests using start_protocol=no would have to ensure parallel-safe operation themselves. This would also enable a more consistent handling across shell and Python tests. So, this is why I was wondering whether patches 3/4 kinda paint ourselves in the corner. So, looking at the patches: - 1, 2, 7, 8, 9 are definitely good ideas, and should be done _before_ an eventual/hypothetical Python rewrite of "check". - for 5, 6 I think we should be using shell job control instead in "check" ('set -m') #! /bin/sh set -m # Start a job which leaves two processes behind. By starting it # in the background, we can get the leader process's pid in $! # That pid is also the process group id of the whole job. sh -c 'echo subshell pid is $$; sleep 10 | sleep 15 &' & pgrp=$! wait echo '$! is '$pgrp', killing all processes in that group:' pgrep -g $pgrp -a kill -TERM -$pgrp sleep 1 echo Leftover processes have been killed: ps axo pid,ppid,pgrp,stat,tty,comm|grep sleep (Python can do the same by setting preexec_fn=os.setpgrp when calling Subprocess.popen; then you can kill the entire group with os.killpg) - 10 depends on everything before. SAD! ;) It's a pity to lose the cleanup in patch 4, but I think it's better in the long term if we have a clear idea of the tests' tasks vs. the harness's. Thanks, Paolo