qemu-iotests #141 relies on the test being able to operate on a block job it has just started before that job makes any further progress. This fails regularly on some hosts because the time slice is only 100ms, and starting the additional processes required to trigger the operation often takes longer than that. The failure is particularly easy to reproduce under 100% CPU load.
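To illustrate the timing problem, here is a minimal sketch of a slice-based limiter. It assumes the limiter works in fixed time slices and always admits the first request of a slice; the names SliceLimit, slice_limit_set_speed and slice_limit_delay are hypothetical and this is not the code in include/qemu/ratelimit.h. The point is only that the returned delay can never exceed one slice, so however low the configured speed is, the job moves on again within 100ms.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical slice-based limiter, for illustration only (not QEMU's API). */
typedef struct {
    int64_t next_slice_ns;  /* time at which the current slice ends */
    uint64_t slice_quota;   /* units allowed per slice */
    uint64_t slice_ns;      /* slice length in nanoseconds */
    uint64_t dispatched;    /* units accounted for in the current slice */
} SliceLimit;

static void slice_limit_set_speed(SliceLimit *l, uint64_t units_per_sec,
                                  uint64_t slice_ns)
{
    /* Assumption: a slice is at most one second long. */
    assert(slice_ns > 0 && slice_ns <= 1000000000ULL);
    l->slice_ns = slice_ns;
    l->slice_quota = units_per_sec / (1000000000ULL / slice_ns);
    if (l->slice_quota == 0) {
        l->slice_quota = 1; /* never configure a zero quota */
    }
    l->next_slice_ns = 0;
    l->dispatched = 0;
}

/* Return how long the caller should sleep before submitting n more units.
 * The clock is passed in as now_ns to keep the sketch deterministic. */
static int64_t slice_limit_delay(SliceLimit *l, uint64_t n, int64_t now_ns)
{
    if (l->next_slice_ns <= now_ns) {
        /* The previous slice is over: start a fresh one. */
        l->next_slice_ns = now_ns + (int64_t)l->slice_ns;
        l->dispatched = 0;
    }
    if (l->dispatched == 0 || l->dispatched + n <= l->slice_quota) {
        /* The first request of a slice always passes, even if n > quota. */
        l->dispatched += n;
        return 0;
    }
    /* Quota exhausted: wait for the end of this slice, never longer. */
    return l->next_slice_ns - now_ns;
}
```

With a 100ms slice and a request far larger than the quota, the second request issued 30ms into the slice is delayed by only the remaining 70ms. A test that needs longer than that to reach the job will find it has already moved on.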

I originally noticed and analysed this during 2.6 hard freeze. Eventually the legacy rate limiting code currently used by the block jobs will be replaced by the refactorings to use BlockBackends which have their own rate limiting implementation. There was some hope [1] this would land in 2.7, but since it's not in master yet (at least as of commit a01aef5d) I prepared an alternative fix that can go into 2.7.

Sascha Silbe (1):
  Improve block job rate limiting for small bandwidth values

 block/commit.c           | 13 +++++-------
 block/mirror.c           |  4 +++-
 block/stream.c           | 12 ++++--------
 include/qemu/ratelimit.h | 43 ++++++++++++++++++++++++++++++++++---------
 4 files changed, 46 insertions(+), 26 deletions(-)

[1] mid:20160408123115.gh4...@noname.redhat.com
    "Re: [Qemu-devel] [Qemu-block] [PATCH 6/7] qemu-iotests: 141:
    reduce likelihood of race condition on systems with fast IO" by
    Kevin Wolf <kw...@redhat.com> on 2016-04-08.

-- 
1.9.1