On 05/26/22 04:18, Eric Blake wrote: > Demonstrate the bug where an aligned write can be lost if it races > between the read and write of a RMW unaligned write. > --- > > Sending this out for review of the test; it fails (which it is > supposed to as long as I don't fix the blocksize filter), but I hope > it starts passing once I also patch the filter. In the process, I ran > into a libnbd bug where we were using PyByteArray_AsString() > incorrectly, and segfaulting (rather than diagnosing type mismatch) on > something as simple as nbd.Buffer.from_bytearray(b'1'*16) that looks > like it should work; so I posted a patch for that. > > tests/Makefile.am | 2 + > tests/test-blocksize-sharding.sh | 161 +++++++++++++++++++++++++++++++ > 2 files changed, 163 insertions(+) > create mode 100755 tests/test-blocksize-sharding.sh > > diff --git a/tests/Makefile.am b/tests/Makefile.am > index b6d30c0a..67e7618b 100644 > --- a/tests/Makefile.am > +++ b/tests/Makefile.am > @@ -1376,11 +1376,13 @@ TESTS += \ > test-blocksize.sh \ > test-blocksize-extents.sh \ > test-blocksize-default.sh \ > + test-blocksize-sharding.sh \ > $(NULL) > EXTRA_DIST += \ > test-blocksize.sh \ > test-blocksize-extents.sh \ > test-blocksize-default.sh \ > + test-blocksize-sharding.sh \ > $(NULL) > > # blocksize-policy filter test. > diff --git a/tests/test-blocksize-sharding.sh > b/tests/test-blocksize-sharding.sh > new file mode 100755 > index 00000000..09f12051 > --- /dev/null > +++ b/tests/test-blocksize-sharding.sh > @@ -0,0 +1,161 @@ > +#!/usr/bin/env bash > +# nbdkit > +# Copyright (C) 2018-2022 Red Hat Inc. > +# > +# Redistribution and use in source and binary forms, with or without > +# modification, are permitted provided that the following conditions are > +# met: > +# > +# * Redistributions of source code must retain the above copyright > +# notice, this list of conditions and the following disclaimer. > +# > +# * Redistributions in binary form must reproduce the above copyright > +# notice, this list of conditions and the following disclaimer in the > +# documentation and/or other materials provided with the distribution. > +# > +# * Neither the name of Red Hat nor the names of its contributors may be > +# used to endorse or promote products derived from this software without > +# specific prior written permission. > +# > +# THIS SOFTWARE IS PROVIDED BY RED HAT AND CONTRIBUTORS ''AS IS'' AND > +# ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, > +# THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A > +# PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL RED HAT OR > +# CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, > +# SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT > +# LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF > +# USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND > +# ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, > +# OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT > +# OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF > +# SUCH DAMAGE. > + > +# Demonstrate a fix for a bug where blocksize could lose aligned writes > +# run in parallel with unaligned writes > + > +source ./functions.sh > +set -e > +set -x > + > +requires_plugin eval > +requires_nbdsh_uri > + > +files='blocksize-sharding.img blocksize-sharding.tmp' > +rm -f $files > +cleanup_fn rm -f $files > + > +# Script a server that requires 16-byte aligned requests, and which delays > +# 4s after reads if a witness file exists. Couple it with the delay filter > +# that delays 2s before writes. If an unaligned and aligned write overlap, > +# and can execute in parallel, we would have this timeline: > +# > +# T1 aligned write 1's to 0/16 T2 unaligned write 2's to 4/12 > +#t=0 blocksize: next->pwrite(0, 16) blocksize: next->pread(0, 16) > +# delay: wait 2s delay: next->pread(0, 16) > +# ... eval: read 0's, wait 4s > +#t=2 delay: next->pwrite(0, 16) ... > +# eval: write 1's ... > +# return ... > +#t=4 return 0's (now stale) > +# blocksize: next->pwrite(0, 16) > +# delay: wait 2s > +#t=6 delay: next->pwrite(0, 16) > +# eval: write stale RMW buffer > +# > +# leaving us with a sharded 0000222222222222 (T1's write is lost). > +# But as long as the blocksize filter detects the overlap, we should end > +# up with either 1111222222222222 (aligned write completed first), or with > +# 1111111111111111 (unaligned write completed first), either taking 8s, > +# but with no sharding. > +# > +# We also need an nbdsh script that kicks off parallel writes. > +export script=' > +import os > +import time > + > +witness = os.getenv("witness") > + > +def touch(path): > + open(path, "a").close() > + > +# First pass: check that two aligned operations work in parallel > +# Total time should be closer to 2 seconds, rather than 4 if serialized > +print("sanity check") > +ba1 = bytearray(b"1"*16) > +ba2 = bytearray(b"2"*16) > +buf1 = nbd.Buffer.from_bytearray(ba1) > +buf2 = nbd.Buffer.from_bytearray(ba2) > +touch(witness) > +start_t = time.time() > +h.aio_pwrite(buf1, 0) > +h.aio_pwrite(buf2, 0) > + > +while h.aio_in_flight() > 0: > + h.poll(-1) > +end_t = time.time() > +os.unlink(witness) > + > +# assert h.pread(16, 0) in [b"1"*16, b"2"*16] > +print(h.pread(16,0)) > +t = end_t - start_t > +print(t) > +assert t <= 3 > + > +# Next pass: try to kick off unaligned first > +print("unaligned first") > +h.zero(16, 0) > +ba3 = bytearray(b"3"*12) > +ba4 = bytearray(b"4"*16) > +buf3 = nbd.Buffer.from_bytearray(ba3) > +buf4 = nbd.Buffer.from_bytearray(ba4) > +touch(witness) > +start_t = time.time() > +h.aio_pwrite(buf3, 4) > +h.aio_pwrite(buf4, 0) > + > +while h.aio_in_flight() > 0: > + h.poll(-1) > +end_t = time.time() > +os.unlink(witness) > + > +# assert h.pread(16, 0) in [b"4"*4 + b"3"*12, b"4"*16] > +print(h.pread(16,0)) > +t = end_t - start_t > +print(t) > + > +# Next pass: try to kick off aligned first > +print("aligned first") > +ba5 = bytearray(b"5"*16) > +ba6 = bytearray(b"6"*12) > +buf5 = nbd.Buffer.from_bytearray(ba5) > +buf6 = nbd.Buffer.from_bytearray(ba6) > +h.zero(16, 0) > +touch(witness) > +start_t = time.time() > +h.aio_pwrite(buf5, 0) > +h.aio_pwrite(buf6, 4) > + > +while h.aio_in_flight() > 0: > + h.poll(-1) > +end_t = time.time() > +os.unlink(witness) > + > +assert h.pread(16, 0) in [b"5"*4 + b"6"*12, b"5"*16] > +t = end_t - start_t > +print(t) > +' > + > +# Now run everything > +truncate --size=16 blocksize-sharding.img > +export witness="$PWD/blocksize-sharding.tmp" > +nbdkit -U - --filter=blocksize --filter=delay eval delay-write=2 \ > + config='ln -sf "$(realpath "$3")" $tmpdir/$2' \ > + img="$PWD/blocksize-sharding.img" tmp="$PWD/blocksize-sharding.tmp" \ > + get_size='echo 16' block_size='echo 16 64K 1M' \ > + thread_model='echo parallel' \ > + zero='dd if=/dev/zero skip=$4 count=$3 iflag=count_bytes,skip_bytes' \ > + pread=' > + dd if=$tmpdir/img skip=$4 count=$3 iflag=count_bytes,skip_bytes > + if [ -f $tmpdir/tmp ]; then sleep 4; fi ' \ > + pwrite='dd of=$tmpdir/img seek=$4 conv=notrunc oflag=seek_bytes' \ > + --run 'nbdsh -u "$uri" -c "$script"' >
Looks OK to me. I've made a reasonable effort to review the sequence diagram, the python script, and *parts* of the nbdkit command line (I didn't try to dig down into every detail there). Reviewed-by: Laszlo Ersek <ler...@redhat.com> _______________________________________________ Libguestfs mailing list Libguestfs@redhat.com https://listman.redhat.com/mailman/listinfo/libguestfs