On May 21 2022, "Richard W.M. Jones" <rjo...@redhat.com> wrote:
> On Sat, May 21, 2022 at 01:21:11PM +0100, Nikolaus Rath wrote:
>> Hi,
>>
>> How does the blocksize filter take into account writes that end up
>> overlapping due to read-modify-write cycles?
>>
>> Specifically, suppose there are two non-overlapping writes handled
>> by two different threads, that, due to blocksize requirements,
>> overlap when expanded. I think there is a risk that one thread may
>> partially undo the work of the other here.
>>
>> Looking at the code, it seems that writes of unaligned heads and
>> tails are protected with a global lock, but writes of aligned data
>> can occur concurrently.
>
> I agree.
>
> Assuming the underlying plugin is NBDKIT_THREAD_MODEL_PARALLEL and no
> other filters impose thread model limits, the blocksize filter does
> not limit the thread model, so the thread model of nbdkit would also
> be NBDKIT_THREAD_MODEL_PARALLEL.
>
> That means that two writes either on different connections or
> pipelined on the same connection could happen at the same time.
> “blocksize_pwrite” would be called concurrently for the two requests.
>
>> However, does this not miss the case where there is one unaligned
>> write that overlaps with an aligned one?
>>
>> For example, with blocksize 10, we could have:
>>
>> Thread 1: receives write request for offset=0, size=10
>> Thread 2: receives write request for offset=4, size=16
>> Thread 1: acquires lock, reads bytes 0-4
>> Thread 2: does aligned write (no locking needed), writes bytes 0-10
>> Thread 1: writes bytes 0-10, overwriting data from Thread 2
>
> I believe this analysis is correct. (CC'd to Eric who knows a lot
> more about this.)
>
> However I don't think it's a bug. If a client doesn't want writes to
> squash each other, then it shouldn't send overlapping requests. I bet
> the same thing happens with an SSD.

But the requests are not overlapping from the client's point of view.
They only become overlapping when the server applies its
read-modify-write operation to align them to the blocksize. I think you
elsewhere said that the blocksize reported by the NBD server is only a
preferred blocksize, so I'd be surprised if not following this
"preference" results in data corruption.

> NBD_CMD_FLAG_FUA is provided for clients that wish to ensure that a
> write has been committed before sending another request.
>
> Do you have an example of a client which sends overlapping requests
> and depends on particular behaviour of the server? You may be able to
> get it to work by using nbdkit-noparallel-filter which can be used to
> serialize nbdkit.

I'm working with the kernel's NBD client, and this would explain all the
mysterious data corruption issues that I've seen with the S3 plugin.
But I have not yet confirmed definitively that this is the root cause.

For now, I'll avoid the blocksize filter and instead do the
read-modify-write in the plugin with proper locking. If that fixes it,
then I think we can conclude that the kernel is sending such requests
(but, as I said above, I would not consider them overlapping, nor would
I consider this a bug).

Best,
-Nikolaus

-- 
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«
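
For illustration only, here is a minimal Python sketch of the point about
requests that are disjoint as sent but collide once rounded out to the
block size, and of why a read-modify-write on a shared block has to be
serialized. It is not the blocksize filter's code. The block size of 10
comes from the example in this thread; the two write requests, the
helper names (expand, ranges_overlap, rmw_interleaved, rmw_serialized)
and the in-memory "disk" are all assumptions made up for the sketch.

```python
BLOCK = 10  # block size borrowed from the example in this thread


def expand(offset, size, block=BLOCK):
    """Round the byte range [offset, offset+size) outwards to block boundaries."""
    start = (offset // block) * block
    end = -(-(offset + size) // block) * block  # ceiling division
    return start, end


def ranges_overlap(a, b):
    """True if two half-open ranges (start, end) share at least one byte."""
    return a[0] < b[1] and b[0] < a[1]


# Two hypothetical client writes that share no byte (assumed for the sketch).
w1 = (0, 4)  # offset=0, size=4 -> bytes 0..3
w2 = (4, 6)  # offset=4, size=6 -> bytes 4..9

client_1 = (w1[0], w1[0] + w1[1])
client_2 = (w2[0], w2[0] + w2[1])
server_1 = expand(*w1)  # what a block-aligned backend would have to touch
server_2 = expand(*w2)


def rmw_interleaved():
    """Two read-modify-write cycles on one block with no serialization."""
    disk = bytearray(b"." * BLOCK)
    buf1 = bytearray(disk)      # writer 1 reads the whole block
    buf2 = bytearray(disk)      # writer 2 reads it before writer 1 writes back
    buf1[0:4] = b"AAAA"         # writer 1 patches in its payload
    buf2[4:10] = b"BBBBBB"      # writer 2 patches in its payload
    disk[:] = buf2              # writer 2 writes the block back first
    disk[:] = buf1              # writer 1 writes back stale bytes 4..9
    return bytes(disk)


def rmw_serialized():
    """The same two cycles, each one completed before the next starts."""
    disk = bytearray(b"." * BLOCK)
    for lo, hi, payload in ((0, 4, b"AAAA"), (4, 10, b"BBBBBB")):
        buf = bytearray(disk)   # read
        buf[lo:hi] = payload    # modify
        disk[:] = buf           # write
    return bytes(disk)


if __name__ == "__main__":
    print("overlap as sent by client:", ranges_overlap(client_1, client_2))
    print("overlap after expansion:  ", ranges_overlap(server_1, server_2))
    print("interleaved RMW result:   ", rmw_interleaved())
    print("serialized RMW result:    ", rmw_serialized())
```

In the interleaved case the second writer's bytes are replaced by the
stale copy that the first writer read earlier, even though the two
requests never shared a byte as sent. Holding a lock across each whole
read-modify-write cycle corresponds to the serialized case.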