On Tue, Mar 25, 2025 at 05:06:35PM +0100, Hanna Czenczek wrote: > Polling in I/O functions can lead to nested read_from_fuse_export()
"Polling" means several different things. "aio_poll()" or "nested event loop" would be clearer. > calls, overwriting the request buffer's content. The only function > affected by this is fuse_write(), which therefore must use a bounce > buffer or corruption may occur. > > Note that in addition we do not know whether libfuse-internal structures > can cope with this nesting, and even if we did, we probably cannot rely > on it in the future. This is the main reason why we want to remove > libfuse from the I/O path. > > I do not have a good reproducer for this other than: > > $ dd if=/dev/urandom of=image bs=1M count=4096 > $ dd if=/dev/zero of=copy bs=1M count=4096 > $ touch fuse-export > $ qemu-storage-daemon \ > --blockdev file,node-name=file,filename=copy \ > --export \ > fuse,id=exp,node-name=file,mountpoint=fuse-export,writable=true \ > & > > Other shell: > $ qemu-img convert -p -n -f raw -O raw -t none image fuse-export > $ killall -SIGINT qemu-storage-daemon > $ qemu-img compare image copy > Content mismatch at offset 0! > > (The -t none in qemu-img convert is important.) > > I tried reproducing this with throttle and small aio_write requests from > another qemu-io instance, but for some reason all requests are perfectly > serialized then. > > I think in theory we should get parallel writes only if we set > fi->parallel_direct_writes in fuse_open(). In fact, I can confirm that > if we do that, that throttle-based reproducer works (i.e. does get > parallel (nested) write requests). I have no idea why we still get > parallel requests with qemu-img convert anyway. > > Also, a later patch in this series will set fi->parallel_direct_writes > and note that it makes basically no difference when running fio on the > current libfuse-based version of our code. It does make a difference > without libfuse. So something quite fishy is going on. > > I will try to investigate further what the root cause is, but I think > for now let's assume that calling blk_pwrite() can invalidate the buffer > contents through nested polling. > > Cc: qemu-sta...@nongnu.org > Signed-off-by: Hanna Czenczek <hre...@redhat.com> > --- > block/export/fuse.c | 24 +++++++++++++++++++++--- > 1 file changed, 21 insertions(+), 3 deletions(-) > > diff --git a/block/export/fuse.c b/block/export/fuse.c > index 465cc9891d..a12f479492 100644 > --- a/block/export/fuse.c > +++ b/block/export/fuse.c > @@ -301,6 +301,12 @@ static void read_from_fuse_export(void *opaque) > goto out; > } > > + /* > + * Note that polling in any request-processing function can lead to a > nested > + * read_from_fuse_export() call, which will overwrite the contents of > + * exp->fuse_buf. Anything that takes a buffer needs to take care that > the > + * content is copied before potentially polling. > + */ > fuse_session_process_buf(exp->fuse_session, &exp->fuse_buf); It seems safer to allocate a fuse_buf per request instead copying the data buffer only for write requests. Other request types might be affected too (e.g. nested reads of different sizes). I guess later on in this series a per-request fuse_buf will be introduced anyway, so it doesn't matter what we do in this commit. > > out: > @@ -624,6 +630,7 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, > const char *buf, > size_t size, off_t offset, struct fuse_file_info *fi) > { > FuseExport *exp = fuse_req_userdata(req); > + void *copied; > int64_t length; > int ret; > > @@ -638,6 +645,14 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, > const char *buf, > return; > } > > + /* > + * Heed the note on read_from_fuse_export(): If we poll (which any > blk_*() > + * I/O function may do), read_from_fuse_export() may be nested, > overwriting > + * the request buffer content. Therefore, we must copy it here. > + */ > + copied = blk_blockalign(exp->common.blk, size); > + memcpy(copied, buf, size); > + > /** > * Clients will expect short writes at EOF, so we have to limit > * offset+size to the image length. > @@ -645,7 +660,7 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, > const char *buf, > length = blk_getlength(exp->common.blk); > if (length < 0) { > fuse_reply_err(req, -length); > - return; > + goto free_buffer; > } > > if (offset + size > length) { > @@ -653,19 +668,22 @@ static void fuse_write(fuse_req_t req, fuse_ino_t > inode, const char *buf, > ret = fuse_do_truncate(exp, offset + size, true, > PREALLOC_MODE_OFF); > if (ret < 0) { > fuse_reply_err(req, -ret); > - return; > + goto free_buffer; > } > } else { > size = length - offset; > } > } > > - ret = blk_pwrite(exp->common.blk, offset, size, buf, 0); > + ret = blk_pwrite(exp->common.blk, offset, size, copied, 0); > if (ret >= 0) { > fuse_reply_write(req, size); > } else { > fuse_reply_err(req, -ret); > } > + > +free_buffer: > + qemu_vfree(copied); > } > > /** > -- > 2.48.1 > >
signature.asc
Description: PGP signature