On Tue, Mar 25, 2025 at 05:06:35PM +0100, Hanna Czenczek wrote:
> Polling in I/O functions can lead to nested read_from_fuse_export()

"Polling" means several different things. "aio_poll()" or "nested event
loop" would be clearer.

> calls, overwriting the request buffer's content.  The only function
> affected by this is fuse_write(), which therefore must use a bounce
> buffer or corruption may occur.
> 
> Note that in addition we do not know whether libfuse-internal structures
> can cope with this nesting, and even if we did, we probably cannot rely
> on it in the future.  This is the main reason why we want to remove
> libfuse from the I/O path.
> 
> I do not have a good reproducer for this other than:
> 
> $ dd if=/dev/urandom of=image bs=1M count=4096
> $ dd if=/dev/zero of=copy bs=1M count=4096
> $ touch fuse-export
> $ qemu-storage-daemon \
>     --blockdev file,node-name=file,filename=copy \
>     --export \
>     fuse,id=exp,node-name=file,mountpoint=fuse-export,writable=true \
>     &
> 
> Other shell:
> $ qemu-img convert -p -n -f raw -O raw -t none image fuse-export
> $ killall -SIGINT qemu-storage-daemon
> $ qemu-img compare image copy
> Content mismatch at offset 0!
> 
> (The -t none in qemu-img convert is important.)
> 
> I tried reproducing this with throttle and small aio_write requests from
> another qemu-io instance, but for some reason all requests are perfectly
> serialized then.
> 
> I think in theory we should get parallel writes only if we set
> fi->parallel_direct_writes in fuse_open().  In fact, I can confirm that
> if we do that, that throttle-based reproducer works (i.e. does get
> parallel (nested) write requests).  I have no idea why we still get
> parallel requests with qemu-img convert anyway.
> 
> Also, a later patch in this series will set fi->parallel_direct_writes
> and note that it makes basically no difference when running fio on the
> current libfuse-based version of our code.  It does make a difference
> without libfuse.  So something quite fishy is going on.
> 
> I will try to investigate further what the root cause is, but I think
> for now let's assume that calling blk_pwrite() can invalidate the buffer
> contents through nested polling.
> 
> Cc: qemu-sta...@nongnu.org
> Signed-off-by: Hanna Czenczek <hre...@redhat.com>
> ---
>  block/export/fuse.c | 24 +++++++++++++++++++++---
>  1 file changed, 21 insertions(+), 3 deletions(-)
> 
> diff --git a/block/export/fuse.c b/block/export/fuse.c
> index 465cc9891d..a12f479492 100644
> --- a/block/export/fuse.c
> +++ b/block/export/fuse.c
> @@ -301,6 +301,12 @@ static void read_from_fuse_export(void *opaque)
>          goto out;
>      }
>  
> +    /*
> +     * Note that polling in any request-processing function can lead to a 
> nested
> +     * read_from_fuse_export() call, which will overwrite the contents of
> +     * exp->fuse_buf.  Anything that takes a buffer needs to take care that 
> the
> +     * content is copied before potentially polling.
> +     */
>      fuse_session_process_buf(exp->fuse_session, &exp->fuse_buf);

It seems safer to allocate a fuse_buf per request instead copying the
data buffer only for write requests. Other request types might be
affected too (e.g. nested reads of different sizes).

I guess later on in this series a per-request fuse_buf will be
introduced anyway, so it doesn't matter what we do in this commit.

>  
>  out:
> @@ -624,6 +630,7 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, 
> const char *buf,
>                         size_t size, off_t offset, struct fuse_file_info *fi)
>  {
>      FuseExport *exp = fuse_req_userdata(req);
> +    void *copied;
>      int64_t length;
>      int ret;
>  
> @@ -638,6 +645,14 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, 
> const char *buf,
>          return;
>      }
>  
> +    /*
> +     * Heed the note on read_from_fuse_export(): If we poll (which any 
> blk_*()
> +     * I/O function may do), read_from_fuse_export() may be nested, 
> overwriting
> +     * the request buffer content.  Therefore, we must copy it here.
> +     */
> +    copied = blk_blockalign(exp->common.blk, size);
> +    memcpy(copied, buf, size);
> +
>      /**
>       * Clients will expect short writes at EOF, so we have to limit
>       * offset+size to the image length.
> @@ -645,7 +660,7 @@ static void fuse_write(fuse_req_t req, fuse_ino_t inode, 
> const char *buf,
>      length = blk_getlength(exp->common.blk);
>      if (length < 0) {
>          fuse_reply_err(req, -length);
> -        return;
> +        goto free_buffer;
>      }
>  
>      if (offset + size > length) {
> @@ -653,19 +668,22 @@ static void fuse_write(fuse_req_t req, fuse_ino_t 
> inode, const char *buf,
>              ret = fuse_do_truncate(exp, offset + size, true, 
> PREALLOC_MODE_OFF);
>              if (ret < 0) {
>                  fuse_reply_err(req, -ret);
> -                return;
> +                goto free_buffer;
>              }
>          } else {
>              size = length - offset;
>          }
>      }
>  
> -    ret = blk_pwrite(exp->common.blk, offset, size, buf, 0);
> +    ret = blk_pwrite(exp->common.blk, offset, size, copied, 0);
>      if (ret >= 0) {
>          fuse_reply_write(req, size);
>      } else {
>          fuse_reply_err(req, -ret);
>      }
> +
> +free_buffer:
> +    qemu_vfree(copied);
>  }
>  
>  /**
> -- 
> 2.48.1
> 
> 

Attachment: signature.asc
Description: PGP signature

Reply via email to