On Tue, 20 Jul 2021 16:37:01 +0200 Greg Kurz <gr...@kaod.org> wrote: > On Tue, 20 Jul 2021 13:26:50 +0200 > Christian Schoenebeck <qemu_...@crudebyte.com> wrote: > > > On Dienstag, 20. Juli 2021 08:27:45 CEST Petr Pisar wrote: > > > V Mon, Jul 19, 2021 at 03:39:53PM -0500, Paul Eggert napsal(a): > > > > On 7/19/21 7:54 AM, Christian Schoenebeck wrote: > > > > > POSIX compliant applications must always expect that read() / > > > > > write() functions might read/write less bytes than requested > > > > > > > > Although that's true in general, it's not true for regular files. The > > > > POSIX spec for 'read' > > > > <https://pubs.opengroup.org/onlinepubs/9699919799/functions/pread.html> > > > > says, "The value returned may be less than /nbyte/ if the number of > > > > bytes left in the file is less than /nbyte/, if the /read/() request was > > > > interrupted by a signal, or if the file is a pipe or FIFO or special > > > > file and has fewer than /nbyte/ bytes immediately available for > > > > reading." > > > > > > > > So, regular files shouldn't get short reads unless there's an EOF or a > > > > signal. > > > > > > What does gaurantee there is no signal sent to the process? > > > > > > -- Petr > > > > Well, that's one point, but I cannot deny that Paul has a valid argument > > there > > as well. > > > > However it is common practice to make applications capable to deal with > > short > > reads independent of any prerequisites like specific file types. And like I > > said in my previous email, as far as the Linux kernel is concerned, it > > clearly > > sais that applications must be capable of short reads at any time and > > independent of a specific file type. BSD is yet a another story though. > > > > And BTW it is actually not QEMU responsible for this particular behaviour, > > but > > rather the stock Linux kernel's 9p client that exposes this behaviour to > > applications: https://github.com/torvalds/linux/tree/master/fs/9p > > > > I agree that nothing can be done at the QEMU level to fix that : the virtio-9p > device is simply filling the buffer sized by the client at mount time. It > doesn't > know anything about the count argument passed to read() by the application. > > So I had a look at the 9p client code in linux and we have : > > static ssize_t > v9fs_file_read_iter(struct kiocb *iocb, struct iov_iter *to) > { > struct p9_fid *fid = iocb->ki_filp->private_data; > int ret, err = 0; > > p9_debug(P9_DEBUG_VFS, "count %zu offset %lld\n", > iov_iter_count(to), iocb->ki_pos); > > if (iocb->ki_filp->f_flags & O_NONBLOCK) > ret = p9_client_read_once(fid, iocb->ki_pos, to, &err); > else > ret = p9_client_read(fid, iocb->ki_pos, to, &err); > if (!ret) > return err; > > iocb->ki_pos += ret; > return ret; > } > > p9_client_read_once() sends a single request and propagates > short reads to the caller, while p9_client_read() implements > a full read, i.e. loops on p9_client_read_once() until all the > requested data was read. > > strace on tar shows that tar is setting the O_NONBLOCK flag: > > openat(AT_FDCWD, "register.h", > O_RDONLY|O_NOCTTY|O_NONBLOCK|O_NOFOLLOW|O_CLOEXEC) = 4 > ... > read(4, "/*\n * Copyright 2006-2018 Thoma"..., 9728) = 4096 > > This explains why tag is getting short reads. > > Looking more closely at what POSIX says about O_NONBLOCK: > > When attempting to read a file (other than a pipe or FIFO) that supports > non-blocking reads and has no data currently available: > > - If O_NONBLOCK is set, read() shall return -1 and set errno to [EAGAIN]. > > - If O_NONBLOCK is clear, read() shall block the calling thread until > some data becomes available. > > - The use of the O_NONBLOCK flag has no effect if there is some data > available. > > and > > [EAGAIN] > The file is neither a pipe, nor a FIFO, nor a socket, the O_NONBLOCK flag > is set for the file descriptor, and the thread would be delayed in the read > operation. > > The case of the reported issue is thus "O_NONBLOCK is set and some data > is available", which should lead O_NONBLOCK to be ignored, i.e. switch > to a full read instead of propagating the short read IIUC. > > Makes sense ? >
I was thinking to something like that (not tested yet): --- a/fs/9p/vfs_file.c +++ b/fs/9p/vfs_file.c @@ -389,8 +389,22 @@ v9fs_file_read_iter(struct kiocb *iocb, struct iov_iter *t> p9_debug(P9_DEBUG_VFS, "count %zu offset %lld\n", iov_iter_count(to), iocb->ki_pos); - if (iocb->ki_filp->f_flags & O_NONBLOCK) + if (iocb->ki_filp->f_flags & O_NONBLOCK) { + size_t count = iov_iter_count(to); + ret = p9_client_read_once(fid, iocb->ki_pos, to, &err); + if (!ret) + return err; + + /* + * POSIX requires to ignore O_NONBLOCK if some data is + * already available. + */ + if (ret != count) { + iocb->ki_pos += ret; + ret = p9_client_read(fid, iocb->ki_pos, to, &err); + } + } else ret = p9_client_read(fid, iocb->ki_pos, to, &err); if (!ret) > Cc'ing Dominique and v9fs-developer for greater audience. > > > Independent of 9p, you may encounter short reads with network mounted file > > systems in general as well. > > > > The rationale behind this exposed behaviour is to allow each application to > > decide whether they want to consume the partial data currently available > > and > > (potentially) reduce the app's overall execution time, or rather to wait > > for > > the full amount of data to become available by calling read() again. > > > > Was there a specific reason in the past for tar to switch from gnulib's > > (short-read capable) full_read() to safe_read() in 1999? > > > > Best regards, > > Christian Schoenebeck > > > > > > > > _______________________________________________ > V9fs-developer mailing list > v9fs-develo...@lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/v9fs-developer