Paul Eggert wrote on Mon, Jul 27, 2020 at 11:38:34AM -0700:
> On 7/27/20 6:33 AM, Dominique Martinet wrote:
> >It's interesting to see an application actually does open regular files
> >with O_NONBLOCK; when this was discussed in the pull request[2]
> >basically the agreement was that apps should either just not be setting
> >nonblock (which ought to be no-op on regular files),
>
> O_NONBLOCK is needed there because the file might be a FIFO or a special
> file, and 'tar' doesn't want to hang forever trying to open the file. And
> 'tar' shouldn't stat the file first to see whether the file is a FIFO or a
> special file, as that (a) would be slower and (b) would introduce a race
> condition.
But we do stat before, and have that race condition already, because of
the 0-blocks sparse file optimization at least?

Looking at the code, for the path I'm looking at (with
`strace -k -e trace=%file tar -cf /tmp/foo somefile`), dump_file0 does an
fstatat call followed by subfile_open, which does the actual open; the
flags are passed down there at a point where we already have the stat
struct. It would be trivial to choose different flags based on the file
type.

Now I can understand not wanting to work around filesystem quirks in all
cases, but the feature is really useful for synthetic filesystems, so
just retrying reads as the patch does is fine for me.

> >or would handle
> >short reads so it would probably be fine™
> >(But I guess life always proves us wrong, sorry for the trouble)
>
> Yes, POSIX requires that you don't get a short read when you're reading a
> regular file (unless it's at EOF or you're interrupted by a signal), and I
> assume 'tar' and other programs use this to avoid an unnecessary read of 0
> bytes at end of file.

Hm, POSIX.1-2008 apparently states:

    The value returned may be less than nbyte if the number of bytes left
    in the file is less than nbyte, if the read() request was interrupted
    by a signal, or if the file is a pipe or FIFO or special file and has
    fewer than nbyte bytes immediately available for reading.

So even if the file is regular, the read can legitimately be interrupted
by a signal. In practice, Linux defers signals (hence the "new"
TASK_KILLABLE state, which lets non-catchable signals interrupt a syscall
while softer signals get queued and delivered after the syscall), so this
particular case doesn't happen on local filesystems; but I don't read
this as POSIX requiring full reads or writes.

There have also been bugs regularly, e.g. I've seen lustre return short
reads multiple times over the years.
They've always done their best to fix these (and I'd agree that for a
file opened with plain flags I would also consider this a bug), but for
network filesystems I don't think it's safe to assume you'll always get
full reads, even if you consider that POSIX requires it -- bugs happen
and network filesystems are hard.

Well, I don't know. I'm open to discussion on how to improve 9p in
parallel as well, but I think the safe approach would be to work on both.
So far, my stance on 9p hasn't changed: the usecase is valid, and I'm
looking for better ideas to permit short reads occasionally with fewer
false positives than we have here (maybe combine the O_NONBLOCK flag with
a mount option, like nfs used to?)

--
Dominique