On Wed, 27 Jan 2021 15:09:50 +0100 Miklos Szeredi <mszer...@redhat.com> wrote:
> On Wed, Jan 27, 2021 at 2:49 PM Greg Kurz <gr...@kaod.org> wrote: > > > > On Wed, 27 Jan 2021 11:34:52 +0100 > > Miklos Szeredi <mszer...@redhat.com> wrote: > > > > Another solution specifically for O_CREAT without O_EXCL would be to > > > turn it into an exclusive create. > > > > Would this added O_EXCL then appear on the client side, e.g. to > > guest userspace doing fcntl(F_GETFL) ? > > No. Guest kernel keeps track of open flags. > That was my impression as well as I didn't find a FUSE_GETFL request. Thanks for confirming that ! > > > If that fails with EEXIST then try > > > the normal open path (open with O_PATH, fstat, open proc symlink). If > > > > open(O_PATH | O_NOFOLLOW) + fstatat(AT_EMPTY_PATH|AT_SYMLINK_NOFOLLOW) > > would indeed allow to filter out anything that isn't a directory and > > to safely open the proc symlink. > > > > > that fails with ENOENT, then retry the whole thing a certain number of > > > > Indeed someone could have unlinked the file in the meantime, in which > > case the open(O_PATH | O_NOFOLLOW) would fail, but if it succeeds then > > we cannot hit ENOENT anymore AFAICT. > > Right. > > > > times. If it still fails then somebody is definitely messing with us > > > and we can fail the request with EIO. > > > > > > > Not sure what the retry+timeout is trying to mitigate here... why not > > returning EIO right away ? > > The semantics of O_CREATE are that it can fail neither because the > file exists nor because it doesn't. This doesn't matter if the > exported tree is not modified outside of a single guest, because of > locking provided by the guest kernel. > Wrong. O_CREAT can legitimately fail with ENOENT if one element of the pathname doesn't exist. And even if pathname only has one element, you can still have O_CREAT to fail the same way if the path of the parent directory is removed. cat>enoent.c<<EOF #include <sys/types.h> #include <sys/stat.h> #include <fcntl.h> #include <unistd.h> int main(int argc, char **argv) { mkdir("foo", 0777); chdir("foo"); rmdir("../foo"); open("bar", O_CREAT); } EOF make enoent strace ./enoent [...] mkdir("foo", 0777) = 0 chdir("foo") = 0 rmdir("../foo") = 0 openat(AT_FDCWD, "bar", O_RDONLY|O_CREAT, 0117130) = -1 ENOENT (No such file or directory) > However if we want to support shared access to a tree then O_CREAT > semantics should work even in the face of races due to external > modification of the tree. I.e. following race: > Yeah, handling shared access is where the fun starts :) > virtiofsd: open(foo, O_CREAT | O_EXCL) -> EEXIST > other task: unlink(foo) > virtiofsd: open(foo, O_PATH | O_NOFOLLOW) -> ENOENT > > To properly support the above the O_CREAT | O_EXCL open would need to > be retried. > But in this case, it seems fine to return ENOENT since the guest userspace cannot really assume it never happens. > Thanks, > Miklos >