On Thu, 28 Jan 2021 15:00:58 +0100 Miklos Szeredi <mszer...@redhat.com> wrote:
> On Thu, Jan 28, 2021 at 1:15 PM Greg Kurz <gr...@kaod.org> wrote: > > > > On Wed, 27 Jan 2021 16:52:56 +0100 > > Miklos Szeredi <mszer...@redhat.com> wrote: > > > > > On Wed, Jan 27, 2021 at 4:47 PM Miklos Szeredi <mszer...@redhat.com> > > > wrote: > > > > > > > > On Wed, Jan 27, 2021 at 4:35 PM Greg Kurz <gr...@kaod.org> wrote: > > > > > > > > > > On Wed, 27 Jan 2021 16:22:49 +0100 > > > > > Miklos Szeredi <mszer...@redhat.com> wrote: > > > > > > > > > > > On Wed, Jan 27, 2021 at 4:09 PM Greg Kurz <gr...@kaod.org> wrote: > > > > > > > > > > > > > > On Wed, 27 Jan 2021 15:09:50 +0100 > > > > > > > Miklos Szeredi <mszer...@redhat.com> wrote: > > > > > > > > The semantics of O_CREATE are that it can fail neither because > > > > > > > > the > > > > > > > > file exists nor because it doesn't. This doesn't matter if the > > > > > > > > exported tree is not modified outside of a single guest, > > > > > > > > because of > > > > > > > > locking provided by the guest kernel. > > > > > > > > > > > > > > > > > > > > > > Wrong. O_CREAT can legitimately fail with ENOENT if one element > > > > > > > > > > > > Let me make my statement more precise: > > > > > > > > > > > > O_CREAT cannot fail with ENOENT if parent directory exists > > > > > > throughout > > > > > > the operation. > > > > > > > > > > > > > > > > True, but I still don't see what guarantees guest userspace that the > > > > > parent directory doesn't go away... I must have missed something. > > > > > Please elaborate. > > > > > > > > Generally there's no guarantee, however there can be certain > > > > situations where the caller can indeed rely on the existence of the > > > > parent (e.g. /tmp). > > > > > > Example from the virtiofs repo: > > > > > > https://gitlab.com/virtio-fs/ireg/-/blob/master/ireg.c#L315 > > > https://gitlab.com/virtio-fs/ireg/-/blob/master/ireg.c#L382 > > > > > > In that case breaking O_CREAT would be harmless, since no two > > > instances are allowed anyway, so it would just give a confusing error. > > > But it is breakage and it probably wouldn't be hard to find much worse > > > breakage in real life applications. > > > > > > > Ok, I get your point : a guest userspace application can't expect > > to hit ENOENT when doing open("/some_dir/foo", O_CREAT) even if > > someone else is doing unlink("/some_dir/foo"), which is a different > > case than somebody doing rmdir("/some_dir"). > > > > But we still have a TOCTOU : the open(O_CREAT|O_EXCL) acts as > > the check to use open(O_PATH) and retry+timeout can't fix it > > reliably. > > Right. > > > A possible fix for the case where the race comes from the > > client itself would be to serialize FUSE requests that > > create/remove paths in the same directory. I don't know > > enough the code yet to assess if it's doable though. > > > > Then this would leave the case where something else on > > the host is racing with virtiofsd. IMHO this belongs to > > the broader family of "bad things the host can do > > in our back". This requires a bigger hammer than > > adding band-aids here and there IMHO. So in the > > scope of this patch, I don't think we should retry > > and timeout, but just return whatever errno that > > makes sense. > > I never suggested a timeout, that would indeed be nonsense. > Yeah sorry for that, by timeout I was lazily expressing "retry a bit and bail out if it doesn't work". > Just do a simple retry loop with a counter. I'd set counter to a > small number (5 or whatever), so that basically any accidental races > are cared for, and the only case that would trigger the EIO is if the > file was constantly removed and recreated (and even in that case it > would be pretty difficult to trigger). This would add only minimal > complexity or overhead. > I still don't like the counter thing very much but I can't think of anything better that _works_ in all cases in the short term... so be it. > The proper solution might be adding O_REGULAR, and it actually would > be useful for other O_CREAT users, since it's probably what they want > anyway (but existing semantics can't be changed). > Yeah only the kernel can handle this race gracefully and O_REGULAR would be great, but it might take some time until this percolates up to QEMU. > Thanks, > Miklos >