On Mon, Sep 21, 2020 at 5:24 PM Pavel Begunkov <asml.sile...@gmail.com> wrote:
>
>
>
> On 22/09/2020 02:51, Andy Lutomirski wrote:
> > On Mon, Sep 21, 2020 at 9:15 AM Pavel Begunkov <asml.sile...@gmail.com> wrote:
> >>
> >> On 21/09/2020 19:10, Pavel Begunkov wrote:
> >>> On 20/09/2020 01:22, Andy Lutomirski wrote:
> >>>>
> >>>>> On Sep 19, 2020, at 2:16 PM, Arnd Bergmann <a...@arndb.de> wrote:
> >>>>>
> >>>>> On Sat, Sep 19, 2020 at 6:21 PM Andy Lutomirski <l...@kernel.org> wrote:
> >>>>>>> On Fri, Sep 18, 2020 at 8:16 AM Christoph Hellwig <h...@lst.de> wrote:
> >>>>>>> On Fri, Sep 18, 2020 at 02:58:22PM +0100, Al Viro wrote:
> >>>>>>>> Said that, why not provide a variant that would take an explicit
> >>>>>>>> "is it compat" argument and use it there?  And have the normal
> >>>>>>>> one pass in_compat_syscall() to that...
> >>>>>>>
> >>>>>>> That would help to not introduce a regression with this series yes.
> >>>>>>> But it wouldn't fix existing bugs when io_uring is used to access
> >>>>>>> read or write methods that use in_compat_syscall().  One example that
> >>>>>>> I recently ran into is drivers/scsi/sg.c.
> >>>>>
> >>>>> Ah, so reading /dev/input/event* would suffer from the same issue,
> >>>>> and that one would in fact be broken by your patch in the hypothetical
> >>>>> case that someone tried to use io_uring to read /dev/input/event on
> >>>>> x32...
> >>>>>
> >>>>> For reference, I checked the socket timestamp handling that has a
> >>>>> number of corner cases with time32/time64 formats in compat mode,
> >>>>> but none of those appear to be affected by the problem.
> >>>>>
> >>>>>> Aside from the potentially nasty use of per-task variables, one thing
> >>>>>> I don't like about PF_FORCE_COMPAT is that it's one-way.  If we're
> >>>>>> going to have a generic mechanism for this, shouldn't we allow a full
> >>>>>> override of the syscall arch instead of just allowing forcing compat
> >>>>>> so that a compat syscall can do a non-compat operation?
> >>>>>
> >>>>> The only reason it's needed here is that the caller is in a kernel
> >>>>> thread rather than a system call. Are there any possible scenarios
> >>>>> where one would actually need the opposite?
> >>>>>
> >>>>
> >>>> I can certainly imagine needing to force x32 mode from a kernel thread.
> >>>>
> >>>> As for the other direction: what exactly are the desired bitness/arch
> >>>> semantics of io_uring?  Is the operation bitness chosen by the io_uring
> >>>> creation or by the io_uring_enter() bitness?
> >>>
> >>> It's rather the second one. Even though AFAIR it wasn't discussed
> >>> specifically, that's how it works now (_partially_).
> >>
> >> Double checked -- I'm wrong, that's the former one. Most of it is based
> >> on a flag that was set at creation.
> >>
> >
> > Could we get away with making io_uring_enter() return -EINVAL (or
> > maybe -ENOTTY?) if you try to do it with bitness that doesn't match
> > the io_uring?  And disable SQPOLL in compat mode?
>
> Something like below. If PF_FORCE_COMPAT or any other solution
> doesn't land in time, I'll take a look whether other io_uring's
> syscalls need similar checks, etc.
>
>
> diff --git a/fs/io_uring.c b/fs/io_uring.c
> index 0458f02d4ca8..aab20785fa9a 100644
> --- a/fs/io_uring.c
> +++ b/fs/io_uring.c
> @@ -8671,6 +8671,10 @@ SYSCALL_DEFINE6(io_uring_enter, unsigned int, fd, u32, to_submit,
>         if (ctx->flags & IORING_SETUP_R_DISABLED)
>                 goto out;
>
> +       ret = -EINVAL;
> +       if (ctx->compat != in_compat_syscall())
> +               goto out;
> +

This seems entirely reasonable to me.  Sharing an io_uring ring
between programs with different ABIs seems a bit nutty.

>         /*
>          * For SQ polling, the thread will do all submissions and completions.
>          * Just return the requested submit count, and wake the thread if
> @@ -9006,6 +9010,10 @@ static int io_uring_create(unsigned entries, struct io_uring_params *p,
>         if (ret)
>                 goto err;
>
> +       ret = -EINVAL;
> +       if (ctx->compat)
> +               goto err;
> +

I may be looking at a different kernel than you, but aren't you
preventing creating an io_uring regardless of whether SQPOLL is
requested?

>         /* Only gets the ring fd, doesn't install it in the file table */
>         fd = io_uring_get_fd(ctx, &file);
>         if (fd < 0) {
> --
> Pavel Begunkov
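
For reference, the two checks discussed above can be modelled in plain
user-space C.  This is only a sketch under stated assumptions, not kernel
code and not the patch itself: struct model_ring, model_ring_create(),
model_ring_enter() and MODEL_SETUP_SQPOLL are hypothetical names invented
for illustration, and the caller's ABI is passed in as a bool where the
kernel would consult in_compat_syscall().  The create-side check is
written the way the question above suggests it may have been intended,
i.e. rejecting compat creators only when SQPOLL is requested.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for IORING_SETUP_SQPOLL; not the kernel definition. */
#define MODEL_SETUP_SQPOLL (1u << 1)

/* Hypothetical model of the ring context: only the fields discussed here. */
struct model_ring {
	unsigned int flags;
	bool compat;	/* ABI of the task that created the ring */
};

/*
 * Creation: refuse SQPOLL for compat creators (assumed intent),
 * but still allow compat tasks to create ordinary rings.
 */
int model_ring_create(struct model_ring *ring, unsigned int flags,
		      bool creator_is_compat)
{
	if (creator_is_compat && (flags & MODEL_SETUP_SQPOLL))
		return -EINVAL;

	ring->flags = flags;
	ring->compat = creator_is_compat;
	return 0;
}

/* Enter: the caller's ABI must match the ABI recorded at creation. */
int model_ring_enter(const struct model_ring *ring, bool caller_is_compat)
{
	if (ring->compat != caller_is_compat)
		return -EINVAL;
	return 0;
}

/* Small usage walk-through of the model. */
void model_examples(void)
{
	struct model_ring ring;

	/* A compat task may create a plain ring and enter it itself. */
	assert(model_ring_create(&ring, 0, true) == 0);
	assert(model_ring_enter(&ring, true) == 0);

	/* A native task entering that same ring is rejected. */
	assert(model_ring_enter(&ring, false) == -EINVAL);
}
```

With the unconditional "if (ctx->compat) goto err;" from the quoted hunk,
the first model_ring_create() call above would fail as well, which is the
behaviour being questioned.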