On Tue, Dec 8, 2020 at 8:11 AM Stefan Hajnoczi <stefa...@redhat.com> wrote:
>
> On Thu, Oct 22, 2020 at 05:29:16PM +0100, Fam Zheng wrote:
> > On Tue, 2020-10-20 at 09:34 +0800, Zhenyu Ye wrote:
> > > On 2020/10/19 21:25, Paolo Bonzini wrote:
> > > > On 19/10/20 14:40, Zhenyu Ye wrote:
> > > > > The kernel backtrace for io_submit in GUEST is:
> > > > >
> > > > >         guest# ./offcputime -K -p `pgrep -nx fio`
> > > > >             b'finish_task_switch'
> > > > >             b'__schedule'
> > > > >             b'schedule'
> > > > >             b'io_schedule'
> > > > >             b'blk_mq_get_tag'
> > > > >             b'blk_mq_get_request'
> > > > >             b'blk_mq_make_request'
> > > > >             b'generic_make_request'
> > > > >             b'submit_bio'
> > > > >             b'blkdev_direct_IO'
> > > > >             b'generic_file_read_iter'
> > > > >             b'aio_read'
> > > > >             b'io_submit_one'
> > > > >             b'__x64_sys_io_submit'
> > > > >             b'do_syscall_64'
> > > > >             b'entry_SYSCALL_64_after_hwframe'
> > > > >             -                fio (1464)
> > > > >                 40031912
> > > > >
> > > > > And Linux io_uring can avoid the latency problem.
> >
> > Thanks for the info. What this tells us is basically the inflight
> > requests are high. It's sad that the linux-aio is in practice
> > implemented as a blocking API.

It is.

> >
> > Host side backtrace will be of more help. Can you get that too?
>
> I guess Linux AIO didn't set the BLK_MQ_REQ_NOWAIT flag so the task went
> to sleep when it ran out of blk-mq tags. The easiest solution is to move
> to io_uring. Linux AIO is broken - it's not AIO :).

Agreed!

>
> If we know that no other process is writing to the host block device
> then maybe we can determine the blk-mq tags limit (the queue depth) and
> avoid sending more requests. That way QEMU doesn't block, but I don't
> think this approach works when other processes are submitting I/O to the
> same host block device :(.
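
For illustration, a minimal sketch of the "stay below the tag limit" idea
quoted above. It assumes the sysfs attribute
/sys/block/<dev>/queue/nr_requests is a usable approximation of the tag
limit, and that nobody else is submitting I/O to the device (the caveat
already raised). `read_queue_depth` and `InflightLimiter` are hypothetical
names, not QEMU or kernel APIs.

```python
import threading
from pathlib import Path


def read_queue_depth(dev: str, sysfs_root: str = "/sys/block") -> int:
    """Read nr_requests for a block device name such as 'sda'.

    Assumption: this value approximates the blk-mq tag limit.
    """
    path = Path(sysfs_root, dev, "queue", "nr_requests")
    return int(path.read_text().strip())


class InflightLimiter:
    """Hypothetical helper that caps the number of in-flight requests."""

    def __init__(self, limit: int) -> None:
        self._slots = threading.BoundedSemaphore(limit)

    def try_acquire(self) -> bool:
        # Non-blocking: when this returns False the caller keeps the
        # request queued in userspace instead of calling io_submit()
        # and risking a sleep while waiting for a tag.
        return self._slots.acquire(blocking=False)

    def release(self) -> None:
        # Call once per completed request.
        self._slots.release()
```

The submitter would call try_acquire() before each submission and
release() from the completion path. The cap only helps while this process
is the sole user of the device.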
>
> Fam's original suggestion of invoking io_submit(2) from a worker thread
> is an option, but I'm afraid it will slow down the uncontended case.
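
A rough sketch of the worker-thread option, to make the trade-off
concrete. `blocking_submit` is a hypothetical stand-in for io_submit(2)
that just sleeps to simulate running out of tags; nothing here is QEMU
code.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor

# A single worker keeps submissions in the order they were handed over.
_submitter = ThreadPoolExecutor(max_workers=1, thread_name_prefix="aio-submit")


def blocking_submit(request: str, stall: float = 0.0) -> str:
    """Hypothetical stand-in for io_submit(2), which may sleep."""
    time.sleep(stall)  # simulates waiting for a free tag
    return f"submitted {request}"


def submit_async(request: str, stall: float = 0.0) -> Future:
    # Returns immediately; any sleep happens on the worker thread,
    # so the caller's event loop keeps running.
    return _submitter.submit(blocking_submit, request, stall)
```

Every request now pays for a hand-off to another thread, including the
ones that would never have blocked, which is the uncontended-case cost
mentioned above.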
>
> I'm CCing Glauber in case he battled this in the past in ScyllaDB.

We have, and a lot. I don't recall seeing this particular lock, but XFS
would block us all the time whenever it had to update metadata, lock
inodes, etc. in order to submit the operation.

Our work at the time went into fixing those cases in the kernel as much
as we could. But the API is just like that...

>
> Stefan
