On Tue, Jul 1, 2014 at 10:31 PM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
> On Tue, Jul 1, 2014 at 3:53 PM, Ming Lei <tom.leim...@gmail.com> wrote:
>> On Mon, Jun 30, 2014 at 4:08 PM, Stefan Hajnoczi <stefa...@redhat.com> wrote:
>>>
>>> Try:
>>> $ perf record -e syscalls:* --tid <iothread-tid>
>>> ^C
>>> $ perf script # shows the trace log
>>>
>>> The difference between syscalls in QEMU 2.0 and qemu.git/master could
>>> reveal the problem.
>>
>> The difference is that there are tons of write() and rt_sigprocmask()
>> calls in qemu.git/master; I guess they are related to coroutines.
>>
>> For linux-aio, coroutines shouldn't be necessary because
>> io_submit() won't block most of the time for O_DIRECT read/write.
>
> You're forgetting about image formats and the other QEMU block layer
> features like I/O throttling. They do require coroutines.

I mean that from the linux-aio point of view, io_submit() won't block most
of the time, as in your previous implementation of dataplane.

>
> Are you sure it's the extra syscall overhead? Any ideas for avoiding them?

Yes, I am sure. It is obvious when running perf to trace the system
calls. :-)

Here is some data from running randread (bs 4k, libaio) in the VM for
10 seconds:

1) qemu.git/master
    - write(): 731K
    - rt_sigprocmask(): 417K
    - read(): 21K
    - ppoll(): 10K
    - io_submit(): 5K
    - io_getevents(): 4K

2) qemu 2.0
    - write(): 9K
    - read(): 28K
    - ppoll(): 16K
    - io_submit(): 12K
    - io_getevents(): 10K

> The sigprocmask can probably be optimized away since the thread's
> signal mask remains unchanged most of the time.
>
> I'm not sure what is causing the write().

I am investigating it...

Thanks,
--
Ming Lei
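
P.S. One rough way to read the counts above is to divide each syscall count by the number of io_submit() calls in the same run. The sketch below does only that arithmetic on the figures from this message. The names `counts`, `per_submit` and `total` are made up for illustration. Since a single io_submit() call can carry more than one request, the ratios are per call, not per I/O.

```python
# Syscall counts from the 10-second randread run, in thousands (K),
# copied from the figures in this message.
# rt_sigprocmask() was not listed for qemu 2.0, so it is left out
# there rather than assumed to be zero.
counts = {
    "qemu.git/master": {
        "write": 731,
        "rt_sigprocmask": 417,
        "read": 21,
        "ppoll": 10,
        "io_submit": 5,
        "io_getevents": 4,
    },
    "qemu 2.0": {
        "write": 9,
        "read": 28,
        "ppoll": 16,
        "io_submit": 12,
        "io_getevents": 10,
    },
}


def per_submit(version_counts):
    """Divide every syscall count by the io_submit() count of the same run."""
    submits = version_counts["io_submit"]
    return {name: n / submits for name, n in version_counts.items()}


def total(version_counts):
    """Sum of all listed syscalls for one run, in thousands."""
    return sum(version_counts.values())


if __name__ == "__main__":
    for version, c in counts.items():
        print(f"{version}: {total(c)}K listed syscalls in total")
        ratios = per_submit(c)
        # Largest ratio first, so the dominant syscall stands out.
        for name, ratio in sorted(ratios.items(), key=lambda kv: -kv[1]):
            print(f"  {name + '()':18s} {ratio:7.2f} per io_submit()")
```

With these numbers, qemu.git/master issues roughly 146 write() calls per io_submit(), against 0.75 for qemu 2.0.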