On Fri, 15 Nov 2019 02:10:50 +0100 Christian Schoenebeck <qemu_...@crudebyte.com> wrote:
> I'm currently reading up on how client requests (T messages) are currently
> dispatched in general by 9pfs, to understand where the potential
> inefficiencies are that I am encountering.
>
> I mean 9pfs is pretty fast on raw I/O (read/write requests), provided that
> the message payload on guest side was chosen large enough (e.g.
> trans=virtio,version=9p2000.L,msize=4194304,...), where I already come
> close to my test disk's theoretical maximum performance on read/write
> tests. But obviously these are huge 9p requests.
>
> However, when there is a large number of (i.e. small) 9p requests, no
> matter what the actual request type is, I encounter severe performance
> issues with 9pfs, and I am trying to understand whether this could be
> improved with reasonable effort.

Thanks for doing that. This is typically the kind of effort I never dared
to start on my own.

> If I understand it correctly, each incoming request (T message) is
> dispatched to its own QEMU coroutine. So individual requests should
> already be processed in parallel, right?

Sort of, but not exactly. The real parallelization, i.e. doing parallel
processing with concurrent threads, doesn't take place on a per-request
basis. A typical request is broken down into several calls to the backend,
which may block because the backend itself calls a syscall that may block
in the kernel. Each backend call is thus handled by its own thread from the
mainloop thread pool (see hw/9pfs/coth.[ch] for details). The rest of the
9p code, basically everything in 9p.c, is serialized in the mainloop
thread.

Cheers,

--
Greg

> Best regards,
> Christian Schoenebeck