I'm currently reading up on how client requests (T messages) are dispatched in general by 9pfs, to understand where the inefficiencies I am encountering might come from.
9pfs is pretty fast on raw I/O (read/write requests), provided that the message payload size on the guest side is chosen large enough (e.g. trans=virtio,version=9p2000.L,msize=4194304,...). With that I already come close to my test disk's theoretical maximum performance in read/write tests. But obviously these are huge 9p requests.

However, when there is a large number of small 9p requests, no matter what the actual request type is, I encounter severe performance issues with 9pfs, and I am trying to understand whether this could be improved with reasonable effort.

If I understand it correctly, each incoming request (T message) is dispatched to its own QEMU coroutine queue. So individual requests should already be processed in parallel, right?

Best regards,
Christian Schoenebeck
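
For illustration only, the two access patterns described above (a few huge requests versus many small ones) could be compared from within the guest with a rough sketch like the one below. This is not the test setup behind the numbers in this mail. The function names, the 4 MiB chunk size, the file count and the choice of create/stat/remove as the "small" operations are all assumptions made for the example. The directory passed to run() would be one on the 9p mount.

```python
import os
import tempfile
import time


def time_large_io(directory, total_bytes=8 * 1024 * 1024, chunk=4 * 1024 * 1024):
    """Write, then read back, total_bytes in a few large unbuffered chunks.

    Returns the elapsed wall-clock time in seconds. The chunk size is an
    assumption chosen to mirror a large msize; it is not tuned.
    """
    path = os.path.join(directory, "large.bin")
    buf = b"\0" * chunk
    start = time.perf_counter()
    with open(path, "wb", buffering=0) as f:
        for _ in range(total_bytes // chunk):
            f.write(buf)
    with open(path, "rb", buffering=0) as f:
        while f.read(chunk):
            pass
    elapsed = time.perf_counter() - start
    os.remove(path)
    return elapsed


def time_small_requests(directory, count=1000):
    """Create, stat and remove many one-byte files.

    Each iteration issues several small file system operations, which on a
    9p mount would translate into several small requests. Returns seconds.
    """
    start = time.perf_counter()
    for i in range(count):
        path = os.path.join(directory, f"small-{i}")
        with open(path, "wb") as f:
            f.write(b"x")
        os.stat(path)
        os.remove(path)
    return time.perf_counter() - start


def run(directory):
    """Run both patterns in the given directory and return the timings."""
    return {
        "large_io_s": time_large_io(directory),
        "small_requests_s": time_small_requests(directory),
    }


if __name__ == "__main__":
    # A temporary directory keeps the sketch runnable anywhere; to measure
    # 9pfs, pass a directory that lives on the 9p mount instead.
    with tempfile.TemporaryDirectory() as d:
        print(run(d))
```

Guest page cache effects are not controlled here, so the absolute numbers mean little; only the relative gap between the two patterns on the same mount is of interest.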