On Fri, Nov 04, 2016 at 06:30:47PM +0000, Ketan Nilangekar wrote:
> > On Nov 4, 2016, at 2:52 AM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
> >> On Thu, Oct 20, 2016 at 01:31:15AM +0000, Ketan Nilangekar wrote:
> >> 2. The idea of having multi-threaded epoll based network client was to
> >> drive more throughput by using multiplexed epoll implementation and
> >> (fairly) distributing IOs from several vdisks (typical VM assumed to have
> >> at least 2) across 8 connections.
> >> Each connection is serviced by single epoll and does not share its context
> >> with other connections/epoll. All memory pools/queues are in the context
> >> of a connection/epoll.
> >> The qemu thread enqueues IO request in one of the 8 epoll queues using a
> >> round-robin. Responses are also handled in the context of an epoll loop
> >> and do not share context with other epolls. Any synchronization code that
> >> you see today in the driver callback is code that handles the split IOs
> >> which we plan to address by a) implementing readv in libqnio and b)
> >> removing the 4MB limit on write IO size.
> >> The number of client epoll threads (8) is a #define in qnio and can easily
> >> be changed. However our tests indicate that we are able to drive a good
> >> number of IOs using 8 threads/epolls.
> >> I am sure there are ways to simplify the library implementation, but for
> >> now the performance of the epoll threads is more than satisfactory.
> >
> > By the way, when you benchmark with 8 epoll threads, are there any other
> > guests with vxhs running on the machine?
> >
>
> Yes. In fact the total throughput with around 4-5 VMs scales well to saturate
> around 90% of available storage throughput of a typical pcie ssd device.
>
> > In a real-life situation where multiple VMs are running on a single host
> > it may turn out that giving each VM 8 epoll threads doesn't help at all
> > because the host CPUs are busy with other tasks.
>
> The exact number of epolls required to achieve optimal throughput may be
> something that can be adjusted dynamically by the qnio library in subsequent
> revisions.
>
> But as I mentioned today we can change this by simply rebuilding qnio with a
> different value for the #define

In QEMU there is currently work to add multiqueue support to the block
layer.  This enables true multiqueue from the guest down to the storage
backend.

virtio-blk already supports multiple queues but they are all processed
from the same thread in QEMU today.  Once multiple threads are able to
process the queues it would make sense to continue down into the vxhs
block driver.

So I don't think implementing multiple epoll threads in libqnio is
useful in the long term.  Rather, a straightforward approach of
integrating with the libqnio user's event loop (as described in my
previous emails) would simplify the code and allow you to take advantage
of full multiqueue support in the future.

Stefan
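
For illustration only, here is a minimal C sketch of what "integrating with the user's event loop" can look like in general: the library exposes a file descriptor and a handler, and the caller decides which thread polls it. Every name here (demo_client, demo_client_get_fd, demo_client_handle_readable, event_loop_iterate) is hypothetical and is not part of libqnio or QEMU. A plain poll() call stands in for the caller's event loop, and each byte read stands in for one completed I/O response.

```c
#include <assert.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical client handle; libqnio's real structures are not shown. */
struct demo_client {
    int fd;          /* descriptor the caller's event loop should watch */
    int completions; /* responses processed so far */
};

/* Expose the descriptor instead of spawning a private epoll thread. */
static int demo_client_get_fd(const struct demo_client *c)
{
    assert(c != NULL);
    return c->fd;
}

/*
 * Called by the caller's event loop when the descriptor is readable.
 * Each byte read stands in for one completed I/O response.
 * Returns the number of responses processed, or -1 on read error.
 */
static int demo_client_handle_readable(struct demo_client *c)
{
    char buf[64];
    ssize_t n = read(c->fd, buf, sizeof(buf));

    if (n < 0) {
        return -1;
    }
    c->completions += (int)n;
    return (int)n;
}

/*
 * One iteration of a caller-owned event loop (assumption: poll() is used
 * here only as a stand-in for whatever loop the caller already runs).
 * Returns responses processed, 0 on timeout, or -1 on error.
 */
static int event_loop_iterate(struct demo_client *c, int timeout_ms)
{
    struct pollfd pfd = { .fd = demo_client_get_fd(c), .events = POLLIN };
    int ready = poll(&pfd, 1, timeout_ms);

    if (ready <= 0) {
        return ready;
    }
    if (pfd.revents & POLLIN) {
        return demo_client_handle_readable(c);
    }
    return 0;
}
```

With this shape the library owns no threads of its own, so the number of threads servicing I/O is decided by whoever runs the loop. If the caller later runs one loop per queue, each loop can watch its own descriptor without changes to the library side.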