Re: [Qemu-devel] [PATCH v7 RFC] block/vxhs: Initial commit to add Veritas HyperScale VxHS block device support

Stefan Hajnoczi Mon, 07 Nov 2016 02:24:05 -0800

On Fri, Nov 04, 2016 at 06:30:47PM +0000, Ketan Nilangekar wrote:
> > On Nov 4, 2016, at 2:52 AM, Stefan Hajnoczi <stefa...@gmail.com> wrote:
> >> On Thu, Oct 20, 2016 at 01:31:15AM +0000, Ketan Nilangekar wrote:
> >> 2. The idea of having multi-threaded epoll based network client was to 
> >> drive more throughput by using multiplexed epoll implementation and 
> >> (fairly) distributing IOs from several vdisks (typical VM assumed to have 
> >> at least 2) across 8 connections. 
> >> Each connection is serviced by single epoll and does not share its context 
> >> with other connections/epoll. All memory pools/queues are in the context 
> >> of a connection/epoll.
> >> The qemu thread enqueues IO request in one of the 8 epoll queues using a 
> >> round-robin. Responses are also handled in the context of an epoll loop 
> >> and do not share context with other epolls. Any synchronization code that 
> >> you see today in the driver callback is code that handles the split IOs 
> >> which we plan to address by a) implementing readv in libqnio and b) 
> >> removing the 4MB limit on write IO size.
> >> The number of client epoll threads (8) is a #define in qnio and can easily 
> >> be changed. However our tests indicate that we are able to drive a good 
> >> number of IOs using 8 threads/epolls.
> >> I am sure there are ways to simplify the library implementation, but for 
> >> now the performance of the epoll threads is more than satisfactory.
> > 
> > By the way, when you benchmark with 8 epoll threads, are there any other
> > guests with vxhs running on the machine?
> > 
> 
> Yes. In fact, the total throughput with around 4-5 VMs scales well enough to
> saturate around 90% of the available storage throughput of a typical PCIe SSD
> device.
> 
> > In a real-life situation where multiple VMs are running on a single host
> > it may turn out that giving each VM 8 epoll threads doesn't help at all
> > because the host CPUs are busy with other tasks.
> 
> The exact number of epolls required to achieve optimal throughput may be 
> something that can be adjusted dynamically by the qnio library in subsequent 
> revisions. 
> 
> But as I mentioned, today we can change this simply by rebuilding qnio with a 
> different value for the #define.


In QEMU there is currently work to add multiqueue support to the block
layer.  This enables true multiqueue from the guest down to the storage
backend.

virtio-blk already supports multiple queues, but they are all processed
from the same thread in QEMU today.  Once multiple threads are able to
process the queues, it would make sense to extend multiqueue down into
the vxhs block driver.

So I don't think implementing multiple epoll threads in libqnio is
useful in the long term.  Rather, a straightforward approach of
integrating with the libqnio user's event loop (as described in my
previous emails) would simplify the code and allow you to take advantage
of full multiqueue support in the future.
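
As a rough illustration of the event-loop integration described above,
here is a minimal sketch in Python.  It is not libqnio's or QEMU's API:
ToyClient, submit(), handle_ready() and run_demo() are hypothetical
names, and a pipe stands in for the network socket.  The point is only
the shape: the library spawns no threads, it exposes a file descriptor
and a handler, and the caller's own loop decides when to run it.

```python
import os
import selectors


class ToyClient:
    """Hypothetical stand-in for a storage client library."""

    def __init__(self):
        # A pipe stands in for the socket to the storage server.
        self._rfd, self._wfd = os.pipe()
        os.set_blocking(self._rfd, False)
        self._pending = {}  # request id -> completion callback
        self._next_id = 0

    def fileno(self):
        """File descriptor the caller watches for readability."""
        return self._rfd

    def submit(self, callback):
        """Queue a request; the fake 'server' replies immediately."""
        req_id = self._next_id
        self._next_id += 1
        self._pending[req_id] = callback
        # One byte per reply, so this toy supports ids 0..255 only.
        os.write(self._wfd, bytes([req_id]))
        return req_id

    def handle_ready(self):
        """Run by the caller's event loop, in the caller's thread."""
        try:
            data = os.read(self._rfd, 4096)
        except BlockingIOError:
            return
        for req_id in data:
            self._pending.pop(req_id)(req_id)

    def close(self):
        os.close(self._rfd)
        os.close(self._wfd)


def run_demo(num_requests=3):
    """Drive one client from a caller-owned event loop."""
    completed = []
    client = ToyClient()
    sel = selectors.DefaultSelector()
    # The caller registers the library's fd with its own loop.
    sel.register(client.fileno(), selectors.EVENT_READ, client.handle_ready)
    for _ in range(num_requests):
        client.submit(completed.append)
    while len(completed) < num_requests:
        for key, _events in sel.select(timeout=1.0):
            key.data()  # dispatch to client.handle_ready
    sel.unregister(client.fileno())
    sel.close()
    client.close()
    return completed
```

Because every completion runs in whichever thread owns the loop, the
library needs no locking of its own.  If the caller later runs one event
loop per queue, it could create one client per loop and get parallelism
without any change to the library.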

Stefan
