Hi Stefan,

1. Your suggestion to move the failover implementation into libqnio is well taken. In fact, we are proposing that service/network failover should not be handled in the QEMU address space at all. The vxhs driver will know about and talk to only a single virtual IP. The service behind the virtual IP may fail and move to another node without the QEMU driver noticing it. This way the failover logic is kept completely out of the QEMU address space. We are considering using some of our proprietary clustering/monitoring services to implement the service failover.

2. The idea behind the multi-threaded, epoll-based network client was to drive more throughput by using a multiplexed epoll implementation and (fairly) distributing the I/Os from several vdisks (a typical VM is assumed to have at least 2) across 8 connections. Each connection is serviced by a single epoll and does not share its context with other connections/epolls. All memory pools/queues are in the context of a connection/epoll. The QEMU thread enqueues an I/O request on one of the 8 epoll queues using round-robin. Responses are also handled in the context of an epoll loop and do not share context with other epolls. Any synchronization code that you see today in the driver callback handles the split I/Os, which we plan to address by (a) implementing readv in libqnio and (b) removing the 4 MB limit on the write I/O size. The number of client epoll threads (8) is a #define in qnio and can easily be changed. However, our tests indicate that we are able to drive a good number of I/Os using 8 threads/epolls. I am sure there are ways to simplify the library implementation, but for now the performance of the epoll threads is more than satisfactory.
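To make the dispatch scheme above a little more concrete, here is a rough, illustrative sketch of the idea. The names (qnio_channel_t, qnio_submit(), NUM_CHANNELS) are made up for the example and all connection/epoll/eventfd setup, teardown and error handling are omitted; this is not the actual libqnio code, just the round-robin enqueue plus per-connection epoll loop described above:

/*
 * Illustrative sketch only -- not the actual libqnio code. Names such as
 * qnio_channel_t, qnio_submit() and NUM_CHANNELS are made up, and all
 * connection/epoll/eventfd setup and error handling are omitted.
 */
#include <pthread.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define NUM_CHANNELS 8              /* the #define mentioned above */

struct io_request {
    struct io_request *next;
    /* vdisk id, opcode, buffers, completion callback, ... */
};

typedef struct {
    pthread_mutex_t lock;           /* protects 'pending' */
    struct io_request *pending;     /* requests queued by the submitter */
    int event_fd;                   /* eventfd used to wake this epoll loop */
    int epoll_fd;                   /* private epoll instance */
    int sock_fd;                    /* TCP connection to the virtual IP */
} qnio_channel_t;

static qnio_channel_t channels[NUM_CHANNELS];
static unsigned int next_channel;   /* round-robin cursor */

/* Called from the submitting (QEMU) thread. */
void qnio_submit(struct io_request *req)
{
    /* Pick the next connection in round-robin order. */
    qnio_channel_t *ch = &channels[next_channel++ % NUM_CHANNELS];

    pthread_mutex_lock(&ch->lock);
    req->next = ch->pending;
    ch->pending = req;
    pthread_mutex_unlock(&ch->lock);

    /* Kick the channel's epoll thread so it drains the queue. */
    uint64_t one = 1;
    if (write(ch->event_fd, &one, sizeof(one)) < 0) {
        /* an 8-byte eventfd write does not fail in practice */
    }
}

/* Body of each per-connection epoll thread (one thread per channel). */
void *qnio_channel_loop(void *arg)
{
    qnio_channel_t *ch = arg;
    struct epoll_event ev;

    for (;;) {
        if (epoll_wait(ch->epoll_fd, &ev, 1, -1) < 1) {
            continue;
        }
        if (ev.data.fd == ch->event_fd) {
            /* read() the eventfd to clear it, then drain ch->pending
             * and send the requests on ch->sock_fd */
        } else if (ev.data.fd == ch->sock_fd) {
            /* receive responses and invoke completion callbacks here,
             * i.e. in this epoll thread's context */
        }
    }
    return NULL;
}

Each channel owns its queue, eventfd and socket, so the only state shared across threads is a channel's pending list, touched by the submitter and that channel's epoll thread.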
Let us know what you think about these proposals.

Thanks,
Ketan.

On 9/30/16, 1:36 AM, "Stefan Hajnoczi" <stefa...@gmail.com> wrote:

>On Tue, Sep 27, 2016 at 09:09:49PM -0700, Ashish Mittal wrote:
>> This patch adds support for a new block device type called "vxhs".
>> Source code for the library that this code loads can be downloaded from:
>> https://github.com/MittalAshish/libqnio.git
>
>The QEMU block driver should deal with BlockDriver<->libqnio integration
>and libqnio should deal with vxhs logic (network protocol, failover,
>etc). Right now the vxhs logic is spread between both components. If
>responsibilities aren't cleanly separated between QEMU and libqnio then
>I see no point in having libqnio.
>
>Failover code should move into libqnio so that programs using libqnio
>avoid duplicating the failover code.
>
>Similarly IIO_IO_BUF_SIZE/segments should be handled internally by
>libqnio so programs using libqnio do not duplicate this code.
>
>libqnio itself can be simplified significantly:
>
>The multi-threading is not necessary and adds complexity. Right now
>there seem to be two reasons for multi-threading: shared contexts and
>the epoll thread. Both can be eliminated as follows.
>
>Shared contexts do not make sense in a multi-disk, multi-core
>environment. Why is it advantageous to tie disks to a single context?
>It's simpler and more multi-core friendly to let every disk have its own
>connection.
>
>The epoll thread forces library users to use thread synchronization when
>processing callbacks. Look at libiscsi for an example of how to
>eliminate it. Two APIs are defined: int iscsi_get_fd(iscsi) and int
>iscsi_which_events(iscsi) (e.g. POLLIN, POLLOUT). The program using the
>library can integrate the fd into its own event loop. The advantage of
>doing this is that no library threads are necessary and all callbacks
>are invoked from the program's event loop. Therefore no thread
>synchronization is needed.
>
>If you make these changes then all multi-threading in libqnio and the
>QEMU block driver can be dropped. There will be less code and it will
>be simpler.
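For reference, the libiscsi-style integration described in the quoted mail could look roughly like the following from the caller's side. This is only a sketch: qnio_get_fd(), qnio_which_events() and qnio_service() are hypothetical names, not functions that exist in libqnio today.

/*
 * Hypothetical sketch of a libiscsi-style, single-threaded integration.
 * qnio_get_fd(), qnio_which_events() and qnio_service() are illustrative
 * names, not an existing libqnio API.
 */
#include <poll.h>

typedef struct qnio_conn qnio_conn;           /* opaque connection handle */

int qnio_get_fd(qnio_conn *c);                /* socket fd to watch */
int qnio_which_events(qnio_conn *c);          /* POLLIN, plus POLLOUT if output is queued */
int qnio_service(qnio_conn *c, int revents);  /* send/recv and run completion callbacks */

/* The caller's own event loop: no library threads, no locking. */
void run_event_loop(qnio_conn *c)
{
    for (;;) {
        struct pollfd pfd = {
            .fd = qnio_get_fd(c),
            .events = qnio_which_events(c),
        };

        if (poll(&pfd, 1, -1) < 0) {
            break;
        }

        /* All completion callbacks are invoked here, in the caller's thread. */
        if (qnio_service(c, pfd.revents) < 0) {
            break;                            /* connection error */
        }
    }
}

Inside the QEMU block driver the same fd would be registered with the AioContext (e.g. via aio_set_fd_handler()) rather than a private poll() loop, so completions would run in QEMU's event loop with no extra synchronization.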