From: Patrick Fu <patrick...@intel.com>

Large memory copies usually consume a major share of CPU cycles and are the hot spot in the vhost-user enqueue operation. To offload these expensive memory operations from the CPU, this patch set proposes to leverage DMA engines, e.g., I/OAT, the DMA engine in Intel processors, to accelerate large copies.

Large copies are offloaded from the CPU to the DMA engine asynchronously: the CPU submits copy jobs to the DMA engine but does not wait for them to complete. Because the CPU is not involved during the data transfer, we save precious CPU cycles and improve the overall throughput of vhost-user based applications such as OVS. During packet transmission, large copies are offloaded to the DMA engine while small copies are still performed by the CPU, because of the startup overhead associated with the DMA engine.

This patch set constructs a general framework that applications can leverage to attach DMA channels to vhost-user transmit queues. Four new RTE APIs are introduced to the vhost library for applications to register and use the asynchronous data path. In addition, two new DMA operation callbacks are defined, through which the vhost-user asynchronous data path interacts with the DMA hardware. Currently only the enqueue operation for the split queue is implemented, but the framework is flexible enough to be extended to support dequeue and packed queue.

Patrick Fu (2):
  vhost: introduce async data path registration API
  vhost: introduce async enqueue for split ring

 lib/librte_vhost/Makefile          |   3 +-
 lib/librte_vhost/rte_vhost.h       |   1 +
 lib/librte_vhost/rte_vhost_async.h | 172 ++++++++++++
 lib/librte_vhost/socket.c          |  20 ++
 lib/librte_vhost/vhost.c           |  74 ++++-
 lib/librte_vhost/vhost.h           |  30 ++-
 lib/librte_vhost/vhost_user.c      |  28 +-
 lib/librte_vhost/virtio_net.c      | 538 ++++++++++++++++++++++++++++++++++++-
 8 files changed, 857 insertions(+), 9 deletions(-)
 create mode 100644 lib/librte_vhost/rte_vhost_async.h

-- 
1.8.3.1
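
For illustration only, the size-based split described in this cover letter (large copies submitted to the DMA engine without waiting, small copies done by the CPU) could look roughly like the sketch below. Every name here (the sketch_ prefix, the threshold value, the ring size) is hypothetical and is not taken from the patches. The "DMA channel" is a plain software ring whose poll function performs the copy, standing in for hardware that would complete the transfer on its own.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cut-off: copies of at least this many bytes are offloaded. */
#define SKETCH_DMA_THRESHOLD 256
/* Hypothetical ring size; a power of two so index wrap-around stays consistent. */
#define SKETCH_MAX_JOBS 32

struct sketch_copy_job {
	void *dst;
	const void *src;
	size_t len;
};

/* Software stand-in for one DMA channel: a ring of pending copy jobs. */
struct sketch_dma_chan {
	struct sketch_copy_job jobs[SKETCH_MAX_JOBS];
	unsigned int head; /* next job to complete */
	unsigned int tail; /* next free slot */
};

/* Queue a copy job and return at once. 0 on success, -1 if the ring is full. */
static int
sketch_dma_submit(struct sketch_dma_chan *c, void *dst, const void *src,
		  size_t len)
{
	if (c->tail - c->head == SKETCH_MAX_JOBS)
		return -1;
	c->jobs[c->tail % SKETCH_MAX_JOBS] =
		(struct sketch_copy_job){ dst, src, len };
	c->tail++;
	return 0;
}

/*
 * Retire up to max pending jobs and return how many were retired.
 * Real hardware would already have moved the data; this stand-in
 * performs the memcpy here instead.
 */
static unsigned int
sketch_dma_poll(struct sketch_dma_chan *c, unsigned int max)
{
	unsigned int done = 0;

	while (done < max && c->head != c->tail) {
		struct sketch_copy_job *j = &c->jobs[c->head % SKETCH_MAX_JOBS];

		memcpy(j->dst, j->src, j->len);
		c->head++;
		done++;
	}
	return done;
}

/*
 * Dispatch one copy: offload it if it is large and the channel has room,
 * otherwise copy on the CPU. Returns 1 if offloaded, 0 if copied inline.
 */
static int
sketch_enqueue_copy(struct sketch_dma_chan *c, void *dst, const void *src,
		    size_t len)
{
	assert(dst != NULL && src != NULL);
	if (len >= SKETCH_DMA_THRESHOLD &&
	    sketch_dma_submit(c, dst, src, len) == 0)
		return 1;
	memcpy(dst, src, len);
	return 0;
}
```

A caller would invoke sketch_enqueue_copy() per buffer on the transmit path and call sketch_dma_poll() later to learn which offloaded copies have finished. Falling back to the CPU when the ring is full is a choice made for this sketch, not something the cover letter specifies.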