On Thu, Aug 18, 2016 at 11:27:06AM -0700, Rich Lane wrote: > On Mon, Aug 15, 2016 at 7:37 PM, Yuanhan Liu <yuanhan.liu at linux.intel.com> > wrote: > > On Mon, Aug 15, 2016 at 01:00:24PM -0700, Rich Lane wrote: > > Concurrent enqueue is an important performance optimization when the > number > > of cores used for switching is different than the number of vhost > queues. > > I've observed a 20% performance improvement compared to a strategy that > > binds queues to cores. > > > > The atomic cmpset is only executed when the application calls > > rte_vhost_enqueue_burst_mp. Benchmarks show no performance impact > > when not using concurrent enqueue. > > > > Mergeable RX buffers aren't supported by concurrent enqueue to minimize > > code complexity. > > I think that would break things when Mergeable rx is enabled (which is > actually enabled by default). > > > Would it be reasonable to return -ENOTSUP in this case, and restrict > concurrent > enqueue > to devices where VIRTIO_NET_F_MRG_RXBUF is disabled? > > I could also add back concurrent enqueue support for mergeable RX, but I was > hoping to avoid > that since the mergeable codepath is already complex and wouldn't be used in > high performance > deployments.
Another note is that, you might also have noticed, Zhihong made a patch set [0] to optimize the enqueue code path (mainly on mergeable path). It basically does a rewrite from scatch, which removes the desc buf reservation, meaning it would be harder to do concurrent enqueue support based on that. [0]: Aug 19 Zhihong Wang ( 68) ??>[PATCH v3 0/5] vhost: optimize enqueue --yliu