On Mon, Aug 15, 2016 at 7:37 PM, Yuanhan Liu <yuanhan.liu at linux.intel.com> wrote:

> On Mon, Aug 15, 2016 at 01:00:24PM -0700, Rich Lane wrote:
> > Concurrent enqueue is an important performance optimization when the number
> > of cores used for switching is different than the number of vhost queues.
> > I've observed a 20% performance improvement compared to a strategy that
> > binds queues to cores.
> >
> > The atomic cmpset is only executed when the application calls
> > rte_vhost_enqueue_burst_mp. Benchmarks show no performance impact
> > when not using concurrent enqueue.
> >
> > Mergeable RX buffers aren't supported by concurrent enqueue to minimize
> > code complexity.
>
> I think that would break things when Mergeable rx is enabled (which is
> actually enabled by default).

Would it be reasonable to return -ENOTSUP in this case, and restrict
concurrent enqueue to devices where VIRTIO_NET_F_MRG_RXBUF is disabled?

I could also add back concurrent enqueue support for mergeable RX, but I
was hoping to avoid that since the mergeable codepath is already complex
and wouldn't be used in high performance deployments.

> Besides that, as mentioned in the last week f2f talk, do you think adding
> a new flag RTE_VHOST_USER_CONCURRENT_ENQUEUE (for rte_vhost_driver_register())
> __might__ be a better idea? That could save us a API, to which I don't object
> though.

Sure, I can add a flag instead. That will be similar to how the rte_ring
library picks the enqueue method.
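
To make the proposal concrete, here is a rough sketch of the idea using
C11 atomics. It is a toy model, not the patch and not DPDK code: every
name in it (toy_queue, TOY_F_MRG_RXBUF, TOY_CONCURRENT_ENQUEUE,
toy_reserve, ...) is hypothetical and only stands in for the real
feature bit and the proposed registration flag. It assumes a
free-running slot index, rejects the concurrent mode with -ENOTSUP when
the mergeable feature is negotiated, and only performs the
compare-and-swap when the concurrent flag was requested.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins, not real DPDK or virtio definitions. */
#define TOY_F_MRG_RXBUF        (1u << 0) /* models VIRTIO_NET_F_MRG_RXBUF */
#define TOY_CONCURRENT_ENQUEUE (1u << 1) /* models the proposed register flag */

struct toy_queue {
	uint32_t features;         /* negotiated feature bits */
	uint32_t flags;            /* flags chosen at registration time */
	uint32_t size;             /* number of slots in the ring */
	uint32_t consumed;         /* slots already returned by the consumer */
	_Atomic uint32_t reserved; /* free-running index of next free slot */
};

static void
toy_queue_init(struct toy_queue *q, uint32_t size,
	       uint32_t features, uint32_t flags)
{
	q->features = features;
	q->flags = flags;
	q->size = size;
	q->consumed = 0;
	atomic_init(&q->reserved, 0);
}

/* Concurrent enqueue is refused when mergeable RX buffers are in use. */
static int
toy_queue_check(const struct toy_queue *q)
{
	if ((q->flags & TOY_CONCURRENT_ENQUEUE) &&
	    (q->features & TOY_F_MRG_RXBUF))
		return -ENOTSUP;
	return 0;
}

/*
 * Reserve up to `count` slots. Returns the number reserved and stores
 * the first reserved index in *start. The flag set at init time picks
 * the method: a plain store for a single producer, a CAS loop otherwise.
 */
static uint32_t
toy_reserve(struct toy_queue *q, uint32_t count, uint32_t *start)
{
	uint32_t head, free_slots, n;

	assert(start != NULL);
	head = atomic_load_explicit(&q->reserved, memory_order_relaxed);

	if (!(q->flags & TOY_CONCURRENT_ENQUEUE)) {
		/* Single producer: no atomic read-modify-write needed. */
		free_slots = q->size - (head - q->consumed);
		n = count < free_slots ? count : free_slots;
		atomic_store_explicit(&q->reserved, head + n,
				      memory_order_relaxed);
		*start = head;
		return n;
	}

	do {
		/* On CAS failure `head` is reloaded, so recompute. */
		free_slots = q->size - (head - q->consumed);
		n = count < free_slots ? count : free_slots;
		if (n == 0)
			break;
	} while (!atomic_compare_exchange_weak_explicit(&q->reserved,
			&head, head + n,
			memory_order_acquire, memory_order_relaxed));

	*start = head;
	return n;
}
```

Each producer that wins the compare-and-swap owns the slots
[start, start + n) and can fill them without further coordination;
publishing them to the consumer in order is a separate step that this
sketch leaves out.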