On Mon, Mar 06, 2017 at 09:10:15AM +0000, David Hunt wrote:
> This patch aims to improve the throughput of the distributor library.
>
> It uses a similar handshake mechanism to the previous version of
> the library, in that bits are used to indicate when packets are ready
> to be sent to a worker and ready to be returned from a worker. One main
> difference is that instead of sending one packet in a cache line, it makes
> use of the 7 free spaces in the same cache line in order to send up to
> 8 packets at a time to/from a worker.
>
> The flow matching algorithm has had significant re-work, and now keeps an
> array of inflight flows and an array of backlog flows, and matches incoming
> flows to the inflight/backlog flows of all workers so that flow pinning to
> workers can be maintained.
>
> The Flow Match algorithm has both scalar and vector versions, and a
> function pointer is used to select the most appropriate function at run time,
> depending on the presence of the SSE2 cpu flag. On non-x86 platforms, the
> scalar match function is selected, which should still give a good boost
> in performance over the non-burst API.
>
> v9 changes:
> * fixed symbol versioning so it will compile on CentOS and RedHat
>
I've flagged a number of things that could do with being cleaned up in the
patchset. However, the idea itself of adding a new burst-mode to improve
distributor performance - and using vector matching to further boost it - is
a good improvement. Therefore

Series-Acked-by: Bruce Richardson <bruce.richard...@intel.com>
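
For anyone reading the thread without the patch to hand, the run-time
selection described in the quoted commit message can be sketched roughly as
below. This is only an illustration and is not code from the patchset: the
names flow_match_fn, flow_match_scalar, flow_match_sse2 and
select_flow_match are made up here, tag value 0 is assumed to mean "empty
slot", the backlog array is left out for brevity, and the GCC/clang builtin
__builtin_cpu_supports() stands in for whatever CPU-flag query the library
itself uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BURST_SIZE 8 /* up to 8 packets per cache line, per the description */

/* Hypothetical signature: for each of the BURST_SIZE incoming flow tags,
 * write the 1-based id of the first worker with that tag in flight, or 0
 * if no worker has it. inflight holds BURST_SIZE tags per worker. */
typedef void (*flow_match_fn)(const uint16_t *inflight, unsigned num_workers,
                              const uint16_t *new_tags, uint16_t *out_worker);

static void
flow_match_scalar(const uint16_t *inflight, unsigned num_workers,
                  const uint16_t *new_tags, uint16_t *out_worker)
{
    assert(inflight != NULL && new_tags != NULL && out_worker != NULL);
    for (unsigned i = 0; i < BURST_SIZE; i++) {
        out_worker[i] = 0;
        if (new_tags[i] == 0) /* assumption: tag 0 marks an empty slot */
            continue;
        for (unsigned w = 0; w < num_workers && out_worker[i] == 0; w++) {
            for (unsigned j = 0; j < BURST_SIZE; j++) {
                if (inflight[w * BURST_SIZE + j] == new_tags[i]) {
                    out_worker[i] = (uint16_t)(w + 1);
                    break;
                }
            }
        }
    }
}

#if defined(__SSE2__)
#include <emmintrin.h>

/* Same result as the scalar version, but all 8 incoming 16-bit tags are
 * compared against one in-flight tag per instruction. */
static void
flow_match_sse2(const uint16_t *inflight, unsigned num_workers,
                const uint16_t *new_tags, uint16_t *out_worker)
{
    const __m128i zero = _mm_setzero_si128();
    __m128i tags = _mm_loadu_si128((const __m128i *)new_tags);
    __m128i result = zero;

    for (unsigned w = 0; w < num_workers; w++) {
        __m128i hit = zero;
        for (unsigned j = 0; j < BURST_SIZE; j++) {
            __m128i t = _mm_set1_epi16((short)inflight[w * BURST_SIZE + j]);
            hit = _mm_or_si128(hit, _mm_cmpeq_epi16(tags, t));
        }
        /* Only lanes that have no worker yet take this worker's id. */
        __m128i unset = _mm_cmpeq_epi16(result, zero);
        __m128i id = _mm_set1_epi16((short)(w + 1));
        result = _mm_or_si128(result,
                              _mm_and_si128(_mm_and_si128(hit, unset), id));
    }
    /* Lanes whose incoming tag is 0 (empty) never report a match. */
    result = _mm_andnot_si128(_mm_cmpeq_epi16(tags, zero), result);
    _mm_storeu_si128((__m128i *)out_worker, result);
}
#endif

/* Pick the implementation once; callers then go through the pointer. */
static flow_match_fn
select_flow_match(void)
{
#if defined(__SSE2__) && (defined(__GNUC__) || defined(__clang__))
    if (__builtin_cpu_supports("sse2"))
        return flow_match_sse2;
#endif
    return flow_match_scalar; /* non-x86, or SSE2 not available */
}
```

A caller would do the selection once at setup time, e.g.
"flow_match_fn match = select_flow_match();", and then call
"match(inflight, num_workers, tags, out);" per burst, so the per-packet path
carries no CPU-flag checks.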