On 2021-08-24 09:43, Mattias Rönnblom wrote:
> On 2021-08-23 21:40, pbhagavat...@marvell.com wrote:
>> From: Pavan Nikhilesh <pbhagavat...@marvell.com>
>>
>> Mark all the driver specific functions as internal, remove
>> `rte` prefix from `struct rte_eventdev_ops`.
>> Remove experimental tag from internal functions.
>> Remove `eventdev_pmd.h` from non-internal header files.
>>
> Is the enqueue/dequeue shortcut still worth the trouble? Considering the
> size of this patch set, it seems to be a lot of trouble to handle this
> special case.
>
I had a quick look at this, using an overhead measurement benchmark for DSW. Depending on the compiler version and details of the test program's structure, the gains ranged from modest to non-existent. In some scenarios, the inline versions even performed worse than a proper function call. This was on an Intel Skylake with DPDK statically linked.

The dev and port lookups are essentially a very short pointer chase, and if the dev table and the dev struct itself are not in a nearby cache level, significant stalls may occur. For most applications they will be in L1 though, I imagine.

The inline version should give the compiler some freedom to issue the appropriate loads earlier. If you insert a compiler barrier before the rte_event_*() call, the inline version seems to have no gains at all.

Did anyone else attempt to quantify the performance gains from keeping these functions inline?

/M
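
To make the pointer chase and the barrier experiment concrete, here is a minimal sketch in C. It is a simplified model, not DPDK's actual data structures: every `toy_*` name, the table sizes and the stub driver are hypothetical and exist only for illustration. The barrier macro uses the GCC/Clang empty-asm idiom with a "memory" clobber, and `__attribute__((noinline))` is likewise a GCC/Clang extension, so the sketch assumes one of those compilers.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_MAX_DEVS  4   /* hypothetical table sizes */
#define TOY_MAX_PORTS 8

struct toy_event { uint64_t payload; };
struct toy_port  { uint64_t enqueued; };

typedef uint16_t (*toy_enqueue_t)(void *port, const struct toy_event *ev,
                                  uint16_t n);

struct toy_dev {
    toy_enqueue_t enqueue;  /* driver callback */
    void **ports;           /* per-port driver data */
};

static struct toy_dev toy_devs[TOY_MAX_DEVS];

/* Stops the compiler from moving memory accesses across this point;
 * it emits no CPU instruction. */
#define toy_compiler_barrier() __asm__ volatile("" ::: "memory")

/* Inline variant: the dependent loads (dev entry, ports array, port
 * pointer, callback) are visible to the caller's optimizer, which may
 * hoist them above unrelated work. */
static inline uint16_t
toy_enqueue_inline(uint8_t dev_id, uint8_t port_id,
                   const struct toy_event *ev, uint16_t n)
{
    const struct toy_dev *dev = &toy_devs[dev_id];
    void *port = dev->ports[port_id];

    return dev->enqueue(port, ev, n);
}

/* Out-of-line variant: same lookups, but hidden behind a real call. */
__attribute__((noinline)) uint16_t
toy_enqueue_call(uint8_t dev_id, uint8_t port_id,
                 const struct toy_event *ev, uint16_t n)
{
    return toy_enqueue_inline(dev_id, port_id, ev, n);
}

/* Stub driver: counts the events it is handed. */
static uint16_t
toy_drv_enqueue(void *port, const struct toy_event *ev, uint16_t n)
{
    struct toy_port *p = port;

    (void)ev;
    p->enqueued += n;
    return n;
}

static struct toy_port toy_port0;
static void *toy_ports0[TOY_MAX_PORTS];

static void
toy_setup(void)
{
    toy_port0.enqueued = 0;
    toy_ports0[0] = &toy_port0;
    toy_devs[0].enqueue = toy_drv_enqueue;
    toy_devs[0].ports = toy_ports0;
}

/* Usage: both variants do the same work; only code generation differs. */
static void
toy_selfcheck(void)
{
    struct toy_event ev = { .payload = 1 };

    toy_setup();
    assert(toy_enqueue_inline(0, 0, &ev, 1) == 1);
    /* With the barrier here, the loads of the next enqueue cannot be
     * scheduled earlier, which models the experiment described above. */
    toy_compiler_barrier();
    assert(toy_enqueue_call(0, 0, &ev, 1) == 1);
    assert(toy_port0.enqueued == 2);
}
```

The two variants are functionally identical, so any measured difference comes from how early the compiler can schedule the lookups. In DPDK itself, the corresponding barrier is `rte_compiler_barrier()`.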