Hi Stephen,

Have you tried link-time optimization on a DPDK application? Does it noticeably reduce the I-cache miss rate?

thx & rgds,
-Qinglai

On Sun, Feb 16, 2014 at 9:02 PM, Stephen Hemminger <stephen at networkplumber.org> wrote:
> On Fri, 14 Feb 2014 15:11:29 -0500
> Ymo Lists <ymolists at gmail.com> wrote:
>
>> "Enqueuing and dequeuing items from an rte_ring using the rings-based PMD
>> may be slower than using the native rings API. This is because Intel® DPDK
>> Ethernet drivers make use of function pointers to call the appropriate
>> enqueue or dequeue functions, while the rte_ring specific functions are
>> direct function calls in the code and are often inlined by the compiler."
>>
>> Is that statement correct? I would imagine that inlined code would be
>> faster than using function pointers?
>
> Actually, the Intel DPDK has a bad case of inlineitis. The code for rings
> and other parts uses inline on largish functions, which bloats the code without
> any perceivable gain in performance. The larger code causes more cache misses,
> which actually hurt performance. Also, using GCC link time optimization helps
> to reduce any need for inlining larger code bits.
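
For readers following the thread, here is a minimal sketch of the two call paths being compared: a direct call that the compiler is free to inline, and the same operation reached through a table of function pointers. Everything in it (`toy_ring`, `toy_dev_ops`, `toy_dev_tx` and so on) is a hypothetical stand-in invented for illustration; it is not the real rte_ring or PMD code, and it says nothing about actual DPDK performance.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy queue for illustration only; NOT the real rte_ring. */
#define TOY_RING_SIZE 8u /* must be a power of two for the index mask */

struct toy_ring {
    unsigned head;              /* next slot to write */
    unsigned tail;              /* next slot to read */
    void *slots[TOY_RING_SIZE];
};

/* Direct path: static inline, so the compiler may expand it at the call site. */
static inline int toy_ring_enqueue(struct toy_ring *r, void *obj)
{
    assert(r != NULL);
    if (r->head - r->tail == TOY_RING_SIZE)
        return -1; /* ring is full */
    r->slots[r->head++ & (TOY_RING_SIZE - 1)] = obj;
    return 0;
}

static inline int toy_ring_dequeue(struct toy_ring *r, void **obj)
{
    assert(r != NULL && obj != NULL);
    if (r->head == r->tail)
        return -1; /* ring is empty */
    *obj = r->slots[r->tail++ & (TOY_RING_SIZE - 1)];
    return 0;
}

/* Indirect path: a driver-style ops table (hypothetical layout). */
struct toy_dev_ops {
    int (*tx)(void *queue, void *obj);
    int (*rx)(void *queue, void **obj);
};

static int toy_ring_tx(void *queue, void *obj)
{
    return toy_ring_enqueue(queue, obj);
}

static int toy_ring_rx(void *queue, void **obj)
{
    return toy_ring_dequeue(queue, obj);
}

static const struct toy_dev_ops toy_ring_ops = {
    .tx = toy_ring_tx,
    .rx = toy_ring_rx,
};

struct toy_dev {
    const struct toy_dev_ops *ops;
    void *queue;
};

/* The target is only known through the pointer at run time, so this call
 * generally cannot be inlined unless the compiler can prove what it is. */
static int toy_dev_tx(struct toy_dev *dev, void *obj)
{
    return dev->ops->tx(dev->queue, obj);
}

static int toy_dev_rx(struct toy_dev *dev, void **obj)
{
    return dev->ops->rx(dev->queue, obj);
}
```

Both paths do the same work; the difference is only whether the callee is visible to the compiler at the call site. Building with GCC's `-flto` (together with an optimization level such as `-O2`) lets the optimizer see across translation units, which is the mechanism Stephen refers to as reducing the need for explicit `inline` on large functions. Whether that lowers I-cache misses in a given application would have to be measured.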