27/03/2020 09:13, Olivier Matz:
> On Fri, Mar 20, 2020 at 03:55:15PM +0000, Alexander Kozyrev wrote:
> > Introduction of pinned external buffers doubled memory loads in the
> > rte_pktmbuf_prefree_seg() function. Analysis of the generated assembly
> > code shows unnecessary load of the pool field of the rte_mbuf structure.
> > Here is the snippet of the assembly for "if (!RTE_MBUF_DIRECT(m))":
> > Before the change the code was:
> >     movq 0x18(%rbx), %rax // load the ol_flags field
> >     test %r13, %rax // check if ol_flags equals to 0x60...0
> >     jz 0x9a8718 <Block 2> // jump out to "if (m->next != NULL)"
> > After the change the code became:
> >     movq 0x18(%rbx), %rax // load ol_flags
> >     test %r14, %rax // check if ol_flags equals to 0x60...0
> >     jnz 0x9bea38 <Block 2> // jump in to "if (!RTE_MBUF_HAS_EXTBUF(m)"
> >     movq 0x48(%rbx), %rax // load the pool field
> >     jmp 0x9bea78 <Block 7> // jump out to "if (m->next != NULL)"
> > Look like this absolutely unneeded memory load of the pool field is an
> > optimization for the external buffer case in GCC (4.8.5), since Clang
> > generates the same assembly for both before and after the change versions.
> > Plus, GCC favors the external buffer case over the simple case.
> > This assembly code layout causes the performance degradation because the
> > rte_pktmbuf_prefree_seg() function is a part of a very hot path.
> > Workaround this compilation issue by moving the check for pinned buffer
> > apart from the check for external buffer and restore the initial code
> > flow that favors the direct mbuf case over the external one.
> >
> > Fixes: 6ef1107ad4c6 ("mbuf: detach mbuf with pinned external buffer")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Alexander Kozyrev <akozy...@mellanox.com>
> > Acked-by: Viacheslav Ovsiienko <viachesl...@mellanox.com>
>
> Acked-by: Olivier Matz <olivier.m...@6wind.com>
>
> Thanks!

Applied, thanks
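
For readers who want the idea without the assembly, here is a rough,
simplified model of the code flow the commit message describes: decide
the direct (hot) case from ol_flags alone, and read the pool field only
on the cold path, when the buffer is external. This is a sketch, not the
actual patch. Every name in it (sk_mbuf, sk_pool, the SK_* flags,
sk_prefree_action) is hypothetical, the flag values and struct layout are
made up, and it only classifies an mbuf instead of freeing it.
__builtin_expect is the GCC/Clang branch hint, used here in place of
DPDK's likely().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* All names, flag values and the struct layout below are hypothetical;
 * they only mimic the shape of the real mbuf code. */
#define SK_F_INDIRECT   (1ULL << 62)
#define SK_F_EXTERNAL   (1ULL << 61)
#define SK_POOL_PINNED  (1U << 0)

#if defined(__GNUC__)
#define sk_likely(x) __builtin_expect(!!(x), 1)
#else
#define sk_likely(x) (x)
#endif

struct sk_pool {
	uint32_t flags;
};

struct sk_mbuf {
	uint64_t ol_flags;
	struct sk_pool *pool;
	struct sk_mbuf *next;
};

enum sk_action { SK_FREE_DIRECT, SK_DETACH, SK_PINNED_DECREF };

static inline enum sk_action sk_prefree_action(const struct sk_mbuf *m)
{
	assert(m != NULL);

	/* Hot path: decided from ol_flags alone, no pool load. */
	if (sk_likely((m->ol_flags & (SK_F_INDIRECT | SK_F_EXTERNAL)) == 0))
		return SK_FREE_DIRECT;

	/* Cold path: the pool field is read only for external buffers. */
	if ((m->ol_flags & SK_F_EXTERNAL) &&
	    (m->pool->flags & SK_POOL_PINNED))
		return SK_PINNED_DECREF;

	return SK_DETACH;
}
```

The pinned check sits after the direct check and behind the external
flag test, so the common case has no reason to touch m->pool. Whether a
given compiler actually emits the layout you want still has to be
checked in the generated assembly, as was done in the commit message.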