Hi Bruce,

> -----Original Message-----
> From: Bruce Richardson <bruce.richard...@intel.com>
> Sent: Friday, October 16, 2020 12:59 PM
> To: Ali Alnubani <alia...@nvidia.com>
> Cc: dev@dpdk.org; NBU-Contact-Thomas Monjalon
> <tho...@monjalon.net>; Asaf Penso <as...@nvidia.com>
> Subject: Re: [dpdk-dev] performance degradation with fpic
>
> On Thu, Oct 15, 2020 at 06:08:04PM +0100, Bruce Richardson wrote:
> > On Thu, Oct 15, 2020 at 04:00:44PM +0000, Ali Alnubani wrote:
> > > Hi Bruce,
> > >
> > >
> > > We have been seeing in some cases that the DPDK forwarding performance
> > > is up to 9% lower when DPDK is built as static with meson compared to a
> > > build with makefiles.
> > >
> > >
> > > The same degradation can be reproduced with makefiles on older DPDK
> > > releases when building with EXTRA_CFLAGS set to “-fPIC”. It can also be
> > > resolved in meson by passing “pic: false” to meson’s static_library
> > > call (more tweaking is needed to avoid building the shared
> > > libraries, because this change breaks them).
> > >
> > >
> > > I can reproduce this drop in the following cases:
> > > * Baremetal / NIC: ConnectX-4 Lx / OS: RHEL7.4 / CPU: Intel(R)
> > > Xeon(R) Gold 6154. Testpmd command:
> > >
> > > testpmd -c 0x7ffc0000 -n 4 -w d8:00.1 -w d8:00.0 --socket-mem=2048,2048
> > > -- --port-numa-config=0,1,1,1 --socket-num=1 --burst=64 --txd=512
> > > --rxd=512 --mbcache=512 --rxq=2 --txq=2 --nb-cores=1 --no-lsc-interrupt
> > > -i -a --rss-udp
> > > * KVM guest with SR-IOV passthrough / OS: RHEL7.4 / NIC: ConnectX-5 /
> > > Host’s CPU: Intel(R) Xeon(R) Gold 6154.
> > > Testpmd command:
> > > testpmd --master-lcore=0 -c 0x1ffff -n 4 -w
> > > 00:05.0,mprq_en=1,mprq_log_stride_num=6 --socket-mem=2048,0 --
> > > --port-numa-config=0,0 --socket-num=0 --burst=64 --txd=1024
> > > --rxd=1024 --mbcache=512 --rxq=16 --txq=16 --nb-cores=8
> > > --port-topology=chained --forward-mode=macswap --no-lsc-interrupt
> > > -i -a --rss-udp
> > > * Baremetal / OS: Ubuntu 18.04 / NIC: ConnectX-5 / CPU: Intel(R)
> > > Xeon(R) CPU E5-2697A v4. Testpmd command:
> > > testpmd -n 4 -w 0000:82:00.0,rxqs_min_mprq=8,mprq_en=1 -w
> > > 0000:82:00.1,rxqs_min_mprq=8,mprq_en=1 -c 0xff80 -- --burst=64
> > > --mbcache=512 -i --nb-cores=8 --rxq=8 --txq=8 --txd=1024
> > > --rxd=1024 --rss-udp --auto-start
> > >
> > > The packets being received and forwarded by testpmd are of IPv4/UDP
> > > type and 64B size.
> > >
> > > Should we disable PIC in static builds?
> > >
> >
> > Hi Ali,
> >
> > thanks for reporting, though it's strange that you see such a big impact.
> > In my previous tests with the i40e driver I never noticed a difference
> > between make and meson builds, and I and some others here have been
> > using meson builds for any performance work for over a year now. That
> > being said, let me reverify what I see on my end.
> >
> > In terms of solutions, disabling the -fPIC flag globally implies that
> > we can no longer build static and shared libs from the same sources,
> > so we would need to revert to doing either a static or a shared
> > library build but not both. If the issue is limited to only some
> > drivers or some cases, we can perhaps add a build option to have
> > no-fpic-static builds, to be used in cases where it is problematic.
> >
> > However, at this point, I think we need a little more investigation.
> > Is there any testing you can do to see whether the issue appears just in
> > your driver, or perhaps in a mempool driver/lib, or whether it's
> > just a global slowdown? Do you see the impact with both clang and gcc?
> > I'll retest things a bit tomorrow on my end to see what I see.
> >
> Hi again,
>
> I've done a quick retest with the i40e driver on my system, using the 20.08
> version so as to have a direct make vs meson comparison. [For reference, the
> command used was: "sudo </path/to/testpmd> -c F00000 -w af:00.0 -w
> b1:00.0 -w da:00.0 -- --rxq=2 --txq=2 --rxd=2048 --txd=512" using 3x40G ports
> to a single core running @3GHz.] No major performance differences were
> seen, but if anything the meson build was very slightly faster, as reported to
> Jerin, maybe 2%, though it's within the margin of error.
>
Thanks for taking the time to investigate this.
Disabling PIC for the net/mlx5 driver alone in drivers/meson.build resolves the issue for me.
I saw this issue with gcc (tested with 4.8.5, 9.3.0, and 7.5.0). However, I now see that
disabling PIC with an old clang version (clang 3.4.2, RHEL7.4) causes a drop in performance,
rather than the improvement seen with gcc.

> Can you try adding '-fno-semantic-interposition' to your build, since reading
> on the internet it appears that fPIC causes GCC to be very conservative about
> optimizing things, and that may help. Clang may be less conservative so
> testing with clang would be good too if you can manage it.
>

I don't see a noticeable change with '-fno-semantic-interposition'. Tested with both gcc and clang.

Thanks,
Ali
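Background sketch for the '-fno-semantic-interposition' suggestion (an illustration, not part of the measurements above): in a -fPIC build, GCC treats functions with external linkage and default visibility as interposable, because the dynamic linker could bind callers to a different definition of the same symbol. It therefore does not inline or otherwise optimize across calls to them, even from the same file. The C file below uses hypothetical function names (not DPDK code) to show which kinds of calls that rule is expected to affect. It is a minimal sketch under those assumptions and does not by itself explain the mlx5 result, where the flag made no noticeable difference.

```c
/* Hypothetical example, not DPDK code.
 * Compare e.g.: gcc -O2 -fPIC -S interpose.c
 *          vs.: gcc -O2 -fPIC -fno-semantic-interposition -S interpose.c */

/* External linkage, default visibility: with -fPIC, GCC assumes this
 * symbol may be interposed at run time, so it keeps calls to it as real
 * calls instead of inlining them (unless -fno-semantic-interposition). */
int pkt_len_exported(int hdr, int payload)
{
	return hdr + payload;
}

/* Internal linkage: can never be interposed, so it stays inlinable
 * even when the file is compiled with -fPIC. */
static int pkt_len_static(int hdr, int payload)
{
	return hdr + payload;
}

/* Hidden visibility (GCC/Clang attribute): not exported from a shared
 * object, so it is not interposable and stays inlinable under -fPIC. */
__attribute__((visibility("hidden")))
int pkt_len_hidden(int hdr, int payload)
{
	return hdr + payload;
}

/* A "hot loop" calling all three variants. Under -fPIC, only the call
 * to pkt_len_exported is held back by GCC's interposition rules. */
int burst_bytes(int n, int hdr, int payload)
{
	int total = 0;

	for (int i = 0; i < n; i++)
		total += pkt_len_exported(hdr, payload) +
			 pkt_len_static(hdr, payload) +
			 pkt_len_hidden(hdr, payload);
	return total;
}
```

In the -fPIC assembly, the call to pkt_len_exported inside burst_bytes would be expected to remain an actual (PLT) call, and to be inlined once -fno-semantic-interposition is added; the static and hidden variants can be inlined either way. When such objects end up in a static library linked into an executable, the symbols cannot be interposed at run time, so this conservatism costs performance without providing anything in return.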