> -----Original Message-----
> From: Bruce Richardson <bruce.richard...@intel.com>
> Sent: Thursday, October 22, 2020 4:58 PM
> To: Ali Alnubani <alia...@nvidia.com>
> Cc: dev@dpdk.org; NBU-Contact-Thomas Monjalon <tho...@monjalon.net>;
> Asaf Penso <as...@nvidia.com>
> Subject: Re: [dpdk-dev] performance degradation with fpic
>
> On Thu, Oct 22, 2020 at 01:17:16PM +0000, Ali Alnubani wrote:
> > Hi Bruce,
> > Sorry for the delayed response.
> >
> > > -----Original Message-----
> > > From: Bruce Richardson <bruce.richard...@intel.com>
> > > Sent: Monday, October 19, 2020 4:02 PM
> > > To: Ali Alnubani <alia...@nvidia.com>
> > > Cc: dev@dpdk.org; NBU-Contact-Thomas Monjalon <tho...@monjalon.net>;
> > > Asaf Penso <as...@nvidia.com>
> > > Subject: Re: [dpdk-dev] performance degradation with fpic
> > >
> > > On Mon, Oct 19, 2020 at 11:47:48AM +0000, Ali Alnubani wrote:
> > > > Hi Bruce,
> > > >
> > > > > -----Original Message-----
> > > > > From: Bruce Richardson <bruce.richard...@intel.com>
> > > > > Sent: Friday, October 16, 2020 12:59 PM
> > > > > To: Ali Alnubani <alia...@nvidia.com>
> > > > > Cc: dev@dpdk.org; NBU-Contact-Thomas Monjalon <tho...@monjalon.net>;
> > > > > Asaf Penso <as...@nvidia.com>
> > > > > Subject: Re: [dpdk-dev] performance degradation with fpic
> > > > >
> > > > > On Thu, Oct 15, 2020 at 06:08:04PM +0100, Bruce Richardson wrote:
> > > > > > On Thu, Oct 15, 2020 at 04:00:44PM +0000, Ali Alnubani wrote:
> > > > > > > Hi Bruce,
> > > > > > >
> > > > > > > We have been seeing in some cases that the DPDK forwarding
> > > > > > > performance is up to 9% lower when DPDK is built as static with
> > > > > > > meson compared to a build with makefiles.
> > > > > > >
> > > > > > > The same degradation can be reproduced with makefiles on older
> > > > > > > DPDK releases when building with EXTRA_CFLAGS set to “-fPIC”; it
> > > > > > > can also be resolved in meson when passing “pic: false” to meson’s
> > > > > > > static_library call (more tweaking needs to be done to prevent
> > > > > > > building shared libraries because this change breaks them).
> > > > > > >
> > > > > > > I can reproduce this drop with the following cases:
> > > > > > > * Baremetal / NIC: ConnectX-4 Lx / OS: RHEL7.4 / CPU: Intel(R)
> > > > > > > Xeon(R) Gold 6154. Testpmd command:
> > > > > > >
> > > > > > > testpmd -c 0x7ffc0000 -n 4 -w d8:00.1 -w d8:00.0
> > > > > > > --socket-mem=2048,2048 -- --port-numa-config=0,1,1,1
> > > > > > > --socket-num=1 --burst=64 --txd=512 --rxd=512 --mbcache=512
> > > > > > > --rxq=2 --txq=2 --nb-cores=1 --no-lsc-interrupt -i -a --rss-udp
> > > > > > > * KVM guest with SR-IOV passthrough / OS: RHEL7.4 / NIC:
> > > > > > > ConnectX-5 / Host’s CPU: Intel(R) Xeon(R) Gold 6154. Testpmd
> > > > > > > command:
> > > > > > > testpmd --master-lcore=0 -c 0x1ffff -n 4 -w
> > > > > > > 00:05.0,mprq_en=1,mprq_log_stride_num=6 --socket-mem=2048,0 --
> > > > > > > --port-numa-config=0,0 --socket-num=0 --burst=64 --txd=1024
> > > > > > > --rxd=1024 --mbcache=512 --rxq=16 --txq=16 --nb-cores=8
> > > > > > > --port-topology=chained --forward-mode=macswap
> > > > > > > --no-lsc-interrupt -i -a --rss-udp
> > > > > > > * Baremetal / OS: Ubuntu 18.04 / NIC: ConnectX-5 / CPU: Intel(R)
> > > > > > > Xeon(R) CPU E5-2697A v4.
> > > > > > > Testpmd command:
> > > > > > > testpmd -n 4 -w 0000:82:00.0,rxqs_min_mprq=8,mprq_en=1 -w
> > > > > > > 0000:82:00.1,rxqs_min_mprq=8,mprq_en=1 -c 0xff80 -- --burst=64
> > > > > > > --mbcache=512 -i --nb-cores=8 --rxq=8 --txq=8 --txd=1024
> > > > > > > --rxd=1024 --rss-udp --auto-start
> > > > > > >
> > > > > > > The packets being received and forwarded by testpmd are of
> > > > > > > IPv4/UDP type and 64B size.
> > > > > > >
> > > > > > > Should we disable PIC in static builds?
> > > > > > >
> > > > > >
> > > > > > Hi Ali,
> > > > > >
> > > > > > thanks for reporting, though it's strange that you see such a big
> > > > > > impact. In my previous tests with i40e driver I never noticed a
> > > > > > difference between make and meson builds, and I and some others
> > > > > > here have been using meson builds for any performance work for
> > > > > > over a year now. That being said let me reverify what I see on my
> > > > > > end.
> > > > > >
> > > > > > In terms of solutions, disabling the -fPIC flag globally implies
> > > > > > that we can no longer build static and shared libs from the same
> > > > > > sources, so we would need to revert to doing either a static or a
> > > > > > shared library build but not both. If the issue is limited to only
> > > > > > some drivers or some cases, we can perhaps add in a build option
> > > > > > to have no-fpic-static builds, to be used in cases where it is
> > > > > > problematic.
> > > > > >
> > > > > > However, at this point, I think we need a little more
> > > > > > investigation. Is there any testing you can do to see if it's just
> > > > > > in your driver, or in perhaps a mempool driver/lib that the issue
> > > > > > appears, or if it's just a global slowdown? Do you see the impact
> > > > > > with both clang and gcc?
> > > > > > I'll retest things a bit tomorrow on my end to see what I see.
> > > > > >
> > > > > Hi again,
> > > > >
> > > > > I've done a quick retest with the i40e driver on my system, using
> > > > > the 20.08 version so as to have make vs meson direct comparison.
> > > > > [For reference command used was: "sudo </path/to/testpmd> -c F00000
> > > > > -w af:00.0 -w b1:00.0 -w da:00.0 -- --rxq=2 --txq=2 --rxd=2048
> > > > > --txd=512" using 3x40G ports to a single core running @3GHz.] No
> > > > > major performance differences were seen, but if anything the meson
> > > > > build was very slightly faster, as reported to Jerin, maybe 2%,
> > > > > though it's within the margin of error.
> > > > >
> > > >
> > > > Thanks for taking the time to investigate this.
> > > >
> > > > Disabling PIC for net/mlx5 driver alone in drivers/meson.build
> > > > resolves the issue for me.
> > > > I saw this issue with gcc (tested with 4.8.5, 9.3.0, and 7.5.0). But
> > > > I see now that disabling PIC with an old clang version (clang 3.4.2,
> > > > RHEL7.4) causes a drop in performance, not an improvement like with
> > > > gcc.
> > > >
> > > That's interesting.
> > >
> > > When you just build with and without -fpic with newer clang, do you
> > > see the same perf drop as with gcc? With the older clang, is the
> > > shared lib build faster than the static one?
> >
> > With the older clang on RHEL7.4, the shared lib is about ~2% slower
> > compared to the static build.
> > With clang 11 compiled from source on ubuntu 18.04, I'm getting good
> > performance with static meson build, same performance as with makefiles
> > with gcc, and ~6% better than the static meson gcc build. Disabling PIC
> > on clang 11 degrades performance by ~4%.
> > With clang 6.0.0 however, disabling PIC causes a very small drop (~0.1%).
> >
> > This is on v20.08 with KVM ConnectX-5 SR-IOV passthrough.
> > Command: "dpdk-testpmd --master-lcore=0 -c 0x1ffff -n 4 -w 00:05.0
> > --socket-mem=2048,0 -- --port-numa-config=0,0 --socket-num=0 --burst=64
> > --txd=1024 --rxd=1024 --mbcache=512 --rxq=8 --txq=8 --nb-cores=4
> > --port-topology=chained --forward-mode=macswap --no-lsc-interrupt -i -a
> > --rss-udp".
> >
>
> So, am I right in saying that it appears the clang builds are all fine
> here, that performance is pretty much as expected in all cases with the
> default setting of PIC enabled? Therefore it appears that the issue is
> limited to gcc builds at this point?
>

Yes it appears that way.

Regards,
Ali
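
For reference, the "pic: false" approach mentioned earlier in the thread could
look roughly like the fragment below. This is only a minimal sketch for a
standalone project, not DPDK's actual build files: the project name, the
static_pic option, and demo.c are hypothetical. The only part taken from the
thread is the pic keyword argument of meson's static_library() call.

```meson
# Hypothetical standalone project; names are illustrative, not DPDK's.
project('pic_demo', 'c')

# Assumes a boolean option declared in meson_options.txt, for example:
#   option('static_pic', type: 'boolean', value: true)
static_pic = get_option('static_pic')

# With pic: false the static library's objects are compiled without -fPIC,
# so they cannot be reused to link a shared library.
demo_lib = static_library('demo', 'demo.c', pic: static_pic)
```

As noted in the thread, turning PIC off this way prevents building static and
shared libraries from the same objects, so such a switch would presumably have
to be opt-in and limited to static-only builds.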