On 7/7/20 7:04 AM, Ananyev, Konstantin wrote:
> Hi Feifei,
>
>> Hi, Konstantin, David
>>
>> I'm Feifei Wang from Arm. Sorry to make the following request:
>> Would you please run some ring performance tests of this patch on your
>> platforms when you have time?
>> I would like to know whether this patch has a significant impact on
>> platforms other than Arm.
>
> I ran a few tests on an SKX box and so far didn’t notice any real perf difference.
> Konstantin
Full performance results for an IBM POWER9 system are below. I ran the tests
twice for each version and the results were consistent.
                                        without this patch   with this patch
Testing burst enq/deq
legacy APIs: SP/SC: burst (size: 8):    43.63                43.63
legacy APIs: SP/SC: burst (size: 32):   50.07                50.04
legacy APIs: MP/MC: burst (size: 8):    58.43                58.42
legacy APIs: MP/MC: burst (size: 32):   65.52                65.51
Testing bulk enq/deq
legacy APIs: SP/SC: bulk (size: 8):     43.61                43.61
legacy APIs: SP/SC: bulk (size: 32):    50.05                50.02
legacy APIs: MP/MC: bulk (size: 8):     58.43                58.43
legacy APIs: MP/MC: bulk (size: 32):    65.50                65.49
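
To put a number on "no real difference", here is a minimal sketch that computes the relative change for the burst and bulk figures above. The values are copied from this mail; the short labels and the percent_change helper are my own naming for illustration and are not part of the DPDK test suite.

```python
# Values copied from the summary above as (without patch, with patch).
# The short labels are illustrative names, not test output.
results = {
    "SP/SC burst 8": (43.63, 43.63),
    "SP/SC burst 32": (50.07, 50.04),
    "MP/MC burst 8": (58.43, 58.42),
    "MP/MC burst 32": (65.52, 65.51),
    "SP/SC bulk 8": (43.61, 43.61),
    "SP/SC bulk 32": (50.05, 50.02),
    "MP/MC bulk 8": (58.43, 58.43),
    "MP/MC bulk 32": (65.50, 65.49),
}


def percent_change(before, after):
    """Relative change from 'before' to 'after', in percent."""
    return (after - before) / before * 100.0


changes = {name: percent_change(b, a) for name, (b, a) in results.items()}

# Print one line per test with a signed percentage.
for name, pct in changes.items():
    print(f"{name:16s} {pct:+.3f}%")

# Name of the test with the largest absolute change.
largest = max(changes, key=lambda name: abs(changes[name]))
print("largest change:", largest)
```

With these inputs every change stays below 0.1% in magnitude.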
HW:
Architecture: ppc64le
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 4
Core(s) per socket: 16
Socket(s): 2
NUMA node(s): 6
Model: 2.3 (pvr 004e 1203)
Model name: POWER9, altivec supported
CPU max MHz: 3800.0000
CPU min MHz: 2300.0000
L1d cache: 32K
L1i cache: 32K
L2 cache: 512K
L3 cache: 10240K
NUMA node0 CPU(s): 0-63
NUMA node8 CPU(s): 64-127
OS: RHEL 8.2
GCC: gcc version 8.3.1 20191121 (Red Hat 8.3.1-5) (GCC)
DPDK: 20.08.0-rc0 (a8550b773)
Unpatched
=========
$ sudo app/test/dpdk-test -l 68,69
EAL: Detected 128 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-2048kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0000:01:00.0 (socket 0)
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0000:01:00.1 (socket 0)
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0030:01:00.0 (socket 8)
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0030:01:00.1 (socket 8)
EAL: using IOMMU type 7 (sPAPR)
EAL: Probe PCI driver: net_i40e (8086:1583) device: 0034:01:00.0 (socket 8)
EAL: Probe PCI driver: net_i40e (8086:1583) device: 0034:01:00.1 (socket 8)
APP: HPET is not enabled, using TSC as default timer
RTE>>ring_perf_autotest
### Testing single element enq/deq ###
legacy APIs: SP/SC: single: 42.01
legacy APIs: MP/MC: single: 56.27
### Testing burst enq/deq ###
legacy APIs: SP/SC: burst (size: 8): 43.63
legacy APIs: SP/SC: burst (size: 32): 50.07
legacy APIs: MP/MC: burst (size: 8): 58.43
legacy APIs: MP/MC: burst (size: 32): 65.52
### Testing bulk enq/deq ###
legacy APIs: SP/SC: bulk (size: 8): 43.61
legacy APIs: SP/SC: bulk (size: 32): 50.05
legacy APIs: MP/MC: bulk (size: 8): 58.43
legacy APIs: MP/MC: bulk (size: 32): 65.50
### Testing empty bulk deq ###
legacy APIs: SP/SC: bulk (size: 8): 7.16
legacy APIs: MP/MC: bulk (size: 8): 7.16
### Testing using two hyperthreads ###
legacy APIs: SP/SC: bulk (size: 8): 12.44
legacy APIs: MP/MC: bulk (size: 8): 16.19
legacy APIs: SP/SC: bulk (size: 32): 3.10
legacy APIs: MP/MC: bulk (size: 32): 3.64
### Testing using all slave nodes ###
Bulk enq/dequeue count on size 8
Core [68] count = 362382
Core [69] count = 362516
Total count (size: 8): 724898
Bulk enq/dequeue count on size 32
Core [68] count = 361565
Core [69] count = 361852
Total count (size: 32): 723417
### Testing single element enq/deq ###
elem APIs: element size 16B: SP/SC: single: 42.81
elem APIs: element size 16B: MP/MC: single: 56.78
### Testing burst enq/deq ###
elem APIs: element size 16B: SP/SC: burst (size: 8): 45.04
elem APIs: element size 16B: SP/SC: burst (size: 32): 59.27
elem APIs: element size 16B: MP/MC: burst (size: 8): 60.68
elem APIs: element size 16B: MP/MC: burst (size: 32): 75.00
### Testing bulk enq/deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 45.05
elem APIs: element size 16B: SP/SC: bulk (size: 32): 59.23
elem APIs: element size 16B: MP/MC: bulk (size: 8): 60.64
elem APIs: element size 16B: MP/MC: bulk (size: 32): 75.11
### Testing empty bulk deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 7.16
elem APIs: element size 16B: MP/MC: bulk (size: 8): 7.16
### Testing using two hyperthreads ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 12.15
elem APIs: element size 16B: MP/MC: bulk (size: 8): 15.55
elem APIs: element size 16B: SP/SC: bulk (size: 32): 3.22
elem APIs: element size 16B: MP/MC: bulk (size: 32): 3.86
### Testing using all slave nodes ###
Bulk enq/dequeue count on size 8
Core [68] count = 374327
Core [69] count = 374433
Total count (size: 8): 748760
Bulk enq/dequeue count on size 32
Core [68] count = 324111
Core [69] count = 320038
Total count (size: 32): 644149
Test OK
Patched
=======
$ sudo app/test/dpdk-test -l 68,69
EAL: Detected 128 lcore(s)
EAL: Detected 2 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: Selected IOVA mode 'VA'
EAL: No available hugepages reported in hugepages-2048kB
EAL: Probing VFIO support...
EAL: VFIO support initialized
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0000:01:00.0 (socket 0)
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0000:01:00.1 (socket 0)
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0030:01:00.0 (socket 8)
EAL: Probe PCI driver: net_mlx5 (15b3:1019) device: 0030:01:00.1 (socket 8)
EAL: using IOMMU type 7 (sPAPR)
EAL: Probe PCI driver: net_i40e (8086:1583) device: 0034:01:00.0 (socket 8)
EAL: Probe PCI driver: net_i40e (8086:1583) device: 0034:01:00.1 (socket 8)
APP: HPET is not enabled, using TSC as default timer
RTE>>ring_perf_autotest
### Testing single element enq/deq ###
legacy APIs: SP/SC: single: 42.00
legacy APIs: MP/MC: single: 56.27
### Testing burst enq/deq ###
legacy APIs: SP/SC: burst (size: 8): 43.63
legacy APIs: SP/SC: burst (size: 32): 50.04
legacy APIs: MP/MC: burst (size: 8): 58.42
legacy APIs: MP/MC: burst (size: 32): 65.51
### Testing bulk enq/deq ###
legacy APIs: SP/SC: bulk (size: 8): 43.61
legacy APIs: SP/SC: bulk (size: 32): 50.02
legacy APIs: MP/MC: bulk (size: 8): 58.43
legacy APIs: MP/MC: bulk (size: 32): 65.49
### Testing empty bulk deq ###
legacy APIs: SP/SC: bulk (size: 8): 7.16
legacy APIs: MP/MC: bulk (size: 8): 7.16
### Testing using two hyperthreads ###
legacy APIs: SP/SC: bulk (size: 8): 12.43
legacy APIs: MP/MC: bulk (size: 8): 16.17
legacy APIs: SP/SC: bulk (size: 32): 3.10
legacy APIs: MP/MC: bulk (size: 32): 3.65
### Testing using all slave nodes ###
Bulk enq/dequeue count on size 8
Core [68] count = 363208
Core [69] count = 363334
Total count (size: 8): 726542
Bulk enq/dequeue count on size 32
Core [68] count = 361592
Core [69] count = 361690
Total count (size: 32): 723282
### Testing single element enq/deq ###
elem APIs: element size 16B: SP/SC: single: 42.78
elem APIs: element size 16B: MP/MC: single: 56.75
### Testing burst enq/deq ###
elem APIs: element size 16B: SP/SC: burst (size: 8): 45.04
elem APIs: element size 16B: SP/SC: burst (size: 32): 59.27
elem APIs: element size 16B: MP/MC: burst (size: 8): 60.66
elem APIs: element size 16B: MP/MC: burst (size: 32): 75.03
### Testing bulk enq/deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 45.04
elem APIs: element size 16B: SP/SC: bulk (size: 32): 59.33
elem APIs: element size 16B: MP/MC: bulk (size: 8): 60.65
elem APIs: element size 16B: MP/MC: bulk (size: 32): 75.04
### Testing empty bulk deq ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 7.16
elem APIs: element size 16B: MP/MC: bulk (size: 8): 7.16
### Testing using two hyperthreads ###
elem APIs: element size 16B: SP/SC: bulk (size: 8): 12.14
elem APIs: element size 16B: MP/MC: bulk (size: 8): 15.56
elem APIs: element size 16B: SP/SC: bulk (size: 32): 3.22
elem APIs: element size 16B: MP/MC: bulk (size: 32): 3.86
### Testing using all slave nodes ###
Bulk enq/dequeue count on size 8
Core [68] count = 372618
Core [69] count = 372415
Total count (size: 8): 745033
Bulk enq/dequeue count on size 32
Core [68] count = 318784
Core [69] count = 316066
Total count (size: 32): 634850
Test OK
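
If anyone wants to compare the two logs mechanically rather than by eye, a rough sketch follows. It assumes the log layout shown above ("### section ###" headers followed by "label: value" lines with a decimal value); parse_ring_perf is a hypothetical helper of mine, not something shipped with DPDK. Results are keyed by (section, label) because the same label appears under several sections.

```python
import re

# Section headers look like "### Testing burst enq/deq ###".
SECTION = re.compile(r"^### (.+) ###$")
# Result lines end in ": <decimal>", e.g. "...: burst (size: 8): 43.63".
# Integer counter lines ("Total count (size: 8): 724898") do not match.
RESULT = re.compile(r"^(.+): (\d+\.\d+)$")


def parse_ring_perf(log_text):
    """Return {(section, label): value} for every decimal result line."""
    results = {}
    section = None
    for line in log_text.splitlines():
        line = line.strip()
        header = SECTION.match(line)
        if header:
            section = header.group(1)
            continue
        result = RESULT.match(line)
        if result and section is not None:
            results[(section, result.group(1))] = float(result.group(2))
    return results


# Two short excerpts copied from the logs above.
unpatched = parse_ring_perf("""\
### Testing using two hyperthreads ###
legacy APIs: SP/SC: bulk (size: 8): 12.44
legacy APIs: MP/MC: bulk (size: 8): 16.19
""")
patched = parse_ring_perf("""\
### Testing using two hyperthreads ###
legacy APIs: SP/SC: bulk (size: 8): 12.43
legacy APIs: MP/MC: bulk (size: 8): 16.17
""")

# Print the patched-minus-unpatched delta for each result.
for key, before in unpatched.items():
    print(key[1], f"{patched[key] - before:+.2f}")
```

The same function can be fed the full console capture of each run; EAL lines and the per-core counters are skipped because they do not end in a decimal value.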