Hi Christian,

Yes, it looks like your bottleneck is crypto, not I/O.
I have no idea how AES performance on the latest AMD CPUs compares to Intel,
but it looks like you have your answer 😊
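
A rough sketch of the arithmetic behind that reading, using only the thread 5 figures from the GCM runs in the quoted show run output and the base frequencies from show cpu. It assumes the Packet-Clocks column counts ticks at the reported base frequency, which may not hold exactly; the dictionary layout and function names are made up for illustration.

```python
# Per-packet clocks for thread 5 (vpp_wk_4), copied from the quoted
# "show run" output of the two GCM runs. "ghz" is the base frequency
# reported by "show cpu".
RUNS = {
    "amd-gcm": {
        "ghz": 2.99,
        "dpdk-crypto-input": 2.84e3,
        "dpdk-esp4-encrypt": 9.89e2,
    },
    "intel-gcm": {
        "ghz": 2.09,
        "dpdk-crypto-input": 1.68e3,
        "dpdk-esp4-encrypt": 5.53e2,
    },
}

CRYPTO_NODES = ("dpdk-crypto-input", "dpdk-esp4-encrypt")


def crypto_clocks(run):
    """Sum of per-packet clocks spent in the two crypto-related nodes."""
    return sum(run[node] for node in CRYPTO_NODES)


def crypto_ns(run):
    """Convert clocks to nanoseconds.

    Assumption: one clock is one tick at the base frequency, so
    clocks / GHz gives nanoseconds.
    """
    return crypto_clocks(run) / run["ghz"]


amd, intel = RUNS["amd-gcm"], RUNS["intel-gcm"]

# Ratio of cycles per packet (AMD relative to Intel).
clock_ratio = crypto_clocks(amd) / crypto_clocks(intel)
# Ratio of wall-clock time per packet, after accounting for clock speed.
time_ratio = crypto_ns(amd) / crypto_ns(intel)

for name, run in RUNS.items():
    print(f"{name}: {crypto_clocks(run):.0f} clocks/pkt, "
          f"{crypto_ns(run):.0f} ns/pkt")
print(f"AMD/Intel clocks ratio: {clock_ratio:.2f}")
print(f"AMD/Intel time ratio:   {time_ratio:.2f}")
```

On these numbers the crypto nodes cost clearly more cycles per packet on the Epyc, while the gap in wall-clock time is smaller because of the higher clock rate. The two runs are also not at the same load (about 253 vectors/call on AMD versus about 11.5 on Intel), so treat the ratios as indicative only.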

Best
ben

> -----Original Message-----
> From: Christian Hopps <cho...@chopps.org>
> Sent: Wednesday, July 22, 2020 18:47
> To: Benoit Ganne (bganne) <bga...@cisco.com>
> Cc: Christian Hopps <cho...@chopps.org>; vpp-dev <vpp-dev@lists.fd.io>
> Subject: Re: [vpp-dev] AMD Epyc and vpp.
> 
> Included the requested info below (FWIW this is vpp-1908, updated with
> DPDK 20.05 and all the relevant DPDK changes cherry-picked from VPP into
> 1908). The results are basically the same when run on VPP master though.
> 
> I've first included as a summary, worker 4 (thread 5) as it is doing
> something very basic with no variations. This thread is taking pre-built
> 1500 octet packets from a lockless ring at a constant rate (to achieve
> 10G) and sending them through ESP which then heads out the same interface.
> I've included runs with NULL encryption and then with GCM-256 encryption.
> If I'm reading this right, the AESNI encryption done by DPDK is taking
> almost twice as long on the 3GHz Epyc as it is on the 2.1GHz Intel. :(
> 
> The actual setup is
> 
> 
> +----+  RED
> |    | ----- [VPP1]
> |    |        |
> |TREX|        | Black
> |    |  RED   |
> |    | ----- [VPP2]
> +----+
> 
> 
> Traffic flows like so: RED (9G of 1442 octet IP packets) <---> [VPPx]
> <----> BLACK (10G of 1500 octet IP packets)
> 
> Red Rx is on worker 0
> Black Rx is on worker 1
> Red Tx is on worker 3
> Black Tx is on worker 4 (thread 5)
> (worker 2 just creates zero'd buffers for worker 5 to use)
> 
> 
> amd-gcm-cypto-1908.txt
> Thread 5 vpp_wk_4 (lcore 5)
> Time 11.0, 10 sec internal node vector rate 252.94
>   vector rates in 1.5113e6, out 1.5113e6, drop 0.0000e0, punt 0.0000e0
>              Name                 State         Calls          Vectors            Suspends   Packet-Clocks    Vectors/Call
> HundredGigabitEthernet21/0/0-o   active              32883         8316355               0          1.05e1          252.91       0
> HundredGigabitEthernet21/0/0-t   active              32883         8316355               0          5.97e1          252.91       2
> dpdk-crypto-input                polling             32883         8316355               0          2.84e3          252.91     100
> dpdk-esp4-encrypt                active              32883         8316355               0          9.89e2          252.91       2
> ip4-lookup                       active              32883         8316355               0          3.08e1          252.91       0
> ip4-rewrite                      active              32883         8316355               0          2.43e1          252.91       0
> iptfs-output                     polling             32883         8316355               0          1.68e1          252.91     100
> ID     Name                Type        LWP     Sched Policy (Priority)  lcore  Core   Socket State
> 
> 
> intel-gcm-crypto-1908.txt
> Thread 5 vpp_wk_4 (lcore 5)
> Time 10.9, 10 sec internal node vector rate 11.53
>   vector rates in 1.6238e6, out 1.6238e6, drop 0.0000e0, punt 0.0000e0
>              Name                 State         Calls          Vectors            Suspends   Packet-Clocks    Vectors/Call
> HundredGigabitEthernet65/0/1-o   active             776117         8925476               0          3.47e1           11.50       0
> HundredGigabitEthernet65/0/1-t   active             776117         8925476               0          1.18e2           11.50       2
> dpdk-crypto-input                polling            776117         8925476               0          1.68e3           11.50     100
> dpdk-esp4-encrypt                active             776117         8925480               0          5.53e2           11.50       2
> ip4-lookup                       active             776117         8925476               0          5.76e1           11.50       0
> ip4-rewrite                      active             776117         8925476               0          4.49e1           11.50       0
> iptfs-output                     polling            776117         8925480               0          8.39e1           11.50     100
> ID     Name                Type        LWP     Sched Policy (Priority)  lcore  Core   Socket State
> 
> 
> amd-null-cypto-1908.txt
> Thread 5 vpp_wk_4 (lcore 5)
> Time 11.0, 10 sec internal node vector rate 5.24
>   vector rates in 1.6260e6, out 1.6260e6, drop 0.0000e0, punt 0.0000e0
>              Name                 State         Calls          Vectors            Suspends   Packet-Clocks    Vectors/Call
> HundredGigabitEthernet21/0/0-o   active            1704581         8945941               0          5.88e1            5.25       0
> HundredGigabitEthernet21/0/0-t   active            1704581         8945941               0          3.61e2            5.25       2
> dpdk-crypto-input                polling           1704581         8945941               0          5.71e2            5.25     100
> dpdk-esp4-encrypt                active            1704581         8945939               0          1.82e3            5.25       2
> ip4-lookup                       active            1704581         8945941               0          7.83e1            5.25       0
> ip4-rewrite                      active            1704581         8945941               0          6.81e1            5.25       0
> iptfs-output                     polling           1704581         8945939               0          7.19e2            5.25     100
> ID     Name                Type        LWP     Sched Policy (Priority)  lcore  Core   Socket State
> 
> 
> intel-null-crypto-1908.txt
> Thread 5 vpp_wk_4 (lcore 5)
> Time 10.9, 10 sec internal node vector rate 1.84
>   vector rates in 1.6236e6, out 1.6236e6, drop 0.0000e0, punt 0.0000e0
>              Name                 State         Calls          Vectors            Suspends   Packet-Clocks    Vectors/Call
> HundredGigabitEthernet65/0/1-o   active            4857443         8925766               0          1.57e2            1.84       0
> HundredGigabitEthernet65/0/1-t   active            4857443         8925766               0          3.96e2            1.84       2
> dpdk-crypto-input                polling           4857443         8925766               0          4.08e2            1.84     100
> dpdk-esp4-encrypt                active            4857443         8925766               0          7.01e2            1.84       2
> ip4-lookup                       active            4857443         8925766               0          1.99e2            1.84       0
> ip4-rewrite                      active            4857443         8925766               0          1.45e2            1.84       0
> iptfs-output                     polling           4857443         8925766               0          5.35e2            1.84     100
> ID     Name                Type        LWP     Sched Policy (Priority)  lcore  Core   Socket State
> 
> Here's show cpu / show pci for each
> 
> 
> Model name:               AMD EPYC 7302 16-Core Processor
> Microarch model (family): unknown (family 0x0f model 0x31)
> Flags:                    sse3 pclmulqdq ssse3 sse41 sse42 avx rdrand avx2 pqm pqe rdseed aes sha invariant_tsc
> Base frequency:           2.99 GHz
> Pool Name            Index NUMA  Size  Data Size  Total  Avail  Cached   Used
> default-numa-0         0     0   10688   10240   100462  84078   2220    14164
> default-numa-1         1     1   10688   10240   100462 100462     0       0
> Address      Sock VID:PID     Link Speed   Driver          Product Name
> Vital Product Data
> 0000:21:00.0   0  15b3:101b   unknown      mlx5_core       ConnectX-6 VPI
> adapter card, HDR PN: MCX653106A-HDAT
> 
> EC: A8
> 
> V2: 0x 4d 43 58 36 35 33 31 30 ...
> 
> SN: MT1944K18413
> 
> V3: 0x 64 34 36 36 30 30 32 64 ...
> 
> VA: 0x 4d 4c 58 3a 4d 4e 3d 4d ...
> 
> V0: 0x 50 43 49 65 47 65 6e 34 ...
> 
> RV: 0x ef 00
> 0000:21:00.1   0  15b3:101b   unknown      mlx5_core       ConnectX-6 VPI
> adapter card, HDR PN: MCX653106A-HDAT
> 
> EC: A8
> 
> V2: 0x 4d 43 58 36 35 33 31 30 ...
> 
> SN: MT1944K18413
> 
> V3: 0x 64 34 36 36 30 30 32 64 ...
> 
> VA: 0x 4d 4c 58 3a 4d 4e 3d 4d ...
> 
> V0: 0x 50 43 49 65 47 65 6e 34 ...
> 
> RV: 0x ef 00
> 0000:62:00.0   0  8086:1521   5.0 GT/s x2  igb
> 0000:62:00.1   0  8086:1521   5.0 GT/s x2  igb
> 
> 
> 
> Model name:               Intel(R) Xeon(R) Silver 4116 CPU @ 2.10GHz
> Microarch model (family): [0x6] Skylake ([0x55] Skylake X/SP) stepping 0x4
> Flags:                    sse3 pclmulqdq ssse3 sse41 sse42 avx rdrand avx2 rtm pqm pqe avx512f rdseed aes invariant_tsc
> Base frequency:           2.09 GHz
> Pool Name            Index NUMA  Size  Data Size  Total  Avail  Cached   Used
> default-numa-0         0     0   10688   10240   100462  84910   1991    13561
> Address      Sock VID:PID     Link Speed   Driver          Product Name
> Vital Product Data
> 0000:04:00.0   0  14e4:165f   5.0 GT/s x1  tg3             Broadcom
> NetXtreme Gigabit Ether PN: BCM95720
> 
> MN: 1028
> 
> V0: 0x 46 46 56 32 31 2e 34 30 ...
> 
> V1: 0x 44 53 56 31 30 32 38 56 ...
> 
> V2: 0x 4e 50 59 32
> 
> V3: 0x 50 4d 54 31
> 
> V4: 0x 4e 4d 56 42 72 6f 61 64 ...
> 
> V5: 0x 44 54 49 4e 49 43
> 
> V6: 0x 44 43 4d 31 30 30 31 30 ...
> 
> RV: 0x ee 00 00 00 00 00 00 00 ...
> 0000:04:00.1   0  14e4:165f   5.0 GT/s x1  tg3             Broadcom
> NetXtreme Gigabit Ether PN: BCM95720
> 
> MN: 1028
> 
> V0: 0x 46 46 56 32 31 2e 34 30 ...
> 
> V1: 0x 44 53 56 31 30 32 38 56 ...
> 
> V2: 0x 4e 50 59 32
> 
> V3: 0x 50 4d 54 31
> 
> V4: 0x 4e 4d 56 42 72 6f 61 64 ...
> 
> V5: 0x 44 54 49 4e 49 43
> 
> V6: 0x 44 43 4d 31 30 30 31 30 ...
> 
> RV: 0x ee 00 00 00 00 00 00 00 ...
> 0000:65:00.0   0  15b3:1017   8.0 GT/s x16 mlx5_core       CX516A -
> ConnectX-5 QSFP28      PN: MCX516A-CCAT
> 
> EC: AA
> 
> V2: 0x 4d 43 58 35 31 36 41 2d ...
> 
> SN: MT1934J09467
> 
> V3: 0x 34 65 39 61 34 39 39 34 ...
> 
> VA: 0x 4d 4c 58 3a 4d 4f 44 4c ...
> 
> V0: 0x 50 43 49 65 47 65 6e 33 ...
> 
> RV: 0x b5 00 00
> 0000:65:00.1   0  15b3:1017   8.0 GT/s x16 mlx5_core       CX516A -
> ConnectX-5 QSFP28      PN: MCX516A-CCAT
> 
> EC: AA
> 
> V2: 0x 4d 43 58 35 31 36 41 2d ...
> 
> SN: MT1934J09467
> 
> V3: 0x 34 65 39 61 34 39 39 34 ...
> 
> VA: 0x 4d 4c 58 3a 4d 4f 44 4c ...
> 
> V0: 0x 50 43 49 65 47 65 6e 33 ...
> 
> RV: 0x b5 00 00
> 
> 
> 
> Here are the 4 runs with all the output you requested (output of: vppctl
> clear run; sleep 11; vppctl show run; vppctl show threads; vppctl show
> cpu; vppctl show buffers ; vppctl show pci)
> 
