On 8/6/20 9:48 PM, Scott Dial wrote:
> The aes-aesni driver is smart enough to use the FPU if it's not busy and
> fall back to the CPU otherwise. Unfortunately, the ghash-clmulni driver
> does not have that kind of logic in it and only provides an async version,
> so we are forced to use the ghash-generic implementation, which is a pure
> CPU implementation. The ideal would be for aesni_intel to provide a
> synchronous version of gcm(aes) that fell back to the CPU if the FPU is
> busy.
I don't know how the AES-NI support works, but since you specifically
mentioned aesni_intel, I should point out that this also affects AMD.
I just got access to AMD nodes (2 x EPYC 7302) with a Mellanox 10 GbE
NIC. I ran the same test and saw a similar performance pattern. I doubt
this means much, but I figured it was worth mentioning.
> I don't know if the crypto maintainers would be open to such a change, but
> if the choice was between reverting and patching the crypto code, then I
> would work on patching the crypto code.
I can't opine on anything crypto-related since it is way outside my
area of expertise, though it is helpful to hear what is going on.
> In any case, you didn't report how many packets arrived out of order, which
> was the issue being addressed by my change. It would be helpful to get
> the output of "ip -s macsec show" and specifically the InPktsDelayed
> counter. Did iperf3 report out-of-order packets with the patch reverted?
> Otherwise, if this is the only process running on your test servers,
> then you may not be generating any contention for the FPU, which is the
> source of the out-of-order issue. Maybe you could run prime95 to busy
> the FPU to see the issue that I was seeing.
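To make the ordering argument concrete, here is a toy model in Python. It is only a sketch under loud assumptions: the function names, the "FPU busy for every fourth frame" pattern, and the rule that deferred frames complete after the inline ones are all invented for illustration and are not how the kernel's work queue is actually scheduled. It only shows why deferring some frames can reorder them, while handling every frame inline (FPU or CPU) cannot.

```python
from collections import deque


def deliver_async_only(seqs, fpu_busy):
    """Toy model: a frame that finds the FPU busy is deferred to a work queue."""
    delivered = []
    deferred = deque()
    for seq in seqs:
        if fpu_busy(seq):
            deferred.append(seq)   # completes later, after frames handled inline
        else:
            delivered.append(seq)  # handled inline, may overtake deferred frames
    delivered.extend(deferred)     # assumption: the queue drains after the burst
    return delivered


def deliver_sync_fallback(seqs, fpu_busy):
    """Toy model: every frame is handled inline, on the FPU or on the CPU."""
    # fpu_busy would only select the code path here; it never changes the order.
    return list(seqs)


def count_late(delivered):
    """Count frames that arrive after a higher sequence number was already seen."""
    highest = -1
    late = 0
    for seq in delivered:
        if seq < highest:
            late += 1
        else:
            highest = seq
    return late


frames = range(10)
busy = lambda seq: seq % 4 == 0    # assumption: FPU busy for every fourth frame

async_order = deliver_async_only(frames, busy)
sync_order = deliver_sync_fallback(frames, busy)
print("async only   :", async_order, "late =", count_late(async_order))
print("sync fallback:", sync_order, "late =", count_late(sync_order))
```

The late count here is only a rough analogue of InPktsDelayed, not the kernel's definition of that counter.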
I ran some tests again on the same servers as before with the Intel
NICs. I tested with prime95 running on 27 of the 28 cores in *each*
server simultaneously (allowing iperf3 to use a core on each) throughout
the entire test. This was using 5.7.11 with
ab046a5d4be4c90a3952a0eae75617b49c0cb01b reverted, so pre-5.7 performance.
MACsec interfaces are deleted and recreated before each test, so
counters are always fresh.
== MACSEC WITHOUT ENCRYPTION ==
* Server1:
18: ms1: protect on validate strict sc off sa off encrypt off send_sci on end_station off scb off replay off
    cipher suite: GCM-AES-128, using ICV length 16
    TXSC: 0000000000001234 on SA 0
    stats: OutPktsUntagged InPktsUntagged OutPktsTooLong InPktsNoTag InPktsBadTag InPktsUnknownSCI InPktsNoSCI InPktsOverrun
           0 0 0 1123 0 0 1 0
    stats: OutPktsProtected OutPktsEncrypted OutOctetsProtected OutOctetsEncrypted
           3798421 0 30889802591 0
        0: PN 3799655, state on, key 01000000000000000000000000000000
        stats: OutPktsProtected OutPktsEncrypted
               3798421 0
    RXSC: 0000000000001234, state on
    stats: InOctetsValidated InOctetsDecrypted InPktsUnchecked InPktsDelayed InPktsOK InPktsInvalid InPktsLate InPktsNotValid InPktsNotUsingSA InPktsUnusedSA
           30042694872 0 0 218 3675170 0 0 0 0 0
        0: PN 3676633, state on, key 01000000000000000000000000000000
        stats: InPktsOK InPktsInvalid InPktsNotValid InPktsNotUsingSA InPktsUnusedSA
               3675170 0 0 0 0
* Server2:
18: ms1: protect on validate strict sc off sa off encrypt off send_sci on end_station off scb off replay off
    cipher suite: GCM-AES-128, using ICV length 16
    TXSC: 0000000000001234 on SA 0
    stats: OutPktsUntagged InPktsUntagged OutPktsTooLong InPktsNoTag InPktsBadTag InPktsUnknownSCI InPktsNoSCI InPktsOverrun
           0 0 0 1227 0 0 1 0
    stats: OutPktsProtected OutPktsEncrypted OutOctetsProtected OutOctetsEncrypted
           3675399 0 30042696158 0
        0: PN 3676633, state on, key 01000000000000000000000000000000
        stats: OutPktsProtected OutPktsEncrypted
               3675399 0
    RXSC: 0000000000001234, state on
    stats: InOctetsValidated InOctetsDecrypted InPktsUnchecked InPktsDelayed InPktsOK InPktsInvalid InPktsLate InPktsNotValid InPktsNotUsingSA InPktsUnusedSA
           30889801305 0 0 0 3798410 0 0 0 0 0
        0: PN 3799655, state on, key 01000000000000000000000000000000
        stats: InPktsOK InPktsInvalid InPktsNotValid InPktsNotUsingSA InPktsUnusedSA
               3798410 0 0 0 0
InPktsDelayed was 218 for Server1 and 0 for Server2.
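Since the counter names and values wrap differently in the pasted output, here is a small Python sketch of how I paired them up to read off InPktsDelayed. It assumes the wrapped header lines and value lines have already been re-joined by hand into one string each; pair_stats is a hypothetical helper name, not part of iproute2.

```python
def pair_stats(header_line, value_line):
    """Pair counter names from a 'stats:' header with the values printed below it."""
    names = header_line.split()
    if names and names[0] == "stats:":
        names = names[1:]                      # drop the leading "stats:" label
    values = [int(v) for v in value_line.split()]
    if len(names) != len(values):
        raise ValueError(
            f"{len(names)} counter names but {len(values)} values; "
            "check how the wrapped lines were joined"
        )
    return dict(zip(names, values))


# RXSC counters from Server1 in the run without encryption (copied from this mail).
rx_header = (
    "stats: InOctetsValidated InOctetsDecrypted InPktsUnchecked "
    "InPktsDelayed InPktsOK InPktsInvalid InPktsLate InPktsNotValid "
    "InPktsNotUsingSA InPktsUnusedSA"
)
rx_values = "30042694872 0 0 218 3675170 0 0 0 0 0"

rx = pair_stats(rx_header, rx_values)
print("InPktsDelayed =", rx["InPktsDelayed"])
print("InPktsOK      =", rx["InPktsOK"])
```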
== MACSEC WITH ENCRYPTION ==
I got the following *with* encryption (macsec interface deleted and
recreated before the test, so counters are fresh):
* Server1:
19: ms1: protect on validate strict sc off sa off encrypt on send_sci on end_station off scb off replay off
    cipher suite: GCM-AES-128, using ICV length 16
    TXSC: 0000000000001234 on SA 0
    stats: OutPktsUntagged InPktsUntagged OutPktsTooLong InPktsNoTag InPktsBadTag InPktsUnknownSCI InPktsNoSCI InPktsOverrun
           0 0 0 1397 0 0 0 0
    stats: OutPktsProtected OutPktsEncrypted OutOctetsProtected OutOctetsEncrypted
           0 5560714 0 46931594623
        0: PN 5561948, state on, key 01000000000000000000000000000000
        stats: OutPktsProtected OutPktsEncrypted
               0 5560714
    RXSC: 0000000000001234, state on
    stats: InOctetsValidated InOctetsDecrypted InPktsUnchecked InPktsDelayed InPktsOK InPktsInvalid InPktsLate InPktsNotValid InPktsNotUsingSA InPktsUnusedSA
           0 45977049585 0 3771 5417843 0 0 0 0 0
        0: PN 5422860, state on, key 01000000000000000000000000000000
        stats: InPktsOK InPktsInvalid InPktsNotValid InPktsNotUsingSA InPktsUnusedSA
               5417843 0 0 0 0
* Server2:
19: ms1: protect on validate strict sc off sa off encrypt on send_sci on end_station off scb off replay off
    cipher suite: GCM-AES-128, using ICV length 16
    TXSC: 0000000000001234 on SA 0
    stats: OutPktsUntagged InPktsUntagged OutPktsTooLong InPktsNoTag InPktsBadTag InPktsUnknownSCI InPktsNoSCI InPktsOverrun
           0 0 0 1490 0 0 0 0
    stats: OutPktsProtected OutPktsEncrypted OutOctetsProtected OutOctetsEncrypted
           0 5421626 0 45977059885
        0: PN 5422860, state on, key 01000000000000000000000000000000
        stats: OutPktsProtected OutPktsEncrypted
               0 5421626
    RXSC: 0000000000001234, state on
    stats: InOctetsValidated InOctetsDecrypted InPktsUnchecked InPktsDelayed InPktsOK InPktsInvalid InPktsLate InPktsNotValid InPktsNotUsingSA InPktsUnusedSA
           0 46931106683 0 109 5560541 0 0 0 0 0
        0: PN 5561948, state on, key 01000000000000000000000000000000
        stats: InPktsOK InPktsInvalid InPktsNotValid InPktsNotUsingSA InPktsUnusedSA
               5560541 0 0 0 0
InPktsDelayed was 3771 for Server1 and 109 for Server2.
The performance numbers were:
* 9.87 Gb/s without macsec
* 6.00 Gb/s with macsec WITHOUT encryption
* 9.19 Gb/s with macsec WITH encryption
iperf3 retransmits were:
* 27 without macsec
* 1211 with macsec WITHOUT encryption
* 721 with macsec WITH encryption
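To put the numbers above on a common scale, a short Python sketch that expresses each throughput as a fraction of the no-macsec baseline and each InPktsDelayed count as a ratio to InPktsOK on the same receiver. All figures are copied from this mail; the dictionary layout and labels are just my own bookkeeping, and I am not assuming anything about whether InPktsOK already includes the delayed packets.

```python
# Throughput in Gb/s, as reported by iperf3 in this mail.
throughput_gbps = {
    "no macsec": 9.87,
    "macsec, no encryption": 6.00,
    "macsec, encryption": 9.19,
}

# (InPktsDelayed, InPktsOK) per receiver, from the RXSC stats in this mail.
rx_counters = {
    ("no encryption", "Server1"): (218, 3675170),
    ("no encryption", "Server2"): (0, 3798410),
    ("encryption", "Server1"): (3771, 5417843),
    ("encryption", "Server2"): (109, 5560541),
}

baseline = throughput_gbps["no macsec"]
relative = {name: rate / baseline for name, rate in throughput_gbps.items()}
delayed_ratio = {key: delayed / ok for key, (delayed, ok) in rx_counters.items()}

for name, frac in relative.items():
    print(f"{name:<24} {frac:6.1%} of baseline")

for (mode, server), ratio in delayed_ratio.items():
    print(f"{mode:<14} {server}: {ratio:.2e} delayed per OK packet")
```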
Thanks for the reply and for the background on this.
Ryan