Re: [dpdk-dev] [RFC] Accelerator API to chain packet processing functions

Doherty, Declan Thu, 13 Feb 2020 03:31:32 -0800

On 06/02/2020 10:54 AM, Jerin Jacob wrote:

On Thu, Feb 6, 2020 at 3:35 PM Coyle, David <david.co...@intel.com> wrote:


Hi Jerin,


Hi David,

Thanks for the comments. Please see replies below.

Kind Regards,
David

On Tue, Feb 4, 2020 at 8:15 PM David Coyle <david.co...@intel.com> wrote:


Introduction
============

This RFC introduces a new DPDK library, rte_accelerator.

The main aim of this library is to provide a flexible and extensible way of
combining one or more packet-processing functions into a single operation,
thereby allowing these to be performed in parallel in optimized software
libraries or in a hardware accelerator. These functions can include
cryptography, compression and CRC/checksum calculation, while others can
potentially be added in the future. Performing these functions in parallel as a
single operation can enable a significant performance improvement.



Background
==========

There are a number of byte-wise operations which are present and
common across many access network data-plane pipelines, such as Cipher,
Authentication, CRC, Bit-Interleaved-Parity (BIP), other checksums etc. Some
prototyping has been done at Intel in relation to the 01.org access-network-
dataplanes project to prove that a significant performance improvement is
possible when such byte-wise operations are combined into a single pass of
packet data processing. This performance boost has been prototyped for
both XGS-PON MAC data-plane and DOCSIS MAC data-plane pipelines.


Could you share the relative performance numbers to show the gain?


[DC] As mentioned above, the main performance gains are when the packet 
processing operations can be combined into a single pass of the packet.
Both Crypto-CRC-BIP (for XGS-PON MAC) and Crypto-CRC (for DOCSIS MAC) have been 
implemented in the AESNI MB library as single pass operation chains.

We have modified the dpdk-crypto-perf-tester as part of our prototyping to test 
the cases where:
1) each packet processing function is done as an independent stage (e.g. 
calling rte_net_crc for CRC,  AESNI MB through rte_cryptodev for cipher, and a 
C function to calculate the BIP)
2) all packet processing functions done as a single-pass operation in AESNI MB 
through rte_cryptodev

We see the following results for 1024 byte input frames from 
dpdk-crypto-perf-tester:
         - XGS-PON MAC (Crypto-CRC-BIP):
                 - 3 independent stages: 1429 cycles/buf (13.75Gbps)
                 - 1 single-pass stage: 896 cycles/buf (21.9Gbps)
                 37% cycle reduction

         - DOCSIS MAC (Crypto-CRC):
                 - 2 independent stages: 1421 cycles/buf (13.84Gbps)
                 - 1 single-pass stage: 1133 cycles/buf (17.34Gbps)
                 20% cycle reduction
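
The percentages above follow directly from the cycles/buf figures. A minimal sketch of the arithmetic is below. The function names are made up for illustration, and the core clock is not stated anywhere in this thread: a value of roughly 2.4 GHz is only what the quoted Gbps figures seem to imply for 1024-byte buffers, so treat it as an assumption.

```c
#include <assert.h>

/* Percentage of cycles saved when the per-buffer cost drops
 * from `before` to `after` cycles. */
double cycle_reduction_pct(double before, double after)
{
    assert(before > 0.0 && after >= 0.0);
    return 100.0 * (before - after) / before;
}

/* Throughput in Gbps for a given buffer size and per-buffer cycle cost.
 * clock_ghz is an ASSUMED core frequency, not a number from the thread. */
double throughput_gbps(double bytes_per_buf, double cycles_per_buf,
                       double clock_ghz)
{
    assert(cycles_per_buf > 0.0);
    return bytes_per_buf * 8.0 * clock_ghz / cycles_per_buf;
}
```

For example, cycle_reduction_pct(1429, 896) gives about 37 and cycle_reduction_pct(1421, 1133) gives about 20, in line with the figures quoted above.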

Adding the accelerator API will allow vendors to gain the benefits of these cycle 
savings.


Numbers make sense. I have seen a similar performance improvement
doing it in one pass with CPU instructions.

- XGS-PON MAC: Crypto-CRC-BIP
         - Order:
                 - Downstream: CRC, Encrypt, BIP


I understand that if the chain has two operations then it may be possible to have
handcrafted SW code do both operations in one pass.
I understand the spec is agnostic on the number of passes required to
apply the xform, but to understand the SW/HW capability: in the above
case, "CRC, Encrypt, BIP", is it done in one pass in SW, three passes in SW,
or one pass using HW?


[DC] The CRC, Encrypt, BIP is also currently done as 1 pass in AESNI MB library 
SW.
However, this could also be performed as a single pass in a HW accelerator
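
To make the one-pass versus three-pass distinction concrete, here is a toy sketch of the downstream order (CRC, Encrypt, BIP). It is not the AESNI MB implementation: the multiplicative checksum stands in for the CRC, a byte-wise XOR stands in for the cipher, and an 8-bit XOR parity stands in for the BIP. All names are hypothetical. The only point is that the fused loop loads each byte once while producing the same outputs as three separate loops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct chain_result {
    uint32_t csum; /* stand-in for the CRC, computed over the plaintext */
    uint8_t  bip;  /* 8-bit interleaved parity over the ciphertext */
};

/* Three independent stages: each loop walks the whole buffer again. */
struct chain_result process_three_pass(uint8_t *buf, size_t len, uint8_t key)
{
    struct chain_result r = { 0, 0 };
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++)          /* stage 1: checksum over plaintext */
        r.csum = r.csum * 31u + buf[i];
    for (i = 0; i < len; i++)          /* stage 2: toy cipher, in place */
        buf[i] ^= (uint8_t)(key + i);
    for (i = 0; i < len; i++)          /* stage 3: parity over ciphertext */
        r.bip ^= buf[i];
    return r;
}

/* Single pass: each byte is loaded once and all three functions applied. */
struct chain_result process_single_pass(uint8_t *buf, size_t len, uint8_t key)
{
    struct chain_result r = { 0, 0 };
    size_t i;

    assert(buf != NULL || len == 0);
    for (i = 0; i < len; i++) {
        uint8_t c;

        r.csum = r.csum * 31u + buf[i];            /* checksum on plaintext */
        c = (uint8_t)(buf[i] ^ (uint8_t)(key + i)); /* encrypt */
        buf[i] = c;
        r.bip ^= c;                                /* parity on ciphertext */
    }
    return r;
}
```

Running both functions on copies of the same buffer should give identical ciphertext, checksum and parity. Any saving comes from touching the packet data once rather than three times.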


As a specification, cascading the xform chains makes sense.
Do we have any HW that supports chaining more than "two" xforms
in one pass?
i.e. real chaining, where two HW blocks work hand in hand.
If none, it may be better to abstract this as a synchronous API (no dequeue, no
enqueue) for the CPU use case.

Were you thinking along the lines of a synchronous API option like that just introduced to cryptodev? i.e. something like


uint16_t rte_accelerator_process(struct rte_accelerator_ctx *ctx,
                                 struct rte_accelerator_op ops[],
                                 uint16_t nb_ops);
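
As a rough illustration of what "synchronous" would mean for a caller, here is a self-contained mock built around the prototype above. Only the function signature comes from this message. The struct fields, the return-value convention (number of ops completed successfully) and the toy XOR/parity processing are assumptions made up for the sketch, and nothing here exists in DPDK.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* HYPOTHETICAL context layout; the RFC does not define these fields. */
struct rte_accelerator_ctx {
    uint8_t key;      /* toy cipher key, for illustration only */
};

/* HYPOTHETICAL operation descriptor. */
struct rte_accelerator_op {
    uint8_t *data;    /* buffer, processed in place */
    size_t   len;     /* buffer length in bytes */
    uint8_t  parity;  /* output: XOR parity of the processed data */
    int      status;  /* output: 0 on success, -1 on error */
};

/* Synchronous processing: every op is complete when the call returns,
 * so there is no enqueue/dequeue pair. Returns the number of ops that
 * completed successfully (an assumed convention). */
uint16_t rte_accelerator_process(struct rte_accelerator_ctx *ctx,
                                 struct rte_accelerator_op ops[],
                                 uint16_t nb_ops)
{
    uint16_t done = 0;
    uint16_t i;
    size_t j;

    assert(ctx != NULL);
    assert(ops != NULL || nb_ops == 0);
    for (i = 0; i < nb_ops; i++) {
        struct rte_accelerator_op *op = &ops[i];

        if (op->data == NULL) {        /* reject ops without a buffer */
            op->status = -1;
            continue;
        }
        op->parity = 0;
        for (j = 0; j < op->len; j++) { /* toy cipher + parity, one pass */
            op->data[j] ^= ctx->key;
            op->parity ^= op->data[j];
        }
        op->status = 0;
        done++;
    }
    return done;
}
```

The caller fills an array of ops, makes one call, and reads each op's status and outputs immediately afterwards, which is the main difference from an enqueue/dequeue model.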
