Re: [dpdk-dev] [RFC 0/4] cpu-crypto API choices

Jerin Jacob Wed, 20 Nov 2019 06:28:07 -0800

On Mon, Nov 18, 2019 at 5:27 PM Ananyev, Konstantin
<[email protected]> wrote:
>
> Hi Jerin,


Hi Konstantin,

>
> Thanks for the input; my answers are inline.
> Other guys - please provide your input.
> Thanks
> Konstantin
>
> > > Originally both SW and HW crypto PMDs use an rte_crypto_op-based API to
> > > process the crypto workload asynchronously. This approach provides uniformity
> > > to both PMD types, but also introduces an unnecessary performance penalty to
> > > SW PMDs that have to "simulate" HW async behavior
> > > (crypto-ops enqueue/dequeue, HW address computations,
> > > storing/dereferencing user provided data (mbuf) for each crypto-op,
> > > etc).
> > >
> > > The aim is to introduce a new optional API for SW crypto-devices
> > > to perform crypto processing in a synchronous manner.
> > > As summarized by Akhil, we need a synchronous API to perform crypto
> > > operations on raw data using SW PMDs, that provides:
> > >  - no crypto-ops.
> > >  - avoid using mbufs inside this API, use raw data buffers instead.
> > >  - no separate enqueue-dequeue, only single process() API for data path.
> > >  - input data buffers should be grouped by session,
> > >    i.e. each process() call takes one session and a group of input buffers
> > >    that belong to that session.
> > >  - All parameters that are constant across the session should be stored
> > >    inside the session itself and reused for all incoming data buffers.
> > >
> > > While there seems to be no controversy about the need for such functionality,
> > > there is no agreement on what the best API for it would be.
> > > So I am requesting TB input on that matter.
> > >
> > > Series structure:
> > > - patch #1 - introduce basic data structures to be used by the sync API
> > >   (no controversy here, I hope ..)
> > >   [RFC 1/4] cpu-crypto: Introduce basic data structures
> > > - patch #2 - Intel initial approach for new API (via rte_security)
> > >   [RFC 2/4] security: introduce cpu-crypto API
> > > - patch #3 - approach that reuses existing rte_cryptodev API as much as
> > >   possible
> > >   [RFC 3/4] cryptodev: introduce cpu-crypto API
> > > - patch #4 - approach via introducing new session data structure and API
> > >   [RFC 4/4] cryptodev: introduce rte_crypto_cpu_sym_session API
> > >
> > > Patches 2,3,4 are mutually exclusive,
> > > and we probably have to choose which one to go forward with.
> > > I put some explanations in each of the patches; hopefully that will help
> > > to understand the pros and cons of each one.
> > >
> > > Akhil strongly supports #3, AFAIK mainly because it allows PMDs to
> > > reuse existing API and minimize API level changes.
> > > My favorite is #4, #2 is less preferable but ok too.
> > > #3 seems problematic to me for the reasons I outlined in the #4 patch
> > > description.
> > >
> > > Please provide your opinion.
> >
> > I spent some time on the proposal and I agree that a sync API is needed
> > and that it makes sense to remove the queue emulation and the
> > allocating/freeing of crypto_ops in the case of the sync API.
> >
> > # I would prefer not to duplicate the session. If the newly added
> > fields are for optimization, then they can be applicable to HW too.
> > For example, if we consider the offset to be constant for one session,
> > a HW PMD will be able to leverage this. Ref:
> > rte_crypto_aead_xform::cpu_crypto:offset
>
> It might, but right now for the async API we pass this info in crypto_op instead.
> So if I understand you correctly, your preference is a sort of #3 approach
> that reuses the existing rte_cryptodev API as much as possible:
> reuse the existing rte_cryptodev_sym structure with a new sync process() API?

Yes.
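A minimal sketch of the synchronous model discussed in this thread: one process() call takes one session plus a group of raw data buffers, and per-session constants such as the offset live in the session instead of in each op. All names below (`cpu_sym_session`, `raw_buf`, `cpu_sym_process`) are hypothetical and are not the RFC's or DPDK's definitions; the XOR loop is only a placeholder for a real cipher, not cryptography.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical session: parameters constant for every buffer in it. */
struct cpu_sym_session {
    uint32_t cipher_offset; /* same offset for all buffers of the session */
    uint8_t key;            /* toy one-byte key, NOT real crypto */
};

/* Hypothetical raw data buffer (no mbuf, no crypto-op). */
struct raw_buf {
    uint8_t *base;
    uint32_t len;
};

/*
 * Process a group of raw buffers that all belong to one session.
 * Returns the number of buffers processed successfully; status[i]
 * is 0 on success or -1 if buffer i is shorter than the offset.
 */
static uint32_t
cpu_sym_process(const struct cpu_sym_session *sess,
                struct raw_buf *bufs, int32_t *status, uint32_t num)
{
    uint32_t ok = 0;

    for (uint32_t i = 0; i != num; i++) {
        if (bufs[i].len < sess->cipher_offset) {
            status[i] = -1;
            continue;
        }
        /* Placeholder XOR "cipher" standing in for a real algorithm. */
        for (uint32_t j = sess->cipher_offset; j != bufs[i].len; j++)
            bufs[i].base[j] ^= sess->key;
        status[i] = 0;
        ok++;
    }
    return ok;
}
```

Returning the success count plus a per-buffer status array lets the caller handle partial failures synchronously, without any enqueue/dequeue round trip.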

> > # I would prefer not to duplicate ops parameters; instead, the
> > existing rte_crypto_ops can be updated.
> > I see that most members introduced in rte_crypto_sym_vec &
> > rte_crypto_vec are already existing in rte_crypto_op.
>
> rte_crypto_ops is way too generic/excessive.
> Filling/reading it seems to be one of the main slowdowns that we are trying to
> avoid in the new API.

It does not look like it goes over one cache line (CL). Regarding the
filling case, I think we need to form the rte_crypto_ops in the slow path
and update only the mutable fields per packet.
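The slow-path/fast-path split suggested here can be sketched as follows, assuming a hypothetical op layout (`struct sym_op`, not the real `struct rte_crypto_op`) and a 64-byte cache line: session-constant fields are written once, and the per-packet path only touches the mutable ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64 /* assumption: 64-byte cache line */

/* Hypothetical op layout for illustration; NOT struct rte_crypto_op. */
struct sym_op {
    /* Session-constant fields: written once on the slow path. */
    const void *session;
    uint32_t cipher_offset;
    uint32_t digest_len;
    /* Mutable fields: rewritten for every packet on the fast path. */
    uint8_t *data;
    uint32_t data_len;
    int32_t status;
    uint8_t *iv;
    uint8_t *digest;
};

_Static_assert(sizeof(struct sym_op) <= CACHE_LINE_SIZE,
               "op should fit in one cache line");

/* Slow path: fill in everything once per session. */
static void
sym_op_template_init(struct sym_op *op, const void *sess,
                     uint32_t cipher_offset, uint32_t digest_len)
{
    op->session = sess;
    op->cipher_offset = cipher_offset;
    op->digest_len = digest_len;
    op->data = NULL;
    op->data_len = 0;
    op->status = 0;
    op->iv = NULL;
    op->digest = NULL;
}

/* Fast path: only the mutable fields are written per packet. */
static void
sym_op_prepare(struct sym_op *op, uint8_t *data, uint32_t data_len,
               uint8_t *iv, uint8_t *digest)
{
    op->data = data;
    op->data_len = data_len;
    op->status = 0;
    op->iv = iv;
    op->digest = digest;
}
```

Konstantin's counterpoint in the next message still applies to this sketch: even with a template, the user writes the mutable fields and the PMD has to read them back for every packet.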

> >
> > Also, since we agree that the ops for the SYNC API can be on the
> > stack or allocated once, the size shouldn't matter.
>
> It can be on the stack, but it means the user will still have to fill them
> and the PMD will have to read/process/overwrite them.
>
> > I understand that this would cause ABI breakage, but for this release
> > we can work together and add some reserved fields that we can
> > implement later. I believe that's the reason why you want to introduce
> > new structures. I think that will bloat the existing crypto lib.
>
> It will increase the lib code size, but I don't think significantly.
> Honestly, I think messing with crypto_op and other existing structures
> might have a much more negative effect.

Yes. We need to change it carefully.
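One way to make such a careful change with reserved fields, sketched with hypothetical structures (`xform_v1`/`xform_v2` and the `cpu_offset` field are illustrative, not existing DPDK types): reserve spare space in one release, carve a new field out of it later, and let compile-time checks confirm that the size and existing offsets did not move.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure as first released: spare space reserved. */
struct xform_v1 {
    uint16_t algo;
    uint16_t key_len;
    uint32_t digest_len;
    uint64_t reserved[2]; /* applications must zero these */
};

/* Later release: part of the reserved space becomes a real field. */
struct xform_v2 {
    uint16_t algo;
    uint16_t key_len;
    uint32_t digest_len;
    uint32_t cpu_offset; /* hypothetical new field, taken from reserved[0] */
    uint32_t reserved0;
    uint64_t reserved1;
};

/* Layout checks: size and existing offsets must stay unchanged. */
_Static_assert(sizeof(struct xform_v1) == sizeof(struct xform_v2),
               "structure size changed");
_Static_assert(offsetof(struct xform_v1, digest_len) ==
               offsetof(struct xform_v2, digest_len),
               "existing field moved");
_Static_assert(offsetof(struct xform_v1, reserved) ==
               offsetof(struct xform_v2, cpu_offset),
               "new field must occupy the reserved slot");

/* Reject non-zero reserved fields so they can be given meaning later. */
static int
xform_v2_check(const struct xform_v2 *x)
{
    return x->reserved0 == 0 && x->reserved1 == 0;
}
```

Requiring applications to zero the reserved fields (and rejecting non-zero values) is what makes it safe to give those bytes a meaning in a later release without breaking binaries built against the older layout.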

>
> > If I understand it correctly, this will be used in conjunction with
> > IXGBE to handle fragmented IPsec traffic. If that's the fundamental
> > reasoning, then there is an alternate path possible.
>
> No, it's just one of the use cases.
> Pretty important, but not the only one.
> The main reason is that the current cryptodev API (crypto_op based) is
> suboptimal for SW-based PMDs.
> We are wasting too many cycles pretending that there is a lookaside
> device underneath.

That I agree with. I think it should be fixed by the process() API.

> I think it makes more sense to admit that it is SW-based and exploit its
> nature, instead of trying to hide it.

Yes. I thought the separate process() device op will solve the major problems.
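A separate, optional process() device op could look roughly like the sketch below. The ops table, wrapper and callback names are hypothetical; the idea is that SW PMDs fill in the callback, lookaside HW PMDs leave it NULL, and the generic wrapper reports -ENOTSUP (the errno value DPDK commonly returns for unsupported PMD ops) instead of emulating the call.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical raw data buffer. */
struct raw_buf {
    uint8_t *base;
    uint32_t len;
};

/* Hypothetical synchronous process() callback signature. */
typedef uint32_t (*sym_process_t)(void *sess, struct raw_buf *bufs,
                                  int32_t *status, uint32_t num);

/* Hypothetical PMD ops table; the real rte_cryptodev_ops differs. */
struct dev_ops {
    /* enqueue/dequeue callbacks omitted for brevity */
    sym_process_t sym_cpu_process; /* NULL for lookaside HW PMDs */
};

struct dev {
    const struct dev_ops *ops;
};

/* Generic wrapper: a direct synchronous call into the SW PMD. */
static int64_t
dev_sym_cpu_process(const struct dev *d, void *sess,
                    struct raw_buf *bufs, int32_t *status, uint32_t num)
{
    if (d->ops->sym_cpu_process == NULL)
        return -ENOTSUP;
    return d->ops->sym_cpu_process(sess, bufs, status, num);
}

/* Trivial stand-in SW PMD callback: marks every buffer as done. */
static uint32_t
sw_process(void *sess, struct raw_buf *bufs, int32_t *status, uint32_t num)
{
    (void)sess;
    (void)bufs;
    for (uint32_t i = 0; i != num; i++)
        status[i] = 0;
    return num;
}

static const struct dev_ops sw_ops = { .sym_cpu_process = sw_process };
static const struct dev_ops hw_ops = { .sym_cpu_process = NULL };
```

Keeping the op optional leaves the existing enqueue/dequeue path untouched for HW PMDs, while SW PMDs skip the queue emulation entirely.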

This is just my _personal_ opinion. I leave it to the crypto code
contributors to define the specifics of the API.
