On Mon, Nov 18, 2019 at 5:27 PM Ananyev, Konstantin <konstantin.anan...@intel.com> wrote:
>
> Hi Jerin,

Hi Konstantin,

>
> Thanks for input, my answers inline.
> Other guys - please provide your input.
> Thanks
> Konstantin
>
> > > Originally both SW and HW crypto PMDs use rte_crypto_op based API to
> > > process the crypto workload asynchronously. This way provides uniformity
> > > to both PMD types, but also introduces unnecessary performance penalty to
> > > SW PMDs that have to "simulate" HW async behavior
> > > (crypto-ops enqueue/dequeue, HW addresses computations,
> > > storing/dereferencing user provided data (mbuf) for each crypto-op,
> > > etc).
> > >
> > > The aim is to introduce a new optional API for SW crypto-devices
> > > to perform crypto processing in a synchronous manner.
> > > As summarized by Akhil, we need a synchronous API to perform crypto
> > > operations on raw data using SW PMDs, that provides:
> > > - no crypto-ops.
> > > - avoid using mbufs inside this API, use raw data buffers instead.
> > > - no separate enqueue-dequeue, only single process() API for data path.
> > > - input data buffers should be grouped by session,
> > >   i.e. each process() call takes one session and group of input buffers
> > >   that belong to that session.
> > > - All parameters that are constant across session, should be stored
> > >   inside the session itself and reused by all incoming data buffers.
> > >
> > > While there seems no controversy about need of such functionality,
> > > there seems to be no agreement on what would be the best API for that.
> > > So I am requesting for TB input on that matter.
> > >
> > > Series structure:
> > > - patch #1 - introduce basic data structures to be used by sync API
> > >   (no controversy here, I hope ..)
> > >   [RFC 1/4] cpu-crypto: Introduce basic data structures
> > > - patch #2 - Intel initial approach for new API (via rte_security)
> > >   [RFC 2/4] security: introduce cpu-crypto API
> > > - patch #3 - approach that reuses existing rte_cryptodev API as much as
> > >   possible
> > >   [RFC 3/4] cryptodev: introduce cpu-crypto API
> > > - patch #4 - approach via introducing new session data structure and API
> > >   [RFC 4/4] cryptodev: introduce rte_crypto_cpu_sym_session API
> > >
> > > Patches 2,3,4 are mutually exclusive,
> > > and we probably have to choose which one to go forward with.
> > > I put some explanations in each of the patches, hopefully that will help
> > > to understand pros and cons of each one.
> > >
> > > Akhil strongly supports #3, AFAIK mainly because it allows PMDs to
> > > reuse existing API and minimize API level changes.
> > > My favorite is #4, #2 is less preferable but ok too.
> > > #3 seems problematic to me by the reasons I outlined in #4 patch
> > > description.
> > >
> > > Please provide your opinion.
> >
> > I spent some time on the proposal and I agree that sync API is needed
> > and it makes sense to remove queue emulation and allocating/freeing
> > the crypto_ops in case of sync API.
> >
> > # I would prefer to not duplicate the session. If the newly added
> > fields are for optimization then those can be applicable for HW too.
> > For example, if we consider offset to be constant for one session,
> > HW PMD will be able to leverage this. ref:
> > rte_crypto_aead_xform::cpu_crypto:offset
>
> It might, but right now for async API we pass this info in crypto_op instead.
> So if I get you right your preference is sort of #3 approach
> that reuses existing rte_cryptodev API as much as possible:
> reuse existing rte_cryptodev_sym structure with new sync process() API?

Yes.

> > # I would prefer to not duplicate ops parameters, instead of the
> > existing rte_crypto_ops can be updated.
> > I see that most members introduced in rte_crypto_sym_vec &
> > rte_crypto_vec are already existing in rte_crypto_op.
>
> rte_crypto_ops is way too generic/excessive.
> Filling/reading it seems one of the main slowdowns that we are trying to
> avoid in new API.

It does not look like it goes over 1 CL (cache line). Regarding the
filling case, I think we need to form the rte_crypto_ops in the slow
path and update only the mutable fields per packet.

> >
> > Also, since we are agreeing that the ops for SYNC API can be from
> > stack/one time allocated, the size shouldn't matter.
>
> It can be on stack, but it means user will still have to fill them
> and PMD will have to read/process/overwrite them.
>
> > I understand that this would cause ABI breakage, but for this release,
> > we can work together and add some reserved fields
> > that we can implement later. I believe that's the reason why you want
> > to introduce new structures. I think that will bloat
> > the existing crypto lib.
>
> It will increase the lib code, but I don't think it will be significant.
> Honestly, I think messing with crypto_op and other existing structures
> might have much more negative effect.

Yes. We need to change it carefully.

> > If I understand it correctly, this will be used in conjunction with
> > IXGBE to handle fragmented IPsec traffic. If that's the fundamental
> > reasoning, then there is an alternate path possible.
>
> No, it's just one of the use-cases.
> Pretty important, but not the only one.
> The main reason - current cryptodev API (crypto_op based) is suboptimal for
> SW based PMDs.
> We are wasting too many cycles to pretend that it is a lookaside device
> underneath.

That I agree with. I think it should be fixed by the process() API.

> I think it makes more sense to admit that it is SW based and exploit its nature,
> instead of trying to hide it.

Yes. I thought the separate process() device op would solve the major
problems. This is just my _personal_ opinion. I leave it to the crypto
code contributors to define the specifics of the API.
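
To make the shape of the discussed data path concrete, here is a rough
sketch of a synchronous process() call that takes one session and a
group of raw buffers belonging to it. Everything in it is hypothetical:
toy_session, toy_buf and toy_process() are made-up names, not DPDK APIs
and not what any of the RFC patches define, and the "cipher" is a toy
XOR used only so the sketch compiles and runs. It only illustrates
keeping session-constant parameters (key, offset) in the session and
reporting a per-buffer status without any crypto-op or enqueue/dequeue.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical types for illustration only; none of these are DPDK APIs. */

/* Per-session state: everything constant across the session lives here. */
struct toy_session {
    uint8_t key;    /* stand-in for real key material */
    size_t offset;  /* constant data offset, stored once per session */
};

/* One raw data buffer (no mbuf, no crypto-op). */
struct toy_buf {
    uint8_t *data;
    size_t len;
};

/*
 * Synchronous, single-call data path: one session plus a group of
 * buffers that belong to it. Bytes from s->offset onward are XORed
 * with s->key (a toy transform, not real crypto).
 * status[i] is set to 0 on success and -1 on error.
 * Returns the number of buffers processed successfully.
 */
static unsigned int
toy_process(const struct toy_session *s, struct toy_buf bufs[],
            int status[], unsigned int n)
{
    unsigned int i, ok = 0;
    size_t j;

    assert(s != NULL);
    for (i = 0; i < n; i++) {
        /* Reject missing buffers and buffers shorter than the offset. */
        if (bufs[i].data == NULL || bufs[i].len < s->offset) {
            status[i] = -1;
            continue;
        }
        for (j = s->offset; j < bufs[i].len; j++)
            bufs[i].data[j] ^= s->key;
        status[i] = 0;
        ok++;
    }
    return ok;
}
```

A caller would fill the session once in the slow path and then, per
burst, only build the array of buffers and call toy_process(); results
are available as soon as the call returns.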