> Hi Akhil,
>
> Thanks for the review and comments!
> I know you are extremely busy, so here are my points in brief:
> I think placing the CPU synchronous crypto API in rte_security makes
> sense, as:
>
> 1. rte_security already contains the inline-crypto and lookaside-crypto
>    action types, so adding a cpu_crypto action type is reasonable.
> 2. rte_security contains security features that may not be supported by
>    all devices, such as crypto, IPsec, and PDCP. cpu_crypto follows this
>    pattern - again, crypto.
> 3. Placing the CPU synchronous crypto API in rte_security is natural, as
>    inline mode also works synchronously, whereas cryptodev does not.
> 4. Placing the CPU synchronous crypto API in rte_security helps boost SW
>    crypto performance. I have already provided a simple perf test inside
>    the unit test in the patchset for the user to try out - just compare
>    its output against the DPDK crypto perf app output.
> 5. Placing the CPU synchronous crypto API in cryptodev will never serve
>    HW lookaside crypto PMDs, as making them work synchronously carries a
>    huge performance penalty. Cryptodev's existing design is to provide
>    APIs that work on all crypto PMDs (rte_cryptodev_enqueue_burst /
>    dequeue_burst, for example), so a sync-only API does not fit
>    cryptodev's principle.
> 6. Placing the CPU synchronous crypto API in cryptodev confuses the
>    user, as:
>    - a session created for async mode may not work in sync mode;
>    - both enqueue/dequeue and cpu_crypto_process do the same crypto
>      processing, but one PMD may support only one API (set), another may
>      support the other, and a third PMD may support both. We would have
>      to provide yet another API to let the user query which PMD supports
>      which;
>    - there would be two completely different code paths for async/sync
>      mode.
> 7. You said at the end of the email that placing the CPU synchronous
>    crypto API into rte_security is not acceptable as it does not do any
>    rte_security stuff - crypto isn't rte_security stuff? You may call
>    this a quibble, but in my view both PMD implementations in the
>    patchset do offload the work to the CPU's special circuitry designed
>    to accelerate crypto processing.
>
> To me, cryptodev is the one place the CPU synchronous crypto API should
> not go into; rte_security is.
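To make the cpu_crypto action type above concrete, here is a minimal
sketch of session creation through rte_security. The action-type name
follows the RFC under discussion; the helper name, conf layout, and
protocol choice are illustrative assumptions, not the RFC's exact API:

    #include <rte_security.h>
    #include <rte_crypto_sym.h>

    /* Illustrative sketch only: RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO is
     * the action type proposed in the RFC; the helper name and the
     * protocol choice below are assumptions for the sketch. */
    static struct rte_security_session *
    create_cpu_crypto_session(struct rte_security_ctx *ctx,
            struct rte_crypto_sym_xform *xform,
            struct rte_mempool *sess_mp)
    {
            struct rte_security_session_conf conf = {
                    .action_type = RTE_SECURITY_ACTION_TYPE_CPU_CRYPTO,
                    .protocol = RTE_SECURITY_PROTOCOL_IPSEC,
                    .crypto_xform = xform,
            };

            /* Same create path as the inline/lookaside action types. */
            return rte_security_session_create(ctx, &conf, sess_mp);
    }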
I also don't understand why rte_security is not an option here.
We do have inline-crypto right now; why can't we have cpu-crypto with a
new process() API here?
I would actually like to hear more opinions from the community - what do
other interested parties think is the best way to introduce a
cpu-crypto-specific API?
Konstantin

> Regards,
> Fan
>
> > -----Original Message-----
> > From: Akhil Goyal [mailto:akhil.go...@nxp.com]
> > Sent: Friday, October 11, 2019 2:24 PM
> > To: Ananyev, Konstantin <konstantin.anan...@intel.com>; 'dev@dpdk.org'
> > <dev@dpdk.org>; De Lara Guarch, Pablo <pablo.de.lara.gua...@intel.com>;
> > 'Thomas Monjalon' <tho...@monjalon.net>; Zhang, Roy Fan
> > <roy.fan.zh...@intel.com>; Doherty, Declan <declan.dohe...@intel.com>
> > Cc: 'Anoob Joseph' <ano...@marvell.com>
> > Subject: RE: [RFC PATCH 1/9] security: introduce CPU Crypto action type
> > and API
> >
> > Hi Konstantin,
> >
> > > Hi Akhil,
> > > [snip]
> > > > > > > > OK, let us assume that you have a separate structure. But I
> > > > > > > > have a few queries:
> > > > > > > > 1. How can multiple drivers use the same session?
> > > > > > >
> > > > > > > As a short answer: they can't.
> > > > > > > It is pretty much the same approach as with rte_security -
> > > > > > > each device needs to create/init its own session.
> > > > > > > So the upper layer would need to maintain its own array (or
> > > > > > > so) for such a case.
> > > > > > > Though the question is why you would like to have the same
> > > > > > > session over multiple SW-backed devices, as it would anyway
> > > > > > > just be a synchronous function call executed on the same CPU.
> > > > > >
> > > > > > I may have a single FAT tunnel which may be distributed over
> > > > > > multiple cores, and each core is affined to a different SW
> > > > > > device.
> > > > >
> > > > > If it is pure SW, then we don't need multiple devices for such a
> > > > > scenario.
> > > > > The device in that case is a pure abstraction that we can skip.
> > > >
> > > > Yes, agreed, but that liberty is given to the application: whether
> > > > it needs multiple devices with a single queue or a single device
> > > > with multiple queues.
> > > > I think that independence should not be broken in this new API.
> > > > > > So a single session may be accessed by multiple devices.
> > > > > >
> > > > > > One more example: depending on packet sizes, I may switch
> > > > > > between HW/SW PMDs with the same session.
> > > > >
> > > > > Sure, but then we'll have multiple sessions.
> > > >
> > > > No, the session will be the same and it will have separate private
> > > > data for each of the PMDs.
> > > >
> > > > > BTW, we have the same thing now - these private session pointers
> > > > > are just stored inside the same rte_crypto_sym_session.
> > > > > And if the user wants to support this model, he would also need
> > > > > to store a <dev_id, queue_id> pair for each HW device anyway.
> > > >
> > > > Yes, agreed, but how does that work in your new struct? You cannot
> > > > support that.
> > >
> > > The user can store all this info in his own struct.
> > > That's exactly what we have right now.
> > > Let's say ipsec-secgw has to store, for each IPsec SA, a pointer to
> > > the crypto session and/or a pointer to the security session, plus
> > > (for lookaside devices) cdev_id_qp, which allows it to extract the
> > > dev_id + queue_id information.
> > > As I understand it, that works for now, as each ipsec_sa uses only
> > > one dev+queue. Though if someone would like to use multiple
> > > devices/queues for the same SA, he would need an array of these
> > > <dev, queue> pairs.
> > > So even right now rte_cryptodev_sym_session is not self-contained
> > > and requires extra information to be maintained by the user.
> >
> > Why are you increasing the complexity for the user application?
> > The new APIs and structs should be such that the stack needs minimal
> > changes to be portable across multiple vendors.
> > You should try to hide as much complexity as possible in the driver or
> > library and give the user simple APIs.
> >
> > Having the same session for multiple devices was added by Intel only
> > for some use cases.
> > And we had split that session create API into two. Now, if those are
> > not useful, shall we move back to a single API? I think @Doherty,
> > Declan and @De Lara Guarch, Pablo can comment on this.
> >
> > > > > > > > 2. Can somebody use the scheduler PMD for scheduling
> > > > > > > > different types of payloads for the same session?
> > > > > > >
> > > > > > > In theory, yes.
> > > > > > > Though for that the scheduler PMD should have, inside its
> > > > > > > rte_crypto_cpu_sym_session, an array of pointers to the
> > > > > > > underlying devices' sessions.
> > > > > > >
> > > > > > > > With your proposal the APIs would be very specific to your
> > > > > > > > use case only.
> > > > > > >
> > > > > > > Yes, in some way.
> > > > > > > I consider that API specific to SW-backed crypto PMDs.
> > > > > > > I can hardly see how any 'real HW' PMDs (lksd-none,
> > > > > > > lksd-proto) would benefit from it.
> > > > > > > The current crypto-op API is very much HW-oriented. Which is
> > > > > > > OK - that's what it was intended for - but I think we also
> > > > > > > need one designed with a SW-backed implementation in mind.
> > > > > >
> > > > > > We may re-use your API for HW PMDs as well, which do not have
> > > > > > the crypto-op/mbuf requirement.
> > > > > > The return type of your new process API may have a status
> > > > > > which says 'processed' or 'enqueued'. So if it is 'enqueued',
> > > > > > we may have a new API for raw-buffer dequeue as well.
> > > > > >
> > > > > > This requirement can apply to any hardware PMD, such as QAT.
> > > > >
> > > > > I don't think it is a good idea to extend this API to async
> > > > > (lookaside) devices.
> > > > > You'll need to:
> > > > > - provide dev_id and queue_id for each process (enqueue) and
> > > > >   dequeue operation;
> > > > > - provide IOVAs for all buffers passed to that function (data
> > > > >   buffers, digest, IV, AAD);
> > > > > - on dequeue, provide some way to associate dequeued data and
> > > > >   digest buffers with the crypto session that was used (and
> > > > >   probably with the mbuf).
> > > > > So most likely we'll end up with just another version of our
> > > > > current crypto-op structure.
> > > > > If you'd like to get rid of the mbuf dependency within the
> > > > > current crypto-op API, that's understandable, but I don't think
> > > > > we should have the same API for both the sync (CPU) and async
> > > > > (lookaside) cases.
> > > > > It doesn't seem feasible at all and voids the whole purpose of
> > > > > this patch.
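As a side note on the per-SA bookkeeping described above, the kind of
struct the application ends up maintaining could look roughly like this;
the type and field names are hypothetical, loosely modeled on what
ipsec-secgw keeps today:

    #include <stdint.h>
    #include <rte_cryptodev.h>
    #include <rte_security.h>

    /* Hypothetical per-SA bookkeeping: one session pointer per API,
     * plus - for lookaside devices - the <dev_id, queue_id> pairs the
     * application dispatches to. MAX_QP_PER_SA and all field names are
     * illustrative only. */
    #define MAX_QP_PER_SA 16

    struct sa_ctx {
            struct rte_cryptodev_sym_session *crypto_ses; /* lookaside */
            struct rte_security_session *sec_ses;         /* inline */
            struct {
                    uint8_t dev_id;
                    uint16_t qp_id;
            } cdev_qp[MAX_QP_PER_SA]; /* filled at init, one per lcore */
            uint16_t nb_qp;
    };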
> > > > At this moment we are not much concerned about the dequeue API or
> > > > about HW PMD support. It is just that the new API should be
> > > > generic enough to be usable in some future scenarios as well. I am
> > > > just highlighting possible future use cases.
> > >
> > > Sorry, but I strongly disagree with such an approach.
> > > We should stop adding/modifying APIs 'just in case' and because 'it
> > > might be useful for some future HW'.
> > > Inside DPDK we already have too many dev-level APIs without any
> > > implementation.
> > > That's quite bad practice and very disorienting for end users.
> > > I think to justify API additions/changes we need at least one proper
> > > implementation, or at least strong evidence that people are really
> > > committed to supporting it in the near future.
> > > BTW, that is what the TB agreed on nearly a year ago.
> > >
> > > This new API (if we go ahead with it, of course) would stay
> > > experimental for some time anyway, to make sure we don't miss
> > > anything needed (I think about a one-year time frame).
> > > So if you guys *really* want to extend it to support _async_ devices
> > > too, I am open to modifications/additions here.
> > > Though personally I think such an addition would over-complicate
> > > things and we'd end up with another reincarnation of the current
> > > crypto-op.
> > > We actually discussed it internally and decided to drop that idea
> > > because of that.
> > > Again, my opinion: for lookaside devices it might be better to try
> > > to optimize the current crypto-op path (remove the mbuf requirement,
> > > probably add the ability to group by session on enqueue/dequeue,
> > > etc.).
> >
> > I agree that the new API is experimental and can be modified later, so
> > no issues there, but we can keep some things in mind while defining
> > the APIs.
> > These were some comments from my side; if they impact the current
> > scenario, you can drop them. We will take care of them later.
> >
> > > > What is the issue that you face in making a dev-op for this new
> > > > API? Do you see any performance impact with that?
> > >
> > > There are two main things:
> > > 1. The user would need to maintain and provide dev_id + queue_id for
> > > each process() call.
> > > That means extra (and, for SW, totally unnecessary) overhead.
> >
> > You are using a crypto device to perform the processing; you must use
> > dev_id to identify which SW device it is. This is how the DPDK
> > framework works.
> >
> > > 2. Yes, I would expect some perf overhead too - an extra call or
> > > branch.
> > > Again, as it is a data dependency, most likely the CPU wouldn't be
> > > able to pipeline it efficiently:
> > >
> > > rte_crypto_sym_process(uint8_t dev_id, uint16_t qp_id,
> > >                 struct rte_crypto_sym_session *sess, ...)
> > > {
> > >         struct rte_cryptodev *dev = &rte_cryptodevs[dev_id];
> > >         return (*dev->process)(sess->data[dev->driver_id], ...);
> > > }
> > >
> > > driver_specific_process(driver_specific_sym_session *sess)
> > > {
> > >         return sess->process(sess, ...);
> > > }
> > >
> > > I didn't make any exact measurements, but surely it would be slower
> > > than just:
> > >         session_udata->process(session_udata->sess, ...);
> > > Again, it would be much more noticeable on low-end CPUs.
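Restating the direct-call model from the pseudo-code above as a
self-contained sketch (every type and function name here is
hypothetical): the sync session object itself carries the function
pointer, so the data path needs neither dev_id nor qp_id:

    #include <stdint.h>

    /* Hypothetical sketch of the direct-call model argued for above;
     * none of these names come from an existing DPDK API. */
    struct cpu_sym_session;

    typedef int (*cpu_sym_process_t)(struct cpu_sym_session *sess,
                    void *buf[], uint32_t num);

    struct cpu_sym_session {
            cpu_sym_process_t process; /* set by the PMD at session init */
            uint8_t priv[];            /* PMD-specific keys/state */
    };

    /* Data path: a single indirect call, no dev_id/qp_id lookup. */
    static inline int
    cpu_sym_process(struct cpu_sym_session *sess, void *buf[], uint32_t num)
    {
            return sess->process(sess, buf, num);
    }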
> > > See, for example:
> > > http://mails.dpdk.org/archives/dev/2019-September/144350.html
> > > Jerin claims a 1.5-3% drop for introducing an extra call via hiding
> > > the eth_dev contents - I suppose we would see something similar
> > > here.
> > > I do realize that in the majority of cases crypto is more expensive
> > > than RX/TX, but still.
> > >
> > > If it were a really unavoidable tradeoff (to support an already
> > > existing API, or so) I wouldn't mind, but I don't see any real need
> > > for it right now.
> >
> > Having the application call
> > session_udata->process(session_udata->sess, ...) and maintain each
> > PMD's process() API in its memory will make the application
> > non-portable to other vendors.
> >
> > What we are doing here is defining another way to create sessions for
> > the same stuff that is already done. This makes applications
> > non-portable and confuses the application writer.
> >
> > I would say you should do some profiling first. As you also mentioned,
> > crypto workloads are more cycle-consuming, so the extra call will not
> > impact this case.
> >
> > > > > > > > > > That is why a dev-op would be a better option.
> > > > > > > >
> > > > > > > > When you add more functionality to this sync API/struct,
> > > > > > > > it will end up being the same API/struct.
> > > > > > > >
> > > > > > > > Let us see how close/far we are from the existing APIs
> > > > > > > > when the actual implementation is done.
> > > > > > > >
> > > > > > > > > I am not sure that would be needed.
> > > > > > > > > > It would be internal to the driver: if synchronous
> > > > > > > > > > processing is supported (from the feature flag) and
> > > > > > > > > > the relevant fields in the xform (the newly added
> > > > > > > > > > ones, packed as per your suggestions) are set, it will
> > > > > > > > > > create that type of session.
> > > > > > > > > >
> > > > > > > > > > > + * Main points:
> > > > > > > > > > > + * - Current crypto-dev API is reasonably mature and it is desirable
> > > > > > > > > > > + *   to keep it unchanged (API/ABI stability). From other side, this
> > > > > > > > > > > + *   new sync API is new one and probably would require extra changes.
> > > > > > > > > > > + *   Having it as a new one allows to mark it as experimental, without
> > > > > > > > > > > + *   affecting existing one.
> > > > > > > > > > > + * - Fully opaque cpu_sym_session structure gives more flexibility
> > > > > > > > > > > + *   to the PMD writers and again allows to avoid ABI breakages in future.
> > > > > > > > > > > + * - process() function per set of xforms
> > > > > > > > > > > + *   allows to expose different process() functions for different
> > > > > > > > > > > + *   xform combinations. PMD writer can decide, does he wants to
> > > > > > > > > > > + *   push all supported algorithms into one process() function,
> > > > > > > > > > > + *   or spread it across several ones.
> > > > > > > > > > > + *   I.E. More flexibility for PMD writer.
> > > > > > > > > >
> > > > > > > > > > Which process function should be chosen is internal to
> > > > > > > > > > the PMD; how would that info be visible to the
> > > > > > > > > > application or the library?
> > > > > > > > > > These will get stored in the session private data. It
> > > > > > > > > > would be up to the PMD writer to store the per-session
> > > > > > > > > > process function in the session private data.
> > > > > > > > > >
> > > > > > > > > > The process function would be a dev-op, just like the
> > > > > > > > > > enq/deq operations, and it should call the respective
> > > > > > > > > > process API stored in the session private data.
> > > > > > > > >
> > > > > > > > > That model (via dev-ops) is possible, but has several
> > > > > > > > > drawbacks from my perspective:
> > > > > > > > >
> > > > > > > > > 1. It means we'll need to pass dev_id as a parameter to
> > > > > > > > > the process() function.
> > > > > > > > > Though in fact dev_id is not relevant information for us
> > > > > > > > > here (all we need is a pointer to the session and a
> > > > > > > > > pointer to the function to call), and I tried to avoid
> > > > > > > > > using it in the data-path functions of this API.
> > > > > > > >
> > > > > > > > You have a single vdev, but someone may have multiple
> > > > > > > > vdevs, one per thread, or may have the same dev with
> > > > > > > > multiple queues, one per core.
> > > > > > >
> > > > > > > That's fine. As I said above, it is a SW-backed
> > > > > > > implementation.
> > > > > > > Each session has to be a separate entity that contains all
> > > > > > > the necessary information (keys, alg/mode info, etc.) to
> > > > > > > process input buffers.
> > > > > > > Plus we need the actual function pointer to call.
> > > > > > > I just don't see what we need a dev_id for in that
> > > > > > > situation.
> > > > > >
> > > > > > To iterate over the session private data in the session.
> > > > > >
> > > > > > > Again, here we don't need to care about queues and their
> > > > > > > pinning to cores.
> > > > > > > If, let's say, someone would like to process buffers from
> > > > > > > the same IPsec SA on two different cores in parallel, he can
> > > > > > > just create two sessions for the same xform, give one to
> > > > > > > thread #1 and the second to thread #2.
> > > > > > > After that, both threads are free to call
> > > > > > > process(this_thread_ses, ...) at will.
> > > > > >
> > > > > > Say you have a 16-core device to handle 100G of traffic on a
> > > > > > single tunnel.
> > > > > > Will we make 16 sessions with the same parameters?
> > > > >
> > > > > We can ask absolutely the same question about the current
> > > > > crypto-op API.
> > > > > You have a lookaside crypto-dev with 16 HW queues, each queue
> > > > > serviced by a different CPU.
> > > > > For the same SA, do you need a separate session per queue, or is
> > > > > it OK to reuse the current one?
> > > > > AFAIK, right now this is a grey area, not clearly defined.
> > > > > For the crypto-devs I am aware of, the user can reuse the same
> > > > > session (as the PMD uses it read-only).
> > > > > But again, right now I think it is not clearly defined and is
> > > > > implementation-specific.
> > > >
> > > > The user can use the same session - that is what I am also
> > > > insisting - but it may have separate session private data per PMD.
> > > > The cryptodev session create API provides that functionality and
> > > > we can leverage that.
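For reference, this is the two-step cryptodev session flow being leveraged
there: one rte_cryptodev_sym_session, initialized once per device, so each
driver keeps its own private data inside the same session. A minimal
sketch, with illustrative variable names and trimmed error handling:

    #include <rte_cryptodev.h>

    /* Two-step flow: create once, then init per device. Variable names
     * (dev_id_sw, dev_id_hw, etc.) are illustrative. */
    static struct rte_cryptodev_sym_session *
    create_shared_session(uint8_t dev_id_sw, uint8_t dev_id_hw,
            struct rte_crypto_sym_xform *xform,
            struct rte_mempool *sess_mp, struct rte_mempool *priv_mp)
    {
            struct rte_cryptodev_sym_session *sess;

            sess = rte_cryptodev_sym_session_create(sess_mp);
            if (sess == NULL)
                    return NULL;

            /* One init per device: each driver stores its own private
             * data inside the same session. */
            if (rte_cryptodev_sym_session_init(dev_id_sw, sess,
                            xform, priv_mp) != 0 ||
                rte_cryptodev_sym_session_init(dev_id_hw, sess,
                            xform, priv_mp) != 0) {
                    rte_cryptodev_sym_session_free(sess);
                    return NULL;
            }
            return sess;
    }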
> > > rte_cryptodev_sym_session.sess_data[] is indexed by driver_id,
> > > which means we can't use the same rte_cryptodev_sym_session to hold
> > > sessions for both sync and async mode for the same device. Of
> > > course we can add a hard requirement that any driver that wants to
> > > support process() has to create sessions that can handle both
> > > process and enqueue/dequeue, but then again, why create such
> > > overhead?
> > >
> > > BTW, to be honest, I don't consider the current
> > > rte_cryptodev_sym_session construct for multiple device_ids:
> > >
> > > __extension__ struct {
> > >         void *data;
> > >         uint16_t refcnt;
> > > } sess_data[0];
> > > /**< Driver specific session material, variable size */
> > >
> >
> > Yes, I feel the same. I was also not in favor of this when it was
> > introduced.
> > Please go ahead and remove it. I have no issues with that.
> >
> > > as an advantage.
> > > It looks too error-prone to me:
> > > 1. Simultaneous session initialization/de-initialization for devices
> > > with the same driver_id is not possible.
> > > 2. It assumes that all device drivers will be loaded before we start
> > > to create session pools.
> > >
> > > Right now it seems OK, as no one requires such functionality, but I
> > > don't know how it will be in the future.
> > > To me the rte_security session model, where for each security
> > > context the user has to create a new session, looks much more
> > > robust.
> >
> > Agreed.
> >
> > > > BTW, I can see a v2 of this RFC which is still based on the
> > > > security library.
> > >
> > > Yes, v2 concentrated on fixing the found issues and some code
> > > restructuring, i.e. changes that would be needed anyway, whatever
> > > API approach we choose.
> > >
> > > > When do you plan to submit the patches for the crypto-based APIs?
> > > > We have an RC1 merge deadline for this patchset on 21st Oct.
> > >
> > > We'd like to start working on it ASAP, but it seems we still have a
> > > major disagreement about how this crypto-dev API should look.
> > > Which makes me think: should we return to our original proposal via
> > > rte_security?
> > > It still looks to me like a clean and straightforward way to enable
> > > this new API, and it probably wouldn't cause that much controversy.
> > > What do you think?
> >
> > I cannot spend more time discussing this until the RC1 date; I have
> > some other stuff pending.
> > You can send the patches early next week with the approach that I
> > mentioned, or else we can discuss this post-RC1 (which would mean
> > deferring to 20.02).
> >
> > But moving back to security is not acceptable to me. The code should
> > be put where it is intended, not where it is easy to put. You are not
> > doing any rte_security stuff.
> >
> > Regards,
> > Akhil