Hi Fan,
> 
> Hi Akhil,
> 
> ...
> > >
> > > As you may have seen, the structure definition is HW centric with
> > > IOVA addresses all over. Also, as you will see from the patch series,
> > > the operation is on a per-operation basis instead of operating in a
> > > burst. The external application can know sooner when a specific
> > > enqueue has failed.
> >
> > You may also need to save a virtual address as well, as some hardware is
> > able to convert virtual to physical addresses on its own, giving a
> > performance improvement.
> >
> > I do not see an issue in using enqueue burst with burst size = 1, but
> > since you are doing optimizations, none of the hardware can perform well
> > with burst = 1; I think it is always greater than 1.
> 
> Shall I update rte_crypto_sym_vec as follows, so that the two problems can
> be resolved?
> 
> struct rte_crypto_sym_vec {
>       /** array of SGL vectors */
>       struct rte_crypto_sgl *sgl;
>       union {
>               /* Supposed to be used with CPU crypto API call. */
>               struct {
>                       /** array of pointers to IV */
>                       void **iv;
>                       /** array of pointers to AAD */
>                       void **aad;
>                       /** array of pointers to digest */
>                       void **digest;
>                       /**
>                        * array of statuses for each operation:
>                        *  - 0 on success
>                        *  - errno on error
>                        */
>                       int32_t *status;
>               };
> 
>               /* Supposed to be used with HW crypto API call. */
>               struct {
>                       /** array of IV vectors */
>                       struct rte_crypto_vec *iv_hw;
>                       /** array of AAD vectors */
>                       struct rte_crypto_vec *aad_hw;
>                       /** array of digest vectors */
>                       struct rte_crypto_vec *digest_hw;
>               };
> 
>       };
>       /** number of operations to perform */
>       uint32_t num;
> };

Yes, something of that sort can work.

This use case was also discussed in the CPU crypto mail chain,
regarding the need for a non-mbuf use case for async enq/deq APIs.
http://patches.dpdk.org/patch/58528/#101995


> 
> > >
> > > > Now for this patchset, the requirement is
> > > > - raw buffers
> > > > - asynchronous APIs
> > > >
> > > > The data structures for raw buffers and crypto-related offsets are
> > > > already defined, so they should be reused. And I believe that with
> > > > some changes in rte_crypto_op and rte_crypto_sym_op, we can support
> > > > raw buffers with the same APIs. Instead of m_src and m_dst, raw
> > > > buffer data structures can be combined in a union, and some of the
> > > > fields in rte_crypto_op can be left NULL in the case of raw buffers.
> > > >
> > >
> > > This is a good point, but we would still face too many unnecessary
> > > fields to be set to NULL, such as digest pointers. I have given a lot
> > > of thought to this structure. Hopefully it covers all vendors' HW
> > > symmetric crypto needs, and at the same time it squeezes the required
> > > HW addresses into 1 cacheline, instead of the 3-cacheline footprint of
> > > rte_crypto_op + rte_crypto_sym_op. Another purpose of the structure
> > > design is that the structure buffer can be taken from the stack and
> > > used to fill all jobs to the PMD HW.
> >
> > Which fields do you think are not useful and should be set to NULL?
> > Digest pointers you are anyway setting in the new structure.
> > Your new struct supports neither sessionless nor security sessions,
> > and it does not take care of asymmetric crypto.
> > So whenever a vendor needs to support all of these, we would end up
> > back at the rte_crypto_op structure.
> > IMO, you only need to make m_src and m_dst a union with raw
> > input/output buffers. Everything else remains relevant.
> >
> 
> rte_crypto_op is designed to be allocated from a mempool with HW address
> info contained, so it is possible to deduce the IV and AAD physical
> addresses from it. More importantly, rte_crypto_op is designed to be taken
> from the heap and freed after dequeue, so it cannot be allocated from the
> stack. For this reason I think rte_crypto_sym_vec is a better fit for the
> patch, do you agree? (The proposed change is above.)

Agreed.

> 
> > Have you done some profiling with using rte_crypto_op instead of this new
> > struct?
> >
> Yes, the code is actually upstreamed in VPP:
> https://gerrit.fd.io/r/c/vpp/+/18036, please try it out. If you have a
> look at the enqueue/dequeue functions, you will see the struggle we had
> translating ops and creating a second software ring to make sure we only
> dequeue a frame of data. Luckily VPP has space to store mbufs, otherwise
> the perf would be even worse.
What performance gap do you see after making m_src and m_dst raw buffers?

> 
> > >
> > > >
> > > > > +/* Struct for user to perform HW specific enqueue/dequeue
> > > > > + * function calls
> > > > > + */
> > > > > +struct rte_crypto_hw_ops {
> > > > > +     /* Driver written queue pair data pointer, should NOT be
> > > > > +      * altered by the user.
> > > > > +      */
> > > > > +     void *qp;
> > > > > +     /* Function handler to enqueue AEAD job */
> > > > > +     rte_crypto_hw_enq_cb_fn enqueue_aead;
> > > > > +     /* Function handler to enqueue cipher only job */
> > > > > +     rte_crypto_hw_enq_cb_fn enqueue_cipher;
> > > > > +     /* Function handler to enqueue auth only job */
> > > > > +     rte_crypto_hw_enq_cb_fn enqueue_auth;
> > > > > +     /* Function handler to enqueue cipher + hash chaining job */
> > > > > +     rte_crypto_hw_enq_cb_fn enqueue_chain;
> > > > > +     /* Function handler to query processed jobs */
> > > > > +     rte_crypto_hw_query_processed query_processed;
> > > > > +     /* Function handler to dequeue one job and return opaque
> > > > > +      * data stored
> > > > > +      */
> > > > > +     rte_crypto_hw_deq_one_cb_fn dequeue_one;
> > > > > +     /* Function handler to dequeue many jobs */
> > > > > +     rte_crypto_hw_deq_many_cb_fn dequeue_many;
> > > > > +     /* Reserved */
> > > > > +     void *reserved[8];
> > > > > +};
> > > >
> > > > Why do we need such callbacks in the library?
> > > > These should be inside the drivers, or else we should do the same for
> > > > the legacy case as well. The pain of finding the correct enqueue
> > > > function for the appropriate crypto operation is already handled by
> > > > all the drivers, and we can reuse that, or else modify it there as well.
> > > >
> > >
> > > Providing different types of enqueue functions for specific operation
> > > types could save a lot of branches for the driver to handle. As
> > > mentioned, this data-path API is intended to be used as an advanced
> > > feature to provide close-to-native perf to external
> > > libraries/applications that are not mbuf centric. And I don't agree
> > > with classifying the choice of 1 enqueue function from 4 candidates as
> > > "pain".
> >
> > My point is why don't we have it in the Legacy code path as well?
> > I think it is useful in both the paths. Branching is a pain for the driver.
> >
> 
> That's a good point :-) We can definitely do something about it in future
> releases.
> 
> > >
> > > > We should not add a lot of data paths for the user; otherwise the
> > > > APIs will become centric to a particular vendor, and it will be very
> > > > difficult for the user to migrate from one vendor to another, which
> > > > would defeat the purpose of DPDK: providing a uniform abstraction
> > > > layer for all hardware vendors.
> > > >
> > >
> > > The purpose of adding a data-path for the user is performance for
> > > non-mbuf data-path centric applications/libraries, while not creating
> > > confusion. In this version we aim to provide a more friendly data-path for
> >
> > I do not see the new path as friendly.
> > Adding a parallel new datapath will create more confusion for the
> > application developer. It would be convenient if we could use the same
> > path with minimal changes so that people can migrate easily.
> >
> 
> We are working on the next version, which is based on rte_crypto_sym_vec
> and a single enqueue-then-dequeue API. To be honest, it won't be that
> friendly to application developers whose applications are mbuf based or
> already built on top of cryptodev; however, for applications that are not
> mbuf based and have their own data-path structures, it will surely be
> friendlier than the existing enqueue and dequeue APIs: no dependency on
> mbuf or on mbuf and crypto op pools, and no need to consider how to adapt
> to different working methods.

Agreed with your point. The intention is just not to create multiple copies
of the same information in multiple structures.

> 
> > > them, and aims to be useful to all vendors' PMDs. If there is any place in
> > > the API that blocks a PMD, please let me know.
> >
> > As commented above: sessionless, rte_security sessions, and asymmetric
> > crypto are not supported.
> >
> You are right -
> sessionless support aims at usability and is not intended to be used in
> high-throughput applications.

There may be some cases where a limited number of control pkts are sent
which may be sessionless. We cannot ask people to use a different data path
for such traffic, so we may need to support that too.

> rte_security is built on top of mbuf and ethdev and is not intended for
> "mbuf-independent" applications either.

rte_security for lookaside protocol can be mbuf independent, and NXP may
support it in the future, especially in the case of PDCP.

Regards,
Akhil
