Hi Fan, > > Hi Akhil, > > ... > > > > > > As you may have seen the structure definition is hW centric with the > > > IOVA addresses all over. Also as you will from the patch series the > > operation is > > > Per operation basis instead of operating in a burst. The external > > > application > > > may sooner know when a specific enqueue is failed. > > > > You may also need to save a virtual address as well. As some hardware are > > able to > > Convert virtual to physical addresses on it's own giving a performance > > improvement. > > > > I do not see an issue in using enqueue burst with burst size=1 , but since > > you > > are doing > > Optimizations, none of the hardware can perform well with burst = 1, I think > > it is always > > Greater than 1. > > Shall I update the rte_crypto_sym_vec as the following - so the 2 problems can > be > resolved? > > struct rte_crypto_sym_vec { > /** array of SGL vectors */ > struct rte_crypto_sgl *sgl; > union { > /* Supposed to be used with CPU crypto API call. */ > struct { > /** array of pointers to IV */ > void **iv; > /** array of pointers to AAD */ > void **aad; > /** array of pointers to digest */ > void **digest; > /** > * array of statuses for each operation: > * - 0 on success > * - errno on error > */ > int32_t *status; > }; > > /* Supposed to be used with HW crypto API call. */ > struct { > /** array of pointers to IV */ > struct rte_crypto_vec *iv_hw; > /** array of pointers to AAD */ > struct rte_crypto_vec *aad_hw; > /** array of pointers to Digest */ > struct rte_crypto_vec *digest_hw; > }; > > }; > /** number of operations to perform */ > uint32_t num; > };
Yes something of that sort can work. The current use case was also discussed in the CPU crypto mail chain About the need of non-mbuf use case for async enq/deq APIs. http://patches.dpdk.org/patch/58528/#101995 > > > > > > > > Now for this patchset, the requirement is > > > > - raw buffers > > > > - asynchronous APIs > > > > > > > > The data structure for raw buffers and crypto related offsets are > > > > already > > > > defined > > > > So they should be reused. > > > > And I believe with some changes in rte_crypto_op and > > rte_crypto_sym_op, > > > > We can support raw buffers with the same APIs. > > > > Instead of m_src and m_dst, raw buffer data structures can be combined > > in a > > > > Union and some of the fields in the rte_crypto_op can be left NULL in > > case of > > > > raw buffers. > > > > > > > > > > This is a good point but we still face too many unnecessary fields to be > > NULL, > > > such as > > > digest pointers, I have given a lot thought to this structure. Hopefully > > > it > > covers > > > all vendor's HW symmetric crypto needs and in the same time it well > > squeeze > > > the required HW addresses into 1 cacheline, instead of rte_crypto_op + > > > rte_crypto_sym_op 3 cacheline footprint. Another purpose of the > > structure > > > design > > > is the structure buffer can be taken from stack and can be used to fill > > > all > > > jobs to the PMD HW. > > > > Which fields you think are not useful and should be set as NULL? > > Digest pointers you are anyways setting in the new structure. > > Your new struct does not support session less as well as security sessions. > > It does not take care of asymmetric crypto. > > So whenever, a vendor need to support all these, we would end up getting > > the rte_crypto_op structure. > > IMO, you only need to make m_src and m_dst as union to a raw > > input/output > > buffers. Everything else will be relevant. > > > > Rte_crypto_op is designed to be allocated from mempool with HW address > info contained so it is possible to deduct IV and AAD physical address from > it. More importantly rte_crypto_op is designed to be taken from heap and > being freed after dequeue. So they cannot be allocated from stack - for this > reason I think rte_crypot_sym_vec is a better fit for the patch, do you agree? > (the Proposed change is at above). Agreed. > > > Have you done some profiling with using rte_crypto_op instead of this new > > struct? > > > Yes, the code are actually upstreamed in VPP > https://gerrit.fd.io/r/c/vpp/+/18036, please try out. If you > have a look at the > enqueue/dequeue functions you should see the struggle we had to translate > ops, and creating a second software ring to make sure we only dequeue a > frame of data. Lucky VPP has space to store mbufs otherwise the perf will > be even worse. What is the performance gap do you see after making m_src and m_dst as Raw buffers? > > > > > > > > > > > > > +/* Struct for user to perform HW specific enqueue/dequeue function > > calls > > > > */ > > > > > +struct rte_crypto_hw_ops { > > > > > + /* Driver written queue pair data pointer, should NOT be > alterred by > > > > > + * the user. > > > > > + */ > > > > > + void *qp; > > > > > + /* Function handler to enqueue AEAD job */ > > > > > + rte_crypto_hw_enq_cb_fn enqueue_aead; > > > > > + /* Function handler to enqueue cipher only job */ > > > > > + rte_crypto_hw_enq_cb_fn enqueue_cipher; > > > > > + /* Function handler to enqueue auth only job */ > > > > > + rte_crypto_hw_enq_cb_fn enqueue_auth; > > > > > + /* Function handler to enqueue cipher + hash chaining job */ > > > > > + rte_crypto_hw_enq_cb_fn enqueue_chain; > > > > > + /* Function handler to query processed jobs */ > > > > > + rte_crypto_hw_query_processed query_processed; > > > > > + /* Function handler to dequeue one job and return opaque data > > > > stored > > > > > */ > > > > > + rte_crypto_hw_deq_one_cb_fn dequeue_one; > > > > > + /* Function handler to dequeue many jobs */ > > > > > + rte_crypto_hw_deq_many_cb_fn dequeue_many; > > > > > + /* Reserved */ > > > > > + void *reserved[8]; > > > > > +}; > > > > > > > > Why do we need such callbacks in the library? > > > > These should be inside the drivers, or else we do the same for > > > > Legacy case as well. The pain of finding the correct enq function for > > > > Appropriate crypto operation is already handled by all the drivers > > > > And we can reuse that or else we modify it there as well. > > > > > > > > > > Providing different types of enqueue functions for specific operation type > > > could save a lot of branches for the driver to handle. As mentioned this > > > data-path API is intended to be used as an advanced feature to provide > > > close-to-native perf to external library/applications that are not mbuf > > > centric. And I don't agree classifying choosing 1 enqueue function from > > > 4 candidates as "pain". > > > > My point is why don't we have it in the Legacy code path as well? > > I think it is useful in both the paths. Branching is a pain for the driver. > > > > That's a good point :-) we definitely can do something about it in future > releases. > > > > > > > > We should not add a lot of data paths for the user, otherwise the > > > > APIs will become centric to a particular vendor and it will be very > > > > difficult > > > > For the user to migrate from one vendor to another and would defeat > > the > > > > Purpose of DPDK which provide uniform abstraction layer for all the > > > > hardware > > > > Vendors. > > > > > > > > > > The purpose of adding data-path for the user is performance for non-mbuf > > > data-path centric applications/libraries, in the meantime not creating > > > confusion. In this version we aim to provide a more friendly data-path for > > > > I do not see the new path as friendly. > > Adding a parallel new datapath with create more confusion for the > > application > > developer. It would be convenient, if we can use the same path with minimal > > changes so that people can migrate easily. > > > > We are working on next version that is based on rte_crypto_sym_vec and single > Enqueue-then-dequeue API. To be honest it won't be that of friendly to > application > developer that the applications are mbuf-based or already built on top of > cryptodev, > however for the applications that are not mbuf based and having their own > data-path > structures it will be surely friendlier than existing enqueue and dequeue > APIs. No > dependency to mbuf, mbuf and crypto op pool, and no need to consider how to > adapt > different working methods. Agreed with your point. The intention is just not to create multiple copies of Same information in multiple structures. > > > > them, and aims to be useful to all vendor's PMDs. If there is any place in > > > the API that blocks a PMD please let me know. > > > > As commented above, sessionless, rte_security sessions, asymmetric crypto > > Not supported. > > > You are right - > Sessionless support aims the usability and is not intended to be used in high- > throughput > Application. There may be some cases where a limited amount of control pkts can be sent Which may be session less. We cannot ask people to use a different data path For such traffic. So we may need to support that too. > Rte_security is built on top of mbuf and ethdev and is not intended to "mbuf- > independent" > applications either. Rte_security for lookaside protocol can be mbuf independent and NXP may Support it in future especially in case of PDCP. Regards, Akhil