> > > > > > > > > > > > > > > > > > > > > > > > > > +#ifdef RTE_CRYPTODEV_CALLBACKS int > > > > > > > > > > > +rte_cryptodev_rcu_qsbr_add(uint8_t dev_id, struct > > > > > > > > > > > +rte_rcu_qsbr > > > > > > > > > > > +*qsbr) { > > > > > > > > > > > + > > > > > > > > > > > + struct rte_cryptodev *dev; > > > > > > > > > > > + > > > > > > > > > > > + if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) { > > > > > > > > > > > + CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, > > dev_id); > > > > > > > > > > > + return -EINVAL; > > > > > > > > > > > + } > > > > > > > > > > > + > > > > > > > > > > > + dev = &rte_crypto_devices[dev_id]; > > > > > > > > > > > + dev->qsbr = qsbr; > > > > > > > > > > > + return 0; > > > > > > > > > > > +} > > > > > > > > > > > > > > > > > > > > So if I understand your patch correctly you propose a > > > > > > > > > > new working model for > > > > > > > > > > crypto-devs: > > > > > > > > > > 1. Control-plane has to allocate/setup rcu_qsbr and do > > > > > > > > > > rte_cryptodev_rcu_qsbr_add(). > > > > > > > > > > 2. Data-plane has somehow to obtain pointer to that > > > > > > > > > > rcu_qsbr and wrap > > > > > > > > > > cryptodev_enqueue() > > > > > > > > > > with rcu_qsbr_quiescent() or > > > > rcu_qsbr_online()/rcu_qsbr_offline(). > > > > > > > > > Yes. I think, it is not a new model. It is same as RCU > > > > > > > > > integration with > > > > > > LPM. > > > > > > > > > Please refer: https://patches.dpdk.org/cover/73673/ > > > > > > > > > > > > > > > > I am talking about new working model for crypto-dev > > > > enqueue/dequeue. > > > > > > > > As I said above now it becomes data-plane thread responsibility > > > > > > > > to: > > > > > > > > -somehow to obtain pointer to that rcu_qsbr for each > > > > > > > > cryptodev it is > > > > > > using. > > > > > > > > -call rcu sync functions (quiescent/online/offline) on a > > > > > > > > regular > > basis. > > > > > > > It is not on regular basis. When data plane comes up, they report > > online. > > > > > > > They report quiescent when they are done with critical section > > > > > > > or shared > > > > > > structure. > > > > > > > > > > > > I understand that, but it means all existing apps have to be > > > > > > changed that > > > > way. > > > > > > > > > > > > > All though, there is some dataplane changes involved here, I > > > > > > > don't think, it > > > > > > is major. > > > > > > > > > > > > I still think our goal here should be to make no visible changes > > > > > > to the dataplane. > > > > > > I.E. all necessary data-plane changes need to be hidden inside > > > > > > CB invocation part. > > > > > Please note that this is being implemented using the memory > > > > > reclamation framework documented at > > > > > https://doc.dpdk.org/guides/prog_guide/rcu_lib.html#resource-recla > > > > > mati > > > > > on-framework-for-dpdk > > > > > > > > > > While using RCU there are couple of trade-offs that applications > > > > > have to > > > > consider: > > > > > 1) Performance - reporting the quiescent state too often results > > > > > in performance impact on data plane > > > > > 2) Amount of outstanding memory to reclaim - reporting less often > > > > > results in more outstanding memory to reclaim > > > > > > > > > > Hence, the quiescent state reporting is left to the application. > > > > > The application decides how often it reports the quiescent state > > > > > and has control > > > > over the data plane performance and the outstanding memory to reclaim. > > > > > > > > > > When you say "new working model for crypto-dev enqueue/dequeue", > > > > > > > > > > 1) are you comparing these with existing crypto-dev > > > > > enqueue/dequeue > > > > APIs? If yes, these are new APIs, it is not breaking anything. > > > > > 2) are you comparing these with existing call back functions in > > > > > ethdev enqueue/dequeue APIs? If yes, agree that this is a new > > > > > model. But, it is > > > > possible to support what ethdev supports along with the RCU method > > > > used in this patch. > > > > > > > > What I am talking about: > > > > > > > > Existing cryptodev enqueue/dequeue model doesn't require for the > > > > user to manage any RCU QSBR state manually. > > > > I believe that addition of ability to add/remove enqueue/dequeue > > > > callbacks shouldn't change existing working model. > > > > I think that adding/removing such callbacks has to be opaque to the > > > > user DP code and shouldn't require user to change it. Same as we > > > > have now for ethdev callback implementation. > > > The ethdev callback implementation conveniently leaves the problem of > > freeing memory to the user to resolve, it does not handle the issue. > > > Hence, it "looks" to be opaque to the DP code. However, if the > > > application has to implement a safe way to free the call back memory, its > > DP is affected based on call backs are being used or not. > > > > Yes, I think that's big drawback in initial ethdev callback implementation > > - it > > simply ignores DP/CP sync problem completely. > > Though I think it is possible to have both here: > > keep callback "opaque" to DP code and provide some sync mechanism > > between DP/CP. > > Hopefully one day we can fix ethdev callbacks too. > The solution we develop can be used in ethdev too. > > > > > > > I think that forcing DP code to be aware that callbacks are present > > > > or not and to modify its behaviour depending on that nearly voids > > > > the purpose of having callbacks at all. > > > > In that case DP can just invoke callback function directly from it's > > codepath . > > > > > > > > > > > > > > > > > > > > > > > > > Note that now data-plane thread would have to do that always > > > > > > > > - even if there are now callbacks installed for that > > > > > > > > cryptodev queue > > > > right now. > > > > > > > > All that changes behaviour of existing apps and I presume > > > > > > > > would reduce adoption of that fature. > > > > > If I understand this correct, you are talking about a case where > > > > > in the application might be registering/unregistering multiple > > > > > times during its lifetime. In this case, yes, the application > > > > > might be reporting the > > > > quiescent state even when it has not registered the call backs. But, > > > > it has the flexibility to not report it if it implements additional > > > > logic. > > > > > Note that we are assuming that the application has to report > > > > > quiescent state only for using callback functions. Most probably > > > > > the application has > > > > other requirements to use RCU. > > > > > Why not support what is done for ethdev call back functions along > > > > > with > > > > providing RCU method? > > > > > > > > > > > > There is always trade off involved! > > > > > > > In the previous patch, you suggested that some lazy app may > > > > > > > not free up the memory allocated by add cb. For such apps, > > > > > > > this patch has sync mechanism with some additional cost of CP & DP > > changes. > > > > > > > > > > > > Sigh, it is not about laziness of the app. > > > > > > The problem with current ethedev cb mechanism and yours V1 > > > > > > (which was just a clone of it) - CP doesn't know when it is safe > > > > > > after CB removal to free related memory. > > > > > > > > > > > > > > I still think all this callback mechanism should be totally > > > > > > > > opaque to data-plane threads - user shouldn't change his app > > > > > > > > code depending on would some enqueue/dequeue callbacks be > > > > installed or not. > > > > > > > I am not sure, how that can be implemented with existing RCU > > design. > > > > > > > > > > > > As I said below the simplest way - with calling rcu > > > > > > onine/offline inside CB invocation block. > > > > > > That's why I asked you - did you try that approach and what is > > > > > > the perf numbers? > > > > > > I presume with no callbacks installed the perf change should be > > > > > > nearly > > > > zero. > > > > > > > > > > > > > @Honnappa Nagarahalli, Do you have any suggestions? > > > > > Reporting quiescent state in the call back functions has several > > > > disadvantages: > > > > > 1) it will have performance impacts and the impacts will increase > > > > > as the > > > > number of data plane threads increase. > > > > > 2) It will require additional configuration parameters to control > > > > > how often the quiescent state is reported to control the performance > > impact. > > > > > 3) Does not take advantage of the fact that most probably the > > > > > application is using RCU already > > > > > 4) There are few difficulties as well, please see below. > > > > > > > > I suggested Abhinandan to use RCU library because it is already > > > > there, and I thought it would be good not to re-implement the wheel. > > > > Though if you feel librte_rcu doesn't match that task - fine, let's > > > > do it without librte_rcu. > > > > After all, what we need here - just an atomic ref count per queue > > > > that we are going to increment at entering and leaving list of > > > > callbacks inside enqueue/dequeue. > > > Ok, looks like I missed the point that a queue is used by a single data > > > plane > > thread. > > > Along with ref count increment you need the memory orderings to avoid > > race conditions. These will be the same ones used in RCU. > > > On the control plane, you need to read this counter and poll for the > > counter updates. All this is same cost as in RCU. > > > > Agree. > > > > > To control the cost, you > > > will have to control the rate of quiescent state reporting and might have > > > to > > expose this as a configuration parameter. > > > > > > The other important information you have to consider is if the thread > > > is making any blocking calls, which may be in some other library. The > > > thread is supposed to call rcu_qsbr_thread_offline API before calling a > > blocking call. This allows the RCU to know that this particular thread will > > not > > report quiescent state. The cryptodev library might not have that > > information. > > > > > > If you want to go ahead with this design, you can still use RCU with > > > single thread configuration (like you have mentioned below) and hide the > > details from the application. > > > > Yes, same thought here - use rcu_qsbr online/offline for DP part and hide > > actual sync details inside callback code. > We can give it a try. If we can have the performance numbers, we can decide > about how to control how often these APIs are called on the > data plane.
To avoid misunderstanding: I am talking about calling online/offline with every cryptodev_enqueue() traversal over CB list. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > That seems quite a big change and I don't think it is > > > > > > > > > > acceptable for most users. > > > > > > > > > > From my perspective adding/installing call-backs to the > > > > > > > > > > dev has to be opaque to the data-plane code. > > > > > > > > > > Also note that different callbacks can be installed by > > > > > > > > > > different entities (libs) and might have no idea about each > > other. > > > > > > > > > > That's why I thought it would be better to make all this > > > > > > > > > > RCU stuff internal inside cryptodev: > > > > > > > > > > hide all this rcu_qsbr allocation/setup inside > > > > > > > > > > cryptod somehow to > > > > > > > > obtain pointer to that rcu_qsbr ev init/queue setup > > > > > > > > > > invoke rcu_qsbr_online()/rcu_qsbr_offline() inside > > > > > > > > cryptodev_enqueue(). > > > > > This will bring in the application related information such as the > > > > > thread ID > > > > into the library. > > > > > > > > I don't think it would. > > > > Cryptodev enqueue/dequeue functions are not supposed to be thread > > > > safe (same as rx/tx burst). > > > > So we can always use RCU with just one thread(thread_id = 0). > > > Agree, the memory that needs to be freed is accessed by a single thread > > on the data plane. RCU with one thread would suffice. > > > > > > > But as I said above - if you feel RCU lib is an overhead here, > > > > that's fine - I think it would be easy enough to do without librte_rcu. > > > > > > > > > If the same API calls are being made from multiple data plane > > > > > threads, you need a way to configure that information to the > > > > > library. So, it is better to leave those details for the application > > > > > to handle. > > > > > > > > > > > > > > I have already tried exploring above stuffs. There are too > > > > > > > > > many > > > > > > constraints. > > > > > > > > > The changes don't fit in, as per RCU design. > > > > > > > > > > > > > > > > Hmm could you be more specific here - what constraints are > > > > > > > > you referring to? > > > > > > > > > > > > > > > > > Moreover, having rcu api under enqueue_burst() will affect > > > > > > > > > the > > > > > > > > performance. > > > > > > > > > > > > > > > > It most likely will. Though my expectation it will affect > > > > > > > > performance only when some callbacks are installed. My > > > > > > > > thought > > > > here: > > > > > > > > callback function by itself will affect cryptdev_enqueue > > > > > > > > performance anyway, > > > > > > > With existing callback design, I have measured the > > > > > > > performance(with > > > > > > crypto perf test) on xeon. > > > > > > > It was almost negligible and same was shared with Declan. > > > > > > > > > > > > I am asking about different thing: did you try alternate > > > > > > approach I described, that wouldn't require changes in the user > > > > > > data- > > plane code. > > > > > > > > > > > > > That is one of the reasons, I didn't want to add to many > > > > > > > stuffs in to the > > > > > > callback. > > > > > > > The best part of existing design is crypto lib is not much > > > > > > > modified. > > > > > > > The changes are either pushed to CP or DP. > > > > > > > > > > > > > > so adding extra overhead for sync is probably ok here. > > > > > > > > > > > > I think that extra overhead when callbacks are present is > > > > > > expected and probably acceptable. > > > > > > Changes in the upper-layer data-plane code - probably not. > > > > > > > > > > > > > > Though for situation when no callbacks are installed - > > > > > > > > perfomance should be left unaffected (or impact should be as > > > > > > > > small > > > > as possible). > > > > > > > > > > > > > > > > > The changes are more on control plane side, which is one time. > > > > > > > > > The data plane changes are minimal. > > > > > > > > > > > > > > > > I still think upper layer data-plane code should stay > > > > > > > > unaffected (zero changes). > > > > > > > > > > > > > > > > > > > > > > > > <snip>