<snip>

> > > > > > > >
> > > > > > > > +#ifdef RTE_CRYPTODEV_CALLBACKS int
> > > > > > > > +rte_cryptodev_rcu_qsbr_add(uint8_t dev_id, struct
> > > > > > > > +rte_rcu_qsbr
> > > > > > > > +*qsbr) {
> > > > > > > > +
> > > > > > > > +       struct rte_cryptodev *dev;
> > > > > > > > +
> > > > > > > > +       if (!rte_cryptodev_pmd_is_valid_dev(dev_id)) {
> > > > > > > > +               CDEV_LOG_ERR("Invalid dev_id=%" PRIu8, dev_id);
> > > > > > > > +               return -EINVAL;
> > > > > > > > +       }
> > > > > > > > +
> > > > > > > > +       dev = &rte_crypto_devices[dev_id];
> > > > > > > > +       dev->qsbr = qsbr;
> > > > > > > > +       return 0;
> > > > > > > > +}
> > > > > > >
> > > > > > > So if I understand your patch correctly you propose a new
> > > > > > > working model for
> > > > > > > crypto-devs:
> > > > > > > 1. Control-plane has to allocate/setup rcu_qsbr and do
> > > > > > > rte_cryptodev_rcu_qsbr_add().
> > > > > > > 2. Data-plane has somehow to obtain pointer to that rcu_qsbr
> > > > > > > and wrap
> > > > > > > cryptodev_enqueue()
> > > > > > >    with rcu_qsbr_quiescent()  or
> rcu_qsbr_online()/rcu_qsbr_offline().
> > > > > > Yes. I think, it is not a new model. It is same as RCU
> > > > > > integration with
> > > LPM.
> > > > > > Please refer: https://patches.dpdk.org/cover/73673/
> > > > >
> > > > > I am talking about new working model for crypto-dev
> enqueue/dequeue.
> > > > > As I said above now it becomes data-plane thread responsibility to:
> > > > >  -somehow to obtain pointer to that rcu_qsbr for each cryptodev
> > > > > it is
> > > using.
> > > > >  -call rcu sync functions (quiescent/online/offline) on a regular 
> > > > > basis.
> > > > It is not on regular basis. When data plane comes up, they report 
> > > > online.
> > > > They report quiescent when they are done with critical section or
> > > > shared
> > > structure.
> > >
> > > I understand that, but it means all existing apps have to be changed that
> way.
> > >
> > > > All though, there is some dataplane changes involved here, I don't
> > > > think, it
> > > is major.
> > >
> > > I still think our goal here should be to make no visible changes to
> > > the dataplane.
> > > I.E. all necessary data-plane changes need to be hidden inside CB
> > > invocation part.
> > Please note that this is being implemented using the memory
> > reclamation framework documented at
> > https://doc.dpdk.org/guides/prog_guide/rcu_lib.html#resource-reclamati
> > on-framework-for-dpdk
> >
> > While using RCU there are couple of trade-offs that applications have to
> consider:
> > 1) Performance - reporting the quiescent state too often results in
> > performance impact on data plane
> > 2) Amount of outstanding memory to reclaim - reporting less often
> > results in more outstanding memory to reclaim
> >
> > Hence, the quiescent state reporting is left to the application. The
> > application decides how often it reports the quiescent state and has control
> over the data plane performance and the outstanding memory to reclaim.
> >
> > When you say "new working model for crypto-dev enqueue/dequeue",
> >
> > 1) are you comparing these with existing crypto-dev enqueue/dequeue
> APIs? If yes, these are new APIs, it is not breaking anything.
> > 2) are you comparing these with existing call back functions in ethdev
> > enqueue/dequeue APIs? If yes, agree that this is a new model. But, it is
> possible to support what ethdev supports along with the RCU method used
> in this patch.
> 
> What I am talking about:
> 
> Existing cryptodev enqueue/dequeue model doesn't require for the user to
> manage any RCU QSBR state manually.
> I believe that addition of ability to add/remove enqueue/dequeue callbacks
> shouldn't change existing working model.
> I think that adding/removing such callbacks has to be opaque to the user DP
> code and shouldn't require user to change it. Same as we have now for
> ethdev callback implementation.
The ethdev callback implementation conveniently leaves the problem of freeing 
memory to the user to resolve, it does not handle the issue. Hence, it "looks" 
to be opaque to the DP code. However, if the application has to implement a 
safe way to free the call back memory, its DP is affected based on call backs 
are being used or not.

> I think that forcing DP code to be aware that callbacks are present or not and
> to modify its behaviour depending on that nearly voids the purpose of having
> callbacks at all.
> In that case DP can just invoke callback function directly from it's codepath 
> .
> 
> > >
> > > >
> > > > > Note that now data-plane thread would have to do that always -
> > > > > even if there are now callbacks installed for that cryptodev queue
> right now.
> > > > > All that changes behaviour of existing apps and I presume would
> > > > > reduce adoption of  that fature.
> > If I understand this correct, you are talking about a case where in
> > the application might be registering/unregistering multiple times
> > during its lifetime. In this case, yes, the application might be reporting 
> > the
> quiescent state even when it has not registered the call backs. But, it has 
> the
> flexibility to not report it if it implements additional logic.
> > Note that we are assuming that the application has to report quiescent
> > state only for using callback functions. Most probably the application has
> other requirements to use RCU.
> > Why not support what is done for ethdev call back functions along with
> providing RCU method?
> >
> > > > There is always trade off involved!
> > > > In the previous patch, you suggested that some lazy app may not
> > > > free up the memory allocated by add cb. For such apps, this patch
> > > > has sync mechanism with some additional cost of CP & DP changes.
> > >
> > > Sigh, it is not about laziness of the app.
> > > The problem with current ethedev cb mechanism and yours V1 (which
> > > was just a clone of it) - CP doesn't know when it is safe after CB
> > > removal to free related memory.
> > >
> > > > > I still think all this callback mechanism should be totally
> > > > > opaque to data-plane threads - user shouldn't change his app
> > > > > code depending on would some enqueue/dequeue callbacks be
> installed or not.
> > > > I am not sure, how that can be implemented with existing RCU design.
> > >
> > > As I said below the simplest way - with calling rcu onine/offline
> > > inside CB invocation block.
> > > That's why I asked you - did you try that approach and what is the
> > > perf numbers?
> > > I presume with no callbacks installed the perf change should be nearly
> zero.
> > >
> > > > @Honnappa Nagarahalli, Do you have any suggestions?
> > Reporting quiescent state in the call back functions has several
> disadvantages:
> > 1) it will have performance impacts and the impacts will increase as the
> number of data plane threads increase.
> > 2) It will require additional configuration parameters to control how
> > often the quiescent state is reported to control the performance impact.
> > 3) Does not take advantage of the fact that most probably the
> > application is using RCU already
> > 4) There are few difficulties as well, please see below.
> 
> I suggested Abhinandan to use RCU library because it is already there, and I
> thought it would be good not to re-implement the wheel.
> Though if you feel librte_rcu doesn't match that task - fine, let's do it 
> without
> librte_rcu.
> After all, what we need here - just an atomic ref count per queue that we are
> going to increment at entering and leaving list of callbacks inside
> enqueue/dequeue.
Ok, looks like I missed the point that a queue is used by a single data plane 
thread.
Along with ref count increment you need the memory orderings to avoid race 
conditions. These will be the same ones used in RCU.
On the control plane, you need to read this counter and poll for the counter 
updates. All this is same cost as in RCU. To control the cost, you will have to 
control the rate of quiescent state reporting and might have to expose this as 
a configuration parameter.

The other important information you have to consider is if the thread is making 
any blocking calls, which may be in some other library. The thread is supposed 
to call rcu_qsbr_thread_offline API before calling a blocking call. This allows 
the RCU to know that this particular thread will not report quiescent state. 
The cryptodev library might not have that information.

If you want to go ahead with this design, you can still use RCU with single 
thread configuration (like you have mentioned below) and hide the details from 
the application.

> 
> >
> > > >
> > > > >
> > > > > >
> > > > > > >
> > > > > > > That seems quite a big change and I don't think it is
> > > > > > > acceptable for most users.
> > > > > > > From my perspective adding/installing call-backs to the dev
> > > > > > > has to be opaque to the data-plane code.
> > > > > > > Also note that different callbacks can be installed by
> > > > > > > different entities (libs) and might have no idea about each other.
> > > > > > > That's why I thought it would be better to make all this RCU
> > > > > > > stuff internal inside cryptodev:
> > > > > > >     hide all this rcu_qsbr allocation/setup inside cryptod
> > > > > > > somehow to
> > > > > obtain pointer to that rcu_qsbr ev init/queue setup
> > > > > > >     invoke rcu_qsbr_online()/rcu_qsbr_offline() inside
> > > > > cryptodev_enqueue().
> > This will bring in the application related information such as the thread ID
> into the library.
> 
> I don't think it would.
> Cryptodev enqueue/dequeue functions are not supposed to be thread safe
> (same as rx/tx burst).
> So we can always use RCU with just one thread(thread_id = 0).
Agree, the memory that needs to be freed is accessed by a single thread on the 
data plane. RCU with one thread would suffice.

> But as I said above - if you feel RCU lib is an overhead here, that's fine - I
> think it would be easy enough to do without librte_rcu.
> 
> > If the same API calls are being made from multiple data plane threads,
> > you need a way to configure that information to the library. So, it is
> > better to leave those details for the application to handle.
> >
> > > > > > I have already tried exploring above stuffs. There are too
> > > > > > many
> > > constraints.
> > > > > > The changes don't fit in, as per RCU design.
> > > > >
> > > > > Hmm could you be more specific here - what constraints are you
> > > > > referring to?
> > > > >
> > > > > > Moreover, having rcu api under enqueue_burst() will affect the
> > > > > performance.
> > > > >
> > > > > It most likely will. Though my expectation it will affect
> > > > > performance only when some callbacks are installed. My thought
> here:
> > > > > callback function by itself will affect cryptdev_enqueue
> > > > > performance anyway,
> > > > With existing callback design, I have measured the
> > > > performance(with
> > > crypto perf test) on xeon.
> > > > It was almost negligible and same was shared with Declan.
> > >
> > > I am asking about different thing: did you try alternate approach I
> > > described, that wouldn't require changes in the user data-plane code.
> > >
> > > > That is one of the reasons, I didn't want to add to many stuffs in
> > > > to the
> > > callback.
> > > > The best part of existing design is crypto lib is not much modified.
> > > > The changes are either pushed to CP or DP.
> > > >
> > > > so adding extra overhead for sync is probably ok here.
> > >
> > > I think that extra overhead when callbacks are present is expected
> > > and probably acceptable.
> > > Changes in the upper-layer data-plane code - probably not.
> > >
> > > > > Though for situation when no callbacks are installed -
> > > > > perfomance should be left unaffected (or impact should be as small
> as possible).
> > > > >
> > > > > > The changes are more on control plane side, which is one time.
> > > > > > The data plane changes are minimal.
> > > > >
> > > > > I still think upper layer data-plane code should stay unaffected
> > > > > (zero changes).
> > > > >
> > > > > > >

<snip>

Reply via email to