> > > > > >
> > > > > > @@ -69,5 +89,13 @@ kni_fifo_get(struct rte_kni_fifo *fifo, void **data, unsigned num)
> > > > > >  static inline uint32_t
> > > > > >  kni_fifo_count(struct rte_kni_fifo *fifo)
> > > > > >  {
> > > > > > +#ifdef RTE_USE_C11_MEM_MODEL
> > > > > > +	unsigned fifo_write = __atomic_load_n(&fifo->write,
> > > > > > +						__ATOMIC_ACQUIRE);
> > > > > > +	unsigned fifo_read = __atomic_load_n(&fifo->read,
> > > > > > +						__ATOMIC_ACQUIRE);
> > > > >
> > > > > Isn't it too heavy to have two __ATOMIC_ACQUIREs? A simple
> > > > > rte_smp_rmb() would be enough here, right?
> > > > > Or, do we need __ATOMIC_ACQUIRE for the fifo_write case at all?
> > > > >
> > > > We also had some amount of debate internally on this:
> > > > 1) We do not want to use rte_smp_rmb() as we want to keep the memory
> > > > models separated (for ex: while using C11, use C11 everywhere). It is
> > > > also not sufficient, please see 3) below.
> > >
> > > But nothing is technically wrong in using rte_smp_rmb() here, in terms
> > > of functionality and the code generated by the compiler.
> >
> > rte_smp_rmb() generates 'DMB ISHLD'. This works fine, but it is not
> > optimal. 'LDAR' is a better option, which is generated when C11 atomics
> > are used.
> 
> Yes. But which one is optimal: 1 x DMB ISHLD or 2 x LDAR?

Good point. I am not sure which one is optimal, it needs to be measured. 'DMB 
ISHLD' orders 'all' earlier loads against 'all' later loads and stores. 'LDAR' 
orders the 'specific' load with 'all' later loads and stores.
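To make the comparison concrete, here is a minimal, self-contained sketch of the two shapes being discussed. The reduced struct and the use of __atomic_thread_fence as a stand-in for rte_smp_rmb() are my assumptions for illustration; this is not the actual rte_kni_fifo code:

```c
#include <stdint.h>

/* Reduced stand-in for struct rte_kni_fifo; illustration only. */
struct fifo {
	unsigned write;	/* next position to be written */
	unsigned read;	/* next position to be read */
	unsigned len;	/* circular buffer length, power of two */
};

/* Barrier-based shape: plain loads followed by one read barrier.
 * __atomic_thread_fence(__ATOMIC_ACQUIRE) stands in for rte_smp_rmb();
 * on arm64 this is the single 'DMB ISHLD' discussed above, ordering
 * ALL earlier loads against ALL later loads and stores. */
static inline uint32_t
fifo_count_barrier(struct fifo *fifo)
{
	unsigned fifo_write = fifo->write;
	unsigned fifo_read = fifo->read;
	__atomic_thread_fence(__ATOMIC_ACQUIRE);
	return (fifo->len + fifo_write - fifo_read) & (fifo->len - 1);
}

/* C11 shape: two load-acquires, i.e. 2 x 'LDAR' on arm64, each
 * ordering only its own load against later loads and stores. */
static inline uint32_t
fifo_count_c11(struct fifo *fifo)
{
	unsigned fifo_write = __atomic_load_n(&fifo->write, __ATOMIC_ACQUIRE);
	unsigned fifo_read = __atomic_load_n(&fifo->read, __ATOMIC_ACQUIRE);
	return (fifo->len + fifo_write - fifo_read) & (fifo->len - 1);
}
```

Both variants compute the same count; they differ only in how much of the surrounding code each form orders, which is why only a measurement can settle which one is cheaper.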

> 
> >
> > >
> > > > 2) This API can get called from a writer or a reader, so both the
> > > > loads have to be __ATOMIC_ACQUIRE.
> > > > 3) The other option is to use __ATOMIC_RELAXED. That would allow any
> > > > loads/stores around this API to get reordered, especially since this
> > > > is an inline function. This would put the burden on the application
> > > > to manage the ordering depending on its usage. It would also require
> > > > the application to understand the implementation of this API.
> > >
> > > __ATOMIC_RELAXED may be fine too for the _count() case, as it may not
> > > be very important to get the exact count at that very moment; the
> > > application can retry.
> > >
> > > I am in favor of performance effective implementation.
> >
> > The requirement on the correctness of the count depends on how this
> > function is used. I see the following usage:
> >
> > In the file kni_net.c, function: kni_net_tx:
> >
> >        if (kni_fifo_free_count(kni->tx_q) == 0 ||
> >                         kni_fifo_count(kni->alloc_q) == 0) {
> >                 /**
> >                  * If no free entry in tx_q or no entry in alloc_q,
> >                  * drops skb and goes out.
> >                  */
> >                 goto drop;
> >         }
> >
> > There is no retry here, the packet is dropped.
> 
> OK. Then pick an implementation which is optimal for this case.
> I think rte_smp_rmb() makes sense here, as:
> a) there is no #ifdef clutter
> b) it is optimal compared to 2 x LDAR
> 
As I understand, one of the principles of using the C11 model is to match the 
store releases and load acquires. IMO, combining the C11 memory model with 
barrier-based functions makes the code unreadable.
I realized rte_smp_rmb() is required for x86 as well to prevent compiler 
reordering. We can add that in the non-C11 case. This way, we will have clean 
code for both the options (similar to rte_ring).
So, if 'RTE_USE_C11_MEM_MODEL' is set to 'n', then the 'rte_smp_rmb' would be 
used.

We can look at handling the #ifdef clutter based on Ferruh's feedback.
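For reference, the kernel-style #define approach suggested earlier in the thread could look roughly like the sketch below. The names (__KNI_LOAD_ACQUIRE and friends) and the reduced struct are hypothetical, and __atomic_thread_fence stands in for the rte_smp_rmb()/rte_smp_wmb()-style barriers in the non-C11 path:

```c
#include <stdint.h>

/* Hypothetical wrappers in the spirit of the kernel's
 * smp_load_acquire()/smp_store_release(); not the actual patch. */
#ifdef RTE_USE_C11_MEM_MODEL
#define __KNI_LOAD_ACQUIRE(src) \
	__atomic_load_n((src), __ATOMIC_ACQUIRE)
#define __KNI_STORE_RELEASE(dst, val) \
	__atomic_store_n((dst), (val), __ATOMIC_RELEASE)
#else
/* Plain access plus a fence, standing in for the barrier-based path. */
#define __KNI_LOAD_ACQUIRE(src) __extension__ ({	\
	__typeof__(*(src)) __val = *(src);		\
	__atomic_thread_fence(__ATOMIC_ACQUIRE);	\
	__val;						\
})
#define __KNI_STORE_RELEASE(dst, val) do {		\
	__atomic_thread_fence(__ATOMIC_RELEASE);	\
	*(dst) = (val);					\
} while (0)
#endif

/* Reduced stand-in for struct rte_kni_fifo; illustration only. */
struct fifo {
	unsigned write;
	unsigned read;
	unsigned len;	/* power of two */
};

/* With the clutter hidden, the function body reads identically
 * under both memory models. */
static inline uint32_t
fifo_count(struct fifo *fifo)
{
	unsigned fifo_write = __KNI_LOAD_ACQUIRE(&fifo->write);
	unsigned fifo_read = __KNI_LOAD_ACQUIRE(&fifo->read);
	return (fifo->len + fifo_write - fifo_read) & (fifo->len - 1);
}
```

The payoff is that only the two #defines carry the #ifdef, instead of every call site, which is the readability concern raised above.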

> 
> >
> > >
> > > >
> > > > >
> > > > > Other than that, I prefer to avoid #ifdef clutter by introducing
> > > > > two separate files, just like the ring C11 implementation.
> > > > >
> > > > > I don't have a strong opinion on this part; I'll let the KNI
> > > > > MAINTAINER decide how to accommodate this change.
> > > >
> > > > I prefer to change this as well, and I am open to suggestions.
> > > > Introducing two separate files would be too much for this library. A
> > > > better way would be to have something similar to 'smp_store_release'
> > > > provided by the kernel, i.e. create #defines for the loads/stores and
> > > > hide the clutter behind the #defines.
> > >
> > > No strong opinion on this, leaving it to the KNI maintainer.
> > Will wait on this before re-spinning the patch
> >
> > >
> > > This patch needs to be split in two:
> > > a) fixes for the non-C11 implementation (i.e. the new addition of
> > > rte_smp_wmb())
> > > b) add support for the C11 implementation.
> > Agree
> >
> > >
> > > >
> > > > >
> > > > > > +	return (fifo->len + fifo_write - fifo_read) &
> > > > > > +		(fifo->len - 1);
> > > > > > +#else
> > > > > >  	return (fifo->len + fifo->write - fifo->read) &
> > > > > >  		(fifo->len - 1);
The non-C11 branch above requires rte_smp_rmb() on x86 as well, to prevent compiler reordering.

> > > > > > +#endif
> > > > > >  }
> > > > > > --
> > > > > > 2.7.4
> > > > > >
