Thanks for your comments. I'll update it in the next version.
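
For clarity, here is a rough sketch of the direction I have in mind for v3, i.e. hiding the memory-model selection behind load-acquire/store-release helpers as suggested below. The helper names (__KNI_LOAD_ACQUIRE / __KNI_STORE_RELEASE) are placeholders only, and this is not the actual patch; the barrier path assumes the librte_kni side, where rte_smp_rmb()/rte_smp_wmb() from <rte_atomic.h> are available (the kernel-module copy would use the kernel's own barriers):

/*
 * Sketch only, not the final patch. Helper names are placeholders.
 */
#include <rte_atomic.h>

#ifdef RTE_USE_C11_MEM_MODEL
/* C11 path: pure C11 atomics, pairing load-acquire with store-release. */
#define __KNI_LOAD_ACQUIRE(src) \
	__atomic_load_n((src), __ATOMIC_ACQUIRE)
#define __KNI_STORE_RELEASE(dst, val) \
	__atomic_store_n((dst), (val), __ATOMIC_RELEASE)
#else
/* Barrier path: plain accesses ordered with explicit barriers. On x86 the
 * barriers also stop the compiler from reordering the accesses. */
#define __KNI_LOAD_ACQUIRE(src) __extension__ ({	\
	typeof(*(src)) __val = *(src);			\
	rte_smp_rmb();					\
	__val;						\
})
#define __KNI_STORE_RELEASE(dst, val) do {		\
	rte_smp_wmb();					\
	*(dst) = (val);					\
} while (0)
#endif

This way the C11 build matches every load-acquire with a store-release, while the non-C11 build keeps the barrier-based code, and neither model leaks into the other.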
> -----Original Message-----
> From: Ferruh Yigit <ferruh.yi...@intel.com>
> Sent: Wednesday, September 26, 2018 7:43 PM
> To: Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>; Jerin Jacob <jerin.ja...@caviumnetworks.com>
> Cc: Phil Yang (Arm Technology China) <phil.y...@arm.com>; dev@dpdk.org; nd <n...@arm.com>; kkokkilaga...@caviumnetworks.com; Gavin Hu (Arm Technology China) <gavin...@arm.com>
> Subject: Re: [PATCH v2 2/3] kni: fix kni fifo synchronization
>
> On 9/21/2018 7:37 AM, Honnappa Nagarahalli wrote:
> >>>>>>>
> >>>>>>> @@ -69,5 +89,13 @@ kni_fifo_get(struct rte_kni_fifo *fifo, void **data, unsigned num)
> >>>>>>>  static inline uint32_t
> >>>>>>>  kni_fifo_count(struct rte_kni_fifo *fifo)
> >>>>>>>  {
> >>>>>>> +#ifdef RTE_USE_C11_MEM_MODEL
> >>>>>>> +	unsigned fifo_write = __atomic_load_n(&fifo->write, __ATOMIC_ACQUIRE);
> >>>>>>> +	unsigned fifo_read = __atomic_load_n(&fifo->read, __ATOMIC_ACQUIRE);
> >>>>>>
> >>>>>> Isn't it too heavy to have two __ATOMIC_ACQUIREs? A simple rte_smp_rmb() would be enough here, right?
> >>>>>> Or do we need __ATOMIC_ACQUIRE for the fifo_write case?
> >>>>>>
> >>>>> We also had some amount of debate internally on this:
> >>>>> 1) We do not want to use rte_smp_rmb() as we want to keep the memory models separated (for example, while using C11, use C11 everywhere). It is also not sufficient, please see 3) below.
> >>>>
> >>>> But nothing is technically wrong in using rte_smp_rmb() here, in terms of functionality and the code generated by the compiler.
> >>>
> >>> rte_smp_rmb() generates 'DMB ISHLD'. This works fine, but it is not optimal. 'LDAR' is a better option, which is generated when C11 atomics are used.
> >>
> >> Yes. But which one is optimal: 1 x DMB ISHLD vs 2 x LDAR?
> >
> > Good point. I am not sure which one is optimal, it needs to be measured. 'DMB ISHLD' orders 'all' earlier loads against 'all' later loads and stores. 'LDAR' orders the 'specific' load with 'all' later loads and stores.
> >
> >>>>> 2) This API can get called from the writer or the reader, so both loads have to be __ATOMIC_ACQUIRE.
> >>>>> 3) The other option is to use __ATOMIC_RELAXED. That would allow any loads/stores around this API to get reordered, especially since this is an inline function. This would put the burden on the application to manage the ordering depending on its usage. It will also require the application to understand the implementation of this API.
> >>>>
> >>>> __ATOMIC_RELAXED may be fine too for the _count() case, as it may not be very important to get the exact count at that very moment; the application can retry.
> >>>>
> >>>> I am in favor of a performance-effective implementation.
> >>>
> >>> The requirement on the correctness of the count depends on the usage of this function. I see the following usage:
> >>>
> >>> In the file kni_net.c, function kni_net_tx:
> >>>
> >>> 	if (kni_fifo_free_count(kni->tx_q) == 0 ||
> >>> 			kni_fifo_count(kni->alloc_q) == 0) {
> >>> 		/**
> >>> 		 * If no free entry in tx_q or no entry in alloc_q,
> >>> 		 * drops skb and goes out.
> >>> 		 */
> >>> 		goto drop;
> >>> 	}
> >>>
> >>> There is no retry here, the packet is dropped.
> >>
> >> OK. Then pick an implementation which is optimal for this case.
> >> I think rte_smp_rmb() then makes sense here, as:
> >> a) no #ifdef clutter
> >> b) it is optimal compared to 2 x LDAR
> >>
> > As I understand, one of the principles of using the C11 model is to match the store-releases and load-acquires. IMO, combining the C11 memory model with barrier-based functions makes the code unreadable.
> >
> > I realized rte_smp_rmb() is required for x86 as well, to prevent compiler reordering. We can add that in the non-C11 case. This way, we will have clean code for both options (similar to rte_ring). So, if 'RTE_USE_C11_MEM_MODEL' is set to 'n', then 'rte_smp_rmb' would be used.
> >
> > We can look at handling the #ifdef clutter based on Ferruh's feedback.
>
> Hi Honnappa, Jerin,
>
> Sorry for the delay, I missed that this was waiting for my input.
>
> +1 to removing the #ifdef, but I don't think a separate file is required for this, especially when it would be a duplication of the same implementation with nothing arch-specific in it.
> +1 to Honnappa's suggestion to hide the ifdefs behind APIs, plus those APIs can be reused later...
>
> And +1 to splitting into two patches, one for the fix to the current code and one for C11 atomic implementation support.
>
> I have some basic questions on the patch, will send them in a different thread.
>
> Thanks,
> ferruh
>
> >>>>>> Other than that, I prefer to avoid the ifdef clutter by introducing two separate files, just like the ring C11 implementation.
> >>>>>>
> >>>>>> I don't have a strong opinion on this part, I let the KNI MAINTAINER decide on how to accommodate this change.
> >>>>>
> >>>>> I prefer to change this as well, I am open to suggestions. Introducing two separate files would be too much for this library. A better way would be to have something similar to 'smp_store_release' provided by the kernel, i.e. create #defines for the loads/stores and hide the clutter behind the #defines.
> >>>>
> >>>> No strong opinion on this, leaving it to the KNI maintainer.
> >>>
> >>> Will wait on this before re-spinning the patch.
> >>>
> >>>> This patch needs to be split in two:
> >>>> a) fixes for the non-C11 implementation (i.e. the new rte_smp_wmb() additions)
> >>>> b) add support for the C11 implementation.
> >>>
> >>> Agree
> >>>
> >>>>>>> +	return (fifo->len + fifo_write - fifo_read) & (fifo->len - 1);
> >>>>>>> +#else
> >>>>>>>  	return (fifo->len + fifo->write - fifo->read) & (fifo->len - 1);
> >
> > Requires rte_smp_rmb() for x86 to prevent compiler reordering.
> >
> >>>>>>> +#endif
> >>>>>>>  }
> >>>>>>> --
> >>>>>>> 2.7.4

Thanks,
Phil
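
P.S. Purely for illustration, using the placeholder helpers sketched at the top of this mail, kni_fifo_count() from the hunk quoted above could end up looking roughly like this (again a sketch, not the final patch; the kernel-module copy of the fifo code would need the same treatment with the kernel's barriers):

/* Sketch: kni_fifo_count() with the #ifdef hidden behind the placeholder
 * __KNI_LOAD_ACQUIRE() helper sketched above. */
static inline uint32_t
kni_fifo_count(struct rte_kni_fifo *fifo)
{
	/* Acquire loads work whether the caller is the reader or the writer,
	 * and order the index reads against the caller's later accesses. */
	unsigned int fifo_write = __KNI_LOAD_ACQUIRE(&fifo->write);
	unsigned int fifo_read = __KNI_LOAD_ACQUIRE(&fifo->read);

	/* len is a power of two, so masking with (len - 1) wraps correctly. */
	return (fifo->len + fifo_write - fifo_read) & (fifo->len - 1);
}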