On 9/21/2018 7:37 AM, Honnappa Nagarahalli wrote:
>>>>>>>
>>>>>>> @@ -69,5 +89,13 @@ kni_fifo_get(struct rte_kni_fifo *fifo, void **data, unsigned num)
>>>>>>>  static inline uint32_t
>>>>>>>  kni_fifo_count(struct rte_kni_fifo *fifo)
>>>>>>>  {
>>>>>>> +#ifdef RTE_USE_C11_MEM_MODEL
>>>>>>> +       unsigned fifo_write = __atomic_load_n(&fifo->write,
>>>>>>> +                                                 __ATOMIC_ACQUIRE);
>>>>>>> +       unsigned fifo_read = __atomic_load_n(&fifo->read,
>>>>>>> +                                                 __ATOMIC_ACQUIRE);
>>>>>>
>>>>>> Isn't it too heavy to have two __ATOMIC_ACQUIREs? A simple
>>>>>> rte_smp_rmb() would be enough here, right?
>>>>>> or
>>>>>> Do we need __ATOMIC_ACQUIRE for the fifo_write case?
>>>>>>
>>>>> We also had some amount of debate internally on this:
>>>>> 1) We do not want to use rte_smp_rmb() as we want to keep the memory
>>>> models separated (for example: while using C11, use C11 everywhere). It
>>>> is also not sufficient; please see 3) below.
>>>>
>>>> But there is nothing technically wrong in using rte_smp_rmb() here, in terms
>>>> of functionality and the code generated by the compiler.
>>>
>>> rte_smp_rmb() generates 'DMB ISHLD'. This works fine, but it is not optimal.
>> 'LDAR' is a better option; it is what gets generated when C11 atomics are used.
>>
>> Yes. But which one is optimal: 1 x DMB ISHLD or 2 x LDAR?
> 
> Good point. I am not sure which one is optimal; it needs to be measured. 'DMB
> ISHLD' orders 'all' earlier loads against 'all' later loads and stores.
> 'LDAR' orders the 'specific' load with 'all' later loads and stores.
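> 
> For reference, the barrier-based alternative would look roughly like the
> below (a sketch only; the field names come from the existing struct
> rte_kni_fifo and the barrier placement is illustrative), versus the two
> __atomic_load_n(..., __ATOMIC_ACQUIRE) loads in the patch above:
> 
>     static inline uint32_t
>     kni_fifo_count(struct rte_kni_fifo *fifo)
>     {
>             unsigned fifo_write = fifo->write;
>             unsigned fifo_read = fifo->read;
>             /* one DMB ISHLD on arm64, a compiler barrier on x86;
>              * orders both index loads before later loads/stores */
>             rte_smp_rmb();
>             return (fifo->len + fifo_write - fifo_read) & (fifo->len - 1);
>     }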
> 
>>
>>>
>>>>
>>>>> 2) This API can get called from the writer or the reader, so both the
>>>>> loads have to be __ATOMIC_ACQUIRE.
>>>>> 3) The other option is to use __ATOMIC_RELAXED. That would allow any
>>>> loads/stores around this API to get reordered, especially since
>>>> this is an inline function. This would put a burden on the application
>>>> to manage the ordering depending on its usage. It would also require
>>>> the application to understand the implementation of this API.
>>>>
>>>> __ATOMIC_RELAXED may be fine too for the _count() case, as it may not be
>>>> very important to get the exact count at that very moment; the
>>>> application can retry.
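>>>>
>>>> i.e. just relaxed loads of the two indices, something like (a sketch):
>>>>
>>>>     unsigned fifo_write = __atomic_load_n(&fifo->write, __ATOMIC_RELAXED);
>>>>     unsigned fifo_read = __atomic_load_n(&fifo->read, __ATOMIC_RELAXED);
>>>>     return (fifo->len + fifo_write - fifo_read) & (fifo->len - 1);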
>>>>
>>>> I am in favor of a performance-effective implementation.
>>>
>>> The requirement on the correctness of the count depends on the usage of
>> this function. I see the following usage:
>>>
>>> In the file kni_net.c, function: kni_net_tx:
>>>
>>>        if (kni_fifo_free_count(kni->tx_q) == 0 ||
>>>                         kni_fifo_count(kni->alloc_q) == 0) {
>>>                 /**
>>>                  * If no free entry in tx_q or no entry in alloc_q,
>>>                  * drops skb and goes out.
>>>                  */
>>>                 goto drop;
>>>         }
>>>
>>> There is no retry here, the packet is dropped.
>>
>> OK. Then pick an implementation which is optimal for this case.
>> I think rte_smp_rmb() makes sense here then, as
>> a) no #ifdef clutter
>> b) it is optimal compared to 2 x LDAR
>>
> As I understand, one of the principles of using the C11 model is to match the
> store releases and load acquires. IMO, combining the C11 memory model with
> barrier-based functions makes the code unreadable.
> I realized rte_smp_rmb() is required for x86 as well to prevent compiler 
> reordering. We can add that in the non-C11 case. This way, we will have clean 
> code for both the options (similar to rte_ring).
> So, if 'RTE_USE_C11_MEM_MODEL' is set to 'n', then 'rte_smp_rmb' would be
> used.
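> 
> As a concrete illustration of the pairing (a sketch only; 'new_write' is a
> hypothetical local holding the updated index), the writer side would publish
> the new write index like this:
> 
>     /* C11 case: pairs with the __ATOMIC_ACQUIRE load of fifo->write */
>     __atomic_store_n(&fifo->write, new_write, __ATOMIC_RELEASE);
> 
>     /* non-C11 case: rte_smp_wmb() before the plain store gives the
>      * equivalent pairing and also acts as the compiler barrier on x86 */
>     rte_smp_wmb();
>     fifo->write = new_write;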
> 
> We can look at handling the #ifdef clutter based on Ferruh's feedback.

Hi Honnappa, Jerin,

Sorry for the delay, I missed that this was waiting for my input.

+1 to remove the #ifdef, but I don't think a separate file is required for this,
especially when it would be a duplication of the same implementation, with
nothing arch-specific in it.
+1 to Honnappa's suggestion to hide the ifdefs behind APIs, plus those APIs can
be reused later...
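
Something like the following would do it, I think (just a sketch to show the
idea; the macro names here are made up, not taken from the patch):

    #ifdef RTE_USE_C11_MEM_MODEL
    #define __KNI_LOAD_ACQUIRE(src) __atomic_load_n((src), __ATOMIC_ACQUIRE)
    #define __KNI_STORE_RELEASE(dst, val) \
            __atomic_store_n((dst), (val), __ATOMIC_RELEASE)
    #else
    #define __KNI_LOAD_ACQUIRE(src) ({            \
            typeof(*(src)) __val = *(src);        \
            rte_smp_rmb();                        \
            __val;                                \
    })
    #define __KNI_STORE_RELEASE(dst, val) do {    \
            rte_smp_wmb();                        \
            *(dst) = (val);                       \
    } while (0)
    #endif

Then kni_fifo_count() and friends stay single-path, e.g.:

    unsigned fifo_write = __KNI_LOAD_ACQUIRE(&fifo->write);
    unsigned fifo_read = __KNI_LOAD_ACQUIRE(&fifo->read);
    return (fifo->len + fifo_write - fifo_read) & (fifo->len - 1);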

And +1 to split into two patches, one for the fix to the current code and one
for the C11 atomic implementation support.

I have some basic questions on the patch; I will send them in a different thread.

Thanks,
ferruh

> 
>>
>>>
>>>>
>>>>>
>>>>>>
>>>>>> Other than that, I prefer to avoid the ifdef clutter by introducing
>>>>>> two separate files, just like the ring C11 implementation.
>>>>>>
>>>>>> I don't have a strong opinion on this part; I will let the KNI
>>>>>> MAINTAINER decide how to accommodate this change.
>>>>>
>>>>> I prefer to change this as well; I am open to suggestions.
>>>>> Introducing two separate files would be too much for this library.
>>>>> A better
>>>> way would be to have something similar to 'smp_store_release'
>>>> provided by the kernel. i.e. create #defines for loads/stores. Hide
>>>> the clutter behind the #defines.
>>>>
>>>> No strong opinion on this, leaving it to the KNI Maintainer.
>>> Will wait on this before re-spinning the patch
>>>
>>>>
>>>> This patch needs to be split into two:
>>>> a) fixes for the non-C11 implementation (i.e. the new addition of
>>>> rte_smp_wmb())
>>>> b) add support for the C11 implementation.
>>> Agree
>>>
>>>>
>>>>>
>>>>>>
>>>>>>> +       return (fifo->len + fifo_write - fifo_read) & (fifo->len - 1);
>>>>>>> +#else
>>>>>>>         return (fifo->len + fifo->write - fifo->read) & (fifo->len - 1);
> Requires rte_smp_rmb() on x86 as well, to prevent compiler reordering.
> 
>>>>>>> +#endif
>>>>>>>  }
>>>>>>> --
>>>>>>> 2.7.4
>>>>>>>
