(Now using macOS Mail program in plain text mode, hope this works)

> On 2 Apr 2022, at 21:31, Honnappa Nagarahalli <honnappa.nagaraha...@arm.com> 
> wrote:
> 
> <snip>
> 
>>> +__rte_experimental
>>> +static inline bool
>>> +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t
>>> +*begin_sn) {
>>> +   uint32_t end_sn;
>>> +
>>> +   /* make sure the data loads happens before the sn load */
>>> +   rte_atomic_thread_fence(__ATOMIC_ACQUIRE);
>>> +
>>> +   end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED);
>> 
>> Since we are reading and potentially returning the sequence number here
>> (repeating the read of the protected data), we need to use load-acquire.
>> I assume it is not expected that the user will call
>> rte_seqlock_read_lock() again.
> Good point, we need a load-acquire (due to changes done in v3).
> 
>> 
>> Seeing this implementation, I might actually prefer the original
>> implementation, I think it is cleaner. But I would like for the begin 
>> function
>> also to wait for an even sequence number, the end function would only have
>> to check for same sequence number, this might improve performance a little
>> bit as readers won't perform one or several broken reads while a write is in
>> progress. The function names are a different thing though.
> I think we need to be optimizing for the case where there is no contention 
> between readers and writers (as that happens most of the time). From this 
> perspective, not checking for an even seq number in the begin function would 
> reduce one 'if' statement.
The number of statements in C is not relevant, instead we need to look at the 
generated code. On x86, I would assume an if-statement like “if ((sn & 1) || 
(sn == sl->sn))” to generate two separate evaluations with their own 
conditional jump instructions. On AArch64, the two evaluations could probably 
be combined using a CCMP instruction and need only one conditional branch 
instruction. With branch prediction, it is doubtful we will see any difference 
in performance.

> 
> Going back to the earlier model is better as well, because of the 
> load-acquire required in the 'rte_seqlock_read_tryunlock' function. The 
> earlier model would not introduce the load-acquire for the no contention case.
The earlier model still had load-acquire in the read_begin function which would 
have to invoked again. There is no difference in the number or type of memory 
accesses. We just need to copy the implementation of read_begin into the 
read_tryunlock function if we decide that the user should not have to re-invoke 
read_begin on a failed read_tryunlock.

> 
>> 
>> The writer side behaves much more like a lock with mutual exclusion so
>> write_lock/write_unlock makes sense.
>> 
>>> +
>>> +   if (unlikely(end_sn & 1 || *begin_sn != end_sn)) {
>>> +           *begin_sn = end_sn;
>>> +           return false;
>>> +   }
>>> +
>>> +   return true;
>>> +}
>>> +

Reply via email to