(Now using macOS Mail program in plain text mode, hope this works) > On 2 Apr 2022, at 21:31, Honnappa Nagarahalli <honnappa.nagaraha...@arm.com> > wrote: > > <snip> > >>> +__rte_experimental >>> +static inline bool >>> +rte_seqlock_read_tryunlock(const rte_seqlock_t *seqlock, uint32_t >>> +*begin_sn) { >>> + uint32_t end_sn; >>> + >>> + /* make sure the data loads happens before the sn load */ >>> + rte_atomic_thread_fence(__ATOMIC_ACQUIRE); >>> + >>> + end_sn = __atomic_load_n(&seqlock->sn, __ATOMIC_RELAXED); >> >> Since we are reading and potentially returning the sequence number here >> (repeating the read of the protected data), we need to use load-acquire. >> I assume it is not expected that the user will call >> rte_seqlock_read_lock() again. > Good point, we need a load-acquire (due to changes done in v3). > >> >> Seeing this implementation, I might actually prefer the original >> implementation, I think it is cleaner. But I would like for the begin >> function >> also to wait for an even sequence number, the end function would only have >> to check for same sequence number, this might improve performance a little >> bit as readers won't perform one or several broken reads while a write is in >> progress. The function names are a different thing though. > I think we need to be optimizing for the case where there is no contention > between readers and writers (as that happens most of the time). From this > perspective, not checking for an even seq number in the begin function would > reduce one 'if' statement. The number of statements in C is not relevant, instead we need to look at the generated code. On x86, I would assume an if-statement like “if ((sn & 1) || (sn == sl->sn))” to generate two separate evaluations with their own conditional jump instructions. On AArch64, the two evaluations could probably be combined using a CCMP instruction and need only one conditional branch instruction. With branch prediction, it is doubtful we will see any difference in performance.
> > Going back to the earlier model is better as well, because of the > load-acquire required in the 'rte_seqlock_read_tryunlock' function. The > earlier model would not introduce the load-acquire for the no contention case. The earlier model still had load-acquire in the read_begin function which would have to invoked again. There is no difference in the number or type of memory accesses. We just need to copy the implementation of read_begin into the read_tryunlock function if we decide that the user should not have to re-invoke read_begin on a failed read_tryunlock. > >> >> The writer side behaves much more like a lock with mutual exclusion so >> write_lock/write_unlock makes sense. >> >>> + >>> + if (unlikely(end_sn & 1 || *begin_sn != end_sn)) { >>> + *begin_sn = end_sn; >>> + return false; >>> + } >>> + >>> + return true; >>> +} >>> +