> > > > 1. rte_ring_generic_pvt.h:
> > > > ==========================
> > > >
> > > > pseudo-c-code                          // related armv8 instructions
> > > > ------------------------------------   -----------------------------------
> > > > head.load()                            // ldr [head]
> > > > rte_smp_rmb()                          // dmb ishld
> > > > opposite_tail.load()                   // ldr [opposite_tail]
> > > > ...
> > > > rte_atomic32_cmpset(head, ...)         // ldrex[head]; ... stlex[head]
> > > >
> > > >
> > > > 2. rte_ring_c11_pvt.h
> > > > ==========================
> > > >
> > > > pseudo-c-code                          // related armv8 instructions
> > > > ------------------------------------   -----------------------------------
> > > > head.atomic_load(relaxed)              // ldr [head]
> > > > atomic_thread_fence(acquire)           // dmb ish
> > > > opposite_tail.atomic_load(acquire)     // lda [opposite_tail]
> > > > ...
> > > > head.atomic_cas(..., relaxed)          // ldrex[head]; ... strex[head]
> > > >
> > > >
> > > > 3. rte_ring_hts_elem_pvt.h
> > > > ==========================
> > > >
> > > > pseudo-c-code                          // related armv8 instructions
> > > > ------------------------------------   -----------------------------------
> > > > head.atomic_load(acquire)              // lda [head]
> > > > opposite_tail.load()                   // ldr [opposite_tail]
> > > > ...
> > > > head.atomic_cas(..., acquire)          // ldaex[head]; ... strex[head]
> > > >
> > > > The questions that arose from these observations:
> > > > a) are all 3 approaches equivalent in terms of functionality?
> > > Different, lda (Load with acquire semantics) and ldr (load) are different.
> > >
> > I understand that, my question was:
> > lda [head]; ldr [tail]
> > vs
> > ldr [head]; dmb ishld; ldr [tail];
> >
> > Is there any difference in terms of functionality (memory ops
> > ordering/observability)?
>
> To be more precise:
>
> lda [head]; ldr [tail]
> vs
> ldr [head]; dmb ishld; ldr [tail];
> vs
> ldr [head]; dmb ishld; lda [tail];
>
> what would be the difference between these 3 cases?
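For reference, the c11 sequence quoted above (relaxed head load, acquire fence, acquire load of the opposite tail, relaxed CAS on the head) can be sketched as a producer head-reservation loop in portable C11. This is a minimal sketch, not DPDK code: `struct ring`, `struct headtail` and `move_prod_head()` are hypothetical names, and details of the real implementation (single-producer fast path, fixed-size bursts) are omitted.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical head/tail pair; not DPDK's actual rte_ring layout. */
struct headtail {
    _Atomic uint32_t head;
    _Atomic uint32_t tail;
};

/* Hypothetical ring descriptor holding only what this sketch needs. */
struct ring {
    uint32_t capacity;      /* number of usable slots */
    struct headtail prod;
    struct headtail cons;
};

/*
 * Reserve up to 'max' slots for a (multi-)producer, mirroring the c11
 * pattern: relaxed head load, acquire fence, acquire load of the consumer
 * tail, relaxed CAS on the producer head. Returns the number of slots
 * reserved (0 if the ring is full); *old_head receives the head value
 * the reservation starts from.
 */
static uint32_t move_prod_head(struct ring *r, uint32_t max, uint32_t *old_head)
{
    uint32_t n, cons_tail, free_entries, new_head;

    *old_head = atomic_load_explicit(&r->prod.head, memory_order_relaxed);
    do {
        n = max;
        /* Keep the head load above ordered before the tail load below. */
        atomic_thread_fence(memory_order_acquire);
        cons_tail = atomic_load_explicit(&r->cons.tail, memory_order_acquire);

        /* Unsigned arithmetic copes with 32-bit index wrap-around. */
        free_entries = r->capacity + cons_tail - *old_head;
        if (n > free_entries)
            n = free_entries;
        if (n == 0)
            return 0;

        new_head = *old_head + n;
        /* On failure the CAS reloads the current head into *old_head. */
    } while (!atomic_compare_exchange_weak_explicit(&r->prod.head, old_head,
                                                    new_head,
                                                    memory_order_relaxed,
                                                    memory_order_relaxed));
    return n;
}
```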
Case A: lda [head]; ldr [tail]
The load of the head will be observed by the memory subsystem before the load of the tail.

Case B: ldr [head]; dmb ishld; ldr [tail];
The load of the head will be observed by the memory subsystem before the load of the tail.

Case C: ldr [head]; dmb ishld; lda [tail];
The load of the head will be observed by the memory subsystem before the load of the tail. In addition, no load or store that follows lda [tail] in program order will be observed by the memory subsystem before the load of the tail.

Essentially, cases A and B are the same. Both keep the load of the head ordered before everything that follows it, i.e. they preserve the following program orders:
LOAD-LOAD
LOAD-STORE
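A rough C11 rendering of the three cases may help map them back to source code. This is a sketch assuming GCC or Clang targeting Arm; the instruction comments show what these compilers typically emit and can vary by version and target (on AArch64 an acquire load is usually `ldar`, the counterpart of the AArch32 `lda` used above). The globals `head`/`tail` and the functions `case_a`/`case_b`/`case_c` are hypothetical. The C11 model itself is not phrased in terms of what the memory subsystem observes; the ordering described above comes from the emitted instructions.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared indices; names are illustrative only. */
static _Atomic uint32_t head;
static _Atomic uint32_t tail;

/* Case A: acquire load of head, plain load of tail
 * (typically lda/ldar [head]; ldr [tail]). */
static uint32_t case_a(uint32_t *t)
{
    uint32_t h = atomic_load_explicit(&head, memory_order_acquire);
    *t = atomic_load_explicit(&tail, memory_order_relaxed);
    return h;
}

/* Case B: plain load of head, acquire fence, plain load of tail
 * (typically ldr [head]; dmb ishld; ldr [tail]). */
static uint32_t case_b(uint32_t *t)
{
    uint32_t h = atomic_load_explicit(&head, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    *t = atomic_load_explicit(&tail, memory_order_relaxed);
    return h;
}

/* Case C: as case B, but the tail load is also an acquire, so later
 * loads and stores cannot be observed before it
 * (typically ldr [head]; dmb ishld; lda/ldar [tail]). */
static uint32_t case_c(uint32_t *t)
{
    uint32_t h = atomic_load_explicit(&head, memory_order_relaxed);
    atomic_thread_fence(memory_order_acquire);
    *t = atomic_load_explicit(&tail, memory_order_acquire);
    return h;
}
```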