On 2023-03-20 15:38, Duncan Sands via lttng-dev wrote:
> Hi Mathieu,
>
>> While OK for the general case, I would recommend that we immediately
>> implement something more efficient on x86 32/64 which takes into
>> account that __ATOMIC_ACQ_REL atomic operations are implemented with
>> LOCK-prefixed atomic ops, which already imply the barrier, leaving
>> the before/after_uatomic_*() as no-ops.
>
> Maybe first check whether the GCC optimizers merge them. I believe
> some optimizations of atomic primitives are allowed and implemented,
> but I couldn't say which ones.
>
> Best wishes, Duncan.

Tested on godbolt.org with:

int a;

void fct(void)
{
    /* Relaxed atomic increment: emitted as a LOCK-prefixed RMW on x86. */
    (void) __atomic_add_fetch(&a, 1, __ATOMIC_RELAXED);
    /* Full fence following the atomic op: redundant on x86 if the
     * compiler notices the LOCK prefix already implies it. */
    __atomic_thread_fence(__ATOMIC_SEQ_CST);
}

x86-64 gcc 12.2 -O2 -std=c11:

fct:
        lock add        DWORD PTR a[rip], 1
        lock or QWORD PTR [rsp], 0
        ret
a:
        .zero   4

x86-64 clang 16.0.0 -O2 -std=c11:

fct:                                    # @fct
        lock            inc     dword ptr [rip + a]
        mfence
        ret
a:
        .long   0

So neither gcc nor clang merges the fence into the preceding LOCK-prefixed operation today, hence the need for an x86-specific implementation.
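
For illustration, a minimal sketch of what such a specialization could look like. The macro names are hypothetical, modeled on the before/after_uatomic_*() pattern above, and not meant as the final liburcu API:

/*
 * Sketch only: on x86, LOCK-prefixed RMW operations already imply a
 * full memory barrier, so the before/after helpers only need to stop
 * the compiler from reordering, not the CPU. Macro names are
 * illustrative.
 */
#if defined(__i386__) || defined(__x86_64__)
/* Compiler barrier only: no instruction emitted. */
#define before_uatomic_add()  __asm__ __volatile__ ("" ::: "memory")
#define after_uatomic_add()   __asm__ __volatile__ ("" ::: "memory")
#else
/* Generic case: emit a real full fence around the atomic op. */
#define before_uatomic_add()  __atomic_thread_fence(__ATOMIC_SEQ_CST)
#define after_uatomic_add()   __atomic_thread_fence(__ATOMIC_SEQ_CST)
#endif

With something along these lines, the uatomic add itself would still be a LOCK-prefixed instruction on x86, while the surrounding helpers compile to nothing.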

Thanks,

Mathieu


--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
