On 2022/11/17 9:39 AM, Jinyang He wrote:

On 2022/11/16 7:46 PM, Xi Ruoyao wrote:

On Wed, 2022-11-16 at 10:11 +0800, Jinyang He wrote:

+  return "%G6\\n\\t"
+        "1:\\n\\t"
+        "ll.<amo>\\t%0,%1\\n\\t"
+        "and\\t%7,%0,%z3\\n\\t"
+        "or%i5\\t%7,%7,%5\\n\\t"
+        "sc.<amo>\\t%7,%1\\n\\t"
+        "beqz\\t%7,1b\\n\\t";
Do we need a "dbar 0x700" after beqz?

/* snip */
That's worth discussing. Actually, I don't see any dbar hint definition
like 0x700 in the manual right now.
Besides, I think what should be provided here is a relaxed version, and
whether the barrier exists or not depends on the specific memory_order.
It's not related to memory order; it's a workaround for a hardware issue.
Jiaxun told me (via LKML):

    I had checked with Loongson guys and they confirmed that the
    workaround still needs to be applied to latest 3A4000 processors,
    including 3A4000 for MIPS and 3A5000 for LoongArch.
    Though, the reason behind the workaround varies with the evaluation
    of their uArch, for GS464V based core, barrier is required as the
    uArch design allows regular load to be reordered after an atomic
    linked load, and that would break assumption of compiler atomic
    constraints.

The barrier certainly seems to be needed, but whether it belongs before or
after is beyond my knowledge, so I'm cc'ing huang...@loongson.cn for help.


Pei told me that ll-sc currently works as follows:

uArch like:
  ll -> (ll.dbar ll.ld_atomic)
  sc -> (sc.dbar sc.st_atomic)

exchange:
ll.dbar
<---------------------------+
ll.ld_atomic $rd            |
...(no jmp)                 |
sc.dbar                     |
sc.st_atomic $rd            |
ld $rj -can-not-emit-at-----+

The load of $rj cannot be issued between ll.dbar and ll.ld_atomic because sc.dbar acts as a barrier in front of it.


compare and exchange:
ll.dbar
<-----------------------+
ll.ld_atomic $rd        |
...(jmp) ---------------+------+
sc.dbar                 |      |
sc.st_atomic $rd        |      |
                        |   <--+
ld $rj -may-emit-at-----+

Jumping out of the ll-sc sequence skips sc.dbar, so the load of $rj may be issued between ll.dbar and ll.ld_atomic.


Thus, exchange does not need the dbar.
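
To make the two control-flow shapes concrete, here is a rough C sketch. It is not the code from the patch: the function names masked_exchange and compare_and_exchange are made up for illustration, and GCC's __atomic builtins with a compare-exchange retry loop stand in for the real ll/sc pair. The point is only where each loop can be left: the exchange loop exits solely through a successful store, while the compare-and-exchange loop has an early exit that never reaches the store.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Exchange shape: the only way out of the loop is a successful store.
   The and/or pair mirrors the "and" and "or" lines of the quoted template.
   Returns the previous value of *ptr. */
static uint32_t masked_exchange(uint32_t *ptr, uint32_t inv_mask, uint32_t bits)
{
    assert(ptr != NULL);
    uint32_t old = __atomic_load_n(ptr, __ATOMIC_RELAXED);
    uint32_t desired;
    do {
        desired = (old & inv_mask) | bits;
        /* on failure the builtin refreshes `old` with the current value */
    } while (!__atomic_compare_exchange_n(ptr, &old, desired, 1,
                                          __ATOMIC_RELAXED, __ATOMIC_RELAXED));
    return old;
}

/* Compare-and-exchange shape: returns 1 if the store happened, 0 otherwise. */
static int compare_and_exchange(uint32_t *ptr, uint32_t expected, uint32_t desired)
{
    assert(ptr != NULL);
    uint32_t old = __atomic_load_n(ptr, __ATOMIC_RELAXED);
    for (;;) {
        if (old != expected)
            return 0; /* early exit: the "(jmp)" path, the store is skipped */
        if (__atomic_compare_exchange_n(ptr, &old, desired, 1,
                                        __ATOMIC_RELAXED, __ATOMIC_RELAXED))
            return 1;
        /* store failed: `old` was refreshed, go around again */
    }
}
```

In the second function, the `return 0` branch corresponds to the path in the diagram that bypasses sc.dbar, which is the only place the question of an extra barrier arises.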





Without these dbar instructions I got random test failures in the GCC
libgomp test suite.

Which test suite?



We use a non-zero hint here because it is treated exactly the same as zero
on the 3A5000, and future LoongArch processors can fix the issue and
ignore the dbar 0x700 instruction.
Thanks, it's a nice workaround.
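
For illustration only, a hinted barrier like this could be wrapped in C roughly as follows. The helper names llsc_workaround_barrier and load_after_failed_cas are hypothetical, and the non-LoongArch branch (a plain compiler barrier) is an assumption added so the sketch builds elsewhere; the only part taken from this thread is the "dbar 0x700" instruction itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: emit the hinted barrier discussed in this thread. */
static inline void llsc_workaround_barrier(void)
{
#if defined(__loongarch__)
    /* Per the thread, the 3A5000 treats hint 0x700 the same as dbar 0. */
    __asm__ __volatile__("dbar 0x700" ::: "memory");
#else
    /* Assumption: on other targets only stop compiler reordering. */
    __asm__ __volatile__("" ::: "memory");
#endif
}

/* Hypothetical use: a regular load that follows an early exit from an
   ll/sc loop is placed after the barrier. */
static inline int load_after_failed_cas(const int *p)
{
    assert(p != NULL);
    llsc_workaround_barrier();
    return *p;
}
```

Whether a compiler should emit this on every early-exit path, or only for some memory orders, is exactly the open question above; the sketch takes no position on it.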
