[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)

2024-09-16 Thread Peilin Ye via cfe-commits

https://github.com/peilin-ye updated 
https://github.com/llvm/llvm-project/pull/108636

>From c23b4d2dde0e03051332604d18eaf3bd434e4d4e Mon Sep 17 00:00:00 2001
From: Peilin Ye 
Date: Thu, 15 Aug 2024 21:49:23 +
Subject: [PATCH 1/3] [BPF] Refactor BPFSubtarget::initSubtargetFeatures()
 (NFC)

Refactor it to make it match the style of
BPFTargetInfo::getTargetDefines(), so that when we add -mcpu=v5 in the
future, we can simply append e.g.:

  if (CpuVerNum >= 5)
    HasFoo = true;

instead of saying:

  if (CPU == "v5") {
    HasJmpExt = true;
    HasJmp32 = true;
    HasAlu32 = true;
    HasLdsx = !Disable_ldsx;
    HasMovsx = !Disable_movsx;
    HasBswap = !Disable_bswap;
    HasSdivSmod = !Disable_sdiv_smod;
    HasGotol = !Disable_gotol;
    HasStoreImm = !Disable_StoreImm;
    HasFoo = true;
  }

This also makes it clearer that newer "CPU"s always support older
features.  No functional changes intended.
---
 llvm/lib/Target/BPF/BPFSubtarget.cpp | 20 ++++++++------------
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/llvm/lib/Target/BPF/BPFSubtarget.cpp b/llvm/lib/Target/BPF/BPFSubtarget.cpp
index 305e9a2bf2cda3..dbb7b669575b57 100644
--- a/llvm/lib/Target/BPF/BPFSubtarget.cpp
+++ b/llvm/lib/Target/BPF/BPFSubtarget.cpp
@@ -69,29 +69,25 @@ void BPFSubtarget::initSubtargetFeatures(StringRef CPU, StringRef FS) {
     CPU = "v3";
   if (CPU == "probe")
     CPU = sys::detail::getHostCPUNameForBPF();
-  if (CPU == "generic" || CPU == "v1")
+  if (CPU.empty() || CPU == "generic" || CPU == "v1")
     return;
-  if (CPU == "v2") {
-    HasJmpExt = true;
-    return;
-  }
-  if (CPU == "v3") {
+
+  int CpuVerNum = CPU.back() - '0';
+  if (CpuVerNum >= 2)
     HasJmpExt = true;
+
+  if (CpuVerNum >= 3) {
     HasJmp32 = true;
     HasAlu32 = true;
-    return;
   }
-  if (CPU == "v4") {
-    HasJmpExt = true;
-    HasJmp32 = true;
-    HasAlu32 = true;
+
+  if (CpuVerNum >= 4) {
     HasLdsx = !Disable_ldsx;
     HasMovsx = !Disable_movsx;
     HasBswap = !Disable_bswap;
     HasSdivSmod = !Disable_sdiv_smod;
     HasGotol = !Disable_gotol;
     HasStoreImm = !Disable_StoreImm;
-    return;
   }
 }
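
Reassembled from the hunks above, the version-gated logic now reads as a simple cascade (a sketch of the post-patch function body; the `HasFoo` lines are the hypothetical future feature from the commit message):

```cpp
// Sketch: BPFSubtarget::initSubtargetFeatures() after this patch.
// Newer CPU versions simply layer features on top of older ones.
int CpuVerNum = CPU.back() - '0';
if (CpuVerNum >= 2)
  HasJmpExt = true;

if (CpuVerNum >= 3) {
  HasJmp32 = true;
  HasAlu32 = true;
}

if (CpuVerNum >= 4) {
  HasLdsx = !Disable_ldsx;
  HasMovsx = !Disable_movsx;
  HasBswap = !Disable_bswap;
  HasSdivSmod = !Disable_sdiv_smod;
  HasGotol = !Disable_gotol;
  HasStoreImm = !Disable_StoreImm;
}

// A future -mcpu=v5 feature then becomes a two-line addition:
// if (CpuVerNum >= 5)
//   HasFoo = true;
```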
 

>From 680b668fc597728c23fa77926e11491cd0512ea1 Mon Sep 17 00:00:00 2001
From: Peilin Ye 
Date: Fri, 23 Aug 2024 19:11:05 +
Subject: [PATCH 2/3] [BPF] Refactor {LOAD,STORE}{,32} classes (NFC)

We will need different AsmString formats for load-acquire and
store-release instructions.  To make that easier, refactor
{LOAD,STORE}{,32} classes to take AsmString as an argument directly.
Add a BPFModeModifer parameter to STORE{,32} for similar reasons.

No functional changes intended.
---
 llvm/lib/Target/BPF/BPFInstrInfo.td | 35 +++++++++++++++++------------------
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/llvm/lib/Target/BPF/BPFInstrInfo.td b/llvm/lib/Target/BPF/BPFInstrInfo.td
index f7e17901c7ed5e..aab2291a70efc2 100644
--- a/llvm/lib/Target/BPF/BPFInstrInfo.td
+++ b/llvm/lib/Target/BPF/BPFInstrInfo.td
@@ -497,12 +497,11 @@ def LD_pseudo
 }
 
 // STORE instructions
-class STORE<BPFWidthModifer SizeOp, string OpcodeStr, list<dag> Pattern>
-    : TYPE_LD_ST<BPF_MEM.Value, SizeOp.Value,
+class STORE<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string AsmString,
+            list<dag> Pattern>
+    : TYPE_LD_ST<ModOp.Value, SizeOp.Value,
                  (outs),
                  (ins GPR:$src, MEMri:$addr),
-                 "*("#OpcodeStr#" *)($addr) = $src",
-                 Pattern> {
+                 AsmString, Pattern> {
   bits<4> src;
   bits<20> addr;
 
@@ -513,7 +512,7 @@ class STORE<BPFWidthModifer SizeOp, string OpcodeStr, list<dag> Pattern>
 }
 
 class STOREi64<BPFWidthModifer Opc, string OpcodeStr, PatFrag OpNode>
-    : STORE<Opc, OpcodeStr, [(OpNode GPR:$src, ADDRri:$addr)]>;
+    : STORE<Opc, BPF_MEM, "*(" #OpcodeStr #" *)($addr) = $src",
+            [(OpNode GPR:$src, ADDRri:$addr)]>;
 
 let Predicates = [BPFNoALU32] in {
   def STW : STOREi64<BPF_W, "u32", truncstorei32>;
@@ -567,12 +566,11 @@ let Predicates = [BPFHasALU32, BPFHasStoreImm] in {
 }
 
 // LOAD instructions
-class LOAD<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string OpcodeStr, list<dag> Pattern>
+class LOAD<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string AsmString, list<dag> Pattern>
     : TYPE_LD_ST<ModOp.Value, SizeOp.Value,
                  (outs GPR:$dst),
                  (ins MEMri:$addr),
-                 "$dst = *("#OpcodeStr#" *)($addr)",
-                 Pattern> {
+                 AsmString, Pattern> {
   bits<4> dst;
   bits<20> addr;
 
@@ -583,7 +581,8 @@ class LOAD<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string OpcodeStr,
 
 class LOADi64<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string OpcodeStr, PatFrag OpNode>
-    : LOAD<SizeOp, ModOp, OpcodeStr, [(set i64:$dst, (OpNode ADDRri:$addr))]>;
+    : LOAD<SizeOp, ModOp, "$dst = *(" #OpcodeStr #" *)($addr)",
+           [(set i64:$dst, (OpNode ADDRri:$addr))]>;
 
 let isCodeGenOnly = 1 in {
   class CORE_LD<RegisterClass RegClass, string Sz>
@@ -1069,12 +1068,11 @@ def : Pat<(i32 (trunc GPR:$src)),
 def : Pat<(i64 (anyext GPR32:$src)),
           (INSERT_SUBREG (i64 (IMPLICIT_DEF)), GPR32:$src, sub_32)>;
 
-class STORE32<BPFWidthModifer SizeOp, string OpcodeStr, list<dag> Pattern>
-    : TYPE_LD_ST<BPF_MEM.Value, SizeOp.Value,
+class STORE32<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string AsmString,
+              list<dag> Pattern>
+    : TYPE_LD_ST<ModOp.Value, SizeOp.Value,
                  (outs),
                  (ins GPR32:$src, MEMri:$addr),
-                 "*("#OpcodeStr#" *)($addr) = $src",
-                 Pattern> {
+                 AsmString, Pattern> {
   bits<4> src;
   bits<20> addr;
 
@@ -1085,7 +1083,8 @@ class STORE32<BPFWidthModifer SizeOp, string OpcodeStr, list<dag> Pattern>
 }
 
 class STOREi32<BPFWidthModifer Opc, string OpcodeStr, PatFrag OpNode>
-    : STORE32<Opc, OpcodeStr, [(OpNode GPR32:$src, ADDRri:$addr)]>;
+    : STORE32<Opc, BPF_MEM, "*(" #OpcodeStr #" *)($addr) = $src",
+              [(OpNode GPR32:$src, ADDRri:$addr)]>;
 
 let Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32" in {
   def STW32 : STOREi32<BPF_W, "u32", store>;
@@ -1093,12 +1092,11 @@ let Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32" in {
   def STB32 : STOREi32<BPF_B, "u8", truncstorei8>;
 }
 
-class LOAD32<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string OpcodeStr, list<dag> Pattern>
+class LOAD32<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string AsmString, list<dag> Pattern>
     : TYPE_LD_ST<ModOp.Value, SizeOp.Value,
                  (outs GPR32:$dst),
                  (ins MEMri:$addr),
-                 "$dst = *("#OpcodeStr#" *)($addr)",
-                 Pattern> {
+                 AsmString, Pattern> {
   bits<4> dst;
   bits<20> addr;
 
@@ -1109,7 +1107,8 @@ class LOAD32<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string OpcodeStr,
 
 class LOADi32<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string OpcodeStr, PatFrag OpNode>
-    : LOAD32<SizeOp, ModOp, OpcodeStr, [(set i32:$dst, (OpNode ADDRri:$addr))]>;
+    : LOAD32<SizeOp, ModOp, "$dst = *(" #OpcodeStr #" *)($addr)",
+             [(set i32:$dst, (OpNode ADDRri:$addr))]>;
 
 let Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32" in {
   def LDW32 : LOADi32<BPF_W, BPF_MEM, "u32", load>;
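
The payoff of passing AsmString directly: an atomic variant added later can reuse the same encoding skeleton while supplying its own assembly syntax. A sketch of the idea with a hypothetical class name (the real load-acquire/store-release definitions arrive in patch 3/3):

```
// Sketch only: reuses STORE's encoding but swaps in atomic asm syntax.
class STORE_RELEASEi64<BPFWidthModifer Opc, string OpcodeStr>
    : STORE<Opc, BPF_ATOMIC, "store_release((" #OpcodeStr #" *)($addr), $src)",
            []>;
```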

>From b154221a54d10e34e5009c44c9587582e6769482 Mon Sep 17 00:00:00 2001
From: Peilin Ye 
Date: Tue, 10 Sep 2024 23:09:28 +
Subject: [PATCH 3/3] [BPF] Add load-acquire and store-release instructions
 under -mcpu=v5

As discussed in [1], introduce BPF instructions with load-acquire and
store-release semantics under -mcpu=v5.

A "load_acquire" is a BPF_LDX instruction with a new

[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)

2024-09-16 Thread Peilin Ye via cfe-commits

https://github.com/peilin-ye ready_for_review 
https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-10 Thread Peilin Ye via cfe-commits


@@ -622,6 +665,47 @@ let Predicates = [BPFHasLdsx] in {
 
 def LDD : LOADi64<BPF_DW, BPF_MEM, "u64", load>;
 
+class LOAD_ACQUIRE<BPFWidthModifer SizeOp, string OpcodeStr, RegisterClass RC>
+    : TYPE_LD_ST<BPF_ATOMIC.Value, SizeOp.Value,
+                 (outs RC:$dst),
+                 (ins MEMri:$addr),
+                 "$dst = load_acquire((" #OpcodeStr #" *)($addr))",
+                 []> {
+  bits<4> dst;
+  bits<20> addr;
+
+  let Inst{51-48} = dst;
+  let Inst{55-52} = addr{19-16}; // base reg
+  let Inst{47-32} = addr{15-0}; // offset
+  let Inst{7-4} = BPF_LOAD_ACQ.Value;

peilin-ye wrote:

Sure!  (I'm on leave for the rest of the week; will get back to this first 
thing next week)

https://github.com/llvm/llvm-project/pull/108636




[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)

2024-10-07 Thread Peilin Ye via cfe-commits

https://github.com/peilin-ye updated 
https://github.com/llvm/llvm-project/pull/108636

>From 7a3c9741ec7a5680b0a59ed7c0316a71c78ccf48 Mon Sep 17 00:00:00 2001
From: Peilin Ye 
Date: Sat, 5 Oct 2024 06:44:21 +
Subject: [PATCH 1/3] [BPF] Rename isST*() and isLD*() functions in
 BPFMISimplifyPatchable.cpp (NFC)

We are planning to add load (specifically, atomic acquiring load, or
"load-acquire") instructions under the STX instruction class.  To make
that easier, rename the isST*() and isLD*() helper functions based on
what the instructions actually do, rather than their instruction class.
---
 .../lib/Target/BPF/BPFMISimplifyPatchable.cpp | 22 +++++++++++-----------
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp b/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp
index 39390e8c38f8c1..4a1684ccebb793 100644
--- a/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp
+++ b/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp
@@ -94,35 +94,35 @@ void BPFMISimplifyPatchable::initialize(MachineFunction &MFParm) {
   LLVM_DEBUG(dbgs() << "*** BPF simplify patchable insts pass ***\n\n");
 }
 
-static bool isST(unsigned Opcode) {
+static bool isStoreImm(unsigned Opcode) {
   return Opcode == BPF::STB_imm || Opcode == BPF::STH_imm ||
          Opcode == BPF::STW_imm || Opcode == BPF::STD_imm;
 }
 
-static bool isSTX32(unsigned Opcode) {
+static bool isStore32(unsigned Opcode) {
   return Opcode == BPF::STB32 || Opcode == BPF::STH32 || Opcode == BPF::STW32;
 }
 
-static bool isSTX64(unsigned Opcode) {
+static bool isStore64(unsigned Opcode) {
   return Opcode == BPF::STB || Opcode == BPF::STH || Opcode == BPF::STW ||
          Opcode == BPF::STD;
 }
 
-static bool isLDX32(unsigned Opcode) {
+static bool isLoad32(unsigned Opcode) {
   return Opcode == BPF::LDB32 || Opcode == BPF::LDH32 || Opcode == BPF::LDW32;
 }
 
-static bool isLDX64(unsigned Opcode) {
+static bool isLoad64(unsigned Opcode) {
   return Opcode == BPF::LDB || Opcode == BPF::LDH || Opcode == BPF::LDW ||
          Opcode == BPF::LDD;
 }
 
-static bool isLDSX(unsigned Opcode) {
+static bool isLoadSext(unsigned Opcode) {
   return Opcode == BPF::LDBSX || Opcode == BPF::LDHSX || Opcode == BPF::LDWSX;
 }
 
 bool BPFMISimplifyPatchable::isLoadInst(unsigned Opcode) {
-  return isLDX32(Opcode) || isLDX64(Opcode) || isLDSX(Opcode);
+  return isLoad32(Opcode) || isLoad64(Opcode) || isLoadSext(Opcode);
 }
 
 void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI,
@@ -143,11 +143,11 @@ void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI,
     MachineInstr *DefInst = MO.getParent();
     unsigned Opcode = DefInst->getOpcode();
     unsigned COREOp;
-    if (isLDX64(Opcode) || isLDSX(Opcode))
+    if (isLoad64(Opcode) || isLoadSext(Opcode))
      COREOp = BPF::CORE_LD64;
-    else if (isLDX32(Opcode))
+    else if (isLoad32(Opcode))
      COREOp = BPF::CORE_LD32;
-    else if (isSTX64(Opcode) || isSTX32(Opcode) || isST(Opcode))
+    else if (isStore64(Opcode) || isStore32(Opcode) || isStoreImm(Opcode))
      COREOp = BPF::CORE_ST;
     else
      continue;
@@ -160,7 +160,7 @@ void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI,
     // Reject the form:
     //   %1 = ADD_rr %2, %3
     //   *(type *)(%2 + 0) = %1
-    if (isSTX64(Opcode) || isSTX32(Opcode)) {
+    if (isStore64(Opcode) || isStore32(Opcode)) {
       const MachineOperand &Opnd = DefInst->getOperand(0);
       if (Opnd.isReg() && Opnd.getReg() == MO.getReg())
         continue;

>From b3e494c9e4a36c68134c3bfda81b2068c1c08bca Mon Sep 17 00:00:00 2001
From: Peilin Ye 
Date: Sat, 5 Oct 2024 07:31:54 +
Subject: [PATCH 2/3] [BPF] Add load-acquire and store-release instructions
 under -mcpu=v4

As discussed in [1], introduce BPF instructions with load-acquire and
store-release semantics under -mcpu=v4.

A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
field set to BPF_LOAD_ACQ (0x1, acquiring atomic load).  Similarly, a
"store-release" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
field set to BPF_STORE_REL (0xb, releasing atomic store).

Unlike existing atomic operations that only support BPF_W (32-bit) and
BPF_DW (64-bit) size modifiers, load-acquires and store-releases also
support BPF_B (8-bit) and BPF_H (16-bit).  An 8- or 16-bit load-acquire
zero-extends the value before writing it to a 32-bit register, just like
ARM64 instruction LDAPRH and friends.

For example:

  long foo(long *ptr) {
      return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
  }

foo() can be compiled to:

  db 10 00 00 10 00 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))
  95 00 00 00 00 00 00 00  exit

  opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
  imm (0x0010): BPF_LOAD_ACQ

Similarly:

  void bar(short *ptr, short val) {
      __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
  }

bar() can be compiled to:

  cb 21 00 00 b0 00 00 00  store_release((u16 *)(r1 + 0x0), w2)
  95 00 00 00 00 00 00 00  exit

  opco
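
For what it's worth, the opcode/imm breakdown above can be checked mechanically. A minimal C sketch, using the BPF_* opcode constants from linux/bpf.h plus this patchset's tentative BPF_LOAD_ACQ value (0x1 in bits 4-7 of imm, an encoding still under discussion in this thread):

```c
#include <assert.h>
#include <stdint.h>

#define BPF_STX      0x03       /* instruction class */
#define BPF_DW       0x18       /* 64-bit size modifier */
#define BPF_ATOMIC   0xc0       /* mode modifier */
#define BPF_LOAD_ACQ (0x1 << 4) /* tentative: 0x1 in imm{7-4} */

int main(void) {
    uint8_t opcode = BPF_ATOMIC | BPF_DW | BPF_STX;
    assert(opcode == 0xdb);       /* byte 0 of "db 10 00 00 10 00 00 00" */
    assert(BPF_LOAD_ACQ == 0x10); /* imm (0x0010): BPF_LOAD_ACQ */
    return 0;
}
```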



[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-16 Thread Peilin Ye via cfe-commits

peilin-ye wrote:

Back on this; rebased and pushed patchset v6 to make `acquire-release.ll` also 
cover `-march=bpfeb` as suggested by Eduard.  Basically:
```
; CHECK-LE: w0 = load_acquire((u8 *)(r1 + 0)) # encoding: 
[0xd3,0x10,0x00,0x00,0x10,0x00,0x00,0x00]
; CHECK-BE: w0 = load_acquire((u8 *)(r1 + 0)) # encoding: 
[0xd3,0x01,0x00,0x00,0x00,0x00,0x00,0x10]
```
`src_reg` and `dst_reg` are swapped, and `imm` is in big endian.  Currently 
focusing on making kernel-side changes (verifier and JITs).
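
For readers comparing the two encodings byte by byte: the difference is in byte 1 (the register nibbles swap) and in where imm's significant byte lands. A small C sketch of the nibble decoding, following the bitfield layout of struct bpf_insn in linux/bpf.h (`decode_regs` is a hypothetical helper, not part of the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Decode the regs byte (byte 1) of a BPF instruction. */
static void decode_regs(uint8_t regs, int big_endian, int *dst, int *src) {
    if (big_endian) {   /* bpfeb: dst_reg occupies the high nibble */
        *dst = regs >> 4;
        *src = regs & 0xf;
    } else {            /* bpfel: dst_reg occupies the low nibble */
        *dst = regs & 0xf;
        *src = regs >> 4;
    }
}

int main(void) {
    int dst, src;
    decode_regs(0x10, /*big_endian=*/0, &dst, &src); /* CHECK-LE byte */
    assert(dst == 0 && src == 1);                    /* w0, (r1 + 0) */
    decode_regs(0x01, /*big_endian=*/1, &dst, &src); /* CHECK-BE byte */
    assert(dst == 0 && src == 1);
    return 0;
}
```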
- - -
That weird `fatal: reference is not a tree:` CI error looks similar to [the one 
mentioned in PR 
73506](https://github.com/llvm/llvm-project/pull/73506#issuecomment-1832909676).
  Running check-all locally for now...

https://github.com/llvm/llvm-project/pull/108636




[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-16 Thread Peilin Ye via cfe-commits

https://github.com/peilin-ye updated 
https://github.com/llvm/llvm-project/pull/108636


>From ee7836ecbccd745417d145690d3abf822ecc85c0 Mon Sep 17 00:00:00 2001
From: Peilin Ye 
Date: Sat, 5 Oct 2024 07:31:54 +
Subject: [PATCH 2/3] [BPF] Add load-acquire and store-release instructions
 under -mcpu=v4

As discussed in [1], introduce BPF instructions with load-acquire and
store-release semantics under -mcpu=v4.

A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
field set to BPF_LOAD_ACQ (0x1, acquiring atomic load).  Similarly, a
"store-release" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
field set to BPF_STORE_REL (0xb, releasing atomic store).

Unlike existing atomic operations that only support BPF_W (32-bit) and
BPF_DW (64-bit) size modifiers, load-acquires and store-releases also
support BPF_B (8-bit) and BPF_H (16-bit).  An 8- or 16-bit load-acquire
zero-extends the value before writing it to a 32-bit register, just like
ARM64 instruction LDAPRH and friends.

As an example, for -march=bpfel:

  long foo(long *ptr) {
      return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
  }

foo() can be compiled to:

  db 10 00 00 10 00 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))
  95 00 00 00 00 00 00 00  exit

  opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
  imm (0x0010): BPF_LOAD_ACQ

Similarly:

  void bar(short *ptr, short val) {
      __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
  }

bar() can be compiled to:

  cb 21 00 00 b0 00 00 00  store_release((u16 *)(r1 + 0x0), w2)
  95 00 00 00 00 00

[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-16 Thread Peilin Ye via cfe-commits


@@ -1205,10 +1298,19 @@ class LOADi32<BPFWidthModifer SizeOp, BPFModeModifer ModOp, string OpcodeStr,
     : LOAD32<SizeOp, ModOp, "$dst = *(" #OpcodeStr #" *)($addr)",
              [(set i32:$dst, (OpNode ADDRri:$addr))]>;
 
+class LOAD_ACQUIREi32<BPFWidthModifer SizeOp, string OpcodeStr>
+    : LOAD_ACQUIRE<SizeOp, OpcodeStr, GPR32>;
+
 let Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32" in {
   def LDW32 : LOADi32<BPF_W, BPF_MEM, "u32", load>;
   def LDH32 : LOADi32<BPF_H, BPF_MEM, "u16", zextloadi16>;
   def LDB32 : LOADi32<BPF_B, BPF_MEM, "u8", zextloadi8>;
+
+  let Predicates = [BPFHasLoadAcquire] in {

peilin-ye wrote:

> Is there a reason to define these (and stores) behind `BPFHasALU32` flag?

Not really.  Looks like there is no way to turn off `HasALU32` for v4.  I just 
wanted to make it clear in the `.td` file that we are using half-registers for 
8-, 16- and 32-bit insns.

Also I wanted to define them behind `DecoderNamespace = "BPFALU32"` because 
currently `BPFDisassembler::getInstruction()` uses `DecoderTableBPFALU3264` for 
all non-64-bit `BPF_STX | BPF_ATOMIC` insns:

```cpp
  if ((InstClass == BPF_LDX || InstClass == BPF_STX) &&
  getInstSize(Insn) != BPF_DW &&
  (InstMode == BPF_MEM || InstMode == BPF_ATOMIC) &&
  STI.hasFeature(BPF::ALU32))
Result = decodeInstruction(DecoderTableBPFALU3264, Instr, Insn, Address,
   this, STI);
  else
Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address, this,
   STI);
```
So if I move them out of `HasALU32` and keep BPFDisassembler.cpp as-is, 
llvm-objdump would give me:
```
0000000000000000 <bar>:
; __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
   0:   cb 21 00 00 b0 00 00 00 
; }
   1:   95 00 00 00 00 00 00 00 exit
```
- - -
> From instruction encoding point of view we use STX class, which has no 32/64
> sub-division.

What I'm doing is similar to `XXOR*`: right now we only have `XXORD` and 
`XXORW32` (both `BPF_STX`), and `XXORW32` is behind `Predicates = 
[BPFHasALU32], DecoderNamespace = "BPFALU32"`.  There is no "ALU64 version" of 
`XXORW32`.

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-16 Thread Peilin Ye via cfe-commits

https://github.com/peilin-ye updated 
https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-07 Thread Peilin Ye via cfe-commits

peilin-ye wrote:

Thanks for all your suggestions!  Main changes in patchset v5:

1. use `-mcpu=v4` instead of `-mcpu=v5`
2. make both load-acquire and store-release `BPF_STX | BPF_ATOMIC` insns, 
instead of introducing new mode modifiers:

`BPF_STX | BPF_ATOMIC` insns use a subset of `BPFArithOp<>` in bit `4-7` of the 
`imm` field.  Currently these are in use:

```
def BPF_ADD  : BPFArithOp<0x0>;
def BPF_OR   : BPFArithOp<0x4>;
def BPF_AND  : BPFArithOp<0x5>;
def BPF_XOR  : BPFArithOp<0xa>;
def BPF_XCHG: BPFArithOp<0xe>;
def BPF_CMPXCHG : BPFArithOp<0xf>;
```

On top of this, I tentatively chose `0x1` for `BPF_LOAD_ACQ`, and `0xb` for 
`BPF_STORE_REL`, because:
 - `0x1` is `BPF_SUB` in `BPFArithOp<>`, but atomic SUB is implemented using a 
NEG + ADD:
```
// atomic_load_sub can be represented as a neg followed
// by an atomic_load_add.
```
 - `0xb` is `BPF_MOV` in `BPFArithOp<>`, which is "moving value between 
registers" and isn't applicable for atomic (memory) operations

Please let me know if you have better suggestions for encoding (or if I'm 
overthinking this and we can simply use `0x1` and `0x2`).
- - -
@eddyz87, do my changes to `BPFMISimplifyPatchable.cpp` still look good to you? 
 Now that load-acquires are `STX` insns, I wanted to make sure that 
`BPFMISimplifyPatchable::checkADDrr()` can still handle them correctly for 
CO-RE.

I'll add tests to `llvm/test/CodeGen/BPF/CORE/` later, like what you did in 
commit 08d92dedd26c ("[BPF] Fix in/out argument constraints for CORE_MEM 
instructions").
- - -
About whether we should sign-extend for 8- and 16-bit load-acquires (brought up 
by Yonghong):

All ARM64 insns that match `acquiring_load` seem to zero-extend the value
before writing it to the register, like `LDAPRH`:

> Load-Acquire RCpc Register Halfword derives an address from a base register 
> value, loads a halfword from the derived address in memory, zero-extends it 
> and writes it to a register.

So right now I'm keeping our `LDBACQ32` and `LDHACQ32` to zero-extend.  I'll 
take a look at other archs later.
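
In C terms, the chosen zero-extension semantics look like this (a sketch; the function name is illustrative):

```c
unsigned int load8_acquire(unsigned char *p) {
    /* With this patchset: w0 = load_acquire((u8 *)(r1 + 0)).  The upper
     * 24 bits of the 32-bit destination are zeroed, so *p == 0xff reads
     * back as 255, never -1, matching ARM64 LDAPRB/LDAPRH behavior. */
    return __atomic_load_n(p, __ATOMIC_ACQUIRE);
}
```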

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)

2024-10-03 Thread Peilin Ye via cfe-commits


@@ -522,6 +526,28 @@ let Predicates = [BPFNoALU32] in {
 }
 def STD : STOREi64<BPF_DW, "u64", store>;
 
+class relaxed_store<PatFrag base>
+  : PatFrag<(ops node:$val, node:$ptr), (base node:$val, node:$ptr)> {
+  let IsAtomic = 1;
+  let IsAtomicOrderingReleaseOrStronger = 0;
+}
+
+class releasing_store<PatFrag base>
+  : PatFrag<(ops node:$val, node:$ptr), (base node:$val, node:$ptr)> {
+  let IsAtomic = 1;
+  let IsAtomicOrderingRelease = 1;
+}

peilin-ye wrote:

Got it, thanks for the input!  I also found this in [Instruction-Level BPF 
Memory 
Model](https://docs.google.com/document/d/1TaSEfWfLnRUi5KqkavUQyL2tThJXYWHS15qcbxIsFb0/edit?usp=sharing),
 section "Atomic Loads" :

> For `__ATOMIC_RELAXED`, these can simply be implemented as plain BPF load 
> instructions.
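
Concretely, the `relaxed_store<...>` PatFrag quoted above is what lets a relaxed atomic store fall through to an ordinary BPF store. A sketch, assuming -mcpu=v4 with this patchset applied:

```c
void set_flag(int *p) {
    /* Not release-or-stronger, so it matches the relaxed pattern and
     * lowers to a plain *(u32 *)(r1 + 0) = w2 store, with no BPF_ATOMIC
     * encoding involved. */
    __atomic_store_n(p, 1, __ATOMIC_RELAXED);
}
```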

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-22 Thread Peilin Ye via cfe-commits

peilin-ye wrote:

> In the above, you will do unsigned load extension. But what about signed
extension load variant?

@yonghong-song, if we ever need that, we can add a new flag to bit `0-3` of 
`imm` to indicate "this 8- or 16-bit load-acquire is sign-extending", just like 
`BPF_FETCH` for `BPF_XCHG` and `BPF_CMPXCHG`.

However, I wonder how to generate a sign-extending (`sext`), acquiring
`ATOMIC_LOAD` in SelectionDAG?  If I compile the following program:
```c
// clang --target=bpf -mcpu=v4 -mllvm -debug-only=isel-dump -O2 -g -c -o demo.bpf.o demo.bpf.c
long bar(char *ptr) {
    return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
}
```
I get this:
```
Optimized legalized selection DAG: %bb.0 'bar:entry'
SelectionDAG has 10 nodes:
  t0: ch,glue = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %0
  t8: i32,ch = AtomicLoad<(load acquire (s8) from %ir.ptr), zext from i8> t0, 
t2, demo.bpf.c:3:12
  t9: i64 = any_extend t8, demo.bpf.c:3:12
t11: i64 = sign_extend_inreg t9, ValueType:ch:i8, demo.bpf.c:3:12
  t6: ch,glue = CopyToReg t8:1, Register:i64 $r0, t11, demo.bpf.c:3:5
  t7: ch = BPFISD::RET_GLUE t6, Register:i64 $r0, t6:1, demo.bpf.c:3:5
```
The `AtomicLoad` node (`t8`) still says `zext` (zero-extending).  The program 
would still work because it does an 8-to-64-bit `MOVSX` after the load-acquire:
```
; return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
   0:   d3 11 00 00 10 00 00 00 w1 = load_acquire((u8 *)(r1 + 0x0))
   1:   bf 10 08 00 00 00 00 00 r0 = (s8)r1
   2:   95 00 00 00 00 00 00 00 exit
```
`r0 = (s8)r1` takes the lowest byte of `r1` and sign-extends it into 64-bit, so 
it doesn't matter if the load-acquire zero- or sign-extended the char into 
32-bit.
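
The equivalence is easy to check exhaustively (plain C, not BPF):

```c
#include <assert.h>
#include <stdint.h>

int main(void) {
    for (int v = 0; v < 256; v++) {
        uint32_t w1 = (uint8_t)v;         /* zext result of the load-acquire */
        int64_t r0 = (int8_t)(w1 & 0xff); /* r0 = (s8)r1 */
        assert(r0 == (int8_t)v);          /* same as sign-extending the byte */
    }
    return 0;
}
```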

I still wonder if there's a way to change that `AtomicLoad` node (`t8`) from 
`zext` to `sext` though.

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-22 Thread Peilin Ye via cfe-commits

peilin-ye wrote:

Another problem is, right now, if I do this:
```diff
--- a/llvm/lib/Target/BPF/BPFInstrInfo.td
+++ b/llvm/lib/Target/BPF/BPFInstrInfo.td
@@ -1343,11 +1343,11 @@ let Predicates = [BPFHasALU32] in {
 
   let Predicates = [BPFHasLoadAcquire] in {
     foreach P = [[relaxed_load<atomic_load_32>, LDWACQ32],
-                 [relaxed_load<atomic_load_az_16>, LDHACQ32],
-                 [relaxed_load<atomic_load_az_8>, LDBACQ32],
+                 [relaxed_load<atomic_load_zext_16>, LDHACQ32],
+                 [relaxed_load<atomic_load_zext_8>, LDBACQ32],
                  [acquiring_load<atomic_load_32>, LDWACQ32],
-                 [acquiring_load<atomic_load_az_16>, LDHACQ32],
-                 [acquiring_load<atomic_load_az_8>, LDBACQ32],
+                 [acquiring_load<atomic_load_zext_16>, LDHACQ32],
+                 [acquiring_load<atomic_load_zext_8>, LDBACQ32],
                 ] in {
       def : Pat<(P[0] ADDRri:$addr), (P[1] ADDRri:$addr)>;
     }
```
The above toy program no longer compiles:
```
fatal error: error in backend: Cannot select: t9: i32,ch = AtomicLoad<(load 
acquire (s8) from %ir.ptr), zext from i8> t0, t2, demo.bpf.c:3:12
  t2: i64,ch = CopyFromReg t0, Register:i64 %0
t1: i64 = Register %0
In function: bar
...
```
This is because commit 1ee6ce9bad4d ("GlobalISel: Allow forming atomic/volatile 
G_ZEXTLOAD") did this in `llvm/utils/TableGen/CodeGenDAGPatterns.cpp`:
```diff
@@ -1105,6 +1109,10 @@ std::string TreePredicateFn::getPredCode() const {
     Code += "if (isReleaseOrStronger(cast<AtomicSDNode>(N)->getMergedOrdering())) "
             "return false;\n";
 
+  // TODO: Handle atomic sextload/zextload normally when ATOMIC_LOAD is removed.
+  if (isAtomic() && (isZeroExtLoad() || isSignExtLoad()))
+    Code += "return false;\n";
+
   if (isLoad() || isStore()) {
     StringRef SDNodeName = isLoad() ? "LoadSDNode" : "StoreSDNode";
```
As a result, nothing would match `atomic_load_zext_{8,16}` because these 
patterns are **explicitly** marked as `zext` (`isZeroExtLoad()`).
- - -
If our plan is to:
 - generate `zext` BPF 8- and 16-bit load-acquire for `atomic_load_zext_{8,16}`
 - (if needed) generate `sext` BPF 8- and 16-bit load-acquire for 
`atomic_load_sext_{8,16}`

Then this problem needs to be solved, and I can include the author of 
1ee6ce9bad4d into discussion.

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-28 Thread Peilin Ye via cfe-commits


@@ -48,6 +48,24 @@ def BPF_END  : BPFArithOp<0xd>;
 def BPF_XCHG: BPFArithOp<0xe>;
 def BPF_CMPXCHG : BPFArithOp<0xf>;
 
+class BPFAtomicLoadStoreOp<bits<4> val> {
+  bits<4> Value = val;
+}
+
+def BPF_ATOMIC_LOAD : BPFAtomicLoadStoreOp<0x1>;
+def BPF_ATOMIC_STORE : BPFAtomicLoadStoreOp<0x2>;
+
+class BPFAtomicOrdering<bits<4> val> {
+  bits<4> Value = val;
+}
+
+def BPF_RELAXED : BPFAtomicOrdering<0x0>;
+def BPF_CONSUME : BPFAtomicOrdering<0x1>;

peilin-ye wrote:

Sure, let me delete it.

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-28 Thread Peilin Ye via cfe-commits

https://github.com/peilin-ye updated 
https://github.com/llvm/llvm-project/pull/108636


>From 77807685b76607db97c18bb5dd146203c675ab98 Mon Sep 17 00:00:00 2001
From: Peilin Ye 
Date: Sat, 5 Oct 2024 07:31:54 +
Subject: [PATCH 2/3] [BPF] Add load-acquire and store-release instructions
 under -mcpu=v4

As discussed in [1], introduce BPF instructions with load-acquire and
store-release semantics under -mcpu=v4.

The following new flags are defined:

  BPF_ATOMIC_LOAD   0x10
  BPF_ATOMIC_STORE  0x20

  BPF_RELAXED:   0x0
  BPF_ACQUIRE:   0x1
  BPF_RELEASE:   0x2
  BPF_ACQ_REL:   0x3
  BPF_SEQ_CST:   0x4

A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
field set to BPF_ATOMIC_LOAD | BPF_ACQUIRE (0x11).

Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with
the 'imm' field set to BPF_ATOMIC_STORE | BPF_RELEASE (0x22).

Unlike existing atomic operations that only support BPF_W (32-bit) and
BPF_DW (64-bit) size modifiers, load-acquires and store-releases also
support BPF_B (8-bit) and BPF_H (16-bit).  An 8- or 16-bit load-acquire
zero-extends the value before writing it to a 32-bit register, just like
ARM64 instruction LDAPRH and friends.

As an example, for -march=bpfel (little-endian):

  long foo(long *ptr) {
      return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
  }

foo() can be compiled to:

  db 10 00 00 11 00 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))
  95 00 00 00 00 00 00 00  exit

  opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
  imm (0x0011): BPF_ATOMIC_L

[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-28 Thread Peilin Ye via cfe-commits

peilin-ye wrote:

Changes in v7:

1. Change encoding to make it easier to support `SEQ_CST` in the future 
(Yingchi)
```
 (before)  | imm{7-4}| imm{3-0} |
 - | --- |  |
 load-acquire  | BPF_LOAD_ACQ (0x1)  | 0x0  |
 store-release | BPF_STORE_REL (0x2) | 0x0  |
```
```
 (after)   | imm{7-4}| imm{3-0}  |
 - | --- | - |
 load-acquire  | BPF_LOAD (0x1)  | BPF_ACQUIRE (0x1) |
 store-release | BPF_STORE (0x2) | BPF_RELEASE (0x2) |
```
Introduce the following new flags for memory orders:
```
def BPF_RELAXED : BPFAtomicOrdering<0x0>;
def BPF_ACQUIRE : BPFAtomicOrdering<0x1>;
def BPF_RELEASE : BPFAtomicOrdering<0x2>;
def BPF_ACQ_REL : BPFAtomicOrdering<0x3>;
def BPF_SEQ_CST : BPFAtomicOrdering<0x4>;
```
Include all orders defined by the C++ standard except `CONSUME` (Yonghong); a quick sketch of how these compose into `imm` follows the list below.

2. `BPFISelLowering.cpp` changes (Matt)
   - Avoid unnecessarily setting `MVT::i{8,16}` `ATOMIC_{LOAD,STORE}` to 
`Custom`
   - Coding style change

3. `acquire-release.ll` changes (Yingchi)
   - Move `; CHECK` labels into their corresponding function bodies
   - Delete the `; Source:` section

4. Update `bpf-predefined-macros.c`
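
As referenced in item 1, a quick sketch of how the new layout composes `imm`, using the flag values from the tables above:

```c
#include <assert.h>

#define BPF_LOAD    0x1 /* imm{7-4} */
#define BPF_STORE   0x2
#define BPF_ACQUIRE 0x1 /* imm{3-0} */
#define BPF_RELEASE 0x2

int main(void) {
    assert(((BPF_LOAD  << 4) | BPF_ACQUIRE) == 0x11); /* load-acquire  */
    assert(((BPF_STORE << 4) | BPF_RELEASE) == 0x22); /* store-release */
    return 0;
}
```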

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-24 Thread Peilin Ye via cfe-commits


@@ -48,6 +48,13 @@ def BPF_END  : BPFArithOp<0xd>;
 def BPF_XCHG: BPFArithOp<0xe>;
 def BPF_CMPXCHG : BPFArithOp<0xf>;
 
+class BPFAtomicLoadStoreOp<bits<4> val> {
+  bits<4> Value = val;
+}
+
+def BPF_LOAD_ACQ : BPFAtomicLoadStoreOp<0x1>;
+def BPF_STORE_REL : BPFAtomicLoadStoreOp<0xb>;

peilin-ye wrote:

@yonghong-song, my thinking was:

Right now for `BPF_ATOMIC` insns, we use a subset of `BPFArithOp<>` in `imm`:
```
def BPF_ADD  : BPFArithOp<0x0>;
def BPF_OR   : BPFArithOp<0x4>;
def BPF_AND  : BPFArithOp<0x5>;
def BPF_XOR  : BPFArithOp<0xa>;
def BPF_XCHG: BPFArithOp<0xe>;
def BPF_CMPXCHG : BPFArithOp<0xf>;
```
Will we ever want to support other `BPFArithOp<>` ops in `BPF_ATOMIC` ?  Like, 
`0x1` is `BPF_SUB`, but it looks like currently atomic SUB is implemented using 
NEG + ADD:
```
// atomic_load_sub can be represented as a neg followed
// by an atomic_load_add.
```
So I thought, "OK, if we don't need to reserve `0x1` for `BPF_SUB`, then I can 
use it".  Similarly, `0xb` is `BPF_MOV` (moving value between registers), and I 
think it's safe to say we'll never need it for `BPF_ATOMIC`.

But `0x2` is `BPF_MUL`; will we ever support atomic multiplication? :-)
- - -
Am I overthinking this?  Should I simply use `0x1` and `0x2` ?

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-11-04 Thread Peilin Ye via cfe-commits


@@ -0,0 +1,142 @@
+; RUN: llc < %s -march=bpfel -mcpu=v4 -verify-machineinstrs -show-mc-encoding \
+; RUN:   | FileCheck -check-prefixes=CHECK-LE %s
+; RUN: llc < %s -march=bpfeb -mcpu=v4 -verify-machineinstrs -show-mc-encoding \

peilin-ye wrote:

(replied above)

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-11-04 Thread Peilin Ye via cfe-commits


@@ -0,0 +1,142 @@
+; RUN: llc < %s -march=bpfel -mcpu=v4 -verify-machineinstrs -show-mc-encoding \

peilin-ye wrote:

> again

Thanks for pointing this out, but I do want to test the MC encoding.  It looks 
like at the moment a lot of tests under `llvm/test/CodeGen/BPF/` are doing this:

```shell
$ git grep --files-with-matches '\-show-mc-encoding' -- llvm/test/CodeGen/BPF/ 
| wc -l
18
```

I think we can do a separate PR to move all the "test MC encoding" bits from 
`llvm/test/CodeGen/BPF/` to `llvm/test/MC/BPF/`, if needed.

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-31 Thread Peilin Ye via cfe-commits

peilin-ye wrote:

Pushed v8 to make it generate plain (`BPF_MEM`) loads and stores if user 
requested `__ATOMIC_RELAXED`, as suggested by Yonghong.  Updated commit message 
accordingly.

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-31 Thread Peilin Ye via cfe-commits

peilin-ye wrote:

Sure Yonghong, I can include that in this PR.  Similarly, since we cannot have 
both `relaxed_load<atomic_load_zext_8>` and `relaxed_load<atomic_load_sext_8>`, 
I'll keep it `zext` (`BPF_MEM` | `BPF_LDX`) for now.  For example:
```c
int foo(char *ptr) {
    return __atomic_load_n(ptr, __ATOMIC_RELAXED);
}
```
This'll be compiled into:
```
0000000000000000 <foo>:
   0:   71 11 00 00 00 00 00 00 w1 = *(u8 *)(r1 + 0x0)
   1:   bc 10 08 00 00 00 00 00 w0 = (s8)w1
   2:   95 00 00 00 00 00 00 00 exit
```


https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-26 Thread Peilin Ye via cfe-commits

https://github.com/peilin-ye updated 
https://github.com/llvm/llvm-project/pull/108636


>From 4b3b71ffebca7fcc1c083e9f269f08332cbdff0b Mon Sep 17 00:00:00 2001
From: Peilin Ye 
Date: Sat, 5 Oct 2024 07:31:54 +
Subject: [PATCH 2/3] [BPF] Add load-acquire and store-release instructions
 under -mcpu=v4

As discussed in [1], introduce BPF instructions with load-acquire and
store-release semantics under -mcpu=v4.

The following new flags are defined:

  BPF_ATOMIC_LOAD   0x10
  BPF_ATOMIC_STORE  0x20

  BPF_RELAXED:   0x0
  BPF_CONSUME:   0x1
  BPF_ACQUIRE:   0x2
  BPF_RELEASE:   0x3
  BPF_ACQ_REL:   0x4
  BPF_SEQ_CST:   0x5

A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
field set to BPF_ATOMIC_LOAD | BPF_ACQUIRE (0x12).

Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with
the 'imm' field set to BPF_ATOMIC_STORE | BPF_RELEASE (0x23).

Unlike existing atomic operations that only support BPF_W (32-bit) and
BPF_DW (64-bit) size modifiers, load-acquires and store-releases also
support BPF_B (8-bit) and BPF_H (16-bit).  An 8- or 16-bit load-acquire
zero-extends the value before writing it to a 32-bit register, just like
ARM64 instruction LDAPRH and friends.

As an example, for -march=bpfel (little-endian):

  long foo(long *ptr) {
      return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
  }

foo() can be compiled to:

  db 10 00 00 12 00 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))
  95 00 00 00 00 00 00 00  exit

  opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
  imm (

[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-23 Thread Peilin Ye via cfe-commits

peilin-ye wrote:

Hi @arsenm, I'm trying to match `atomic_load_zext_{8,16}` in the BPF backend but 
it doesn't work:
```
fatal error: error in backend: Cannot select: t8: i32,ch = AtomicLoad<(load 
acquire (s8) from %ir.ptr), zext from i8> t0, t2, demo.bpf.c:3:12
  t2: i64,ch = CopyFromReg t0, Register:i64 %0
t1: i64 = Register %0
...
```
After some digging, I think it is due to this TableGen 
(`llvm/utils/TableGen/CodeGenDAGPatterns.cpp`) change in commit 1ee6ce9bad4d 
("GlobalISel: Allow forming atomic/volatile G_ZEXTLOAD"):
```diff
@@ -1105,6 +1109,10 @@ std::string TreePredicateFn::getPredCode() const {
     Code += "if (isReleaseOrStronger(cast<AtomicSDNode>(N)->getMergedOrdering())) "
             "return false;\n";
 
+  // TODO: Handle atomic sextload/zextload normally when ATOMIC_LOAD is removed.
+  if (isAtomic() && (isZeroExtLoad() || isSignExtLoad()))
+    Code += "return false;\n";
+
   if (isLoad() || isStore()) {
     StringRef SDNodeName = isLoad() ? "LoadSDNode" : "StoreSDNode";
```
So I'm having this in `BPFGenDAGISel.inc`:
```cpp
  case 34: {
    // Predicate_atomic_load_zext
    SDNode *N = Node;
    (void)N;
    return false;
    return true;
  }
```
It unconditionally returns `false`, so nothing would match 
`atomic_load_zext_{8,16}`.
- - -
Could you please shed some light on how to match `atomic_load_{s,z}ext_{8,16}`? 
 Should I learn about GlobalISel?  I'm aware that there's 
`atomic_load_az_{8,16}` (used by AArch64), but I need the `*_zext_*` and 
`*_sext_*` versions here.

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-25 Thread Peilin Ye via cfe-commits


@@ -703,6 +715,39 @@ SDValue BPFTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const {
   return DAG.getNode(BPFISD::SELECT_CC, DL, VTs, Ops);
 }
 
+SDValue BPFTargetLowering::LowerATOMIC_LOAD(SDValue Op,
+SelectionDAG &DAG) const {
+  const char *Msg =
+  "sequentially consistent (seq_cst) atomic load is not supported";
+  SDNode *N = Op.getNode();
+  SDLoc DL(N);
+
+  if (cast<AtomicSDNode>(N)->getMergedOrdering() ==
+      AtomicOrdering::SequentiallyConsistent)
+    fail(DL, DAG, Msg);
+
+  return Op;
+}
+
+SDValue BPFTargetLowering::LowerATOMIC_STORE(SDValue Op,
+ SelectionDAG &DAG) const {
+  const char *Msg =
+  "sequentially consistent (seq_cst) atomic store is not supported";
+  EVT VT = Op.getOperand(1).getValueType();
+  SDNode *N = Op.getNode();
+  SDLoc DL(N);
+
+  // Promote operand #1 (value to store) if necessary.
+  if (!isTypeLegal(VT))
+return SDValue();
+
+  if (cast<AtomicSDNode>(N)->getMergedOrdering() ==
+      AtomicOrdering::SequentiallyConsistent)
+    fail(DL, DAG, Msg);

peilin-ye wrote:

We're starting from `ACQUIRE` and `RELEASE`.  `SEQ_CST` will be supported if 
needed.

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-25 Thread Peilin Ye via cfe-commits


@@ -703,6 +715,39 @@ SDValue BPFTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const {
   return DAG.getNode(BPFISD::SELECT_CC, DL, VTs, Ops);
 }
 
+SDValue BPFTargetLowering::LowerATOMIC_LOAD(SDValue Op,
+SelectionDAG &DAG) const {
+  const char *Msg =
+  "sequentially consistent (seq_cst) atomic load is not supported";
+  SDNode *N = Op.getNode();
+  SDLoc DL(N);
+
+  if (cast<AtomicSDNode>(N)->getMergedOrdering() ==
+      AtomicOrdering::SequentiallyConsistent)
+    fail(DL, DAG, Msg);
+
+  return Op;
+}
+
+SDValue BPFTargetLowering::LowerATOMIC_STORE(SDValue Op,
+ SelectionDAG &DAG) const {
+  const char *Msg =
+  "sequentially consistent (seq_cst) atomic store is not supported";
+  EVT VT = Op.getOperand(1).getValueType();
+  SDNode *N = Op.getNode();
+  SDLoc DL(N);
+
+  // Promote operand #1 (value to store) if necessary.
+  if (!isTypeLegal(VT))
+return SDValue();

peilin-ye wrote:

Ah!  I see, thanks!  So:
```diff
--- a/llvm/lib/Target/BPF/BPFISelLowering.cpp
+++ b/llvm/lib/Target/BPF/BPFISelLowering.cpp
@@ -93,7 +93,7 @@ BPFTargetLowering::BPFTargetLowering(const TargetMachine &TM,
 setOperationAction(ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS, VT, Custom);
   }
 
-  for (auto VT : {MVT::i8, MVT::i16, MVT::i32, MVT::i64}) {
+  for (auto VT : {MVT::i32, MVT::i64}) {
 setOperationAction(ISD::ATOMIC_LOAD, VT, Custom);
 setOperationAction(ISD::ATOMIC_STORE, VT, Custom);
   }
@@ -737,10 +737,6 @@ SDValue BPFTargetLowering::LowerATOMIC_STORE(SDValue Op,
   SDNode *N = Op.getNode();
   SDLoc DL(N);
 
-  // Promote operand #1 (value to store) if necessary.
-  if (!isTypeLegal(VT))
-    return SDValue();
-
   if (cast<AtomicSDNode>(N)->getMergedOrdering() ==
       AtomicOrdering::SequentiallyConsistent)
     fail(DL, DAG, Msg);
```

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-25 Thread Peilin Ye via cfe-commits


@@ -703,6 +715,39 @@ SDValue BPFTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const {
   return DAG.getNode(BPFISD::SELECT_CC, DL, VTs, Ops);
 }
 
+SDValue BPFTargetLowering::LowerATOMIC_LOAD(SDValue Op,
+SelectionDAG &DAG) const {
+  const char *Msg =
+  "sequentially consistent (seq_cst) atomic load is not supported";

peilin-ye wrote:

OK, let me change it in the next push.

> Plus I think such an error should be relegated to an unsupported legalize 
> action.

Sorry, could you explain your suggestion a bit more?  Are you suggesting that 
we add an `Unsupported` enum value to `LegalizeAction`?

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-24 Thread Peilin Ye via cfe-commits

peilin-ye wrote:

> So your change looks good. Once you get some kernel work in reasonable shape, 
> this patch can land. Thanks!

Sure, thanks for reviewing this!

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-24 Thread Peilin Ye via cfe-commits


@@ -67,6 +67,8 @@ void BPFTargetInfo::getTargetDefines(const LangOptions &Opts,
 Builder.defineMacro("__BPF_FEATURE_SDIV_SMOD");
 Builder.defineMacro("__BPF_FEATURE_GOTOL");
 Builder.defineMacro("__BPF_FEATURE_ST");
+Builder.defineMacro("__BPF_FEATURE_LOAD_ACQUIRE");
+Builder.defineMacro("__BPF_FEATURE_STORE_RELEASE");

peilin-ye wrote:

> But I don't have a good name for that.

+1, but I just noticed that we need to update 
`clang/test/Preprocessor/bpf-predefined-macros.c`.  I'll do it in the next push.

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-23 Thread Peilin Ye via cfe-commits

peilin-ye wrote:

@yonghong-song Hi Yonghong, about my earlier question:

> However, I wonder how to generate a sign-extending (sext), acquiring 
> ATOMIC_LOAD in SelectionDAG?

After more digging, I found this in 
`llvm/include/llvm/CodeGen/TargetLowering.h`:

```cpp
  /// Returns how the platform's atomic operations are extended (ZERO_EXTEND,
  /// SIGN_EXTEND, or ANY_EXTEND).
  virtual ISD::NodeType getExtendForAtomicOps() const {
return ISD::ZERO_EXTEND;
  }
```

If I override it in `class BPFTargetLowering`:

```diff
--- a/llvm/lib/Target/BPF/BPFISelLowering.h
+++ b/llvm/lib/Target/BPF/BPFISelLowering.h
@@ -46,6 +46,10 @@ public:
   // with the given GlobalAddress is legal.
   bool isOffsetFoldingLegal(const GlobalAddressSDNode *GA) const override;
 
+  ISD::NodeType getExtendForAtomicOps() const override {
+return ISD::SIGN_EXTEND;
+  }
+
   BPFTargetLowering::ConstraintType
   getConstraintType(StringRef Constraint) const override;
```
I can then get a `sext` sub-word `AtomicLoad` SelectionDAG node:
```
Optimized legalized selection DAG: %bb.0 'bar:entry'
SelectionDAG has 8 nodes:
  t0: ch,glue = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %0
  t8: i32,ch = AtomicLoad<(load acquire (s8) from %ir.ptr), sext from i8> t0, t2, demo.bpf.c:3:12
t12: i64 = sign_extend t8, demo.bpf.c:3:12
  t6: ch,glue = CopyToReg t8:1, Register:i64 $r0, t12, demo.bpf.c:3:5
  t7: ch = BPFISD::RET_GLUE t6, Register:i64 $r0, t6:1, demo.bpf.c:3:5
```
...but now I can't have `zext` :-)
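(For reference, the `demo.bpf.c` behind this dump is presumably something like the following assumed reproducer, with the `return` on line 3 matching the debug locations above:)

```c
/* demo.bpf.c (assumed) */
long bar(char *ptr) {
    return __atomic_load_n(ptr, __ATOMIC_ACQUIRE); /* s8 load, sext to i64 */
}
```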
- - -
Looks like we can't have both `sext` and `zext` sub-word atomic loads unless we change `getExtendForAtomicOps()` and how it's used.

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-25 Thread Peilin Ye via cfe-commits


@@ -703,6 +715,39 @@ SDValue BPFTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const {
   return DAG.getNode(BPFISD::SELECT_CC, DL, VTs, Ops);
 }
 
+SDValue BPFTargetLowering::LowerATOMIC_LOAD(SDValue Op,
+SelectionDAG &DAG) const {
+  const char *Msg =
+  "sequentially consistent (seq_cst) atomic load is not supported";
+  SDNode *N = Op.getNode();
+  SDLoc DL(N);
+
+  if (cast<AtomicSDNode>(N)->getMergedOrdering() ==
+      AtomicOrdering::SequentiallyConsistent)
+    fail(DL, DAG, Msg);
+
+  return Op;
+}
+
+SDValue BPFTargetLowering::LowerATOMIC_STORE(SDValue Op,
+ SelectionDAG &DAG) const {
+  const char *Msg =
+  "sequentially consistent (seq_cst) atomic store is not supported";
+  EVT VT = Op.getOperand(1).getValueType();
+  SDNode *N = Op.getNode();
+  SDLoc DL(N);
+
+  // Promote operand #1 (value to store) if necessary.
+  if (!isTypeLegal(VT))
+    return SDValue();
+
+  if (cast<AtomicSDNode>(N)->getMergedOrdering() ==
+      AtomicOrdering::SequentiallyConsistent)
+    fail(DL, DAG, Msg);

peilin-ye wrote:

If the user calls `__atomic_store[_n]()` with `__ATOMIC_SEQ_CST`, we trigger this `if` clause and print an error like:
```
demo.bpf.c:3:5: error: sequentially consistent (seq_cst) atomic store is not supported
3 | __atomic_store_n(ptr, val, __ATOMIC_SEQ_CST);
  | ^
```
Or are you asking about something else?

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-25 Thread Peilin Ye via cfe-commits


@@ -48,6 +48,13 @@ def BPF_END  : BPFArithOp<0xd>;
 def BPF_XCHG: BPFArithOp<0xe>;
 def BPF_CMPXCHG : BPFArithOp<0xf>;
 
+class BPFAtomicLoadStoreOp<bits<4> val> {
+  bits<4> Value = val;
+}
+
+def BPF_LOAD_ACQ : BPFAtomicLoadStoreOp<0x1>;
+def BPF_STORE_REL : BPFAtomicLoadStoreOp<0xb>;

peilin-ye wrote:

> So I am tempted to use 0x2.

I see, let's just use `0x1` and `0x2`.

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-26 Thread Peilin Ye via cfe-commits


@@ -0,0 +1,106 @@
+; RUN: llc < %s -march=bpfel -mcpu=v4 -verify-machineinstrs -show-mc-encoding \
+; RUN:   | FileCheck -check-prefixes=CHECK-LE %s
+; RUN: llc < %s -march=bpfeb -mcpu=v4 -verify-machineinstrs -show-mc-encoding \
+; RUN:   | FileCheck -check-prefixes=CHECK-BE %s
+
+; Source:
+;   char load_acquire_i8(char *p) {
+; return __atomic_load_n(p, __ATOMIC_ACQUIRE);
+;   }
+;   short load_acquire_i16(short *p) {
+; return __atomic_load_n(p, __ATOMIC_ACQUIRE);
+;   }
+;   int load_acquire_i32(int *p) {
+; return __atomic_load_n(p, __ATOMIC_ACQUIRE);
+;   }
+;   long load_acquire_i64(long *p) {
+; return __atomic_load_n(p, __ATOMIC_ACQUIRE);
+;   }
+;   void store_release_i8(char *p, char v) {
+; __atomic_store_n(p, v, __ATOMIC_RELEASE);
+;   }
+;   void store_release_i16(short *p, short v) {
+; __atomic_store_n(p, v, __ATOMIC_RELEASE);
+;   }
+;   void store_release_i32(int *p, int v) {
+; __atomic_store_n(p, v, __ATOMIC_RELEASE);
+;   }
+;   void store_release_i64(long *p, long v) {
+; __atomic_store_n(p, v, __ATOMIC_RELEASE);
+;   }
+
+; CHECK-LABEL: load_acquire_i8
+; CHECK-LE: w0 = load_acquire((u8 *)(r1 + 0)) # encoding: [0xd3,0x10,0x00,0x00,0x10,0x00,0x00,0x00]
+; CHECK-BE: w0 = load_acquire((u8 *)(r1 + 0)) # encoding: [0xd3,0x01,0x00,0x00,0x00,0x00,0x00,0x10]
+define dso_local i8 @load_acquire_i8(ptr nocapture noundef readonly %p) local_unnamed_addr {
+entry:

peilin-ye wrote:

Got it, thanks for reviewing this!  I'll run `acquire-release.ll` through 
`update_llc_test_checks.py`.

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-26 Thread Peilin Ye via cfe-commits


@@ -703,6 +715,39 @@ SDValue BPFTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const {
   return DAG.getNode(BPFISD::SELECT_CC, DL, VTs, Ops);
 }
 
+SDValue BPFTargetLowering::LowerATOMIC_LOAD(SDValue Op,
+SelectionDAG &DAG) const {
+  const char *Msg =
+  "sequentially consistent (seq_cst) atomic load is not supported";
+  SDNode *N = Op.getNode();
+  SDLoc DL(N);
+
+  if (cast<AtomicSDNode>(N)->getMergedOrdering() ==
+      AtomicOrdering::SequentiallyConsistent)
+    fail(DL, DAG, Msg);
+
+  return Op;
+}
+
+SDValue BPFTargetLowering::LowerATOMIC_STORE(SDValue Op,
+ SelectionDAG &DAG) const {
+  const char *Msg =
+  "sequentially consistent (seq_cst) atomic store is not supported";
+  EVT VT = Op.getOperand(1).getValueType();
+  SDNode *N = Op.getNode();
+  SDLoc DL(N);
+
+  // Promote operand #1 (value to store) if necessary.
+  if (!isTypeLegal(VT))
+    return SDValue();
+
+  if (cast<AtomicSDNode>(N)->getMergedOrdering() ==
+      AtomicOrdering::SequentiallyConsistent)
+    fail(DL, DAG, Msg);

peilin-ye wrote:

I see.

> Currently, as per my understanding, the new instructions do not have this 
> ordering field (the asm, the encoding)?

Right now, `BPF_LOAD_ACQ` and `BPF_STORE_REL` use bits `4-7` of the `imm` field.  If we ever need `SEQ_CST`, we can add a new flag to bits `0-3` to indicate "this atomic load is sequentially consistent", similar to what we already do for other `BPF_ATOMIC` instructions: e.g. `BPF_XOR` in `imm{4-7}` means "it's an atomic XOR", then `BPF_FETCH` in `imm{0-3}` indicates whether it "returns the old value" or not.
- - -
Right now, this PR does:
```
               | imm{0-3} | imm{4-7}            |
 ------------- | -------- | ------------------- |
 load-acquire  | 0x0      | BPF_LOAD_ACQ (0x1)  |
 store-release | 0x0      | BPF_STORE_REL (0x2) |
```
Let me change it to (encodings unchanged):
```
               | imm{0-3}          | imm{4-7}        |
 ------------- | ----------------- | --------------- |
 load-acquire  | BPF_ACQUIRE (0x0) | BPF_LOAD (0x1)  |
 store-release | BPF_RELEASE (0x0) | BPF_STORE (0x2) |
```
So that, in the future, we can have e.g.:
```
               | imm{0-3}          | imm{4-7}        |
 ------------- | ----------------- | --------------- |
 load-acquire  | BPF_ACQUIRE (0x0) | BPF_LOAD (0x1)  |
 load-seq_cst  | BPF_SEQ_CST (0x1) | BPF_LOAD (0x1)  |
 store-release | BPF_RELEASE (0x0) | BPF_STORE (0x2) |
 store-seq_cst | BPF_SEQ_CST (0x1) | BPF_STORE (0x2) |
```


https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-26 Thread Peilin Ye via cfe-commits


@@ -703,6 +715,39 @@ SDValue BPFTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const {
   return DAG.getNode(BPFISD::SELECT_CC, DL, VTs, Ops);
 }
 
+SDValue BPFTargetLowering::LowerATOMIC_LOAD(SDValue Op,
+SelectionDAG &DAG) const {
+  const char *Msg =
+  "sequentially consistent (seq_cst) atomic load is not supported";
+  SDNode *N = Op.getNode();
+  SDLoc DL(N);
+
+  if (cast<AtomicSDNode>(N)->getMergedOrdering() ==
+      AtomicOrdering::SequentiallyConsistent)
+    fail(DL, DAG, Msg);
+
+  return Op;
+}
+
+SDValue BPFTargetLowering::LowerATOMIC_STORE(SDValue Op,
+ SelectionDAG &DAG) const {
+  const char *Msg =
+  "sequentially consistent (seq_cst) atomic store is not supported";
+  EVT VT = Op.getOperand(1).getValueType();
+  SDNode *N = Op.getNode();
+  SDLoc DL(N);
+
+  // Promote operand #1 (value to store) if necessary.
+  if (!isTypeLegal(VT))
+    return SDValue();
+
+  if (cast<AtomicSDNode>(N)->getMergedOrdering() ==
+      AtomicOrdering::SequentiallyConsistent)
+    fail(DL, DAG, Msg);

peilin-ye wrote:

> Let me change it to (encodings unchanged):
> ``` 
>                | imm{0-3}          | imm{4-7}        |
>  ------------- | ----------------- | --------------- |
>  load-acquire  | BPF_ACQUIRE (0x0) | BPF_LOAD (0x1)  |
>  store-release | BPF_RELEASE (0x0) | BPF_STORE (0x2) |
> ```

Actually, let's change the encoding and define [all 6 of them](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/n4950.pdf#page=1817) once and for all:
```
namespace std {
  enum class memory_order : unspecified {
relaxed, consume, acquire, release, acq_rel, seq_cst
  };
}
```
Will explain more after the push.
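(Mapped onto BPF, those orderings eventually became the following flag values; this is simply the list from the commit message quoted further down in this thread, written out as a C enum. Note that no separate `consume` flag was defined:)

```c
/* Ordering flags from a later revision of this PR (see the commit
 * message below); consume gets no flag of its own. */
enum bpf_memory_order {
    BPF_RELAXED = 0x0,
    BPF_ACQUIRE = 0x1,
    BPF_RELEASE = 0x2,
    BPF_ACQ_REL = 0x3,
    BPF_SEQ_CST = 0x4,
};
```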

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-25 Thread Peilin Ye via cfe-commits


@@ -703,6 +715,39 @@ SDValue BPFTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const {
   return DAG.getNode(BPFISD::SELECT_CC, DL, VTs, Ops);
 }
 
+SDValue BPFTargetLowering::LowerATOMIC_LOAD(SDValue Op,
+SelectionDAG &DAG) const {
+  const char *Msg =
+  "sequentially consistent (seq_cst) atomic load is not supported";
+  SDNode *N = Op.getNode();
+  SDLoc DL(N);
+
+  if (cast<AtomicSDNode>(N)->getMergedOrdering() ==
+      AtomicOrdering::SequentiallyConsistent)
+    fail(DL, DAG, Msg);
+
+  return Op;
+}
+
+SDValue BPFTargetLowering::LowerATOMIC_STORE(SDValue Op,
+ SelectionDAG &DAG) const {
+  const char *Msg =
+  "sequentially consistent (seq_cst) atomic store is not supported";
+  EVT VT = Op.getOperand(1).getValueType();
+  SDNode *N = Op.getNode();
+  SDLoc DL(N);
+
+  // Promote operand #1 (value to store) if necessary.
+  if (!isTypeLegal(VT))
+    return SDValue();

peilin-ye wrote:

```c
// clang --target=bpf -mcpu=v4 -O2 -g -c -o demo.bpf.o demo.bpf.c
void bar(char *ptr, char val) {
    __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
}
```
Compiling the above program would trigger that if clause:
```
...
llvm::SelectionDAG::LegalizeTypes()
  llvm::DAGTypeLegalizer::run()
llvm::DAGTypeLegalizer::PromoteIntegerOperand()
  llvm::DAGTypeLegalizer::CustomLowerNode()
llvm::TargetLowering::LowerOperationWrapper()
  llvm::BPFTargetLowering::LowerATOMIC_STORE()
```
BPF doesn't have native support for `i8` (see `BPFISelLowering.cpp`):
```cpp
  // Set up the register classes.
  addRegisterClass(MVT::i64, &BPF::GPRRegClass);
  if (STI.getHasAlu32())
addRegisterClass(MVT::i32, &BPF::GPR32RegClass);
```

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-31 Thread Peilin Ye via cfe-commits

https://github.com/peilin-ye updated 
https://github.com/llvm/llvm-project/pull/108636

>From 885d5141f6707a0fdf4be363351083f8fdf8fd54 Mon Sep 17 00:00:00 2001
From: Peilin Ye 
Date: Sat, 5 Oct 2024 06:44:21 +
Subject: [PATCH 1/3] [BPF] Rename isST*() and isLD*() functions in
 BPFMISimplifyPatchable.cpp (NFC)

We are planning to add load (specifically, atomic acquiring load, or
"load-acquire") instructions under the STX instruction class.  To make
that easier, rename the isST*() and isLD*() helper functions based on
what the instructions actually do, rather than their instruction class.
---
 .../lib/Target/BPF/BPFMISimplifyPatchable.cpp | 22 +--
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp 
b/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp
index 39390e8c38f8c1..4a1684ccebb793 100644
--- a/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp
+++ b/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp
@@ -94,35 +94,35 @@ void BPFMISimplifyPatchable::initialize(MachineFunction 
&MFParm) {
   LLVM_DEBUG(dbgs() << "*** BPF simplify patchable insts pass ***\n\n");
 }
 
-static bool isST(unsigned Opcode) {
+static bool isStoreImm(unsigned Opcode) {
   return Opcode == BPF::STB_imm || Opcode == BPF::STH_imm ||
  Opcode == BPF::STW_imm || Opcode == BPF::STD_imm;
 }
 
-static bool isSTX32(unsigned Opcode) {
+static bool isStore32(unsigned Opcode) {
   return Opcode == BPF::STB32 || Opcode == BPF::STH32 || Opcode == BPF::STW32;
 }
 
-static bool isSTX64(unsigned Opcode) {
+static bool isStore64(unsigned Opcode) {
   return Opcode == BPF::STB || Opcode == BPF::STH || Opcode == BPF::STW ||
  Opcode == BPF::STD;
 }
 
-static bool isLDX32(unsigned Opcode) {
+static bool isLoad32(unsigned Opcode) {
   return Opcode == BPF::LDB32 || Opcode == BPF::LDH32 || Opcode == BPF::LDW32;
 }
 
-static bool isLDX64(unsigned Opcode) {
+static bool isLoad64(unsigned Opcode) {
   return Opcode == BPF::LDB || Opcode == BPF::LDH || Opcode == BPF::LDW ||
  Opcode == BPF::LDD;
 }
 
-static bool isLDSX(unsigned Opcode) {
+static bool isLoadSext(unsigned Opcode) {
   return Opcode == BPF::LDBSX || Opcode == BPF::LDHSX || Opcode == BPF::LDWSX;
 }
 
 bool BPFMISimplifyPatchable::isLoadInst(unsigned Opcode) {
-  return isLDX32(Opcode) || isLDX64(Opcode) || isLDSX(Opcode);
+  return isLoad32(Opcode) || isLoad64(Opcode) || isLoadSext(Opcode);
 }
 
 void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI,
@@ -143,11 +143,11 @@ void 
BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI,
 MachineInstr *DefInst = MO.getParent();
 unsigned Opcode = DefInst->getOpcode();
 unsigned COREOp;
-if (isLDX64(Opcode) || isLDSX(Opcode))
+if (isLoad64(Opcode) || isLoadSext(Opcode))
   COREOp = BPF::CORE_LD64;
-else if (isLDX32(Opcode))
+else if (isLoad32(Opcode))
   COREOp = BPF::CORE_LD32;
-else if (isSTX64(Opcode) || isSTX32(Opcode) || isST(Opcode))
+else if (isStore64(Opcode) || isStore32(Opcode) || isStoreImm(Opcode))
   COREOp = BPF::CORE_ST;
 else
   continue;
@@ -160,7 +160,7 @@ void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo 
*MRI,
 // Reject the form:
 //   %1 = ADD_rr %2, %3
 //   *(type *)(%2 + 0) = %1
-if (isSTX64(Opcode) || isSTX32(Opcode)) {
+if (isStore64(Opcode) || isStore32(Opcode)) {
   const MachineOperand &Opnd = DefInst->getOperand(0);
   if (Opnd.isReg() && Opnd.getReg() == MO.getReg())
 continue;

>From db3cef60b0b8194e8ebd0ce0a0bc792f114c15f3 Mon Sep 17 00:00:00 2001
From: Peilin Ye 
Date: Sat, 5 Oct 2024 07:31:54 +
Subject: [PATCH 2/3] [BPF] Add load-acquire and store-release instructions
 under -mcpu=v4

As discussed in [1], introduce BPF instructions with load-acquire and
store-release semantics under -mcpu=v4.

The following new flags are defined:

  BPF_ATOMIC_LOAD   0x10
  BPF_ATOMIC_STORE  0x20

  BPF_RELAXED:   0x0
  BPF_ACQUIRE:   0x1
  BPF_RELEASE:   0x2
  BPF_ACQ_REL:   0x3
  BPF_SEQ_CST:   0x4

A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
field set to BPF_ATOMIC_LOAD | BPF_ACQUIRE (0x11).

Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with
the 'imm' field set to BPF_ATOMIC_STORE | BPF_RELEASE (0x22).

Unlike existing atomic operations that only support BPF_W (32-bit) and
BPF_DW (64-bit) size modifiers, load-acquires and store-releases also
support BPF_B (8-bit) and BPF_H (16-bit).  An 8- or 16-bit load-acquire
zero-extends the value before writing it to a 32-bit register, just like
ARM64 instruction LDAPRH and friends.

As an example, for -march=bpfel (little-endian):

  long foo(long *ptr) {
  return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
  }

foo() can be compiled to:

  db 10 00 00 11 00 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))
  95 00 00 00 00 00 00 00  exit

  opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
  imm (0x0011): BPF_ATOMIC_L

[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-31 Thread Peilin Ye via cfe-commits

peilin-ye wrote:

Pushed v9:
 - Rewrote `acquire-release.ll` to make it also test `__ATOMIC_RELAXED` 
(renamed to `atomic-load-store.ll`, just like e.g. 
`llvm/test/CodeGen/X86/atomic-load-store.ll`)
 - Style change as suggested by Matt

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-31 Thread Peilin Ye via cfe-commits

https://github.com/peilin-ye updated 
https://github.com/llvm/llvm-project/pull/108636

>From 885d5141f6707a0fdf4be363351083f8fdf8fd54 Mon Sep 17 00:00:00 2001
From: Peilin Ye 
Date: Sat, 5 Oct 2024 06:44:21 +
Subject: [PATCH 1/3] [BPF] Rename isST*() and isLD*() functions in
 BPFMISimplifyPatchable.cpp (NFC)

We are planning to add load (specifically, atomic acquiring load, or
"load-acquire") instructions under the STX instruction class.  To make
that easier, rename the isST*() and isLD*() helper functions based on
what the instructions actually do, rather than their instruction class.
---
 .../lib/Target/BPF/BPFMISimplifyPatchable.cpp | 22 +--
 1 file changed, 11 insertions(+), 11 deletions(-)

diff --git a/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp 
b/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp
index 39390e8c38f8c1..4a1684ccebb793 100644
--- a/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp
+++ b/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp
@@ -94,35 +94,35 @@ void BPFMISimplifyPatchable::initialize(MachineFunction 
&MFParm) {
   LLVM_DEBUG(dbgs() << "*** BPF simplify patchable insts pass ***\n\n");
 }
 
-static bool isST(unsigned Opcode) {
+static bool isStoreImm(unsigned Opcode) {
   return Opcode == BPF::STB_imm || Opcode == BPF::STH_imm ||
  Opcode == BPF::STW_imm || Opcode == BPF::STD_imm;
 }
 
-static bool isSTX32(unsigned Opcode) {
+static bool isStore32(unsigned Opcode) {
   return Opcode == BPF::STB32 || Opcode == BPF::STH32 || Opcode == BPF::STW32;
 }
 
-static bool isSTX64(unsigned Opcode) {
+static bool isStore64(unsigned Opcode) {
   return Opcode == BPF::STB || Opcode == BPF::STH || Opcode == BPF::STW ||
  Opcode == BPF::STD;
 }
 
-static bool isLDX32(unsigned Opcode) {
+static bool isLoad32(unsigned Opcode) {
   return Opcode == BPF::LDB32 || Opcode == BPF::LDH32 || Opcode == BPF::LDW32;
 }
 
-static bool isLDX64(unsigned Opcode) {
+static bool isLoad64(unsigned Opcode) {
   return Opcode == BPF::LDB || Opcode == BPF::LDH || Opcode == BPF::LDW ||
  Opcode == BPF::LDD;
 }
 
-static bool isLDSX(unsigned Opcode) {
+static bool isLoadSext(unsigned Opcode) {
   return Opcode == BPF::LDBSX || Opcode == BPF::LDHSX || Opcode == BPF::LDWSX;
 }
 
 bool BPFMISimplifyPatchable::isLoadInst(unsigned Opcode) {
-  return isLDX32(Opcode) || isLDX64(Opcode) || isLDSX(Opcode);
+  return isLoad32(Opcode) || isLoad64(Opcode) || isLoadSext(Opcode);
 }
 
 void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI,
@@ -143,11 +143,11 @@ void 
BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI,
 MachineInstr *DefInst = MO.getParent();
 unsigned Opcode = DefInst->getOpcode();
 unsigned COREOp;
-if (isLDX64(Opcode) || isLDSX(Opcode))
+if (isLoad64(Opcode) || isLoadSext(Opcode))
   COREOp = BPF::CORE_LD64;
-else if (isLDX32(Opcode))
+else if (isLoad32(Opcode))
   COREOp = BPF::CORE_LD32;
-else if (isSTX64(Opcode) || isSTX32(Opcode) || isST(Opcode))
+else if (isStore64(Opcode) || isStore32(Opcode) || isStoreImm(Opcode))
   COREOp = BPF::CORE_ST;
 else
   continue;
@@ -160,7 +160,7 @@ void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo 
*MRI,
 // Reject the form:
 //   %1 = ADD_rr %2, %3
 //   *(type *)(%2 + 0) = %1
-if (isSTX64(Opcode) || isSTX32(Opcode)) {
+if (isStore64(Opcode) || isStore32(Opcode)) {
   const MachineOperand &Opnd = DefInst->getOperand(0);
   if (Opnd.isReg() && Opnd.getReg() == MO.getReg())
 continue;

>From cd4e9647f942a2f8d57fcd95219744d89f3e2cf3 Mon Sep 17 00:00:00 2001
From: Peilin Ye 
Date: Sat, 5 Oct 2024 07:31:54 +
Subject: [PATCH 2/3] [BPF] Add load-acquire and store-release instructions
 under -mcpu=v4

As discussed in [1], introduce BPF instructions with load-acquire and
store-release semantics under -mcpu=v4.

The following new flags are defined:

  BPF_ATOMIC_LOAD   0x10
  BPF_ATOMIC_STORE  0x20

  BPF_RELAXED:   0x0
  BPF_ACQUIRE:   0x1
  BPF_RELEASE:   0x2
  BPF_ACQ_REL:   0x3
  BPF_SEQ_CST:   0x4

A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm'
field set to BPF_ATOMIC_LOAD | BPF_ACQUIRE (0x11).

Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with
the 'imm' field set to BPF_ATOMIC_STORE | BPF_RELEASE (0x22).

Unlike existing atomic operations that only support BPF_W (32-bit) and
BPF_DW (64-bit) size modifiers, load-acquires and store-releases also
support BPF_B (8-bit) and BPF_H (16-bit).  An 8- or 16-bit load-acquire
zero-extends the value before writing it to a 32-bit register, just like
ARM64 instruction LDAPRH and friends.

As an example, for -march=bpfel (little-endian):

  long foo(long *ptr) {
  return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
  }

foo() can be compiled to:

  db 10 00 00 11 00 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))
  95 00 00 00 00 00 00 00  exit

  opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
  imm (0x0011): BPF_ATOMIC_L

[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-31 Thread Peilin Ye via cfe-commits


@@ -703,6 +714,20 @@ SDValue BPFTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const {
   return DAG.getNode(BPFISD::SELECT_CC, DL, VTs, Ops);
 }
 
+SDValue BPFTargetLowering::LowerATOMIC_LOAD_STORE(SDValue Op,
+  SelectionDAG &DAG) const {
+  SDNode *N = Op.getNode();
+  SDLoc DL(N);
+
+  if (cast<AtomicSDNode>(N)->getMergedOrdering() ==
+      AtomicOrdering::SequentiallyConsistent)
+    fail(
+        DL, DAG,
+        "sequentially consistent (seq_cst) atomic load/store is not supported");

peilin-ye wrote:

Got it, thanks!

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)

2024-09-18 Thread Peilin Ye via cfe-commits


@@ -69,29 +69,25 @@ void BPFSubtarget::initSubtargetFeatures(StringRef CPU, StringRef FS) {
 CPU = "v3";
   if (CPU == "probe")
 CPU = sys::detail::getHostCPUNameForBPF();
-  if (CPU == "generic" || CPU == "v1")
+  if (CPU.empty() || CPU == "generic" || CPU == "v1")

peilin-ye wrote:

Ah, right, because #107008 recently made it default to `"v3"`.  Thanks, I didn't notice that when rebasing.

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)

2024-09-20 Thread Peilin Ye via cfe-commits

peilin-ye wrote:

@eddyz87,

Thanks!  I didn't know about `XXXISelLowering.cpp`.

> But there should be a way to tweak the existing `fail` function to stop after 
> errors are reported.

Can we `exit(1)`? :-)

`fail()` calls `LLVMContext::diagnose()`, which already calls `exit(1)` when there's no "report handler" and the "severity" is `DS_Error`:

```cpp
  if (DI.getSeverity() == DS_Error)
    exit(1);
}
```
`fail()` uses `DiagnosticInfoUnsupported`, whose "severity" _is_ `DS_Error`, but our "report handler" (`pImpl->DiagHandler->handleDiagnostics()`) doesn't call `exit()`...
- - -
I tried, based on your diff (`__ATOMIC_ACQ_REL` is illegal for `__atomic_{load,store}{,_n}()`, so we only need to handle `AtomicOrdering::SequentiallyConsistent`):

```diff
--- a/llvm/lib/Target/BPF/BPFISelLowering.cpp
+++ b/llvm/lib/Target/BPF/BPFISelLowering.cpp
@@ -93,6 +93,9 @@ BPFTargetLowering::BPFTargetLowering(const TargetMachine &TM,
 setOperationAction(ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS, VT, Custom);
   }
 
+  for (auto VT : {MVT::i8, MVT::i16, MVT::i32, MVT::i64})
+    setOperationAction(ISD::ATOMIC_LOAD, VT, Custom);
+
   for (auto VT : { MVT::i32, MVT::i64 }) {
 if (VT == MVT::i32 && !STI.getHasAlu32())
   continue;
@@ -291,6 +294,8 @@ void BPFTargetLowering::ReplaceNodeResults(
 else
   Msg = "unsupported atomic operation, please use 64 bit version";
 break;
+  case ISD::ATOMIC_LOAD:
+    return;
   }
 
   SDLoc DL(N);
@@ -316,6 +321,8 @@ SDValue BPFTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const {
 return LowerSDIVSREM(Op, DAG);
   case ISD::DYNAMIC_STACKALLOC:
 return LowerDYNAMIC_STACKALLOC(Op, DAG);
+  case ISD::ATOMIC_LOAD:
+    return LowerATOMIC_LOAD(Op, DAG);
   }
 }
 
@@ -703,6 +710,22 @@ SDValue BPFTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const {
   return DAG.getNode(BPFISD::SELECT_CC, DL, VTs, Ops);
 }
 
+SDValue BPFTargetLowering::LowerATOMIC_LOAD(SDValue Op,
+SelectionDAG &DAG) const {
+  const char *Msg =
+  "sequentially consistent (seq_cst) atomic load is not supported";
+  SDNode *N = Op.getNode();
+  SDLoc DL(N);
+
+  if (cast<AtomicSDNode>(N)->getMergedOrdering() ==
+      AtomicOrdering::SequentiallyConsistent) {
+    fail(DL, DAG, Msg);
+    exit(1);
+  }
+
+  return Op;
+}
+
 const char *BPFTargetLowering::getTargetNodeName(unsigned Opcode) const {
   switch ((BPFISD::NodeType)Opcode) {
   case BPFISD::FIRST_NUMBER:
```
```diff
--- a/llvm/lib/Target/BPF/BPFISelLowering.h
+++ b/llvm/lib/Target/BPF/BPFISelLowering.h
@@ -77,7 +77,7 @@ private:
   SDValue LowerDYNAMIC_STACKALLOC(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerBR_CC(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const;
-
+  SDValue LowerATOMIC_LOAD(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerConstantPool(SDValue Op, SelectionDAG &DAG) const;
   SDValue LowerGlobalAddress(SDValue Op, SelectionDAG &DAG) const;
 
```
which seems to work nicely:
```
$ cat bar.c
char foo(char *ptr) { return __atomic_load_n(ptr, __ATOMIC_SEQ_CST); }
$
$ clang --target=bpf -mcpu=v5 -g bar.c > /dev/null
bar.c:1:6: error: sequentially consistent (seq_cst) atomic load is not supported
1 | char foo(char *ptr) { return __atomic_load_n(ptr, __ATOMIC_SEQ_CST); }
  |  ^
$
```



https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)

2024-09-24 Thread Peilin Ye via cfe-commits

peilin-ye wrote:

> lgtm, but please note CI failure:

Oops, I'll take a closer look later.  Thanks!

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)

2024-09-24 Thread Peilin Ye via cfe-commits

peilin-ye wrote:

@eddyz87,

> Sorry for the delayed reply.

No worries at all!

> I'd say let's go with what you suggest, but avoid exit(1) call. We need to 
> figure out how to do fail() w/o backtrace, but that is outside of the scope 
> for this pull request.

Got it, rebased and added a 4th commit (can't say I like the one-commit-per-PR 
policy either :-) to do this in v3.  Please take another look.

- - -
Also, just a heads-up: I think this PR is blocked by [an issue that I mentioned 
earlier on LKML](https://lore.kernel.org/all/zujillqbegosx...@google.com/) (Cc: 
@4ast).  After that is resolved, I think we should change this PR to use `0x5` 
for the new mode modifiers (`BPF_MEMACQ`, `BPF_MEMREL`).

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)

2024-09-24 Thread Peilin Ye via cfe-commits

https://github.com/peilin-ye updated 
https://github.com/llvm/llvm-project/pull/108636

>From 68d003f7156b16656a11a1395b88b6fbed368401 Mon Sep 17 00:00:00 2001
From: Peilin Ye 
Date: Thu, 15 Aug 2024 21:49:23 +
Subject: [PATCH 1/4] [BPF] Refactor BPFSubtarget::initSubtargetFeatures()
 (NFC)

Refactor it to make it match the style of
BPFTargetInfo::getTargetDefines(), so that when we add -mcpu=v5 in the
future, we can simply append e.g.:

  if (CpuVerNum >= 5)
HasFoo = true;

instead of saying:

  if (CPU == "v5") {
HasJmpExt = true;
HasJmp32 = true;
HasAlu32 = true;
HasLdsx = !Disable_ldsx;
HasMovsx = !Disable_movsx;
HasBswap = !Disable_bswap;
HasSdivSmod = !Disable_sdiv_smod;
HasGotol = !Disable_gotol;
HasStoreImm = !Disable_StoreImm;
HasFoo = true;
  }

This also makes it clearer that newer "CPU"s always support older
features.  No functional changes intended.
---
 llvm/lib/Target/BPF/BPFSubtarget.cpp | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/llvm/lib/Target/BPF/BPFSubtarget.cpp 
b/llvm/lib/Target/BPF/BPFSubtarget.cpp
index 305e9a2bf2cda3..d5b20403d1e42d 100644
--- a/llvm/lib/Target/BPF/BPFSubtarget.cpp
+++ b/llvm/lib/Target/BPF/BPFSubtarget.cpp
@@ -71,27 +71,23 @@ void BPFSubtarget::initSubtargetFeatures(StringRef CPU, 
StringRef FS) {
 CPU = sys::detail::getHostCPUNameForBPF();
   if (CPU == "generic" || CPU == "v1")
 return;
-  if (CPU == "v2") {
-HasJmpExt = true;
-return;
-  }
-  if (CPU == "v3") {
+
+  int CpuVerNum = CPU.back() - '0';
+  if (CpuVerNum >= 2)
 HasJmpExt = true;
+
+  if (CpuVerNum >= 3) {
 HasJmp32 = true;
 HasAlu32 = true;
-return;
   }
-  if (CPU == "v4") {
-HasJmpExt = true;
-HasJmp32 = true;
-HasAlu32 = true;
+
+  if (CpuVerNum >= 4) {
 HasLdsx = !Disable_ldsx;
 HasMovsx = !Disable_movsx;
 HasBswap = !Disable_bswap;
 HasSdivSmod = !Disable_sdiv_smod;
 HasGotol = !Disable_gotol;
 HasStoreImm = !Disable_StoreImm;
-return;
   }
 }
 

>From a59b60c2e9d0b0f1e2ce3b25b70d46957af4d091 Mon Sep 17 00:00:00 2001
From: Peilin Ye 
Date: Fri, 23 Aug 2024 19:11:05 +
Subject: [PATCH 2/4] [BPF] Refactor {LOAD,STORE}{,32} classes (NFC)

We will need different AsmString formats for load-acquire and
store-release instructions.  To make that easier, refactor
{LOAD,STORE}{,32} classes to take AsmString as an argument directly.
Add a BPFModeModifer parameter to STORE{,32} for similar reasons.

No functional changes intended.
---
 llvm/lib/Target/BPF/BPFInstrInfo.td | 35 ++---
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/llvm/lib/Target/BPF/BPFInstrInfo.td 
b/llvm/lib/Target/BPF/BPFInstrInfo.td
index f7e17901c7ed5e..aab2291a70efc2 100644
--- a/llvm/lib/Target/BPF/BPFInstrInfo.td
+++ b/llvm/lib/Target/BPF/BPFInstrInfo.td
@@ -497,12 +497,11 @@ def LD_pseudo
 }
 
 // STORE instructions
-class STORE Pattern>
-: TYPE_LD_ST Pattern>
+: TYPE_LD_ST {
+ AsmString, Pattern> {
   bits<4> src;
   bits<20> addr;
 
@@ -513,7 +512,7 @@ class STORE Pattern>
 }
 
 class STOREi64
-: STORE;
+: STORE;
 
 let Predicates = [BPFNoALU32] in {
   def STW : STOREi64;
@@ -567,12 +566,11 @@ let Predicates = [BPFHasALU32, BPFHasStoreImm] in {
 }
 
 // LOAD instructions
-class LOAD Pattern>
+class LOAD Pattern>
 : TYPE_LD_ST {
+ AsmString, Pattern> {
   bits<4> dst;
   bits<20> addr;
 
@@ -583,7 +581,8 @@ class LOAD
-: LOAD;
+: LOAD;
 
 let isCodeGenOnly = 1 in {
   class CORE_LD
@@ -1069,12 +1068,11 @@ def : Pat<(i32 (trunc GPR:$src)),
 def : Pat<(i64 (anyext GPR32:$src)),
   (INSERT_SUBREG (i64 (IMPLICIT_DEF)), GPR32:$src, sub_32)>;
 
-class STORE32 Pattern>
-: TYPE_LD_ST Pattern>
+: TYPE_LD_ST {
+ AsmString, Pattern> {
   bits<4> src;
   bits<20> addr;
 
@@ -1085,7 +1083,8 @@ class STORE32 Pattern>
 }
 
 class STOREi32
-: STORE32;
+: STORE32;
 
 let Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32" in {
   def STW32 : STOREi32;
@@ -1093,12 +1092,11 @@ let Predicates = [BPFHasALU32], DecoderNamespace = 
"BPFALU32" in {
   def STB32 : STOREi32;
 }
 
-class LOAD32 Pattern>
+class LOAD32 Pattern>
 : TYPE_LD_ST {
+AsmString, Pattern> {
   bits<4> dst;
   bits<20> addr;
 
@@ -1109,7 +1107,8 @@ class LOAD32
-: LOAD32;
+: LOAD32;
 
 let Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32" in {
   def LDW32 : LOADi32;

>From 4c209da8a1ec4bfda71ffd0b51accae84a39bdd5 Mon Sep 17 00:00:00 2001
From: Peilin Ye 
Date: Tue, 10 Sep 2024 23:09:28 +
Subject: [PATCH 3/4] [BPF] Add load-acquire and store-release instructions
 under -mcpu=v5

As discussed in [1], introduce BPF instructions with load-acquire and
store-release semantics under -mcpu=v5.

A "load_acquire" is a BPF_LDX instruction with a new mode modifier,
BPF_MEMACQ ("acquiring atomic load").  Similarly, a "store_release" is a
BPF_STX 

[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)

2024-09-19 Thread Peilin Ye via cfe-commits

https://github.com/peilin-ye edited 
https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)

2024-09-19 Thread Peilin Ye via cfe-commits

https://github.com/peilin-ye updated 
https://github.com/llvm/llvm-project/pull/108636

>From 087c7eb6e5fdd6866ee5209ce9cea93864dd3c6d Mon Sep 17 00:00:00 2001
From: Peilin Ye 
Date: Thu, 15 Aug 2024 21:49:23 +
Subject: [PATCH 1/3] [BPF] Refactor BPFSubtarget::initSubtargetFeatures()
 (NFC)

Refactor it to make it match the style of
BPFTargetInfo::getTargetDefines(), so that when we add -mcpu=v5 in the
future, we can simply append e.g.:

  if (CpuVerNum >= 5)
HasFoo = true;

instead of saying:

  if (CPU == "v5") {
HasJmpExt = true;
HasJmp32 = true;
HasAlu32 = true;
HasLdsx = !Disable_ldsx;
HasMovsx = !Disable_movsx;
HasBswap = !Disable_bswap;
HasSdivSmod = !Disable_sdiv_smod;
HasGotol = !Disable_gotol;
HasStoreImm = !Disable_StoreImm;
HasFoo = true;
  }

This also makes it clearer that newer "CPU"s always support older
features.  No functional changes intended.
---
 llvm/lib/Target/BPF/BPFSubtarget.cpp | 18 +++---
 1 file changed, 7 insertions(+), 11 deletions(-)

diff --git a/llvm/lib/Target/BPF/BPFSubtarget.cpp 
b/llvm/lib/Target/BPF/BPFSubtarget.cpp
index 305e9a2bf2cda3..d5b20403d1e42d 100644
--- a/llvm/lib/Target/BPF/BPFSubtarget.cpp
+++ b/llvm/lib/Target/BPF/BPFSubtarget.cpp
@@ -71,27 +71,23 @@ void BPFSubtarget::initSubtargetFeatures(StringRef CPU, 
StringRef FS) {
 CPU = sys::detail::getHostCPUNameForBPF();
   if (CPU == "generic" || CPU == "v1")
 return;
-  if (CPU == "v2") {
-HasJmpExt = true;
-return;
-  }
-  if (CPU == "v3") {
+
+  int CpuVerNum = CPU.back() - '0';
+  if (CpuVerNum >= 2)
 HasJmpExt = true;
+
+  if (CpuVerNum >= 3) {
 HasJmp32 = true;
 HasAlu32 = true;
-return;
   }
-  if (CPU == "v4") {
-HasJmpExt = true;
-HasJmp32 = true;
-HasAlu32 = true;
+
+  if (CpuVerNum >= 4) {
 HasLdsx = !Disable_ldsx;
 HasMovsx = !Disable_movsx;
 HasBswap = !Disable_bswap;
 HasSdivSmod = !Disable_sdiv_smod;
 HasGotol = !Disable_gotol;
 HasStoreImm = !Disable_StoreImm;
-return;
   }
 }
 

>From 5b64851a977b51af96b1d86aba14ca70968ef4e0 Mon Sep 17 00:00:00 2001
From: Peilin Ye 
Date: Fri, 23 Aug 2024 19:11:05 +
Subject: [PATCH 2/3] [BPF] Refactor {LOAD,STORE}{,32} classes (NFC)

We will need different AsmString formats for load-acquire and
store-release instructions.  To make that easier, refactor
{LOAD,STORE}{,32} classes to take AsmString as an argument directly.
Add a BPFModeModifer parameter to STORE{,32} for similar reasons.

No functional changes intended.
---
 llvm/lib/Target/BPF/BPFInstrInfo.td | 35 ++---
 1 file changed, 17 insertions(+), 18 deletions(-)

diff --git a/llvm/lib/Target/BPF/BPFInstrInfo.td 
b/llvm/lib/Target/BPF/BPFInstrInfo.td
index f7e17901c7ed5e..aab2291a70efc2 100644
--- a/llvm/lib/Target/BPF/BPFInstrInfo.td
+++ b/llvm/lib/Target/BPF/BPFInstrInfo.td
@@ -497,12 +497,11 @@ def LD_pseudo
 }
 
 // STORE instructions
-class STORE Pattern>
-: TYPE_LD_ST Pattern>
+: TYPE_LD_ST {
+ AsmString, Pattern> {
   bits<4> src;
   bits<20> addr;
 
@@ -513,7 +512,7 @@ class STORE Pattern>
 }
 
 class STOREi64
-: STORE;
+: STORE;
 
 let Predicates = [BPFNoALU32] in {
   def STW : STOREi64;
@@ -567,12 +566,11 @@ let Predicates = [BPFHasALU32, BPFHasStoreImm] in {
 }
 
 // LOAD instructions
-class LOAD Pattern>
+class LOAD Pattern>
 : TYPE_LD_ST {
+ AsmString, Pattern> {
   bits<4> dst;
   bits<20> addr;
 
@@ -583,7 +581,8 @@ class LOAD
-: LOAD;
+: LOAD;
 
 let isCodeGenOnly = 1 in {
   class CORE_LD
@@ -1069,12 +1068,11 @@ def : Pat<(i32 (trunc GPR:$src)),
 def : Pat<(i64 (anyext GPR32:$src)),
   (INSERT_SUBREG (i64 (IMPLICIT_DEF)), GPR32:$src, sub_32)>;
 
-class STORE32 Pattern>
-: TYPE_LD_ST Pattern>
+: TYPE_LD_ST {
+ AsmString, Pattern> {
   bits<4> src;
   bits<20> addr;
 
@@ -1085,7 +1083,8 @@ class STORE32 Pattern>
 }
 
 class STOREi32
-: STORE32;
+: STORE32;
 
 let Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32" in {
   def STW32 : STOREi32;
@@ -1093,12 +1092,11 @@ let Predicates = [BPFHasALU32], DecoderNamespace = 
"BPFALU32" in {
   def STB32 : STOREi32;
 }
 
-class LOAD32 Pattern>
+class LOAD32 Pattern>
 : TYPE_LD_ST {
+AsmString, Pattern> {
   bits<4> dst;
   bits<20> addr;
 
@@ -1109,7 +1107,8 @@ class LOAD32
-: LOAD32;
+: LOAD32;
 
 let Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32" in {
   def LDW32 : LOADi32;

>From 692d1309c42711397feb780f2e37c1495cecc51c Mon Sep 17 00:00:00 2001
From: Peilin Ye 
Date: Tue, 10 Sep 2024 23:09:28 +
Subject: [PATCH 3/3] [BPF] Add load-acquire and store-release instructions
 under -mcpu=v5

As discussed in [1], introduce BPF instructions with load-acquire and
store-release semantics under -mcpu=v5.

A "load_acquire" is a BPF_LDX instruction with a new mode modifier,
BPF_MEMACQ ("acquiring atomic load").  Similarly, a "store_release" is a
BPF_STX 

[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)

2024-09-19 Thread Peilin Ye via cfe-commits

peilin-ye wrote:

(pushed v2 to resolve the easier issues first :-)

- deleted redundant `CPU.empty()` check
- for now, fall back to `ACQUIRE` and `RELEASE` if the user requested weaker memory orders (`RELAXED` or `CONSUME`), until we actually support them
  - took Eduard's suggestion in #107343 to use `foreach`, thanks!

I couldn't find a way to generate a better error message for 
`__ATOMIC_SEQ_CST`, however; it seems that CodeGen simply calls 
`CannotYetSelect()` if nothing in `MatcherTable` matches.  Any suggestions?

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)

2024-09-18 Thread Peilin Ye via cfe-commits

peilin-ye wrote:

Hi @eddyz87, thanks for the review and context!

> ```
> lock *(u64 *)(r1 + 0x0) = r2 release
> lock r2 = *(u64 *)(r1 + 0x0) acquire
> ```

Appending `acquire` and `release` does sound nice to me, since it makes the syntax more similar to LLVM IR (e.g. `store atomic i64 %v, ptr %p release, align 8`), and it'd also be a bit easier when we support other memory ordering types in the future; e.g. we could just append `seq_cst` instead of having to say `load_seq_cst(...)`.

However, I didn't want to use `lock` because I feel like it's too similar to 
the x86 `LOCK` prefix (implies a full memory barrier, which could be confusing 
here).  WDYT?  Cc: @yonghong-song @4ast

> Also a question regarding 3 commits in one pull request. As far as I 
> understand [current](https://llvm.org/docs/GitHub.html) policy, the idea is 
> to have a separate pull request for each commit (and fork branches from one 
> another to build a stack). Otherwise the commits are squashed. What's the 
> idea with current pull request?

Got it, I'll split this into separate PRs later.

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)

2024-09-18 Thread Peilin Ye via cfe-commits


@@ -621,6 +642,16 @@ let Predicates = [BPFHasLdsx] in {
 
 def LDD : LOADi64;
 
+class acquiring_load<PatFrags base>
+: PatFrag<(ops node:$ptr), (base node:$ptr)> {
+  let IsAtomic = 1;

peilin-ye wrote:

Got it, I'll make it generate acquire and release for weaker types, and look 
into how to make the error message more useful for stronger types.

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)

2024-10-01 Thread Peilin Ye via cfe-commits


@@ -522,6 +526,28 @@ let Predicates = [BPFNoALU32] in {
 }
 def STD : STOREi64;
 
+class relaxed_store<PatFrags base>
+  : PatFrag<(ops node:$val, node:$ptr), (base node:$val, node:$ptr)> {
+  let IsAtomic = 1;
+  let IsAtomicOrderingReleaseOrStronger = 0;
+}
+
+class releasing_store<PatFrags base>
+  : PatFrag<(ops node:$val, node:$ptr), (base node:$val, node:$ptr)> {
+  let IsAtomic = 1;
+  let IsAtomicOrderingRelease = 1;
+}

peilin-ye wrote:

For `ATOMIC_STORE`: `IsAtomicOrderingRelease = 1` only matches `RELEASE`. 
 `IsAtomicOrderingReleaseOrStronger = 1` matches both `RELEASE` and `SEQ_CST` 
(sequentially consistent).
- - -
For ARM64, it looks like `releasing_store<>` has been intended to match both `RELEASE` and `SEQ_CST` since at least commit 00ed9964c659 ("ARM64: initial backend import"):

```c
// A store operation that actually needs release semantics.
class releasing_store<PatFrag base>
  : PatFrag<(ops node:$ptr, node:$val), (base node:$ptr, node:$val), [{
  AtomicOrdering Ordering = cast<AtomicSDNode>(N)->getOrdering();
  assert(Ordering != AcquireRelease && "unexpected store ordering");
  return Ordering == Release || Ordering == SequentiallyConsistent;
}]>;
```
Looking at `llvm/test/CodeGen/AArch64/fast-isel-atomic.ll`, it seems that a 
"`store atomic` `release`" and a "`store atomic` `seq_cst`" can be turned into 
the same ARM64 insn:
```
; CHECK-LABEL: atomic_store_release_16:
; CHECK-NEXT: // %bb.0:
; CHECK-NEXT:  stlrh w1, [x0]
; CHECK-NEXT:  ret
define void @atomic_store_release_16(ptr %p, i16 %val) #0 {
  store atomic i16 %val, ptr %p release, align 2
  ret void
}
```
```
; CHECK-LABEL: atomic_store_seq_cst_16:
; CHECK-NEXT: // %bb.0:
; CHECK-NEXT:  stlrh w1, [x0]
; CHECK-NEXT:  ret
define void @atomic_store_seq_cst_16(ptr %p, i16 %val) #0 {
  store atomic i16 %val, ptr %p seq_cst, align 2
  ret void
}
```
I haven't read the manual to understand why yet, however.
- - -
For us, I used `IsAtomicOrderingRelease`, since I didn't want to match `seq_cst` (the 4th commit in this PR makes it generate an "unsupported" error if the user calls `__atomic_{load,store}[_n]()` with `__ATOMIC_SEQ_CST`, as suggested by Eduard).
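(To illustrate the resulting behavior described above; the function names here are made up:)

```c
/* Sketch: only the first store matches the store-release pattern;
 * the second hits the "unsupported" diagnostic instead. */
void store_rel(short *p, short v)
{
    __atomic_store_n(p, v, __ATOMIC_RELEASE); /* -> store_release insn */
}

void store_sc(short *p, short v)
{
    __atomic_store_n(p, v, __ATOMIC_SEQ_CST); /* -> "unsupported" error */
}
```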

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)

2024-10-01 Thread Peilin Ye via cfe-commits

peilin-ye wrote:

Hi @yonghong-song,

No worries, and thanks for taking a look at this!

> So far, I suggest to stick to cpu v4.

I see, I can do that.

> is it possible to add acquire/release support in BPF_ATOMIC? We might want to 
> reserve BPFModelModifier 7 for future use.

Sounds good, that way we also avoid the "`5` and `7` are already internally 
used by the verifier" problem mentioned earlier.  I'll try adding one or more 
new flags to the `imm` field.

> should we do sign-extension here? w0 = load_acquire((s16 *)(r1 + 0x0))?

Not sure about this one.  I'll learn about `MEMSX` and look at other backends.

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)

2024-10-02 Thread Peilin Ye via cfe-commits


@@ -522,6 +526,28 @@ let Predicates = [BPFNoALU32] in {
 }
 def STD : STOREi64;
 
+class relaxed_store<PatFrags base>
+  : PatFrag<(ops node:$val, node:$ptr), (base node:$val, node:$ptr)> {
+  let IsAtomic = 1;
+  let IsAtomicOrderingReleaseOrStronger = 0;
+}
+
+class releasing_store<PatFrags base>
+  : PatFrag<(ops node:$val, node:$ptr), (base node:$val, node:$ptr)> {
+  let IsAtomic = 1;
+  let IsAtomicOrderingRelease = 1;
+}

peilin-ye wrote:

Eduard [suggested 
earlier](https://github.com/llvm/llvm-project/pull/108636#discussion_r1765597286)
 (referencing [Andrii's 
suggestion](https://github.com/llvm/llvm-project/pull/107343#issuecomment-2334664935)
 in #107343) that, for now, we "fall back" to `ACQUIRE` or `RELEASE` if the user requested `RELAXED`, until we actually support `RELAXED` in the future.

> Is my interpretation correct or __atomic_load_n(ptr, __ATOMIC_RELAXED) could 
> have semantics beyond '*ptr'?

`__ATOMIC_RELAXED` still guarantees atomicity but `*ptr` does not, if I'm understanding correctly?
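(A minimal illustration of that difference; both functions read `*p`, but only the first access is guaranteed to be a single, tear-free load:)

```c
int relaxed_read(int *p)
{
    return __atomic_load_n(p, __ATOMIC_RELAXED); /* atomic, no ordering */
}

int plain_read(int *p)
{
    return *p; /* compiler may split, repeat, or reorder this access */
}
```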

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-23 Thread Peilin Ye via cfe-commits

peilin-ye wrote:

@yonghong-song, back to your example:

> ```
> $ cat t4.c
> short foo(short *ptr) {
>   return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
> }
> ```
> ```
>  :
>0:   e9 10 00 00 00 00 00 00 w0 = load_acquire((u16 *)(r1 + 0x0))
>1:   95 00 00 00 00 00 00 00 exit
> ```

Assume that the above `load_acquire` zero-extends.  If the calling convention here is "`foo()`'s return value is represented by the entire `w0` half-register", then yes, we're doing it wrong.  However, if I do this:

```c
// clang --target=bpf -mcpu=v4 -O2 -g -c -o test.bpf.o test.bpf.c
__attribute__((noinline)) short bar(short *ptr) {
return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
}

long foo(short *ptr) {
return bar(ptr);
}
```
```
 :
; return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
   0:   cb 10 00 00 10 00 00 00 w0 = load_acquire((u16 *)(r1 + 0x0))
   1:   95 00 00 00 00 00 00 00 exit

0010 :
; return bar(ptr);
   2:   85 10 00 00 ff ff ff ff call -0x1
   3:   bf 00 10 00 00 00 00 00 r0 = (s16)r0
   4:   95 00 00 00 00 00 00 00 exit
```
Things still work, because `foo()` treats `bar()`'s return value as `s16`.
- - -
It's the same story for ARM64.  If I compile the above program with 
`--target=aarch64`:
```
 :
   0: 48dffc00  ldarh   w0, [x0]
   4: d65f03c0  ret

0008 :
   8: a9bf7bfd  stp x29, x30, [sp, #-0x10]!
   c: 910003fd  mov x29, sp
  10: 9400  bl  0x10 
  14: 93403c00  sxthx0, w0
  18: a8c17bfd  ldp x29, x30, [sp], #0x10
  1c: d65f03c0  ret
```
`ldarh` zero-extends the halfword:

> Load-Acquire Register Halfword derives an address from a base register value, 
> loads a halfword from memory,
zero-extends it, and writes it to a register.

Then the caller, `foo()`, does a `sxth` to sign-extend the halfword:

> Sign Extend Halfword extracts a 16-bit value, sign-extends it to the size of 
> the register, and writes the result to the destination register.
- - -
Is it fair to say that your example demonstrates the expected behavior?

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-25 Thread Peilin Ye via cfe-commits


@@ -703,6 +715,39 @@ SDValue BPFTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const {
   return DAG.getNode(BPFISD::SELECT_CC, DL, VTs, Ops);
 }
 
+SDValue BPFTargetLowering::LowerATOMIC_LOAD(SDValue Op,
+SelectionDAG &DAG) const {
+  const char *Msg =
+  "sequentially consistent (seq_cst) atomic load is not supported";

peilin-ye wrote:

Thanks for taking a look at this!
- - -
> Just pass this directly without the message variable?

Then after `git clang-format` it would look like:
```c
  if (cast<AtomicSDNode>(N)->getMergedOrdering() ==
      AtomicOrdering::SequentiallyConsistent)
    fail(DL, DAG,
         "sequentially consistent (seq_cst) atomic store is not supported");
```
I'd prefer keeping it as-is.

https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2024-10-25 Thread Peilin Ye via cfe-commits

https://github.com/peilin-ye edited 
https://github.com/llvm/llvm-project/pull/108636


[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)

2025-02-19 Thread Peilin Ye via cfe-commits

peilin-ye wrote:

When landing, please simply land this PR as-is: considering that I've made several references to this PR on the mailing list, and that the two commits other than the "main" one are pretty small, I no longer think it's worth it to restructure this into stacked PRs.  For future contributions, I'll make sure to use stacked PRs from the beginning and avoid squashing/force-pushing.  Thanks!
- - -
Pushed v11:
 - Rebased, `ninja check-all` passed
 - Changed encoding to `BPF_LOAD_ACQ=0x100`, `BPF_STORE_REL=0x110` per [mailing list discussion](https://lore.kernel.org/bpf/z7axcswd-topj...@google.com/); see the defines below
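(Spelled out as C defines; the values are exactly as quoted above, with names as used in the PR:)

```c
/* v11 encoding of the new BPF_ATOMIC 'imm' values, per the linked
 * mailing list discussion: */
#define BPF_LOAD_ACQ  0x100 /* load-acquire  */
#define BPF_STORE_REL 0x110 /* store-release */
```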


https://github.com/llvm/llvm-project/pull/108636