[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)
https://github.com/peilin-ye updated https://github.com/llvm/llvm-project/pull/108636 >From c23b4d2dde0e03051332604d18eaf3bd434e4d4e Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Thu, 15 Aug 2024 21:49:23 + Subject: [PATCH 1/3] [BPF] Refactor BPFSubtarget::initSubtargetFeatures() (NFC) Refactor it to make it match the style of BPFTargetInfo::getTargetDefines(), so that when we add -mcpu=v5 in the future, we can simply append e.g.: if (CpuVerNum >= 5) HasFoo = true; instead of saying: if (CPU == "v5") { HasJmpExt = true; HasJmp32 = true; HasAlu32 = true; HasLdsx = !Disable_ldsx; HasMovsx = !Disable_movsx; HasBswap = !Disable_bswap; HasSdivSmod = !Disable_sdiv_smod; HasGotol = !Disable_gotol; HasStoreImm = !Disable_StoreImm; HasFoo = true; } This also makes it clearer that newer "CPU"s always support older features. No functional changes intended. --- llvm/lib/Target/BPF/BPFSubtarget.cpp | 20 1 file changed, 8 insertions(+), 12 deletions(-) diff --git a/llvm/lib/Target/BPF/BPFSubtarget.cpp b/llvm/lib/Target/BPF/BPFSubtarget.cpp index 305e9a2bf2cda3..dbb7b669575b57 100644 --- a/llvm/lib/Target/BPF/BPFSubtarget.cpp +++ b/llvm/lib/Target/BPF/BPFSubtarget.cpp @@ -69,29 +69,25 @@ void BPFSubtarget::initSubtargetFeatures(StringRef CPU, StringRef FS) { CPU = "v3"; if (CPU == "probe") CPU = sys::detail::getHostCPUNameForBPF(); - if (CPU == "generic" || CPU == "v1") + if (CPU.empty() || CPU == "generic" || CPU == "v1") return; - if (CPU == "v2") { -HasJmpExt = true; -return; - } - if (CPU == "v3") { + + int CpuVerNum = CPU.back() - '0'; + if (CpuVerNum >= 2) HasJmpExt = true; + + if (CpuVerNum >= 3) { HasJmp32 = true; HasAlu32 = true; -return; } - if (CPU == "v4") { -HasJmpExt = true; -HasJmp32 = true; -HasAlu32 = true; + + if (CpuVerNum >= 4) { HasLdsx = !Disable_ldsx; HasMovsx = !Disable_movsx; HasBswap = !Disable_bswap; HasSdivSmod = !Disable_sdiv_smod; HasGotol = !Disable_gotol; HasStoreImm = !Disable_StoreImm; -return; } } >From 680b668fc597728c23fa77926e11491cd0512ea1 
Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Fri, 23 Aug 2024 19:11:05 + Subject: [PATCH 2/3] [BPF] Refactor {LOAD,STORE}{,32} classes (NFC) We will need different AsmString formats for load-acquire and store-release instructions. To make that easier, refactor {LOAD,STORE}{,32} classes to take AsmString as an argument directly. Add a BPFModeModifer parameter to STORE{,32} for similar reasons. No functional changes intended. --- llvm/lib/Target/BPF/BPFInstrInfo.td | 35 ++--- 1 file changed, 17 insertions(+), 18 deletions(-) diff --git a/llvm/lib/Target/BPF/BPFInstrInfo.td b/llvm/lib/Target/BPF/BPFInstrInfo.td index f7e17901c7ed5e..aab2291a70efc2 100644 --- a/llvm/lib/Target/BPF/BPFInstrInfo.td +++ b/llvm/lib/Target/BPF/BPFInstrInfo.td @@ -497,12 +497,11 @@ def LD_pseudo } // STORE instructions -class STORE Pattern> -: TYPE_LD_ST Pattern> +: TYPE_LD_ST { + AsmString, Pattern> { bits<4> src; bits<20> addr; @@ -513,7 +512,7 @@ class STORE Pattern> } class STOREi64 -: STORE; +: STORE; let Predicates = [BPFNoALU32] in { def STW : STOREi64; @@ -567,12 +566,11 @@ let Predicates = [BPFHasALU32, BPFHasStoreImm] in { } // LOAD instructions -class LOAD Pattern> +class LOAD Pattern> : TYPE_LD_ST { + AsmString, Pattern> { bits<4> dst; bits<20> addr; @@ -583,7 +581,8 @@ class LOAD -: LOAD; +: LOAD; let isCodeGenOnly = 1 in { class CORE_LD @@ -1069,12 +1068,11 @@ def : Pat<(i32 (trunc GPR:$src)), def : Pat<(i64 (anyext GPR32:$src)), (INSERT_SUBREG (i64 (IMPLICIT_DEF)), GPR32:$src, sub_32)>; -class STORE32 Pattern> -: TYPE_LD_ST Pattern> +: TYPE_LD_ST { + AsmString, Pattern> { bits<4> src; bits<20> addr; @@ -1085,7 +1083,8 @@ class STORE32 Pattern> } class STOREi32 -: STORE32; +: STORE32; let Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32" in { def STW32 : STOREi32; @@ -1093,12 +1092,11 @@ let Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32" in { def STB32 : STOREi32; } -class LOAD32 Pattern> +class LOAD32 Pattern> : TYPE_LD_ST { +AsmString, Pattern> { bits<4> 
dst; bits<20> addr; @@ -1109,7 +1107,8 @@ class LOAD32 -: LOAD32; +: LOAD32; let Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32" in { def LDW32 : LOADi32; >From b154221a54d10e34e5009c44c9587582e6769482 Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Tue, 10 Sep 2024 23:09:28 + Subject: [PATCH 3/3] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 As discussed in [1], introduce BPF instructions with load-acquire and store-release semantics under -mcpu=v5. A "load_acquire" is a BPF_LDX instruction with a new
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)
https://github.com/peilin-ye ready_for_review https://github.com/llvm/llvm-project/pull/108636 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)
@@ -622,6 +665,47 @@ let Predicates = [BPFHasLdsx] in { def LDD : LOADi64; +class LOAD_ACQUIRE +: TYPE_LD_ST { + bits<4> dst; + bits<20> addr; + + let Inst{51-48} = dst; + let Inst{55-52} = addr{19-16}; // base reg + let Inst{47-32} = addr{15-0}; // offset + let Inst{7-4} = BPF_LOAD_ACQ.Value; peilin-ye wrote: Sure! (I'm on leave for the rest of the week; will get back to this first thing next week) https://github.com/llvm/llvm-project/pull/108636
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)
https://github.com/peilin-ye edited https://github.com/llvm/llvm-project/pull/108636
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)
https://github.com/peilin-ye updated https://github.com/llvm/llvm-project/pull/108636 >From 7a3c9741ec7a5680b0a59ed7c0316a71c78ccf48 Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Sat, 5 Oct 2024 06:44:21 + Subject: [PATCH 1/3] [BPF] Rename isST*() and isLD*() functions in BPFMISimplifyPatchable.cpp (NFC) We are planning to add load (specifically, atomic acquiring load, or "load-acquire") instructions under the STX instruction class. To make that easier, rename the isST*() and isLD*() helper functions based on what the instructions actually do, rather than their instruction class. --- .../lib/Target/BPF/BPFMISimplifyPatchable.cpp | 22 +-- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp b/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp index 39390e8c38f8c1..4a1684ccebb793 100644 --- a/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp +++ b/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp @@ -94,35 +94,35 @@ void BPFMISimplifyPatchable::initialize(MachineFunction &MFParm) { LLVM_DEBUG(dbgs() << "*** BPF simplify patchable insts pass ***\n\n"); } -static bool isST(unsigned Opcode) { +static bool isStoreImm(unsigned Opcode) { return Opcode == BPF::STB_imm || Opcode == BPF::STH_imm || Opcode == BPF::STW_imm || Opcode == BPF::STD_imm; } -static bool isSTX32(unsigned Opcode) { +static bool isStore32(unsigned Opcode) { return Opcode == BPF::STB32 || Opcode == BPF::STH32 || Opcode == BPF::STW32; } -static bool isSTX64(unsigned Opcode) { +static bool isStore64(unsigned Opcode) { return Opcode == BPF::STB || Opcode == BPF::STH || Opcode == BPF::STW || Opcode == BPF::STD; } -static bool isLDX32(unsigned Opcode) { +static bool isLoad32(unsigned Opcode) { return Opcode == BPF::LDB32 || Opcode == BPF::LDH32 || Opcode == BPF::LDW32; } -static bool isLDX64(unsigned Opcode) { +static bool isLoad64(unsigned Opcode) { return Opcode == BPF::LDB || Opcode == BPF::LDH || Opcode == BPF::LDW || Opcode == BPF::LDD; } -static bool 
isLDSX(unsigned Opcode) { +static bool isLoadSext(unsigned Opcode) { return Opcode == BPF::LDBSX || Opcode == BPF::LDHSX || Opcode == BPF::LDWSX; } bool BPFMISimplifyPatchable::isLoadInst(unsigned Opcode) { - return isLDX32(Opcode) || isLDX64(Opcode) || isLDSX(Opcode); + return isLoad32(Opcode) || isLoad64(Opcode) || isLoadSext(Opcode); } void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI, @@ -143,11 +143,11 @@ void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI, MachineInstr *DefInst = MO.getParent(); unsigned Opcode = DefInst->getOpcode(); unsigned COREOp; -if (isLDX64(Opcode) || isLDSX(Opcode)) +if (isLoad64(Opcode) || isLoadSext(Opcode)) COREOp = BPF::CORE_LD64; -else if (isLDX32(Opcode)) +else if (isLoad32(Opcode)) COREOp = BPF::CORE_LD32; -else if (isSTX64(Opcode) || isSTX32(Opcode) || isST(Opcode)) +else if (isStore64(Opcode) || isStore32(Opcode) || isStoreImm(Opcode)) COREOp = BPF::CORE_ST; else continue; @@ -160,7 +160,7 @@ void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI, // Reject the form: // %1 = ADD_rr %2, %3 // *(type *)(%2 + 0) = %1 -if (isSTX64(Opcode) || isSTX32(Opcode)) { +if (isStore64(Opcode) || isStore32(Opcode)) { const MachineOperand &Opnd = DefInst->getOperand(0); if (Opnd.isReg() && Opnd.getReg() == MO.getReg()) continue; >From b3e494c9e4a36c68134c3bfda81b2068c1c08bca Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Sat, 5 Oct 2024 07:31:54 + Subject: [PATCH 2/3] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 As discussed in [1], introduce BPF instructions with load-acquire and store-release semantics under -mcpu=v4. A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm' field set to BPF_LOAD_ACQ (0x1, acquiring atomic load). Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with the 'imm' field set to BPF_STORE_REL (0xb, releasing atomic store). 
Unlike existing atomic operations that only support BPF_W (32-bit) and BPF_DW (64-bit) size modifiers, load-acquires and store-releases also support BPF_B (8-bit) and BPF_H (16-bit). An 8- or 16-bit load-acquire zero-extends the value before writing it to a 32-bit register, just like ARM64 instruction LDAPRH and friends. For example: long foo(long *ptr) { return __atomic_load_n(ptr, __ATOMIC_ACQUIRE); } foo() can be compiled to: db 10 00 00 10 00 00 00 r0 = load_acquire((u64 *)(r1 + 0x0)) 95 00 00 00 00 00 00 00 exit opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX imm (0x0010): BPF_LOAD_ACQ Similarly: void bar(short *ptr, short val) { __atomic_store_n(ptr, val, __ATOMIC_RELEASE); } bar() can be compiled to: cb 21 00 00 b0 00 00 00 store_release((u16 *)(r1 + 0x0), w2) 95 00 00 00 00 00 00 00 exit opco
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)
peilin-ye wrote: Back on this; rebased and pushed patchset v6 to make `acquire-release.ll` also cover `-march=bpfeb` as suggested by Eduard. Basically: ``` ; CHECK-LE: w0 = load_acquire((u8 *)(r1 + 0)) # encoding: [0xd3,0x10,0x00,0x00,0x10,0x00,0x00,0x00] ; CHECK-BE: w0 = load_acquire((u8 *)(r1 + 0)) # encoding: [0xd3,0x01,0x00,0x00,0x00,0x00,0x00,0x10] ``` `src_reg` and `dst_reg` are swapped, and `imm` is in big endian. Currently focusing on making kernel-side changes (verifier and JITs). - - - That weird `fatal: reference is not a tree:` CI error looks similar to [the one mentioned in PR 73506](https://github.com/llvm/llvm-project/pull/73506#issuecomment-1832909676). Running check-all locally for now... https://github.com/llvm/llvm-project/pull/108636
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)
https://github.com/peilin-ye updated https://github.com/llvm/llvm-project/pull/108636 >From 32cc1b238ddd67a96627a1cc57c1b9ca5c6be938 Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Sat, 5 Oct 2024 06:44:21 + Subject: [PATCH 1/3] [BPF] Rename isST*() and isLD*() functions in BPFMISimplifyPatchable.cpp (NFC) We are planning to add load (specifically, atomic acquiring load, or "load-acquire") instructions under the STX instruction class. To make that easier, rename the isST*() and isLD*() helper functions based on what the instructions actually do, rather than their instruction class. --- .../lib/Target/BPF/BPFMISimplifyPatchable.cpp | 22 +-- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp b/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp index 39390e8c38f8c1..4a1684ccebb793 100644 --- a/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp +++ b/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp @@ -94,35 +94,35 @@ void BPFMISimplifyPatchable::initialize(MachineFunction &MFParm) { LLVM_DEBUG(dbgs() << "*** BPF simplify patchable insts pass ***\n\n"); } -static bool isST(unsigned Opcode) { +static bool isStoreImm(unsigned Opcode) { return Opcode == BPF::STB_imm || Opcode == BPF::STH_imm || Opcode == BPF::STW_imm || Opcode == BPF::STD_imm; } -static bool isSTX32(unsigned Opcode) { +static bool isStore32(unsigned Opcode) { return Opcode == BPF::STB32 || Opcode == BPF::STH32 || Opcode == BPF::STW32; } -static bool isSTX64(unsigned Opcode) { +static bool isStore64(unsigned Opcode) { return Opcode == BPF::STB || Opcode == BPF::STH || Opcode == BPF::STW || Opcode == BPF::STD; } -static bool isLDX32(unsigned Opcode) { +static bool isLoad32(unsigned Opcode) { return Opcode == BPF::LDB32 || Opcode == BPF::LDH32 || Opcode == BPF::LDW32; } -static bool isLDX64(unsigned Opcode) { +static bool isLoad64(unsigned Opcode) { return Opcode == BPF::LDB || Opcode == BPF::LDH || Opcode == BPF::LDW || Opcode == BPF::LDD; } -static bool 
isLDSX(unsigned Opcode) { +static bool isLoadSext(unsigned Opcode) { return Opcode == BPF::LDBSX || Opcode == BPF::LDHSX || Opcode == BPF::LDWSX; } bool BPFMISimplifyPatchable::isLoadInst(unsigned Opcode) { - return isLDX32(Opcode) || isLDX64(Opcode) || isLDSX(Opcode); + return isLoad32(Opcode) || isLoad64(Opcode) || isLoadSext(Opcode); } void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI, @@ -143,11 +143,11 @@ void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI, MachineInstr *DefInst = MO.getParent(); unsigned Opcode = DefInst->getOpcode(); unsigned COREOp; -if (isLDX64(Opcode) || isLDSX(Opcode)) +if (isLoad64(Opcode) || isLoadSext(Opcode)) COREOp = BPF::CORE_LD64; -else if (isLDX32(Opcode)) +else if (isLoad32(Opcode)) COREOp = BPF::CORE_LD32; -else if (isSTX64(Opcode) || isSTX32(Opcode) || isST(Opcode)) +else if (isStore64(Opcode) || isStore32(Opcode) || isStoreImm(Opcode)) COREOp = BPF::CORE_ST; else continue; @@ -160,7 +160,7 @@ void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI, // Reject the form: // %1 = ADD_rr %2, %3 // *(type *)(%2 + 0) = %1 -if (isSTX64(Opcode) || isSTX32(Opcode)) { +if (isStore64(Opcode) || isStore32(Opcode)) { const MachineOperand &Opnd = DefInst->getOperand(0); if (Opnd.isReg() && Opnd.getReg() == MO.getReg()) continue; >From ee7836ecbccd745417d145690d3abf822ecc85c0 Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Sat, 5 Oct 2024 07:31:54 + Subject: [PATCH 2/3] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 As discussed in [1], introduce BPF instructions with load-acquire and store-release semantics under -mcpu=v4. A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm' field set to BPF_LOAD_ACQ (0x1, acquiring atomic load). Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with the 'imm' field set to BPF_STORE_REL (0xb, releasing atomic store). 
Unlike existing atomic operations that only support BPF_W (32-bit) and BPF_DW (64-bit) size modifiers, load-acquires and store-releases also support BPF_B (8-bit) and BPF_H (16-bit). An 8- or 16-bit load-acquire zero-extends the value before writing it to a 32-bit register, just like ARM64 instruction LDAPRH and friends. As an example, for -march=bpfel: long foo(long *ptr) { return __atomic_load_n(ptr, __ATOMIC_ACQUIRE); } foo() can be compiled to: db 10 00 00 10 00 00 00 r0 = load_acquire((u64 *)(r1 + 0x0)) 95 00 00 00 00 00 00 00 exit opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX imm (0x0010): BPF_LOAD_ACQ Similarly: void bar(short *ptr, short val) { __atomic_store_n(ptr, val, __ATOMIC_RELEASE); } bar() can be compiled to: cb 21 00 00 b0 00 00 00 store_release((u16 *)(r1 + 0x0), w2) 95 00 00 00 00 00
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)
@@ -1205,10 +1298,19 @@ class LOAD32 : LOAD32; +class LOAD_ACQUIREi32 +: LOAD_ACQUIRE; + let Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32" in { def LDW32 : LOADi32; def LDH32 : LOADi32; def LDB32 : LOADi32; + + let Predicates = [BPFHasLoadAcquire] in { peilin-ye wrote: > Is there a reason to define these (and stores) behind `BPFHasALU32` flag? Not really. Looks like there is no way to turn off `HasALU32` for v4. I just wanted to make it clear in the `.td` file that we are using half-registers for 8-, 16- and 32-bit insns. Also I wanted to define them behind `DecoderNamespace = "BPFALU32"` because currently `BPFDisassembler::getInstruction()` uses `DecoderTableBPFALU3264` for all non-64-bit `BPF_STX | BPF_ATOMIC` insns: ```cpp if ((InstClass == BPF_LDX || InstClass == BPF_STX) && getInstSize(Insn) != BPF_DW && (InstMode == BPF_MEM || InstMode == BPF_ATOMIC) && STI.hasFeature(BPF::ALU32)) Result = decodeInstruction(DecoderTableBPFALU3264, Instr, Insn, Address, this, STI); else Result = decodeInstruction(DecoderTableBPF64, Instr, Insn, Address, this, STI); ``` So if I move them out of `HasALU32` and keep BPFDisassembler.cpp as-is, llvm-objdump would give me: ``` : ; __atomic_store_n(ptr, val, __ATOMIC_RELEASE); 0: cb 21 00 00 b0 00 00 00 ; } 1: 95 00 00 00 00 00 00 00 exit ``` - - - > From an instruction encoding point of view we use STX class, which has no 32/64 > sub-division. What I'm doing is similar to `XXOR*`: right now we only have `XXORD` and `XXORW32` (both `BPF_STX`), and `XXORW32` is behind `Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32"`. There is no "ALU64 version" of `XXORW32`. https://github.com/llvm/llvm-project/pull/108636
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)
https://github.com/peilin-ye updated https://github.com/llvm/llvm-project/pull/108636 >From fda5dadf6af584fab9eef9fa588efe5454a9d1d1 Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Sat, 5 Oct 2024 06:44:21 + Subject: [PATCH 1/3] [BPF] Rename isST*() and isLD*() functions in BPFMISimplifyPatchable.cpp (NFC) We are planning to add load (specifically, atomic acquiring load, or "load-acquire") instructions under the STX instruction class. To make that easier, rename the isST*() and isLD*() helper functions based on what the instructions actually do, rather than their instruction class. --- .../lib/Target/BPF/BPFMISimplifyPatchable.cpp | 22 +-- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp b/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp index 39390e8c38f8c1..4a1684ccebb793 100644 --- a/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp +++ b/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp @@ -94,35 +94,35 @@ void BPFMISimplifyPatchable::initialize(MachineFunction &MFParm) { LLVM_DEBUG(dbgs() << "*** BPF simplify patchable insts pass ***\n\n"); } -static bool isST(unsigned Opcode) { +static bool isStoreImm(unsigned Opcode) { return Opcode == BPF::STB_imm || Opcode == BPF::STH_imm || Opcode == BPF::STW_imm || Opcode == BPF::STD_imm; } -static bool isSTX32(unsigned Opcode) { +static bool isStore32(unsigned Opcode) { return Opcode == BPF::STB32 || Opcode == BPF::STH32 || Opcode == BPF::STW32; } -static bool isSTX64(unsigned Opcode) { +static bool isStore64(unsigned Opcode) { return Opcode == BPF::STB || Opcode == BPF::STH || Opcode == BPF::STW || Opcode == BPF::STD; } -static bool isLDX32(unsigned Opcode) { +static bool isLoad32(unsigned Opcode) { return Opcode == BPF::LDB32 || Opcode == BPF::LDH32 || Opcode == BPF::LDW32; } -static bool isLDX64(unsigned Opcode) { +static bool isLoad64(unsigned Opcode) { return Opcode == BPF::LDB || Opcode == BPF::LDH || Opcode == BPF::LDW || Opcode == BPF::LDD; } -static bool 
isLDSX(unsigned Opcode) { +static bool isLoadSext(unsigned Opcode) { return Opcode == BPF::LDBSX || Opcode == BPF::LDHSX || Opcode == BPF::LDWSX; } bool BPFMISimplifyPatchable::isLoadInst(unsigned Opcode) { - return isLDX32(Opcode) || isLDX64(Opcode) || isLDSX(Opcode); + return isLoad32(Opcode) || isLoad64(Opcode) || isLoadSext(Opcode); } void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI, @@ -143,11 +143,11 @@ void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI, MachineInstr *DefInst = MO.getParent(); unsigned Opcode = DefInst->getOpcode(); unsigned COREOp; -if (isLDX64(Opcode) || isLDSX(Opcode)) +if (isLoad64(Opcode) || isLoadSext(Opcode)) COREOp = BPF::CORE_LD64; -else if (isLDX32(Opcode)) +else if (isLoad32(Opcode)) COREOp = BPF::CORE_LD32; -else if (isSTX64(Opcode) || isSTX32(Opcode) || isST(Opcode)) +else if (isStore64(Opcode) || isStore32(Opcode) || isStoreImm(Opcode)) COREOp = BPF::CORE_ST; else continue; @@ -160,7 +160,7 @@ void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI, // Reject the form: // %1 = ADD_rr %2, %3 // *(type *)(%2 + 0) = %1 -if (isSTX64(Opcode) || isSTX32(Opcode)) { +if (isStore64(Opcode) || isStore32(Opcode)) { const MachineOperand &Opnd = DefInst->getOperand(0); if (Opnd.isReg() && Opnd.getReg() == MO.getReg()) continue; >From 963e522de5a6ddbd7c1aa386e70d2997c6df2359 Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Sat, 5 Oct 2024 07:31:54 + Subject: [PATCH 2/3] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 As discussed in [1], introduce BPF instructions with load-acquire and store-release semantics under -mcpu=v4. A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm' field set to BPF_LOAD_ACQ (0x1, acquiring atomic load). Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with the 'imm' field set to BPF_STORE_REL (0xb, releasing atomic store). 
Unlike existing atomic operations that only support BPF_W (32-bit) and BPF_DW (64-bit) size modifiers, load-acquires and store-releases also support BPF_B (8-bit) and BPF_H (16-bit). An 8- or 16-bit load-acquire zero-extends the value before writing it to a 32-bit register, just like ARM64 instruction LDAPRH and friends. As an example, for -march=bpfel: long foo(long *ptr) { return __atomic_load_n(ptr, __ATOMIC_ACQUIRE); } foo() can be compiled to: db 10 00 00 10 00 00 00 r0 = load_acquire((u64 *)(r1 + 0x0)) 95 00 00 00 00 00 00 00 exit opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX imm (0x0010): BPF_LOAD_ACQ Similarly: void bar(short *ptr, short val) { __atomic_store_n(ptr, val, __ATOMIC_RELEASE); } bar() can be compiled to: cb 21 00 00 b0 00 00 00 store_release((u16 *)(r1 + 0x0), w2) 95 00 00 00 00 00
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)
peilin-ye wrote: Thanks for all your suggestions! Main changes in patchset v5: 1. use `-mcpu=v4` instead of `-mcpu=v5` 2. make both load-acquire and store-release `BPF_STX | BPF_ATOMIC` insns, instead of introducing new mode modifiers: `BPF_STX | BPF_ATOMIC` insns use a subset of `BPFArithOp<>` in bit `4-7` of the `imm` field. Currently these are in use: ``` def BPF_ADD : BPFArithOp<0x0>; def BPF_OR : BPFArithOp<0x4>; def BPF_AND : BPFArithOp<0x5>; def BPF_XOR : BPFArithOp<0xa>; def BPF_XCHG: BPFArithOp<0xe>; def BPF_CMPXCHG : BPFArithOp<0xf>; ``` On top of this, I tentatively chose `0x1` for `BPF_LOAD_ACQ`, and `0xb` for `BPF_STORE_REL`, because: - `0x1` is `BPF_SUB` in `BPFArithOp<>`, but atomic SUB is implemented using a NEG + ADD: ``` // atomic_load_sub can be represented as a neg followed // by an atomic_load_add. ``` - `0xb` is `BPF_MOV` in `BPFArithOp<>`, which is "moving value between registers" and isn't applicable for atomic (memory) operations Please let me know if you have better suggestions for encoding (or if I'm overthinking this and we can simply use `0x1` and `0x2`). - - - @eddyz87, do my changes to `BPFMISimplifyPatchable.cpp` still look good to you? Now that load-acquires are `STX` insns, I wanted to make sure that `BPFMISimplifyPatchable::checkADDrr()` can still handle them correctly for CO-RE. I'll add tests to `llvm/test/CodeGen/BPF/CORE/` later, like what you did in commit 08d92dedd26c ("[BPF] Fix in/out argument constraints for CORE_MEM instructions"). - - - About whether we should sign-extend for 8- and 16-bit load-acquires (brought up by Yonghong): All ARM64 insns that match `acquiring_load` seem to zero-extend the value before writing it to register, like, `LDAPRH`: > Load-Acquire RCpc Register Halfword derives an address from a base register > value, loads a halfword from the derived address in memory, zero-extends it > and writes it to a register. So right now I'm keeping our `LDBACQ32` and `LDHACQ32` to zero-extend. 
I'll take a look at other archs later. https://github.com/llvm/llvm-project/pull/108636
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)
@@ -522,6 +526,28 @@ let Predicates = [BPFNoALU32] in { } def STD : STOREi64; +class relaxed_store + : PatFrag<(ops node:$val, node:$ptr), (base node:$val, node:$ptr)> { + let IsAtomic = 1; + let IsAtomicOrderingReleaseOrStronger = 0; +} + +class releasing_store + : PatFrag<(ops node:$val, node:$ptr), (base node:$val, node:$ptr)> { + let IsAtomic = 1; + let IsAtomicOrderingRelease = 1; +} peilin-ye wrote: Got it, thanks for the input! I also found this in [Instruction-Level BPF Memory Model](https://docs.google.com/document/d/1TaSEfWfLnRUi5KqkavUQyL2tThJXYWHS15qcbxIsFb0/edit?usp=sharing), section "Atomic Loads": > For `__ATOMIC_RELAXED`, these can simply be implemented as plain BPF load > instructions. https://github.com/llvm/llvm-project/pull/108636
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)
peilin-ye wrote: > In the above, you will do unsigned load extension. But what about signed extension load variant? @yonghong-song, if we ever need that, we can add a new flag to bit `0-3` of `imm` to indicate "this 8- or 16-bit load-acquire is sign-extending", just like `BPF_FETCH` for `BPF_XCHG` and `BPF_CMPXCHG`. However, I wonder how to generate a sign-extending (`sext`), acquiring `ATOMIC_LOAD` in SelectionDAG. If I compile the following program: ```c // clang --target=bpf -mcpu=v4 -mllvm -debug-only=isel-dump -O2 -g -c -o demo.bpf.o demo.bpf.c long bar(char *ptr) { return __atomic_load_n(ptr, __ATOMIC_ACQUIRE); } ``` I get this: ``` Optimized legalized selection DAG: %bb.0 'bar:entry' SelectionDAG has 10 nodes: t0: ch,glue = EntryToken t2: i64,ch = CopyFromReg t0, Register:i64 %0 t8: i32,ch = AtomicLoad<(load acquire (s8) from %ir.ptr), zext from i8> t0, t2, demo.bpf.c:3:12 t9: i64 = any_extend t8, demo.bpf.c:3:12 t11: i64 = sign_extend_inreg t9, ValueType:ch:i8, demo.bpf.c:3:12 t6: ch,glue = CopyToReg t8:1, Register:i64 $r0, t11, demo.bpf.c:3:5 t7: ch = BPFISD::RET_GLUE t6, Register:i64 $r0, t6:1, demo.bpf.c:3:5 ``` The `AtomicLoad` node (`t8`) still says `zext` (zero-extending). The program would still work because it does an 8-to-64-bit `MOVSX` after the load-acquire: ``` ; return __atomic_load_n(ptr, __ATOMIC_ACQUIRE); 0: d3 11 00 00 10 00 00 00 w1 = load_acquire((u8 *)(r1 + 0x0)) 1: bf 10 08 00 00 00 00 00 r0 = (s8)r1 2: 95 00 00 00 00 00 00 00 exit ``` `r0 = (s8)r1` takes the lowest byte of `r1` and sign-extends it into 64-bit, so it doesn't matter if the load-acquire zero- or sign-extended the char into 32-bit. I still wonder if there's a way to change that `AtomicLoad` node (`t8`) from `zext` to `sext` though. https://github.com/llvm/llvm-project/pull/108636
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)
peilin-ye wrote: Another problem is, right now, if I do this: ```diff --- a/llvm/lib/Target/BPF/BPFInstrInfo.td +++ b/llvm/lib/Target/BPF/BPFInstrInfo.td @@ -1343,11 +1343,11 @@ let Predicates = [BPFHasALU32] in { let Predicates = [BPFHasLoadAcquire] in { foreach P = [[relaxed_load, LDWACQ32], - [relaxed_load, LDHACQ32], - [relaxed_load, LDBACQ32], + [relaxed_load, LDHACQ32], + [relaxed_load, LDBACQ32], [acquiring_load, LDWACQ32], - [acquiring_load, LDHACQ32], - [acquiring_load, LDBACQ32], + [acquiring_load, LDHACQ32], + [acquiring_load, LDBACQ32], ] in { def : Pat<(P[0] ADDRri:$addr), (P[1] ADDRri:$addr)>; } ``` The above toy program no longer compiles: ``` fatal error: error in backend: Cannot select: t9: i32,ch = AtomicLoad<(load acquire (s8) from %ir.ptr), zext from i8> t0, t2, demo.bpf.c:3:12 t2: i64,ch = CopyFromReg t0, Register:i64 %0 t1: i64 = Register %0 In function: bar ... ``` This is because commit 1ee6ce9bad4d ("GlobalISel: Allow forming atomic/volatile G_ZEXTLOAD") did this in `llvm/utils/TableGen/CodeGenDAGPatterns.cpp`: ```diff @@ -1105,6 +1109,10 @@ std::string TreePredicateFn::getPredCode() const { Code += "if (isReleaseOrStronger(cast(N)->getMergedOrdering())) " "return false;\n"; + // TODO: Handle atomic sextload/zextload normally when ATOMIC_LOAD is removed. + if (isAtomic() && (isZeroExtLoad() || isSignExtLoad())) +Code += "return false;\n"; + if (isLoad() || isStore()) { StringRef SDNodeName = isLoad() ? "LoadSDNode" : "StoreSDNode"; ``` As a result, nothing would match `atomic_load_zext_{8,16}` because these patterns are **explicitly** marked as `zext` (`isZeroExtLoad()`). - - - If our plan is to: - generate `zext` BPF 8- and 16-bit load-acquire for `atomic_load_zext_{8,16}` - (if needed) generate `sext` BPF 8- and 16-bit load-acquire for `atomic_load_sext_{8,16}` Then this problem needs to be solved, and I can include the author of 1ee6ce9bad4d into discussion. 
https://github.com/llvm/llvm-project/pull/108636
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)
@@ -48,6 +48,24 @@ def BPF_END : BPFArithOp<0xd>; def BPF_XCHG: BPFArithOp<0xe>; def BPF_CMPXCHG : BPFArithOp<0xf>; +class BPFAtomicLoadStoreOp val> { + bits<4> Value = val; +} + +def BPF_ATOMIC_LOAD : BPFAtomicLoadStoreOp<0x1>; +def BPF_ATOMIC_STORE : BPFAtomicLoadStoreOp<0x2>; + +class BPFAtomicOrdering val> { + bits<4> Value = val; +} + +def BPF_RELAXED : BPFAtomicOrdering<0x0>; +def BPF_CONSUME : BPFAtomicOrdering<0x1>; peilin-ye wrote: Sure, let me delete it. https://github.com/llvm/llvm-project/pull/108636
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)
https://github.com/peilin-ye updated https://github.com/llvm/llvm-project/pull/108636 >From 885d5141f6707a0fdf4be363351083f8fdf8fd54 Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Sat, 5 Oct 2024 06:44:21 + Subject: [PATCH 1/3] [BPF] Rename isST*() and isLD*() functions in BPFMISimplifyPatchable.cpp (NFC) We are planning to add load (specifically, atomic acquiring load, or "load-acquire") instructions under the STX instruction class. To make that easier, rename the isST*() and isLD*() helper functions based on what the instructions actually do, rather than their instruction class. --- .../lib/Target/BPF/BPFMISimplifyPatchable.cpp | 22 +-- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp b/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp index 39390e8c38f8c1..4a1684ccebb793 100644 --- a/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp +++ b/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp @@ -94,35 +94,35 @@ void BPFMISimplifyPatchable::initialize(MachineFunction &MFParm) { LLVM_DEBUG(dbgs() << "*** BPF simplify patchable insts pass ***\n\n"); } -static bool isST(unsigned Opcode) { +static bool isStoreImm(unsigned Opcode) { return Opcode == BPF::STB_imm || Opcode == BPF::STH_imm || Opcode == BPF::STW_imm || Opcode == BPF::STD_imm; } -static bool isSTX32(unsigned Opcode) { +static bool isStore32(unsigned Opcode) { return Opcode == BPF::STB32 || Opcode == BPF::STH32 || Opcode == BPF::STW32; } -static bool isSTX64(unsigned Opcode) { +static bool isStore64(unsigned Opcode) { return Opcode == BPF::STB || Opcode == BPF::STH || Opcode == BPF::STW || Opcode == BPF::STD; } -static bool isLDX32(unsigned Opcode) { +static bool isLoad32(unsigned Opcode) { return Opcode == BPF::LDB32 || Opcode == BPF::LDH32 || Opcode == BPF::LDW32; } -static bool isLDX64(unsigned Opcode) { +static bool isLoad64(unsigned Opcode) { return Opcode == BPF::LDB || Opcode == BPF::LDH || Opcode == BPF::LDW || Opcode == BPF::LDD; } -static bool 
isLDSX(unsigned Opcode) { +static bool isLoadSext(unsigned Opcode) { return Opcode == BPF::LDBSX || Opcode == BPF::LDHSX || Opcode == BPF::LDWSX; } bool BPFMISimplifyPatchable::isLoadInst(unsigned Opcode) { - return isLDX32(Opcode) || isLDX64(Opcode) || isLDSX(Opcode); + return isLoad32(Opcode) || isLoad64(Opcode) || isLoadSext(Opcode); } void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI, @@ -143,11 +143,11 @@ void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI, MachineInstr *DefInst = MO.getParent(); unsigned Opcode = DefInst->getOpcode(); unsigned COREOp; -if (isLDX64(Opcode) || isLDSX(Opcode)) +if (isLoad64(Opcode) || isLoadSext(Opcode)) COREOp = BPF::CORE_LD64; -else if (isLDX32(Opcode)) +else if (isLoad32(Opcode)) COREOp = BPF::CORE_LD32; -else if (isSTX64(Opcode) || isSTX32(Opcode) || isST(Opcode)) +else if (isStore64(Opcode) || isStore32(Opcode) || isStoreImm(Opcode)) COREOp = BPF::CORE_ST; else continue; @@ -160,7 +160,7 @@ void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI, // Reject the form: // %1 = ADD_rr %2, %3 // *(type *)(%2 + 0) = %1 -if (isSTX64(Opcode) || isSTX32(Opcode)) { +if (isStore64(Opcode) || isStore32(Opcode)) { const MachineOperand &Opnd = DefInst->getOperand(0); if (Opnd.isReg() && Opnd.getReg() == MO.getReg()) continue; >From 77807685b76607db97c18bb5dd146203c675ab98 Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Sat, 5 Oct 2024 07:31:54 + Subject: [PATCH 2/3] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 As discussed in [1], introduce BPF instructions with load-acquire and store-release semantics under -mcpu=v4. The following new flags are defined: BPF_ATOMIC_LOAD 0x10 BPF_ATOMIC_STORE 0x20 BPF_RELAXED: 0x0 BPF_ACQUIRE: 0x1 BPF_RELEASE: 0x2 BPF_ACQ_REL: 0x3 BPF_SEQ_CST: 0x4 A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm' field set to BPF_ATOMIC_LOAD | BPF_ACQUIRE (0x11). 
Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with the 'imm' field set to BPF_ATOMIC_STORE | BPF_RELEASE (0x22).

Unlike existing atomic operations that only support BPF_W (32-bit) and BPF_DW (64-bit) size modifiers, load-acquires and store-releases also support BPF_B (8-bit) and BPF_H (16-bit). An 8- or 16-bit load-acquire zero-extends the value before writing it to a 32-bit register, just like ARM64 instruction LDAPRH and friends.

As an example, for -march=bpfel (little-endian):

  long foo(long *ptr) {
    return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
  }

foo() can be compiled to:

  db 10 00 00 11 00 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))
  95 00 00 00 00 00 00 00  exit

  opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
  imm (0x0011): BPF_ATOMIC_L
peilin-ye wrote: Changes in v7:

1. Change encoding to make it easier to support `SEQ_CST` in the future (Yingchi)

   ```
   (before)
                 | imm{7-4}            | imm{3-0}
   ------------- | ------------------- | --------
   load-acquire  | BPF_LOAD_ACQ (0x1)  | 0x0
   store-release | BPF_STORE_REL (0x2) | 0x0
   ```
   ```
   (after)
                 | imm{7-4}        | imm{3-0}
   ------------- | --------------- | -----------------
   load-acquire  | BPF_LOAD (0x1)  | BPF_ACQUIRE (0x1)
   store-release | BPF_STORE (0x2) | BPF_RELEASE (0x2)
   ```

   Introduce the following new flags for memory orders:
   ```
   def BPF_RELAXED : BPFAtomicOrdering<0x0>;
   def BPF_ACQUIRE : BPFAtomicOrdering<0x1>;
   def BPF_RELEASE : BPFAtomicOrdering<0x2>;
   def BPF_ACQ_REL : BPFAtomicOrdering<0x3>;
   def BPF_SEQ_CST : BPFAtomicOrdering<0x4>;
   ```
   Include all orders defined by the C++ standard except `CONSUME` (Yonghong)

2. `BPFISelLowering.cpp` changes (Matt)
   - Avoid unnecessarily setting `MVT::i{8,16}` `ATOMIC_{LOAD,STORE}` to `Custom`
   - Coding style change

3. `acquire-release.ll` changes (Yingchi)
   - Move `; CHECK` labels into their corresponding function bodies
   - Delete the `; Source:` section

4. Update `bpf-predefined-macros.c`

https://github.com/llvm/llvm-project/pull/108636
@@ -48,6 +48,13 @@ def BPF_END : BPFArithOp<0xd>;

```diff
 def BPF_XCHG    : BPFArithOp<0xe>;
 def BPF_CMPXCHG : BPFArithOp<0xf>;

+class BPFAtomicLoadStoreOp<bits<4> val> {
+  bits<4> Value = val;
+}
+
+def BPF_LOAD_ACQ  : BPFAtomicLoadStoreOp<0x1>;
+def BPF_STORE_REL : BPFAtomicLoadStoreOp<0xb>;
```

peilin-ye wrote: @yonghong-song, my thinking was: Right now for `BPF_ATOMIC` insns, we use a subset of `BPFArithOp<>` in `imm`:

```
def BPF_ADD     : BPFArithOp<0x0>;
def BPF_OR      : BPFArithOp<0x4>;
def BPF_AND     : BPFArithOp<0x5>;
def BPF_XOR     : BPFArithOp<0xa>;
def BPF_XCHG    : BPFArithOp<0xe>;
def BPF_CMPXCHG : BPFArithOp<0xf>;
```

Will we ever want to support other `BPFArithOp<>` ops in `BPF_ATOMIC`? For example, `0x1` is `BPF_SUB`, but it looks like atomic SUB is currently implemented using NEG + ADD:

```
// atomic_load_sub can be represented as a neg followed
// by an atomic_load_add.
```

So I thought, "OK, if we don't need to reserve `0x1` for `BPF_SUB`, then I can use it". Similarly, `0xb` is `BPF_MOV` (moving value between registers), and I think it's safe to say we'll never need it for `BPF_ATOMIC`. But `0x2` is `BPF_MUL`; will we ever support atomic multiplication? :-)

- - -

Am I overthinking this? Should I simply use `0x1` and `0x2`?

https://github.com/llvm/llvm-project/pull/108636
@@ -0,0 +1,142 @@

```
+; RUN: llc < %s -march=bpfel -mcpu=v4 -verify-machineinstrs -show-mc-encoding \
+; RUN:   | FileCheck -check-prefixes=CHECK-LE %s
+; RUN: llc < %s -march=bpfeb -mcpu=v4 -verify-machineinstrs -show-mc-encoding \
```

peilin-ye wrote: (replied above)

https://github.com/llvm/llvm-project/pull/108636
@@ -0,0 +1,142 @@

```
+; RUN: llc < %s -march=bpfel -mcpu=v4 -verify-machineinstrs -show-mc-encoding \
```

peilin-ye wrote:

> again

Thanks for pointing this out, but I do want to test the MC encoding. It looks like at the moment a lot of tests under `llvm/test/CodeGen/BPF/` are doing this:

```shell
$ git grep --files-with-matches '\-show-mc-encoding' -- llvm/test/CodeGen/BPF/ | wc -l
18
```

I think we can do a separate PR to move all the "test MC encoding" bits from `llvm/test/CodeGen/BPF/` to `llvm/test/MC/BPF/`, if needed.

https://github.com/llvm/llvm-project/pull/108636
peilin-ye wrote: Pushed v8 to make it generate plain (`BPF_MEM`) loads and stores if user requested `__ATOMIC_RELAXED`, as suggested by Yonghong. Updated commit message accordingly. https://github.com/llvm/llvm-project/pull/108636 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
peilin-ye wrote: Sure Yonghong, I can include that in this PR. Similarly, since we cannot have both a `zext` and a `sext` version of `relaxed_load`, I'll keep it `zext` (`BPF_MEM` | `BPF_LDX`) for now. For example:

```c
int foo(char *ptr) {
  return __atomic_load_n(ptr, __ATOMIC_RELAXED);
}
```

This'll be compiled into:

```
<foo>:
       0:       71 11 00 00 00 00 00 00 w1 = *(u8 *)(r1 + 0x0)
       1:       bc 10 08 00 00 00 00 00 w0 = (s8)w1
       2:       95 00 00 00 00 00 00 00 exit
```

https://github.com/llvm/llvm-project/pull/108636
https://github.com/peilin-ye updated https://github.com/llvm/llvm-project/pull/108636 >From 8ab5d1c2e36dfa44be637478deeabaa4f586dcee Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Sat, 5 Oct 2024 06:44:21 + Subject: [PATCH 1/3] [BPF] Rename isST*() and isLD*() functions in BPFMISimplifyPatchable.cpp (NFC) We are planning to add load (specifically, atomic acquiring load, or "load-acquire") instructions under the STX instruction class. To make that easier, rename the isST*() and isLD*() helper functions based on what the instructions actually do, rather than their instruction class. --- .../lib/Target/BPF/BPFMISimplifyPatchable.cpp | 22 +-- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp b/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp index 39390e8c38f8c1..4a1684ccebb793 100644 --- a/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp +++ b/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp @@ -94,35 +94,35 @@ void BPFMISimplifyPatchable::initialize(MachineFunction &MFParm) { LLVM_DEBUG(dbgs() << "*** BPF simplify patchable insts pass ***\n\n"); } -static bool isST(unsigned Opcode) { +static bool isStoreImm(unsigned Opcode) { return Opcode == BPF::STB_imm || Opcode == BPF::STH_imm || Opcode == BPF::STW_imm || Opcode == BPF::STD_imm; } -static bool isSTX32(unsigned Opcode) { +static bool isStore32(unsigned Opcode) { return Opcode == BPF::STB32 || Opcode == BPF::STH32 || Opcode == BPF::STW32; } -static bool isSTX64(unsigned Opcode) { +static bool isStore64(unsigned Opcode) { return Opcode == BPF::STB || Opcode == BPF::STH || Opcode == BPF::STW || Opcode == BPF::STD; } -static bool isLDX32(unsigned Opcode) { +static bool isLoad32(unsigned Opcode) { return Opcode == BPF::LDB32 || Opcode == BPF::LDH32 || Opcode == BPF::LDW32; } -static bool isLDX64(unsigned Opcode) { +static bool isLoad64(unsigned Opcode) { return Opcode == BPF::LDB || Opcode == BPF::LDH || Opcode == BPF::LDW || Opcode == BPF::LDD; } -static bool 
isLDSX(unsigned Opcode) { +static bool isLoadSext(unsigned Opcode) { return Opcode == BPF::LDBSX || Opcode == BPF::LDHSX || Opcode == BPF::LDWSX; } bool BPFMISimplifyPatchable::isLoadInst(unsigned Opcode) { - return isLDX32(Opcode) || isLDX64(Opcode) || isLDSX(Opcode); + return isLoad32(Opcode) || isLoad64(Opcode) || isLoadSext(Opcode); } void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI, @@ -143,11 +143,11 @@ void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI, MachineInstr *DefInst = MO.getParent(); unsigned Opcode = DefInst->getOpcode(); unsigned COREOp; -if (isLDX64(Opcode) || isLDSX(Opcode)) +if (isLoad64(Opcode) || isLoadSext(Opcode)) COREOp = BPF::CORE_LD64; -else if (isLDX32(Opcode)) +else if (isLoad32(Opcode)) COREOp = BPF::CORE_LD32; -else if (isSTX64(Opcode) || isSTX32(Opcode) || isST(Opcode)) +else if (isStore64(Opcode) || isStore32(Opcode) || isStoreImm(Opcode)) COREOp = BPF::CORE_ST; else continue; @@ -160,7 +160,7 @@ void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI, // Reject the form: // %1 = ADD_rr %2, %3 // *(type *)(%2 + 0) = %1 -if (isSTX64(Opcode) || isSTX32(Opcode)) { +if (isStore64(Opcode) || isStore32(Opcode)) { const MachineOperand &Opnd = DefInst->getOperand(0); if (Opnd.isReg() && Opnd.getReg() == MO.getReg()) continue; >From 4b3b71ffebca7fcc1c083e9f269f08332cbdff0b Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Sat, 5 Oct 2024 07:31:54 + Subject: [PATCH 2/3] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 As discussed in [1], introduce BPF instructions with load-acquire and store-release semantics under -mcpu=v4. The following new flags are defined: BPF_ATOMIC_LOAD 0x10 BPF_ATOMIC_STORE 0x20 BPF_RELAXED: 0x0 BPF_CONSUME: 0x1 BPF_ACQUIRE: 0x2 BPF_RELEASE: 0x3 BPF_ACQ_REL: 0x4 BPF_SEQ_CST: 0x5 A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm' field set to BPF_ATOMIC_LOAD | BPF_ACQUIRE (0x12). 
Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with the 'imm' field set to BPF_ATOMIC_STORE | BPF_RELEASE (0x23).

Unlike existing atomic operations that only support BPF_W (32-bit) and BPF_DW (64-bit) size modifiers, load-acquires and store-releases also support BPF_B (8-bit) and BPF_H (16-bit). An 8- or 16-bit load-acquire zero-extends the value before writing it to a 32-bit register, just like ARM64 instruction LDAPRH and friends.

As an example, for -march=bpfel (little-endian):

  long foo(long *ptr) {
    return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
  }

foo() can be compiled to:

  db 10 00 00 12 00 00 00  r0 = load_acquire((u64 *)(r1 + 0x0))
  95 00 00 00 00 00 00 00  exit

  opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX
  imm (
peilin-ye wrote: Hi @arsenm, I'm trying to match `atomic_load_zext_{8,16}` in the BPF backend but it doesn't work:

```
fatal error: error in backend: Cannot select: t8: i32,ch = AtomicLoad<(load acquire (s8) from %ir.ptr), zext from i8> t0, t2, demo.bpf.c:3:12
  t2: i64,ch = CopyFromReg t0, Register:i64 %0
    t1: i64 = Register %0
...
```

After some digging, I think it is due to this TableGen (`llvm/utils/TableGen/CodeGenDAGPatterns.cpp`) change in commit 1ee6ce9bad4d ("GlobalISel: Allow forming atomic/volatile G_ZEXTLOAD"):

```diff
@@ -1105,6 +1109,10 @@ std::string TreePredicateFn::getPredCode() const {
     Code += "if (isReleaseOrStronger(cast<AtomicSDNode>(N)->getMergedOrdering())) "
             "return false;\n";

+  // TODO: Handle atomic sextload/zextload normally when ATOMIC_LOAD is removed.
+  if (isAtomic() && (isZeroExtLoad() || isSignExtLoad()))
+    Code += "return false;\n";
+
   if (isLoad() || isStore()) {
     StringRef SDNodeName = isLoad() ? "LoadSDNode" : "StoreSDNode";
```

So I'm having this in `BPFGenDAGISel.inc`:

```cpp
case 34: {
  // Predicate_atomic_load_zext
  SDNode *N = Node;
  (void)N;
  return false;
  return true;
}
```

It unconditionally returns `false`, so nothing would match `atomic_load_zext_{8,16}`.

- - -

Could you please shed some light on how to match `atomic_load_{s,z}ext_{8,16}`? Should I learn about GlobalISel? I'm aware that there's `atomic_load_az_{8,16}` (used by AArch64), but I need the `*_zext_*` and `*_sext_*` versions here.

https://github.com/llvm/llvm-project/pull/108636
@@ -703,6 +715,39 @@ SDValue BPFTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const {

```diff
   return DAG.getNode(BPFISD::SELECT_CC, DL, VTs, Ops);
 }

+SDValue BPFTargetLowering::LowerATOMIC_LOAD(SDValue Op,
+                                            SelectionDAG &DAG) const {
+  const char *Msg =
+      "sequentially consistent (seq_cst) atomic load is not supported";
+  SDNode *N = Op.getNode();
+  SDLoc DL(N);
+
+  if (cast<AtomicSDNode>(N)->getMergedOrdering() ==
+      AtomicOrdering::SequentiallyConsistent)
+    fail(DL, DAG, Msg);
+
+  return Op;
+}
+
+SDValue BPFTargetLowering::LowerATOMIC_STORE(SDValue Op,
+                                             SelectionDAG &DAG) const {
+  const char *Msg =
+      "sequentially consistent (seq_cst) atomic store is not supported";
+  EVT VT = Op.getOperand(1).getValueType();
+  SDNode *N = Op.getNode();
+  SDLoc DL(N);
+
+  // Promote operand #1 (value to store) if necessary.
+  if (!isTypeLegal(VT))
+    return SDValue();
+
+  if (cast<AtomicSDNode>(N)->getMergedOrdering() ==
+      AtomicOrdering::SequentiallyConsistent)
+    fail(DL, DAG, Msg);
```

peilin-ye wrote: We're starting from `ACQUIRE` and `RELEASE`. `SEQ_CST` will be supported if needed.

https://github.com/llvm/llvm-project/pull/108636
(quoted diff hunk: `BPFTargetLowering::LowerATOMIC_LOAD()` / `LowerATOMIC_STORE()` in `BPFISelLowering.cpp`; the same context is quoted in full earlier in the thread)

peilin-ye wrote: Ah! I see, thanks! So:

```diff
--- a/llvm/lib/Target/BPF/BPFISelLowering.cpp
+++ b/llvm/lib/Target/BPF/BPFISelLowering.cpp
@@ -93,7 +93,7 @@ BPFTargetLowering::BPFTargetLowering(const TargetMachine &TM,
     setOperationAction(ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS, VT, Custom);
   }

-  for (auto VT : {MVT::i8, MVT::i16, MVT::i32, MVT::i64}) {
+  for (auto VT : {MVT::i32, MVT::i64}) {
     setOperationAction(ISD::ATOMIC_LOAD, VT, Custom);
     setOperationAction(ISD::ATOMIC_STORE, VT, Custom);
   }
@@ -737,10 +737,6 @@ SDValue BPFTargetLowering::LowerATOMIC_STORE(SDValue Op,
   SDNode *N = Op.getNode();
   SDLoc DL(N);

-  // Promote operand #1 (value to store) if necessary.
-  if (!isTypeLegal(VT))
-    return SDValue();
-
   if (cast<AtomicSDNode>(N)->getMergedOrdering() ==
       AtomicOrdering::SequentiallyConsistent)
     fail(DL, DAG, Msg);
```

https://github.com/llvm/llvm-project/pull/108636
(quoted diff hunk: `BPFTargetLowering::LowerATOMIC_LOAD()` / `LowerATOMIC_STORE()` in `BPFISelLowering.cpp`; the same context is quoted in full earlier in the thread)

peilin-ye wrote: OK, let me change it in the next push.

> Plus I think such an error should be relegated to an unsupported legalize action.

Sorry, could you explain your suggestion a bit more? Are you suggesting that we add an `Unsupported` enum value to `LegalizeAction`?

https://github.com/llvm/llvm-project/pull/108636
peilin-ye wrote: > So your change looks good. Once you get some kernel work in reasonable shape, > this patch can land. Thanks! Sure, thanks for reviewing this! https://github.com/llvm/llvm-project/pull/108636 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
@@ -67,6 +67,8 @@ void BPFTargetInfo::getTargetDefines(const LangOptions &Opts,

```diff
     Builder.defineMacro("__BPF_FEATURE_SDIV_SMOD");
     Builder.defineMacro("__BPF_FEATURE_GOTOL");
     Builder.defineMacro("__BPF_FEATURE_ST");
+    Builder.defineMacro("__BPF_FEATURE_LOAD_ACQUIRE");
+    Builder.defineMacro("__BPF_FEATURE_STORE_RELEASE");
```

peilin-ye wrote:

> But I don't have a good name for that.

+1, but I just noticed that we need to update `clang/test/Preprocessor/bpf-predefined-macros.c`. I'll do it in the next push.

https://github.com/llvm/llvm-project/pull/108636
peilin-ye wrote: @yonghong-song Hi Yonghong, about my earlier question:

> However, I wonder how do I generate a sign-extending (sext), acquiring ATOMIC_LOAD in SelectionDAG?

After more digging, I found this in `llvm/include/llvm/CodeGen/TargetLowering.h`:

```cpp
/// Returns how the platform's atomic operations are extended (ZERO_EXTEND,
/// SIGN_EXTEND, or ANY_EXTEND).
virtual ISD::NodeType getExtendForAtomicOps() const {
  return ISD::ZERO_EXTEND;
}
```

If I override it in `class BPFTargetLowering`:

```diff
--- a/llvm/lib/Target/BPF/BPFISelLowering.h
+++ b/llvm/lib/Target/BPF/BPFISelLowering.h
@@ -46,6 +46,10 @@ public:
   // with the given GlobalAddress is legal.
   bool isOffsetFoldingLegal(const GlobalAddressSDNode *GA) const override;

+  ISD::NodeType getExtendForAtomicOps() const override {
+    return ISD::SIGN_EXTEND;
+  }
+
   BPFTargetLowering::ConstraintType
   getConstraintType(StringRef Constraint) const override;
```

I can then get a `sext` sub-word `AtomicLoad` SelectionDAG node:

```
Optimized legalized selection DAG: %bb.0 'bar:entry'
SelectionDAG has 8 nodes:
  t0: ch,glue = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %0
  t8: i32,ch = AtomicLoad<(load acquire (s8) from %ir.ptr), sext from i8> t0, t2, demo.bpf.c:3:12
  t12: i64 = sign_extend t8, demo.bpf.c:3:12
  t6: ch,glue = CopyToReg t8:1, Register:i64 $r0, t12, demo.bpf.c:3:5
  t7: ch = BPFISD::RET_GLUE t6, Register:i64 $r0, t6:1, demo.bpf.c:3:5
```

...but now I can't have `zext` :-)

- - -

Looks like we can't have both `sext` and `zext` sub-word atomic loads, unless we change `getExtendForAtomicOps()` and how it's used.

https://github.com/llvm/llvm-project/pull/108636
(quoted diff hunk: `BPFTargetLowering::LowerATOMIC_LOAD()` / `LowerATOMIC_STORE()` in `BPFISelLowering.cpp`; the same context is quoted in full earlier in the thread)

peilin-ye wrote: If user used `__atomic_store[_n]()` with `__ATOMIC_SEQ_CST`, we trigger this `if` clause and print out an error like:

```
demo.bpf.c:3:5: error: sequentially consistent (seq_cst) atomic store is not supported
    3 |     __atomic_store_n(ptr, val, __ATOMIC_SEQ_CST);
      |     ^
```

Or are you asking about something else?

https://github.com/llvm/llvm-project/pull/108636
(quoted diff hunk: the `BPF_LOAD_ACQ` / `BPF_STORE_REL` definitions; the same context is quoted in full earlier in the thread)

peilin-ye wrote:

> So I tempt to use 0x2.

I see, let's just use `0x1` and `0x2`.

https://github.com/llvm/llvm-project/pull/108636
@@ -0,0 +1,106 @@ +; RUN: llc < %s -march=bpfel -mcpu=v4 -verify-machineinstrs -show-mc-encoding \ +; RUN: | FileCheck -check-prefixes=CHECK-LE %s +; RUN: llc < %s -march=bpfeb -mcpu=v4 -verify-machineinstrs -show-mc-encoding \ +; RUN: | FileCheck -check-prefixes=CHECK-BE %s + +; Source: +; char load_acquire_i8(char *p) { +; return __atomic_load_n(p, __ATOMIC_ACQUIRE); +; } +; short load_acquire_i16(short *p) { +; return __atomic_load_n(p, __ATOMIC_ACQUIRE); +; } +; int load_acquire_i32(int *p) { +; return __atomic_load_n(p, __ATOMIC_ACQUIRE); +; } +; long load_acquire_i64(long *p) { +; return __atomic_load_n(p, __ATOMIC_ACQUIRE); +; } +; void store_release_i8(char *p, char v) { +; __atomic_store_n(p, v, __ATOMIC_RELEASE); +; } +; void store_release_i16(short *p, short v) { +; __atomic_store_n(p, v, __ATOMIC_RELEASE); +; } +; void store_release_i32(int *p, int v) { +; __atomic_store_n(p, v, __ATOMIC_RELEASE); +; } +; void store_release_i64(long *p, long v) { +; __atomic_store_n(p, v, __ATOMIC_RELEASE); +; } + +; CHECK-LABEL: load_acquire_i8 +; CHECK-LE: w0 = load_acquire((u8 *)(r1 + 0)) # encoding: [0xd3,0x10,0x00,0x00,0x10,0x00,0x00,0x00] +; CHECK-BE: w0 = load_acquire((u8 *)(r1 + 0)) # encoding: [0xd3,0x01,0x00,0x00,0x00,0x00,0x00,0x10] +define dso_local i8 @load_acquire_i8(ptr nocapture noundef readonly %p) local_unnamed_addr { +entry: peilin-ye wrote: Got it, thanks for reviewing this! I'll run `acquire-release.ll` through `update_llc_test_checks.py`. https://github.com/llvm/llvm-project/pull/108636 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
(quoted diff hunk: `BPFTargetLowering::LowerATOMIC_LOAD()` / `LowerATOMIC_STORE()` in `BPFISelLowering.cpp`; the same context is quoted in full earlier in the thread)

peilin-ye wrote: I see.

> Currently as per my understanding the new instructions does not have this ordering field (the asm, the encoding)?

Right now, `BPF_LOAD_ACQ` and `BPF_STORE_REL` use bits `4-7` of the `imm` field. If we ever need `SEQ_CST`, we can add a new flag to bits `0-3` to indicate "this atomic load is sequentially consistent", similar to what we already do for other `BPF_ATOMIC` instructions: e.g. `BPF_XOR` in `imm{4-7}` means "it's an atomic XOR", then `BPF_FETCH` in `imm{0-3}` indicates whether "it returns the old value" or not.
- - -

Right now, this PR does:

```
              | imm{0-3} | imm{4-7}
------------- | -------- | -------------------
load-acquire  | 0x0      | BPF_LOAD_ACQ (0x1)
store-release | 0x0      | BPF_STORE_REL (0x2)
```

Let me change it to (encodings unchanged):

```
              | imm{0-3}          | imm{4-7}
------------- | ----------------- | ---------------
load-acquire  | BPF_ACQUIRE (0x0) | BPF_LOAD (0x1)
store-release | BPF_RELEASE (0x0) | BPF_STORE (0x2)
```

So that, in the future, we can have e.g.:

```
              | imm{0-3}          | imm{4-7}
------------- | ----------------- | ---------------
load-acquire  | BPF_ACQUIRE (0x0) | BPF_LOAD (0x1)
load-seq_cst  | BPF_SEQ_CST (0x1) | BPF_LOAD (0x1)
store-release | BPF_RELEASE (0x0) | BPF_STORE (0x2)
store-seq_cst | BPF_SEQ_CST (0x1) | BPF_STORE (0x2)
```

https://github.com/llvm/llvm-project/pull/108636
(quoted diff hunk: `BPFTargetLowering::LowerATOMIC_LOAD()` / `LowerATOMIC_STORE()` in `BPFISelLowering.cpp`; the same context is quoted in full earlier in the thread)

peilin-ye wrote:

> Let me change it to (encodings unchanged):
> ```
>               | imm{0-3}          | imm{4-7}
> ------------- | ----------------- | ---------------
> load-acquire  | BPF_ACQUIRE (0x0) | BPF_LOAD (0x1)
> store-release | BPF_RELEASE (0x0) | BPF_STORE (0x2)
> ```

Actually, let's change the encoding and define [all 6 of them](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/n4950.pdf#page=1817) once and for all:

```cpp
namespace std {
  enum class memory_order : unspecified {
    relaxed, consume, acquire, release, acq_rel, seq_cst
  };
}
```

Will explain more after the push.

https://github.com/llvm/llvm-project/pull/108636
(quoted diff hunk: `BPFTargetLowering::LowerATOMIC_LOAD()` / `LowerATOMIC_STORE()` in `BPFISelLowering.cpp`; the same context is quoted in full earlier in the thread)

peilin-ye wrote:

```c
// clang --target=bpf -mcpu=v4 -O2 -g -c -o demo.bpf.o demo.bpf.c
void bar(char *ptr, char val) {
  __atomic_store_n(ptr, val, __ATOMIC_RELEASE);
}
```

Compiling the above program would trigger that if clause:

```
...
llvm::SelectionDAG::LegalizeTypes()
llvm::DAGTypeLegalizer::run()
llvm::DAGTypeLegalizer::PromoteIntegerOperand()
llvm::DAGTypeLegalizer::CustomLowerNode()
llvm::TargetLowering::LowerOperationWrapper()
llvm::BPFTargetLowering::LowerATOMIC_STORE()
```

BPF doesn't have native support for `i8` (see `BPFISelLowering.cpp`):

```cpp
// Set up the register classes.
addRegisterClass(MVT::i64, &BPF::GPRRegClass);
if (STI.getHasAlu32())
  addRegisterClass(MVT::i32, &BPF::GPR32RegClass);
```

https://github.com/llvm/llvm-project/pull/108636
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)
https://github.com/peilin-ye updated https://github.com/llvm/llvm-project/pull/108636 >From 885d5141f6707a0fdf4be363351083f8fdf8fd54 Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Sat, 5 Oct 2024 06:44:21 + Subject: [PATCH 1/3] [BPF] Rename isST*() and isLD*() functions in BPFMISimplifyPatchable.cpp (NFC) We are planning to add load (specifically, atomic acquiring load, or "load-acquire") instructions under the STX instruction class. To make that easier, rename the isST*() and isLD*() helper functions based on what the instructions actually do, rather than their instruction class. --- .../lib/Target/BPF/BPFMISimplifyPatchable.cpp | 22 +-- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp b/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp index 39390e8c38f8c1..4a1684ccebb793 100644 --- a/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp +++ b/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp @@ -94,35 +94,35 @@ void BPFMISimplifyPatchable::initialize(MachineFunction &MFParm) { LLVM_DEBUG(dbgs() << "*** BPF simplify patchable insts pass ***\n\n"); } -static bool isST(unsigned Opcode) { +static bool isStoreImm(unsigned Opcode) { return Opcode == BPF::STB_imm || Opcode == BPF::STH_imm || Opcode == BPF::STW_imm || Opcode == BPF::STD_imm; } -static bool isSTX32(unsigned Opcode) { +static bool isStore32(unsigned Opcode) { return Opcode == BPF::STB32 || Opcode == BPF::STH32 || Opcode == BPF::STW32; } -static bool isSTX64(unsigned Opcode) { +static bool isStore64(unsigned Opcode) { return Opcode == BPF::STB || Opcode == BPF::STH || Opcode == BPF::STW || Opcode == BPF::STD; } -static bool isLDX32(unsigned Opcode) { +static bool isLoad32(unsigned Opcode) { return Opcode == BPF::LDB32 || Opcode == BPF::LDH32 || Opcode == BPF::LDW32; } -static bool isLDX64(unsigned Opcode) { +static bool isLoad64(unsigned Opcode) { return Opcode == BPF::LDB || Opcode == BPF::LDH || Opcode == BPF::LDW || Opcode == BPF::LDD; } -static bool 
isLDSX(unsigned Opcode) { +static bool isLoadSext(unsigned Opcode) { return Opcode == BPF::LDBSX || Opcode == BPF::LDHSX || Opcode == BPF::LDWSX; } bool BPFMISimplifyPatchable::isLoadInst(unsigned Opcode) { - return isLDX32(Opcode) || isLDX64(Opcode) || isLDSX(Opcode); + return isLoad32(Opcode) || isLoad64(Opcode) || isLoadSext(Opcode); } void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI, @@ -143,11 +143,11 @@ void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI, MachineInstr *DefInst = MO.getParent(); unsigned Opcode = DefInst->getOpcode(); unsigned COREOp; -if (isLDX64(Opcode) || isLDSX(Opcode)) +if (isLoad64(Opcode) || isLoadSext(Opcode)) COREOp = BPF::CORE_LD64; -else if (isLDX32(Opcode)) +else if (isLoad32(Opcode)) COREOp = BPF::CORE_LD32; -else if (isSTX64(Opcode) || isSTX32(Opcode) || isST(Opcode)) +else if (isStore64(Opcode) || isStore32(Opcode) || isStoreImm(Opcode)) COREOp = BPF::CORE_ST; else continue; @@ -160,7 +160,7 @@ void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI, // Reject the form: // %1 = ADD_rr %2, %3 // *(type *)(%2 + 0) = %1 -if (isSTX64(Opcode) || isSTX32(Opcode)) { +if (isStore64(Opcode) || isStore32(Opcode)) { const MachineOperand &Opnd = DefInst->getOperand(0); if (Opnd.isReg() && Opnd.getReg() == MO.getReg()) continue; >From db3cef60b0b8194e8ebd0ce0a0bc792f114c15f3 Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Sat, 5 Oct 2024 07:31:54 + Subject: [PATCH 2/3] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 As discussed in [1], introduce BPF instructions with load-acquire and store-release semantics under -mcpu=v4. The following new flags are defined: BPF_ATOMIC_LOAD 0x10 BPF_ATOMIC_STORE 0x20 BPF_RELAXED: 0x0 BPF_ACQUIRE: 0x1 BPF_RELEASE: 0x2 BPF_ACQ_REL: 0x3 BPF_SEQ_CST: 0x4 A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm' field set to BPF_ATOMIC_LOAD | BPF_ACQUIRE (0x11). 
Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with the 'imm' field set to BPF_ATOMIC_STORE | BPF_RELEASE (0x22). Unlike existing atomic operations that only support BPF_W (32-bit) and BPF_DW (64-bit) size modifiers, load-acquires and store-releases also support BPF_B (8-bit) and BPF_H (16-bit). An 8- or 16-bit load-acquire zero-extends the value before writing it to a 32-bit register, just like ARM64 instruction LDAPRH and friends. As an example, for -march=bpfel (little-endian): long foo(long *ptr) { return __atomic_load_n(ptr, __ATOMIC_ACQUIRE); } foo() can be compiled to: db 10 00 00 11 00 00 00 r0 = load_acquire((u64 *)(r1 + 0x0)) 95 00 00 00 00 00 00 00 exit opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX imm (0x0011): BPF_ATOMIC_LOAD | BPF_ACQUIRE
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)
peilin-ye wrote: Pushed v9: - Rewrote `acquire-release.ll` to make it also test `__ATOMIC_RELAXED` (renamed to `atomic-load-store.ll`, just like e.g. `llvm/test/CodeGen/X86/atomic-load-store.ll`) - Style change as suggested by Matt https://github.com/llvm/llvm-project/pull/108636
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)
https://github.com/peilin-ye updated https://github.com/llvm/llvm-project/pull/108636 >From 885d5141f6707a0fdf4be363351083f8fdf8fd54 Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Sat, 5 Oct 2024 06:44:21 + Subject: [PATCH 1/3] [BPF] Rename isST*() and isLD*() functions in BPFMISimplifyPatchable.cpp (NFC) We are planning to add load (specifically, atomic acquiring load, or "load-acquire") instructions under the STX instruction class. To make that easier, rename the isST*() and isLD*() helper functions based on what the instructions actually do, rather than their instruction class. --- .../lib/Target/BPF/BPFMISimplifyPatchable.cpp | 22 +-- 1 file changed, 11 insertions(+), 11 deletions(-) diff --git a/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp b/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp index 39390e8c38f8c1..4a1684ccebb793 100644 --- a/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp +++ b/llvm/lib/Target/BPF/BPFMISimplifyPatchable.cpp @@ -94,35 +94,35 @@ void BPFMISimplifyPatchable::initialize(MachineFunction &MFParm) { LLVM_DEBUG(dbgs() << "*** BPF simplify patchable insts pass ***\n\n"); } -static bool isST(unsigned Opcode) { +static bool isStoreImm(unsigned Opcode) { return Opcode == BPF::STB_imm || Opcode == BPF::STH_imm || Opcode == BPF::STW_imm || Opcode == BPF::STD_imm; } -static bool isSTX32(unsigned Opcode) { +static bool isStore32(unsigned Opcode) { return Opcode == BPF::STB32 || Opcode == BPF::STH32 || Opcode == BPF::STW32; } -static bool isSTX64(unsigned Opcode) { +static bool isStore64(unsigned Opcode) { return Opcode == BPF::STB || Opcode == BPF::STH || Opcode == BPF::STW || Opcode == BPF::STD; } -static bool isLDX32(unsigned Opcode) { +static bool isLoad32(unsigned Opcode) { return Opcode == BPF::LDB32 || Opcode == BPF::LDH32 || Opcode == BPF::LDW32; } -static bool isLDX64(unsigned Opcode) { +static bool isLoad64(unsigned Opcode) { return Opcode == BPF::LDB || Opcode == BPF::LDH || Opcode == BPF::LDW || Opcode == BPF::LDD; } -static bool 
isLDSX(unsigned Opcode) { +static bool isLoadSext(unsigned Opcode) { return Opcode == BPF::LDBSX || Opcode == BPF::LDHSX || Opcode == BPF::LDWSX; } bool BPFMISimplifyPatchable::isLoadInst(unsigned Opcode) { - return isLDX32(Opcode) || isLDX64(Opcode) || isLDSX(Opcode); + return isLoad32(Opcode) || isLoad64(Opcode) || isLoadSext(Opcode); } void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI, @@ -143,11 +143,11 @@ void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI, MachineInstr *DefInst = MO.getParent(); unsigned Opcode = DefInst->getOpcode(); unsigned COREOp; -if (isLDX64(Opcode) || isLDSX(Opcode)) +if (isLoad64(Opcode) || isLoadSext(Opcode)) COREOp = BPF::CORE_LD64; -else if (isLDX32(Opcode)) +else if (isLoad32(Opcode)) COREOp = BPF::CORE_LD32; -else if (isSTX64(Opcode) || isSTX32(Opcode) || isST(Opcode)) +else if (isStore64(Opcode) || isStore32(Opcode) || isStoreImm(Opcode)) COREOp = BPF::CORE_ST; else continue; @@ -160,7 +160,7 @@ void BPFMISimplifyPatchable::checkADDrr(MachineRegisterInfo *MRI, // Reject the form: // %1 = ADD_rr %2, %3 // *(type *)(%2 + 0) = %1 -if (isSTX64(Opcode) || isSTX32(Opcode)) { +if (isStore64(Opcode) || isStore32(Opcode)) { const MachineOperand &Opnd = DefInst->getOperand(0); if (Opnd.isReg() && Opnd.getReg() == MO.getReg()) continue; >From cd4e9647f942a2f8d57fcd95219744d89f3e2cf3 Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Sat, 5 Oct 2024 07:31:54 + Subject: [PATCH 2/3] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 As discussed in [1], introduce BPF instructions with load-acquire and store-release semantics under -mcpu=v4. The following new flags are defined: BPF_ATOMIC_LOAD 0x10 BPF_ATOMIC_STORE 0x20 BPF_RELAXED: 0x0 BPF_ACQUIRE: 0x1 BPF_RELEASE: 0x2 BPF_ACQ_REL: 0x3 BPF_SEQ_CST: 0x4 A "load-acquire" is a BPF_STX | BPF_ATOMIC instruction with the 'imm' field set to BPF_ATOMIC_LOAD | BPF_ACQUIRE (0x11). 
Similarly, a "store-release" is a BPF_STX | BPF_ATOMIC instruction with the 'imm' field set to BPF_ATOMIC_STORE | BPF_RELEASE (0x22). Unlike existing atomic operations that only support BPF_W (32-bit) and BPF_DW (64-bit) size modifiers, load-acquires and store-releases also support BPF_B (8-bit) and BPF_H (16-bit). An 8- or 16-bit load-acquire zero-extends the value before writing it to a 32-bit register, just like ARM64 instruction LDAPRH and friends. As an example, for -march=bpfel (little-endian): long foo(long *ptr) { return __atomic_load_n(ptr, __ATOMIC_ACQUIRE); } foo() can be compiled to: db 10 00 00 11 00 00 00 r0 = load_acquire((u64 *)(r1 + 0x0)) 95 00 00 00 00 00 00 00 exit opcode (0xdb): BPF_ATOMIC | BPF_DW | BPF_STX imm (0x0011): BPF_ATOMIC_LOAD | BPF_ACQUIRE
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)
@@ -703,6 +714,20 @@ SDValue BPFTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const { return DAG.getNode(BPFISD::SELECT_CC, DL, VTs, Ops); } +SDValue BPFTargetLowering::LowerATOMIC_LOAD_STORE(SDValue Op, + SelectionDAG &DAG) const { + SDNode *N = Op.getNode(); + SDLoc DL(N); + + if (cast<AtomicSDNode>(N)->getMergedOrdering() == + AtomicOrdering::SequentiallyConsistent) +fail( +DL, DAG, +"sequentially consistent (seq_cst) atomic load/store is not supported"); peilin-ye wrote: Got it, thanks! https://github.com/llvm/llvm-project/pull/108636
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)
@@ -69,29 +69,25 @@ void BPFSubtarget::initSubtargetFeatures(StringRef CPU, StringRef FS) { CPU = "v3"; if (CPU == "probe") CPU = sys::detail::getHostCPUNameForBPF(); - if (CPU == "generic" || CPU == "v1") + if (CPU.empty() || CPU == "generic" || CPU == "v1") peilin-ye wrote: Ah, right, because recently #107008 made it default to `"v3"`. Thanks, I didn't notice it when rebasing. https://github.com/llvm/llvm-project/pull/108636 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)
peilin-ye wrote: @eddyz87, Thanks! I didn't know about `XXXISelLowering.cpp`. > But there should be a way to tweak existing `fail` function to stop after > errors are reported. Can we `exit(1)` ? :-) `fail()` calls `LLVMContext::diagnose()`, which already `exit(1)` when there's no "report handler", if "severity" is `DS_Error` : ```cpp if (DI.getSeverity() == DS_Error) exit(1); } ``` `fail()` uses `DiagnosticInfoUnsupported`, whose "severity" \_is\_ `DS_Error`, but our "report handler" (`pImpl->DiagHandler->handleDiagnostics()`) doesn't call `exit()` ... - - - I tried, based on your diff (`__ATOMIC_ACQ_REL` is illegal for `__atomic_{load,store}{,_n}()`, so we only need to handle `AtomicOrdering::SequentiallyConsistent`) : ```diff --- a/llvm/lib/Target/BPF/BPFISelLowering.cpp +++ b/llvm/lib/Target/BPF/BPFISelLowering.cpp @@ -93,6 +93,9 @@ BPFTargetLowering::BPFTargetLowering(const TargetMachine &TM, setOperationAction(ISD::ATOMIC_CMP_SWAP_WITH_SUCCESS, VT, Custom); } + for (auto VT : {MVT::i8, MVT::i16, MVT::i32, MVT::i64}) +setOperationAction(ISD::ATOMIC_LOAD, VT, Custom); + for (auto VT : { MVT::i32, MVT::i64 }) { if (VT == MVT::i32 && !STI.getHasAlu32()) continue; @@ -291,6 +294,8 @@ void BPFTargetLowering::ReplaceNodeResults( else Msg = "unsupported atomic operation, please use 64 bit version"; break; + case ISD::ATOMIC_LOAD: +return; } SDLoc DL(N); @@ -316,6 +321,8 @@ SDValue BPFTargetLowering::LowerOperation(SDValue Op, SelectionDAG &DAG) const { return LowerSDIVSREM(Op, DAG); case ISD::DYNAMIC_STACKALLOC: return LowerDYNAMIC_STACKALLOC(Op, DAG); + case ISD::ATOMIC_LOAD: +return LowerATOMIC_LOAD(Op, DAG); } } @@ -703,6 +710,22 @@ SDValue BPFTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const { return DAG.getNode(BPFISD::SELECT_CC, DL, VTs, Ops); } +SDValue BPFTargetLowering::LowerATOMIC_LOAD(SDValue Op, +SelectionDAG &DAG) const { + const char *Msg = + "sequentially consistent (seq_cst) atomic load is not supported"; + SDNode *N = Op.getNode(); 
+ SDLoc DL(N); + + if (cast(N)->getMergedOrdering() == + AtomicOrdering::SequentiallyConsistent) { +fail(DL, DAG, Msg); +exit(1); + } + + return Op; +} + const char *BPFTargetLowering::getTargetNodeName(unsigned Opcode) const { switch ((BPFISD::NodeType)Opcode) { case BPFISD::FIRST_NUMBER: ``` ```diff --- a/llvm/lib/Target/BPF/BPFISelLowering.h +++ b/llvm/lib/Target/BPF/BPFISelLowering.h @@ -77,7 +77,7 @@ private: SDValue LowerDYNAMIC_STACKALLOC(SDValue Op, SelectionDAG &DAG) const; SDValue LowerBR_CC(SDValue Op, SelectionDAG &DAG) const; SDValue LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const; - + SDValue LowerATOMIC_LOAD(SDValue Op, SelectionDAG &DAG) const; SDValue LowerConstantPool(SDValue Op, SelectionDAG &DAG) const; SDValue LowerGlobalAddress(SDValue Op, SelectionDAG &DAG) const; ``` which seems to work nicely: ``` $ cat bar.c char foo(char *ptr) { return __atomic_load_n(ptr, __ATOMIC_SEQ_CST); } $ $ clang --target=bpf -mcpu=v5 -g bar.c > /dev/null bar.c:1:6: error: sequentially consistent (seq_cst) atomic load is not supported 1 | char foo(char *ptr) { return __atomic_load_n(ptr, __ATOMIC_SEQ_CST); } | ^ $ ``` https://github.com/llvm/llvm-project/pull/108636 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)
peilin-ye wrote: > lgtm, but please note CI failure: Oops, I'll take a closer look later. Thanks! https://github.com/llvm/llvm-project/pull/108636
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)
peilin-ye wrote: @eddyz87, > Sorry for the delayed reply. No worries at all! > I'd say let's go with what you suggest, but avoid exit(1) call. We need to > figure out how to do fail() w/o backtrace, but that is outside of the scope > for this pull request. Got it, rebased and added a 4th commit (can't say I like the one-commit-per-PR policy either :-) to do this in v3. Please take another look. - - - Also, just a heads-up: I think this PR is blocked by [an issue that I mentioned earlier on LKML](https://lore.kernel.org/all/zujillqbegosx...@google.com/) (Cc: @4ast). After that is resolved, I think we should change this PR to use `0x5` for the new mode modifiers (`BPF_MEMACQ`, `BPF_MEMREL`). https://github.com/llvm/llvm-project/pull/108636
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)
https://github.com/peilin-ye updated https://github.com/llvm/llvm-project/pull/108636 >From 68d003f7156b16656a11a1395b88b6fbed368401 Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Thu, 15 Aug 2024 21:49:23 + Subject: [PATCH 1/4] [BPF] Refactor BPFSubtarget::initSubtargetFeatures() (NFC) Refactor it to make it match the style of BPFTargetInfo::getTargetDefines(), so that when we add -mcpu=v5 in the future, we can simply append e.g.: if (CpuVerNum >= 5) HasFoo = true; instead of saying: if (CPU == "v5") { HasJmpExt = true; HasJmp32 = true; HasAlu32 = true; HasLdsx = !Disable_ldsx; HasMovsx = !Disable_movsx; HasBswap = !Disable_bswap; HasSdivSmod = !Disable_sdiv_smod; HasGotol = !Disable_gotol; HasStoreImm = !Disable_StoreImm; HasFoo = true; } This also makes it clearer that newer "CPU"s always support older features. No functional changes intended. --- llvm/lib/Target/BPF/BPFSubtarget.cpp | 18 +++--- 1 file changed, 7 insertions(+), 11 deletions(-) diff --git a/llvm/lib/Target/BPF/BPFSubtarget.cpp b/llvm/lib/Target/BPF/BPFSubtarget.cpp index 305e9a2bf2cda3..d5b20403d1e42d 100644 --- a/llvm/lib/Target/BPF/BPFSubtarget.cpp +++ b/llvm/lib/Target/BPF/BPFSubtarget.cpp @@ -71,27 +71,23 @@ void BPFSubtarget::initSubtargetFeatures(StringRef CPU, StringRef FS) { CPU = sys::detail::getHostCPUNameForBPF(); if (CPU == "generic" || CPU == "v1") return; - if (CPU == "v2") { -HasJmpExt = true; -return; - } - if (CPU == "v3") { + + int CpuVerNum = CPU.back() - '0'; + if (CpuVerNum >= 2) HasJmpExt = true; + + if (CpuVerNum >= 3) { HasJmp32 = true; HasAlu32 = true; -return; } - if (CPU == "v4") { -HasJmpExt = true; -HasJmp32 = true; -HasAlu32 = true; + + if (CpuVerNum >= 4) { HasLdsx = !Disable_ldsx; HasMovsx = !Disable_movsx; HasBswap = !Disable_bswap; HasSdivSmod = !Disable_sdiv_smod; HasGotol = !Disable_gotol; HasStoreImm = !Disable_StoreImm; -return; } } >From a59b60c2e9d0b0f1e2ce3b25b70d46957af4d091 Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Fri, 23 Aug 2024 19:11:05 + 
Subject: [PATCH 2/4] [BPF] Refactor {LOAD,STORE}{,32} classes (NFC) We will need different AsmString formats for load-acquire and store-release instructions. To make that easier, refactor {LOAD,STORE}{,32} classes to take AsmString as an argument directly. Add a BPFModeModifer parameter to STORE{,32} for similar reasons. No functional changes intended. --- llvm/lib/Target/BPF/BPFInstrInfo.td | 35 ++--- 1 file changed, 17 insertions(+), 18 deletions(-) diff --git a/llvm/lib/Target/BPF/BPFInstrInfo.td b/llvm/lib/Target/BPF/BPFInstrInfo.td index f7e17901c7ed5e..aab2291a70efc2 100644 --- a/llvm/lib/Target/BPF/BPFInstrInfo.td +++ b/llvm/lib/Target/BPF/BPFInstrInfo.td @@ -497,12 +497,11 @@ def LD_pseudo } // STORE instructions -class STORE Pattern> -: TYPE_LD_ST Pattern> +: TYPE_LD_ST { + AsmString, Pattern> { bits<4> src; bits<20> addr; @@ -513,7 +512,7 @@ class STORE Pattern> } class STOREi64 -: STORE; +: STORE; let Predicates = [BPFNoALU32] in { def STW : STOREi64; @@ -567,12 +566,11 @@ let Predicates = [BPFHasALU32, BPFHasStoreImm] in { } // LOAD instructions -class LOAD Pattern> +class LOAD Pattern> : TYPE_LD_ST { + AsmString, Pattern> { bits<4> dst; bits<20> addr; @@ -583,7 +581,8 @@ class LOAD -: LOAD; +: LOAD; let isCodeGenOnly = 1 in { class CORE_LD @@ -1069,12 +1068,11 @@ def : Pat<(i32 (trunc GPR:$src)), def : Pat<(i64 (anyext GPR32:$src)), (INSERT_SUBREG (i64 (IMPLICIT_DEF)), GPR32:$src, sub_32)>; -class STORE32 Pattern> -: TYPE_LD_ST Pattern> +: TYPE_LD_ST { + AsmString, Pattern> { bits<4> src; bits<20> addr; @@ -1085,7 +1083,8 @@ class STORE32 Pattern> } class STOREi32 -: STORE32; +: STORE32; let Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32" in { def STW32 : STOREi32; @@ -1093,12 +1092,11 @@ let Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32" in { def STB32 : STOREi32; } -class LOAD32 Pattern> +class LOAD32 Pattern> : TYPE_LD_ST { +AsmString, Pattern> { bits<4> dst; bits<20> addr; @@ -1109,7 +1107,8 @@ class LOAD32 -: LOAD32; +: 
LOAD32; let Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32" in { def LDW32 : LOADi32; >From 4c209da8a1ec4bfda71ffd0b51accae84a39bdd5 Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Tue, 10 Sep 2024 23:09:28 + Subject: [PATCH 3/4] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 As discussed in [1], introduce BPF instructions with load-acquire and store-release semantics under -mcpu=v5. A "load_acquire" is a BPF_LDX instruction with a new mode modifier, BPF_MEMACQ ("acquiring atomic load"). Similarly, a "store_release" is a BPF_STX
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)
https://github.com/peilin-ye edited https://github.com/llvm/llvm-project/pull/108636
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)
https://github.com/peilin-ye updated https://github.com/llvm/llvm-project/pull/108636 >From 087c7eb6e5fdd6866ee5209ce9cea93864dd3c6d Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Thu, 15 Aug 2024 21:49:23 + Subject: [PATCH 1/3] [BPF] Refactor BPFSubtarget::initSubtargetFeatures() (NFC) Refactor it to make it match the style of BPFTargetInfo::getTargetDefines(), so that when we add -mcpu=v5 in the future, we can simply append e.g.: if (CpuVerNum >= 5) HasFoo = true; instead of saying: if (CPU == "v5") { HasJmpExt = true; HasJmp32 = true; HasAlu32 = true; HasLdsx = !Disable_ldsx; HasMovsx = !Disable_movsx; HasBswap = !Disable_bswap; HasSdivSmod = !Disable_sdiv_smod; HasGotol = !Disable_gotol; HasStoreImm = !Disable_StoreImm; HasFoo = true; } This also makes it clearer that newer "CPU"s always support older features. No functional changes intended. --- llvm/lib/Target/BPF/BPFSubtarget.cpp | 18 +++--- 1 file changed, 7 insertions(+), 11 deletions(-) diff --git a/llvm/lib/Target/BPF/BPFSubtarget.cpp b/llvm/lib/Target/BPF/BPFSubtarget.cpp index 305e9a2bf2cda3..d5b20403d1e42d 100644 --- a/llvm/lib/Target/BPF/BPFSubtarget.cpp +++ b/llvm/lib/Target/BPF/BPFSubtarget.cpp @@ -71,27 +71,23 @@ void BPFSubtarget::initSubtargetFeatures(StringRef CPU, StringRef FS) { CPU = sys::detail::getHostCPUNameForBPF(); if (CPU == "generic" || CPU == "v1") return; - if (CPU == "v2") { -HasJmpExt = true; -return; - } - if (CPU == "v3") { + + int CpuVerNum = CPU.back() - '0'; + if (CpuVerNum >= 2) HasJmpExt = true; + + if (CpuVerNum >= 3) { HasJmp32 = true; HasAlu32 = true; -return; } - if (CPU == "v4") { -HasJmpExt = true; -HasJmp32 = true; -HasAlu32 = true; + + if (CpuVerNum >= 4) { HasLdsx = !Disable_ldsx; HasMovsx = !Disable_movsx; HasBswap = !Disable_bswap; HasSdivSmod = !Disable_sdiv_smod; HasGotol = !Disable_gotol; HasStoreImm = !Disable_StoreImm; -return; } } >From 5b64851a977b51af96b1d86aba14ca70968ef4e0 Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Fri, 23 Aug 2024 19:11:05 + 
Subject: [PATCH 2/3] [BPF] Refactor {LOAD,STORE}{,32} classes (NFC) We will need different AsmString formats for load-acquire and store-release instructions. To make that easier, refactor {LOAD,STORE}{,32} classes to take AsmString as an argument directly. Add a BPFModeModifer parameter to STORE{,32} for similar reasons. No functional changes intended. --- llvm/lib/Target/BPF/BPFInstrInfo.td | 35 ++--- 1 file changed, 17 insertions(+), 18 deletions(-) diff --git a/llvm/lib/Target/BPF/BPFInstrInfo.td b/llvm/lib/Target/BPF/BPFInstrInfo.td index f7e17901c7ed5e..aab2291a70efc2 100644 --- a/llvm/lib/Target/BPF/BPFInstrInfo.td +++ b/llvm/lib/Target/BPF/BPFInstrInfo.td @@ -497,12 +497,11 @@ def LD_pseudo } // STORE instructions -class STORE Pattern> -: TYPE_LD_ST Pattern> +: TYPE_LD_ST { + AsmString, Pattern> { bits<4> src; bits<20> addr; @@ -513,7 +512,7 @@ class STORE Pattern> } class STOREi64 -: STORE; +: STORE; let Predicates = [BPFNoALU32] in { def STW : STOREi64; @@ -567,12 +566,11 @@ let Predicates = [BPFHasALU32, BPFHasStoreImm] in { } // LOAD instructions -class LOAD Pattern> +class LOAD Pattern> : TYPE_LD_ST { + AsmString, Pattern> { bits<4> dst; bits<20> addr; @@ -583,7 +581,8 @@ class LOAD -: LOAD; +: LOAD; let isCodeGenOnly = 1 in { class CORE_LD @@ -1069,12 +1068,11 @@ def : Pat<(i32 (trunc GPR:$src)), def : Pat<(i64 (anyext GPR32:$src)), (INSERT_SUBREG (i64 (IMPLICIT_DEF)), GPR32:$src, sub_32)>; -class STORE32 Pattern> -: TYPE_LD_ST Pattern> +: TYPE_LD_ST { + AsmString, Pattern> { bits<4> src; bits<20> addr; @@ -1085,7 +1083,8 @@ class STORE32 Pattern> } class STOREi32 -: STORE32; +: STORE32; let Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32" in { def STW32 : STOREi32; @@ -1093,12 +1092,11 @@ let Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32" in { def STB32 : STOREi32; } -class LOAD32 Pattern> +class LOAD32 Pattern> : TYPE_LD_ST { +AsmString, Pattern> { bits<4> dst; bits<20> addr; @@ -1109,7 +1107,8 @@ class LOAD32 -: LOAD32; +: 
LOAD32; let Predicates = [BPFHasALU32], DecoderNamespace = "BPFALU32" in { def LDW32 : LOADi32; >From 692d1309c42711397feb780f2e37c1495cecc51c Mon Sep 17 00:00:00 2001 From: Peilin Ye Date: Tue, 10 Sep 2024 23:09:28 + Subject: [PATCH 3/3] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 As discussed in [1], introduce BPF instructions with load-acquire and store-release semantics under -mcpu=v5. A "load_acquire" is a BPF_LDX instruction with a new mode modifier, BPF_MEMACQ ("acquiring atomic load"). Similarly, a "store_release" is a BPF_STX
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)
peilin-ye wrote: (pushed v2 to resolve the easier issues first :-) - deleted redundant `CPU.empty()` check - for now, fall back to `ACQUIRE` and `RELEASE` if user requested weaker memory orders (`RELAXED` or `CONSUME`), until we actually support them - took Eduard's suggestion in #107343 to use `foreach`, thanks! I couldn't find a way to generate a better error message for `__ATOMIC_SEQ_CST`, however; it seems that CodeGen simply calls `CannotYetSelect()` if nothing in `MatcherTable` matches. Any suggestions? https://github.com/llvm/llvm-project/pull/108636
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)
peilin-ye wrote: Hi @eddyz87, thanks for the review and context! > ``` > lock *(u64 *)(r1 + 0x0) = r2 release > lock r2 = *(u64 *)(r1 + 0x0) acquire > ``` Appending `acquire` and `release` does sound nice to me since it makes the syntax more similar to LLVM IR (e.g. `store atomic i64 %v, ptr %p release, align 8`), and it'd also be a bit easier when we support other memory ordering types in the future e.g. we can just append `seq_cst` instead of having to say `load_seq_cst(...)`. However, I didn't want to use `lock` because I feel like it's too similar to the x86 `LOCK` prefix (implies a full memory barrier, which could be confusing here). WDYT? Cc: @yonghong-song @4ast > Also a question regarding 3 commits in one pull request. As far as I > understand [current](https://llvm.org/docs/GitHub.html) policy, the idea is > to have a separate pull request for each commit (and fork branches from one > another to build a stack). Otherwise the commits are squashed. What's the > idea with current pull request? Got it, I'll split this into separate PRs later. https://github.com/llvm/llvm-project/pull/108636 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)
@@ -621,6 +642,16 @@ let Predicates = [BPFHasLdsx] in { def LDD : LOADi64; +class acquiring_load +: PatFrag<(ops node:$ptr), (base node:$ptr)> { + let IsAtomic = 1; peilin-ye wrote: Got it, I'll make it generate acquire and release for weaker types, and look into how to make the error message more useful for stronger types. https://github.com/llvm/llvm-project/pull/108636
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)
@@ -522,6 +526,28 @@ let Predicates = [BPFNoALU32] in { } def STD : STOREi64; +class relaxed_store + : PatFrag<(ops node:$val, node:$ptr), (base node:$val, node:$ptr)> { + let IsAtomic = 1; + let IsAtomicOrderingReleaseOrStronger = 0; +} + +class releasing_store + : PatFrag<(ops node:$val, node:$ptr), (base node:$val, node:$ptr)> { + let IsAtomic = 1; + let IsAtomicOrderingRelease = 1; +} peilin-ye wrote: For `ATOMIC_STORE`: `IsAtomicOrderingRelease = 1` only matches `RELEASE`. `IsAtomicOrderingReleaseOrStronger = 1` matches both `RELEASE` and `SEQ_CST` (sequentially consistent). - - - For ARM64, it looks like that `releasing_store<>` has been intended to match both `RELEASE` and `SEQ_CST` since at least commit 00ed9964c659 ("ARM64: initial backend import"): ```c // A store operation that actually needs release semantics. class releasing_store : PatFrag<(ops node:$ptr, node:$val), (base node:$ptr, node:$val), [{ AtomicOrdering Ordering = cast(N)->getOrdering(); assert(Ordering != AcquireRelease && "unexpected store ordering"); return Ordering == Release || Ordering == SequentiallyConsistent; }]>; ``` Looking at `llvm/test/CodeGen/AArch64/fast-isel-atomic.ll`, it seems that a "`store atomic` `release`" and a "`store atomic` `seq_cst`" can be turned into the same ARM64 insn: ``` ; CHECK-LABEL: atomic_store_release_16: ; CHECK-NEXT: // %bb.0: ; CHECK-NEXT: stlrh w1, [x0] ; CHECK-NEXT: ret define void @atomic_store_release_16(ptr %p, i16 %val) #0 { store atomic i16 %val, ptr %p release, align 2 ret void } ``` ``` ; CHECK-LABEL: atomic_store_seq_cst_16: ; CHECK-NEXT: // %bb.0: ; CHECK-NEXT: stlrh w1, [x0] ; CHECK-NEXT: ret define void @atomic_store_seq_cst_16(ptr %p, i16 %val) #0 { store atomic i16 %val, ptr %p seq_cst, align 2 ret void } ``` I haven't read the manual to understand why yet, however. 
- - - For us, I used `IsAtomicOrderingRelease` since I didn't want to match `seq_cst` (the 4th commit in this PR makes it generate an "unsupported" error if user called `__atomic_{load,store}[_n]()` with `__ATOMIC_SEQ_CST`, as suggested by Eduard). https://github.com/llvm/llvm-project/pull/108636
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)
peilin-ye wrote: Hi @yonghong-song, No worries, and thanks for taking a look at this! > So far, I suggest to stick to cpu v4. I see, I can do that. > is it possible to add acquire/release support in BPF_ATOMIC? We might want to > reserve BPFModelModifier 7 for future use. Sounds good, that way we also avoid the "`5` and `7` are already internally used by the verifier" problem mentioned earlier. I'll try adding one or more new flags to the `imm` field. > should we do sign-extension here? w0 = load_acquire((s16 *)(r1 + 0x0))? Not sure about this one. I'll learn about `MEMSX` and look at other backends. https://github.com/llvm/llvm-project/pull/108636
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v5 (PR #108636)
@@ -522,6 +526,28 @@ let Predicates = [BPFNoALU32] in { } def STD : STOREi64; +class relaxed_store + : PatFrag<(ops node:$val, node:$ptr), (base node:$val, node:$ptr)> { + let IsAtomic = 1; + let IsAtomicOrderingReleaseOrStronger = 0; +} + +class releasing_store + : PatFrag<(ops node:$val, node:$ptr), (base node:$val, node:$ptr)> { + let IsAtomic = 1; + let IsAtomicOrderingRelease = 1; +} peilin-ye wrote: Eduard [suggested earlier](https://github.com/llvm/llvm-project/pull/108636#discussion_r1765597286) (referencing [Andrii's suggestion](https://github.com/llvm/llvm-project/pull/107343#issuecomment-2334664935) in #107343) that, for now, we "fall back" to `ACQUIRE` or `RELEASE` if user requested `RELAXED`, until we actually support `RELAXED` in the future. > Is my interpretation correct or __atomic_load_n(ptr, __ATOMIC_RELAXED) could > have semantics beyond '*ptr'? `__ATOMIC_RELAXED` still guarantees atomicity but `*ptr` does not, if i'm understanding correctly? https://github.com/llvm/llvm-project/pull/108636 ___ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[clang] [llvm] [BPF] Add load-acquire and store-release instructions under -mcpu=v4 (PR #108636)
peilin-ye wrote:

@yonghong-song, back to your example:

> ```c
> $ cat t4.c
> short foo(short *ptr) {
>   return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
> }
> ```
>
> ```
> <foo>:
>        0: e9 10 00 00 00 00 00 00  w0 = load_acquire((u16 *)(r1 + 0x0))
>        1: 95 00 00 00 00 00 00 00  exit
> ```

Assuming that the above `load_acquire` zero-extends: if the calling convention here is "`foo()`'s return value is represented by the entire `w0` half-register", then yes, we're doing it wrong. However, if I do this:

```c
// clang --target=bpf -mcpu=v4 -O2 -g -c -o test.bpf.o test.bpf.c
__attribute__((noinline))
short bar(short *ptr) {
  return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
}

long foo(short *ptr) {
  return bar(ptr);
}
```

```
<bar>:
; return __atomic_load_n(ptr, __ATOMIC_ACQUIRE);
       0: cb 10 00 00 10 00 00 00  w0 = load_acquire((u16 *)(r1 + 0x0))
       1: 95 00 00 00 00 00 00 00  exit

0010 <foo>:
; return bar(ptr);
       2: 85 10 00 00 ff ff ff ff  call -0x1
       3: bf 00 10 00 00 00 00 00  r0 = (s16)r0
       4: 95 00 00 00 00 00 00 00  exit
```

Things still work, because `foo()` treats `bar()`'s return value as an `s16`.

- - -

It's the same story for ARM64. If I compile the above program with `--target=aarch64`:

```
<bar>:
       0: 48dffc00  ldarh w0, [x0]
       4: d65f03c0  ret

0008 <foo>:
       8: a9bf7bfd  stp x29, x30, [sp, #-0x10]!
       c: 910003fd  mov x29, sp
      10: 94000000  bl 0x10
      14: 93403c00  sxth x0, w0
      18: a8c17bfd  ldp x29, x30, [sp], #0x10
      1c: d65f03c0  ret
```

`ldarh` zero-extends the halfword:

> Load-Acquire Register Halfword derives an address from a base register value, loads a halfword from memory, zero-extends it, and writes it to a register.

Then the caller, `foo()`, does a `sxth` to sign-extend the halfword:

> Sign Extend Halfword extracts a 16-bit value, sign-extends it to the size of the register, and writes the result to the destination register.

- - -

Is it fair to say that your example demonstrated expected behavior?
```
@@ -703,6 +715,39 @@ SDValue BPFTargetLowering::LowerSELECT_CC(SDValue Op, SelectionDAG &DAG) const {
   return DAG.getNode(BPFISD::SELECT_CC, DL, VTs, Ops);
 }

+SDValue BPFTargetLowering::LowerATOMIC_LOAD(SDValue Op,
+                                            SelectionDAG &DAG) const {
+  const char *Msg =
+      "sequentially consistent (seq_cst) atomic load is not supported";
```

peilin-ye wrote:

Thanks for taking a look at this!

- - -

> Just pass this directly without the message variable?

Then after `git clang-format` it would look like:

```cpp
if (cast<AtomicSDNode>(N)->getMergedOrdering() ==
    AtomicOrdering::SequentiallyConsistent)
  fail(DL, DAG,
       "sequentially consistent (seq_cst) atomic store is not supported");
```

I'd prefer keeping it as-is.

https://github.com/llvm/llvm-project/pull/108636
peilin-ye wrote:

When landing, please simply land this PR as-is. Considering that I've made several references to this PR on the mailing list, and that the two commits other than the "main" one are pretty small, I no longer think it's worth restructuring this into stacked PRs. For future contributions, I'll make sure to use stacked PRs from the beginning and avoid squashing/force-pushing. Thanks!

- - -

Pushed v11:
- Rebased; `ninja check-all` passed
- Changed the encoding to `BPF_LOAD_ACQ=0x100`, `BPF_STORE_REL=0x110` per the [mailing list discussion](https://lore.kernel.org/bpf/z7axcswd-topj...@google.com/)

https://github.com/llvm/llvm-project/pull/108636