| Field | Value |
|---|---|
| Issue | 175030 |
| Summary | SelectionDAG Scheduler Caused Unnecessary Large Spills For BPF Programs |
| Labels | new issue |
| Assignees | |
| Reporter | yonghong-song |

The issue was reported by @dramforever:
https://github.com/llvm/llvm-project/issues/164792
The original code:
```
$ cat t.c
int meow[512 / sizeof(int)];
void cpy(char *dest)
{
    __builtin_memcpy(dest, meow, 512);
}
```
With the latest llvm22 (and also llvm20 and llvm21):
```
$ clang --target=bpf -O2 -c t.c -mcpu=v1 <=== success
$ clang --target=bpf -O2 -c t.c -mcpu=v3 <=== error
<source>:3:6: error: Looks like the BPF stack limit is exceeded. Please move large on stack variables into BPF per-cpu array map. For non-kernel uses, the stack can be increased using -mllvm -bpf-stack-size.
...
```
So with cpu v1 the stack usage is 0, while with cpu v3 the stack usage is 968 bytes. The kernel allows
a maximum BPF stack size of 512 bytes, so a stack usage of 968 exceeds the limit and the bpf backend issues an error for it.
To simplify debugging, I changed the code as below:
```
$ cat t.c
#define SIZE 40
int meow[SIZE / sizeof(int)];
void cpy(char *dest)
{
    __builtin_memcpy(dest, (char *)meow, SIZE);
}
```
With cpu v1 and the max allowed stack size set to 8 bytes:
```
$ clang --target=bpf -O2 -c t.c -mcpu=v1 -mllvm -bpf-stack-size=8 -mllvm -debug-only=isel
...
===== Instruction selection ends:
Selected selection DAG: %bb.0 'cpy:entry'
SelectionDAG has 127 nodes:
t0: ch,glue = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t88: i64 = LDIMM64 TargetGlobalAddress:i64<ptr @meow> 0
t81: i64,ch = LDW<Mem:(dereferenceable load (s32) from @meow)> t88, TargetConstant:i64<0>, t0
t54: i64,ch = LDW<Mem:(dereferenceable load (s32) from @meow + 36)> t88, TargetConstant:i64<36>, t0
t60: i64,ch = LDW<Mem:(dereferenceable load (s32) from @meow + 28)> t88, TargetConstant:i64<28>, t0
t66: i64,ch = LDW<Mem:(dereferenceable load (s32) from @meow + 20)> t88, TargetConstant:i64<20>, t0
t72: i64,ch = LDW<Mem:(dereferenceable load (s32) from @meow + 12)> t88, TargetConstant:i64<12>, t0
t70: i64,ch = LDW<Mem:(dereferenceable load (s32) from @meow + 8)> t88, TargetConstant:i64<8>, t0
t64: i64,ch = LDW<Mem:(dereferenceable load (s32) from @meow + 16)> t88, TargetConstant:i64<16>, t0
t58: i64,ch = LDW<Mem:(dereferenceable load (s32) from @meow + 24)> t88, TargetConstant:i64<24>, t0
t52: i64,ch = LDW<Mem:(dereferenceable load (s32) from @meow + 32)> t88, TargetConstant:i64<32>, t0
t83: i64,ch = LDW<Mem:(dereferenceable load (s32) from @meow + 4)> t88, TargetConstant:i64<4>, t0
t254: ch = STB<Mem:(store (s8) into %ir.dest)> t81, t2, TargetConstant:i64<0>, t0
t253: i64 = SRL_ri t81, TargetConstant:i64<8>
t229: ch = STB<Mem:(store (s8) into %ir.dest + 1)> t253, t2, TargetConstant:i64<1>, t0
t95: i64 = SRL_ri t81, TargetConstant:i64<16>
t222: ch = STB<Mem:(store (s8) into %ir.dest + 2)> t95, t2, TargetConstant:i64<2>, t0
t258: i64 = SRL_ri t81, TargetConstant:i64<24>
t224: ch = STB<Mem:(store (s8) into %ir.dest + 3)> t258, t2, TargetConstant:i64<3>, t0
t237: ch = STB<Mem:(store (s8) into %ir.dest + 4)> t83, t2, TargetConstant:i64<4>, t0
t236: i64 = SRL_ri t83, TargetConstant:i64<8>
t239: ch = STB<Mem:(store (s8) into %ir.dest + 5)> t236, t2, TargetConstant:i64<5>, t0
t89: i64 = SRL_ri t83, TargetConstant:i64<16>
t232: ch = STB<Mem:(store (s8) into %ir.dest + 6)> t89, t2, TargetConstant:i64<6>, t0
t252: i64 = SRL_ri t83, TargetConstant:i64<24>
t234: ch = STB<Mem:(store (s8) into %ir.dest + 7)> t252, t2, TargetConstant:i64<7>, t0
t275: ch = STB<Mem:(store (s8) into %ir.dest + 8)> t70, t2, TargetConstant:i64<8>, t0
t272: i64 = SRL_ri t70, TargetConstant:i64<8>
t209: ch = STB<Mem:(store (s8) into %ir.dest + 9)> t272, t2, TargetConstant:i64<9>, t0
t105: i64 = SRL_ri t70, TargetConstant:i64<16>
t202: ch = STB<Mem:(store (s8) into %ir.dest + 10)> t105, t2, TargetConstant:i64<10>, t0
t280: i64 = SRL_ri t70, TargetConstant:i64<24>
t204: ch = STB<Mem:(store (s8) into %ir.dest + 11)> t280, t2, TargetConstant:i64<11>, t0
t217: ch = STB<Mem:(store (s8) into %ir.dest + 12)> t72, t2, TargetConstant:i64<12>, t0
t216: i64 = SRL_ri t72, TargetConstant:i64<8>
t219: ch = STB<Mem:(store (s8) into %ir.dest + 13)> t216, t2, TargetConstant:i64<13>, t0
t100: i64 = SRL_ri t72, TargetConstant:i64<16>
t212: ch = STB<Mem:(store (s8) into %ir.dest + 14)> t100, t2, TargetConstant:i64<14>, t0
t271: i64 = SRL_ri t72, TargetConstant:i64<24>
t214: ch = STB<Mem:(store (s8) into %ir.dest + 15)> t271, t2, TargetConstant:i64<15>, t0
t297: ch = STB<Mem:(store (s8) into %ir.dest + 16)> t64, t2, TargetConstant:i64<16>, t0
t294: i64 = SRL_ri t64, TargetConstant:i64<8>
t189: ch = STB<Mem:(store (s8) into %ir.dest + 17)> t294, t2, TargetConstant:i64<17>, t0
t115: i64 = SRL_ri t64, TargetConstant:i64<16>
t182: ch = STB<Mem:(store (s8) into %ir.dest + 18)> t115, t2, TargetConstant:i64<18>, t0
t302: i64 = SRL_ri t64, TargetConstant:i64<24>
t184: ch = STB<Mem:(store (s8) into %ir.dest + 19)> t302, t2, TargetConstant:i64<19>, t0
t197: ch = STB<Mem:(store (s8) into %ir.dest + 20)> t66, t2, TargetConstant:i64<20>, t0
t196: i64 = SRL_ri t66, TargetConstant:i64<8>
t199: ch = STB<Mem:(store (s8) into %ir.dest + 21)> t196, t2, TargetConstant:i64<21>, t0
t110: i64 = SRL_ri t66, TargetConstant:i64<16>
t192: ch = STB<Mem:(store (s8) into %ir.dest + 22)> t110, t2, TargetConstant:i64<22>, t0
t293: i64 = SRL_ri t66, TargetConstant:i64<24>
t194: ch = STB<Mem:(store (s8) into %ir.dest + 23)> t293, t2, TargetConstant:i64<23>, t0
t319: ch = STB<Mem:(store (s8) into %ir.dest + 24)> t58, t2, TargetConstant:i64<24>, t0
t316: i64 = SRL_ri t58, TargetConstant:i64<8>
t169: ch = STB<Mem:(store (s8) into %ir.dest + 25)> t316, t2, TargetConstant:i64<25>, t0
t125: i64 = SRL_ri t58, TargetConstant:i64<16>
t162: ch = STB<Mem:(store (s8) into %ir.dest + 26)> t125, t2, TargetConstant:i64<26>, t0
t324: i64 = SRL_ri t58, TargetConstant:i64<24>
t164: ch = STB<Mem:(store (s8) into %ir.dest + 27)> t324, t2, TargetConstant:i64<27>, t0
t177: ch = STB<Mem:(store (s8) into %ir.dest + 28)> t60, t2, TargetConstant:i64<28>, t0
t176: i64 = SRL_ri t60, TargetConstant:i64<8>
t179: ch = STB<Mem:(store (s8) into %ir.dest + 29)> t176, t2, TargetConstant:i64<29>, t0
t120: i64 = SRL_ri t60, TargetConstant:i64<16>
t172: ch = STB<Mem:(store (s8) into %ir.dest + 30)> t120, t2, TargetConstant:i64<30>, t0
t315: i64 = SRL_ri t60, TargetConstant:i64<24>
t174: ch = STB<Mem:(store (s8) into %ir.dest + 31)> t315, t2, TargetConstant:i64<31>, t0
t341: ch = STB<Mem:(store (s8) into %ir.dest + 32)> t52, t2, TargetConstant:i64<32>, t0
t338: i64 = SRL_ri t52, TargetConstant:i64<8>
t149: ch = STB<Mem:(store (s8) into %ir.dest + 33)> t338, t2, TargetConstant:i64<33>, t0
t135: i64 = SRL_ri t52, TargetConstant:i64<16>
t141: ch = STB<Mem:(store (s8) into %ir.dest + 34)> t135, t2, TargetConstant:i64<34>, t0
t346: i64 = SRL_ri t52, TargetConstant:i64<24>
t144: ch = STB<Mem:(store (s8) into %ir.dest + 35)> t346, t2, TargetConstant:i64<35>, t0
t157: ch = STB<Mem:(store (s8) into %ir.dest + 36)> t54, t2, TargetConstant:i64<36>, t0
t156: i64 = SRL_ri t54, TargetConstant:i64<8>
t159: ch = STB<Mem:(store (s8) into %ir.dest + 37)> t156, t2, TargetConstant:i64<37>, t0
t130: i64 = SRL_ri t54, TargetConstant:i64<16>
t152: ch = STB<Mem:(store (s8) into %ir.dest + 38)> t130, t2, TargetConstant:i64<38>, t0
t337: i64 = SRL_ri t54, TargetConstant:i64<24>
t154: ch = STB<Mem:(store (s8) into %ir.dest + 39)> t337, t2, TargetConstant:i64<39>, t0
t405: ch = TokenFactor t81:1, t83:1, t254, t229, t222, t224, t237, t239, t232, t234, t70:1, t72:1, t275, t209, t202, t204, t217, t219, t212, t214, t64:1, t66:1, t297, t189, t182, t184, t197, t199, t192, t194, t58:1, t60:1, t319, t169, t162, t164, t177, t179, t172, t174, t52:1, t54:1, t341, t149, t141, t144, t157, t159, t152, t154
t30: ch = RET t405
Total amount of phi nodes to update: 0
*** MachineFunction at end of ISel ***
# Machine code for function cpy: IsSSA, TracksLiveness
Function Live Ins: $r1 in %0
bb.0.entry:
liveins: $r1
%0:gpr = COPY $r1
%1:gpr = LDIMM64 @meow
%2:gpr = LDW %1:gpr, 36 :: (dereferenceable load (s32) from @meow + 36)
%3:gpr = SRL_ri %2:gpr(tied-def 0), 24
STB killed %3:gpr, %0:gpr, 39 :: (store (s8) into %ir.dest + 39)
%4:gpr = SRL_ri %2:gpr(tied-def 0), 16
STB killed %4:gpr, %0:gpr, 38 :: (store (s8) into %ir.dest + 38)
STB %2:gpr, %0:gpr, 36 :: (store (s8) into %ir.dest + 36)
%5:gpr = SRL_ri %2:gpr(tied-def 0), 8
STB killed %5:gpr, %0:gpr, 37 :: (store (s8) into %ir.dest + 37)
%6:gpr = LDW %1:gpr, 32 :: (dereferenceable load (s32) from @meow + 32)
%7:gpr = SRL_ri %6:gpr(tied-def 0), 24
STB killed %7:gpr, %0:gpr, 35 :: (store (s8) into %ir.dest + 35)
%8:gpr = SRL_ri %6:gpr(tied-def 0), 16
STB killed %8:gpr, %0:gpr, 34 :: (store (s8) into %ir.dest + 34)
STB %6:gpr, %0:gpr, 32 :: (store (s8) into %ir.dest + 32)
%9:gpr = SRL_ri %6:gpr(tied-def 0), 8
STB killed %9:gpr, %0:gpr, 33 :: (store (s8) into %ir.dest + 33)
%10:gpr = LDW %1:gpr, 28 :: (dereferenceable load (s32) from @meow + 28)
%11:gpr = SRL_ri %10:gpr(tied-def 0), 24
STB killed %11:gpr, %0:gpr, 31 :: (store (s8) into %ir.dest + 31)
%12:gpr = SRL_ri %10:gpr(tied-def 0), 16
STB killed %12:gpr, %0:gpr, 30 :: (store (s8) into %ir.dest + 30)
STB %10:gpr, %0:gpr, 28 :: (store (s8) into %ir.dest + 28)
%13:gpr = SRL_ri %10:gpr(tied-def 0), 8
STB killed %13:gpr, %0:gpr, 29 :: (store (s8) into %ir.dest + 29)
%14:gpr = LDW %1:gpr, 24 :: (dereferenceable load (s32) from @meow + 24)
%15:gpr = SRL_ri %14:gpr(tied-def 0), 24
STB killed %15:gpr, %0:gpr, 27 :: (store (s8) into %ir.dest + 27)
%16:gpr = SRL_ri %14:gpr(tied-def 0), 16
STB killed %16:gpr, %0:gpr, 26 :: (store (s8) into %ir.dest + 26)
STB %14:gpr, %0:gpr, 24 :: (store (s8) into %ir.dest + 24)
%17:gpr = SRL_ri %14:gpr(tied-def 0), 8
STB killed %17:gpr, %0:gpr, 25 :: (store (s8) into %ir.dest + 25)
%18:gpr = LDW %1:gpr, 20 :: (dereferenceable load (s32) from @meow + 20)
%19:gpr = SRL_ri %18:gpr(tied-def 0), 24
STB killed %19:gpr, %0:gpr, 23 :: (store (s8) into %ir.dest + 23)
%20:gpr = SRL_ri %18:gpr(tied-def 0), 16
STB killed %20:gpr, %0:gpr, 22 :: (store (s8) into %ir.dest + 22)
STB %18:gpr, %0:gpr, 20 :: (store (s8) into %ir.dest + 20)
%21:gpr = SRL_ri %18:gpr(tied-def 0), 8
STB killed %21:gpr, %0:gpr, 21 :: (store (s8) into %ir.dest + 21)
%22:gpr = LDW %1:gpr, 16 :: (dereferenceable load (s32) from @meow + 16)
%23:gpr = SRL_ri %22:gpr(tied-def 0), 24
STB killed %23:gpr, %0:gpr, 19 :: (store (s8) into %ir.dest + 19)
%24:gpr = SRL_ri %22:gpr(tied-def 0), 16
STB killed %24:gpr, %0:gpr, 18 :: (store (s8) into %ir.dest + 18)
STB %22:gpr, %0:gpr, 16 :: (store (s8) into %ir.dest + 16)
%25:gpr = SRL_ri %22:gpr(tied-def 0), 8
STB killed %25:gpr, %0:gpr, 17 :: (store (s8) into %ir.dest + 17)
%26:gpr = LDW %1:gpr, 12 :: (dereferenceable load (s32) from @meow + 12)
%27:gpr = SRL_ri %26:gpr(tied-def 0), 24
STB killed %27:gpr, %0:gpr, 15 :: (store (s8) into %ir.dest + 15)
%28:gpr = SRL_ri %26:gpr(tied-def 0), 16
STB killed %28:gpr, %0:gpr, 14 :: (store (s8) into %ir.dest + 14)
STB %26:gpr, %0:gpr, 12 :: (store (s8) into %ir.dest + 12)
%29:gpr = SRL_ri %26:gpr(tied-def 0), 8
STB killed %29:gpr, %0:gpr, 13 :: (store (s8) into %ir.dest + 13)
%30:gpr = LDW %1:gpr, 8 :: (dereferenceable load (s32) from @meow + 8)
%31:gpr = SRL_ri %30:gpr(tied-def 0), 24
STB killed %31:gpr, %0:gpr, 11 :: (store (s8) into %ir.dest + 11)
%32:gpr = SRL_ri %30:gpr(tied-def 0), 16
STB killed %32:gpr, %0:gpr, 10 :: (store (s8) into %ir.dest + 10)
STB %30:gpr, %0:gpr, 8 :: (store (s8) into %ir.dest + 8)
%33:gpr = SRL_ri %30:gpr(tied-def 0), 8
STB killed %33:gpr, %0:gpr, 9 :: (store (s8) into %ir.dest + 9)
%34:gpr = LDW %1:gpr, 4 :: (dereferenceable load (s32) from @meow + 4)
%35:gpr = SRL_ri %34:gpr(tied-def 0), 24
STB killed %35:gpr, %0:gpr, 7 :: (store (s8) into %ir.dest + 7)
%36:gpr = SRL_ri %34:gpr(tied-def 0), 16
STB killed %36:gpr, %0:gpr, 6 :: (store (s8) into %ir.dest + 6)
STB %34:gpr, %0:gpr, 4 :: (store (s8) into %ir.dest + 4)
%37:gpr = SRL_ri %34:gpr(tied-def 0), 8
STB killed %37:gpr, %0:gpr, 5 :: (store (s8) into %ir.dest + 5)
%38:gpr = LDW %1:gpr, 0 :: (dereferenceable load (s32) from @meow)
%39:gpr = SRL_ri %38:gpr(tied-def 0), 24
STB killed %39:gpr, %0:gpr, 3 :: (store (s8) into %ir.dest + 3)
%40:gpr = SRL_ri %38:gpr(tied-def 0), 16
STB killed %40:gpr, %0:gpr, 2 :: (store (s8) into %ir.dest + 2)
STB %38:gpr, %0:gpr, 0 :: (store (s8) into %ir.dest)
%41:gpr = SRL_ri %38:gpr(tied-def 0), 8
STB killed %41:gpr, %0:gpr, 1 :: (store (s8) into %ir.dest + 1)
RET
# End machine code for function cpy.
```
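For readers who do not want to trace the dump by hand, the cpu v1 output can be read as the following C. This is only a sketch of the pattern, not code from the issue: `copy_words_bytewise` is a hypothetical helper name, and the loop stands in for the ten unrolled groups. Each 32-bit `LDW` from `meow` is followed by four `STB` byte stores into `dest`, with `SRL_ri` by 8, 16 and 24 producing the upper bytes. In the machine code each loaded value is fully consumed by its four stores before the next load, so only one loaded word needs to be live at a time.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from the issue): copy `words` 32-bit values to a
 * byte destination using one 32-bit load and four byte stores per word,
 * mirroring the LDW / SRL_ri / STB groups in the cpu v1 dump. */
static void copy_words_bytewise(unsigned char *dest, const uint32_t *src,
                                size_t words)
{
    for (size_t i = 0; i < words; i++) {
        uint32_t w = src[i];                        /* LDW                 */
        dest[4 * i + 0] = (unsigned char)w;         /* STB at offset + 0   */
        dest[4 * i + 1] = (unsigned char)(w >> 8);  /* SRL_ri 8,  then STB */
        dest[4 * i + 2] = (unsigned char)(w >> 16); /* SRL_ri 16, then STB */
        dest[4 * i + 3] = (unsigned char)(w >> 24); /* SRL_ri 24, then STB */
    }
}
```

With `SIZE` equal to 40, the dump corresponds to `words == 10`. The machine code happens to walk the offsets from 36 down to 0, which the loop above does not reproduce.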
With cpu v3 and the max allowed stack size set to 8 bytes:
```
$ clang --target=bpf -O2 -c t.c -mcpu=v3 -mllvm -bpf-stack-size=8 -mllvm -debug-only=isel
...
===== Instruction selection ends:
Selected selection DAG: %bb.0 'cpy:entry'
SelectionDAG has 178 nodes:
t0: ch,glue = EntryToken
t2: i64,ch = CopyFromReg t0, Register:i64 %0
t88: i64 = LDIMM64 TargetGlobalAddress:i64<ptr @meow> 0
t81: i64,ch = SUBREG_TO_REG TargetConstant:i64<0>, t496, TargetConstant:i32<1>, t496:1
t54: i64,ch = SUBREG_TO_REG TargetConstant:i64<0>, t495, TargetConstant:i32<1>, t495:1
t60: i64,ch = SUBREG_TO_REG TargetConstant:i64<0>, t494, TargetConstant:i32<1>, t494:1
t66: i64,ch = SUBREG_TO_REG TargetConstant:i64<0>, t493, TargetConstant:i32<1>, t493:1
t72: i64,ch = SUBREG_TO_REG TargetConstant:i64<0>, t492, TargetConstant:i32<1>, t492:1
t70: i64,ch = SUBREG_TO_REG TargetConstant:i64<0>, t491, TargetConstant:i32<1>, t491:1
t64: i64,ch = SUBREG_TO_REG TargetConstant:i64<0>, t490, TargetConstant:i32<1>, t490:1
t58: i64,ch = SUBREG_TO_REG TargetConstant:i64<0>, t489, TargetConstant:i32<1>, t489:1
t52: i64,ch = SUBREG_TO_REG TargetConstant:i64<0>, t488, TargetConstant:i32<1>, t488:1
t83: i64,ch = SUBREG_TO_REG TargetConstant:i64<0>, t486, TargetConstant:i32<1>, t486:1
t486: i32,ch = LDW32<Mem:(dereferenceable load (s32) from @meow + 4)> t88, TargetConstant:i64<4>, t0
t488: i32,ch = LDW32<Mem:(dereferenceable load (s32) from @meow + 32)> t88, TargetConstant:i64<32>, t0
t489: i32,ch = LDW32<Mem:(dereferenceable load (s32) from @meow + 24)> t88, TargetConstant:i64<24>, t0
t490: i32,ch = LDW32<Mem:(dereferenceable load (s32) from @meow + 16)> t88, TargetConstant:i64<16>, t0
t491: i32,ch = LDW32<Mem:(dereferenceable load (s32) from @meow + 8)> t88, TargetConstant:i64<8>, t0
t492: i32,ch = LDW32<Mem:(dereferenceable load (s32) from @meow + 12)> t88, TargetConstant:i64<12>, t0
t493: i32,ch = LDW3
```
[truncated] Please see the issue for the entire body.