On 1/8/18 5:45 AM, Jiong Wang wrote:
On 05/01/2018 20:05, Alexei Starovoitov wrote:

On Fri, Jan 5, 2018 at 7:01 AM, Charlemagne Lasse
<charlemagnela...@gmail.com> wrote:
The first thing is the odd way 8-bit loads into a uint8_t are handled
(see bug1_sec):

I could reproduce both issues on other targets with latest LLVM trunk@321940,
for example AArch64 (need to remove asm("llvm.bpf.load.byte") from the
testcase).

For the first issue, it seems that i8/i16 are promoted to i32, so for
bug1_sec the sequence is:

         t6: i32 = truncate t5
       t8: i32 = and t6, Constant:i32<255>
     t9: i64 = any_extend t8

while for ok1, it is:

         t6: i32 = truncate t5
     t9: i64 = any_extend t6

For the ok1 sequence, LLVM is doing (aext (trunc x)) -> x, while for the
bug1_sec sequence LLVM is not doing the combine, which is correct: it doesn't
understand that the return value of llvm.bpf.load.byte is zero-extended to
i64, so combining the bug1_sec sequence would lose the effect of the and
instruction.

Thanks for investigation.

Looks like before the "and" operation is introduced, the IR looks like:
  %call = call i64 @llvm.bpf.load.byte(i8* %0, i64 0)
  %conv = trunc i64 %call to i8
  %conv1 = zext i8 %conv to i32
  ret i32 %conv1

and the "Combine redundant instructions" phase changes it to:
  %call = call i64 @llvm.bpf.load.byte(i8* %0, i64 0)
  %conv = trunc i64 %call to i32
  %conv1 = and i32 %conv, 255
  ret i32 %conv1

while for ok1, IR looks like:
  %call = call i64 @llvm.bpf.load.byte(i8* %0, i64 0)
  %conv = trunc i64 %call to i32
  ret i32 %conv

One thing we could do is perform this optimization in the BPF backend during
DAG2DAG transformation, since it understands the llvm.bpf.load.byte
semantics.


For unknown reasons, the line "6:" was changed from a JNE to a JEQ.

LLVM is doing generic canonicalizations inside

lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp:

// If the lhs block is the next block, invert the condition so that we can
// fall through to the lhs instead of the rhs block.

I disabled this optimization and the original condition "==" is preserved,
but we still have the inefficient code:
Disassembly of section bug2_sec:
bug2:
       0:       bf 16 00 00 00 00 00 00         r6 = r1
       1:       b7 07 00 00 00 00 00 00         r7 = 0
       2:       30 00 00 00 00 00 00 00         r0 = *(u8 *)skb[0]
       3:       15 00 01 00 01 00 00 00         if r0 == 1 goto +1 <LBB4_1>
       4:       05 00 04 00 00 00 00 00         goto +4 <LBB4_3>

LBB4_1:
       5:       30 00 00 00 01 00 00 00         r0 = *(u8 *)skb[1]
       6:       b7 07 00 00 15 00 00 00         r7 = 21
       7:       15 00 01 00 01 00 00 00         if r0 == 1 goto +1 <LBB4_3>
       8:       b7 07 00 00 00 00 00 00         r7 = 0

LBB4_3:
       9:       bf 70 00 00 00 00 00 00         r0 = r7
      10:       95 00 00 00 00 00 00 00         exit

Right, insns 7 and 8 can be removed. But since the "switch" to "cond"
transformation happens during instruction selection, it may be too late for
the redundant condition elimination...
_______________________________________________
cfe-users mailing list
cfe-users@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-users