Patryk27 added a comment.

Now that I think about it, there //might// be a simpler way!

So, the underlying issue is that sometimes LLVM spawns new shifts during the 
instruction selection pass - for instance given this IR¹:

  define i64 @test(i64 %x, i32 %y) {
  start:
    %0 = or i32 %y, 38
    %1 = zext i32 %0 to i64
    %2 = lshr i64 %x, %1
    ret i64 %2
  }

... this 64-bit shift will not get considered by our shift-expansion pass due 
to a condition here:

https://github.com/llvm/llvm-project/blob/eaaacc3c651e5b2c23bfa9648b6b0d69aab64d00/llvm/lib/Target/AVR/AVRShiftExpand.cpp#L54

... but later, during isel, LLVM **will** reduce this 64-bit shift into a 
32-bit shift² and then ask us to expand this brand new 32-bit shift - that 
causes a panic since we expect those to have been already expanded.

In order to solve this issue, I think we could simply expand //all// 
variable-shifts greater than i8 (instead of expanding only 32-bit shifts):

  diff --git a/llvm/lib/Target/AVR/AVRShiftExpand.cpp 
b/llvm/lib/Target/AVR/AVRShiftExpand.cpp
  index b7dcd860467d..4ea6d9fdb57c 100644
  --- a/llvm/lib/Target/AVR/AVRShiftExpand.cpp
  +++ b/llvm/lib/Target/AVR/AVRShiftExpand.cpp
  @@ -51,8 +51,7 @@ bool AVRShiftExpand::runOnFunction(Function &F) {
       if (!I.isShift())
         // Only expand shift instructions (shl, lshr, ashr).
         continue;
  -    if (I.getType() != Type::getInt32Ty(Ctx))
  -      // Only expand plain i32 types.
  +    if (I.getType() == Type::getInt8Ty(Ctx))
         continue;
       if (isa<ConstantInt>(I.getOperand(1)))
         // Only expand when the shift amount is not known.
  @@ -75,7 +74,7 @@ bool AVRShiftExpand::runOnFunction(Function &F) {
   void AVRShiftExpand::expand(BinaryOperator *BI) {
     auto &Ctx = BI->getContext();
     IRBuilder<> Builder(BI);
  -  Type *Int32Ty = Type::getInt32Ty(Ctx);
  +  Type *InputTy = cast<Instruction>(BI)->getType();
     Type *Int8Ty = Type::getInt8Ty(Ctx);
     Value *Int8Zero = ConstantInt::get(Int8Ty, 0);
   
  @@ -101,7 +100,7 @@ void AVRShiftExpand::expand(BinaryOperator *BI) {
     Builder.SetInsertPoint(LoopBB);
     PHINode *ShiftAmountPHI = Builder.CreatePHI(Int8Ty, 2);
     ShiftAmountPHI->addIncoming(ShiftAmount, BB);
  -  PHINode *ValuePHI = Builder.CreatePHI(Int32Ty, 2);
  +  PHINode *ValuePHI = Builder.CreatePHI(InputTy, 2);
     ValuePHI->addIncoming(BI->getOperand(0), BB);
   
     // Subtract the shift amount by one, as we're shifting one this loop
  @@ -116,13 +115,13 @@ void AVRShiftExpand::expand(BinaryOperator *BI) {
     Value *ValueShifted;
     switch (BI->getOpcode()) {
     case Instruction::Shl:
  -    ValueShifted = Builder.CreateShl(ValuePHI, ConstantInt::get(Int32Ty, 1));
  +    ValueShifted = Builder.CreateShl(ValuePHI, ConstantInt::get(InputTy, 1));
       break;
     case Instruction::LShr:
  -    ValueShifted = Builder.CreateLShr(ValuePHI, ConstantInt::get(Int32Ty, 
1));
  +    ValueShifted = Builder.CreateLShr(ValuePHI, ConstantInt::get(InputTy, 
1));
       break;
     case Instruction::AShr:
  -    ValueShifted = Builder.CreateAShr(ValuePHI, ConstantInt::get(Int32Ty, 
1));
  +    ValueShifted = Builder.CreateAShr(ValuePHI, ConstantInt::get(InputTy, 
1));
       break;
     default:
       llvm_unreachable("asked to expand an instruction that is not a shift");
  @@ -137,7 +136,7 @@ void AVRShiftExpand::expand(BinaryOperator *BI) {
     // Collect the resulting value. This is necessary in the IR but won't 
produce
     // any actual instructions.
     Builder.SetInsertPoint(BI);
  -  PHINode *Result = Builder.CreatePHI(Int32Ty, 2);
  +  PHINode *Result = Builder.CreatePHI(InputTy, 2);
     Result->addIncoming(BI->getOperand(0), BB);
     Result->addIncoming(ValueShifted, LoopBB);

Overall, this solution seems to work - I've checked it on a couple of Rust 
applications and the binaries behave correctly, both some simple and more 
complex ones.

I think the only disadvantage of this approach, as compared to expanding shifts 
during isel, are optimizations: expanding shifts eagerly means that we'll lose 
some of the optimizations we could have applied otherwise.

For instance, following that first example from Rust's standard library, we 
will expand that instruction into a 64-bit shift even though in principle a 
32-bit shift would suffice (but we don't know that yet during shift-expansion 
pass).

For safety, we could implement this simpler approach first, as presented in the 
diff here, and maybe come back to merging shift-expansion with isel in the 
future - what do you think about it?

¹ minimized case from Rust's standard library - originally: 
`_ZN4core3num7dec2flt6lemire13compute_float17hc1d4de6247502c96E`
² I'm not 100% sure why, though - seems to be somehow related to this `or + 
zext` combination


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D153197/new/

https://reviews.llvm.org/D153197

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to