nikic added a comment.

In D150670#4352055 <https://reviews.llvm.org/D150670#4352055>, @pmatos wrote:

> In D150670#4351147 <https://reviews.llvm.org/D150670#4351147>, @nikic wrote:
>
>> This doesn't looks like a wasm specific problem. You get essentially the 
>> same issue on any target that has a rotate instruction but no funnel shift 
>> instruction. Here are just a couple examples: https://godbolt.org/z/8v6nfaax9
>
> Yes, I am indeed aware this is not specific to wasm. What's specific to wasm 
> afaiu is that the code generated is much worse when expanding fshl. That's 
> what I mentioned in the bug discussion here: 
> https://github.com/llvm/llvm-project/issues/62703#issuecomment-1548474310
>
>> I believe this needs to be either solved by preventing demanded bits 
>> simplifications that break a rotate pattern (though I'm not sure if that 
>> would break any other optimizations we care about) or by adding a special 
>> case for this in the backend when lowering FSH to ROT.
>
> Preventing the simplification means adding target specific code in 
> instcombine which seems even worse than adding it here given as @dschuff 
> pointed out, there's precedent with x86.

I'm not suggesting to add any target specific code to instcombine. I think 
there are actually quite a few different ways this could be solved. See 
https://llvm.godbolt.org/z/f55K7K17W for three possible representations of the 
same rotate pattern.

1. Say that we prefer preserving rotates over "simplifying" funnel shifts 
(ending up with the rot2 pattern). Basically by skipping the optimization at 
https://github.com/llvm/llvm-project/blob/7f54b38e28b3b66195de672848f2b5366d0d51e3/llvm/lib/Transforms/InstCombine/InstCombineSimplifyDemanded.cpp#L927-L931
 if both fsh operands are the same. Assuming this doesn't cause test 
regressions, I think this would be acceptable to do. From a backend 
perspective, even for targets that have a native funnel shift (aarch64, x86), 
the difference between the rot1/rot2 patterns looks pretty neutral.

2. Undo the transform in the backend. It is bidrectional 
(https://alive2.llvm.org/ce/z/Chb85F), so this is possible.  This would need an 
extra legalization/combiner pattern (depending on where we form ROTs). 
Advantage of undoing the pattern is the usual one: Works if it was in the 
undesirable form in the first place. (E.g. this could happen if the rotate did 
not use a builtin but was implemented as `x << 8 | x >> 24`, which is probably 
much more widespread than the builtin. Though I checked, and we don't form a 
funnel shift for it in this case right now.)

3. Move the and from the fsh arguments to the result. This is the rot3 pattern. 
This seems to produce the best codegen on average, because it can use uxtb16 on 
ARM. Moving the and from args to return is a bit unusual for unary ops, but if 
we see this as moving two ands on both fsh arguments (which happen to be the 
same) to one on the result, that would be a pretty standard transform.

I think all of those options are viable, and couldn't say for certain which one 
is best. I think any of them would be better than making clang emit a special 
intrinsic just for wasm though.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D150670/new/

https://reviews.llvm.org/D150670

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to