HaohaiWen wrote:

> > Thanks for the updated example!
> > 
> > To explain what I meant in first comment using this example: We would 
> > perform the transform https://alive2.llvm.org/ce/z/nllcB_, which does not 
> > depend at all on how `%yx` is constructed, and whether there is any way to 
> > form the `fshl` separately. If the `%yx` is appropriately constructed, the 
> > `fshl` can be removed (https://alive2.llvm.org/ce/z/B_KOwv, another missing 
> > transform).
> > 
> > Is this not a viable approach? Is there a concern here that generating both 
> > fshl and bitreverse may be non-profitable for targets without bitreverse? 
> > Or maybe supporting this makes the matching too expensive?
> 
> It's absolutely a feasible solution.
> 
> --------------------------------------
> 
> Solution1:
> First optimize bitreverse then eliminate redundant fshl: 
> https://alive2.llvm.org/ce/z/g_gWf3
> This requires
> a) First teach collectBitParts to not only search until unknown opcode, but 
> also try to use itself as root.
> b) Teach recognizeBSwapOrBitReverseIdiom to recognize bit pattern [n/2-1, 
> ..., 1, 0, n-1, n-2, .... n/2]. Then insert bitreverse and fshl.
> c) Teach instcombine to remove redundant fshl if opposite concat exists. This 
> requires to scan def-users chains.
> 
> Advantage: 
> 1). Even if we can't eliminate fshl, we can still optimize a bunch of IR to 
> fshl+bitreverse. Don't know whether its profitable for most targets.
> 
> --------------------------------------
> Solution2:
> First optimize or to fshl then optimize bitreverse: 
> https://alive2.llvm.org/ce/z/WbzJVo
> This requires
> a) What we did in this PR. This requires to scan def-users chains.
> b) same as step a) in Solution 1.
> 
> Advantage: 
> 1). Can optimize more opposite concat pattern to fshl. It's beneficial for 
> targets with cycle rotate instruction (e.g. rol in x86).
> 2). More easily for implementation. Do not requires step b) in Solution1.
> 
> --------------------------------------
> 
> Both solutions requires to scan def-users chains. I don't think this is an 
> issue.
> Both solutions can handle my cases. Solution2 is easier to implementation. 
> Any concern about this PR?
> I think b) in solution1 can be implemented in the future if we want both 
> advantages of solution1 and 2. InstCombine will always first try to match 
> fshl then bitreverse. Therefore with solution2 and b) of solution1, we don't 
> need to implent c) in solution1 at all.
> 
> Ref for bitreverse optimization: 
> https://github.com/llvm/llvm-project/blob/38b34c61e028751b6778493d6185d07a8af1a3b5/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L2686

Comments?

https://github.com/llvm/llvm-project/pull/68502
_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to