HaohaiWen wrote: > > Thanks for the updated example! > > > > To explain what I meant in first comment using this example: We would > > perform the transform https://alive2.llvm.org/ce/z/nllcB_, which does not > > depend at all on how `%yx` is constructed, and whether there is any way to > > form the `fshl` separately. If the `%yx` is appropriately constructed, the > > `fshl` can be removed (https://alive2.llvm.org/ce/z/B_KOwv, another missing > > transform). > > > > Is this not a viable approach? Is there a concern here that generating both > > fshl and bitreverse may be non-profitable for targets without bitreverse? > > Or maybe supporting this makes the matching too expensive? > > It's absolutely a feasible solution. > > -------------------------------------- > > Solution1: > First optimize bitreverse then eliminate redundant fshl: > https://alive2.llvm.org/ce/z/g_gWf3 > This requires > a) First teach collectBitParts to not only search until unknown opcode, but > also try to use itself as root. > b) Teach recognizeBSwapOrBitReverseIdiom to recognize bit pattern [n/2-1, > ..., 1, 0, n-1, n-2, .... n/2]. Then insert bitreverse and fshl. > c) Teach instcombine to remove redundant fshl if opposite concat exists. This > requires to scan def-users chains. > > Advantage: > 1). Even if we can't eliminate fshl, we can still optimize a bunch of IR to > fshl+bitreverse. Don't know whether its profitable for most targets. > > -------------------------------------- > Solution2: > First optimize or to fshl then optimize bitreverse: > https://alive2.llvm.org/ce/z/WbzJVo > This requires > a) What we did in this PR. This requires to scan def-users chains. > b) same as step a) in Solution 1. > > Advantage: > 1). Can optimize more opposite concat pattern to fshl. It's beneficial for > targets with cycle rotate instruction (e.g. rol in x86). > 2). More easily for implementation. Do not requires step b) in Solution1. > > -------------------------------------- > > Both solutions requires to scan def-users chains. I don't think this is an > issue. > Both solutions can handle my cases. Solution2 is easier to implementation. > Any concern about this PR? > I think b) in solution1 can be implemented in the future if we want both > advantages of solution1 and 2. InstCombine will always first try to match > fshl then bitreverse. Therefore with solution2 and b) of solution1, we don't > need to implent c) in solution1 at all. > > Ref for bitreverse optimization: > https://github.com/llvm/llvm-project/blob/38b34c61e028751b6778493d6185d07a8af1a3b5/llvm/lib/Transforms/InstCombine/InstCombineAndOrXor.cpp#L2686
Comments? https://github.com/llvm/llvm-project/pull/68502 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits