On Mon, 14 Oct 2024 12:12:58 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:
>>> I am having a similar idea that is to group those transformations together
>>> into a `Phase` called `PhaseLowering`
>>
>> I think such a phase could be quite useful in general. Recently I was trying
>> to implement the BMI1 instruction `bextr` for better performance with bit
>> masks, but ran into a problem where it doesn't have an immediate encoding so
>> we'd need to manifest a constant into a temporary register every time. With
>> an (x86-specific) ideal node, we could simply let the register allocator
>> handle placing the constant. It would also be nice to avoid needing to put
>> similar backend-specific lowerings (such as `MacroLogicV`) in shared code.
>
> Hey @jaskarth, @merykitty, we already have an infrastructure where during
> parsing we create Macro Nodes which can be lowered / expanded to multiple IR
> nodes during macro expansion, what we need in this case is a target specific
> IR pattern check since not all targets may support 32x32 multiplication with
> quadword saturation, idea is to avoid creating a new IR and piggyback needed
> information on existing MulVL IR, we already use such tricks for relaxed
> unsafe reductions. Going forward, infusion of KnownBits into our data flow
> analysis infrastructure will streamline such optimizations, this patch is
> performing point optimization for specific set of constrained multiplication
> patterns.

@jatin-bhateja That is machine-independent lowering; we are talking about machine-dependent lowering, to which the `MacroLogicV` transformation belongs. With a `phaselowering_x86`, you would not have to add another method to `Matcher` or add default implementations to the various architecture files. You can reuse the `MulVL` node for that, but I believe these transformations should be done as late as possible.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2411389030
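
For readers unfamiliar with the pattern under discussion, here is a minimal scalar sketch of the arithmetic behind it, not the patch's code. It assumes the constrained case is a 64-bit multiply whose operands both have their upper 32 bits known to be zero, so a 32x32->64 unsigned multiply gives the same result. The class and method names (`MulLoweringSketch`, `upperBitsZero`, `mul32x32`, `lowerMul`) are hypothetical, and `upperBitsZero` inspects a concrete value where a compiler would consult known-bits information.

```java
public class MulLoweringSketch {
    // Hypothetical stand-in for a known-bits query: true when the upper
    // 32 bits of x are zero. A compiler would ask this about a node's type,
    // not about a runtime value.
    static boolean upperBitsZero(long x) {
        return (x >>> 32) == 0;
    }

    // Models a 32x32->64 unsigned multiply on one 64-bit lane: only the
    // low 32 bits of each input are read.
    static long mul32x32(long a, long b) {
        return (a & 0xFFFFFFFFL) * (b & 0xFFFFFFFFL);
    }

    // Hypothetical lowering decision: use the narrower multiply only when
    // it is provably equivalent, otherwise keep the full 64-bit multiply.
    static long lowerMul(long a, long b) {
        if (upperBitsZero(a) && upperBitsZero(b)) {
            return mul32x32(a, b);
        }
        return a * b;
    }
}
```

Whether such a check lives behind a `Matcher` query or in a machine-dependent lowering phase is exactly the design question raised above; the sketch takes no position on it.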