Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2]

Jatin Bhateja Mon, 14 Oct 2024 05:15:27 -0700

On Fri, 11 Oct 2024 17:12:49 GMT, Jasmine Karthikeyan wrote:


> > I am having a similar idea that is to group those transformations together 
> > into a `Phase` called `PhaseLowering`
> 
> I think such a phase could be quite useful in general. Recently I was trying 
> to implement the BMI1 instruction `bextr` for better performance with bit 
> masks, but ran into a problem where it doesn't have an immediate encoding so 
> we'd need to manifest a constant into a temporary register every time. With 
> an (x86-specific) ideal node, we could simply let the register allocator 
> handle placing the constant. It would also be nice to avoid needing to put 
> similar backend-specific lowerings (such as `MacroLogicV`) in shared code.

Hey @jaskarth, @merykitty, we already have an infrastructure where Macro nodes
created during parsing can be lowered/expanded into multiple IR nodes during
macro expansion. What we need in this case is a target-specific IR pattern
check, since not all targets may support a 32x32-bit multiplication producing a
full quadword (64-bit) result. The idea is to avoid creating a new IR node and
instead piggyback on the existing MulVL IR; we already use similar tricks for
relaxed unsafe reductions. The patch performs a point optimization for a
specific set of constrained multiplication patterns.
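
The following is a toy Python model of such a pattern check. It makes several assumptions. Nodes are plain tuples. An `AndVL` with a broadcast 0xFFFFFFFF stands in for the "input is a zero-extended 32-bit value" pattern. The names `select_mul` and `execute` are hypothetical, and this is not the C2 matcher from this patch. VPMULUDQ multiplies the low unsigned dword of each quadword lane and keeps the full 64-bit product. So when both MulVL inputs are already masked to 32 bits, the masks become redundant and the result is unchanged.

```python
MASK32 = 0xFFFF_FFFF
MASK64 = 0xFFFF_FFFF_FFFF_FFFF

def mulvl(a, b):
    """Generic lane-wise 64x64 multiply wrapping to 64 bits (plain MulVL)."""
    return [(x * y) & MASK64 for x, y in zip(a, b)]

def vpmuludq(a, b):
    """Per 64-bit lane: multiply the low unsigned dwords, keep the 64-bit product."""
    return [(x & MASK32) * (y & MASK32) for x, y in zip(a, b)]

def evaluate(node):
    """Evaluate a toy node: ("Vec", lanes) or ("AndVL", child, broadcast_mask)."""
    op, *args = node
    if op == "Vec":
        return args[0]
    if op == "AndVL":
        return [x & args[1] for x in evaluate(args[0])]
    raise ValueError(op)

def zero_extended_input(node):
    """Hypothetical check: return x if node is AndVL(x, 0xFFFFFFFF), else None."""
    op, *args = node
    if op == "AndVL" and args[1] == MASK32:
        return args[0]
    return None

def select_mul(lhs, rhs):
    """Pick VPMULUDQ only when both inputs are provably 32-bit unsigned."""
    x, y = zero_extended_input(lhs), zero_extended_input(rhs)
    if x is not None and y is not None:
        return ("VPMULUDQ", x, y)  # the AND masks are dropped
    return ("MulVL", lhs, rhs)     # fall back to the generic lowering

def execute(instr):
    op, lhs, rhs = instr
    impl = vpmuludq if op == "VPMULUDQ" else mulvl
    return impl(evaluate(lhs), evaluate(rhs))
```

The product of two values below 2^32 always fits in 64 bits, so the selected path never wraps. The check fires only when both inputs are constrained. If either input is unconstrained, the generic MulVL lowering is kept.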

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2411053693
