On Thu, 17 Oct 2024 19:40:52 GMT, Jatin Bhateja <jbhat...@openjdk.org> wrote:

>> MulVL (VectorCastI2X src1) (VectorCastI2X src2)

> It looks unsafe to me, since VectorCastI2L sign-extends integer lanes, ...

Hm, I don't see any problems with it if `VPMULDQ` is used. Sign extension 
becomes redundant when 64-bit multiplication is strength-reduced to 32-bit one 
(32x32->64). Am I missing something important here?

>> IMO until C2 type system starts to track bitwise constant information 
>> ([JDK-8001436](https://bugs.openjdk.org/browse/JDK-8001436) et al), there 
>> are not enough benefits to rely on IGVN here. So far, all the discussed 
>> patterns are simple enough for matcher to handle them without too much 
>> tweaking.
>
> Hi @iwanowww , 
> I have implemented additional pattern you suggested.
> In addition re-wiring pattern inputs to MulVL IR to avoid emitting upper 
> doubleword clearing logic in applicable scenarios. 
> 
> Hi @jaskarth , @merykitty ,
> As discussed, waiting on PhaseLowering skeleton to move some part of this 
> patch to x86 specific lowering pass.

Thanks, @jatin-bhateja. I took a look at the latest version and still think 
that IGVN is not the best place for it. 

First of all, flags on MulVL feel too adhoc and irregular. The original IR 
structure is still there (except the cases when inputs are rewired), so can be 
easily recomputed on demand.

I noticed that the patterns can be generalized: what matters is whether upper 
half is filled with zeros/sign bits or not, so small enough masks (and large 
enough shifts) are amenable to the same optimization. But, in such case, input 
rewiring becomes applicable only to particular constant inputs.

(BTW signed right shifts can be optimized in a similar way, since they populate 
upper half with the sign-bit.)

So, IMO the best way to move this particular enhancement forward is:
* perform the transformation during matching;
* match a single MulVL node and shape the checks on argument shape as 
predicates on AD instructions 
   * setting lower instruction costs should tell the matcher to prefer new 
specific instructions over generic ones;
* avoid input rewiring for now (VPMULDQ/VPMULUDQ give enough performance 
improvement on its own).

-------------

PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2420668490
PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2436528498

Reply via email to