On Tue, 12 Nov 2024 21:49:22 GMT, Vladimir Ivanov <vliva...@openjdk.org> wrote:
>>> In the latest version you added new Ideal nodes (`MulIVL` and `MulUIVL`). I >>> don't see a compelling reason to do so. IMO matcher functionality is more >>> than enough to cover `VPMULDQ` case. `MulIVL` is equivalent to `MulVL` + >>> `has_int_inputs()` predicate. For `MulUIVL` you additionally do input >>> rewiring (using `forward_masked_input`), but (1) `AndV src (Replicate >>> 0xFFFFFFFF))` operands can be easily detected on matcher side (with an >>> extra AD instruction); and (2) such optimization is limited because it is >>> valid only for `0xFFFFFFFF` case while `has_uint_inputs() == true` for `C >>> <= 0xFFFFFFFF`. >>> >>> So, IMO `MulIVL` and `MulUIVL` nodes just add noise in Ideal graph without >>> improving situation during matching. >> >> Hi @iwanowww , >> Problem occurs only if AndV gets shared; in such a case, matcher will not be >> able to identify the constrained multiplication pattern and absorb the >> masking pattern. Specialized IR overrules such limitations and shields the >> pattern from downstream optimization passes, thereby removing any >> non-determinism. In addition, it facilitates forwarding inputs to the >> multiplier, the new IR is explicit in its semantics of considering only >> lower doublewords of quadword lanes for multiplication, hence we can safely >> save emitting redundant input masking instructions. We already have >> specialized IR nodes like MulAddVS2VINode and I see these new IR nodes >> similar to it. > > @jatin-bhateja in case when `AndV` is shared, it can't be eliminated unless > all users absorb it. For such cases, matcher can perform adhoc node cloning, > but in this particular case it looks like an overkill either way. IMO the > pattern is too niche to focus on it (either to justify input forwarding or > adhoc handling on matcher side). > > It's good you mentioned `MulAddVS2VI`. On one hand, VNNI operations are more > complex (similar to FMA), so such complexity *may* be justified there. On the > other hand, it doesn't look like VNNI support in C2 age well. It is tied to > auto-vectorizer and, by now, Vector API doesn't benefit from it. So, instead > of doubling down on `MulAddVS2VI` path, I'd prefer to leave it aside and > reimplement it later in a more maintainable manner. Hi @iwanowww , your closing comments addressed, kindly re-approve, the change is mainly in newly added test file. ------------- PR Comment: https://git.openjdk.org/jdk/pull/21244#issuecomment-2490437406