On Thu, 12 Sep 2024 13:13:01 GMT, Roman Kennke <rken...@openjdk.org> wrote:

>>> @rkennke Can you please explain the changes in these tests:
>>> 
>>> ```
>>> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
>>> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
>>> test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
>>> test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
>>> test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java
>>> ```
>>> 
>>> You added these IR rule restriction: `@IR(applyIf = 
>>> {"UseCompactObjectHeaders", "false"},`
>>> 
>>> This means that if `UseCompactObjectHeaders` is enabled, vectorization 
>>> seems to be impacted - that could be concerning because it has a 
>>> performance impact.
>>> 
>>> I have recently changed a few things in SuperWord, so maybe some of them 
>>> can be removed, because they now vectorize anyway?
>>> 
>>> Of course some special tests may just rely on `UseCompactObjectHeaders == 
>>> false` - but I would need some comments in the tests where you added it to 
>>> justify why we add the restriction.
>>> 
>>> Please also test this patch with the cross combinations of 
>>> `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add 
>>> `VerifyAlignVector` as well).
>> 
>> IIRC (it has been a while), the problem is that with Lilliput (and also 
>> without compact headers, but disabling compressed class-pointers 
>> -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] 
>> start at different offsets (12 and 16, respectively). That is because with 
>> compact headers, we are using the 4 bytes after the arraylength, but 
>> long-arrays cannot do that because of alignment constraints. The impact is 
>> that these tests don't work as expected, because vectorization triggers 
>> differently. I don't remember the details, TBH, but I believe they would now 
>> generate pre-loops, or some might even not vectorize at all. Those seemed to 
>> be use-cases that did not look very important, but I may be wrong. It would 
>> be nice to properly fix those tests, or make corresponding tests for compact 
>> headers, instead, or improve vectorization to better deal with the offset 
>> mismatch, if necessary/possible.
>> 
>> I will re-evaluate those tests, and add comments or remove the restrictions.
>
>> > > @rkennke Can you please explain the changes in these tests:
>> > > ```
>> > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java
>> > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java
>> > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java
>> > > test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java
>> > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java
>> > > ```
>> > > 
>> > > 
>> > >     
>> > >       
>> > >     
>> > > 
>> > >       
>> > >     
>> > > 
>> > >     
>> > >   
>> > > You added these IR rule restriction: `@IR(applyIf = 
>> > > {"UseCompactObjectHeaders", "false"},`
>> > > This means that if `UseCompactObjectHeaders` is enabled, vectorization 
>> > > seems to be impacted - that could be concerning because it has a 
>> > > performance impact.
>> > > I have recently changed a few things in SuperWord, so maybe some of them 
>> > > can be removed, because they now vectorize anyway?
>> > > Of course some special tests may just rely on `UseCompactObjectHeaders 
>> > > == false` - but I would need some comments in the tests where you added 
>> > > it to justify why we add the restriction.
>> > > Please also test this patch with the cross combinations of 
>> > > `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and 
>> > > add `VerifyAlignVector` as well).
>> > 
>> > 
>> > IIRC (it has been a while), the problem is that with Lilliput (and also 
>> > without compact headers, but disabling compressed class-pointers 
>> > -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] 
>> > start at different offsets (12 and 16, respectively). That is because with 
>> > compact headers, we are using the 4 bytes after the arraylength, but 
>> > long-arrays cannot do that because of alignment constraints. The impact is 
>> > that these tests don't work as expected, because vectorization triggers 
>> > differently. I don't remember the details, TBH, but I believe they would 
>> > now generate pre-loops, or some might even not vectorize at all. Those 
>> > seemed to be use-cases that did not look very important, but I may be 
>> > wrong. It would be nice to properly fix those tests, or make corresponding 
>> > tests for compact headers, instead, or improve vectorization to better 
>> > deal with the offset mismatch, if necessary/possible.
>> > I will re-evaluate those tests, and add comments or remove the 
>> > restrictions.
>> 
>> If it has indeed been a while, then it might well be that some of them work 
>> now, since I did make some improvements to auto-vectorization ...

> `LoadNKlass` nodes can then be expanded into more primitive operations (load 
> and shift for compact headers, load with `klass_offset_in_bytes()` for 
> original headers) within C2's back-end or even during code emission as 
> sketched 
> [here](https://github.com/robcasloz/jdk/commit/6cb4219f101e3be982264071c2cb1d0af1c6d754).
>  @rkennke is this similar to what you tried out ("Expanding it as a macro")?

No, this is not what I tried. I tried to completely expand LoadNKlass, and 
replace it with the lower nodes that load and shift the mark-word right there, 
in ideal graph. But your approach is saner: there is so much implicit knowledge 
about Load(N)Klass, and even klass_offset_in_bytes(), all over the place, it 
would be very hard to get this right without breaking something.

-------------

PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2352926265

Reply via email to