On Thu, 12 Sep 2024 13:13:01 GMT, Roman Kennke <rken...@openjdk.org> wrote:
>>> @rkennke Can you please explain the changes in these tests: >>> >>> ``` >>> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java >>> test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java >>> test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java >>> test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java >>> test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java >>> ``` >>> >>> You added these IR rule restriction: `@IR(applyIf = >>> {"UseCompactObjectHeaders", "false"},` >>> >>> This means that if `UseCompactObjectHeaders` is enabled, vectorization >>> seems to be impacted - that could be concerning because it has a >>> performance impact. >>> >>> I have recently changed a few things in SuperWord, so maybe some of them >>> can be removed, because they now vectorize anyway? >>> >>> Of course some special tests may just rely on `UseCompactObjectHeaders == >>> false` - but I would need some comments in the tests where you added it to >>> justify why we add the restriction. >>> >>> Please also test this patch with the cross combinations of >>> `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and add >>> `VerifyAlignVector` as well). >> >> IIRC (it has been a while), the problem is that with Lilliput (and also >> without compact headers, but disabling compressed class-pointers >> -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] >> start at different offsets (12 and 16, respectively). That is because with >> compact headers, we are using the 4 bytes after the arraylength, but >> long-arrays cannot do that because of alignment constraints. The impact is >> that these tests don't work as expected, because vectorization triggers >> differently. I don't remember the details, TBH, but I believe they would now >> generate pre-loops, or some might even not vectorize at all. Those seemed to >> be use-cases that did not look very important, but I may be wrong. It would >> be nice to properly fix those tests, or make corresponding tests for compact >> headers, instead, or improve vectorization to better deal with the offset >> mismatch, if necessary/possible. >> >> I will re-evaluate those tests, and add comments or remove the restrictions. > >> > > @rkennke Can you please explain the changes in these tests: >> > > ``` >> > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationMismatchedAccess.java >> > > test/hotspot/jtreg/compiler/c2/irTests/TestVectorizationNotRun.java >> > > test/hotspot/jtreg/compiler/loopopts/superword/TestAlignVector.java >> > > test/hotspot/jtreg/compiler/loopopts/superword/TestIndependentPacksWithCyclicDependency.java >> > > test/hotspot/jtreg/compiler/loopopts/superword/TestMulAddS2I.java >> > > ``` >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > >> > > You added these IR rule restriction: `@IR(applyIf = >> > > {"UseCompactObjectHeaders", "false"},` >> > > This means that if `UseCompactObjectHeaders` is enabled, vectorization >> > > seems to be impacted - that could be concerning because it has a >> > > performance impact. >> > > I have recently changed a few things in SuperWord, so maybe some of them >> > > can be removed, because they now vectorize anyway? >> > > Of course some special tests may just rely on `UseCompactObjectHeaders >> > > == false` - but I would need some comments in the tests where you added >> > > it to justify why we add the restriction. >> > > Please also test this patch with the cross combinations of >> > > `UseCompactObjectHeaders` and `AlignVector` enabled and disabled (and >> > > add `VerifyAlignVector` as well). >> > >> > >> > IIRC (it has been a while), the problem is that with Lilliput (and also >> > without compact headers, but disabling compressed class-pointers >> > -UseCompressedClassPointers, but nobody ever does that), byte[] and long[] >> > start at different offsets (12 and 16, respectively). That is because with >> > compact headers, we are using the 4 bytes after the arraylength, but >> > long-arrays cannot do that because of alignment constraints. The impact is >> > that these tests don't work as expected, because vectorization triggers >> > differently. I don't remember the details, TBH, but I believe they would >> > now generate pre-loops, or some might even not vectorize at all. Those >> > seemed to be use-cases that did not look very important, but I may be >> > wrong. It would be nice to properly fix those tests, or make corresponding >> > tests for compact headers, instead, or improve vectorization to better >> > deal with the offset mismatch, if necessary/possible. >> > I will re-evaluate those tests, and add comments or remove the >> > restrictions. >> >> If it has indeed been a while, then it might well be that some of them work >> now, since I did make some improvements to auto-vectorization ... > `LoadNKlass` nodes can then be expanded into more primitive operations (load > and shift for compact headers, load with `klass_offset_in_bytes()` for > original headers) within C2's back-end or even during code emission as > sketched > [here](https://github.com/robcasloz/jdk/commit/6cb4219f101e3be982264071c2cb1d0af1c6d754). > @rkennke is this similar to what you tried out ("Expanding it as a macro")? No, this is not what I tried. I tried to completely expand LoadNKlass, and replace it with the lower nodes that load and shift the mark-word right there, in ideal graph. But your approach is saner: there is so much implicit knowledge about Load(N)Klass, and even klass_offset_in_bytes(), all over the place, it would be very hard to get this right without breaking something. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2352926265