On Fri, 8 Nov 2024 17:42:24 GMT, Roman Kennke <rken...@openjdk.org> wrote:
>> Could you please cherry pick >> https://github.com/mur47x111/jdk/commit/c45ebc2a89d0b25a3dd8cc46386e37a635ff9af2 >> for the JVMCI support? > > @mur47x111 it's now intergrated in jdk24. do your magic in Graal ;-) @rkennke I have now looked more into the SuperWord collateral damage: [JDK-8340010](https://bugs.openjdk.org/browse/JDK-8340010): Fix vectorization tests with compact headers Do we care about `AlignVector` and `UseCompactObjectHeaders` enabled together? If so, we have a serious issue with mixed type examples. There are actually now some failing cases: Failed IR Rules (3) of Methods (3) ---------------------------------- 1) Method "public char[] compiler.vectorization.runner.ArrayTypeConvertTest.convertFloatToChar()" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"asimd", "true", "avx2", "true"}, counts={"_#V#VECTOR_CAST_F2S#_", "_@min(max_float, max_char)", ">0"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(VectorCastF2X.*)+(\\s){2}===.*vector[A-Za-z]<S,16>)" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! 2) Method "public short[] compiler.vectorization.runner.ArrayTypeConvertTest.convertFloatToShort()" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"asimd", "true", "avx2", "true"}, counts={"_#V#VECTOR_CAST_F2S#_", "_@min(max_float, max_short)", ">0"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(VectorCastF2X.*)+(\\s){2}===.*vector[A-Za-z]<S,16>)" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! 3) Method "public float[] compiler.vectorization.runner.ArrayTypeConvertTest.convertShortToFloat()" - [Failed IR rules: 1]: * @IR rule 1: "@compiler.lib.ir_framework.IR(phase={DEFAULT}, applyIfPlatformAnd={}, applyIfCPUFeatureOr={"asimd", "true", "avx2", "true"}, counts={"_#V#VECTOR_CAST_S2F#_", "_@min(max_short, max_float)", ">0"}, applyIfPlatform={}, applyIfPlatformOr={}, failOn={}, applyIfOr={}, applyIfCPUFeatureAnd={}, applyIf={}, applyIfCPUFeature={}, applyIfAnd={}, applyIfNot={})" > Phase "PrintIdeal": - counts: Graph contains wrong number of nodes: * Constraint 1: "(\\d+(\\s){2}(VectorCastS2X.*)+(\\s){2}===.*vector[A-Za-z]<F,16>)" - Failed comparison: [found] 0 > 0 [given] - No nodes matched! Let me explain: If we enable AlignVector, we need 8-byte alignment. As long as `UseCompactObjectHeaders` is disabled, all of these are `=16`: UNSAFE.ARRAY_BYTE_BASE_OFFSET UNSAFE.ARRAY_SHORT_BASE_OFFSET UNSAFE.ARRAY_CHAR_BASE_OFFSET UNSAFE.ARRAY_INT_BASE_OFFSET UNSAFE.ARRAY_LONG_BASE_OFFSET UNSAFE.ARRAY_FLOAT_BASE_OFFSET UNSAFE.ARRAY_DOUBLE_BASE_OFFSET However, with `UseCompactObjectHeaders` endabled, these are now 12: UNSAFE.ARRAY_BYTE_BASE_OFFSET UNSAFE.ARRAY_SHORT_BASE_OFFSET UNSAFE.ARRAY_CHAR_BASE_OFFSET UNSAFE.ARRAY_INT_BASE_OFFSET UNSAFE.ARRAY_FLOAT_BASE_OFFSET And these still 16: UNSAFE.ARRAY_LONG_BASE_OFFSET UNSAFE.ARRAY_DOUBLE_BASE_OFFSET Now let's try to get that 8-byte alignment in some example, one from the above: public short[] convertFloatToShort() { short[] res = new short[SIZE]; for (int i = 0; i < SIZE; i++) { res[i] = (short) floats[i]; } return res; } Let's look at the two addresses with `UseCompactObjectHeaders=false`, where we **can** vectorize: F_adr = base + 16 + 4 * i -> aligned for: i % 2 = 0 S_adr = base + 16 + 2 * i -> aligned for: i % 4 = 0 -> solution for both: i % 4 = 0, i.e. we have alignment for both vector accesses every 4th iteration. Let's look at the two addresses with `UseCompactObjectHeaders=true`, where we **cannot** vectorize: F_adr = base + 12 + 4 * i -> aligned for: i % 2 = 1 S_adr = base + 12 + 2 * i -> aligned for: i % 4 = 2 -> There is no solution to satisfy both alignment constraints! It's a little sad that I only just realized this now... but oh well. The issue is that we apparently did not run testing for these examples, so I did not see the impact immediately. So my question: do we care about `UseCompactObjectHeaders` and `AlignVector` enabled at the same time? If so, we have to accept that some examples with mixed types simply will not vectorize. ------------- PR Comment: https://git.openjdk.org/jdk/pull/20677#issuecomment-2483138198