Re: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v6]

2024-09-26 Thread Hamlin Li
On Thu, 26 Sep 2024 02:15:34 GMT, Fei Yang  wrote:

>> I see sleef code only set frm to RNE, but I'm not quite sure.
>> Even if we can make sure current sleef only set frm to RNE, seems to me we 
>> can not depends on current implement, it could change. Although good news is 
>> we don't update sleef regularly.
>> Maybe we should take similar action as call-to-java and return-from-jni?
>
> Just a bit worried about the fact that manipunating CSR could be very costly 
> on RISC-V. Another choice would be adding an assertion about FP rounding mode 
> expecting RNE when returning back from the SLEEF routine. I also checked 
> floating-point intrinsics with `_rm` suffix in the function name in SLEEF src 
> code and only witnessed use of `__RISCV_FRM_RNE`. I didn't see uses of other 
> rounding modes as specified by the rvv-intrinsic-spec [1].
> 
> [1] 
> https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/main/doc/rvv-intrinsic-spec.adoc

Sounds like a reasonable solution!
Anyone has other thoughts please kindly let me know.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1776563833


Re: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26]

2024-09-26 Thread Roberto Castañeda Lozano
On Wed, 25 Sep 2024 12:53:17 GMT, Roman Kennke  wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) 
>> #20603 and #20605, plus the Tiny Class-Pointers parts that have been 
>> previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All 
>> changes in this PR are protected by this flag. The purpose of the flag is to 
>> provide a fallback, in case that users unexpectedly observe problems with 
>> the new implementation. The intention is that this flag will remain 
>> experimental and opt-in for at least one release, then make it on-by-default 
>> and diagnostic (?), and eventually deprecate and obsolete it. However, there 
>> are a few unknowns in that plan, specifically, we may want to further 
>> improve compact headers to 4 bytes, we are planning to enhance the Klass* 
>> encoding to support virtually unlimited number of Klasses, at which point we 
>> could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In 
>> order to be able to do this, we are add some changes to GC forwarding (see 
>> below) to protect the relevant (upper 22) bits of the mark-word. Significant 
>> parts of this PR deal with loading the compressed Klass* from the mark-word. 
>> This PR also changes some code paths (mostly in GCs) to be more careful when 
>> accessing Klass* (or mark-word or size) to be able to fetch it from the 
>> forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now 
>> uses a bit to indicate 'self-forwarding'. This is needed to preserve the 
>> crucial Klass* bits in the header. This also allows to get rid of 
>> preserved-header machinery in SerialGC and G1 (Parallel GC abuses 
>> preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We 
>> have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, 
>> we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the 
>> GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field 
>> layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, 
>> it is not possible to read an archive that has been written with an opposite 
>> setting of UseCompactObjectHeaders. Some build machinery is added so that 
>> _co...
>
> Roman Kennke has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Allow LM_MONITOR on 32-bit platforms

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5692:

> 5690: 
> 5691: void MacroAssembler::load_klass(Register dst, Register src, Register 
> tmp) {
> 5692:   BLOCK_COMMENT("load_klass");

I am not sure that the complexity of `MacroAssembler::load_klass` and the two 
`MacroAssembler::cmp_klass` functions warrant adding block comments, but if you 
prefer to leave them in, could you use opening and closing comments, as in the 
other functions in this file (e.g. `MacroAssembler::_verify_oop`)? In that 
case, please update the comment in the two `MacroAssembler::cmp_klass` 
functions with a more descriptive name than `cmp_klass 1` and `cmp_klass 2`.

src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5726:

> 5724: #ifdef _LP64
> 5725:   if (UseCompactObjectHeaders) {
> 5726: load_nklass_compact(tmp, obj);

Suggestion: assert here that `tmp != noreg`, just like in 
`MacroAssembler::cmp_klass(Register src, Register dst, Register tmp1, Register 
tmp2)` below. Perhaps also assert that the input registers are different.

src/hotspot/cpu/x86/macroAssembler_x86.hpp line 379:

> 377:   // Uses tmp1 and tmp2 as temporary registers.
> 378:   void cmp_klass(Register src, Register dst, Register tmp1, Register 
> tmp2);
> 379: 

The naming of these two functions could be made clearer and more consistent 
with their documentation. Please consider renaming the four-argument 
`cmp_klass` function to `cmp_klasses_from_objects` or similar. The notion of 
"source" and "destination" in the parameter names is unclear, I suggest to just 
call them `obj`, `obj1`, `obj2` etc. 
Please also make sure that the parameter names are consistent in the 
declaration and definition (e.g. `dst` vs `obj`).

src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4008:

> 4006: #ifdef COMPILER2
> 4007:   if ((UseAVX == 2) && EnableX86ECoreOpts && !UseCompactObjectHeaders) {
> 4008: generate_string_indexof(StubRoutines::_string_indexof_array);

This stub routine should be re-enabled if `UseCompactObjectHeaders` is to 
become non-experimental and enabled by default in the future. Is there a RFE 
for this task?

src/hotspot/share/opto/memnode.cpp line 1976:


Re: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26]

2024-09-26 Thread Roberto Castañeda Lozano
On Wed, 25 Sep 2024 12:53:17 GMT, Roman Kennke  wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) 
>> #20603 and #20605, plus the Tiny Class-Pointers parts that have been 
>> previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All 
>> changes in this PR are protected by this flag. The purpose of the flag is to 
>> provide a fallback, in case that users unexpectedly observe problems with 
>> the new implementation. The intention is that this flag will remain 
>> experimental and opt-in for at least one release, then make it on-by-default 
>> and diagnostic (?), and eventually deprecate and obsolete it. However, there 
>> are a few unknowns in that plan, specifically, we may want to further 
>> improve compact headers to 4 bytes, we are planning to enhance the Klass* 
>> encoding to support virtually unlimited number of Klasses, at which point we 
>> could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In 
>> order to be able to do this, we are add some changes to GC forwarding (see 
>> below) to protect the relevant (upper 22) bits of the mark-word. Significant 
>> parts of this PR deal with loading the compressed Klass* from the mark-word. 
>> This PR also changes some code paths (mostly in GCs) to be more careful when 
>> accessing Klass* (or mark-word or size) to be able to fetch it from the 
>> forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now 
>> uses a bit to indicate 'self-forwarding'. This is needed to preserve the 
>> crucial Klass* bits in the header. This also allows to get rid of 
>> preserved-header machinery in SerialGC and G1 (Parallel GC abuses 
>> preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We 
>> have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, 
>> we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the 
>> GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field 
>> layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, 
>> it is not possible to read an archive that has been written with an opposite 
>> setting of UseCompactObjectHeaders. Some build machinery is added so that 
>> _co...
>
> Roman Kennke has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Allow LM_MONITOR on 32-bit platforms

src/hotspot/cpu/x86/x86_64.ad line 4388:

> 4386:   effect(KILL cr);
> 4387:   ins_cost(125); // XXX
> 4388:   format %{ "movl$dst, $mem\t# compressed klass ptr" %}

For consistency with the aarch64 back-end:
Suggestion:

  format %{ "load_nklass_compact$dst, $mem\t# compressed klass ptr" %}

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776747538


Re: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26]

2024-09-26 Thread Roman Kennke
On Thu, 26 Sep 2024 15:59:50 GMT, Roberto Castañeda Lozano 
 wrote:

>> Ok, this is indeed relevant and helpful. This could segfault if we happen to 
>> read from the very first object on the heap. I can solve this by allowing to 
>> copy only 8 bytes onto the stack: 
>> https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1
>> 
>> Does this look correct to you? Or better to do it as a follow-up?
>> (It passes a couple of indexOf tests, will run tier1-4 on it).
>
>> Does this look correct to you? Or better to do it as a follow-up?
> 
> I do not feel confident enough to review this part. If you want to include 
> https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1
>  in this changeset, I would prefer that the original author of JDK-8320448 or 
> at least someone from Intel reviews it, otherwise I think it is fine to leave 
> it as a follow-up enhancement.

@sviswa7 or @asgibbons WDYT about including 
https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 
as part of compact object headers implementation? Otherwise we would have to 
disable indexOf intrinsic when running with compact headers, because of the 
assumption that array headers are >= 16 bytes, which is no longer true with 
compact headers.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777396409


Re: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26]

2024-09-26 Thread Roberto Castañeda Lozano
On Thu, 26 Sep 2024 13:58:02 GMT, Roman Kennke  wrote:

> Does this look correct to you? Or better to do it as a follow-up?

I do not feel confident enough to review this part. If you want to include 
https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 
in this changeset, I would prefer that the original author of JDK-8320448 or at 
least someone from Intel reviews it, otherwise I think it is fine to leave it 
as a follow-up enhancement.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777370316


Re: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26]

2024-09-26 Thread Roman Kennke
On Thu, 26 Sep 2024 08:55:44 GMT, Roberto Castañeda Lozano 
 wrote:

>> Roman Kennke has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Allow LM_MONITOR on 32-bit platforms
>
> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4008:
> 
>> 4006: #ifdef COMPILER2
>> 4007:   if ((UseAVX == 2) && EnableX86ECoreOpts && !UseCompactObjectHeaders) 
>> {
>> 4008: generate_string_indexof(StubRoutines::_string_indexof_array);
> 
> This stub routine should be re-enabled if `UseCompactObjectHeaders` is to 
> become non-experimental and enabled by default in the future. Is there a RFE 
> for this task?

This comes from an assert in `LibraryCallKit::inline_string_indexOfI` and I 
believe we can perhaps remove that assert and the !UCOH clause. I checked a 
couple of tests that tripped that assert, and they seem to work fine, and I 
also checked the code in `LibraryCallKit::inline_string_indexOfI` and 
`generate_string_indexof_stubs()` and could not find anything obvious that 
requires the base offset to be >=16. I am not sure why that assert is there. I 
am now running tier1-4 with that change: 
https://github.com/rkennke/jdk/commit/7001783e8c11718226506f42b7c1f1fda1af3ad0

If you know (or find) any reason why we need that assert, please let me know. 
Otherwise I'd remove it, if you don't have objections.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776888460


Re: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v6]

2024-09-26 Thread Hamlin Li
On Thu, 26 Sep 2024 07:54:32 GMT, Hamlin Li  wrote:

>> Just a bit worried about the fact that manipunating CSR could be very costly 
>> on RISC-V. Another choice would be adding an assertion about FP rounding 
>> mode expecting RNE when returning back from the SLEEF routine. I also 
>> checked floating-point intrinsics with `_rm` suffix in the function name in 
>> SLEEF src code and only witnessed use of `__RISCV_FRM_RNE`. I didn't see 
>> uses of other rounding modes as specified by the rvv-intrinsic-spec [1].
>> 
>> [1] 
>> https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/main/doc/rvv-intrinsic-spec.adoc
>
> Sounds like a reasonable solution!
> Anyone has other thoughts please kindly let me know.

added some code to check `frm` after sleef calls.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1776903349


Re: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v7]

2024-09-26 Thread Hamlin Li
> Hi,
> Can you help to review this patch?
> Thanks!
> 
> This patch is based on https://github.com/openjdk/jdk/pull/20781 which added 
> the sleef source (in particular the generated sleef inline headers). We use 
> sleef api to vectorize the math operations in vector api.
> 
> On machine with vector intrinsic support on riscv (e.g. gcc 14+) it will 
> generate libsleef.so with the bridge functions to sleef api, otherwise 
> without the bridge functions.
> 
> ### Test
> test/jdk/jdk/incubator/vector
> 
> ### Performance
> data on bananapi
> 
> Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error 
> +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement
> -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 | 
> 1508.709 | ns/op | 1.855
> Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 | 
> 2833.61 | ns/op | 2.001
> Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | 
> 2204.224 | ns/op | 2.074
> Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 
> | 3045.192 | ns/op | 2.62
> Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 | 
> 1330.869 | ns/op | 2.166
> Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | 
> 1319.313 | ns/op | 1.974
> Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | 
> 61324.863 | ns/op | 1.167
> Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | 
> 891.272 | ns/op | 2.219
> Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 
> | 2144.997 | ns/op | 1.377
> Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 
> | 2091.811 | ns/op | 3.357
> Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | 
> 2653.519 | ns/op | 2.398
> Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 
> | 2489.08 | ns/op | 2.449
> Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 567.7 | 317235.967 | 
> 1062.848 | ns/op | 2.743
> Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 36.55 | 560765.066 | 
> 2669.33 | ns/op | 1.743
> Double128Vector

Hamlin Li has updated the pull request incrementally with one additional commit 
since the last revision:

  check frm after sleef call

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/21083/files
  - new: https://git.openjdk.org/jdk/pull/21083/files/7719b5cf..50b6d529

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21083&range=06
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21083&range=05-06

  Stats: 23 lines in 1 file changed: 21 ins; 0 del; 2 mod
  Patch: https://git.openjdk.org/jdk/pull/21083.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21083/head:pull/21083

PR: https://git.openjdk.org/jdk/pull/21083


Re: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v8]

2024-09-26 Thread Hamlin Li
> Hi,
> Can you help to review this patch?
> Thanks!
> 
> This patch is based on https://github.com/openjdk/jdk/pull/20781 which added 
> the sleef source (in particular the generated sleef inline headers). We use 
> sleef api to vectorize the math operations in vector api.
> 
> On machine with vector intrinsic support on riscv (e.g. gcc 14+) it will 
> generate libsleef.so with the bridge functions to sleef api, otherwise 
> without the bridge functions.
> 
> ### Test
> test/jdk/jdk/incubator/vector
> 
> ### Performance
> data on bananapi
> 
> Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error 
> +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement
> -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 | 
> 1508.709 | ns/op | 1.855
> Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 | 
> 2833.61 | ns/op | 2.001
> Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | 
> 2204.224 | ns/op | 2.074
> Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 
> | 3045.192 | ns/op | 2.62
> Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 | 
> 1330.869 | ns/op | 2.166
> Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | 
> 1319.313 | ns/op | 1.974
> Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | 
> 61324.863 | ns/op | 1.167
> Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | 
> 891.272 | ns/op | 2.219
> Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 
> | 2144.997 | ns/op | 1.377
> Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 
> | 2091.811 | ns/op | 3.357
> Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | 
> 2653.519 | ns/op | 2.398
> Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 
> | 2489.08 | ns/op | 2.449
> Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 567.7 | 317235.967 | 
> 1062.848 | ns/op | 2.743
> Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 36.55 | 560765.066 | 
> 2669.33 | ns/op | 1.743
> Double128Vector

Hamlin Li has updated the pull request incrementally with one additional commit 
since the last revision:

  fix test macro

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/21083/files
  - new: https://git.openjdk.org/jdk/pull/21083/files/50b6d529..0bd263d1

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=21083&range=07
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21083&range=06-07

  Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod
  Patch: https://git.openjdk.org/jdk/pull/21083.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/21083/head:pull/21083

PR: https://git.openjdk.org/jdk/pull/21083


Re: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v27]

2024-09-26 Thread Roman Kennke
> This is the main body of the JEP 450: Compact Object Headers (Experimental).
> 
> It is also a follow-up to #20640, which now also includes (and supersedes) 
> #20603 and #20605, plus the Tiny Class-Pointers parts that have been 
> previously missing.
> 
> Main changes:
>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All 
> changes in this PR are protected by this flag. The purpose of the flag is to 
> provide a fallback, in case that users unexpectedly observe problems with the 
> new implementation. The intention is that this flag will remain experimental 
> and opt-in for at least one release, then make it on-by-default and 
> diagnostic (?), and eventually deprecate and obsolete it. However, there are 
> a few unknowns in that plan, specifically, we may want to further improve 
> compact headers to 4 bytes, we are planning to enhance the Klass* encoding to 
> support virtually unlimited number of Klasses, at which point we could also 
> obsolete UseCompressedClassPointers.
>  - The compressed Klass* can now be stored in the mark-word of objects. In 
> order to be able to do this, we are add some changes to GC forwarding (see 
> below) to protect the relevant (upper 22) bits of the mark-word. Significant 
> parts of this PR deal with loading the compressed Klass* from the mark-word. 
> This PR also changes some code paths (mostly in GCs) to be more careful when 
> accessing Klass* (or mark-word or size) to be able to fetch it from the 
> forwardee in case the object is forwarded.
>  - Self-forwarding in GCs (which is used to deal with promotion failure) now 
> uses a bit to indicate 'self-forwarding'. This is needed to preserve the 
> crucial Klass* bits in the header. This also allows to get rid of 
> preserved-header machinery in SerialGC and G1 (Parallel GC abuses 
> preserved-marks to also find all other relevant oops).
>  - Full GC forwarding now uses an encoding similar to compressed-oops. We 
> have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, 
> we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the 
> GC forwarding at all).
>  - Instances can now have their base-offset (the offset where the field 
> layouter starts to place fields) at offset 8 (instead of 12 or 16).
>  - Arrays will now store their length at offset 8.
>  - CDS can now write and read archives with the compressed header. However, 
> it is not possible to read an archive that has been written with an opposite 
> setting of UseCompactObjectHeaders. Some build machinery is added so that 
> _coh variants of CDS archiv...

Roman Kennke has updated the pull request incrementally with two additional 
commits since the last revision:

 - @robcasloz review comments
 - Improve CollectedHeap::is_oop()

-

Changes:
  - all: https://git.openjdk.org/jdk/pull/20677/files
  - new: https://git.openjdk.org/jdk/pull/20677/files/4904d433..d48f55d6

Webrevs:
 - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=26
 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=25-26

  Stats: 86 lines in 10 files changed: 20 ins; 21 del; 45 mod
  Patch: https://git.openjdk.org/jdk/pull/20677.diff
  Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677

PR: https://git.openjdk.org/jdk/pull/20677


Re: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26]

2024-09-26 Thread Roman Kennke
On Thu, 26 Sep 2024 13:04:57 GMT, Roberto Castañeda Lozano 
 wrote:

>> This comes from an assert in `LibraryCallKit::inline_string_indexOfI` and I 
>> believe we can perhaps remove that assert and the !UCOH clause. I checked a 
>> couple of tests that tripped that assert, and they seem to work fine, and I 
>> also checked the code in `LibraryCallKit::inline_string_indexOfI` and 
>> `generate_string_indexof_stubs()` and could not find anything obvious that 
>> requires the base offset to be >=16. I am not sure why that assert is there. 
>> I am now running tier1-4 with that change: 
>> https://github.com/rkennke/jdk/commit/7001783e8c11718226506f42b7c1f1fda1af3ad0
>> 
>> If you know (or find) any reason why we need that assert, please let me 
>> know. Otherwise I'd remove it, if you don't have objections.
>
> I am not familiar with the `indexOf` implementation, but here is a relevant 
> comment that motivates the assertion: 
> https://github.com/openjdk/jdk/pull/16753#discussion_r1592774634.

Ok, this is indeed relevant and helpful. This could segfault if we happen to 
read from the very first object on the heap. I can solve this by allowing to 
copy only 8 bytes onto the stack: 
https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1

Does this look correct to you? Or better to do it as a follow-up?
(It passes a couple of indexOf tests, will run tier1-4 on it).

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777134871


Re: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26]

2024-09-26 Thread Roberto Castañeda Lozano
On Thu, 26 Sep 2024 11:39:02 GMT, Roman Kennke  wrote:

>> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4008:
>> 
>>> 4006: #ifdef COMPILER2
>>> 4007:   if ((UseAVX == 2) && EnableX86ECoreOpts && 
>>> !UseCompactObjectHeaders) {
>>> 4008: generate_string_indexof(StubRoutines::_string_indexof_array);
>> 
>> This stub routine should be re-enabled if `UseCompactObjectHeaders` is to 
>> become non-experimental and enabled by default in the future. Is there a RFE 
>> for this task?
>
> This comes from an assert in `LibraryCallKit::inline_string_indexOfI` and I 
> believe we can perhaps remove that assert and the !UCOH clause. I checked a 
> couple of tests that tripped that assert, and they seem to work fine, and I 
> also checked the code in `LibraryCallKit::inline_string_indexOfI` and 
> `generate_string_indexof_stubs()` and could not find anything obvious that 
> requires the base offset to be >=16. I am not sure why that assert is there. 
> I am now running tier1-4 with that change: 
> https://github.com/rkennke/jdk/commit/7001783e8c11718226506f42b7c1f1fda1af3ad0
> 
> If you know (or find) any reason why we need that assert, please let me know. 
> Otherwise I'd remove it, if you don't have objections.

I am not familiar with the `indexOf` implementation, but here is a relevant 
comment that motivates the assertion: 
https://github.com/openjdk/jdk/pull/16753#discussion_r1592774634.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777033220


Re: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26]

2024-09-26 Thread Roberto Castañeda Lozano
On Wed, 25 Sep 2024 12:53:17 GMT, Roman Kennke  wrote:

>> This is the main body of the JEP 450: Compact Object Headers (Experimental).
>> 
>> It is also a follow-up to #20640, which now also includes (and supersedes) 
>> #20603 and #20605, plus the Tiny Class-Pointers parts that have been 
>> previously missing.
>> 
>> Main changes:
>>  - Introduction of the (experimental) flag UseCompactObjectHeaders. All 
>> changes in this PR are protected by this flag. The purpose of the flag is to 
>> provide a fallback, in case that users unexpectedly observe problems with 
>> the new implementation. The intention is that this flag will remain 
>> experimental and opt-in for at least one release, then make it on-by-default 
>> and diagnostic (?), and eventually deprecate and obsolete it. However, there 
>> are a few unknowns in that plan, specifically, we may want to further 
>> improve compact headers to 4 bytes, we are planning to enhance the Klass* 
>> encoding to support virtually unlimited number of Klasses, at which point we 
>> could also obsolete UseCompressedClassPointers.
>>  - The compressed Klass* can now be stored in the mark-word of objects. In 
>> order to be able to do this, we are add some changes to GC forwarding (see 
>> below) to protect the relevant (upper 22) bits of the mark-word. Significant 
>> parts of this PR deal with loading the compressed Klass* from the mark-word. 
>> This PR also changes some code paths (mostly in GCs) to be more careful when 
>> accessing Klass* (or mark-word or size) to be able to fetch it from the 
>> forwardee in case the object is forwarded.
>>  - Self-forwarding in GCs (which is used to deal with promotion failure) now 
>> uses a bit to indicate 'self-forwarding'. This is needed to preserve the 
>> crucial Klass* bits in the header. This also allows to get rid of 
>> preserved-header machinery in SerialGC and G1 (Parallel GC abuses 
>> preserved-marks to also find all other relevant oops).
>>  - Full GC forwarding now uses an encoding similar to compressed-oops. We 
>> have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, 
>> we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the 
>> GC forwarding at all).
>>  - Instances can now have their base-offset (the offset where the field 
>> layouter starts to place fields) at offset 8 (instead of 12 or 16).
>>  - Arrays will now store their length at offset 8.
>>  - CDS can now write and read archives with the compressed header. However, 
>> it is not possible to read an archive that has been written with an opposite 
>> setting of UseCompactObjectHeaders. Some build machinery is added so that 
>> _co...
>
> Roman Kennke has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Allow LM_MONITOR on 32-bit platforms

src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2570:

> 2568:   // we get the heapBase in obj, and the 
> narrowOop+klass_offset_in_bytes/sizeof(narrowOop) in index.
> 2569:   // When that happens, we need to lea the address into a single 
> register, and subtract the
> 2570:   // klass_offset_in_bytes, to get the address of the mark-word.

Parts of this comment are obsolete after commit 2c4a7877, please update the 
comment.

src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 882:

> 880:   void store_klass(Register dst, Register src);
> 881:   void cmp_klass(Register oop, Register trial_klass, Register tmp);
> 882:   void cmp_klass(Register src, Register dst, Register tmp1, Register 
> tmp2);

Same suggestion as for the analogous x86 functions: consider renaming the 
four-argument `cmp_klass` function to `cmp_klasses_from_objects` or similar, 
and the `src` and `dst` parameters to `oop1` and `oop2` or similar if there is 
no notion of "source" and "destination".

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776927247
PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776942226


Re: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26]

2024-09-26 Thread Scott Gibbons
On Thu, 26 Sep 2024 16:15:39 GMT, Roman Kennke  wrote:

>>> Does this look correct to you? Or better to do it as a follow-up?
>> 
>> I do not feel confident enough to review this part. If you want to include 
>> https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1
>>  in this changeset, I would prefer that the original author of JDK-8320448 
>> or at least someone from Intel reviews it, otherwise I think it is fine to 
>> leave it as a follow-up enhancement.
>
> @sviswa7 or @asgibbons WDYT about including 
> https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1
>  as part of compact object headers implementation? Otherwise we would have to 
> disable indexOf intrinsic when running with compact headers, because of the 
> assumption that array headers are >= 16 bytes, which is no longer true with 
> compact headers.

@rkennke I reviewed [rkennke@ 
097c2af](https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1)
 and the code looks good to me.  I would prefer this approach instead of not 
generating the IndexOf intrinsic.

Should the controlling `if` be conditioned on `UseCompactObjectHeaders` instead 
of `arrayOopDesc::base_offset_in_bytes`?  I can see benefits to either - which 
provides more clarity?  I like the assert as it makes the intention clear 
(thanks!).

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777485078


Integrated: 8340552: Harden TzdbZoneRulesCompiler against missing zone names

2024-09-26 Thread Florian Weimer
On Fri, 20 Sep 2024 16:09:13 GMT, Florian Weimer  wrote:

> 8340552: Harden TzdbZoneRulesCompiler against missing zone names

This pull request has now been integrated.

Changeset: 1bc13a1c
Author:Florian Weimer 
URL:   
https://git.openjdk.org/jdk/commit/1bc13a1c10a580f84f1b7686c95344ec2633f611
Stats: 12 lines in 1 file changed: 8 ins; 0 del; 4 mod

8340552: Harden TzdbZoneRulesCompiler against missing zone names

Reviewed-by: andrew, jlu, naoto

-

PR: https://git.openjdk.org/jdk/pull/21112


Re: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v8]

2024-09-26 Thread Fei Yang
On Thu, 26 Sep 2024 13:14:04 GMT, Hamlin Li  wrote:

>> Hi,
>> Can you help to review this patch?
>> Thanks!
>> 
>> This patch is based on https://github.com/openjdk/jdk/pull/20781 which added 
>> the sleef source (in particular the generated sleef inline headers). We use 
>> sleef api to vectorize the math operations in vector api.
>> 
>> On machine with vector intrinsic support on riscv (e.g. gcc 14+) it will 
>> generate libsleef.so with the bridge functions to sleef api, otherwise 
>> without the bridge functions.
>> 
>> ### Test
>> test/jdk/jdk/incubator/vector
>> 
>> ### Performance
>> data on bananapi
>> 
>> Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error 
>> +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement
>> -- | -- | -- | -- | -- | -- | -- | -- | -- | --
>> Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 
>> | 1508.709 | ns/op | 1.855
>> Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 
>> | 2833.61 | ns/op | 2.001
>> Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | 
>> 2204.224 | ns/op | 2.074
>> Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 
>> | 3045.192 | ns/op | 2.62
>> Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 
>> | 1330.869 | ns/op | 2.166
>> Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | 
>> 1319.313 | ns/op | 1.974
>> Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | 
>> 61324.863 | ns/op | 1.167
>> Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | 
>> 891.272 | ns/op | 2.219
>> Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 
>> | 2144.997 | ns/op | 1.377
>> Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 
>> | 2091.811 | ns/op | 3.357
>> Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | 
>> 2653.519 | ns/op | 2.398
>> Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 
>> | 2489.08 | ns/op | 2.449
>> Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 567.7 | 317235.967 | 
>> 1062.848 | ns/op | 2.743
>> Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 3...
>
> Hamlin Li has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fix test macro

src/jdk.incubator.vector/linux/native/libsleef/lib/vector_math_rvv.c line 51:

> 49: // the dynamic rounding mode is always RNE.
> 50: 
> 51: #ifdef DEBUG

Question: Should we check for `NDEBUG` macro here instead? I see checks for 
this macro in the original SLEEF code.


#ifndef NDEBUG
#define CHECK_FRM   __asm__ __volatile__ ( \
"frrm   t0  \n\t"  \
"beqz   t0, 2f  \n\t"  \
"csrrw  x0, cycle, x0   \n\t"  \
"2: \n\t"  \
: : : "memory" );
#else
#define CHECK_FRM
#endif

-

PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1777931566