Re: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v6]
On Thu, 26 Sep 2024 02:15:34 GMT, Fei Yang wrote: >> I see sleef code only set frm to RNE, but I'm not quite sure. >> Even if we can make sure current sleef only set frm to RNE, seems to me we >> can not depends on current implement, it could change. Although good news is >> we don't update sleef regularly. >> Maybe we should take similar action as call-to-java and return-from-jni? > > Just a bit worried about the fact that manipunating CSR could be very costly > on RISC-V. Another choice would be adding an assertion about FP rounding mode > expecting RNE when returning back from the SLEEF routine. I also checked > floating-point intrinsics with `_rm` suffix in the function name in SLEEF src > code and only witnessed use of `__RISCV_FRM_RNE`. I didn't see uses of other > rounding modes as specified by the rvv-intrinsic-spec [1]. > > [1] > https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/main/doc/rvv-intrinsic-spec.adoc Sounds like a reasonable solution! Anyone has other thoughts please kindly let me know. - PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1776563833
Re: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26]
On Wed, 25 Sep 2024 12:53:17 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) >> #20603 and #20605, plus the Tiny Class-Pointers parts that have been >> previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All >> changes in this PR are protected by this flag. The purpose of the flag is to >> provide a fallback, in case that users unexpectedly observe problems with >> the new implementation. The intention is that this flag will remain >> experimental and opt-in for at least one release, then make it on-by-default >> and diagnostic (?), and eventually deprecate and obsolete it. However, there >> are a few unknowns in that plan, specifically, we may want to further >> improve compact headers to 4 bytes, we are planning to enhance the Klass* >> encoding to support virtually unlimited number of Klasses, at which point we >> could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In >> order to be able to do this, we are add some changes to GC forwarding (see >> below) to protect the relevant (upper 22) bits of the mark-word. Significant >> parts of this PR deal with loading the compressed Klass* from the mark-word. >> This PR also changes some code paths (mostly in GCs) to be more careful when >> accessing Klass* (or mark-word or size) to be able to fetch it from the >> forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now >> uses a bit to indicate 'self-forwarding'. This is needed to preserve the >> crucial Klass* bits in the header. This also allows to get rid of >> preserved-header machinery in SerialGC and G1 (Parallel GC abuses >> preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We >> have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, >> we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the >> GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field >> layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, >> it is not possible to read an archive that has been written with an opposite >> setting of UseCompactObjectHeaders. Some build machinery is added so that >> _co... > > Roman Kennke has updated the pull request incrementally with one additional > commit since the last revision: > > Allow LM_MONITOR on 32-bit platforms src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5692: > 5690: > 5691: void MacroAssembler::load_klass(Register dst, Register src, Register > tmp) { > 5692: BLOCK_COMMENT("load_klass"); I am not sure that the complexity of `MacroAssembler::load_klass` and the two `MacroAssembler::cmp_klass` functions warrant adding block comments, but if you prefer to leave them in, could you use opening and closing comments, as in the other functions in this file (e.g. `MacroAssembler::_verify_oop`)? In that case, please update the comment in the two `MacroAssembler::cmp_klass` functions with a more descriptive name than `cmp_klass 1` and `cmp_klass 2`. src/hotspot/cpu/x86/macroAssembler_x86.cpp line 5726: > 5724: #ifdef _LP64 > 5725: if (UseCompactObjectHeaders) { > 5726: load_nklass_compact(tmp, obj); Suggestion: assert here that `tmp != noreg`, just like in `MacroAssembler::cmp_klass(Register src, Register dst, Register tmp1, Register tmp2)` below. Perhaps also assert that the input registers are different. src/hotspot/cpu/x86/macroAssembler_x86.hpp line 379: > 377: // Uses tmp1 and tmp2 as temporary registers. > 378: void cmp_klass(Register src, Register dst, Register tmp1, Register > tmp2); > 379: The naming of these two functions could be made clearer and more consistent with their documentation. Please consider renaming the four-argument `cmp_klass` function to `cmp_klasses_from_objects` or similar. The notion of "source" and "destination" in the parameter names is unclear, I suggest to just call them `obj`, `obj1`, `obj2` etc. Please also make sure that the parameter names are consistent in the declaration and definition (e.g. `dst` vs `obj`). src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4008: > 4006: #ifdef COMPILER2 > 4007: if ((UseAVX == 2) && EnableX86ECoreOpts && !UseCompactObjectHeaders) { > 4008: generate_string_indexof(StubRoutines::_string_indexof_array); This stub routine should be re-enabled if `UseCompactObjectHeaders` is to become non-experimental and enabled by default in the future. Is there a RFE for this task? src/hotspot/share/opto/memnode.cpp line 1976:
Re: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26]
On Wed, 25 Sep 2024 12:53:17 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) >> #20603 and #20605, plus the Tiny Class-Pointers parts that have been >> previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All >> changes in this PR are protected by this flag. The purpose of the flag is to >> provide a fallback, in case that users unexpectedly observe problems with >> the new implementation. The intention is that this flag will remain >> experimental and opt-in for at least one release, then make it on-by-default >> and diagnostic (?), and eventually deprecate and obsolete it. However, there >> are a few unknowns in that plan, specifically, we may want to further >> improve compact headers to 4 bytes, we are planning to enhance the Klass* >> encoding to support virtually unlimited number of Klasses, at which point we >> could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In >> order to be able to do this, we are add some changes to GC forwarding (see >> below) to protect the relevant (upper 22) bits of the mark-word. Significant >> parts of this PR deal with loading the compressed Klass* from the mark-word. >> This PR also changes some code paths (mostly in GCs) to be more careful when >> accessing Klass* (or mark-word or size) to be able to fetch it from the >> forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now >> uses a bit to indicate 'self-forwarding'. This is needed to preserve the >> crucial Klass* bits in the header. This also allows to get rid of >> preserved-header machinery in SerialGC and G1 (Parallel GC abuses >> preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We >> have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, >> we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the >> GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field >> layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, >> it is not possible to read an archive that has been written with an opposite >> setting of UseCompactObjectHeaders. Some build machinery is added so that >> _co... > > Roman Kennke has updated the pull request incrementally with one additional > commit since the last revision: > > Allow LM_MONITOR on 32-bit platforms src/hotspot/cpu/x86/x86_64.ad line 4388: > 4386: effect(KILL cr); > 4387: ins_cost(125); // XXX > 4388: format %{ "movl$dst, $mem\t# compressed klass ptr" %} For consistency with the aarch64 back-end: Suggestion: format %{ "load_nklass_compact$dst, $mem\t# compressed klass ptr" %} - PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776747538
Re: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26]
On Thu, 26 Sep 2024 15:59:50 GMT, Roberto Castañeda Lozano wrote: >> Ok, this is indeed relevant and helpful. This could segfault if we happen to >> read from the very first object on the heap. I can solve this by allowing to >> copy only 8 bytes onto the stack: >> https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 >> >> Does this look correct to you? Or better to do it as a follow-up? >> (It passes a couple of indexOf tests, will run tier1-4 on it). > >> Does this look correct to you? Or better to do it as a follow-up? > > I do not feel confident enough to review this part. If you want to include > https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 > in this changeset, I would prefer that the original author of JDK-8320448 or > at least someone from Intel reviews it, otherwise I think it is fine to leave > it as a follow-up enhancement. @sviswa7 or @asgibbons WDYT about including https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 as part of compact object headers implementation? Otherwise we would have to disable indexOf intrinsic when running with compact headers, because of the assumption that array headers are >= 16 bytes, which is no longer true with compact headers. - PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777396409
Re: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26]
On Thu, 26 Sep 2024 13:58:02 GMT, Roman Kennke wrote: > Does this look correct to you? Or better to do it as a follow-up? I do not feel confident enough to review this part. If you want to include https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 in this changeset, I would prefer that the original author of JDK-8320448 or at least someone from Intel reviews it, otherwise I think it is fine to leave it as a follow-up enhancement. - PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777370316
Re: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26]
On Thu, 26 Sep 2024 08:55:44 GMT, Roberto Castañeda Lozano wrote: >> Roman Kennke has updated the pull request incrementally with one additional >> commit since the last revision: >> >> Allow LM_MONITOR on 32-bit platforms > > src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4008: > >> 4006: #ifdef COMPILER2 >> 4007: if ((UseAVX == 2) && EnableX86ECoreOpts && !UseCompactObjectHeaders) >> { >> 4008: generate_string_indexof(StubRoutines::_string_indexof_array); > > This stub routine should be re-enabled if `UseCompactObjectHeaders` is to > become non-experimental and enabled by default in the future. Is there a RFE > for this task? This comes from an assert in `LibraryCallKit::inline_string_indexOfI` and I believe we can perhaps remove that assert and the !UCOH clause. I checked a couple of tests that tripped that assert, and they seem to work fine, and I also checked the code in `LibraryCallKit::inline_string_indexOfI` and `generate_string_indexof_stubs()` and could not find anything obvious that requires the base offset to be >=16. I am not sure why that assert is there. I am now running tier1-4 with that change: https://github.com/rkennke/jdk/commit/7001783e8c11718226506f42b7c1f1fda1af3ad0 If you know (or find) any reason why we need that assert, please let me know. Otherwise I'd remove it, if you don't have objections. - PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776888460
Re: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v6]
On Thu, 26 Sep 2024 07:54:32 GMT, Hamlin Li wrote: >> Just a bit worried about the fact that manipunating CSR could be very costly >> on RISC-V. Another choice would be adding an assertion about FP rounding >> mode expecting RNE when returning back from the SLEEF routine. I also >> checked floating-point intrinsics with `_rm` suffix in the function name in >> SLEEF src code and only witnessed use of `__RISCV_FRM_RNE`. I didn't see >> uses of other rounding modes as specified by the rvv-intrinsic-spec [1]. >> >> [1] >> https://github.com/riscv-non-isa/rvv-intrinsic-doc/blob/main/doc/rvv-intrinsic-spec.adoc > > Sounds like a reasonable solution! > Anyone has other thoughts please kindly let me know. added some code to check `frm` after sleef calls. - PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1776903349
Re: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v7]
> Hi, > Can you help to review this patch? > Thanks! > > This patch is based on https://github.com/openjdk/jdk/pull/20781 which added > the sleef source (in particular the generated sleef inline headers). We use > sleef api to vectorize the math operations in vector api. > > On machine with vector intrinsic support on riscv (e.g. gcc 14+) it will > generate libsleef.so with the bridge functions to sleef api, otherwise > without the bridge functions. > > ### Test > test/jdk/jdk/incubator/vector > > ### Performance > data on bananapi > > Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error > +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 | > 1508.709 | ns/op | 1.855 > Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 | > 2833.61 | ns/op | 2.001 > Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | > 2204.224 | ns/op | 2.074 > Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 > | 3045.192 | ns/op | 2.62 > Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 | > 1330.869 | ns/op | 2.166 > Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | > 1319.313 | ns/op | 1.974 > Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | > 61324.863 | ns/op | 1.167 > Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | > 891.272 | ns/op | 2.219 > Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 > | 2144.997 | ns/op | 1.377 > Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 > | 2091.811 | ns/op | 3.357 > Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | > 2653.519 | ns/op | 2.398 > Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 > | 2489.08 | ns/op | 2.449 > Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 567.7 | 317235.967 | > 1062.848 | ns/op | 2.743 > Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 36.55 | 560765.066 | > 2669.33 | ns/op | 1.743 > Double128Vector Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: check frm after sleef call - Changes: - all: https://git.openjdk.org/jdk/pull/21083/files - new: https://git.openjdk.org/jdk/pull/21083/files/7719b5cf..50b6d529 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21083&range=06 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21083&range=05-06 Stats: 23 lines in 1 file changed: 21 ins; 0 del; 2 mod Patch: https://git.openjdk.org/jdk/pull/21083.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21083/head:pull/21083 PR: https://git.openjdk.org/jdk/pull/21083
Re: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v8]
> Hi, > Can you help to review this patch? > Thanks! > > This patch is based on https://github.com/openjdk/jdk/pull/20781 which added > the sleef source (in particular the generated sleef inline headers). We use > sleef api to vectorize the math operations in vector api. > > On machine with vector intrinsic support on riscv (e.g. gcc 14+) it will > generate libsleef.so with the bridge functions to sleef api, otherwise > without the bridge functions. > > ### Test > test/jdk/jdk/incubator/vector > > ### Performance > data on bananapi > > Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error > +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement > -- | -- | -- | -- | -- | -- | -- | -- | -- | -- > Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 | > 1508.709 | ns/op | 1.855 > Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 | > 2833.61 | ns/op | 2.001 > Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | > 2204.224 | ns/op | 2.074 > Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 > | 3045.192 | ns/op | 2.62 > Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 | > 1330.869 | ns/op | 2.166 > Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | > 1319.313 | ns/op | 1.974 > Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | > 61324.863 | ns/op | 1.167 > Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | > 891.272 | ns/op | 2.219 > Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 > | 2144.997 | ns/op | 1.377 > Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 > | 2091.811 | ns/op | 3.357 > Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | > 2653.519 | ns/op | 2.398 > Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 > | 2489.08 | ns/op | 2.449 > Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 567.7 | 317235.967 | > 1062.848 | ns/op | 2.743 > Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 36.55 | 560765.066 | > 2669.33 | ns/op | 1.743 > Double128Vector Hamlin Li has updated the pull request incrementally with one additional commit since the last revision: fix test macro - Changes: - all: https://git.openjdk.org/jdk/pull/21083/files - new: https://git.openjdk.org/jdk/pull/21083/files/50b6d529..0bd263d1 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=21083&range=07 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=21083&range=06-07 Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod Patch: https://git.openjdk.org/jdk/pull/21083.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/21083/head:pull/21083 PR: https://git.openjdk.org/jdk/pull/21083
Re: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v27]
> This is the main body of the JEP 450: Compact Object Headers (Experimental). > > It is also a follow-up to #20640, which now also includes (and supersedes) > #20603 and #20605, plus the Tiny Class-Pointers parts that have been > previously missing. > > Main changes: > - Introduction of the (experimental) flag UseCompactObjectHeaders. All > changes in this PR are protected by this flag. The purpose of the flag is to > provide a fallback, in case that users unexpectedly observe problems with the > new implementation. The intention is that this flag will remain experimental > and opt-in for at least one release, then make it on-by-default and > diagnostic (?), and eventually deprecate and obsolete it. However, there are > a few unknowns in that plan, specifically, we may want to further improve > compact headers to 4 bytes, we are planning to enhance the Klass* encoding to > support virtually unlimited number of Klasses, at which point we could also > obsolete UseCompressedClassPointers. > - The compressed Klass* can now be stored in the mark-word of objects. In > order to be able to do this, we are add some changes to GC forwarding (see > below) to protect the relevant (upper 22) bits of the mark-word. Significant > parts of this PR deal with loading the compressed Klass* from the mark-word. > This PR also changes some code paths (mostly in GCs) to be more careful when > accessing Klass* (or mark-word or size) to be able to fetch it from the > forwardee in case the object is forwarded. > - Self-forwarding in GCs (which is used to deal with promotion failure) now > uses a bit to indicate 'self-forwarding'. This is needed to preserve the > crucial Klass* bits in the header. This also allows to get rid of > preserved-header machinery in SerialGC and G1 (Parallel GC abuses > preserved-marks to also find all other relevant oops). > - Full GC forwarding now uses an encoding similar to compressed-oops. We > have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, > we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the > GC forwarding at all). > - Instances can now have their base-offset (the offset where the field > layouter starts to place fields) at offset 8 (instead of 12 or 16). > - Arrays will now store their length at offset 8. > - CDS can now write and read archives with the compressed header. However, > it is not possible to read an archive that has been written with an opposite > setting of UseCompactObjectHeaders. Some build machinery is added so that > _coh variants of CDS archiv... Roman Kennke has updated the pull request incrementally with two additional commits since the last revision: - @robcasloz review comments - Improve CollectedHeap::is_oop() - Changes: - all: https://git.openjdk.org/jdk/pull/20677/files - new: https://git.openjdk.org/jdk/pull/20677/files/4904d433..d48f55d6 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=26 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20677&range=25-26 Stats: 86 lines in 10 files changed: 20 ins; 21 del; 45 mod Patch: https://git.openjdk.org/jdk/pull/20677.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20677/head:pull/20677 PR: https://git.openjdk.org/jdk/pull/20677
Re: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26]
On Thu, 26 Sep 2024 13:04:57 GMT, Roberto Castañeda Lozano wrote: >> This comes from an assert in `LibraryCallKit::inline_string_indexOfI` and I >> believe we can perhaps remove that assert and the !UCOH clause. I checked a >> couple of tests that tripped that assert, and they seem to work fine, and I >> also checked the code in `LibraryCallKit::inline_string_indexOfI` and >> `generate_string_indexof_stubs()` and could not find anything obvious that >> requires the base offset to be >=16. I am not sure why that assert is there. >> I am now running tier1-4 with that change: >> https://github.com/rkennke/jdk/commit/7001783e8c11718226506f42b7c1f1fda1af3ad0 >> >> If you know (or find) any reason why we need that assert, please let me >> know. Otherwise I'd remove it, if you don't have objections. > > I am not familiar with the `indexOf` implementation, but here is a relevant > comment that motivates the assertion: > https://github.com/openjdk/jdk/pull/16753#discussion_r1592774634. Ok, this is indeed relevant and helpful. This could segfault if we happen to read from the very first object on the heap. I can solve this by allowing to copy only 8 bytes onto the stack: https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 Does this look correct to you? Or better to do it as a follow-up? (It passes a couple of indexOf tests, will run tier1-4 on it). - PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777134871
Re: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26]
On Thu, 26 Sep 2024 11:39:02 GMT, Roman Kennke wrote: >> src/hotspot/cpu/x86/stubGenerator_x86_64.cpp line 4008: >> >>> 4006: #ifdef COMPILER2 >>> 4007: if ((UseAVX == 2) && EnableX86ECoreOpts && >>> !UseCompactObjectHeaders) { >>> 4008: generate_string_indexof(StubRoutines::_string_indexof_array); >> >> This stub routine should be re-enabled if `UseCompactObjectHeaders` is to >> become non-experimental and enabled by default in the future. Is there a RFE >> for this task? > > This comes from an assert in `LibraryCallKit::inline_string_indexOfI` and I > believe we can perhaps remove that assert and the !UCOH clause. I checked a > couple of tests that tripped that assert, and they seem to work fine, and I > also checked the code in `LibraryCallKit::inline_string_indexOfI` and > `generate_string_indexof_stubs()` and could not find anything obvious that > requires the base offset to be >=16. I am not sure why that assert is there. > I am now running tier1-4 with that change: > https://github.com/rkennke/jdk/commit/7001783e8c11718226506f42b7c1f1fda1af3ad0 > > If you know (or find) any reason why we need that assert, please let me know. > Otherwise I'd remove it, if you don't have objections. I am not familiar with the `indexOf` implementation, but here is a relevant comment that motivates the assertion: https://github.com/openjdk/jdk/pull/16753#discussion_r1592774634. - PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777033220
Re: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26]
On Wed, 25 Sep 2024 12:53:17 GMT, Roman Kennke wrote: >> This is the main body of the JEP 450: Compact Object Headers (Experimental). >> >> It is also a follow-up to #20640, which now also includes (and supersedes) >> #20603 and #20605, plus the Tiny Class-Pointers parts that have been >> previously missing. >> >> Main changes: >> - Introduction of the (experimental) flag UseCompactObjectHeaders. All >> changes in this PR are protected by this flag. The purpose of the flag is to >> provide a fallback, in case that users unexpectedly observe problems with >> the new implementation. The intention is that this flag will remain >> experimental and opt-in for at least one release, then make it on-by-default >> and diagnostic (?), and eventually deprecate and obsolete it. However, there >> are a few unknowns in that plan, specifically, we may want to further >> improve compact headers to 4 bytes, we are planning to enhance the Klass* >> encoding to support virtually unlimited number of Klasses, at which point we >> could also obsolete UseCompressedClassPointers. >> - The compressed Klass* can now be stored in the mark-word of objects. In >> order to be able to do this, we are add some changes to GC forwarding (see >> below) to protect the relevant (upper 22) bits of the mark-word. Significant >> parts of this PR deal with loading the compressed Klass* from the mark-word. >> This PR also changes some code paths (mostly in GCs) to be more careful when >> accessing Klass* (or mark-word or size) to be able to fetch it from the >> forwardee in case the object is forwarded. >> - Self-forwarding in GCs (which is used to deal with promotion failure) now >> uses a bit to indicate 'self-forwarding'. This is needed to preserve the >> crucial Klass* bits in the header. This also allows to get rid of >> preserved-header machinery in SerialGC and G1 (Parallel GC abuses >> preserved-marks to also find all other relevant oops). >> - Full GC forwarding now uses an encoding similar to compressed-oops. We >> have 40 bits for that, and can encode up to 8TB of heap. When exceeding 8TB, >> we turn off UseCompressedClassPointers (except in ZGC, which doesn't use the >> GC forwarding at all). >> - Instances can now have their base-offset (the offset where the field >> layouter starts to place fields) at offset 8 (instead of 12 or 16). >> - Arrays will now store their length at offset 8. >> - CDS can now write and read archives with the compressed header. However, >> it is not possible to read an archive that has been written with an opposite >> setting of UseCompactObjectHeaders. Some build machinery is added so that >> _co... > > Roman Kennke has updated the pull request incrementally with one additional > commit since the last revision: > > Allow LM_MONITOR on 32-bit platforms src/hotspot/cpu/aarch64/c2_MacroAssembler_aarch64.cpp line 2570: > 2568: // we get the heapBase in obj, and the > narrowOop+klass_offset_in_bytes/sizeof(narrowOop) in index. > 2569: // When that happens, we need to lea the address into a single > register, and subtract the > 2570: // klass_offset_in_bytes, to get the address of the mark-word. Parts of this comment are obsolete after commit 2c4a7877, please update the comment. src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 882: > 880: void store_klass(Register dst, Register src); > 881: void cmp_klass(Register oop, Register trial_klass, Register tmp); > 882: void cmp_klass(Register src, Register dst, Register tmp1, Register > tmp2); Same suggestion as for the analogous x86 functions: consider renaming the four-argument `cmp_klass` function to `cmp_klasses_from_objects` or similar, and the `src` and `dst` parameters to `oop1` and `oop2` or similar if there is no notion of "source" and "destination". - PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776927247 PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1776942226
Re: RFR: 8305895: Implement JEP 450: Compact Object Headers (Experimental) [v26]
On Thu, 26 Sep 2024 16:15:39 GMT, Roman Kennke wrote: >>> Does this look correct to you? Or better to do it as a follow-up? >> >> I do not feel confident enough to review this part. If you want to include >> https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 >> in this changeset, I would prefer that the original author of JDK-8320448 >> or at least someone from Intel reviews it, otherwise I think it is fine to >> leave it as a follow-up enhancement. > > @sviswa7 or @asgibbons WDYT about including > https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1 > as part of compact object headers implementation? Otherwise we would have to > disable indexOf intrinsic when running with compact headers, because of the > assumption that array headers are >= 16 bytes, which is no longer true with > compact headers. @rkennke I reviewed [rkennke@ 097c2af](https://github.com/rkennke/jdk/commit/097c2afa04397773e514552dfb942aa889bfa2c1) and the code looks good to me. I would prefer this approach instead of not generating the IndexOf intrinsic. Should the controlling `if` be conditioned on `UseCompactObjectHeaders` instead of `arrayOopDesc::base_offset_in_bytes`? I can see benefits to either - which provides more clarity? I like the assert as it makes the intention clear (thanks!). - PR Review Comment: https://git.openjdk.org/jdk/pull/20677#discussion_r1777485078
Integrated: 8340552: Harden TzdbZoneRulesCompiler against missing zone names
On Fri, 20 Sep 2024 16:09:13 GMT, Florian Weimer wrote: > 8340552: Harden TzdbZoneRulesCompiler against missing zone names This pull request has now been integrated. Changeset: 1bc13a1c Author:Florian Weimer URL: https://git.openjdk.org/jdk/commit/1bc13a1c10a580f84f1b7686c95344ec2633f611 Stats: 12 lines in 1 file changed: 8 ins; 0 del; 4 mod 8340552: Harden TzdbZoneRulesCompiler against missing zone names Reviewed-by: andrew, jlu, naoto - PR: https://git.openjdk.org/jdk/pull/21112
Re: RFR: 8320500: [vectorapi] RISC-V: Optimize vector math operations with SLEEF [v8]
On Thu, 26 Sep 2024 13:14:04 GMT, Hamlin Li wrote: >> Hi, >> Can you help to review this patch? >> Thanks! >> >> This patch is based on https://github.com/openjdk/jdk/pull/20781 which added >> the sleef source (in particular the generated sleef inline headers). We use >> sleef api to vectorize the math operations in vector api. >> >> On machine with vector intrinsic support on riscv (e.g. gcc 14+) it will >> generate libsleef.so with the bridge functions to sleef api, otherwise >> without the bridge functions. >> >> ### Test >> test/jdk/jdk/incubator/vector >> >> ### Performance >> data on bananapi >> >> Benchmark - bananapi | (size) | Mode | Cnt | Score +intrinsic | Error >> +intrinsic | Score -intrinsic | Error -intrinsic | Units | Improvement >> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- >> Double128Vector.ACOS | 1024 | avgt | 10 | 112444.388 | 655.761 | 208554.742 >> | 1508.709 | ns/op | 1.855 >> Double128Vector.ASIN | 1024 | avgt | 10 | 104121.259 | 243.167 | 208314.499 >> | 2833.61 | ns/op | 2.001 >> Double128Vector.ATAN | 1024 | avgt | 10 | 136941.263 | 243.486 | 284024.53 | >> 2204.224 | ns/op | 2.074 >> Double128Vector.ATAN2 | 1024 | avgt | 10 | 163228.681 | 435.455 | 427589.587 >> | 3045.192 | ns/op | 2.62 >> Double128Vector.CBRT | 1024 | avgt | 10 | 146395.753 | 239.355 | 317136.654 >> | 1330.869 | ns/op | 2.166 >> Double128Vector.COS | 1024 | avgt | 10 | 154865.298 | 235.697 | 305721.518 | >> 1319.313 | ns/op | 1.974 >> Double128Vector.COSH | 1024 | avgt | 10 | 189212.943 | 262.399 | 220756.27 | >> 61324.863 | ns/op | 1.167 >> Double128Vector.EXP | 1024 | avgt | 10 | 113941.594 | 219.647 | 252853.07 | >> 891.272 | ns/op | 2.219 >> Double128Vector.EXPM1 | 1024 | avgt | 10 | 184552.939 | 513.715 | 254087.184 >> | 2144.997 | ns/op | 1.377 >> Double128Vector.HYPOT | 1024 | avgt | 10 | 111580.194 | 423.282 | 374537.338 >> | 2091.811 | ns/op | 3.357 >> Double128Vector.LOG | 1024 | avgt | 10 | 110680.548 | 192.731 | 265391.129 | >> 2653.519 | ns/op | 2.398 >> Double128Vector.LOG10 | 1024 | avgt | 10 | 116708.105 | 167.095 | 285764.405 >> | 2489.08 | ns/op | 2.449 >> Double128Vector.LOG1P | 1024 | avgt | 10 | 115633.302 | 567.7 | 317235.967 | >> 1062.848 | ns/op | 2.743 >> Double128Vector.POW | 1024 | avgt | 10 | 321655.14 | 3... > > Hamlin Li has updated the pull request incrementally with one additional > commit since the last revision: > > fix test macro src/jdk.incubator.vector/linux/native/libsleef/lib/vector_math_rvv.c line 51: > 49: // the dynamic rounding mode is always RNE. > 50: > 51: #ifdef DEBUG Question: Should we check for `NDEBUG` macro here instead? I see checks for this macro in the original SLEEF code. #ifndef NDEBUG #define CHECK_FRM __asm__ __volatile__ ( \ "frrm t0 \n\t" \ "beqz t0, 2f \n\t" \ "csrrw x0, cycle, x0 \n\t" \ "2: \n\t" \ : : : "memory" ); #else #define CHECK_FRM #endif - PR Review Comment: https://git.openjdk.org/jdk/pull/21083#discussion_r1777931566