Re: RFR: 8332547: Unloaded signature classes in DirectMethodHandles

2024-06-03 Thread Vladimir Ivanov
On Tue, 21 May 2024 20:14:41 GMT, Jorn Vernee wrote: >> Class loading triggered by `Class.forName()` call is at the core of >> `isTypeVisible`. (The rest is fast path checks.) It's what makes >> `isTypeVisible` query idempotent. >> >> I can definitely name it differently (e.g, `ensureTypeVisi
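
For readers skimming the thread, the point about `Class.forName()` being the core of the visibility check can be sketched roughly as below. This is only an illustration: `TypeVisibilitySketch` and `isVisibleFrom` are hypothetical names, and the real `isTypeVisible`/`ensureTypeVisible` code in the JDK has additional fast paths that are not reproduced here.

```java
// Hypothetical sketch, not the JDK's VerifyAccess implementation.
final class TypeVisibilitySketch {
    // Returns true if resolving type's name through 'loader' yields the very same Class.
    // Class.forName(..., false, ...) may trigger class loading but not initialization;
    // once the class is loaded, repeating the query gives the same answer.
    static boolean isVisibleFrom(Class<?> type, ClassLoader loader) {
        if (type.isPrimitive()) {
            return true; // primitives are visible everywhere
        }
        try {
            Class<?> resolved = Class.forName(type.getName(), false, loader);
            return resolved == type;
        } catch (ClassNotFoundException | LinkageError e) {
            return false; // the loader cannot see (or cannot link) a class with that name
        }
    }
}
```

Passing `null` as the loader asks the bootstrap loader, so application classes are reported as not visible from it.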

Re: RFR: 8332547: Unloaded signature classes in DirectMethodHandles [v2]

2024-06-03 Thread Vladimir Ivanov
ots of arguments), but > `MethodHandle` construction step is not performance critical. > > Testing: hs-tier1 - hs-tier4 Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: Renaming: isTypeVisible -> ensureTypeVisible --

Re: RFR: 8332547: Unloaded signature classes in DirectMethodHandles [v2]

2024-06-03 Thread Vladimir Ivanov
On Mon, 3 Jun 2024 19:36:58 GMT, Vladimir Ivanov wrote: >> JVM routinely installs loader constraints for unloaded signature classes >> when method resolution takes place. MethodHandle resolution took a different >> route and eagerly resolves signature cl

Integrated: 8332547: Unloaded signature classes in DirectMethodHandles

2024-06-03 Thread Vladimir Ivanov
On Mon, 20 May 2024 21:29:20 GMT, Vladimir Ivanov wrote: > JVM routinely installs loader constraints for unloaded signature classes when > method resolution takes place. MethodHandle resolution took a different route > and eagerly resolves signature classes ins

Re: RFR: 8307513: C2: intrinsify Math.max(long,long) and Math.min(long,long)

2024-07-11 Thread Vladimir Ivanov
On Tue, 9 Jul 2024 12:07:37 GMT, Galder Zamarreño wrote: > This patch intrinsifies `Math.max(long, long)` and `Math.min(long, long)` in > order to help improve vectorization performance. > > Currently vectorization does not kick in for loops containing either of these > calls because of the fo

Re: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter

2024-07-11 Thread Vladimir Ivanov
On Tue, 2 Jul 2024 14:52:09 GMT, Andrew Haley wrote: > This patch expands the use of a hash table for secondary superclasses > to the interpreter, C1, and runtime. It also adds a C2 implementation > of hashed lookup in cases where the superclass isn't known at compile > time. > > HotSpot shared

Re: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter

2024-07-11 Thread Vladimir Ivanov
On Thu, 11 Jul 2024 23:17:10 GMT, Vladimir Ivanov wrote: >> This patch expands the use of a hash table for secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't k

Re: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter

2024-07-17 Thread Vladimir Ivanov
On Wed, 17 Jul 2024 17:13:49 GMT, Andrew Haley wrote: >> Another observation while browsing the code: `_secondary_supers_bitmap` >> would be a better name. (Same considerations apply to `_hash_slot`.) > > This is because the C++ runtime does secondary super cache lookups even > before the bitma

Re: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter

2024-07-17 Thread Vladimir Ivanov
On Wed, 17 Jul 2024 18:46:11 GMT, Vladimir Ivanov wrote: >> This is because the C++ runtime does secondary super cache lookups even >> before the bitmap has been calculated and the hash table sorted. In this >> case the bitmap is zero, so the search thinks there are n

Re: RFR: 8335480: Only deoptimize threads if needed when closing shared arena [v7]

2024-07-17 Thread Vladimir Ivanov
On Wed, 17 Jul 2024 15:19:18 GMT, Jorn Vernee wrote: >> This PR limits the number of cases in which we deoptimize frames when >> closing a shared Arena. The initial intent of this was to improve the >> performance of shared arena closure in cases where a lot of threads are >> accessing and clo

Re: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v2]

2024-07-18 Thread Vladimir Ivanov
On Thu, 18 Jul 2024 16:40:47 GMT, Andrew Haley wrote: >> src/hotspot/cpu/aarch64/macroAssembler_aarch64.hpp line 1040: >> >>> 1038: >>> 1039: // Secondary subtype checking >>> 1040: void lookup_secondary_supers_table(Register sub_klass, >> >> While browsing the code, I noticed that it's fa

Re: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v2]

2024-07-18 Thread Vladimir Ivanov
On Thu, 18 Jul 2024 16:35:16 GMT, Andrew Haley wrote: >> On second thought, the following setter may be the culprit: >> >> void Klass::set_secondary_supers(Array<Klass*>* secondaries) { >> assert(!UseSecondarySupersTable || secondaries == nullptr, ""); >> set_secondary_supers(secondaries, SECONDARY

Re: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v2]

2024-07-18 Thread Vladimir Ivanov
On Wed, 17 Jul 2024 17:15:32 GMT, Andrew Haley wrote: >> src/hotspot/share/oops/klass.inline.hpp line 122: >> >>> 120: return true; >>> 121: >>> 122: bool result = lookup_secondary_supers_table(k); >> >> Should `UseSecondarySupersTable` affect `Klass::search_secondary_supers` as >> well

Re: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v2]

2024-07-18 Thread Vladimir Ivanov
On Thu, 18 Jul 2024 20:07:14 GMT, Vladimir Ivanov wrote: >> I think not. It'd complicate C++ runtime for no useful reason. > > On the other hand, if `-XX:-UseSecondarySupersTable` is intended solely for > diagnostic purposes, then handling all possible execution modes unifor

Re: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v5]

2024-07-23 Thread Vladimir Ivanov
On Mon, 22 Jul 2024 14:56:31 GMT, Andrew Haley wrote: >> src/hotspot/share/oops/klass.inline.hpp line 117: >> >>> 115: } >>> 116: >>> 117: inline bool Klass::search_secondary_supers(Klass *k) const { >> >> I see you moved `Klass::search_secondary_supers` in `klass.inline.hpp`, but >> I'm not

Re: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v5]

2024-07-23 Thread Vladimir Ivanov
On Mon, 22 Jul 2024 14:16:05 GMT, Andrew Haley wrote: >> src/hotspot/share/oops/klass.cpp line 175: >> >>> 173: if (secondary_supers()->at(i) == k) { >>> 174: if (UseSecondarySupersCache) { >>> 175: ((Klass*)this)->set_secondary_super_cache(k); >> >> Does it make sense to asse

Re: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v5]

2024-07-23 Thread Vladimir Ivanov
On Mon, 22 Jul 2024 14:00:35 GMT, Andrew Haley wrote: >> Also, `num_extra_slots == 0` check is redundant. > >> Since `secondary_supers` are hashed unconditionally now, is >> `interfaces->length() <= 1` check still needed? > > I don't think so, no. Our incoming `transitive_interfaces` is formed

Re: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v5]

2024-07-23 Thread Vladimir Ivanov
On Mon, 22 Jul 2024 16:45:06 GMT, Andrew Haley wrote: >> src/hotspot/cpu/x86/macroAssembler_x86.cpp line 4810: >> >>> 4808: Label* >>> L_success, >>> 4809: Label* >>> L_failure) {

Re: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v5]

2024-07-23 Thread Vladimir Ivanov
On Mon, 22 Jul 2024 17:19:46 GMT, Andrew Haley wrote: >> This patch expands the use of a hash table for secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. >> >> HotSpot

Re: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v5]

2024-07-23 Thread Vladimir Ivanov
On Mon, 22 Jul 2024 16:36:25 GMT, Andrew Haley wrote: >>> Alternatively, `Klass::is_subtype_of()` can unconditionally perform linear >>> search over secondary_supers array. >>> >>> Even though I very much like to see table lookup written in C++ >>> (accompanying heavily optimized platform-spec

Re: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v5]

2024-07-23 Thread Vladimir Ivanov
On Mon, 22 Jul 2024 17:19:46 GMT, Andrew Haley wrote: >> This patch expands the use of a hash table for secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. >> >> HotSpot

Re: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v6]

2024-07-24 Thread Vladimir Ivanov
On Wed, 24 Jul 2024 09:03:12 GMT, Andrew Haley wrote: >>> Also also, Klass::is_subtype_of() is used for C1 runtime. >> >> Can you elaborate, please? What I'm seeing in >> `Runtime1::generate_code_for()` for `slow_subtype_check` is a call into >> `MacroAssembler::check_klass_subtype_slow_path()

Re: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v6]

2024-07-24 Thread Vladimir Ivanov
On Wed, 24 Jul 2024 16:14:47 GMT, Andrew Haley wrote: >>> I suspect that Klass::search_secondary_supers() won't be inlined in such >>> case. >> >> That's true, but it's true of every other function in that file. Is it not >> deliberate? > > FYI, somewhat related: AArch64 GCC inlines `lookup_

Re: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v8]

2024-07-25 Thread Vladimir Ivanov
On Thu, 25 Jul 2024 16:05:49 GMT, Andrew Haley wrote: >> This patch expands the use of a hash table for secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. >> >> HotSpot

Re: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v8]

2024-07-25 Thread Vladimir Ivanov
On Thu, 25 Jul 2024 16:05:49 GMT, Andrew Haley wrote: >> This patch expands the use of a hash table for secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. >> >> HotSpot

Re: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v8]

2024-07-25 Thread Vladimir Ivanov
On Thu, 25 Jul 2024 13:56:34 GMT, Andrew Haley wrote: >> Thanks, now I see that `Class::isInstance(Object)` is backed by >> `Runtime1::is_instance_of()` which uses `oopDesc::is_a()` to do the job. >> >> If it turns out to be performance critical, the intrinsic implementation >> should be rewri

Re: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v9]

2024-07-26 Thread Vladimir Ivanov
On Fri, 26 Jul 2024 15:13:06 GMT, Andrew Haley wrote: >> This patch expands the use of a hash table for secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. >> >> HotSpot

Re: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v9]

2024-07-26 Thread Vladimir Ivanov
On Fri, 26 Jul 2024 15:13:06 GMT, Andrew Haley wrote: >> This patch expands the use of a hash table for secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. >> >> HotSpot

Re: RFR: 8331341: secondary_super_cache does not scale well: C1 and interpreter [v9]

2024-07-26 Thread Vladimir Ivanov
On Fri, 26 Jul 2024 15:13:06 GMT, Andrew Haley wrote: >> This patch expands the use of a hash table for secondary superclasses >> to the interpreter, C1, and runtime. It also adds a C2 implementation >> of hashed lookup in cases where the superclass isn't known at compile >> time. >> >> HotSpot

Re: RFR: 8295302: Do not use ArrayList when LambdaForm has a single ClassData

2022-10-13 Thread Vladimir Ivanov
On Thu, 13 Oct 2022 21:53:47 GMT, Ioi Lam wrote: > Please review this small optimization. As shown in the JBS issue, most of the > generated LambdaForm classes have a single ClassData, so we can get a small > footprint/speed improvement. src/java.base/share/classes/java/lang/invoke/InvokerByte
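
The idea in the subject line (skip the list wrapper when there is exactly one class-data object) might look roughly like this. It is a sketch under assumptions: `ClassDataSketch` and `pack` are hypothetical names, and the actual change in `java.lang.invoke` is not shown in this snippet and may be organized differently.

```java
import java.util.List;

// Hypothetical sketch; not the code from the pull request.
final class ClassDataSketch {
    // Picks the cheapest representation for the collected class data:
    // nothing, the single object itself, or an immutable list.
    static Object pack(List<Object> classData) {
        if (classData.isEmpty()) {
            return null;
        }
        if (classData.size() == 1) {
            return classData.get(0); // common case: no list allocated
        }
        return List.copyOf(classData);
    }
}
```

A consumer then has to check whether it received a `List` or a bare object, which is the price paid for the smaller footprint in the common case.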

Re: RFR: 8295302: Do not use ArrayList when LambdaForm has a single ClassData [v2]

2022-10-13 Thread Vladimir Ivanov
On Fri, 14 Oct 2022 04:37:22 GMT, Ioi Lam wrote: >> Please review this small optimization. As shown in the JBS issue, most of >> the generated LambdaForm classes have a single ClassData, so we can get a >> small footprint/speed improvement. > > Ioi Lam has updated the pull request incrementally

Re: RFR: JDK-8285932 Implementation of JEP-430 String Templates (Preview) [v11]

2022-11-02 Thread Vladimir Ivanov
On Wed, 2 Nov 2022 19:44:00 GMT, Jim Laskey wrote: >> Enhance the Java programming language with string templates, which are >> similar to string literals but contain embedded expressions. A string >> template is interpreted at run time by replacing each expression with the >> result of evalua

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]

2022-11-11 Thread Vladimir Ivanov
On Fri, 11 Nov 2022 13:00:06 GMT, Claes Redestad wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify >> polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` >> method. To make this work I've harmonized how they
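
As a reminder of what "unrolling a polynomial hash loop" means, here is a minimal sketch that processes four elements per iteration. `PolyHashSketch` is a hypothetical name, and the unroll factor of 4 is chosen only for illustration; it is not claimed to match the patch or the intrinsic.

```java
// Hypothetical sketch of a hand-unrolled 31-based polynomial hash.
final class PolyHashSketch {
    static int hash(char[] a) {
        int h = 0;
        int i = 0;
        int bound = a.length & ~3; // largest multiple of 4 not exceeding the length
        for (; i < bound; i += 4) {
            // Four steps of h = 31 * h + c folded into one expression.
            h = h * 923521          // 31^4
              + a[i]     * 29791    // 31^3
              + a[i + 1] * 961      // 31^2
              + a[i + 2] * 31
              + a[i + 3];
        }
        for (; i < a.length; i++) { // scalar tail for the remaining 0..3 elements
            h = 31 * h + a[i];
        }
        return h;
    }
}
```

Because `int` arithmetic wraps modulo 2^32, the unrolled form gives the same value as the one-element-at-a-time loop used by `String.hashCode()`.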

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]

2022-11-11 Thread Vladimir Ivanov
On Sat, 12 Nov 2022 00:55:56 GMT, Vladimir Ivanov wrote: >> Claes Redestad has updated the pull request incrementally with one >> additional commit since the last revision: >> >> Missing & 0xff in StringLatin1::hashCode > > src/hotspot/cpu/x86/x86_64.ad

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]

2022-11-11 Thread Vladimir Ivanov
On Fri, 11 Nov 2022 13:00:06 GMT, Claes Redestad wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify >> polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` >> method. To make this work I've harmonized how they

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]

2022-11-14 Thread Vladimir Ivanov
On Sun, 13 Nov 2022 21:01:21 GMT, Claes Redestad wrote: >> src/hotspot/share/opto/intrinsicnode.hpp line 175: >> >>> 173: // as well as adjusting for special treatment of various encoding of >>> String >>> 174: // arrays. Must correspond to declared constants in >>> jdk.internal.util.Array

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]

2022-11-14 Thread Vladimir Ivanov
On Sun, 13 Nov 2022 19:50:46 GMT, Claes Redestad wrote: > ... several challenges were brought up to the table, including how to deal > with all the different contingencies that might be the result of a safepoint, > including deoptimization. FTR if the intrinsic is represented as a stand-alone

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v13]

2022-11-14 Thread Vladimir Ivanov
On Sun, 13 Nov 2022 21:08:53 GMT, Claes Redestad wrote: > How far off is this ...? Back then it looked way too constrained (tight constraints on code shapes). But I considered it as a generally applicable optimization. > ... do you think it'll be able to match the efficiency we see here with

Re: RFR: 8296477: Foreign linker implementation update following JEP 434 [v4]

2022-11-14 Thread Vladimir Ivanov
On Thu, 10 Nov 2022 16:48:19 GMT, Jorn Vernee wrote: >> Pull in linker implementation changes, that include non-trivial changes to >> VM code, from the panama-foreign repo into the main JDK. >> >> This is split off from the main JEP integration to make reviewing easier. >> >> This includes the

Re: RFR: 8296477: Foreign linker implementation update following JEP 434 [v7]

2022-11-21 Thread Vladimir Ivanov
On Fri, 18 Nov 2022 14:54:52 GMT, Jorn Vernee wrote: >> Pull in linker implementation changes, that include non-trivial changes to >> VM code, from the panama-foreign repo into the main JDK. >> >> This is split off from the main JEP integration to make reviewing easier. >> >> This includes the

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v18]

2023-01-11 Thread Vladimir Ivanov
On Mon, 9 Jan 2023 16:49:25 GMT, Claes Redestad wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify >> polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` >> method. To make this work I've harmonized how they a

Re: RFR: 8282664: Unroll by hand StringUTF16 and StringLatin1 polynomial hash loops [v20]

2023-01-17 Thread Vladimir Ivanov
On Mon, 16 Jan 2023 23:28:37 GMT, Claes Redestad wrote: >> Continuing the work initiated by @luhenry to unroll and then intrinsify >> polynomial hash loops. >> >> I've rewired the library changes to route via a single `@IntrinsicCandidate` >> method. To make this work I've harmonized how they

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter [v4]

2023-03-08 Thread Vladimir Ivanov
On Wed, 8 Mar 2023 05:17:53 GMT, Vladimir Kozlov wrote: >> Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in >> Interpreter and C1 compiler to produce the same results as C2 intrinsics on >> x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java >>

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter [v4]

2023-03-08 Thread Vladimir Ivanov
On Wed, 8 Mar 2023 20:55:29 GMT, Vladimir Kozlov wrote: >> src/hotspot/share/opto/convertnode.cpp line 171: >> >>> 169: if (t == Type::TOP) return Type::TOP; >>> 170: if (t == Type::FLOAT) return TypeInt::SHORT; >>> 171: if (StubRoutines::f2hf() == nullptr) return bottom_type(); >> >> Wha

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter [v4]

2023-03-08 Thread Vladimir Ivanov
On Wed, 8 Mar 2023 21:41:31 GMT, Vladimir Ivanov wrote: > Or encapsulate the constant folding logic (along with the guard) into > SharedRuntime and return Type* (instead of int/float scalar). I take this particular suggestion back. `SharedRuntime` is compiler-agnostic while `Type`

Re: RFR: 8302976: C2 intrinsification of Float.floatToFloat16 and Float.float16ToFloat yields different result than the interpreter [v5]

2023-03-08 Thread Vladimir Ivanov
On Wed, 8 Mar 2023 23:14:05 GMT, Vladimir Kozlov wrote: >> Implemented `Float.floatToFloat16` and `Float.float16ToFloat` intrinsics in >> Interpreter and C1 compiler to produce the same results as C2 intrinsics on >> x64, Aarch64 and RISC-V - all platforms where C2 intrinsics for these Java >>

Re: RFR: 8303022: "assert(allocates2(pc)) failed: not in CodeBuffer memory" When linking downcall handle [v3]

2023-03-15 Thread Vladimir Ivanov
On Fri, 10 Mar 2023 14:14:55 GMT, Jorn Vernee wrote: >> The issue is that the size of the code buffer is not large enough to hold >> the whole stub. >> >> Proposed solution is to scale the size of the stub with the number of >> arguments. I've adjusted sizes for both downcall and upcall stubs.

Re: RFR: 8304303: implement VirtualThread class notifyJvmti methods as C2 intrinsics [v4]

2023-03-17 Thread Vladimir Ivanov
On Fri, 17 Mar 2023 10:31:46 GMT, Serguei Spitsyn wrote: >> This is needed for performance improvements in support of virtual threads. >> The update includes the following: >> >> 1. Refactored the `VirtualThread` native methods: >> `notifyJvmtiMountBegin` and `notifyJvmtiMountEnd` =

Re: RFR: 8304265: Implementation of Foreign Function and Memory API (Third Preview) [v20]

2023-04-11 Thread Vladimir Ivanov
On Thu, 6 Apr 2023 10:54:18 GMT, Per Minborg wrote: >> API changes for the FFM API (third preview) >> >> Specdiff: >> https://cr.openjdk.org/~pminborg/panama/21/v1/specdiff/overview-summary.html >> >> Javadoc: >> https://cr.openjdk.org/~pminborg/panama/21/v1/javadoc/java.base/module-summary.htm

Re: RFR: 8304450: [vectorapi] Refactor VectorShuffle implementation [v7]

2023-04-11 Thread Vladimir Ivanov
On Fri, 7 Apr 2023 17:13:50 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch reimplements `VectorShuffle` implementations to be a vector of >> the bit type. Currently, VectorShuffle is stored as a byte array, and would >> be expanded upon usage. This poses several drawbacks: >> >> 1. Ineffici

Re: RFR: 8303762: [vectorapi] Intrinsification of Vector.slice [v6]

2023-04-11 Thread Vladimir Ivanov
On Tue, 4 Apr 2023 13:46:12 GMT, Quan Anh Mai wrote: >> `Vector::slice` is a method at the top-level class of the Vector API that >> concatenates the 2 inputs into an intermediate composite and extracts a >> window equal to the size of the inputs into the result. It is used in vector >> conver

Re: RFR: 8313023: Return value corrupted when using CCS + isTrivial (mainline)

2023-07-28 Thread Vladimir Ivanov
On Tue, 25 Jul 2023 19:17:38 GMT, Jorn Vernee wrote: > Port of: https://github.com/openjdk/panama-foreign/pull/848 from the > panama-foreign repo. > > Copying the PR body here for convenience: > > Due to a bug in the downcall linker stub generation, we don't save the return > value when captu

Re: RFR: 8313406: nep_invoker_blob can be simplified more

2023-08-14 Thread Vladimir Ivanov
On Mon, 31 Jul 2023 12:22:00 GMT, Yasumasa Suenaga wrote: > In FFM, native function would be called via `nep_invoker_blob`. If the > function has two arguments, it would be following: > > > Decoding RuntimeStub - nep_invoker_blob 0x7fcae394cd10 > ---

Re: RFR: 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR [v5]

2023-10-11 Thread Vladimir Ivanov
On Wed, 11 Oct 2023 20:58:23 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to address the follow-up comments to the SIMD >> accelerated sort PR (#14227) which implemented AVX512 intrinsics for >> Arrays.sort() methods. >> The proposed changes are: >> >> 1) Restriction of the AVX

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v42]

2023-10-11 Thread Vladimir Ivanov
On Thu, 5 Oct 2023 23:36:48 GMT, Srinivas Vamsi Parasa wrote: >> The goal is to develop faster sort routines for x86_64 CPUs by taking >> advantage of AVX512 instructions. This enhancement provides an order of >> magnitude speedup for Arrays.sort() using int, long, float and double arrays. >>

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v42]

2023-10-11 Thread Vladimir Ivanov
On Wed, 11 Oct 2023 23:25:22 GMT, Vladimir Ivanov wrote: >> Srinivas Vamsi Parasa has updated the pull request with a new target base >> due to a merge or a rebase. The pull request now contains 45 commits: >> >> - fix code style and formatting >> -

Re: RFR: 8309130: x86_64 AVX512 intrinsics for Arrays.sort methods (int, long, float and double arrays) [v42]

2023-10-11 Thread Vladimir Ivanov
On Wed, 11 Oct 2023 23:38:05 GMT, Sandhya Viswanathan wrote: >> Also, for on-heap case the fallback implementation is equivalent to >> intrinsified case only when offset points at the 0th element of the array. > > @iwanowww Yes, you are late to the party :). The fallback implementation > could

Re: RFR: 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR [v5]

2023-10-11 Thread Vladimir Ivanov
On Wed, 11 Oct 2023 20:58:23 GMT, Srinivas Vamsi Parasa wrote: >> The goal of this PR is to address the follow-up comments to the SIMD >> accelerated sort PR (#14227) which implemented AVX512 intrinsics for >> Arrays.sort() methods. >> The proposed changes are: >> >> 1) Restriction of the AVX

Re: RFR: 8254693: Add Panama feature to pass heap segments to native code [v12]

2023-11-08 Thread Vladimir Ivanov
On Tue, 24 Oct 2023 15:09:57 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using >> `Linker.Option.critical(true)` as a linker option. It has the same >> limitations as normal critical calls, namely: upcalls into Java are not >> allowed, and the

Re: RFR: 8254693: Add Panama feature to pass heap segments to native code [v11]

2023-11-08 Thread Vladimir Ivanov
On Sat, 21 Oct 2023 12:04:10 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using >> `Linker.Option.critical(true)` as a linker option. It has the same >> limitations as normal critical calls, namely: upcalls into Java are not >> allowed, and the

Re: RFR: 8254693: Add Panama feature to pass heap segments to native code [v14]

2023-11-13 Thread Vladimir Ivanov
On Mon, 13 Nov 2023 12:51:36 GMT, Jorn Vernee wrote: >> Add the ability to pass heap segments to native code. This requires using >> `Linker.Option.critical(true)` as a linker option. It has the same >> limitations as normal critical calls, namely: upcalls into Java are not >> allowed, and the

Re: RFR: 8324433: Introduce a way to determine if an expression is evaluated as a constant by the Jit compiler [v7]

2024-01-28 Thread Vladimir Ivanov
On Thu, 25 Jan 2024 14:01:59 GMT, Quan Anh Mai wrote: >> Hi, >> >> This patch introduces `JitCompiler::isConstantExpression` which can be used >> to statically determine whether an expression has been constant-folded by >> the Jit compiler, leading to more constant-folding opportunities. For

RFR: 8332547: Unloaded signature classes in DirectMethodHandles

2024-05-20 Thread Vladimir Ivanov
JVM routinely installs loader constraints for unloaded signature classes when method resolution takes place. MethodHandle resolution took a different route and eagerly resolves signature classes instead (see `java.lang.invoke.MemberName$Factory::resolve` and `sun.invoke.util.VerifyAccess::isTyp

Re: RFR: 8332547: Unloaded signature classes in DirectMethodHandles

2024-05-21 Thread Vladimir Ivanov
On Mon, 20 May 2024 21:29:20 GMT, Vladimir Ivanov wrote: > JVM routinely installs loader constraints for unloaded signature classes when > method resolution takes place. MethodHandle resolution took a different route > and eagerly resolves signature classes ins

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2]

2024-10-14 Thread Vladimir Ivanov
On Wed, 9 Oct 2024 09:59:11 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMULUDQ >> instruction for following IR pallets. >> >> >>MulL ( And SRC1, 0x) ( And SRC2, 0x) >>MulL (URShift SRC1 , 32) (URShift S

Re: RFR: 8341127: Extra call to MethodHandle::asType from memory segment var handles fails to inline [v4]

2024-10-01 Thread Vladimir Ivanov
On Tue, 1 Oct 2024 17:49:12 GMT, Maurizio Cimadamore wrote: >> The fix for JDK-8331865 introduced an accidental performance regression. >> The main issue is that now *all* memory segment var handles go through some >> round of adaptation. >> Adapting a var handle results in a so called *indirec

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4]

2024-10-18 Thread Vladimir Ivanov
On Fri, 18 Oct 2024 02:03:21 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMULUDQ >> instruction for following IR pallets. >> >> >>MulL ( And SRC1, 0x) ( And SRC2, 0x) >>MulL (URShift SRC1 , 32) (URShift

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2]

2024-10-17 Thread Vladimir Ivanov
On Thu, 17 Oct 2024 19:40:52 GMT, Jatin Bhateja wrote: >> MulVL (VectorCastI2X src1) (VectorCastI2X src2) > It looks unsafe to me, since VectorCastI2L sign-extends integer lanes, ... Hm, I don't see any problems with it if `VPMULDQ` is used. Sign extension becomes redundant when 64-bit multipl
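
The signed case can be illustrated per lane in plain Java. The sketch assumes the documented behaviour of `VPMULDQ` (multiply the signed low 32 bits of each 64-bit lane, producing a 64-bit result); `PmuldqLaneSketch` and its method are hypothetical names.

```java
// Hypothetical scalar model of one VPMULDQ lane.
final class PmuldqLaneSketch {
    // Takes the low 32 bits of each operand, sign-extends them, and multiplies.
    static long mulSignedLowDoubleWords(long x, long y) {
        return (long) (int) x * (long) (int) y;
    }
}
```

If a lane was produced by sign-extending an `int`, taking its low 32 bits and sign-extending again gives back the same value, so this product equals the ordinary 64-bit product of the two lanes.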

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v2]

2024-10-17 Thread Vladimir Ivanov
On Tue, 15 Oct 2024 17:26:49 GMT, Quan Anh Mai wrote: >> I'm pretty ambivalent, I think implementing it either way would be alright. >> Especially with unit tests, I think the lowering implementation wouldn't be >> that difficult. Maybe another reviewer has an opinion? >> >> About PhaseLowerin

Re: RFR: 8311071: Avoid SoftReferences in LambdaFormEditor and MethodTypeForm when storing heap objects into AOT cache [v9]

2024-10-17 Thread Vladimir Ivanov
On Wed, 16 Oct 2024 00:03:25 GMT, Ioi Lam wrote: >> This is the 6th PR for [JEP 483: Ahead-of-Time Class Loading & >> Linking](https://bugs.openjdk.org/browse/JDK-8315737). >> >> The implementation of java.lang.invoke uses SoftReferences so that unused >> MethodHandles, LambdaForms, etc, can b

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4]

2024-10-17 Thread Vladimir Ivanov
On Fri, 18 Oct 2024 04:16:15 GMT, Jatin Bhateja wrote: > It convolutes the graph with machine-dependent nodes early in the compiling > process. Ah, I see your point now! I took a closer look at the patch and indeed `MulVLNode::_mult_lower_double_word` with `MulVLNode::Ideal()` don't look pret

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4]

2024-10-17 Thread Vladimir Ivanov
On Fri, 18 Oct 2024 05:05:16 GMT, Quan Anh Mai wrote: > The issue is that a node is not immutable. I don't see any issues with mutability here. `MulVLNode::_mult_lower_double_word` is constant, so you have to allocate new node if you want to change its value. (And that's exactly what `MulVLNo

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4]

2024-10-17 Thread Vladimir Ivanov
On Fri, 18 Oct 2024 05:46:25 GMT, Vladimir Ivanov wrote: >> You can see its pseudocode here https://www.felixcloutier.com/x86/pmuludq >> >> VPMULUDQ (VEX.256 Encoded Version) >>

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4]

2024-10-17 Thread Vladimir Ivanov
On Fri, 18 Oct 2024 05:35:27 GMT, Quan Anh Mai wrote: >>> The issue is that a node is not immutable. >> >> I don't see any issues with mutability here. >> `MulVLNode::_mult_lower_double_word` is constant, so you have to allocate >> new node if you want to change its value. (And that's exactly

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4]

2024-10-17 Thread Vladimir Ivanov
On Fri, 18 Oct 2024 05:39:08 GMT, Quan Anh Mai wrote: >> `vpmuludq` does a long multiplication but throws away the upper bits of the >> operands, effectively does a `(x & max_juint) * (y & max_juint)` > > You can see its pseudocode here https://www.felixcloutier.com/x86/pmuludq > > VPMULUDQ
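
The `(x & max_juint) * (y & max_juint)` description can be written out per lane as follows. `PmuludqLaneSketch` is a hypothetical name, and `MAX_JUINT` stands in for HotSpot's `max_juint` (0xFFFFFFFF).

```java
// Hypothetical scalar model of one VPMULUDQ lane.
final class PmuludqLaneSketch {
    static final long MAX_JUINT = 0xFFFFFFFFL;

    // Discards the upper 32 bits of each operand, then multiplies as 64-bit values.
    static long mulLowDoubleWords(long x, long y) {
        return (x & MAX_JUINT) * (y & MAX_JUINT);
    }
}
```

When both operands already have zero upper halves, the result matches a plain `long` multiply. For a sign-extended negative value it does not: with `x = -1L` and `y = 2L` the plain product is `-2`, while this model yields `0x1FFFFFFFE`.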

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMULUDQ instruction [v4]

2024-10-17 Thread Vladimir Ivanov
On Fri, 18 Oct 2024 02:03:21 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMULUDQ >> instruction for following IR pallets. >> >> >>MulL ( And SRC1, 0x) ( And SRC2, 0x) >>MulL (URShift SRC1 , 32) (URShift

Re: RFR: 8343019: Primitive caches must use boxed instances from the archive

2024-10-28 Thread Vladimir Ivanov
On Mon, 28 Oct 2024 10:16:40 GMT, Aleksey Shipilev wrote: > This is forked from > [JDK-8342642](https://bugs.openjdk.org/browse/JDK-8342642) and filed as a > general issue for archived boxed Integer cache when it's recreated at > runtime. In short, current code drops the entire primitive cache

Re: RFR: 8343019: Primitive caches must use boxed instances from the archive

2024-10-28 Thread Vladimir Ivanov
On Mon, 28 Oct 2024 10:16:40 GMT, Aleksey Shipilev wrote: > This is forked from > [JDK-8342642](https://bugs.openjdk.org/browse/JDK-8342642) and filed as a > general issue for archived boxed Integer cache when it's recreated at > runtime. In short, current code drops the entire primitive cache

Re: RFR: 8343019: Primitive caches must use boxed instances from the archive [v2]

2024-10-29 Thread Vladimir Ivanov
On Tue, 29 Oct 2024 13:12:56 GMT, Aleksey Shipilev wrote: >> This is forked from >> [JDK-8342642](https://bugs.openjdk.org/browse/JDK-8342642) and filed as a >> general issue for archived boxed Integer cache when it's recreated at >> runtime. In short, current code drops the entire primitive c

RFR: 8317542: Specjvm::xml have scalability issue for high vCPU numbers

2024-11-01 Thread Vladimir Ivanov
The synchronization block may be replaced by a 'volatile' variable and a smaller synchronization block. It reduces the total blocking time for the specjvm2008::xml.validation workload and improves the reported score. Scores for the 112vCPU on the with 28GB heap increased from 17915.83 to 22943.2. Un
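
The general idea of trading a large synchronized block for a `volatile` field plus a smaller synchronized block can be sketched as double-checked lazy initialization. This is a generic illustration under that assumption, not the actual change in the pull request; `LazyHolderSketch`, `BUILD_COUNT`, and the cached `Object` are hypothetical stand-ins.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class LazyHolderSketch {
    // Counts how many times the (hypothetical) expensive value was built.
    static final AtomicInteger BUILD_COUNT = new AtomicInteger();

    // volatile gives lock-free readers a safely published value.
    private static volatile Object cached;
    private static final Object LOCK = new Object();

    static Object get() {
        Object local = cached;            // fast path: one volatile read, no lock
        if (local == null) {
            synchronized (LOCK) {         // slow path: lock held only while initializing
                local = cached;           // re-check under the lock
                if (local == null) {
                    BUILD_COUNT.incrementAndGet();
                    local = new Object(); // stand-in for the expensive construction
                    cached = local;
                }
            }
        }
        return local;
    }

    public static void main(String[] args) {
        Object a = get();
        Object b = get();
        System.out.println(a == b);            // prints true
        System.out.println(BUILD_COUNT.get()); // prints 1
    }
}
```

After the first call, threads no longer contend on the lock, which is the kind of effect that would reduce blocking time on machines with many vCPUs.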

Re: RFR: 8331497: Implement JEP 483: Ahead-of-Time Class Loading & Linking [v8]

2024-11-04 Thread Vladimir Ivanov
On Mon, 4 Nov 2024 23:57:48 GMT, Ioi Lam wrote: >> This is an implementation of [JEP 483: Ahead-of-Time Class Loading & >> Linking](https://openjdk.org/jeps/483). >> >> >> Note: this is a combined PR of the following individual PRs >> - https://github.com/openjdk/jdk/pull/20516 >> - https:

Re: RFR: 8317542: Specjvm::xml have scalability issue for high vCPU numbers

2024-11-05 Thread Vladimir Ivanov
On Thu, 31 Oct 2024 21:33:11 GMT, Vladimir Ivanov wrote: > The synchronization block may be replaced by a 'volatile' variable and a > smaller synchronization block. > It reduces the total blocking time for the specjvm2008::xml.validation > workload and improves the report

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v2]

2024-11-08 Thread Vladimir Ivanov
On Fri, 8 Nov 2024 08:15:32 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ >> instruction for following IR pallets. >> >> >>MulVL ( AndV SRC1, 0x) ( AndV SRC2, 0x) >>MulVL (URShiftVL SRC1 , 32) (U

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v2]

2024-11-12 Thread Vladimir Ivanov
On Sun, 10 Nov 2024 07:40:30 GMT, Jatin Bhateja wrote: >> In the latest version you added new Ideal nodes (`MulIVL` and `MulUIVL`). I >> don't see a compelling reason to do so. IMO matcher functionality is more >> than enough to cover `VPMULDQ` case. `MulIVL` is equivalent to `MulVL` + >> `has

Re: RFR: 8311071: Avoid SoftReferences in LambdaFormEditor and MethodTypeForm when storing heap objects into AOT cache [v7]

2024-10-02 Thread Vladimir Ivanov
On Wed, 2 Oct 2024 01:06:20 GMT, Ioi Lam wrote: >> This is the 6th PR for [JEP 483: Ahead-of-Time Class Loading & >> Linking](https://bugs.openjdk.org/browse/JDK-8315737). >> >> The implementation of java.lang.invoke uses SoftReferences so that unused >> MethodHandles, LambdaForms, etc, can be

Re: RFR: 8337753: Target class of upcall stub may be unloaded [v6]

2024-10-02 Thread Vladimir Ivanov
On Thu, 19 Sep 2024 12:20:13 GMT, Jorn Vernee wrote: >> As discussed in the JBS issue: >> >> FFM upcall stubs embed a `Method*` of the target method in the stub. This >> `Method*` is read from the `LambdaForm::vmentry` field associated with the >> target method handle at the time when the upca

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction

2024-11-06 Thread Vladimir Ivanov
On Thu, 24 Oct 2024 23:47:29 GMT, Vladimir Ivanov wrote: > So, IMO the best way to move this particular enhancement forward is: ... @jatin-bhateja here's a sketch (not tested): https://github.com/openjdk/jdk/compare/master...iwanowww:jdk:pr/21244 - PR Commen

Re: RFR: 8317542: Specjvm::xml have scalability issue for high vCPU numbers [v3]

2024-11-07 Thread Vladimir Ivanov
6 0 0 >jtreg:test/jdk/javax/xml 7070 0 0 > == > TEST SUCCESS > > The tier1 is OK too. Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: 8317542:

Re: RFR: 8317542: Specjvm::xml have scalability issue for high vCPU numbers [v2]

2024-11-07 Thread Vladimir Ivanov
6 0 0 >jtreg:test/jdk/javax/xml 7070 0 0 > == > TEST SUCCESS > > The tier1 is OK too. Vladimir Ivanov has updated the pull request incrementally with one additional commit since the last revision: 8317542:

Re: RFR: 8317542: Specjvm::xml have scalability issue for high vCPU numbers [v3]

2024-11-07 Thread Vladimir Ivanov
On Thu, 7 Nov 2024 18:30:22 GMT, Vladimir Ivanov wrote: >> The synchronization block may be replaced by a 'volatile' variable and a >> smaller synchronization block. >> It reduces the total blocking time for the specjvm2008::xml.validation >> workload and imp

Re: RFR: 8317542: Specjvm::xml have scalability issue for high vCPU numbers [v3]

2024-11-07 Thread Vladimir Ivanov
On Thu, 7 Nov 2024 18:30:22 GMT, Vladimir Ivanov wrote: >> The synchronization block may be replaced by a 'volatile' variable and a >> smaller synchronization block. >> It reduces the total blocking time for the specjvm2008::xml.validation >> workload and imp

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction

2024-11-06 Thread Vladimir Ivanov
On Fri, 18 Oct 2024 05:05:16 GMT, Quan Anh Mai wrote: > The issue is that a node is not immutable. I don't see any issues with mutability here. `MulVLNode::_mult_lower_double_word` is constant, so you have to allocate new node if you want to change its value. (And that's exactly what `MulVLNo

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction

2024-11-06 Thread Vladimir Ivanov
On Fri, 18 Oct 2024 05:35:27 GMT, Quan Anh Mai wrote: >>> The issue is that a node is not immutable. >> >> I don't see any issues with mutability here. >> `MulVLNode::_mult_lower_double_word` is constant, so you have to allocate >> new node if you want to change its value. (And that's exactly

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction

2024-11-06 Thread Vladimir Ivanov
On Sun, 29 Sep 2024 04:21:19 GMT, Jatin Bhateja wrote: > This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ > instruction for following IR pallets. > > >MulVL ( AndV SRC1, 0x) ( AndV SRC2, 0x) >MulVL (URShiftVL SRC1 , 32) (URShif

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction

2024-11-06 Thread Vladimir Ivanov
On Thu, 17 Oct 2024 19:40:52 GMT, Jatin Bhateja wrote: >> MulVL (VectorCastI2X src1) (VectorCastI2X src2) > It looks unsafe to me, since VectorCastI2L sign-extends integer lanes, ... Hm, I don't see any problems with it if `VPMULDQ` is used. Sign extension becomes redundant when 64-bit multipl
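
The point about sign extension can be checked with scalar arithmetic: if each 64-bit lane holds a sign-extended `int`, a signed 32x32 to 64-bit multiply of the low halves gives the same value as the full 64-bit multiply. The sketch below is a scalar model only, assuming `VPMULDQ` sign-extends the low 32 bits of each lane before multiplying; the class and method names are hypothetical.

```java
public class PmuldqLaneSketch {
    // Scalar model of one 64-bit lane of a signed 32x32 -> 64 multiply:
    // take the low 32 bits of each operand, sign-extend, multiply.
    static long mulLowSigned(long x, long y) {
        return (long) (int) x * (long) (int) y;
    }

    // What a 64-bit multiply of two int-to-long casts computes.
    static long mulAfterI2L(int a, int b) {
        return (long) a * (long) b;
    }

    public static void main(String[] args) {
        int a = -3;
        int b = 7;
        System.out.println(mulAfterI2L(a, b));                 // prints -21
        System.out.println(mulLowSigned((long) a, (long) b));  // prints -21
    }
}
```

The two results agree for every pair of `int` inputs because the cast `(long) (int) x` is the identity on a value that is already a sign-extended `int`.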

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction

2024-11-06 Thread Vladimir Ivanov
On Tue, 15 Oct 2024 17:26:49 GMT, Quan Anh Mai wrote: >> I'm pretty ambivalent, I think implementing it either way would be alright. >> Especially with unit tests, I think the lowering implementation wouldn't be >> that difficult. Maybe another reviewer has an opinion? >> >> About PhaseLowerin

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction

2024-11-06 Thread Vladimir Ivanov
On Fri, 18 Oct 2024 04:16:15 GMT, Jatin Bhateja wrote: > It convolutes the graph with machine-dependent nodes early in the compiling > process. Ah, I see your point now! I took a closer look at the patch and indeed `MulVLNode::_mult_lower_double_word` with `MulVLNode::Ideal()` don't look pret

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction

2024-11-06 Thread Vladimir Ivanov
On Fri, 18 Oct 2024 05:39:08 GMT, Quan Anh Mai wrote: >> `vpmuludq` does a long multiplication but throws away the upper bits of the >> operands, effectively does a `(x & max_juint) * (y & max_juint)` > > You can see its pseudocode here https://www.felixcloutier.com/x86/pmuludq > > VPMULUDQ

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction

2024-11-06 Thread Vladimir Ivanov
On Fri, 18 Oct 2024 05:46:25 GMT, Vladimir Ivanov wrote: >> You can see its pseudocode here https://www.felixcloutier.com/x86/pmuludq >> >> VPMULUDQ (VEX.256 Encoded Version)

Re: RFR: 8341137: Optimize long vector multiplication using x86 VPMUL[U]DQ instruction [v5]

2024-11-14 Thread Vladimir Ivanov
On Thu, 14 Nov 2024 18:24:59 GMT, Jatin Bhateja wrote: >> This patch optimizes LongVector multiplication by inferring VPMUL[U]DQ >> instruction for following IR pallets. >> >> >>MulVL ( AndV SRC1, 0x) ( AndV SRC2, 0x) >>MulVL (URShiftVL SRC1 , 32) (
