Re: RFR: 8291555: Implement alternative fast-locking scheme [v30]

2023-03-27 Thread Dean Long
On Mon, 27 Mar 2023 20:24:14 GMT, Roman Kennke  wrote:

>> This change adds a fast-locking scheme as an alternative to the current 
>> stack-locking implementation. It retains the advantages of stack-locking 
>> (namely fast locking in uncontended code-paths), while avoiding the overloading 
>> of the mark word. That overloading causes massive problems with Lilliput, 
>> because it means we have to check and deal with this situation when trying 
>> to access the mark-word. And because of the very racy nature, this turns out 
>> to be very complex and would involve a variant of the inflation protocol to 
>> ensure that the object header is stable. (The current implementation of 
>> setting/fetching the i-hash provides a glimpse into the complexity).
>> 
>> The original stack-locking basically pushes a stack-lock, consisting only 
>> of the displaced header, onto the stack, and CASes a pointer to this stack 
>> location into the object header (00 in the lowest two header bits indicates 
>> 'stack-locked'). The pointer into the stack can then be used to 
>> identify which thread currently owns the lock.
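
The legacy scheme described above can be sketched roughly like this (hypothetical names and bit values for illustration; the real implementation lives in HotSpot's synchronizer code):

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

constexpr uintptr_t kUnlockedValue = 0x1;  // low header bits 01: 'unlocked'
constexpr uintptr_t kLockMask      = 0x3;

// A stack-lock slot holding the displaced header (the real HotSpot type
// is BasicLock, allocated in the locking method's stack frame).
struct StackLockSlot { uintptr_t displaced_header; };

// Sketch of legacy stack-locking: save the unlocked header into a slot on
// the owner's stack, then CAS the slot's address into the object header.
// The slot is word-aligned, so its low two bits are 00 ('stack-locked').
bool try_stack_lock(std::atomic<uintptr_t>& mark, StackLockSlot* slot) {
    uintptr_t expected = mark.load() | kUnlockedValue;  // expect 'unlocked'
    slot->displaced_header = expected;                  // displaced header
    return mark.compare_exchange_strong(
        expected, reinterpret_cast<uintptr_t>(slot));
}
```

The CAS fails if another thread locked the object in the meantime, in which case the slow path takes over.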
>> 
>> This change basically reverses stack-locking: It still CASes the lowest two 
>> header bits to 00 to indicate 'fast-locked' but does *not* overload the 
>> upper bits with a stack-pointer. Instead, it pushes the object-reference to 
>> a thread-local lock-stack. This is a new structure which is basically a 
>> small array of oops that is associated with each thread. Experience shows 
>> that this array typically remains very small (3-5 elements). Using this lock 
>> stack, it is possible to query which threads own which locks. Most 
>> importantly, the most common question 'does the current thread own me?' is 
>> very quickly answered by doing a quick scan of the array. More complex 
>> queries like 'which thread owns X?' are not performed in very 
>> performance-critical paths (usually in code like JVMTI or deadlock 
>> detection) where it is ok to do more complex operations (and we already do). 
>> The lock-stack is also a new set of GC roots, and would be scanned during 
>> thread scanning, possibly concurrently, via the normal protocols.
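
A hedged sketch of the per-thread lock-stack described above (hypothetical names and sizes; not the actual HotSpot LockStack code):

```cpp
#include <cassert>
#include <cstddef>

// A small, growable per-thread array of object references. The common
// question 'does the current thread own obj?' is a linear scan, which is
// cheap because the stack typically holds only 3-5 entries.
class LockStack {
    static const int kInitial = 8;
    void** _base;
    int    _top;
    int    _capacity;
public:
    LockStack() : _base(new void*[kInitial]), _top(0), _capacity(kInitial) {}
    ~LockStack() { delete[] _base; }
    bool needs_growth() const { return _top == _capacity; }
    void push(void* obj) { assert(!needs_growth()); _base[_top++] = obj; }
    void pop(void* obj)  { assert(_top > 0 && _base[_top - 1] == obj); --_top; }
    bool contains(void* obj) const {   // 'does this thread own obj?'
        for (int i = 0; i < _top; i++)
            if (_base[i] == obj) return true;
        return false;
    }
};
```

Growth would be handled in the runtime slow path, which is why the compiled fast paths only need the `needs_growth()`-style check.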
>> 
>> The lock-stack is grown when needed. This means that we need to check for 
>> potential overflow before attempting locking. When that is the case, locking 
>> fast-paths would call into the runtime to grow the stack and handle the 
>> locking. Compiled fast-paths (C1 and C2 on x86_64 and aarch64) do this check 
>> on method entry to avoid (possibly many) such checks at locking sites.
>> 
>> In contrast to stack-locking, fast-locking does *not* support recursive 
>> locking (yet). When recursive locking occurs, the fast-lock gets inflated to a full 
>> monitor. It is not clear whether it is worth adding support for recursive 
>> fast-locking.
>> 
>> One trouble is that when a contending thread arrives at a fast-locked 
>> object, it must inflate the fast-lock to a full monitor. Normally, we need 
>> to know the current owning thread, and record that in the monitor, so that 
>> the contending thread can wait for the current owner to properly exit the 
>> monitor. However, fast-locking doesn't have this information. What we do 
>> instead is to record a special marker ANONYMOUS_OWNER. When the thread that 
>> currently holds the lock arrives at monitorexit, and observes 
>> ANONYMOUS_OWNER, it knows the owner must be itself, sets the owner to 
>> itself, and then properly exits the monitor, thus handing it over to the 
>> contending thread.
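
A minimal sketch of the ANONYMOUS_OWNER handover, with all real monitor mechanics (entry queues, wakeups, deflation) elided and hypothetical names throughout:

```cpp
#include <atomic>
#include <cassert>

struct Thread {};  // stand-in for JavaThread

// Sentinel owner value: 'somebody owns this, but we don't know who'.
static Thread* const ANONYMOUS_OWNER = reinterpret_cast<Thread*>(1);

struct Monitor { std::atomic<Thread*> owner{nullptr}; };

// Contending thread: inflates the fast-lock without knowing who owns it,
// so it records the ANONYMOUS_OWNER marker instead of a real thread.
void inflate_anonymously(Monitor* m) {
    m->owner.store(ANONYMOUS_OWNER);
}

// Owning thread at monitorexit: seeing ANONYMOUS_OWNER, it knows the
// owner must be itself, fixes the owner field, then exits properly,
// handing the monitor over to the contending thread.
void monitor_exit(Monitor* m, Thread* self) {
    if (m->owner.load() == ANONYMOUS_OWNER) {
        m->owner.store(self);  // fix the owner to be ourselves
    }
    m->owner.store(nullptr);   // proper exit (successor wakeup elided)
}
```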
>> 
>> As an alternative, I considered removing stack-locking altogether and only 
>> use heavy monitors. In most workloads this did not show measurable 
>> regressions. However, in a few workloads, I have observed severe 
>> regressions. All of them have been using old synchronized Java collections 
>> (Vector, Stack), StringBuffer or similar code. The combination of two 
>> conditions leads to regressions without stack- or fast-locking: 1. The 
>> workload synchronizes on uncontended locks (e.g. single-threaded use of 
>> Vector or StringBuffer) and 2. The workload churns such locks. IOW, 
>> uncontended use of Vector, StringBuffer, etc. as such is ok, but creating 
>> lots of such single-use, single-threaded-locked objects leads to massive 
>> ObjectMonitor churn, which can lead to a significant performance impact. But 
>> alas, such code exists, and we probably don't want to punish it if we can 
>> avoid it.
>> 
>> This change enables us to simplify (and speed up!) a lot of code:
>> 
>> - The inflation protocol is no longer necessary: we can directly CAS the 
>> (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now always be done in the fast-path, if the 
>> hashcode has been installed. Fast-locked headers can be used directly; for 
>> monitor-locked objects we can easily reach through to the displaced header. 
>> This is safe because Java threads participate in the monitor deflation protocol.
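
The second bullet can be illustrated with a small sketch (tag values, hash position, and names here are assumptions for illustration, not the real HotSpot mark-word layout):

```cpp
#include <cassert>
#include <cstdint>

constexpr uintptr_t kLockMask      = 0x3;
constexpr uintptr_t kMonitorTag    = 0x2;  // assumed 'has monitor' tag
constexpr uintptr_t kUnlockedValue = 0x1;
constexpr int       kHashShift     = 8;    // assumed hash bit position
constexpr uintptr_t kHashMask      = 0x7FFFFFFF;

struct ObjectMonitorSketch { uintptr_t displaced_header; };

// Fast-locked (00) and unlocked (01) headers carry the hash bits
// directly; a monitor-tagged header reaches through to the displaced
// header saved in the ObjectMonitor.
uintptr_t hash_from_mark(uintptr_t mark) {
    if ((mark & kLockMask) == kMonitorTag) {
        auto* mon = reinterpret_cast<ObjectMonitorSketch*>(mark & ~kLockMask);
        mark = mon->displaced_header;  // reach through
    }
    return (mark >> kHashShift) & kHashMask;
}
```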

Re: RFR: 8291555: Implement alternative fast-locking scheme [v29]

2023-03-27 Thread Dean Long
On Fri, 24 Mar 2023 06:15:31 GMT, David Holmes  wrote:

>> Roman Kennke has updated the pull request incrementally with two additional 
>> commits since the last revision:
>> 
>>  - Merge remote-tracking branch 'origin/JDK-8291555-v2' into JDK-8291555-v2
>>  - Set condition flags correctly after fast-lock call on aarch64
>
> src/hotspot/share/oops/oop.cpp line 126:
> 
>> 124:   // Outside of a safepoint, the header could be changing (for example,
>> 125:   // another thread could be inflating a lock on this object).
>> 126:   if (ignore_mark_word || UseFastLocking) {
> 
> Not clear why UseFastLocking appears here instead of in the return expression 
> - especially given the comment above.

I have the same question.  Based on the comment, I would expect
`return !SafepointSynchronize::is_at_safepoint() || UseFastLocking;`

-

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1149966288


Re: RFR: 8291555: Implement alternative fast-locking scheme [v29]

2023-03-28 Thread Dean Long
On Tue, 28 Mar 2023 10:53:10 GMT, Roman Kennke  wrote:

>> src/hotspot/share/runtime/threads.cpp line 1422:
>> 
>>> 1420: }
>>> 1421: 
>>> 1422: JavaThread* Threads::owning_thread_from_object(ThreadsList * t_list, 
>>> oop obj) {
>> 
>> Is this thread-safe?
>
> My last commit changed that code to only run during safepoints. It should be 
> safe now, and I added an assert that verifies that it is only run at 
> safepoint.

I see the assert in `owning_thread_from_monitor` but not 
`owning_thread_from_object`.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1150900864


Re: RFR: 8291555: Implement alternative fast-locking scheme [v47]

2023-03-31 Thread Dean Long
On Fri, 31 Mar 2023 07:25:48 GMT, Thomas Stuefe  wrote:

>> Roman Kennke has updated the pull request incrementally with one additional 
>> commit since the last revision:
>> 
>>   Use int instead of size_t for cached offsets, to match the uncached offset 
>> type and avoid build failures
>
> src/hotspot/cpu/aarch64/macroAssembler_aarch64.cpp line 6234:
> 
>> 6232:   orr(hdr, hdr, markWord::unlocked_value);
>> 6233:   // Clear lock-bits, into t2
>> 6234:   eor(t2, hdr, markWord::unlocked_value);
> 
> In arm, I use a combination of bic and orr instead. That gives me, with just 
> two instructions, added safety against someone handing in a "11" marked MW. I 
> know, should never happen, but better safe.
> 
> 
>   ldr(new_hdr, Address(obj, oopDesc::mark_offset_in_bytes()));
>   bic(new_hdr, new_hdr, markWord::lock_mask_in_place);  // new header (00)
>   orr(old_hdr, new_hdr, markWord::unlocked_value);  // old header (01)
> 
> (note that I moved MW loading down into MA::fast_lock for unrelated reasons).
> 
> Unfortunately, on aarch64 there seem to be no bic variants that accept 
> immediates. So it would take one more instruction to get the same result:
> 
> 
> -  // Load (object->mark() | 1) into hdr
> -  orr(hdr, hdr, markWord::unlocked_value);
> -  // Clear lock-bits, into t2
> -  eor(t2, hdr, markWord::unlocked_value);
> +  // Prepare new and old header
> +  mov(t2, markWord::lock_mask_in_place);
> +  bic(t2, hdr, t2);
> +  orr(hdr, t2, markWord::unlocked_value);
> 
> 
> But maybe there is a better way that does not need three instructions.

There is a BFC (Bitfield Clear) pseudo-instruction that uses the BFM 
instruction.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1154955795


Re: RFR: 8291555: Implement alternative fast-locking scheme [v56]

2023-04-11 Thread Dean Long
On Tue, 11 Apr 2023 15:29:16 GMT, Daniel D. Daugherty wrote:

>> OK. Given that I haven't looked at the rest of the patch, I leave it up to 
>> you and the Reviewers to figure out what to do about this code. Cheers.
>
> Given that the race with new lightweight locking is virtually the same as the 
> race
> with legacy stack locking, please do not put back the 'LockingMode == 2' check
> which would make `jmm_GetThreadInfo()` calls slower with new lightweight 
> locking
> than with legacy stack locking.
> 
> Perhaps I'm not understanding the risk of what @stefank means with:
> 
> It looks to me like the code could racily read the element just above 
> _top,
> which could contain a stale oop. If the address of the stale oop matches the
> address of o then contains would incorrectly return true.

The `_base` array is only initialized to nullptr in debug builds.  I don't see 
a release barrier in LockStack::push between the update to _base[] and the 
update to _top, nor a corresponding acquire barrier when reading.  Doesn't this 
mean it is possible to racily read an uninitialized junk oop value from 
_base[], especially on weak memory models?
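
A sketch of the ordering being asked about here, publishing the slot contents with a release store and reading `_top` with acquire so a racing reader never observes a count without the matching slot value (illustrative only, not the PR's actual code):

```cpp
#include <atomic>
#include <cassert>

struct LockStackSlots {
    void* _base[8];
    std::atomic<int> _top{0};

    void push(void* obj) {
        int t = _top.load(std::memory_order_relaxed);
        _base[t] = obj;                                // write slot first
        _top.store(t + 1, std::memory_order_release);  // then publish it
    }
    bool contains(void* obj) const {
        int t = _top.load(std::memory_order_acquire);  // pairs with release
        for (int i = 0; i < t; i++)
            if (_base[i] == obj) return true;
        return false;
    }
};
```

Without the release/acquire pair, a reader on a weak-memory machine could see the incremented `_top` before the store to `_base[t]` becomes visible.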

-

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1163243827


Re: RFR: 8291555: Implement alternative fast-locking scheme [v56]

2023-04-11 Thread Dean Long
On Tue, 11 Apr 2023 19:58:19 GMT, Roman Kennke  wrote:

>> The `_base` array is only initialized to nullptr in debug builds.  I don't 
>> see a release barrier in LockStack::push between the update to _base[] and 
>> the update to _top, nor a corresponding acquire barrier when reading.  
>> Doesn't this mean it is possible to racily read an uninitialized junk oop 
>> value from _base[], especially on weak memory models?
>
> Yes. The whole LockStack is not meant to be accessed cross-thread, pretty 
> much like any thread's stack is not meant to be accessed like that (including 
> current stack-locking). So what can go wrong?
> With the new locking, we could read junk and compare it to the oop that we're 
> testing against and get a wrong result. We're not going to crash though.
> With the current stack-locking, we would fetch the stack-pointer and check if 
> that address is within the foreign thread's stack. Again, because the other 
> thread is not holding still, we might get a wrong result, but we would not 
> crash.
> So I guess we need to answer the question whether or not jmm_GetThreadInfo() 
> is ok with returning a wrong result and what the consequences of this could be. 
> For example, how important is it that the info about the thread(s) is correct 
> and consistent (e.g. what happens if we report two threads both holding the 
> same lock?), etc. But I don't consider this to be part of this PR.
> 
> So my proposal is: leave that code as it is, for now (being racy when 
> inspecting foreign threads, but don't crash). Open a new issue to investigate 
> and possibly fix the problem (maybe by safepointing, maybe by handshaking if 
> that is enough, or maybe we find out we don't need to do anything). Add 
> comments in relevant places to point out the problem like you and David 
> suggested earlier. Would that be ok?

That seems fine to me, as long as we don't crash.  But my understanding is that 
Generational ZGC will crash if it sees a stale oop.  Isn't it possible that the 
racing read sees junk that looks to Generational ZGC like a stale oop?  To 
avoid this, unused slots may need to be set to nullptr even in product builds.  
But I'm not a GC expert so maybe there's no problem.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/10907#discussion_r1163306288
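
The two concerns above (missing release/acquire ordering between `_base[]` and
`_top`, and junk in unused slots looking like stale oops) can be sketched with a
minimal, hypothetical lock stack. The names `LockStack`, `_base`, and `_top`
come from the discussion; the atomics are plain C++11, not HotSpot's `Atomic`
API, and the sizing is illustrative:

```cpp
#include <atomic>

// Hypothetical stand-in for an oop; illustration only.
using oop = void*;

class LockStack {
  static const int CAPACITY = 8;
  oop _base[CAPACITY];
  std::atomic<int> _top{0};

public:
  LockStack() {
    // Initialize unused slots even in "product" mode so a racing reader
    // can never observe junk that looks like a stale oop.
    for (int i = 0; i < CAPACITY; i++) _base[i] = nullptr;
  }
  bool push(oop o) {
    int t = _top.load(std::memory_order_relaxed);
    if (t == CAPACITY) return false;   // full: caller takes the slow path (inflation)
    _base[t] = o;
    // Release: a reader that observes the new _top also observes _base[t].
    _top.store(t + 1, std::memory_order_release);
    return true;
  }
  oop pop() {
    int t = _top.load(std::memory_order_relaxed) - 1;
    oop o = _base[t];
    _base[t] = nullptr;                // clear the slot; no stale oop left behind
    _top.store(t, std::memory_order_release);
    return o;
  }
  bool owns(oop o) const {
    // Acquire pairs with the release in push() for a cross-thread scan.
    int t = _top.load(std::memory_order_acquire);
    for (int i = 0; i < t; i++) {
      if (_base[i] == o) return true;
    }
    return false;
  }
};
```

Only the single-threaded behavior is exercised below; whether such ordering (or
a safepoint/handshake instead) is actually required for cross-thread inspection
is exactly the open question in this thread.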


Re: RFR: 8291555: Implement alternative fast-locking scheme [v70]

2023-05-02 Thread Dean Long
On Tue, 2 May 2023 18:38:11 GMT, Roman Kennke  wrote:

>> This change adds a fast-locking scheme as an alternative to the current 
>> stack-locking implementation. It retains the advantages of stack-locking 
>> (namely fast locking in uncontended code-paths), while avoiding the overload 
>> of the mark word. That overloading causes massive problems with Lilliput, 
>> because it means we have to check and deal with this situation when trying 
>> to access the mark-word. And because of the very racy nature, this turns out 
>> to be very complex and would involve a variant of the inflation protocol to 
>> ensure that the object header is stable. (The current implementation of 
>> setting/fetching the i-hash provides a glimpse into the complexity).
>> 
>> What the original stack-locking does is basically to push a stack-lock onto 
>> the stack which consists only of the displaced header, and CAS a pointer to 
>> this stack location into the object header (the lowest two header bits being 
>> 00 indicate 'stack-locked'). The pointer into the stack can then be used to 
>> identify which thread currently owns the lock.
>> 
>> This change basically reverses stack-locking: It still CASes the lowest two 
>> header bits to 00 to indicate 'fast-locked' but does *not* overload the 
>> upper bits with a stack-pointer. Instead, it pushes the object-reference to 
>> a thread-local lock-stack. This is a new structure which is basically a 
>> small array of oops that is associated with each thread. Experience shows 
>> that this array typically remains very small (3-5 elements). Using this lock 
>> stack, it is possible to query which threads own which locks. Most 
>> importantly, the most common question 'does the current thread own me?' is 
>> very quickly answered by doing a quick scan of the array. More complex 
>> queries like 'which thread owns X?' are not performed in very 
>> performance-critical paths (usually in code like JVMTI or deadlock 
>> detection) where it is ok to do more complex operations (and we already do). 
>> The lock-stack is also a new set of GC roots, and would be scanned during 
>> thread scanning, possibly concurrently, via the normal 
 protocols.
>> 
>> The lock-stack is fixed size, currently with 8 elements. According to my 
>> experiments with various workloads, this covers the vast majority of 
>> workloads (in fact, most workloads seem to never exceed 5 active locks per 
>> thread at a time). We check for overflow in the fast-paths and when the 
>> lock-stack is full, we take the slow-path, which would inflate the lock to a 
>> monitor. That case should be very rare.
>> 
>> In contrast to stack-locking, fast-locking does *not* support recursive 
>> locking (yet). When that happens, the fast-lock gets inflated to a full 
>> monitor. It is not clear whether it is worth adding support for recursive 
>> fast-locking.
>> 
>> One trouble is that when a contending thread arrives at a fast-locked 
>> object, it must inflate the fast-lock to a full monitor. Normally, we need 
>> to know the current owning thread, and record that in the monitor, so that 
>> the contending thread can wait for the current owner to properly exit the 
>> monitor. However, fast-locking doesn't have this information. What we do 
>> instead is to record a special marker ANONYMOUS_OWNER. When the thread that 
>> currently holds the lock arrives at monitorexit, and observes 
>> ANONYMOUS_OWNER, it knows it must be itself, fixes the owner to be itself, 
>> and then properly exits the monitor, thereby handing it over to the contending 
>> thread.
>> 
>> As an alternative, I considered removing stack-locking altogether, and only 
>> use heavy monitors. In most workloads this did not show measurable 
>> regressions. However, in a few workloads, I have observed severe 
>> regressions. All of them have been using old synchronized Java collections 
>> (Vector, Stack), StringBuffer or similar code. The combination of two 
>> conditions leads to regressions without stack- or fast-locking: 1. The 
>> workload synchronizes on uncontended locks (e.g. single-threaded use of 
>> Vector or StringBuffer) and 2. The workload churns such locks. IOW, 
>> uncontended use of Vector, StringBuffer, etc as such is ok, but creating 
>> lots of such single-use, single-threaded-locked objects leads to massive 
>> ObjectMonitor churn, which can lead to a significant performance impact. But 
>> alas, such code exists, and we probably don't want to punish it if we can 
>> avoid it.
>> 
>> This change makes it possible to simplify (and speed up!) a lot of code:
>> 
>> - The inflation protocol is no longer necessary: we can directly CAS the 
>> (tagged) ObjectMonitor pointer to the object header.
>> - Accessing the hashcode could now be done in the fastpath always, if the 
>> hashcode has been installed. Fast-locked headers can be used directly, for 
>> monitor-locked objects we can easily reach-through to the displaced header. 
>> This is safe because Java threads partici
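
The fast path described above can be sketched as a simplified model. The bit
encodings, names, and 8-element capacity are illustrative stand-ins taken from
the description, not HotSpot's actual mark-word layout or code:

```cpp
#include <atomic>
#include <cstdint>

// Low two bits of the (simplified) mark word.
constexpr uintptr_t UNLOCKED    = 0b01;  // neutral header
constexpr uintptr_t FAST_LOCKED = 0b00;  // fast-locked, owner found via lock stacks

struct Object {
  std::atomic<uintptr_t> mark{UNLOCKED};
};

struct LockStack {
  Object* elems[8];                    // fixed size, per the description
  int top = 0;
  bool full() const { return top == 8; }
  void push(Object* o) { elems[top++] = o; }
};

enum LockResult { FAST_PATH, SLOW_PATH };

LockResult fast_lock(Object* o, LockStack& ls) {
  if (ls.full()) return SLOW_PATH;     // lock-stack overflow: inflate to a monitor
  uintptr_t expected = UNLOCKED;
  // CAS the low bits 01 -> 00; on success the object is fast-locked by us.
  if (o->mark.compare_exchange_strong(expected, FAST_LOCKED)) {
    ls.push(o);                        // record ownership on the thread-local lock stack
    return FAST_PATH;
  }
  return SLOW_PATH;                    // contended or already locked: inflate (not shown)
}
```

Note how recursion falls out naturally: a second `fast_lock` on the same object
fails the CAS and takes the slow path, matching the "no recursive fast-locking"
point above.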

Re: RFR: 8310948: Fix ignored-qualifiers warning in Hotspot

2023-06-28 Thread Dean Long
On Tue, 27 Jun 2023 12:22:43 GMT, Daniel Jeliński  wrote:

> Please review this attempt to fix ignored-qualifiers warning.
> 
> Example warnings:
> 
> src/hotspot/share/oops/method.hpp:413:19: warning: 'volatile' type qualifier 
> on return type has no effect [-Wignored-qualifiers]
>CompiledMethod* volatile code() const;
>^
> 
> 
> src/hotspot/share/jfr/periodic/jfrModuleEvent.cpp:65:20: warning: type 
> qualifiers ignored on cast result type [-Wignored-qualifiers]
> 65 |   event.set_source((const ModuleEntry* const)from_module);
>|^
> 
> 
> The proposed fix removes the ignored qualifiers.
> In a few AD files I replaced `const` with `constexpr` where I noticed that 
> the method is returning a compile-time constant, and other platforms use 
> `constexpr` on the same method.
> 
> Release, debug and slowdebug builds on Aarch64 / x64 and Mac / Linux complete 
> without errors. Cross-compile GHA builds also pass.

In the example, was the intention perhaps `volatile CompiledMethod*` instead of 
`CompiledMethod* volatile`?
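
The distinction behind that question: `volatile` qualifying the pointer member
itself is meaningful, but the same qualifier on a by-value return type is
ignored, which is what the warning flags. A minimal illustration with a
stand-in type (not HotSpot's `CompiledMethod`):

```cpp
struct CompiledMethodStub { int id; };

struct Holder {
  // 'volatile' qualifying the pointer member itself: every access re-reads it.
  CompiledMethodStub* volatile _code = nullptr;

  // 'volatile' on a by-value return type has no effect -- the returned copy
  // is just a temporary -- which is what -Wignored-qualifiers warns about.
  CompiledMethodStub* code() const { return _code; }

  // By contrast, pointer-to-volatile: accesses *through* the result are volatile.
  volatile CompiledMethodStub* code_to_volatile() const { return _code; }
};
```

So the answer to the question determines the fix: if the intent was "volatile
data", the qualifier moves left of the `*`; if the intent was "volatile member",
it belongs on the field, where it already takes effect.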

-

PR Comment: https://git.openjdk.org/jdk/pull/14674#issuecomment-1612090389


Re: RFR: 8291237: Encapsulate nmethod Deoptimization logic [v2]

2022-08-02 Thread Dean Long
On Mon, 1 Aug 2022 04:58:49 GMT, Axel Boldt-Christmas  wrote:

>> The proposal is to encapsulate the nmethod mark for deoptimization logic in 
>> one place and only allow access to the `mark_for_deoptimization` from a 
>> closure object:
>> ```C++
>> class DeoptimizationMarkerClosure : StackObj {
>> public:
>>   virtual void marker_do(Deoptimization::MarkFn mark_fn) = 0;
>> };
>> 
>> This closure takes a `MarkFn` which it uses to mark which nmethods should be 
>> deoptimized. This marking can only be done through the `MarkFn` and a 
>> `MarkFn` can only be created in the following code which runs the closure. 
>> ```C++
>> {
>>   NoSafepointVerifier nsv;
>>   assert_locked_or_safepoint(Compile_lock);
>>   marker_closure.marker_do(MarkFn());
>>   anything_deoptimized = deoptimize_all_marked();
>> }
>> if (anything_deoptimized) {
>>   run_deoptimize_closure();
>> }
>> 
>> This ensures that this logic is encapsulated and the `NoSafepointVerifier` 
>> and `assert_locked_or_safepoint(Compile_lock)` make it sound for 
>> `deoptimize_all_marked` to avoid scanning the whole code cache.
>> 
>> The exception to this pattern, from `InstanceKlass::unload_class`, is 
>> discussed in the JBS issue, and gives reasons why not marking for 
>> deoptimization there is ok.
>> 
>> An effect of this encapsulation is that the deoptimization logic was moved 
>> from the `CodeCache` class to the `Deoptimization` class and the class 
>> redefinition logic was moved from the `CodeCache` class to the 
>> `VM_RedefineClasses` class/operation. 
>> 
>> Testing: Tier 1-5
>> 
>> _Update_
>> ---
>> Switched to using a RAII object to track the context instead of putting 
>> code in a closure. But all the encapsulation is still the same. 
>> 
>> Testing: Tier 1-7
>> 
>> _Update_
>> ---
>>> @stefank suggested splitting out unloading klass logic change into a 
>>> separate issue [JDK-8291718](https://bugs.openjdk.org/browse/JDK-8291718).
>>> 
>>> Will probably also limit this PR to only encapsulation. (Skipping the 
>>> linked list optimisation) And create a separate issue for that as well.
>>> 
>>> But this creates a chain of three dependent issues. 
>>> [JDK-8291237](https://bugs.openjdk.org/browse/JDK-8291237) depends on 
>>> [JDK-8291718](https://bugs.openjdk.org/browse/JDK-8291718). And the link 
>>> list optimisation depend will depend on 
>>> [JDK-8291237](https://bugs.openjdk.org/browse/JDK-8291237).
>>> 
>>> Will mark this as a draft for now and create a PR for 
>>> [JDK-8291718](https://bugs.openjdk.org/browse/JDK-8291718) first.
>
> Axel Boldt-Christmas has updated the pull request incrementally with three 
> additional commits since the last revision:
> 
>  - Add assertions
>  - Fix marked logic
>  - Erik refactorings

Now that you are building a separate list of nmethods to deoptimize, what 
protects that list from nmethods being recycled by other threads?  Don't you 
need to hold the CodeCache_lock?

-

PR: https://git.openjdk.org/jdk/pull/9655


Re: RFR: 8291237: Encapsulate nmethod Deoptimization logic [v2]

2022-08-03 Thread Dean Long
On Mon, 1 Aug 2022 04:58:49 GMT, Axel Boldt-Christmas  wrote:

>> The proposal is to encapsulate the nmethod mark for deoptimization logic in 
>> one place and only allow access to the `mark_for_deoptimization` from a 
>> closure object:
>> ```C++
>> class DeoptimizationMarkerClosure : StackObj {
>> public:
>>   virtual void marker_do(Deoptimization::MarkFn mark_fn) = 0;
>> };
>> 
>> This closure takes a `MarkFn` which it uses to mark which nmethods should be 
>> deoptimized. This marking can only be done through the `MarkFn` and a 
>> `MarkFn` can only be created in the following code which runs the closure. 
>> ```C++
>> {
>>   NoSafepointVerifier nsv;
>>   assert_locked_or_safepoint(Compile_lock);
>>   marker_closure.marker_do(MarkFn());
>>   anything_deoptimized = deoptimize_all_marked();
>> }
>> if (anything_deoptimized) {
>>   run_deoptimize_closure();
>> }
>> 
>> This ensures that this logic is encapsulated and the `NoSafepointVerifier` 
>> and `assert_locked_or_safepoint(Compile_lock)` make it sound for 
>> `deoptimize_all_marked` to avoid scanning the whole code cache.
>> 
>> The exception to this pattern, from `InstanceKlass::unload_class`, is 
>> discussed in the JBS issue, and gives reasons why not marking for 
>> deoptimization there is ok.
>> 
>> An effect of this encapsulation is that the deoptimization logic was moved 
>> from the `CodeCache` class to the `Deoptimization` class and the class 
>> redefinition logic was moved from the `CodeCache` class to the 
>> `VM_RedefineClasses` class/operation. 
>> 
>> Testing: Tier 1-5
>> 
>> _Update_
>> ---
>> Switched to using a RAII object to track the context instead of putting 
>> code in a closure. But all the encapsulation is still the same. 
>> 
>> Testing: Tier 1-7
>> 
>> _Update_
>> ---
>>> @stefank suggested splitting out unloading klass logic change into a 
>>> separate issue [JDK-8291718](https://bugs.openjdk.org/browse/JDK-8291718).
>>> 
>>> Will probably also limit this PR to only encapsulation. (Skipping the 
>>> linked list optimisation) And create a separate issue for that as well.
>>> 
>>> But this creates a chain of three dependent issues. 
>>> [JDK-8291237](https://bugs.openjdk.org/browse/JDK-8291237) depends on 
>>> [JDK-8291718](https://bugs.openjdk.org/browse/JDK-8291718). And the linked 
>>> list optimisation will depend on 
>>> [JDK-8291237](https://bugs.openjdk.org/browse/JDK-8291237).
>>> 
>>> Will mark this as a draft for now and create a PR for 
>>> [JDK-8291718](https://bugs.openjdk.org/browse/JDK-8291718) first.
>
> Axel Boldt-Christmas has updated the pull request incrementally with three 
> additional commits since the last revision:
> 
>  - Add assertions
>  - Fix marked logic
>  - Erik refactorings

It's nmethod::flush() I'm worried about.  It uses CodeCache_lock, not 
Compile_lock.  It calls CodeCache::free() to destroy the nmethod.

-

PR: https://git.openjdk.org/jdk/pull/9655


Re: RFR: 8291237: Encapsulate nmethod Deoptimization logic [v2]

2022-08-04 Thread Dean Long
On Wed, 3 Aug 2022 20:42:59 GMT, Erik Österlund  wrote:

>> Axel Boldt-Christmas has updated the pull request incrementally with three 
>> additional commits since the last revision:
>> 
>>  - Add assertions
>>  - Fix marked logic
>>  - Erik refactorings
>
> Also, in DeoptimizationContext::deopt_compiled_methods, the SweeperBlocker 
> completely blocks out the sweeper from running. But as I mentioned, even 
> without that, without safepoint checks, we can never flush these things.
> It's worth mentioning that there used to be a special case for OSR nmethods 
> that they could be flushed immediately and skip the zombie step. But I 
> removed that a few years ago so we wouldn't have to think about that 
> pathological case separately.

@fisk, is there any chance that in the future we might figure out how to allow 
nmethods to be flushed without a safepoint?  Maybe with handshakes or some 
other clever improvement?  If no, then maybe a comment saying so would help.  
If yes, then are there some asserts we could add that would catch such a change?

-

PR: https://git.openjdk.org/jdk/pull/9655


Re: RFR: 8291237: Encapsulate nmethod Deoptimization logic [v2]

2022-08-08 Thread Dean Long
On Mon, 1 Aug 2022 04:58:49 GMT, Axel Boldt-Christmas  wrote:

>> The proposal is to encapsulate the nmethod mark for deoptimization logic in 
>> one place and only allow access to the `mark_for_deoptimization` from a 
>> closure object:
>> ```C++
>> class DeoptimizationMarkerClosure : StackObj {
>> public:
>>   virtual void marker_do(Deoptimization::MarkFn mark_fn) = 0;
>> };
>> 
>> This closure takes a `MarkFn` which it uses to mark which nmethods should be 
>> deoptimized. This marking can only be done through the `MarkFn` and a 
>> `MarkFn` can only be created in the following code which runs the closure. 
>> ```C++
>> {
>>   NoSafepointVerifier nsv;
>>   assert_locked_or_safepoint(Compile_lock);
>>   marker_closure.marker_do(MarkFn());
>>   anything_deoptimized = deoptimize_all_marked();
>> }
>> if (anything_deoptimized) {
>>   run_deoptimize_closure();
>> }
>> 
>> This ensures that this logic is encapsulated and the `NoSafepointVerifier` 
>> and `assert_locked_or_safepoint(Compile_lock)` make it sound for 
>> `deoptimize_all_marked` to avoid scanning the whole code cache.
>> 
>> The exception to this pattern, from `InstanceKlass::unload_class`, is 
>> discussed in the JBS issue, and gives reasons why not marking for 
>> deoptimization there is ok.
>> 
>> An effect of this encapsulation is that the deoptimization logic was moved 
>> from the `CodeCache` class to the `Deoptimization` class and the class 
>> redefinition logic was moved from the `CodeCache` class to the 
>> `VM_RedefineClasses` class/operation. 
>> 
>> Testing: Tier 1-5
>> 
>> _Update_
>> ---
>> Switched to using a RAII object to track the context instead of putting 
>> code in a closure. But all the encapsulation is still the same. 
>> 
>> Testing: Tier 1-7
>> 
>> _Update_
>> ---
>>> @stefank suggested splitting out unloading klass logic change into a 
>>> separate issue [JDK-8291718](https://bugs.openjdk.org/browse/JDK-8291718).
>>> 
>>> Will probably also limit this PR to only encapsulation. (Skipping the 
>>> linked list optimisation) And create a separate issue for that as well.
>>> 
>>> But this creates a chain of three dependent issues. 
>>> [JDK-8291237](https://bugs.openjdk.org/browse/JDK-8291237) depends on 
>>> [JDK-8291718](https://bugs.openjdk.org/browse/JDK-8291718). And the linked 
>>> list optimisation will depend on 
>>> [JDK-8291237](https://bugs.openjdk.org/browse/JDK-8291237).
>>> 
>>> Will mark this as a draft for now and create a PR for 
>>> [JDK-8291718](https://bugs.openjdk.org/browse/JDK-8291718) first.
>
> Axel Boldt-Christmas has updated the pull request incrementally with three 
> additional commits since the last revision:
> 
>  - Add assertions
>  - Fix marked logic
>  - Erik refactorings

If the nmethod shouldn't be on the marked list at certain state transitions, 
how about adding asserts that check this?  Using the same "self-looped" trick 
that the Sweeper removal PR uses might help with that.

-

PR: https://git.openjdk.org/jdk/pull/9655


Re: RFR: 8291237: Encapsulate nmethod Deoptimization logic [v6]

2022-08-18 Thread Dean Long
On Thu, 18 Aug 2022 08:18:27 GMT, Axel Boldt-Christmas  
wrote:

>> The proposal is to encapsulate the nmethod mark for deoptimization logic in 
>> one place and only allow access to the `mark_for_deoptimization` from a 
>> closure object:
>> ```C++
>> class DeoptimizationMarkerClosure : StackObj {
>> public:
>>   virtual void marker_do(Deoptimization::MarkFn mark_fn) = 0;
>> };
>> 
>> This closure takes a `MarkFn` which it uses to mark which nmethods should be 
>> deoptimized. This marking can only be done through the `MarkFn` and a 
>> `MarkFn` can only be created in the following code which runs the closure. 
>> ```C++
>> {
>>   NoSafepointVerifier nsv;
>>   assert_locked_or_safepoint(Compile_lock);
>>   marker_closure.marker_do(MarkFn());
>>   anything_deoptimized = deoptimize_all_marked();
>> }
>> if (anything_deoptimized) {
>>   run_deoptimize_closure();
>> }
>> 
>> This ensures that this logic is encapsulated and the `NoSafepointVerifier` 
>> and `assert_locked_or_safepoint(Compile_lock)` make it sound for 
>> `deoptimize_all_marked` to avoid scanning the whole code cache.
>> 
>> The exception to this pattern, from `InstanceKlass::unload_class`, is 
>> discussed in the JBS issue, and gives reasons why not marking for 
>> deoptimization there is ok.
>> 
>> An effect of this encapsulation is that the deoptimization logic was moved 
>> from the `CodeCache` class to the `Deoptimization` class and the class 
>> redefinition logic was moved from the `CodeCache` class to the 
>> `VM_RedefineClasses` class/operation. 
>> 
>> Testing: Tier 1-5
>> 
>> _Update_
>> ---
>> Switched to using a RAII object to track the context instead of putting 
>> code in a closure. But all the encapsulation is still the same. 
>> 
>> Testing: Tier 1-7
>> 
>> _Update_
>> ---
>>> @stefank suggested splitting out unloading klass logic change into a 
>>> separate issue [JDK-8291718](https://bugs.openjdk.org/browse/JDK-8291718).
>>> 
>>> Will probably also limit this PR to only encapsulation. (Skipping the 
>>> linked list optimisation) And create a separate issue for that as well.
>>> 
>>> But this creates a chain of three dependent issues. 
>>> [JDK-8291237](https://bugs.openjdk.org/browse/JDK-8291237) depends on 
>>> [JDK-8291718](https://bugs.openjdk.org/browse/JDK-8291718). And the linked 
>>> list optimisation will depend on 
>>> [JDK-8291237](https://bugs.openjdk.org/browse/JDK-8291237).
>>> 
>>> Will mark this as a draft for now and create a PR for 
>>> [JDK-8291718](https://bugs.openjdk.org/browse/JDK-8291718) first.
>
> Axel Boldt-Christmas has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Cleanup comment

src/hotspot/share/code/compiledMethod.cpp line 128:

> 126:   if (old_status < new_status) {
> 127: if (old_status == not_enqueued) {
> 128:   
> assert(extract_enqueued_deoptimization_method(_enqueued_deoptimization_link) 
> == nullptr, "Compiled Method should not already be linked");

This doesn't work for the tail of the list, does it?  That's why I suggested 
making the tail loop back to itself.

-

PR: https://git.openjdk.org/jdk/pull/9655


Re: RFR: 8291237: Encapsulate nmethod Deoptimization logic [v6]

2022-08-18 Thread Dean Long
On Thu, 18 Aug 2022 09:32:54 GMT, Axel Boldt-Christmas  
wrote:

>> src/hotspot/share/code/compiledMethod.cpp line 128:
>> 
>>> 126:   if (old_status < new_status) {
>>> 127: if (old_status == not_enqueued) {
>>> 128:   
>>> assert(extract_enqueued_deoptimization_method(_enqueued_deoptimization_link)
>>>  == nullptr, "Compiled Method should not already be linked");
>> 
>> This doesn't work for the tail of the list, does it?  That's why I suggested 
>> making the tail loop back to itself
>> 
>> I would also like to see similar asserts added to make sure we don't recycle 
>> an nmethod that is still on the list, perhaps in nmethod::flush().
>
> I think you have to describe the scenario that does not work, because I am 
> not sure I see it.
> 
> For ease of writing, let me call the currently embedded status `status` and 
> the currently embedded link `next_link` 
> (`=>` means implies here)
> 
> The assert checks this invariant: `status == not_enqueued => next_link == 
> nullptr`
> 
> After adding the first method (which will be the tail) to the list (`root == 
> nullptr`) we will have:
>  * `status > not_enqueued`
>  * `next_link == nullptr`
>  
> After adding any method after that ( `root != nullptr`) we will have:
>  * `status > not_enqueued`
>  * `next_link == previous_root`
>  
>  And there is also a fresh method
>  * `status == not_enqueued`
>  * `next_link == nullptr`
> 
> If we try to enqueue an already enqueued method (calling 
> `enqueue_deoptimization` twice) the method will have  `status != 
> not_enqueued` and will set `next_link == next_link` if `new_status > status`. 
> (So just update the status to a stronger variant, but not change the link)
> 
> All of these possibilities upholds the invariant. 
> 
> Maybe you are looking for some more invariant checks, like
> `status > not_enqueued => InList(method)` which can be asserted by setting 
> `next_link == this` for the tail (when  `root == nullptr`) and assert that 
> `next_link != nullptr` if `status > not_enqueued`. But it just seems like we 
> are adding another flag for something we already check, essentially 
> `next_link != nullptr` iff `status > not_enqueued`.
> 
> There is currently an assert when we iterate over the list; that the number 
> of methods we find in the list is equal to number of methods we enqueued.

Yes, something like (status == not_enqueued) == !InList(method), which is 
stronger than the current =>, and would be a sanity check on the status.

> There is currently an assert when we iterate over the list; that the number 
> of methods we find in the list is equal to number of methods we enqueued.

That wouldn't necessarily catch accessing stale nmethod memory that has been 
released by CodeCache::free(), would it?

-

PR: https://git.openjdk.org/jdk/pull/9655
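
The "self-looped tail" trick discussed in this exchange can be sketched as an
intrusive list in which the tail links back to itself, making `link != nullptr`
exactly equivalent to "the node is enqueued". This is a hypothetical sketch;
the names do not match HotSpot's `_enqueued_deoptimization_link` machinery:

```cpp
#include <cassert>

// Intrusive singly-linked list where the tail points to itself,
// so _link != nullptr  <=>  the node is on the list.
struct Node {
  Node* _link = nullptr;
  bool is_enqueued() const { return _link != nullptr; }
};

struct Queue {
  Node* _head = nullptr;
  void enqueue(Node* n) {
    assert(!n->is_enqueued());                  // invariant: not already linked
    n->_link = (_head != nullptr) ? _head : n;  // first node becomes a self-looped tail
    _head = n;
  }
  int count() const {
    int c = 0;
    for (Node* n = _head; n != nullptr; ) {
      c++;
      Node* next = n->_link;
      if (next == n) break;                     // reached the self-looped tail
      n = next;
    }
    return c;
  }
};
```

With a plain `nullptr`-terminated list, the tail is indistinguishable from an
unlinked node, which is the hole in the assert being debated above; the
self-loop closes it at the cost of a slightly odd traversal termination.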


Re: RFR: 8292758: put support for UNSIGNED5 format into its own header file [v2]

2022-09-01 Thread Dean Long
On Fri, 2 Sep 2022 00:04:17 GMT, John R Rose  wrote:

>> Refactor code from inside of CompressedStream into its own unit.
>> 
>> This code is likely to be used in future refactorings, such as JDK-8292818 
>> (replace 96-bit representation for field metadata with variable-sized 
>> streams).
>> 
>> Add gtests.
>
> John R Rose has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains two additional 
> commits since the last revision:
> 
>  - Merge branch 'master' of https://git.openjdk.org/jdk into compressed-stream
>  - 8292758: put support for UNSIGNED5 format into its own header file

src/hotspot/share/code/compressedStream.cpp line 85:

> 83:   const int min_expansion = UNSIGNED5::MAX_LENGTH;
> 84:   if (nsize < min_expansion*2)
> 85: nsize = min_expansion*2;

It's not clear if this is needed or just an optimization.  Maybe add a comment. 
Also, using MAX2 might be clearer.

-

PR: https://git.openjdk.org/jdk/pull/10067
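
The clamp under review is equivalent to taking a maximum. A minimal sketch of
the grow logic, with the buffer management stripped away and the `MAX_LENGTH`
value assumed for illustration:

```cpp
#include <algorithm>

// An UNSIGNED5-encoded value occupies at most this many bytes (assumed here).
constexpr int MAX_LENGTH = 5;

// Ensure the grown buffer leaves room for at least two maximal encodings.
int grown_size(int nsize) {
  const int min_expansion = MAX_LENGTH;
  // Equivalent to: if (nsize < min_expansion*2) nsize = min_expansion*2;
  // i.e. a maximum -- MAX2 would be the matching HotSpot macro.
  return std::max(nsize, min_expansion * 2);
}
```

Whether the clamp is a correctness requirement (room for a worst-case encoding
plus terminator) or just avoids tiny reallocations is the question the comment
request above is meant to settle.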


Re: RFR: 8292758: put support for UNSIGNED5 format into its own header file [v2]

2022-09-02 Thread Dean Long
On Fri, 2 Sep 2022 00:04:17 GMT, John R Rose  wrote:

>> Refactor code from inside of CompressedStream into its own unit.
>> 
>> This code is likely to be used in future refactorings, such as JDK-8292818 
>> (replace 96-bit representation for field metadata with variable-sized 
>> streams).
>> 
>> Add gtests.
>
> John R Rose has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains two additional 
> commits since the last revision:
> 
>  - Merge branch 'master' of https://git.openjdk.org/jdk into compressed-stream
>  - 8292758: put support for UNSIGNED5 format into its own header file

I'm concerned about the "excluded bytes" change.  It appears to change the 
encoding, which would mean that SA would not be able to read streams from 
earlier JVMs, which may or may not be a concern.  I suggest splitting this up 
into the straightforward refactor (without the excluded bytes change), then 
adding excluded bytes as a follow-up.

-

Changes requested by dlong (Reviewer).

PR: https://git.openjdk.org/jdk/pull/10067


Re: RFR: 8292758: put support for UNSIGNED5 format into its own header file [v2]

2022-09-03 Thread Dean Long
On Fri, 2 Sep 2022 00:04:17 GMT, John R Rose  wrote:

>> Refactor code from inside of CompressedStream into its own unit.
>> 
>> This code is likely to be used in future refactorings, such as JDK-8292818 
>> (replace 96-bit representation for field metadata with variable-sized 
>> streams).
>> 
>> Add gtests.
>
> John R Rose has updated the pull request with a new target base due to a 
> merge or a rebase. The incremental webrev excludes the unrelated changes 
> brought in by the merge/rebase. The pull request contains two additional 
> commits since the last revision:
> 
>  - Merge branch 'master' of https://git.openjdk.org/jdk into compressed-stream
>  - 8292758: put support for UNSIGNED5 format into its own header file

Wouldn't the SA stack walk fail when attaching to a core dump from an earlier 
JVM that does not exclude nulls?

-

PR: https://git.openjdk.org/jdk/pull/10067


Re: RFR: 8292758: put support for UNSIGNED5 format into its own header file [v5]

2022-09-07 Thread Dean Long
On Wed, 7 Sep 2022 01:05:38 GMT, John R Rose  wrote:

>> Refactor code from inside of CompressedStream into its own unit.
>> 
>> This code is likely to be used in future refactorings, such as JDK-8292818 
>> (replace 96-bit representation for field metadata with variable-sized 
>> streams).
>> 
>> Add gtests.
>
> John R Rose has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   add missing "this->"

Marked as reviewed by dlong (Reviewer).

-

PR: https://git.openjdk.org/jdk/pull/10067


Re: RFR: 8313785: Fix -Wconversion warnings in prims code

2023-08-04 Thread Dean Long
On Fri, 4 Aug 2023 14:37:07 GMT, Coleen Phillimore  wrote:

> This patch fixes Wconversion in code in the src/hotspot/share/prims 
> directory.  Most of the changes correct the types.  jfieldID's are created 
> with the int offset returned by InstanceKlass::field_offset().
>  int field_offset  (int index) const { return 
> field(index).offset(); }
> 
> So when we get the field offset back, it's an int.
> 
> Also stackwalker passes jlong, so made that consistent.
> 
> Tested with tier1 on Oracle supported platforms.

src/hotspot/share/prims/jni.cpp line 209:

> 207: 
> 208: 
> 209: intptr_t jfieldIDWorkaround::encode_klass_hash(Klass* k, intptr_t 
> offset) {

The caller always passes an int, so why not change the parameter to int?
Suggestion:

intptr_t jfieldIDWorkaround::encode_klass_hash(Klass* k, int offset) {

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15160#discussion_r1284851860


Re: RFR: 8313785: Fix -Wconversion warnings in prims code

2023-08-04 Thread Dean Long
On Fri, 4 Aug 2023 14:37:07 GMT, Coleen Phillimore  wrote:

> This patch fixes Wconversion in code in the src/hotspot/share/prims 
> directory.  Most of the changes correct the types.  jfieldID's are created 
> with the int offset returned by InstanceKlass::field_offset().
>  int field_offset  (int index) const { return 
> field(index).offset(); }
> 
> So when we get the field offset back, it's an int.
> 
> Also stackwalker passes jlong, so made that consistent.
> 
> Tested with tier1 on Oracle supported platforms.

src/hotspot/share/prims/jni.cpp line 1906:

> 1904: o = JvmtiExport::jni_SetField_probe(thread, obj, o, k, fieldID, 
> false, SigType, (jvalue *)&field_value); \
> 1905:   } \
> 1906:   if (SigType == JVM_SIGNATURE_BOOLEAN) { value = 
> (Argument)(((jboolean)value) & 1); } \

This is already done in bool_field_put(), so it is redundant here.  I think it 
can be safely removed.

src/hotspot/share/prims/jni.cpp line 2099:

> 2097: JvmtiExport::jni_SetField_probe(thread, nullptr, nullptr, 
> id->holder(), fieldID, true, SigType, (jvalue *)&field_value); \
> 2098:   } \
> 2099:   if (SigType == JVM_SIGNATURE_BOOLEAN) { value = 
> (Argument)(((jboolean)value) & 1); } \

Same as above.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15160#discussion_r1284873690
PR Review Comment: https://git.openjdk.org/jdk/pull/15160#discussion_r1284873793
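
Background for the masking being called redundant: a `jboolean` arriving from
native code may carry any byte value, and `& 1` canonicalizes it to 0 or 1; the
point above is that the field-store path is described as already doing this in
`bool_field_put()`. A generic illustration (the typedef is a stand-in, not
taken from `jni.h`):

```cpp
#include <cstdint>

using jboolean = uint8_t;  // stand-in for the JNI typedef

// Canonicalize an arbitrary byte to 0 or 1, as the store path does.
jboolean normalize_jboolean(jboolean v) {
  return v & 1;
}
```

Note that `& 1` maps 2 to 0 but 255 to 1, so it normalizes representation
rather than truthiness; doing it once, at the actual store, keeps the behavior
in a single place.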


Re: RFR: 8313785: Fix -Wconversion warnings in prims code

2023-08-04 Thread Dean Long
On Fri, 4 Aug 2023 14:37:07 GMT, Coleen Phillimore  wrote:

> This patch fixes Wconversion in code in the src/hotspot/share/prims 
> directory.  Most of the changes correct the types.  jfieldID's are created 
> with the int offset returned by InstanceKlass::field_offset().
>  int field_offset  (int index) const { return 
> field(index).offset(); }
> 
> So when we get the field offset back, it's an int.
> 
> Also stackwalker passes jlong, so made that consistent.
> 
> Tested with tier1 on Oracle supported platforms.

src/hotspot/share/prims/jvm.cpp line 612:

> 610:   // as implemented in the classic virtual machine; return 0 if object 
> is null
> 611:   return handle == nullptr ? 0 :
> 612:  checked_cast(ObjectSynchronizer::FastHashCode (THREAD, 
> JNIHandles::resolve_non_null(handle)));

What about making FastHashCode return jint?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15160#discussion_r1284876719


Re: RFR: 8313785: Fix -Wconversion warnings in prims code

2023-08-04 Thread Dean Long
On Fri, 4 Aug 2023 14:37:07 GMT, Coleen Phillimore  wrote:

> This patch fixes Wconversion in code in the src/hotspot/share/prims 
> directory.  Most of the changes correct the types.  jfieldID's are created 
> with the int offset returned by InstanceKlass::field_offset().
>  int field_offset  (int index) const { return 
> field(index).offset(); }
> 
> So when we get the field offset back, it's an int.
> 
> Also stackwalker passes jlong, so made that consistent.
> 
> Tested with tier1 on Oracle supported platforms.

src/hotspot/share/prims/methodHandles.cpp line 910:

> 908:   InstanceKlass* defc = 
> InstanceKlass::cast(java_lang_Class::as_Klass(clazz));
> 909:   DEBUG_ONLY(clazz = nullptr);  // safety
> 910:   intptr_t vmindex  = java_lang_invoke_MemberName::vmindex(mname());

Why not do the checked_cast here instead of below?
Ideally, the vmindex field would be changed to int (field offsets, 
itable/vtable indexes are all int), but that's more work.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15160#discussion_r1284883812


Re: RFR: 8313785: Fix -Wconversion warnings in prims code

2023-08-04 Thread Dean Long
On Fri, 4 Aug 2023 14:37:07 GMT, Coleen Phillimore  wrote:

> This patch fixes Wconversion in code in the src/hotspot/share/prims 
> directory.  Most of the changes correct the types.  jfieldID's are created 
> with the int offset returned by InstanceKlass::field_offset().
>  int field_offset  (int index) const { return 
> field(index).offset(); }
> 
> So when we get the field offset back, it's an int.
> 
> Also stackwalker passes jlong, so made that consistent.
> 
> Tested with tier1 on Oracle supported platforms.

Marked as reviewed by dlong (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/15160#pullrequestreview-1563625051


Re: RFR: 8313785: Fix -Wconversion warnings in prims code [v2]

2023-08-07 Thread Dean Long
On Fri, 4 Aug 2023 23:24:45 GMT, Coleen Phillimore  wrote:

> Why not? It's the same and casts the int where it's needed to be an int.
To me the checked_cast looks less ugly in the initialization vs at the use 
sites, but in this case with only one use it doesn't really matter.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/15160#discussion_r1286536069


Re: RFR: 8318895: Deoptimization results in incorrect lightweight locking stack

2023-11-08 Thread Dean Long
On Wed, 8 Nov 2023 19:00:53 GMT, Roman Kennke  wrote:

> See JBS issue for details.
> 
> I basically:
>  - took the test-modification and turned it into its own test-case
>  - added test runners for lightweight- and legacy-locking, so that we keep 
> testing both, no matter what is the default
>  - added Axel's fix (mentioned in the JBS issue) with the modification to only 
> inflate when exec_mode == Unpack_none, as explained by Richard.
> 
> Testing:
>  - [x] EATests.java
>  - [x] tier1
>  - [ ] tier2

Marked as reviewed by dlong (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/16568#pullrequestreview-1721539402


Re: RFR: 8314029: Add file name parameter to Compiler.perfmap [v8]

2023-12-13 Thread Dean Long
On Mon, 11 Dec 2023 22:41:56 GMT, Yi-Fan Tsai  wrote:

>> `jcmd Compiler.perfmap` uses the hard-coded file name for a perf map: 
>> `/tmp/perf-%d.map`. This change adds an optional argument for specifying a 
>> file name.
>> 
>> `jcmd PID help Compiler.perfmap` shows the following usage.
>> 
>> 
>> Compiler.perfmap
>> Write map file for Linux perf tool.
>> 
>> Impact: Low
>> 
>> Syntax : Compiler.perfmap  []
>> 
>> Arguments:
>> filename : [optional] Name of the map file (STRING, no default value)
>> 
>> 
>> The following section of man page is also updated. (`man -l 
>> src/jdk.jcmd/share/man/jcmd.1`)
>> 
>> 
>>Compiler.perfmap [arguments] (Linux only)
>>   Write map file for Linux perf tool.
>> 
>>   Impact: Low
>> 
>>   arguments:
>> 
>>   · filename: (Optional) Name of the map file (STRING, no 
>> default value)
>> 
>>   If filename is not specified, a default file name is chosen 
>> using the pid of the target JVM process.  For example, if the pid is 12345,  
>> then
>>   the default filename will be /tmp/perf-12345.map.
>
> Yi-Fan Tsai has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Update copyright of PerfMapTest

The man page says "no default value" but then right below describes the default 
value, which is confusing.  I would remove "no default value".
The code already deals with patterns, so why not allow a pattern like 
/dir/perf-%x.map and document that the platform-specific process id will be 
passed to String.format() to expand any formatting tokens in the string?

-

PR Comment: https://git.openjdk.org/jdk/pull/15871#issuecomment-1854683037


Re: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v2]

2024-01-22 Thread Dean Long
On Mon, 22 Jan 2024 21:26:41 GMT, Volker Simonis  wrote:

>> Currently we don't record dependencies on redefined methods (i.e. 
>> `evol_method` dependencies) in JIT compiled methods if none of the 
>> `can_redefine_classes`, `can_retransform_classes` or 
>> `can_generate_breakpoint_events` JVMTI capabilities is set. This means that 
>> if a JVMTI agent which requests one of these capabilities is dynamically 
>> attached, all the methods which have been JIT compiled until that point, 
>> will be marked for deoptimization and flushed from the code cache. For 
>> large, warmed-up applications this means deoptimization and instant 
>> recompilation of thousands if not tens of thousands of methods, which can lead 
>> to dramatic performance/latency drops for several minutes.
>> 
>> One could argue that dynamic agent attach is now deprecated anyway (see [JEP 
>> 451: Prepare to Disallow the Dynamic Loading of 
>> Agents](https://openjdk.org/jeps/451)) and this problem could be solved by 
>> making the recording of `evol_method` dependencies dependent on the new 
>> `-XX:+EnableDynamicAgentLoading` flag instead of the concrete JVMTI 
>> capabilities (because the presence of the flag indicates that an agent will 
>> be loaded eventually).
>> 
>> But there is a single, however important, exception to this rule and that's JFR. 
>> JFR is advertised as a low-overhead profiler which can be enabled in 
>> production at any time. However, when JFR is started dynamically (e.g. 
>> through JCMD or JMX) it will silently load a HotSpot internal JVMTI agent 
>> which requests the `can_retransform_classes` capability and retransforms some classes. 
>> This will inevitably trigger the deoptimization of all compiled methods as 
>> described above.
>> 
>> I'd therefore like to propose to *always* and unconditionally record 
>> `evol_method` dependencies in JIT compiled code by exporting the relevant 
>> properties right at startup in `init_globals()`:
>> ```c++
>>  jint init_globals() {
>>management_init();
>>JvmtiExport::initialize_oop_storage();
>> +#if INCLUDE_JVMTI
>> +  JvmtiExport::set_can_hotswap_or_post_breakpoint(true);
>> +  JvmtiExport::set_all_dependencies_are_recorded(true);
>> +#endif
>> 
>> 
>> My measurements indicate that the overhead of doing so is minimal (around 1% 
>> increase of nmethod size) and justifies the benefit. E.g. a Spring Petclinic 
>> application started with `-Xbatch -XX:+UnlockDiagnosticVMOptions 
>> -XX:+LogCompilation` compiles about ~11500 methods (~9000 with C1 and ~2500 
>> with C2) resulting in an aggregated nmethod size of around ~40bm. 
>> Additionally recording `evol_method` dependencies only increases this size 
>> be about 400kb
>
> Volker Simonis has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Guard the feature with a diagnostic option and update the comments in the 
> code

src/hotspot/share/runtime/init.cpp line 121:

> 119:   if (AlwaysRecordEvolDependencies) {
> 120: JvmtiExport::set_can_hotswap_or_post_breakpoint(true);
> 121: JvmtiExport::set_all_dependencies_are_recorded(true);

I think the check for AlwaysRecordEvolDependencies needs to be moved into 
set_can_hotswap_or_post_breakpoint and set_all_dependencies_are_recorded, 
otherwise don't we risk the value being accidentally reset to false when 
set_can_hotswap_or_post_breakpoint() is called again by 
JvmtiManageCapabilities::update()?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17509#discussion_r1462685351


Re: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v2]

2024-01-23 Thread Dean Long
On Tue, 23 Jan 2024 16:37:35 GMT, Volker Simonis  wrote:

>> src/hotspot/share/runtime/init.cpp line 121:
>> 
>>> 119:   if (AlwaysRecordEvolDependencies) {
>>> 120: JvmtiExport::set_can_hotswap_or_post_breakpoint(true);
>>> 121: JvmtiExport::set_all_dependencies_are_recorded(true);
>> 
>> I think the check for AlwaysRecordEvolDependencies needs to be moved into 
>> set_can_hotswap_or_post_breakpoint and set_all_dependencies_are_recorded, 
>> otherwise don't we risk the value being accidentally reset to false when 
>> set_can_hotswap_or_post_breakpoint() is called again by 
>> JvmtiManageCapabilities::update()?
>
> A good question, but after deep digging (it took me quite some time to figure 
> this out myself :) I don't think that 
> `can_hotswap_or_post_breakpoint`/`all_dependencies_are_recorded` can ever be 
> reset. Here's how it works:
> 
> - there's a global set of `always_capabilities` which is initialized when the 
> first JVMTI environment is created:
> 
> // capabilities which are always potentially available
> jvmtiCapabilities JvmtiManageCapabilities::always_capabilities;
> 
> void JvmtiManageCapabilities::initialize() {
>   _capabilities_lock = new Mutex(Mutex::nosafepoint, "Capabilities_lock");
>   always_capabilities = init_always_capabilities();
> 
> 
> - as the name implies, this set of capabilities contains all generally 
> available capabilities (except for the `onload` and `solo` capabilities, which 
> are maintained in separate sets).
> - `JvmtiManageCapabilities::update()` only operates on these global sets of 
> capabilities (and *not* on the concrete capabilities of a specific JVMTI 
> environment):
> 
> void JvmtiManageCapabilities::update() {
>   jvmtiCapabilities avail;
>   // all capabilities
>   either(&always_capabilities, &always_solo_capabilities, &avail);
> ...
>   // If can_redefine_classes is enabled in the onload phase then we know that 
> the
>   // dependency information recorded by the compiler is complete.
>   if ((avail.can_redefine_classes || avail.can_retransform_classes) &&
>   JvmtiEnv::get_phase() == JVMTI_PHASE_ONLOAD) {
> JvmtiExport::set_all_dependencies_are_recorded(true);
>   }
> ...
>   JvmtiExport::set_can_hotswap_or_post_breakpoint(
> avail.can_generate_breakpoint_events ||
> avail.can_redefine_classes ||
> avail.can_retransform_classes);
> 
> - This means that `JvmtiManageCapabilities::update()` is always conservative 
> regarding its exports to `JvmtiExport` as it always exports *all* potentially 
> available capabilities once a JVMTI environment is created and not only the 
> specific capabilities requested by concrete JVMTI environment. This means 
> that even if we create a JVMTI agent with an empty set of capabilities, 
> `can_hotswap_or_post_breakpoint`/`all_dependencies_are_recorded` will still 
> be set in `JvmtiExport`.
> - There's no code path (at least I couldn't find one) which takes 
> capabilities away from the `always_capabilities` set. Only if an environment 
> requests `on_load` capabilities, they will be transferred from the global 
> `onload_capabilities` to the `always_capabilities` set:
> 
> jvmtiError JvmtiManageCapabilities::add_capabilities(const jvmtiCapabilities 
> *current,
>  ...

An assert works for me.  Thanks.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17509#discussion_r1464120039


Re: RFR: 8324241: Always record evol_method deps to avoid excessive method flushing [v4]

2024-01-25 Thread Dean Long
On Wed, 24 Jan 2024 14:48:52 GMT, Volker Simonis  wrote:

>> Currently we don't record dependencies on redefined methods (i.e. 
>> `evol_method` dependencies) in JIT compiled methods if none of the 
>> `can_redefine_classes`, `can_retransform_classes` or 
>> `can_generate_breakpoint_events` JVMTI capabalities is set. This means that 
>> if a JVMTI agent which requests one of these capabilities is dynamically 
>> attached, all the methods which have been JIT compiled until that point, 
>> will be marked for deoptimization and flushed from the code cache. For 
>> large, warmed-up applications this mean deoptimization and instant 
>> recompilation of thousands if not then-thousands of methods, which can lead 
>> to dramatic performance/latency drop-downs for several minutes.
>> 
>> One could argue that dynamic agent attach is now deprecated anyway (see [JEP 
>> 451: Prepare to Disallow the Dynamic Loading of 
>> Agents](https://openjdk.org/jeps/451)) and this problem could be solved by 
>> making the recording of `evol_method` dependencies dependent on the new 
>> `-XX:+EnableDynamicAgentLoading` flag isntead of the concrete JVMTI 
>> capabilities (because the presence of the flag indicates that an agent will 
>> be loaded eventually).
>> 
>> But there a single, however important exception to this rule and that's JFR. 
>> JFR is advertised as low overhead profiler which can be enabled in 
>> production at any time. However, when JFR is started dynamically (e.g. 
>> through JCMD or JMX) it will silently load a HotSpot internl JVMTI agent 
>> which requests the `can_retransform_classes` and retransforms some classes. 
>> This will inevitably trigger the deoptimization of all compiled methods as 
>> described above.
>> 
>> I'd therefor like to propose to *always* and unconditionally record 
>> `evol_method` dependencies in JIT compiled code by exporting the relevant 
>> properties right at startup in `init_globals()`:
>> ```c++
>>  jint init_globals() {
>>management_init();
>>JvmtiExport::initialize_oop_storage();
>> +#if INCLUDE_JVMTI
>> +  JvmtiExport::set_can_hotswap_or_post_breakpoint(true);
>> +  JvmtiExport::set_all_dependencies_are_recorded(true);
>> +#endif
>> 
>> 
>> My measurements indicate that the overhead of doing so is minimal (around 1% 
>> increase of nmethod size) and justifies the benefit. E.g. a Spring Petclinic 
>> application started with `-Xbatch -XX:+UnlockDiagnosticVMOptions 
>> -XX:+LogCompilation` compiles about ~11500 methods (~9000 with C1 and ~2500 
>> with C2) resulting in an aggregated nmethod size of around ~40bm. 
>> Additionally recording `evol_method` dependencies only increases this size 
>> be about 400kb
>
> Volker Simonis has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Made the flag experimental and added an assertion to 
> set_can_hotswap_or_post_breakpoint()

Marked as reviewed by dlong (Reviewer).

No, go ahead.

-

PR Review: https://git.openjdk.org/jdk/pull/17509#pullrequestreview-1844939013
PR Comment: https://git.openjdk.org/jdk/pull/17509#issuecomment-1911240540


Re: RFR: 8326524: Rename agent_common.h

2024-02-22 Thread Dean Long
On Thu, 22 Feb 2024 19:38:26 GMT, Kim Barrett  wrote:

> Please review this trivial change that renames the file
> test/hotspot/jtreg/vmTestbase/nsk/share/jvmti/agent_common/agent_common.h to
> agent_common.hpp.
> 
> The #include updates were performed mechanically, and builds would fail if
> there were mistakes.
> 
> The copyright updates were similarly performed mechanically. All but a handful
> of the including files have already had their copyrights updated recently,
> likely as part of JDK-8324681. The only change to the renamed file is a
> copyright update, since no code changes were required.
> 
> Testing: mach5 tier1

If we wanted to minimize changes, we could have agent_common.h include 
agent_common.hpp.  Then we don't have to change all the .cpp files, which have 
other problems, like the includes not being sorted.
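The compatibility-shim alternative mentioned here would amount to a one-line forwarding header, roughly like this (illustrative only; the PR went with the full rename instead):

```cpp
// agent_common.h -- hypothetical forwarding shim so existing
// #include "agent_common.h" lines keep compiling after the rename.
#ifndef AGENT_COMMON_H
#define AGENT_COMMON_H
#include "agent_common.hpp"
#endif
```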

-

PR Comment: https://git.openjdk.org/jdk/pull/17970#issuecomment-1960638558


Re: RFR: 8326524: Rename agent_common.h

2024-02-23 Thread Dean Long
On Thu, 22 Feb 2024 19:38:26 GMT, Kim Barrett  wrote:

> Please review this trivial change that renames the file
> test/hotspot/jtreg/vmTestbase/nsk/share/jvmti/agent_common/agent_common.h to
> agent_common.hpp.
> 
> The #include updates were performed mechanically, and builds would fail if
> there were mistakes.
> 
> The copyright updates were similarly performed mechanically. All but a handful
> of the including files have already had their copyrights updated recently,
> likely as part of JDK-8324681. The only change to the renamed file is a
> copyright update, since no code changes were required.
> 
> Testing: mach5 tier1

Marked as reviewed by dlong (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/17970#pullrequestreview-1898767675


Re: RFR: 8308745: ObjArrayKlass::allocate_objArray_klass may call into java while holding a lock

2024-03-05 Thread Dean Long
On Tue, 6 Feb 2024 22:59:04 GMT, Coleen Phillimore  wrote:

> This change creates a new sort of native recursive lock that can be held 
> during JNI and Java calls, which can be used for synchronization while 
> creating objArrayKlasses at runtime.
> 
> Passes tier1-7.

Is the caller always a JavaThread?  I'm wondering if your new recursive lock 
class could use the existing ObjectMonitor.  I thought David asked the same 
question, but I can't find it.

-

PR Comment: https://git.openjdk.org/jdk/pull/17739#issuecomment-1979796523


Re: RFR: 8308745: ObjArrayKlass::allocate_objArray_klass may call into java while holding a lock

2024-03-06 Thread Dean Long
On Tue, 6 Feb 2024 22:59:04 GMT, Coleen Phillimore  wrote:

> This change creates a new sort of native recursive lock that can be held 
> during JNI and Java calls, which can be used for synchronization while 
> creating objArrayKlasses at runtime.
> 
> Passes tier1-7.

OK, that makes sense about loom and JOM.

src/hotspot/share/runtime/mutex.cpp line 537:

> 535: // can be called by jvmti by VMThread.
> 536: if (current->is_Java_thread()) {
> 537:   _sem.wait_with_safepoint_check(JavaThread::cast(current));

Why not use PlatformMutex + OSThreadWaitState instead of a semaphore?

-

PR Comment: https://git.openjdk.org/jdk/pull/17739#issuecomment-1982008253
PR Review Comment: https://git.openjdk.org/jdk/pull/17739#discussion_r1515269443


Re: RFR: 8308745: ObjArrayKlass::allocate_objArray_klass may call into java while holding a lock

2024-03-06 Thread Dean Long
On Wed, 6 Mar 2024 23:47:01 GMT, Coleen Phillimore  wrote:

>> src/hotspot/share/runtime/mutex.cpp line 537:
>> 
>>> 535: // can be called by jvmti by VMThread.
>>> 536: if (current->is_Java_thread()) {
>>> 537:   _sem.wait_with_safepoint_check(JavaThread::cast(current));
>> 
>> Why not use PlatformMutex + OSThreadWaitState instead of a semaphore?
>
> Semaphore seemed simpler (?)

OK.  It's a good thing HotSpot doesn't need to worry about priority-inheritance 
for mutexes.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17739#discussion_r1515314037


Re: RFR: 8308745: ObjArrayKlass::allocate_objArray_klass may call into java while holding a lock

2024-03-06 Thread Dean Long
On Tue, 6 Feb 2024 22:59:04 GMT, Coleen Phillimore  wrote:

> This change creates a new sort of native recursive lock that can be held 
> during JNI and Java calls, which can be used for synchronization while 
> creating objArrayKlasses at runtime.
> 
> Passes tier1-7.

src/hotspot/share/oops/arrayKlass.cpp line 135:

> 133: ResourceMark rm(THREAD);
> 134: {
> 135:   // Ensure atomic creation of higher dimensions

Isn't this comment still useful?

src/hotspot/share/oops/arrayKlass.cpp line 144:

> 142: ObjArrayKlass* ak =
> 143:   ObjArrayKlass::allocate_objArray_klass(class_loader_data(), 
> dim + 1, this, CHECK_NULL);
> 144: ak->set_lower_dimension(this);

Would it be useful to assert that the lower dimension has been set?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17739#discussion_r1515322667
PR Review Comment: https://git.openjdk.org/jdk/pull/17739#discussion_r1515323404


Re: RFR: 8308745: ObjArrayKlass::allocate_objArray_klass may call into java while holding a lock

2024-03-06 Thread Dean Long
On Tue, 6 Feb 2024 22:59:04 GMT, Coleen Phillimore  wrote:

> This change creates a new sort of native recursive lock that can be held 
> during JNI and Java calls, which can be used for synchronization while 
> creating objArrayKlasses at runtime.
> 
> Passes tier1-7.

src/hotspot/share/oops/instanceKlass.cpp line 2765:

> 2763: // To get a consistent list of classes we need MultiArray_lock to 
> ensure
> 2764: // array classes aren't observed while they are being restored.
> 2765: MutexLocker ml(MultiArray_lock);

Doesn't this revert JDK-8274338?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/17739#discussion_r1515330292


Re: RFR: 8308745: ObjArrayKlass::allocate_objArray_klass may call into java while holding a lock

2024-03-06 Thread Dean Long
On Tue, 6 Feb 2024 22:59:04 GMT, Coleen Phillimore  wrote:

> This change creates a new sort of native recursive lock that can be held 
> during JNI and Java calls, which can be used for synchronization while 
> creating objArrayKlasses at runtime.
> 
> Passes tier1-7.

Marked as reviewed by dlong (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/17739#pullrequestreview-1921100707


Re: RFR: 8308745: ObjArrayKlass::allocate_objArray_klass may call into java while holding a lock [v2]

2024-03-06 Thread Dean Long
On Thu, 7 Mar 2024 01:38:56 GMT, Coleen Phillimore  wrote:

>> This change creates a new sort of native recursive lock that can be held 
>> during JNI and Java calls, which can be used for synchronization while 
>> creating objArrayKlasses at runtime.
>> 
>> Passes tier1-7.
>
> Coleen Phillimore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Dean's comments.

> Does a PlatformMutex handle priority-inheritance?

It would depend on the OS and the mutex implementation.  You can ignore my 
comment.  It was from long ago trying to put Java on top of a real-time OS.

-

PR Comment: https://git.openjdk.org/jdk/pull/17739#issuecomment-1982207873


Re: RFR: 8329332: Remove CompiledMethod and CodeBlobLayout classes

2024-04-01 Thread Dean Long
On Fri, 29 Mar 2024 19:35:45 GMT, Vladimir Kozlov  wrote:

> Revert [JDK-8152664](https://bugs.openjdk.org/browse/JDK-8152664) RFE 
> [changes](https://github.com/openjdk/jdk/commit/b853eb7f5ca24eeeda18acbb14287f706499c365)
>  which was used for AOT [JEP 295](https://openjdk.org/jeps/295) 
> implementation in JDK 9. The code was left in HotSpot assuming it will help 
> in a future. But during work on Leyden we decided to not use it. In Leyden 
> cached compiled code will be restored in CodeCache as normal nmethods: no 
> need to change VM's runtime and GC code to process them.
> 
> I may work on optimizing `CodeBlob` and `nmethod` fields layout to reduce 
> header size in separate changes. In these changes I did simple fields 
> reordering to keep small (1 byte) fields together.
> 
> I do not see (and not expected) performance difference with these changes.
> 
> Tested tier1-5, xcomp, stress. Running performance testing.
> 
> I need help with testing on platforms which Oracle does not support.

The `not_used` state was introduced for AOT.  It can go away now.

-

PR Comment: https://git.openjdk.org/jdk/pull/18554#issuecomment-2030282409


Re: RFR: 8329433: Reduce nmethod header size [v3]

2024-04-15 Thread Dean Long
On Mon, 15 Apr 2024 03:24:07 GMT, Vladimir Kozlov  wrote:

>> This is part of changes which try to reduce size of `nmethod` and `codeblob` 
>> data vs code in CodeCache.
>> These changes reduced size of `nmethod` header from 288 to 232 bytes. From 
>> 304 to 248 in optimized VM:
>> 
>> Statistics for 1282 bytecoded nmethods for C2:
>>  total in heap = 5560352 (100%)
>>  header = 389728 (7.009053%)
>> 
>> vs
>> 
>> Statistics for 1322 bytecoded nmethods for C2:
>>  total in heap  = 8307120 (100%)
>>  header = 327856 (3.946687%)
>> 
>> 
>> Several unneeded fields in `nmethod` and `CodeBlob` were removed. Some 
>> fields were changed from `int` to `int16_t` with added corresponding asserts 
>> to make sure their values fit into 16 bits.
>> 
>> I did additional cleanup after recent `CompiledMethod` removal.
>> 
>> Tested tier1-7,stress,xcomp and performance testing.
>
> Vladimir Kozlov has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Union fields which usages do not overlap

src/hotspot/share/code/codeBlob.cpp line 88:

> 86:   S390_ONLY(_ctable_offset(0) COMMA)
> 87:   _header_size((uint16_t)header_size),
> 88:   _frame_complete_offset((int16_t)frame_complete_offset),

Rather than a raw cast, it would be better to use checked_cast here, or better 
yet, change the incoming parameter types to match the field type.  That way, if 
the caller is passing a constant, the compiler can check it at compile time.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566601934


Re: RFR: 8329433: Reduce nmethod header size [v3]

2024-04-15 Thread Dean Long
On Mon, 15 Apr 2024 03:24:07 GMT, Vladimir Kozlov  wrote:

>> This is part of changes which try to reduce size of `nmethod` and `codeblob` 
>> data vs code in CodeCache.
>> These changes reduced size of `nmethod` header from 288 to 232 bytes. From 
>> 304 to 248 in optimized VM:
>> 
>> Statistics for 1282 bytecoded nmethods for C2:
>>  total in heap = 5560352 (100%)
>>  header = 389728 (7.009053%)
>> 
>> vs
>> 
>> Statistics for 1322 bytecoded nmethods for C2:
>>  total in heap  = 8307120 (100%)
>>  header = 327856 (3.946687%)
>> 
>> 
>> Several unneeded fields in `nmethod` and `CodeBlob` were removed. Some 
>> fields were changed from `int` to `int16_t` with added corresponding asserts 
>> to make sure their values fit into 16 bits.
>> 
>> I did additional cleanup after recent `CompiledMethod` removal.
>> 
>> Tested tier1-7,stress,xcomp and performance testing.
>
> Vladimir Kozlov has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Union fields which usages do not overlap

src/hotspot/share/code/codeBlob.cpp line 120:

> 118: 
> 119:   _header_size((uint16_t)header_size),
> 120:   _frame_complete_offset((int16_t)CodeOffsets::frame_never_safe),

See above.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566604310


Re: RFR: 8329433: Reduce nmethod header size [v3]

2024-04-15 Thread Dean Long
On Mon, 15 Apr 2024 03:24:07 GMT, Vladimir Kozlov  wrote:

>> This is part of changes which try to reduce size of `nmethod` and `codeblob` 
>> data vs code in CodeCache.
>> These changes reduced size of `nmethod` header from 288 to 232 bytes. From 
>> 304 to 248 in optimized VM:
>> 
>> Statistics for 1282 bytecoded nmethods for C2:
>>  total in heap = 5560352 (100%)
>>  header = 389728 (7.009053%)
>> 
>> vs
>> 
>> Statistics for 1322 bytecoded nmethods for C2:
>>  total in heap  = 8307120 (100%)
>>  header = 327856 (3.946687%)
>> 
>> 
>> Several unneeded fields in `nmethod` and `CodeBlob` were removed. Some 
>> fields were changed from `int` to `int16_t` with added corresponding asserts 
>> to make sure their values fit into 16 bits.
>> 
>> I did additional cleanup after recent `CompiledMethod` removal.
>> 
>> Tested tier1-7,stress,xcomp and performance testing.
>
> Vladimir Kozlov has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Union fields which usages do not overlap

src/hotspot/share/code/nmethod.hpp line 282:

> 280:   _has_flushed_dependencies:1, // Used for maintenance of 
> dependencies (under CodeCache_lock)
> 281:   _is_unlinked:1,  // mark during class unloading
> 282:   _load_reported:1;// used by jvmti to track if an 
> event has been posted for this nmethod

It seems like the type could be changed from uint8_t to bool.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566631312


Re: RFR: 8329433: Reduce nmethod header size [v3]

2024-04-15 Thread Dean Long
On Mon, 15 Apr 2024 03:24:07 GMT, Vladimir Kozlov  wrote:

>> This is part of changes which try to reduce size of `nmethod` and `codeblob` 
>> data vs code in CodeCache.
>> These changes reduced size of `nmethod` header from 288 to 232 bytes. From 
>> 304 to 248 in optimized VM:
>> 
>> Statistics for 1282 bytecoded nmethods for C2:
>>  total in heap = 5560352 (100%)
>>  header = 389728 (7.009053%)
>> 
>> vs
>> 
>> Statistics for 1322 bytecoded nmethods for C2:
>>  total in heap  = 8307120 (100%)
>>  header = 327856 (3.946687%)
>> 
>> 
>> Several unneeded fields in `nmethod` and `CodeBlob` were removed. Some 
>> fields were changed from `int` to `int16_t` with added corresponding asserts 
>> to make sure their values fit into 16 bits.
>> 
>> I did additional cleanup after recent `CompiledMethod` removal.
>> 
>> Tested tier1-7,stress,xcomp and performance testing.
>
> Vladimir Kozlov has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Union fields which usages do not overlap

src/hotspot/share/code/nmethod.hpp line 205:

> 203:   // offsets to find the receiver for non-static native wrapper 
> frames.
> 204:   ByteSize _native_receiver_sp_offset;
> 205:   ByteSize _native_basic_lock_sp_offset;

Don't we need an assert in the accessor functions to make sure the nmethod is 
a native wrapper?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566634384


Re: RFR: 8329433: Reduce nmethod header size [v3]

2024-04-15 Thread Dean Long
On Mon, 15 Apr 2024 03:24:07 GMT, Vladimir Kozlov  wrote:

>> This is part of changes which try to reduce size of `nmethod` and `codeblob` 
>> data vs code in CodeCache.
>> These changes reduced size of `nmethod` header from 288 to 232 bytes. From 
>> 304 to 248 in optimized VM:
>> 
>> Statistics for 1282 bytecoded nmethods for C2:
>>  total in heap = 5560352 (100%)
>>  header = 389728 (7.009053%)
>> 
>> vs
>> 
>> Statistics for 1322 bytecoded nmethods for C2:
>>  total in heap  = 8307120 (100%)
>>  header = 327856 (3.946687%)
>> 
>> 
>> Several unneeded fields in `nmethod` and `CodeBlob` were removed. Some 
>> fields were changed from `int` to `int16_t` with added corresponding asserts 
>> to make sure their values fit into 16 bits.
>> 
>> I did additional cleanup after recent `CompiledMethod` removal.
>> 
>> Tested tier1-7,stress,xcomp and performance testing.
>
> Vladimir Kozlov has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Union fields which usages do not overlap

src/hotspot/share/code/nmethod.cpp line 1235:

> 1233: int skipped_insts_size   = 
> code_buffer->total_skipped_instructions_size();
> 1234: #ifdef ASSERT
> 1235: assert(((skipped_insts_size >> 16) == 0), "size is bigger than 
> 64Kb: %d", skipped_insts_size);

Suggestion:


I think it's simpler just to use checked_cast below.

src/hotspot/share/code/nmethod.cpp line 1240:

> 1238: int consts_offset = 
> code_buffer->total_offset_of(code_buffer->consts());
> 1239: assert(consts_offset == 0, "const_offset: %d", consts_offset);
> 1240: #endif

Suggestion:

src/hotspot/share/code/nmethod.cpp line 1241:

> 1239: assert(consts_offset == 0, "const_offset: %d", consts_offset);
> 1240: #endif
> 1241: _skipped_instructions_size = (uint16_t)skipped_insts_size;

Suggestion:

_skipped_instructions_size = 
checked_cast<uint16_t>(code_buffer->total_skipped_instructions_size());

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566764300
PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566765068
PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566759786


Re: RFR: 8329433: Reduce nmethod header size [v3]

2024-04-15 Thread Dean Long
On Mon, 15 Apr 2024 03:24:07 GMT, Vladimir Kozlov  wrote:

>> This is part of changes which try to reduce size of `nmethod` and `codeblob` 
>> data vs code in CodeCache.
>> These changes reduced size of `nmethod` header from 288 to 232 bytes. From 
>> 304 to 248 in optimized VM:
>> 
>> Statistics for 1282 bytecoded nmethods for C2:
>>  total in heap = 5560352 (100%)
>>  header = 389728 (7.009053%)
>> 
>> vs
>> 
>> Statistics for 1322 bytecoded nmethods for C2:
>>  total in heap  = 8307120 (100%)
>>  header = 327856 (3.946687%)
>> 
>> 
>> Several unneeded fields in `nmethod` and `CodeBlob` were removed. Some 
>> fields were changed from `int` to `int16_t` with added corresponding asserts 
>> to make sure their values fit into 16 bits.
>> 
>> I did additional cleanup after recent `CompiledMethod` removal.
>> 
>> Tested tier1-7,stress,xcomp and performance testing.
>
> Vladimir Kozlov has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Union fields which usages do not overlap

src/hotspot/share/code/nmethod.cpp line 1441:

> 1439: int deps_size = align_up((int)dependencies->size_in_bytes(), 
> oopSize);
> 1440: int sum_size  = oops_size + metadata_size + deps_size;
> 1441: assert((sum_size >> 16) == 0, "data size is bigger than 64Kb: %d", 
> sum_size);

I suggest using checked_cast for the assignment below, rather than 
special-purpose checks here.

src/hotspot/share/code/nmethod.cpp line 1445:

> 1443: _metadata_offset  = (uint16_t)oops_size;
> 1444: _dependencies_offset  = _metadata_offset  + 
> (uint16_t)metadata_size;
> 1445: _scopes_pcs_offset= _dependencies_offset  + (uint16_t)deps_size;

Use checked_cast instead of raw casts.

src/hotspot/share/code/nmethod.cpp line 1459:

> 1457: assert((data_offset() + data_end_offset) <= nmethod_size, "wrong 
> nmethod's size: %d < %d", nmethod_size, (data_offset() + data_end_offset));
> 1458: 
> 1459: _entry_offset  = 
> (uint16_t)offsets->value(CodeOffsets::Entry);

Use checked_cast.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566771026
PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566772567
PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566773477


Re: RFR: 8329433: Reduce nmethod header size [v3]

2024-04-15 Thread Dean Long
On Mon, 15 Apr 2024 03:24:07 GMT, Vladimir Kozlov  wrote:

>> This is part of changes which try to reduce size of `nmethod` and `codeblob` 
>> data vs code in CodeCache.
>> These changes reduced size of `nmethod` header from 288 to 232 bytes. From 
>> 304 to 248 in optimized VM:
>> 
>> Statistics for 1282 bytecoded nmethods for C2:
>>  total in heap = 5560352 (100%)
>>  header = 389728 (7.009053%)
>> 
>> vs
>> 
>> Statistics for 1322 bytecoded nmethods for C2:
>>  total in heap  = 8307120 (100%)
>>  header = 327856 (3.946687%)
>> 
>> 
>> Several unneeded fields in `nmethod` and `CodeBlob` were removed. Some 
>> fields were changed from `int` to `int16_t` with added corresponding asserts 
>> to make sure their values fit into 16 bits.
>> 
>> I did additional cleanup after recent `CompiledMethod` removal.
>> 
>> Tested tier1-7,stress,xcomp and performance testing.
>
> Vladimir Kozlov has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Union fields which usages do not overlap

src/hotspot/share/memory/heap.hpp line 58:

> 56:   void set_length(size_t length) {
> 57: LP64_ONLY( assert(((length >> 32) == 0), "sanity"); )
> 58: _header._length = (uint32_t)length;

Suggestion:

_header._length = checked_cast<uint32_t>(length);

src/hotspot/share/memory/heap.hpp line 63:

> 61:   // Accessors
> 62:   void* allocated_space() const  { return (void*)(this + 
> 1); }
> 63:   size_t length() const  { return 
> (size_t)_header._length; }

This cast looks unnecessary.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566784458
PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566784587


Re: RFR: 8329433: Reduce nmethod header size [v3]

2024-04-15 Thread Dean Long
On Tue, 16 Apr 2024 03:06:13 GMT, Vladimir Kozlov  wrote:

>> src/hotspot/share/code/nmethod.hpp line 205:
>> 
>>> 203:   // offsets to find the receiver for non-static native wrapper 
>>> frames.
>>> 204:   ByteSize _native_receiver_sp_offset;
>>> 205:   ByteSize _native_basic_lock_sp_offset;
>> 
>> Don't we need an assert in the accessor functions to make sure nmethod is 
>> native or not?
>
> I thought about that but in both places where these accessors are called 
> (`frame::get_native_monitor()` and `frame::get_native_receiver()`) there are 
> such asserts already:
> https://github.com/openjdk/jdk/blob/master/src/hotspot/share/runtime/frame.cpp#L1085

OK, but I'd rather see it in the accessors too.  Some users are checking for 
method()->is_native() and others are checking for is_osr_method(), so we need 
to make sure those are always mutually exclusive: method()->is_native() != 
is_osr_method().

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566806754


Re: RFR: 8329433: Reduce nmethod header size [v3]

2024-04-15 Thread Dean Long
On Tue, 16 Apr 2024 03:12:48 GMT, Vladimir Kozlov  wrote:

>> src/hotspot/share/code/nmethod.hpp line 282:
>> 
>>> 280:   _has_flushed_dependencies:1, // Used for maintenance of 
>>> dependencies (under CodeCache_lock)
>>> 281:   _is_unlinked:1,  // mark during class unloading
>>> 282:   _load_reported:1;// used by jvmti to track if an 
>>> event has been posted for this nmethod
>> 
>> It seems like the type could be changed from uint8_t to bool.
>
> Is there a difference in generated code when you use bool instead of uint8_t?
> I used uint8_t to make it easy to change to uint16_t in the future if needed.

I don't think uint8_t vs uint16_t matters, only if it is signed, unsigned, or 
bool.  So if we have more than 8 individual :1 fields, it will expand to a 2nd 
byte.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1566814258


Re: RFR: 8329433: Reduce nmethod header size [v3]

2024-04-17 Thread Dean Long
On Tue, 16 Apr 2024 16:09:21 GMT, Vladimir Kozlov  wrote:

>> src/hotspot/share/code/nmethod.cpp line 1441:
>> 
>>> 1439: int deps_size = align_up((int)dependencies->size_in_bytes(), 
>>> oopSize);
>>> 1440: int sum_size  = oops_size + metadata_size + deps_size;
>>> 1441: assert((sum_size >> 16) == 0, "data size is bigger than 64Kb: 
>>> %d", sum_size);
>> 
>> I suggest using checked_cast for the assignment below, rather than 
>> special-purpose checks here.
>
> Okay. But I will put above code under `#ifdef ASSERT` then.

The ASSERT block above looks unnecessary, now that field assignments below are 
using checked_cast.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18768#discussion_r1569496753


Re: RFR: 8330388: Remove invokedynamic cache index encoding

2024-04-17 Thread Dean Long
On Wed, 17 Apr 2024 15:26:52 GMT, Matias Saavedra Silva  
wrote:

> Before [JDK-8307190](https://bugs.openjdk.org/browse/JDK-8307190), 
> [JDK-8309673](https://bugs.openjdk.org/browse/JDK-8309673), and 
> [JDK-8301995](https://bugs.openjdk.org/browse/JDK-8301995), invokedynamic 
> operands needed to be rewritten to encoded values to better distinguish indy 
> entries from other cp cache entries. The above changes now distinguish 
> between entries with `to_cp_index()` using the bytecode, which is now 
> propagated by the callers.
> 
> The encoding flips the bits of the index so the encoded index is always 
> negative, leading to access errors if there is no matching decode call. These 
> calls are removed with some methods adjusted to distinguish between indices 
> with the bytecode. Verified with tier 1-5 tests.

src/hotspot/share/ci/ciEnv.cpp line 1513:

> 1511: // process the BSM
> 1512: int pool_index = indy_info->constant_pool_index();
> 1513: BootstrapInfo bootstrap_specifier(cp, pool_index, indy_index);

Why not just change the incoming parameter name to `index`?

src/hotspot/share/classfile/resolutionErrors.hpp line 60:

> 58: 
> 59:   // This function is used to encode an invokedynamic index to 
> differentiate it from a
> 60:   // constant pool index.  It assumes it is being called with a index 
> that is less than 0

Is this comment still correct?

src/hotspot/share/interpreter/bootstrapInfo.cpp line 77:

> 75: return true;
> 76:   } else if (indy_entry->resolution_failed()) {
> 77: int encoded_index = 
> ResolutionErrorTable::encode_indy_index(_indy_index);

That's an improvement, from two levels of encoding to only one!

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18819#discussion_r1569625123
PR Review Comment: https://git.openjdk.org/jdk/pull/18819#discussion_r1569626519
PR Review Comment: https://git.openjdk.org/jdk/pull/18819#discussion_r1569627219


Re: RFR: 8330388: Remove invokedynamic cache index encoding

2024-04-17 Thread Dean Long
On Wed, 17 Apr 2024 15:26:52 GMT, Matias Saavedra Silva  
wrote:

> Before [JDK-8307190](https://bugs.openjdk.org/browse/JDK-8307190), 
> [JDK-8309673](https://bugs.openjdk.org/browse/JDK-8309673), and 
> [JDK-8301995](https://bugs.openjdk.org/browse/JDK-8301995), invokedynamic 
> operands needed to be rewritten to encoded values to better distinguish indy 
> entries from other cp cache entries. The above changes now distinguish 
> between entries with `to_cp_index()` using the bytecode, which is now 
> propagated by the callers.
> 
> The encoding flips the bits of the index so the encoded index is always 
> negative, leading to access errors if there is no matching decode call. These 
> calls are removed with some methods adjusted to distinguish between indices 
> with the bytecode. Verified with tier 1-5 tests.

Did you consider minimizing changes by leaving 
decode_invokedynamic_index/encode_invokedynamic_index calls in place, but 
having the implementations not change the value?

-

PR Comment: https://git.openjdk.org/jdk/pull/18819#issuecomment-2062609288


Re: RFR: 8330388: Remove invokedynamic cache index encoding

2024-04-17 Thread Dean Long
On Wed, 17 Apr 2024 15:26:52 GMT, Matias Saavedra Silva  
wrote:

> Before [JDK-8307190](https://bugs.openjdk.org/browse/JDK-8307190), 
> [JDK-8309673](https://bugs.openjdk.org/browse/JDK-8309673), and 
> [JDK-8301995](https://bugs.openjdk.org/browse/JDK-8301995), invokedynamic 
> operands needed to be rewritten to encoded values to better distinguish indy 
> entries from other cp cache entries. The above changes now distinguish 
> between entries with `to_cp_index()` using the bytecode, which is now 
> propagated by the callers.
> 
> The encoding flips the bits of the index so the encoded index is always 
> negative, leading to access errors if there is no matching decode call. These 
> calls are removed with some methods adjusted to distinguish between indices 
> with the bytecode. Verified with tier 1-5 tests.

@dougxc should check JVMCI changes.

-

Marked as reviewed by dlong (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18819#pullrequestreview-2007498087


Re: RFR: 8329433: Reduce nmethod header size [v8]

2024-04-17 Thread Dean Long
On Thu, 18 Apr 2024 00:41:03 GMT, Vladimir Kozlov  wrote:

>> This is part of changes which try to reduce size of `nmethod` and `codeblob` 
>> data vs code in CodeCache.
>> These changes reduced size of `nmethod` header from 288 to 232 bytes. From 
>> 304 to 248 in optimized VM:
>> 
>> Statistics for 1282 bytecoded nmethods for C2:
>>  total in heap = 5560352 (100%)
>>  header = 389728 (7.009053%)
>> 
>> vs
>> 
>> Statistics for 1322 bytecoded nmethods for C2:
>>  total in heap  = 8307120 (100%)
>>  header = 327856 (3.946687%)
>> 
>> 
>> Several unneeded fields in `nmethod` and `CodeBlob` were removed. Some 
>> fields were changed from `int` to `int16_t` with added corresponding asserts 
>> to make sure their values fit into 16 bits.
>> 
>> I did additional cleanup after recent `CompiledMethod` removal.
>> 
>> Tested tier1-7,stress,xcomp and performance testing.
>
> Vladimir Kozlov has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Address comment

Marked as reviewed by dlong (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/18768#pullrequestreview-2007632424


Re: RFR: 8331087: Move immutable nmethod data from CodeCache

2024-04-26 Thread Dean Long
On Fri, 26 Apr 2024 21:36:50 GMT, Vladimir Kozlov  wrote:

>> Move immutable nmethod's data from CodeCache to C heap. It includes 
>> `dependencies, nul_chk_table, handler_table, scopes_pcs, scopes_data, 
>> speculations, jvmci_data`. It amounts to about 30% (optimized VM) of space 
>> in CodeCache.
>> 
>> Use HotSpot's `os::malloc()` to allocate memory in C heap for immutable 
>> nmethod's data. Bail out compilation if allocation failed.
>> 
>> Shuffle fields order and change some fields size from 4 to 2 bytes to avoid 
>> nmethod's header size increase.
>> 
>> Tested tier1-5, stress,xcomp
>> 
>> Our performance testing does not show a difference.
>> 
>> Example of updated `-XX:+PrintNMethodStatistics` output is in JBS comment.
>
> src/hotspot/share/code/nmethod.cpp line 117:
> 
>> 115:   result = static_cast(thing); \
>> 116:   assert(static_cast(result) == thing, "failed: %d != %d", 
>> static_cast(result), thing);
>> 117: 
> 
> I replaced `checked_cast<>()` with this macro because of the following issues:
>  - The existing assert points to the `utilities/checkedCast.hpp` file, where 
> the helper is located, and not to the failed cast. That does not help when it 
> is used several times in one method (for example, in the `nmethod()` 
> constructors).
>  - The existing assert does not print the values

I thought @kimbarrett had a draft PR to address the error reporting issue, but 
I can't seem to find it.  To solve the general problem, I think we need a 
version of vmassert() that takes `char* file, int lineno` as arguments, and a 
macro wrapper for checked_cast() that passes `__FILE__` and `__LINE__` from 
the caller.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18984#discussion_r1581628786


Re: RFR: 8331087: Move immutable nmethod data from CodeCache

2024-04-26 Thread Dean Long
On Fri, 26 Apr 2024 21:16:03 GMT, Vladimir Kozlov  wrote:

> Move immutable nmethod's data from CodeCache to C heap. It includes 
> `dependencies, nul_chk_table, handler_table, scopes_pcs, scopes_data, 
> speculations, jvmci_data`. It amounts to about 30% (optimized VM) of space 
> in CodeCache.
> 
> Use HotSpot's `os::malloc()` to allocate memory in C heap for immutable 
> nmethod's data. Bail out compilation if allocation failed.
> 
> Shuffle fields order and change some fields size from 4 to 2 bytes to avoid 
> nmethod's header size increase.
> 
> Tested tier1-5, stress,xcomp
> 
> Our performance testing does not show a difference.
> 
> Example of updated `-XX:+PrintNMethodStatistics` output is in JBS comment.

src/hotspot/share/code/nmethod.cpp line 1332:

> 1330: #if INCLUDE_JVMCI
> 1331: _speculations_offset = _scopes_data_offset;
> 1332: _jvmci_data_offset   = _speculations_offset;

Why not use 0 for all these?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18984#discussion_r1581642931


Re: RFR: 8331087: Move immutable nmethod data from CodeCache

2024-04-26 Thread Dean Long
On Fri, 26 Apr 2024 21:16:03 GMT, Vladimir Kozlov  wrote:

> Move immutable nmethod's data from CodeCache to C heap. It includes 
> `dependencies, nul_chk_table, handler_table, scopes_pcs, scopes_data, 
> speculations, jvmci_data`. It amounts to about 30% (optimized VM) of space 
> in CodeCache.
> 
> Use HotSpot's `os::malloc()` to allocate memory in C heap for immutable 
> nmethod's data. Bail out compilation if allocation failed.
> 
> Shuffle fields order and change some fields size from 4 to 2 bytes to avoid 
> nmethod's header size increase.
> 
> Tested tier1-5, stress,xcomp
> 
> Our performance testing does not show a difference.
> 
> Example of updated `-XX:+PrintNMethodStatistics` output is in JBS comment.

src/hotspot/share/code/nmethod.cpp line 1484:

> 1482:   // Calculate positive offset as distance between the start of 
> stubs section
> 1483:   // (which is also the end of instructions section) and the start 
> of the handler.
> 1484:   CHECKED_CAST(_unwind_handler_offset, int16_t, (_stub_offset - 
> code_offset() - offsets->value(CodeOffsets::UnwindHandler)));

Suggestion:

  int unwind_handler_offset = code_offset() + 
offsets->value(CodeOffsets::UnwindHandler);
  CHECKED_CAST(_unwind_handler_offset, int16_t, _stub_offset - 
unwind_handler_offset);

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18984#discussion_r1581644356


Re: RFR: 8331087: Move immutable nmethod data from CodeCache

2024-04-26 Thread Dean Long
On Fri, 26 Apr 2024 21:16:03 GMT, Vladimir Kozlov  wrote:

> Move immutable nmethod's data from CodeCache to C heap. It includes 
> `dependencies, nul_chk_table, handler_table, scopes_pcs, scopes_data, 
> speculations, jvmci_data`. It amounts to about 30% (optimized VM) of space 
> in CodeCache.
> 
> Use HotSpot's `os::malloc()` to allocate memory in C heap for immutable 
> nmethod's data. Bail out compilation if allocation failed.
> 
> Shuffle fields order and change some fields size from 4 to 2 bytes to avoid 
> nmethod's header size increase.
> 
> Tested tier1-5, stress,xcomp
> 
> Our performance testing does not show a difference.
> 
> Example of updated `-XX:+PrintNMethodStatistics` output is in JBS comment.

src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/code/NMethod.java line 528:

> 526:   private int getScopesDataOffset()   { return (int) 
> scopesDataOffsetField  .getValue(addr); }
> 527:   private int getScopesPCsOffset(){ return (int) 
> scopesPCsOffsetField   .getValue(addr); }
> 528:   private int getDependenciesOffset() { return (int) 0; }

Suggestion:


No longer used.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/18984#discussion_r1581671710


Re: RFR: 8331087: Move immutable nmethod data from CodeCache

2024-04-28 Thread Dean Long
On Fri, 26 Apr 2024 21:16:03 GMT, Vladimir Kozlov  wrote:

> Move immutable nmethod's data from CodeCache to C heap. It includes 
> `dependencies, nul_chk_table, handler_table, scopes_pcs, scopes_data, 
> speculations, jvmci_data`. It amounts to about 30% (optimized VM) of space 
> in CodeCache.
> 
> Use HotSpot's `os::malloc()` to allocate memory in C heap for immutable 
> nmethod's data. Bail out compilation if allocation failed.
> 
> Shuffle fields order and change some fields size from 4 to 2 bytes to avoid 
> nmethod's header size increase.
> 
> Tested tier1-5, stress,xcomp
> 
> Our performance testing does not show a difference.
> 
> Example of updated `-XX:+PrintNMethodStatistics` output is in JBS comment.

It only makes sense if the immutable data heap is not also used for other 
critical resources.  If malloc or metaspace were used as the immutable data 
heap, normally failures in those heaps are fatal, because other critical 
resources (monitors, classes, etc) are allocated from there, so any failure 
means the JVM is about to die.  There's no reason to find a fall-back method to 
allocate a new nmethod in that case.

-

PR Comment: https://git.openjdk.org/jdk/pull/18984#issuecomment-2081364009


Re: RFR: 8331087: Move immutable nmethod data from CodeCache

2024-04-29 Thread Dean Long
On Sun, 28 Apr 2024 07:02:40 GMT, Dean Long  wrote:

>> Move immutable nmethod's data from CodeCache to C heap. It includes 
>> `dependencies, nul_chk_table, handler_table, scopes_pcs, scopes_data, 
>> speculations`. It amounts to about 30% (optimized VM) of space in CodeCache.
>> 
>> Use HotSpot's `os::malloc()` to allocate memory in C heap for immutable 
>> nmethod's data. Call `vm_exit_out_of_memory()` if allocation failed.
>> 
>> Shuffle fields order and change some fields size from 4 to 2 bytes to avoid 
>> nmethod's header size increase.
>> 
>> Tested tier1-5, stress,xcomp
>> 
>> Our performance testing does not show a difference.
>> 
>> Example of updated `-XX:+PrintNMethodStatistics` output is in JBS comment.
>
> It only makes sense if the immutable data heap is not also used for other 
> critical resources.  If malloc or metaspace were used as the immutable data 
> heap, normally failures in those heaps are fatal, because other critical 
> resources (monitors, classes, etc) are allocated from there, so any failure 
> means the JVM is about to die.  There's no reason to find a fall-back method 
> to allocate a new nmethod in that case.

> Just to be clear @dean-long , you're saying failure to allocate immutable 
> data in the C heap should result in a fatal error? Makes sense to me as the 
> VM must indeed be very close to crashing anyway in that case. It also, 
> obviates the need for propagating `out_of_memory_error` to JVMCI code.

I hadn't thought it through that far, actually.  I was only pointing out that 
the proposed fall-back:

> increase nmethod size and put immutable data there (as before).

isn't worth the trouble.  But making the C heap failure fatal immediately is 
reasonable, especially if it simplifies JVMCI error reporting.

-

PR Comment: https://git.openjdk.org/jdk/pull/18984#issuecomment-2082083104


Re: RFR: 8331087: Move immutable nmethod data from CodeCache [v2]

2024-04-29 Thread Dean Long
On Sun, 28 Apr 2024 23:37:22 GMT, Vladimir Kozlov  wrote:

>> Move immutable nmethod's data from CodeCache to C heap. It includes 
>> `dependencies, nul_chk_table, handler_table, scopes_pcs, scopes_data, 
>> speculations`. It amounts to about 30% (optimized VM) of space in CodeCache.
>> 
>> Use HotSpot's `os::malloc()` to allocate memory in C heap for immutable 
>> nmethod's data. Call `vm_exit_out_of_memory()` if allocation failed.
>> 
>> Shuffle fields order and change some fields size from 4 to 2 bytes to avoid 
>> nmethod's header size increase.
>> 
>> Tested tier1-5, stress,xcomp
>> 
>> Our performance testing does not show a difference.
>> 
>> Example of updated `-XX:+PrintNMethodStatistics` output is in JBS comment.
>
> Vladimir Kozlov has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Address comments. Moved jvmci_data back to mutable data section.

Marked as reviewed by dlong (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/18984#pullrequestreview-2027786061


Re: RFR: 8330171: Lazy W^X switch implementation

2024-05-12 Thread Dean Long
On Fri, 12 Apr 2024 14:40:05 GMT, Sergey Nazarkin  wrote:

> An alternative for preemptively switching the W^X thread mode on macOS with 
> an AArch64 CPU. This implementation triggers the switch in response to the 
> SIGBUS signal if the *si_addr* belongs to the CodeCache area. With this 
> approach, it is now feasible to eliminate all WX guards and avoid potentially 
> costly operations. However, no significant improvement or degradation in 
> performance has been observed.  Additionally, considering the issue with 
> AsyncGetCallTrace, the patched JVM has been successfully operated with 
> [asgct_bottom](https://github.com/parttimenerd/asgct_bottom) and 
> [async-profiler](https://github.com/async-profiler/async-profiler). 
> 
> Additional testing:
> - [x] MacOS AArch64 server fastdebug *gtests*
> - [ ] MacOS AArch64 server fastdebug *jtreg:hotspot:tier4*
> - [ ] Benchmarking
> 
> @apangin and @parttimenerd could you please check the patch on your 
> scenarios?

I think there is a sweet-spot middle-ground between the two extremes: 
full-lazy, ideal for performance, and fine-grained execute-by-default, ideal 
for security.  I don't think we should change to full-lazy and remove all the 
guard rails at this time.  I am investigating execute-by-default, and it looks 
promising.

-

Changes requested by dlong (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/18762#pullrequestreview-2051465621


Re: RFR: 8323640: [TESTBUG]testMemoryFailCount in jdk/internal/platform/docker/TestDockerMemoryMetrics.java always fail because OOM killed [v3]

2024-07-02 Thread Dean Long
On Tue, 23 Jan 2024 13:04:43 GMT, SendaoYan  wrote:

>> 8323640: [TESTBUG]testMemoryFailCount in 
>> jdk/internal/platform/docker/TestDockerMemoryMetrics.java always fail 
>> because OOM killed
>
> SendaoYan has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   8323640: [TESTBUG]testMemoryFailCount in 
> jdk/internal/platform/docker/TestDockerMemoryMetrics.java always fail because 
> OOM killed
>   
>   Signed-off-by: sendaoYan 

Why does 8M trigger the OOM Killer, but 1M does not?

-

PR Comment: https://git.openjdk.org/jdk/pull/17514#issuecomment-2204387282


Re: RFR: 8205957: setfldw001/TestDescription.java fails with bad field value

2024-08-15 Thread Dean Long
On Wed, 14 Aug 2024 19:34:26 GMT, Leonid Mesnik  wrote:

> The summary of the problem: the test is intermittently failing because it 
> can't get the expected field watch event.
> The test is failing to get event in the 'setfmodw001b' thread with Xcomp only.
> I verified that frame with the method 'run' of setfmodw001b is compiled when 
> line
> 118: setfmodw001.setWatch(4);
> is executed, however the thread is in interp_only mode. The watch events are 
> supported by interpreter only and just ignored by compiled code.
> 
> The reason for the failure is a race between setting interp_only mode in line
> 
> https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/vmTestbase/nsk/jvmti/SetFieldModificationWatch/setfmodw001.java#L75
> 
>  and calling method call_helper while
>  the method run()
> https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/vmTestbase/nsk/jvmti/SetFieldModificationWatch/setfmodw001.java#L116
> 
>  in newly created thread 'setfmodw001b' is invoked.
> 
> JavaCalls::call is used to invoke methods from HotSpot, so such issues might 
> be rare. But still, synchronization might be improved.
> The
> void JavaCalls::call_helper(JavaValue* result, const methodHandle& method, 
> JavaCallArguments* args, TRAPS)
> 
> checks if interp_only mode is set and uses 'Method::from_interpreted_entry()' 
> if not. However, interp_only might be set later, before the compiled method is 
> called (or enters its first safepoint?). This might happen in a safepoint 
> during a transition via handshake.
> So the running thread is in interp_only mode, but the run() method has already 
> been compiled and executed and is never going to be deoptimized.
> 
> The additional setWatch calls don't try to deoptimize methods that are 
> already in interp_only mode.
> 
> BTW, when JVMCI is enabled and a verified adapter exists, it will also be 
> loaded even with interp_only mode set. There is no race here, just an 
> ignoring of interp_only mode.
> 
> I ran the failing test with -Xcomp ~1000 times, plus tiers 1-5.

This looks correct.  If I understand correctly, interpreted-only mode is only 
set at a safepoint, and JavaCallWrapper offers a safepoint.  (I have argued 
before, while working on a similar bug, that we probably don't strictly need to 
offer a safepoint here, but that's a separate issue.)
I think a NoSafepointVerifier before the `is_interp_only_mode` check would have 
caught this issue, so I suggest you add it at the new location.

-

PR Comment: https://git.openjdk.org/jdk/pull/20587#issuecomment-2289782996


Re: RFR: 8205957: setfldw001/TestDescription.java fails with bad field value

2024-08-19 Thread Dean Long
On Mon, 19 Aug 2024 23:46:28 GMT, Serguei Spitsyn  wrote:

>> The summary of the problem: the test is intermittently failing because it 
>> can't get the expected field watch event.
>> The test is failing to get event in the 'setfmodw001b' thread with Xcomp 
>> only.
>> I verified that frame with the method 'run' of setfmodw001b is compiled when 
>> line
>> 118: setfmodw001.setWatch(4);
>> is executed, however the thread is in interp_only mode. The watch events are 
>> supported by interpreter only and just ignored by compiled code.
>> 
>> The reason for the failure is a race between setting interp_only mode in line
>> 
>> https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/vmTestbase/nsk/jvmti/SetFieldModificationWatch/setfmodw001.java#L75
>> 
>>  and calling method call_helper while
>>  the method run()
>> https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/vmTestbase/nsk/jvmti/SetFieldModificationWatch/setfmodw001.java#L116
>> 
>>  in newly created thread 'setfmodw001b' is invoked.
>> 
>> JavaCalls::call is used to invoke methods from HotSpot, so such issues might 
>> be rare. But still, synchronization might be improved.
>> The
>> void JavaCalls::call_helper(JavaValue* result, const methodHandle& method, 
>> JavaCallArguments* args, TRAPS)
>> 
>> checks if interp_only mode is set and uses 'Method::from_interpreted_entry()' 
>> if not. However, interp_only might be set later, before the compiled method 
>> is called (or enters its first safepoint?). This might happen in a safepoint 
>> during a transition via handshake.
>> So the running thread is in interp_only mode, but the run() method has 
>> already been compiled and executed and is never going to be deoptimized.
>> 
>> The additional setWatch calls don't try to deoptimize methods that are 
>> already in interp_only mode.
>> 
>> BTW, when JVMCI is enabled and a verified adapter exists, it will also 
>> be loaded even with interp_only mode set. There is no race here, just an 
>> ignoring of interp_only mode.
>> 
>> I ran the failing test with -Xcomp ~1000 times, plus tiers 1-5.
>
> src/hotspot/share/runtime/javaCalls.cpp line 358:
> 
>> 356: #endif
>> 357: 
>> 358:   CompilationPolicy::compile_if_required(method, CHECK);
> 
> I'd like to understand what is going to happen at line 358. Are we going 
> to compile the method with -Xcomp even though `interp_only` mode is required? 
> If so, do we waste cycles in doing this?
> This question is not to Leonid. Maybe @dean-long could answer this, please?

Yes, I believe we will compile the method even in interp-only mode.  As -Xcomp 
is a mode to stress the compiler, I don't think it matters if we compile 
eagerly here or wait until interp-only mode is turned off.  Other threads not 
in interp-only mode will compile the method anyway.  But if it makes sense to 
change this behavior, then let's do that in a separate bug/RFE.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20587#discussion_r1722600762


Re: RFR: 8338257: UTF8 lengths should be size_t not int [v2]

2024-08-19 Thread Dean Long
On Mon, 19 Aug 2024 23:19:04 GMT, David Holmes  wrote:

>> This work has been split out from JDK-8328877: [JNI] The JNI Specification 
>> needs to address the limitations of integer UTF-8 String lengths
>> 
>> The modified UTF-8 format used by the VM can require up to six bytes to 
>> represent one unicode character, but six byte characters are stored as 
>> UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the 
>> maximum length is 3*`Integer.MAX_VALUE`.  Though with compact strings this 
>> reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should 
>> therefore define UTF8 lengths as `size_t` to accommodate all possible 
>> representations. Higher-level API's can still use `int` if they know the 
>> strings (eg symbols) are sufficiently constrained in length.  See the 
>> comments in utf8.hpp that explain Strings, compact strings and the encoding.
>> 
>> As the existing JNI `GetStringUTFLength` still requires the current 
>> truncating behaviour of ` UNICODE::utf8_length` we add back 
>> `UNICODE::utf8_length_as_int` for it to use.
>> 
>> Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& 
>> length)` use `length` as an IN/OUT parameter: it is the incoming (int) 
>> length of the jbyte/jchar array, and the outgoing (size_t) length of the 
>> UTF8 sequence. This makes some of the call sites a little messy with casts.
>> 
>> Testing:
>>  - tiers 1-4
>>  - GHA
>
> David Holmes has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   Add missing cast for signed-to-unsigned converion.

src/hotspot/share/classfile/javaClasses.cpp line 695:

> 693:   // `length` is used as the incoming number of characters to
> 694:   // convert, and then set as the number of bytes in the UTF8 sequence.
> 695:   size_t  length = java_lang_String::length(java_string, value);

Above comment looks wrong.  `length` is not an in/out reference below, so why 
not leave it as `int` at line 695?  Assigning `int` to `size_t` will introduce 
a -Wsign-conversion warning.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1722633109


Re: RFR: 8205957: setfldw001/TestDescription.java fails with bad field value

2024-08-20 Thread Dean Long
On Wed, 14 Aug 2024 19:34:26 GMT, Leonid Mesnik  wrote:

> The summary of the problem: the test is intermittently failing because it 
> can't get the expected field watch event.
> The test is failing to get event in the 'setfmodw001b' thread with Xcomp only.
> I verified that frame with the method 'run' of setfmodw001b is compiled when 
> line
> 118: setfmodw001.setWatch(4);
> is executed, however the thread is in interp_only mode. The watch events are 
> supported by interpreter only and just ignored by compiled code.
> 
> The reason of failure is race beteween setting interp_only mode in line
> 
> https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/vmTestbase/nsk/jvmti/SetFieldModificationWatch/setfmodw001.java#L75
> 
>  and calling method call_helper while
>  the method run()
> https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/vmTestbase/nsk/jvmti/SetFieldModificationWatch/setfmodw001.java#L116
> 
>  in newly created thread 'setfmodw001b' is invoked.
> 
> The javaCalls:call are used to invoke methods from hotspot, so it might be 
> rare issues. But still, synchronization might be improved.
> The
> void JavaCalls::call_helper(JavaValue* result, const methodHandle& method, 
> JavaCallArguments* args, TRAPS)
> 
> checks if interp_only mode is set and use 'Method::from_interpreted_entry()' 
> if not. However the interp_only might be set later before compiled method is 
> called (or enter first safe point?). This might happens in safepoint during 
> transition via handshake.
> So the running thread is in interp_only mode however the run() method is 
> compiled and executed already and never going to be deoptimized.
> 
> The additional setWatch calls don't try to deptimize method that are already 
> in interp_only mode.
> 
> BTW, the when JVMCI is enabled and verified adapter exists it is also will be 
> loaded even in interp_only mode set. There is not race here, just ignoring of 
> interp_only mode.
> 
> I run failing test with Xcomp ~1000 times and tiers1-5.

src/hotspot/share/runtime/javaCalls.cpp line 403:

> 401: } else {
> 402:   // Since the call stub sets up like the interpreter we call 
> the from_interpreted_entry
> 403:   // so we can go compiled via a i2c.

Do you think the removed comment "Otherwise initial entry method will always run 
interpreted." should be restored, or is it understood and redundant?

src/hotspot/share/runtime/javaCalls.cpp line 413:

> 411: // respect to nmethod sweeping.
> 412: address
> 413: verified_entry_point = (address) 
> HotSpotJVMCI::InstalledCode::entryPoint(nullptr, alternative_target());

This line-break choice seems unconventional.  Can we just make it a single line 
like before?

test/hotspot/jtreg/vmTestbase/nsk/jvmti/SetFieldModificationWatch/setfmodw001/TestDescription.java
 line 57:

> 55:  *  /test/lib
> 56:  * @run main/othervm/native -agentlib:setfmodw001 
> -XX:TraceJVMTI=ec+,+ioe,+s -Xlog:jvmti=trace:file=vm.%p.log 
> nsk.jvmti.SetFieldModificationWatch.setfmodw001
> 57:  */

What's the reason for removing this test run?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20587#discussion_r1723692429
PR Review Comment: https://git.openjdk.org/jdk/pull/20587#discussion_r1723691524
PR Review Comment: https://git.openjdk.org/jdk/pull/20587#discussion_r1723690592


Re: RFR: 8205957: setfldw001/TestDescription.java fails with bad field value [v2]

2024-08-20 Thread Dean Long
On Tue, 20 Aug 2024 17:38:01 GMT, Leonid Mesnik  wrote:

>> The summary of the problem: the test is intermittently failing because it 
>> can't get the expected field watch event.
>> The test fails to get the event in the 'setfmodw001b' thread with -Xcomp 
>> only.
>> I verified that the frame with the method 'run' of setfmodw001b is compiled 
>> when line
>> 118: setfmodw001.setWatch(4);
>> is executed, even though the thread is in interp_only mode. The watch events 
>> are supported by the interpreter only and simply ignored by compiled code.
>> 
>> The reason for the failure is a race between setting interp_only mode at line
>> 
>> https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/vmTestbase/nsk/jvmti/SetFieldModificationWatch/setfmodw001.java#L75
>> 
>>  and calling call_helper while
>>  the method run()
>> https://github.com/openjdk/jdk/blob/master/test/hotspot/jtreg/vmTestbase/nsk/jvmti/SetFieldModificationWatch/setfmodw001.java#L116
>> 
>>  is invoked in the newly created thread 'setfmodw001b'.
>> 
>> JavaCalls::call is used to invoke methods from HotSpot, so such issues are 
>> likely rare. But still, the synchronization can be improved.
>> The
>> void JavaCalls::call_helper(JavaValue* result, const methodHandle& method, 
>> JavaCallArguments* args, TRAPS)
>> 
>> checks if interp_only mode is set and uses 'Method::from_interpreted_entry()' 
>> if not. However, interp_only might be set later, before the compiled method 
>> is called (or enters its first safepoint?). This can happen at a safepoint 
>> during the transition via a handshake.
>> So the running thread is in interp_only mode, but the run() method is 
>> already compiled and executing and is never going to be deoptimized.
>> 
>> The additional setWatch calls don't try to deoptimize methods that are 
>> already in interp_only mode.
>> 
>> BTW, when JVMCI is enabled and a verified adapter exists, it will also be 
>> loaded even with interp_only mode set. There is no race here, just ignoring 
>> of interp_only mode.
>> 
>> I ran the failing test with -Xcomp ~1000 times, plus tiers 1-5.
>
> Leonid Mesnik has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   fixed indentation.

Looks good.

-

Marked as reviewed by dlong (Reviewer).

PR Review: https://git.openjdk.org/jdk/pull/20587#pullrequestreview-2248663420


Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-26 Thread Dean Long
On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes  wrote:

>> This work has been split out from JDK-8328877: [JNI] The JNI Specification 
>> needs to address the limitations of integer UTF-8 String lengths
>> 
>> The modified UTF-8 format used by the VM can require up to six bytes to 
>> represent one unicode character, but six byte characters are stored as 
>> UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the 
>> maximum length is 3*`Integer.MAX_VALUE`.  Though with compact strings this 
>> reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should 
>> therefore define UTF8 lengths as `size_t` to accommodate all possible 
>> representations. Higher-level API's can still use `int` if they know the 
>> strings (eg symbols) are sufficiently constrained in length.  See the 
>> comments in utf8.hpp that explain Strings, compact strings and the encoding.
>> 
>> As the existing JNI `GetStringUTFLength` still requires the current 
>> truncating behaviour of ` UNICODE::utf8_length` we add back 
>> `UNICODE::utf8_length_as_int` for it to use.
>> 
>> Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& 
>> length)` use `length` as an IN/OUT parameter: it is the incoming (int) 
>> length of the jbyte/jchar array, and the outgoing (size_t) length of the 
>> UTF8 sequence. This makes some of the call sites a little messy with casts.
>> 
>> Testing:
>>  - tiers 1-4
>>  - GHA
>
> David Holmes has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   more missing casts

src/hotspot/share/classfile/javaClasses.cpp line 307:

> 305:   {
> 306: ResourceMark rm;
> 307: size_t utf8_len = static_cast(length);

I think there should be an assert that length is not negative, probably at the 
very beginning of this function.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1731992852


Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-26 Thread Dean Long
On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes  wrote:

>> This work has been split out from JDK-8328877: [JNI] The JNI Specification 
>> needs to address the limitations of integer UTF-8 String lengths
>> 
>> The modified UTF-8 format used by the VM can require up to six bytes to 
>> represent one unicode character, but six byte characters are stored as 
>> UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the 
>> maximum length is 3*`Integer.MAX_VALUE`.  Though with compact strings this 
>> reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should 
>> therefore define UTF8 lengths as `size_t` to accommodate all possible 
>> representations. Higher-level API's can still use `int` if they know the 
>> strings (eg symbols) are sufficiently constrained in length.  See the 
>> comments in utf8.hpp that explain Strings, compact strings and the encoding.
>> 
>> As the existing JNI `GetStringUTFLength` still requires the current 
>> truncating behaviour of ` UNICODE::utf8_length` we add back 
>> `UNICODE::utf8_length_as_int` for it to use.
>> 
>> Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& 
>> length)` use `length` as an IN/OUT parameter: it is the incoming (int) 
>> length of the jbyte/jchar array, and the outgoing (size_t) length of the 
>> UTF8 sequence. This makes some of the call sites a little messy with casts.
>> 
>> Testing:
>>  - tiers 1-4
>>  - GHA
>
> David Holmes has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   more missing casts

src/hotspot/share/classfile/javaClasses.cpp line 350:

> 348:   // This check is too strict when the input string is not a valid UTF8.
> 349:   // For example, it may be created with arbitrary content via 
> jni_NewStringUTF.
> 350:   if (UTF8::is_legal_utf8((const unsigned char*)utf8_str, 
> strlen(utf8_str), false)) {

Most of the time we use `is_legal_utf8`, we have a char* and have to cast it to 
unsigned char*.  How about adding an inlined overload `is_legal_utf8(const 
char*, size_t)` for convenience?
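A sketch of the overload being proposed. The body of the base function here is a simplified stand-in, not HotSpot's real validator; only the overload-forwarding shape is the point:

```cpp
#include <cstddef>

// Stand-in for the existing API, which takes unsigned char* (simplified
// validity check for illustration only).
static bool is_legal_utf8(const unsigned char* buf, size_t len, bool version_leq_47) {
  (void)version_leq_47;
  for (size_t i = 0; i < len; i++) {
    // 0xC0/0xC1 (overlong) and >= 0xF5 never appear in legal UTF8.
    if (buf[i] == 0xC0 || buf[i] == 0xC1 || buf[i] >= 0xF5) return false;
  }
  return true;
}

// The proposed convenience overload: callers holding a char* no longer need
// to spell out the reinterpret_cast at every call site.
static inline bool is_legal_utf8(const char* buf, size_t len, bool version_leq_47) {
  return is_legal_utf8(reinterpret_cast<const unsigned char*>(buf), len, version_leq_47);
}
```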

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1731997319


Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-26 Thread Dean Long
On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes  wrote:

>> This work has been split out from JDK-8328877: [JNI] The JNI Specification 
>> needs to address the limitations of integer UTF-8 String lengths
>> 
>> The modified UTF-8 format used by the VM can require up to six bytes to 
>> represent one unicode character, but six byte characters are stored as 
>> UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the 
>> maximum length is 3*`Integer.MAX_VALUE`.  Though with compact strings this 
>> reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should 
>> therefore define UTF8 lengths as `size_t` to accommodate all possible 
>> representations. Higher-level API's can still use `int` if they know the 
>> strings (eg symbols) are sufficiently constrained in length.  See the 
>> comments in utf8.hpp that explain Strings, compact strings and the encoding.
>> 
>> As the existing JNI `GetStringUTFLength` still requires the current 
>> truncating behaviour of ` UNICODE::utf8_length` we add back 
>> `UNICODE::utf8_length_as_int` for it to use.
>> 
>> Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& 
>> length)` use `length` as an IN/OUT parameter: it is the incoming (int) 
>> length of the jbyte/jchar array, and the outgoing (size_t) length of the 
>> UTF8 sequence. This makes some of the call sites a little messy with casts.
>> 
>> Testing:
>>  - tiers 1-4
>>  - GHA
>
> David Holmes has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   more missing casts

src/hotspot/share/classfile/javaClasses.cpp line 555:

> 553:   bool  is_latin1 = java_lang_String::is_latin1(java_string);
> 554: 
> 555:   if (length == 0) return nullptr;

Should this be checking for length <= 0?  It looks like length can indeed be 
negative if UTF8::unicode_length() tries to return the length of a UTF8 string 
longer than 0x7fffffff.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732050412


Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-26 Thread Dean Long
On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes  wrote:

>> This work has been split out from JDK-8328877: [JNI] The JNI Specification 
>> needs to address the limitations of integer UTF-8 String lengths
>> 
>> The modified UTF-8 format used by the VM can require up to six bytes to 
>> represent one unicode character, but six byte characters are stored as 
>> UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the 
>> maximum length is 3*`Integer.MAX_VALUE`.  Though with compact strings this 
>> reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should 
>> therefore define UTF8 lengths as `size_t` to accommodate all possible 
>> representations. Higher-level API's can still use `int` if they know the 
>> strings (eg symbols) are sufficiently constrained in length.  See the 
>> comments in utf8.hpp that explain Strings, compact strings and the encoding.
>> 
>> As the existing JNI `GetStringUTFLength` still requires the current 
>> truncating behaviour of ` UNICODE::utf8_length` we add back 
>> `UNICODE::utf8_length_as_int` for it to use.
>> 
>> Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& 
>> length)` use `length` as an IN/OUT parameter: it is the incoming (int) 
>> length of the jbyte/jchar array, and the outgoing (size_t) length of the 
>> UTF8 sequence. This makes some of the call sites a little messy with casts.
>> 
>> Testing:
>>  - tiers 1-4
>>  - GHA
>
> David Holmes has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   more missing casts

src/hotspot/share/classfile/javaClasses.cpp line 588:

> 586: size_t utf8_len = static_cast(length);
> 587: const char* base = UNICODE::as_utf8(position, utf8_len);
> 588: Symbol* sym = SymbolTable::new_symbol(base, 
> checked_cast(utf8_len));

With the current limitations of checked_cast(), we would also need to check if 
the result is negative on 32-bit platforms, because then size_t and int will be 
the same size, and checked_cast will never complain.
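A minimal model of the blind spot being described. This is not HotSpot's actual `checked_cast` implementation, just a round-trip-style check, with `uint32_t` standing in for `size_t` on a 32-bit target:

```cpp
#include <cassert>
#include <cstdint>

// Simplified model of a checked cast: assert the value survives a round trip.
template <typename To, typename From>
static To model_checked_cast(From x) {
  To y = static_cast<To>(x);
  assert(static_cast<From>(y) == x && "checked cast would fire here");
  return y;
}

// Model a 32-bit platform by using uint32_t as a stand-in for size_t.
static int32_t truncate_utf8_len(uint32_t utf8_len) {
  // For utf8_len > INT_MAX, the cast to int32_t yields a negative value, but
  // casting back reproduces the same bit pattern, so the round-trip check
  // passes silently -- hence the suggestion for an extra negativity check.
  return model_checked_cast<int32_t>(utf8_len);
}
```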

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732062256


Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-26 Thread Dean Long
On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes  wrote:

>> This work has been split out from JDK-8328877: [JNI] The JNI Specification 
>> needs to address the limitations of integer UTF-8 String lengths
>> 
>> The modified UTF-8 format used by the VM can require up to six bytes to 
>> represent one unicode character, but six byte characters are stored as 
>> UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the 
>> maximum length is 3*`Integer.MAX_VALUE`.  Though with compact strings this 
>> reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should 
>> therefore define UTF8 lengths as `size_t` to accommodate all possible 
>> representations. Higher-level API's can still use `int` if they know the 
>> strings (eg symbols) are sufficiently constrained in length.  See the 
>> comments in utf8.hpp that explain Strings, compact strings and the encoding.
>> 
>> As the existing JNI `GetStringUTFLength` still requires the current 
>> truncating behaviour of ` UNICODE::utf8_length` we add back 
>> `UNICODE::utf8_length_as_int` for it to use.
>> 
>> Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& 
>> length)` use `length` as an IN/OUT parameter: it is the incoming (int) 
>> length of the jbyte/jchar array, and the outgoing (size_t) length of the 
>> UTF8 sequence. This makes some of the call sites a little messy with casts.
>> 
>> Testing:
>>  - tiers 1-4
>>  - GHA
>
> David Holmes has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   more missing casts

src/hotspot/share/classfile/javaClasses.cpp line 633:

> 631: }
> 632: 
> 633: int java_lang_String::utf8_length_as_int(oop java_string, typeArrayOop 
> value) {

Why not call java_lang_String::utf8_length() here instead of duplicating the 
code?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732075517


Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread Dean Long
On Tue, 27 Aug 2024 07:07:11 GMT, David Holmes  wrote:

>> src/hotspot/share/classfile/javaClasses.cpp line 307:
>> 
>>> 305:   {
>>> 306: ResourceMark rm;
>>> 307: size_t utf8_len = static_cast(length);
>> 
>> I think there should be an assert that length is not negative, probably at 
>> the very beginning of this function.
>
> Why? As I explained to Coleen this is the length obtained from a Java array. 
> All of the existing code relies on length being >= 0 and doesn't assert that 
> anywhere.

If called from jni_NewString --> java_lang_String::create_oop_from_unicode, 
there's no Java String yet.  Yes, it has probably been like this forever.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732297010


Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread Dean Long
On Tue, 27 Aug 2024 07:09:33 GMT, David Holmes  wrote:

>> src/hotspot/share/classfile/javaClasses.cpp line 555:
>> 
>>> 553:   bool  is_latin1 = java_lang_String::is_latin1(java_string);
>>> 554: 
>>> 555:   if (length == 0) return nullptr;
>> 
>> Should this be checking for length <= 0?  It looks like length can indeed 
>> be negative if UTF8::unicode_length() tries to return the length of a UTF8 
>> string longer than 0x7fffffff.
>
> ??? length is assigned on line 552 and again comes from the length of a Java 
> array.

OK.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732302004


Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread Dean Long
On Tue, 27 Aug 2024 07:20:27 GMT, David Holmes  wrote:

>> src/hotspot/share/classfile/javaClasses.cpp line 588:
>> 
>>> 586: size_t utf8_len = static_cast(length);
>>> 587: const char* base = UNICODE::as_utf8(position, utf8_len);
>>> 588: Symbol* sym = SymbolTable::new_symbol(base, 
>>> checked_cast(utf8_len));
>> 
>> With the current limitations of checked_cast(), we would also need to check 
>> if the result is negative on 32-bit platforms, because then size_t and int 
>> will be the same size, and checked_cast will never complain.
>
> I'm trying to reason if on 32-bit we could even create a large enough string 
> for this to be a problem? Once we have the giant string `as_utf8` will have 
> to allocate an array that is just as large if not larger. So for overflow to 
> be an issue we need a string of length INT_MAX - which is limited to 2GB and 
> then we have to allocate a resource array of 2GB as well. So we need to have 
> allocated 4GB which is our entire address space on 32-bit. So I don't think 
> we can ever hit a problem on 32-bit where the size_t utf8 length would 
> convert to a negative int.

I think the Java string would only need to be INT_MAX/3 in length, if all the 
characters require surrogate encoding.
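The arithmetic behind that observation, as a back-of-the-envelope check (assuming the 3-bytes-per-UTF-16-code-unit worst case described in the PR summary):

```cpp
#include <climits>
#include <cstdint>

// Worst case from the PR description: each UTF-16 code unit (including each
// half of a surrogate pair) can need up to 3 bytes of modified UTF-8.
static int64_t utf8_worst_case_bytes(int64_t utf16_units) {
  return utf16_units * 3;
}
```

So a Java string only slightly longer than INT_MAX/3 code units already produces a UTF8 length that no longer fits in an int.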

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732326074


Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread Dean Long
On Tue, 27 Aug 2024 07:23:14 GMT, David Holmes  wrote:

>> src/hotspot/share/classfile/javaClasses.cpp line 633:
>> 
>>> 631: }
>>> 632: 
>>> 633: int java_lang_String::utf8_length_as_int(oop java_string, typeArrayOop 
>>> value) {
>> 
>> Why not call java_lang_String::utf8_length() here instead of duplicating the 
>> code?
>
> Because we have to call `UNICODE::utf8_length_as_int` to get the proper 
> truncation - the code is not an exact duplicate.

OK, I missed that.  It's probably not worth it to refactor out the common code 
using C++ lambda expressions.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732331731


Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread Dean Long
On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes  wrote:

>> This work has been split out from JDK-8328877: [JNI] The JNI Specification 
>> needs to address the limitations of integer UTF-8 String lengths
>> 
>> The modified UTF-8 format used by the VM can require up to six bytes to 
>> represent one unicode character, but six byte characters are stored as 
>> UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the 
>> maximum length is 3*`Integer.MAX_VALUE`.  Though with compact strings this 
>> reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should 
>> therefore define UTF8 lengths as `size_t` to accommodate all possible 
>> representations. Higher-level API's can still use `int` if they know the 
>> strings (eg symbols) are sufficiently constrained in length.  See the 
>> comments in utf8.hpp that explain Strings, compact strings and the encoding.
>> 
>> As the existing JNI `GetStringUTFLength` still requires the current 
>> truncating behaviour of ` UNICODE::utf8_length` we add back 
>> `UNICODE::utf8_length_as_int` for it to use.
>> 
>> Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& 
>> length)` use `length` as an IN/OUT parameter: it is the incoming (int) 
>> length of the jbyte/jchar array, and the outgoing (size_t) length of the 
>> UTF8 sequence. This makes some of the call sites a little messy with casts.
>> 
>> Testing:
>>  - tiers 1-4
>>  - GHA
>
> David Holmes has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   more missing casts

src/hotspot/share/utilities/utf8.cpp line 127:

> 125: prev = c;
> 126:   }
> 127:   return checked_cast(num_chars);

Ideally, this function would return size_t.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1732370827


Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread Dean Long
On Tue, 27 Aug 2024 12:10:36 GMT, David Holmes  wrote:

>> src/hotspot/share/utilities/utf8.cpp line 127:
>> 
>>> 125: prev = c;
>>> 126:   }
>>> 127:   return checked_cast(num_chars);
>> 
>> Ideally, this function would return size_t.
>
> Why? I think that would have a large flow-on effect. And this length does fit 
> in an int.

The worst case is len == SIZE_MAX and therefore num_chars == SIZE_MAX, which 
won't fit in an int.  If we say this will never happen because current callers 
never use sizes bigger than int, that makes the code fragile against scenarios 
where a developer might add a new caller.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1733226733


Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread Dean Long
On Tue, 27 Aug 2024 21:21:01 GMT, David Holmes  wrote:

> If you try to accommodate arbitrary future use then every method in the VM 
> would need to enforce every single precondition and invariant it expects 
> "just in case" and that is not practical.

I'm basically arguing for Functional Testing here, or at least having some 
invariants that would allow functional testing.  It may seem impractical to 
retrofit existing code, but when we are changing the input from int to size_t, 
that seems like the perfect time to enforce the new invariants.  If we expect 
"len" to be <= INT_MAX instead of SIZE_MAX, something that is not obvious from 
its type, then why not check that with an assert, or at least document it?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1733651059


Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread Dean Long
On Tue, 20 Aug 2024 04:09:04 GMT, David Holmes  wrote:

>> This work has been split out from JDK-8328877: [JNI] The JNI Specification 
>> needs to address the limitations of integer UTF-8 String lengths
>> 
>> The modified UTF-8 format used by the VM can require up to six bytes to 
>> represent one unicode character, but six byte characters are stored as 
>> UTF-16 surrogate pairs. Hence the most bytes per character is 3, and so the 
>> maximum length is 3*`Integer.MAX_VALUE`.  Though with compact strings this 
>> reduces to 2*`Integer.MAX_VALUE`. The low-level UTF8/UNICODE API should 
>> therefore define UTF8 lengths as `size_t` to accommodate all possible 
>> representations. Higher-level API's can still use `int` if they know the 
>> strings (eg symbols) are sufficiently constrained in length.  See the 
>> comments in utf8.hpp that explain Strings, compact strings and the encoding.
>> 
>> As the existing JNI `GetStringUTFLength` still requires the current 
>> truncating behaviour of ` UNICODE::utf8_length` we add back 
>> `UNICODE::utf8_length_as_int` for it to use.
>> 
>> Note that some API's, like ` UNICODE::as_utf8(const T* base, size_t& 
>> length)` use `length` as an IN/OUT parameter: it is the incoming (int) 
>> length of the jbyte/jchar array, and the outgoing (size_t) length of the 
>> UTF8 sequence. This makes some of the call sites a little messy with casts.
>> 
>> Testing:
>>  - tiers 1-4
>>  - GHA
>
> David Holmes has updated the pull request incrementally with one additional 
> commit since the last revision:
> 
>   more missing casts

Marked as reviewed by dlong (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/20560#pullrequestreview-2265064454


Re: RFR: 8338257: UTF8 lengths should be size_t not int [v5]

2024-08-27 Thread Dean Long
On Wed, 28 Aug 2024 01:24:43 GMT, David Holmes  wrote:

>>> If you try to accommodate arbitrary future use then every method in the VM 
>>> would need to enforce every single precondition and invariant it expects 
>>> "just in case" and that is not practical.
>> 
>> I'm basically arguing for Functional Testing here, or at least having some 
>> invariants that would allow functional testing.  It may seem impractical to 
>> retrofit existing code, but when we are changing the input from int to 
>> size_t, that seems like the perfect time to enforce the new invariants.  If 
>> we expect "len" to be <= INT_MAX instead of SIZE_MAX, something that is not 
>> obvious from its type, then why not check that with an assert or at least 
>> document it?
>
> Note that I do already document the assumptions here in the general comment 
> in utf8.hpp:
> 
> There is an additional assumption/expectation that our UTF8 API's are never 
> dealing with
> invalid UTF8, and more generally that all UTF8 sequences could form valid 
> Strings.
> Consequently the Unicode length of a UTF8 sequence is assumed to always be 
> representable
> by an int. 
> 
> the check_cast is then the assert that verifies that.

OK, that's good enough for me.  Thanks,

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20560#discussion_r1733935851


Re: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2]

2024-08-29 Thread Dean Long
On Thu, 29 Aug 2024 18:50:42 GMT, Coleen Phillimore  wrote:

>> Move JVM implementation access flags that are not specified by the classfile 
>> format into Klass so we can shrink AccessFlags to u2 in a future change.
>> 
>> Tested with tier1-7.
>> 
>> NOTE: there are arm, ppc and s390 changes to this that are just a guess.  
>> Also, graal changes.
>
> Coleen Phillimore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Add in graal flags and a comment.

src/hotspot/cpu/s390/c1_MacroAssembler_s390.cpp line 76:

> 74: load_klass(tmp, Roop);
> 75: z_lb(tmp, Address(tmp, Klass::misc_flags_offset()));
> 76: testbit(tmp, exact_log2(KlassFlags::_misc_is_value_based_class));

Suggestion:

z_tm(Klass::misc_flags_offset(), tmp, 
KlassFlags::_misc_is_value_based_class);

or
Suggestion:

z_tm(Address(tmp, Klass::misc_flags_offset()), 
KlassFlags::_misc_is_value_based_class);

src/hotspot/cpu/s390/c1_Runtime1_s390.cpp line 447:

> 445: __ load_klass(klass, Z_ARG1);
> 446: __ z_lb(klass, Address(klass, Klass::misc_flags_offset()));
> 447: __ testbit(klass, exact_log2(KlassFlags::_misc_has_finalizer));

Use z_tm.  See above for example.

src/hotspot/cpu/s390/interp_masm_s390.cpp line 1011:

> 1009: load_klass(tmp, object);
> 1010: z_lb(tmp, Address(tmp, Klass::misc_flags_offset()));
> 1011: testbit(tmp, exact_log2(KlassFlags::_misc_is_value_based_class));

Use z_tm.

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 3511:

> 3509: load_klass(temp, oop);
> 3510: z_lb(temp, Address(temp, Klass::misc_flags_offset()));
> 3511: testbit(temp, exact_log2(KlassFlags::_misc_is_value_based_class));

Use z_tm.

src/hotspot/cpu/s390/macroAssembler_s390.cpp line 6157:

> 6155: load_klass(tmp1, obj);
> 6156: z_lb(tmp1, Address(temp, Klass::misc_flags_offset()));
> 6157: testbit(tmp1, exact_log2(KlassFlags::_misc_is_value_based_class));

Use z_tm.

src/hotspot/cpu/s390/templateTable_s390.cpp line 2325:

> 2323: __ load_klass(Rklass, Rthis);
> 2324: __ z_lb(Rklass, Address(Rklass, Klass::misc_flags_offset()));
> 2325: __ testbit(Rklass, exact_log2(KlassFlags::_misc_has_finalizer));

Use z_tm.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737266627
PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737267281
PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737268180
PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737268350
PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737269052
PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737269173


Re: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2]

2024-08-29 Thread Dean Long
On Thu, 29 Aug 2024 18:50:42 GMT, Coleen Phillimore  wrote:

>> Move JVM implementation access flags that are not specified by the classfile 
>> format into Klass so we can shrink AccessFlags to u2 in a future change.
>> 
>> Tested with tier1-7.
>> 
>> NOTE: there are arm, ppc and s390 changes to this that are just a guess.  
>> Also, graal changes.
>
> Coleen Phillimore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Add in graal flags and a comment.

src/hotspot/cpu/x86/c1_MacroAssembler_x86.cpp line 62:

> 60: load_klass(hdr, obj, rscratch1);
> 61: movb(hdr, Address(hdr, Klass::misc_flags_offset()));
> 62: testl(hdr, KlassFlags::_misc_is_value_based_class);

Suggestion:

testb(Address(hdr, Klass::misc_flags_offset()), 
KlassFlags::_misc_is_value_based_class);

src/hotspot/cpu/x86/c1_Runtime1_x86.cpp line 1170:

> 1168: __ load_klass(t, rax, rscratch1);
> 1169: __ movb(t, Address(t, Klass::misc_flags_offset()));
> 1170: __ testl(t, KlassFlags::_misc_has_finalizer);

Use testb(Address, imm)

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 281:

> 279: load_klass(tmpReg, objReg, scrReg);
> 280: movb(tmpReg, Address(tmpReg, Klass::misc_flags_offset()));
> 281: testl(tmpReg, KlassFlags::_misc_is_value_based_class);

Use testb(Address, imm)

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 600:

> 598:   if (DiagnoseSyncOnValueBasedClasses != 0) {
> 599: load_klass(rax_reg, obj, t);
> 600: movb(rax_reg, Address(rax_reg, Klass::misc_flags_offset()));

Use testb(Address, imm)

src/hotspot/cpu/x86/interp_masm_x86.cpp line 1178:

> 1176: if (DiagnoseSyncOnValueBasedClasses != 0) {
> 1177:   load_klass(tmp_reg, obj_reg, rklass_decode_tmp);
> 1178:   movb(tmp_reg, Address(tmp_reg, Klass::misc_flags_offset()));

Use testb(Address, imm)

src/hotspot/cpu/x86/templateTable_x86.cpp line 2582:

> 2580: __ movptr(robj, aaddress(0));
> 2581: __ load_klass(rdi, robj, rscratch1);
> 2582: __ movb(rdi, Address(rdi, Klass::misc_flags_offset()));

Use testb(Address, imm)

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737276368
PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737276956
PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737277139
PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737277271
PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737277473
PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737277597


Re: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2]

2024-08-29 Thread Dean Long
On Thu, 29 Aug 2024 18:50:42 GMT, Coleen Phillimore  wrote:

>> Move JVM implementation access flags that are not specified by the classfile 
>> format into Klass so we can shrink AccessFlags to u2 in a future change.
>> 
>> Tested with tier1-7.
>> 
>> NOTE: there are arm, ppc and s390 changes to this that are just a guess.  
>> Also, graal changes.
>
> Coleen Phillimore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Add in graal flags and a comment.

src/hotspot/share/ci/ciKlass.cpp line 233:

> 231: jint ciKlass::misc_flags() {
> 232:   assert(is_loaded(), "not loaded");
> 233:   GUARDED_VM_ENTRY(

To Compiler folks: I don't think the VM_ENTRY is necessary, but if it is, then 
we should consider entering VM mode once and caching/memoizing these immutable 
flag values in the ciKlass.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737286933


Re: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2]

2024-08-29 Thread Dean Long
On Thu, 29 Aug 2024 18:50:42 GMT, Coleen Phillimore  wrote:

>> Move JVM implementation access flags that are not specified by the classfile 
>> format into Klass so we can shrink AccessFlags to u2 in a future change.
>> 
>> Tested with tier1-7.
>> 
>> NOTE: there are arm, ppc and s390 changes to this that are just a guess.  
>> Also, graal changes.
>
> Coleen Phillimore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Add in graal flags and a comment.

src/hotspot/share/ci/ciKlass.cpp line 231:

> 229: // --
> 230: // ciKlass::misc_flags
> 231: jint ciKlass::misc_flags() {

Suggestion:

u1 ciKlass::misc_flags() {

I think this should match what Klass::misc_flags() returns.  Ideally, I think it 
should be a typedef like KlassFlags_t so we don't have to make a lot of changes 
if it grows to u2.

src/hotspot/share/ci/ciKlass.hpp line 125:

> 123: 
> 124:   // Fetch Klass::misc_flags.
> 125:   jint   misc_flags();

Suggestion:

  KlassFlags_t   misc_flags();

src/hotspot/share/oops/klass.hpp line 436:

> 434: #endif
> 435:   static ByteSize bitmap_offset(){ return 
> byte_offset_of(Klass, _bitmap); }
> 436:   static ByteSize misc_flags_offset(){ return 
> byte_offset_of(Klass, _misc_flags); }

Suggestion:

  static ByteSize misc_flags_offset(){ return byte_offset_of(Klass, 
_misc_flags._flags); }

src/hotspot/share/oops/klassFlags.hpp line 56:

> 54:   // These flags are write-once before the class is published and then 
> read-only
> 55:   // so don't require atomic updates.
> 56:   u1 _flags;

Suggestion:

  typedef u1 KlassFlags_t;
  KlassFlags_t _flags;

Can we have a typedef so C++ code that doesn't care about the size doesn't need 
to change if we later make it u2 or u4?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737291678
PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737299082
PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737303285
PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737294456


Re: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2]

2024-08-29 Thread Dean Long
On Thu, 29 Aug 2024 22:29:43 GMT, Dean Long  wrote:

>> Coleen Phillimore has updated the pull request incrementally with one 
>> additional commit since the last revision:
>> 
>>   Add in graal flags and a comment.
>
> src/hotspot/share/ci/ciKlass.cpp line 231:
> 
>> 229: // --
>> 230: // ciKlass::misc_flags
>> 231: jint ciKlass::misc_flags() {
> 
> Suggestion:
> 
> u1 ciKlass::misc_flags() {
> 
> I think this should match what misc_flags() returns.  Ideally, I think it 
> should be a typedef like KlassFlags_t so we don't have to make a lot of 
> changes if grows to u2.

Also, using the correct, narrowed type will help with -Wconversion warnings 
later, because u1 can be converted to both int and size_t without a cast.

> src/hotspot/share/oops/klass.hpp line 436:
> 
>> 434: #endif
>> 435:   static ByteSize bitmap_offset(){ return 
>> byte_offset_of(Klass, _bitmap); }
>> 436:   static ByteSize misc_flags_offset(){ return 
>> byte_offset_of(Klass, _misc_flags); }
> 
> Suggestion:
> 
>   static ByteSize misc_flags_offset(){ return 
> byte_offset_of(Klass, _misc_flags._flags); }

We probably shouldn't assume the _flags field starts at offset 0.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737296853
PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737306718


Re: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2]

2024-08-29 Thread Dean Long
On Thu, 29 Aug 2024 18:50:42 GMT, Coleen Phillimore  wrote:

>> Move JVM implementation access flags that are not specified by the classfile 
>> format into Klass so we can shrink AccessFlags to u2 in a future change.
>> 
>> Tested with tier1-7.
>> 
>> NOTE: there are arm, ppc and s390 changes to this that are just a guess.  
>> Also, graal changes.
>
> Coleen Phillimore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Add in graal flags and a comment.

src/hotspot/share/classfile/classFileParser.cpp line 5176:

> 5174:   
> ik->set_declares_nonstatic_concrete_methods(_declares_nonstatic_concrete_methods);
> 5175: 
> 5176:   assert(!_is_hidden || ik->is_hidden(), "must be set already");

Is this an optimization independent of the current change?

src/hotspot/share/opto/library_call.cpp line 3774:

> 3772: 
> 3773: // Use this for testing if Klass is_hidden, has_finalizer, and 
> is_cloneable_fast.
> 3774: Node* LibraryCallKit::generate_misc_flags_guard(Node* kls, int 
> modifier_mask, int modifier_bits, RegionNode* region) {

It looks like we could refactor generate_misc_flags_guard and 
generate_access_flags_guard with a common generate_klass_accessor_guard that 
takes the offset and type as parameters.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737314136
PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737318524


Re: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v2]

2024-08-29 Thread Dean Long
On Thu, 29 Aug 2024 18:50:42 GMT, Coleen Phillimore  wrote:

>> Move JVM implementation access flags that are not specified by the classfile 
>> format into Klass so we can shrink AccessFlags to u2 in a future change.
>> 
>> Tested with tier1-7.
>> 
>> NOTE: there are arm, ppc and s390 changes to this that are just a guess.  
>> Also, graal changes.
>
> Coleen Phillimore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Add in graal flags and a comment.

src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedObjectTypeImpl.java
 line 378:

> 376: HotSpotVMConfig config = config();
> 377: int miscFlags = UNSAFE.getByte(getKlassPointer() + 
> config.klassMiscFlagsOffset);
> 378: return (miscFlags & config().jvmAccHasFinalizer) != 0;

Suggestion:

return (miscFlags & config.jvmAccHasFinalizer) != 0;

src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedObjectTypeImpl.java
 line 1117:

> 1115: HotSpotVMConfig config = config();
> 1116: int miscFlags = UNSAFE.getByte(getKlassPointer() + 
> config.klassMiscFlagsOffset);
> 1117: return (miscFlags & config().jvmAccIsCloneableFast) != 0;

Suggestion:

return (miscFlags & config.jvmAccIsCloneableFast) != 0;

src/jdk.internal.vm.ci/share/classes/jdk/vm/ci/hotspot/HotSpotResolvedObjectTypeImpl.java
 line 1118:

> 1116: int miscFlags = UNSAFE.getByte(getKlassPointer() + 
> config.klassMiscFlagsOffset);
> 1117: return (miscFlags & config().jvmAccIsCloneableFast) != 0;
> 1118: }

Maybe introduce getMiscFlags() helper like existing getAccessFlags()?
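
A hypothetical shape for such a helper, mirroring the existing 
getAccessFlags() pattern. The bit values and names below are assumed for 
illustration, not the real JVMCI config constants, and the raw byte stands 
in for the `UNSAFE.getByte` read:

```java
public class MiscFlagsSketch {
    static final int JVM_ACC_HAS_FINALIZER     = 0x01; // placeholder bits
    static final int JVM_ACC_IS_CLONEABLE_FAST = 0x02;

    private final int miscFlags;

    MiscFlagsSketch(int rawByte) {
        this.miscFlags = rawByte & 0xFF; // flags byte fetched once
    }

    int getMiscFlags() { return miscFlags; } // shared helper for all queries

    boolean hasFinalizer() {
        return (getMiscFlags() & JVM_ACC_HAS_FINALIZER) != 0;
    }

    boolean isCloneableFast() {
        return (getMiscFlags() & JVM_ACC_IS_CLONEABLE_FAST) != 0;
    }

    public static void main(String[] args) {
        MiscFlagsSketch k = new MiscFlagsSketch(0x01);
        if (!k.hasFinalizer() || k.isCloneableFast()) {
            throw new AssertionError();
        }
        System.out.println("ok");
    }
}
```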

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737327588
PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737331700
PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1737330084


Re: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v3]

2024-08-30 Thread Dean Long
On Fri, 30 Aug 2024 14:00:44 GMT, Coleen Phillimore  wrote:

>> Move JVM implementation access flags that are not specified by the classfile 
>> format into Klass so we can shrink AccessFlags to u2 in a future change.
>> 
>> Tested with tier1-7.
>> 
>> NOTE: there are arm, ppc and s390 changes to this that are just a guess.  
>> Also, graal changes.
>
> Coleen Phillimore has updated the pull request incrementally with three 
> additional commits since the last revision:
> 
>  - Fix jvmci code.
>  - Some C2 refactoring.
>  - Assembly corrections from Matias and Dean.

src/hotspot/share/opto/library_call.cpp line 3777:

> 3775:   Node* p = basic_plus_adr(kls, in_bytes(Klass::misc_flags_offset()));
> 3776:   Node* mods = make_load(nullptr, p, TypeInt::UBYTE, T_BOOLEAN, 
> MemNode::unordered);
> 3777:   return generate_mods_flags_guard(mods, modifier_mask, modifier_bits, 
> region);

Suggestion:

  return generate_mods_flags_guard(mods, modifier_mask, modifier_bits, region, 
Klass::misc_flags_offset(), TypeInt::UBYTE, T_BOOLEAN);

This looks much better, but can't you leave the basic_plus_adr and make_load in 
generate_mods_flags_guard, and pass in the needed specialization?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1739380935


Re: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v3]

2024-08-30 Thread Dean Long
On Fri, 30 Aug 2024 21:35:49 GMT, Coleen Phillimore  wrote:

>> src/hotspot/share/opto/library_call.cpp line 3777:
>> 
>>> 3775:   Node* p = basic_plus_adr(kls, in_bytes(Klass::misc_flags_offset()));
>>> 3776:   Node* mods = make_load(nullptr, p, TypeInt::UBYTE, T_BOOLEAN, 
>>> MemNode::unordered);
>>> 3777:   return generate_mods_flags_guard(mods, modifier_mask, 
>>> modifier_bits, region);
>> 
>> Suggestion:
>> 
>>   return generate_mods_flags_guard(mods, modifier_mask, modifier_bits, 
>> region, Klass::misc_flags_offset(), TypeInt::UBYTE, T_BOOLEAN);
>> 
>> This looks much better, but can't you leave the basic_plus_adr and make_load 
>> in generate_mods_flags_guard, and pass in the needed specialization?
>
> Really, this is better? It adds three parameters.  I made this change.

It reduces duplicate code, which is usually good.  Yes, I like it better.

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20719#discussion_r1739475163


Re: RFR: 8339112: Move JVM Klass flags out of AccessFlags [v5]

2024-08-30 Thread Dean Long
On Fri, 30 Aug 2024 21:51:49 GMT, Coleen Phillimore  wrote:

>> Move JVM implementation access flags that are not specified by the classfile 
>> format into Klass so we can shrink AccessFlags to u2 in a future change.
>> 
>> Tested with tier1-7.
>> 
>> NOTE: there are arm, ppc and s390 changes to this that are just a guess.  
>> Also, graal changes.
>
> Coleen Phillimore has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Add parameters and rename generate_klass_flags_guard.

Marked as reviewed by dlong (Reviewer).

-

PR Review: https://git.openjdk.org/jdk/pull/20719#pullrequestreview-2273596650


Re: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling

2024-09-11 Thread Dean Long
On Fri, 6 Sep 2024 13:12:29 GMT, Coleen Phillimore  wrote:

>> This patch cleans up the use of `get_new_method()` so callers don't have to 
>> worry about throwing `NoSuchMethodError`. The method is refactored to throw 
>> the error and avoid ever returning nullptr. Verified with tier1-5 tests.
>
> What should this return for a deleted method?
> 
> // --
> // ciMethod::equals
> //
> // Returns true if the methods are the same, taking redefined methods
> // into account.
> bool ciMethod::equals(const ciMethod* m) const {
>   if (this == m) return true;
>   VM_ENTRY_MARK;
>   Method* m1 = this->get_Method();
>   Method* m2 = m->get_Method();
>   if (m1->is_old()) m1 = m1->get_new_method();
>   if (m2->is_old()) m2 = m2->get_new_method();
>   return m1 != Universe::no_such_method_error() && m1 == m2;  // ???
> }

@coleenp , I think it is enough for ciMethod::equals() to simply compare the 
values of orig_method_idnum() and not deal with old/deleted methods directly, 
but the last time I checked what orig_method_idnum() really meant I got 
confused by the idnum renumbering, so I wasn't able to convince myself that 
using orig_method_idnum() for comparison was correct.

-

PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2345200106


Re: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling

2024-09-11 Thread Dean Long
On Fri, 6 Sep 2024 12:51:38 GMT, Coleen Phillimore  wrote:

> We do the same thing with illegal_access_error() where the arguments may not 
> match and there's a special case for this and no_such_method_error() in 
> dependencies. Are the compilers confused by this too?

If the compiler used the value from the itable, I think it could get confused.  
There are places in deoptimization for example, that use the callee signature 
instead of the call-site signature, to deal with signature mismatch caused by 
the invokedynamic appendix.

-

PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2345203264


Re: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v2]

2024-09-11 Thread Dean Long
On Tue, 10 Sep 2024 06:50:13 GMT, David Holmes  wrote:

>> So that implies that you trust my reading of this code.  It is complicated 
>> enough that testing both seems like a safe thing to do and somewhat 
>> clarifying, or else adding an assert like:
>> 
>> assert(new_method != nullptr || old_method->is_deleted(), "this is the 
>> only way this happens");
>> return new_method == nullptr ? nsme : new_method;
>
> The assert works for me.

Can we assert the stronger statement: `(new_method == nullptr) == 
(old_method->is_deleted())` ?

-

PR Review Comment: https://git.openjdk.org/jdk/pull/20874#discussion_r1756060487


Re: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v4]

2024-09-11 Thread Dean Long
On Wed, 11 Sep 2024 21:02:41 GMT, Matias Saavedra Silva  
wrote:

>> This patch cleans up the use of `get_new_method()` so callers don't have to 
>> worry about throwing `NoSuchMethodError`. The method is refactored to throw 
>> the error and avoid ever returning nullptr. Verified with tier1-5 tests.
>
> Matias Saavedra Silva has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Coleen suggestion

I think using a callee with the wrong signature could cause problems in other 
places, not just the compiler.  Doesn't GC oopmap scanning depend on the 
signature of the callee method?  And while it might seem harmless if those 
"extra" arguments are not scanned, I believe there is a JVMTI API that can 
query those values.

-

PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2345216935


Re: RFR: 8338471: Refactor Method::get_new_method() for better NoSuchMethodError handling [v4]

2024-09-11 Thread Dean Long
On Wed, 11 Sep 2024 21:02:41 GMT, Matias Saavedra Silva  
wrote:

>> This patch cleans up the use of `get_new_method()` so callers don't have to 
>> worry about throwing `NoSuchMethodError`. The method is refactored to throw 
>> the error and avoid ever returning nullptr. Verified with tier1-5 tests.
>
> Matias Saavedra Silva has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Coleen suggestion

I think the JVMTI API I was thinking of was PopFrame, which says:
" the operand stack is restored--the argument values are added back and if the 
invoke was not invokestatic, objectref is added back as well "
" Note however, that any changes to the arguments, which occurred in the called 
method, remain; when execution continues, the first instruction to execute will 
be the invoke."

-

PR Comment: https://git.openjdk.org/jdk/pull/20874#issuecomment-2345219547

