Re: RFR: 8334777: Test javax/management/remote/mandatory/notif/NotifReconnectDeadlockTest.java failed with NullPointerException [v2]
On Fri, 28 Jun 2024 18:14:59 GMT, Kevin Walls wrote: >> Disable running this test with -Xcomp. >> >> The NPE seen in this test is due to a timeout establishing the connection. >> ServerCommunicatorAdmin hits its timeout, during an addNotificationListener >> call on a new connection. >> >> -Xcomp causes this slowdown and the failure is reproducible. There is no >> need to test this test with -Xcomp, we should exclude it as we do for >> various other tests. > > Kevin Walls has updated the pull request incrementally with one additional > commit since the last revision: > > comment Okay - Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/19930#pullrequestreview-2165445432
Re: RFR: 8334777: Test javax/management/remote/mandatory/notif/NotifReconnectDeadlockTest.java failed with NullPointerException [v2]
On Fri, 28 Jun 2024 18:14:59 GMT, Kevin Walls wrote: >> Disable running this test with -Xcomp. >> >> The NPE seen in this test is due to a timeout establishing the connection. >> ServerCommunicatorAdmin hits its timeout, during an addNotificationListener >> call on a new connection. >> >> -Xcomp causes this slowdown and the failure is reproducible. There is no >> need to test this test with -Xcomp, we should exclude it as we do for >> various other tests. > > Kevin Walls has updated the pull request incrementally with one additional > commit since the last revision: > > comment Thanks for the reviews! - PR Comment: https://git.openjdk.org/jdk/pull/19930#issuecomment-2216919357
Integrated: 8334777: Test javax/management/remote/mandatory/notif/NotifReconnectDeadlockTest.java failed with NullPointerException
On Thu, 27 Jun 2024 15:53:09 GMT, Kevin Walls wrote: > Disable running this test with -Xcomp. > > The NPE seen in this test is due to a timeout establishing the connection. > ServerCommunicatorAdmin hits its timeout, during an addNotificationListener > call on a new connection. > > -Xcomp causes this slowdown and the failure is reproducible. There is no > need to test this test with -Xcomp, we should exclude it as we do for various > other tests. This pull request has now been integrated. Changeset: 2a296475 Author:Kevin Walls URL: https://git.openjdk.org/jdk/commit/2a2964759c73b3b9ab6afaad109383c89952977b Stats: 4 lines in 1 file changed: 3 ins; 0 del; 1 mod 8334777: Test javax/management/remote/mandatory/notif/NotifReconnectDeadlockTest.java failed with NullPointerException Reviewed-by: cjplummer, dholmes - PR: https://git.openjdk.org/jdk/pull/19930
RFR: 8335966: Remove incorrect problem listing of java/lang/instrument/NativeMethodPrefixAgent.java in ProblemList-Virtual.txt
Can I please get a review of this change which removes an incorrect entry from `ProblemList-Virtual.txt`? As noted in https://bugs.openjdk.org/browse/JDK-8335966 this entry is a left over which went unnoticed when we did the changes in https://bugs.openjdk.org/browse/JDK-8333130. The refactored test `NativeMethodPrefixApp` has been functioning without the errors for which `NativeMethodPrefixAgent` was previously problem listed. So I haven't replaced that problem list entry and instead have just removed it. `NativeMethodPrefixApp` is known to run into an intermittent timeout with virtual threads and that's tracked in https://bugs.openjdk.org/browse/JDK-8334167 and I think it's not problem listing worthy given the relatively low frequency of such timeouts. Furthermore, in the coming days, I plan to propose a change to this `NativeMethodPrefixApp` test which should fix the timeout failures. To verify that this removal doesn't cause any unexpected issues, I've triggered a CI run of the relevant tier. - Commit messages: - 8335966: Remove incorrect problem listing of java/lang/instrument/NativeMethodPrefixAgent.java in ProblemList-Virtual.txt Changes: https://git.openjdk.org/jdk/pull/20094/files Webrev: https://webrevs.openjdk.org/?repo=jdk&pr=20094&range=00 Issue: https://bugs.openjdk.org/browse/JDK-8335966 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod Patch: https://git.openjdk.org/jdk/pull/20094.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20094/head:pull/20094 PR: https://git.openjdk.org/jdk/pull/20094
Re: RFR: 8335882: platform/cgroup/TestSystemSettings.java fails on Alpine Linux
On Mon, 8 Jul 2024 14:19:21 GMT, Severin Gehwolf wrote: > Please review this simple test fix to exclude the test from being run on an > Alpine Linux system. Apparently default Alpine Linux systems don't have > cgroups set up by default the way other distros do. More info on the bug. I > propose to not run the test on musl systems. Marked as reviewed by mbaesken (Reviewer). The issue is gone in our tests, with your patch added. - PR Review: https://git.openjdk.org/jdk/pull/20076#pullrequestreview-2165630718 PR Comment: https://git.openjdk.org/jdk/pull/20076#issuecomment-2217120494
Re: RFR: 8072701: resume001 failed due to ERROR: timeout for waiting for a BreakpintEvent
On Tue, 9 Jul 2024 04:43:59 GMT, Chris Plummer wrote: > The test hits a breakpoint on thread2 with SUSPEND_EVENT_THREAD policy, so > only thread2 is suspended. It then does a vm.suspend(), which suspends all > threads and bumps the suspendCount of thread2 up to 2. It then does an > eventSet.resume(), which decrements the thread2 suspendCount to 1, so now all > threads are suspended with a suspendCount of 1. thread2 is then resumed and > we expect to hit the next thread2 breakpoint. The problem is that thread2 > can't hit the breakpoint until the main thread has proceeded far enough, and > if the vm.suspend() that suspended the main thread happens too quickly, it > won't have proceeded far enough, so thread2 never hits the breakpoint. > > Essentially we need a vm.resume() to allow the main thread to run, but we > need to do it in a way that does nullify part of what the test is testing > for. So in order to allow the vm.resume() but not subvert the intent of the > test, we first do a thread2.suspend() so the vm.resume() won't resume thread2. > > Testing in progress: tier1 and tier5 svc testing Read through and counted suspend/resumes on my fingers, seems good. - Marked as reviewed by kevinw (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20088#pullrequestreview-2165680314
Re: RFR: 8335966: Remove incorrect problem listing of java/lang/instrument/NativeMethodPrefixAgent.java in ProblemList-Virtual.txt
On Tue, 9 Jul 2024 08:39:39 GMT, Jaikiran Pai wrote: > Can I please get a review of this change which removes an incorrect entry > from `ProblemList-Virtual.txt`? > > As noted in https://bugs.openjdk.org/browse/JDK-8335966 this entry is a left > over which went unnoticed when we did the changes in > https://bugs.openjdk.org/browse/JDK-8333130. > > The refactored test `NativeMethodPrefixApp` has been functioning without the > errors for which `NativeMethodPrefixAgent` was previously problem listed. So > I haven't replaced that problem list entry and instead have just removed it. > > `NativeMethodPrefixApp` is known to run into an intermittent timeout with > virtual threads and that's tracked in > https://bugs.openjdk.org/browse/JDK-8334167 and I think it's not problem > listing worthy given the relatively low frequency of such timeouts. > Furthermore, in the coming days, I plan to propose a change to this > `NativeMethodPrefixApp` test which should fix the timeout failures. > > To verify that this removal doesn't cause any unexpected issues, I've > triggered a CI run of the relevant tier. looks good - Marked as reviewed by kevinw (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20094#pullrequestreview-2165691687
Integrated: 8335882: platform/cgroup/TestSystemSettings.java fails on Alpine Linux
On Mon, 8 Jul 2024 14:19:21 GMT, Severin Gehwolf wrote: > Please review this simple test fix to exclude the test from being run on an > Alpine Linux system. Apparently default Alpine Linux systems don't have > cgroups set up by default the way other distros do. More info on the bug. I > propose to not run the test on musl systems. This pull request has now been integrated. Changeset: f3ff4f74 Author:Severin Gehwolf URL: https://git.openjdk.org/jdk/commit/f3ff4f7427c3c3f5cb2a115a61462bb9d28de1cd Stats: 1 line in 1 file changed: 0 ins; 0 del; 1 mod 8335882: platform/cgroup/TestSystemSettings.java fails on Alpine Linux Reviewed-by: stuefe, mbaesken - PR: https://git.openjdk.org/jdk/pull/20076
Re: RFR: 8335882: platform/cgroup/TestSystemSettings.java fails on Alpine Linux
On Mon, 8 Jul 2024 14:19:21 GMT, Severin Gehwolf wrote: > Please review this simple test fix to exclude the test from being run on an > Alpine Linux system. Apparently default Alpine Linux systems don't have > cgroups set up by default the way other distros do. More info on the bug. I > propose to not run the test on musl systems. Thanks for the reviews! The docs build failure in GHA is infra related: Run # If runner architecture is x64 set JAVA_HOME_17_X64 otherwise set to JAVA_HOME_17_arm64 [build.sh][INFO] CYGWIN_OR_MSYS=0 [build.sh][INFO] JAVA_HOME: /usr/lib/jvm/temurin-17-jdk-amd64 [build.sh][INFO] Downloading https://archive.apache.org/dist/ant/binaries/apache-ant-1.10.8-bin.zip to /home/runner/work/jdk/jdk/jtreg/src/make/../build/deps/apache-ant-1.10.8-bin.zip Error: sh][ERROR] wget exited with exit code 4 Error: Process completed with exit code 1. - PR Comment: https://git.openjdk.org/jdk/pull/20076#issuecomment-2217257118
Re: RFR: 8335710: serviceability/dcmd/vm/SystemDumpMapTest.java and SystemMapTest.java fail on Linux Alpine after 8322475
On Mon, 8 Jul 2024 11:06:45 GMT, Matthias Baesken wrote: > Unfortunately those 2 tests fail now on Linux Alpine (x86_64) : > serviceability/dcmd/vm/SystemDumpMapTest.java > > Missing patterns in dump: > 0x\\p{XDigit}+-0x\\p{XDigit}+ +\\d+ +[rwsxp-]+ +\\d+ +\\d+ > +(4K|8K|16K|64K|2M|16M|64M) +com.*\[vdso\] > test SystemDumpMapTest.jmx(): failure > java.lang.RuntimeException: java.lang.RuntimeException: Missing patterns > at SystemDumpMapTest.run_test(SystemDumpMapTest.java:100) > at SystemDumpMapTest.run(SystemDumpMapTest.java:106) > at SystemDumpMapTest.jmx(SystemDumpMapTest.java:112) > at > java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) > at java.base/java.lang.reflect.Method.invoke(Method.java:580) > at > org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132) > at org.testng.internal.TestInvoker.invokeMethod(TestInvoker.java:599) > at org.testng.internal.TestInvoker.invokeTestMethod(TestInvoker.java:174) > at org.testng.internal.MethodRunner.runInSequence(MethodRunner.java:46) > at > org.testng.internal.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:822) > at org.testng.internal.TestInvoker.invokeTestMethods(TestInvoker.java:147) > at > org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146) > at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:128) > at java.base/java.util.ArrayList.forEach(ArrayList.java:1597) > > > Reason is that the vdso lib is not there on all Linux flavors; e.g. we do not > seem to have it on our Alpine Linux test system. > So better avoid the check because it is based on false assumptions. Looks good. - Marked as reviewed by lucy (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20072#pullrequestreview-2166005745
Re: RFR: 8335619: Add an @apiNote to j.l.i.ClassFileTransformer to warn about recursive class loading and ClassCircularityErrors [v2]
On Fri, 5 Jul 2024 07:28:13 GMT, Volker Simonis wrote: > @AlanBateman would you mind having a quick look at this trivial, > documentation-only PR and let me know if you're OK with it? (Catching up as I was away for a few days). I agree it would be useful to have an API note on this topic. There were several reports of deadlocks, ClassCircularityError, and other hard to diagnose issues when support for instrumentation was originally introduced in JDK 5. I think agent maintainers learned to exclude many of the core classes in java.base but that is always a moving target and there have been a few more sightings/reports recently (maybe due to newer features doing more in Java rather than in the VM). I agree that the ClassFileTransformer class description is the best place for this. My initial reaction was to wonder about the j.l.instrument package description but it is more focused on starting agents and the JAR file attributes so I think your proposed location is best.. As regards the text then it might be better to start with a sentence to say that the "transform" method may be called from critical points in JDK core classes or called to transform JDK core classes. You've got some of this in second sentence but I think better to lead with it before revealing one of the several possible things that can happen. I think part of the lead in needs to say "great care must be taken" or something like that to get across the point that the ClassFileTransformer implementor has the potential to cause many possible issues. For the same class and linkage error then probably best to not use the phrase "transitively requires" as it's too close "requires transitive" used in module declarations. Note that the `@jvms` tag can be used to reference the JVMS section. You'll see a few examples in other classes. - PR Comment: https://git.openjdk.org/jdk/pull/20011#issuecomment-2217495130
Re: RFR: 8335710: serviceability/dcmd/vm/SystemDumpMapTest.java and SystemMapTest.java fail on Linux Alpine after 8322475
On Mon, 8 Jul 2024 11:06:45 GMT, Matthias Baesken wrote: > Unfortunately those 2 tests fail now on Linux Alpine (x86_64) : > serviceability/dcmd/vm/SystemDumpMapTest.java > > Missing patterns in dump: > 0x\\p{XDigit}+-0x\\p{XDigit}+ +\\d+ +[rwsxp-]+ +\\d+ +\\d+ > +(4K|8K|16K|64K|2M|16M|64M) +com.*\[vdso\] > test SystemDumpMapTest.jmx(): failure > java.lang.RuntimeException: java.lang.RuntimeException: Missing patterns > at SystemDumpMapTest.run_test(SystemDumpMapTest.java:100) > at SystemDumpMapTest.run(SystemDumpMapTest.java:106) > at SystemDumpMapTest.jmx(SystemDumpMapTest.java:112) > at > java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) > at java.base/java.lang.reflect.Method.invoke(Method.java:580) > at > org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132) > at org.testng.internal.TestInvoker.invokeMethod(TestInvoker.java:599) > at org.testng.internal.TestInvoker.invokeTestMethod(TestInvoker.java:174) > at org.testng.internal.MethodRunner.runInSequence(MethodRunner.java:46) > at > org.testng.internal.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:822) > at org.testng.internal.TestInvoker.invokeTestMethods(TestInvoker.java:147) > at > org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146) > at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:128) > at java.base/java.util.ArrayList.forEach(ArrayList.java:1597) > > > Reason is that the vdso lib is not there on all Linux flavors; e.g. we do not > seem to have it on our Alpine Linux test system. > So better avoid the check because it is based on false assumptions. Wait, didn't I fix this already? - PR Comment: https://git.openjdk.org/jdk/pull/20072#issuecomment-2217528047
Re: RFR: 8335710: serviceability/dcmd/vm/SystemDumpMapTest.java and SystemMapTest.java fail on Linux Alpine after 8322475
On Tue, 9 Jul 2024 12:26:01 GMT, Thomas Stuefe wrote: > Wait, didn't I fix this already? Weird, I could have sworn I fixed that. Ah well. - PR Comment: https://git.openjdk.org/jdk/pull/20072#issuecomment-2217557748
Re: RFR: 8335710: serviceability/dcmd/vm/SystemDumpMapTest.java and SystemMapTest.java fail on Linux Alpine after 8322475
On Mon, 8 Jul 2024 11:06:45 GMT, Matthias Baesken wrote: > Unfortunately those 2 tests fail now on Linux Alpine (x86_64) : > serviceability/dcmd/vm/SystemDumpMapTest.java > > Missing patterns in dump: > 0x\\p{XDigit}+-0x\\p{XDigit}+ +\\d+ +[rwsxp-]+ +\\d+ +\\d+ > +(4K|8K|16K|64K|2M|16M|64M) +com.*\[vdso\] > test SystemDumpMapTest.jmx(): failure > java.lang.RuntimeException: java.lang.RuntimeException: Missing patterns > at SystemDumpMapTest.run_test(SystemDumpMapTest.java:100) > at SystemDumpMapTest.run(SystemDumpMapTest.java:106) > at SystemDumpMapTest.jmx(SystemDumpMapTest.java:112) > at > java.base/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103) > at java.base/java.lang.reflect.Method.invoke(Method.java:580) > at > org.testng.internal.MethodInvocationHelper.invokeMethod(MethodInvocationHelper.java:132) > at org.testng.internal.TestInvoker.invokeMethod(TestInvoker.java:599) > at org.testng.internal.TestInvoker.invokeTestMethod(TestInvoker.java:174) > at org.testng.internal.MethodRunner.runInSequence(MethodRunner.java:46) > at > org.testng.internal.TestInvoker$MethodInvocationAgent.invoke(TestInvoker.java:822) > at org.testng.internal.TestInvoker.invokeTestMethods(TestInvoker.java:147) > at > org.testng.internal.TestMethodWorker.invokeTestMethods(TestMethodWorker.java:146) > at org.testng.internal.TestMethodWorker.run(TestMethodWorker.java:128) > at java.base/java.util.ArrayList.forEach(ArrayList.java:1597) > > > Reason is that the vdso lib is not there on all Linux flavors; e.g. we do not > seem to have it on our Alpine Linux test system. > So better avoid the check because it is based on false assumptions. I dislike having no test that checks that standard OS-side segments are shown. Does Alpine have a [heap] segment at least? Could scan for that. - PR Review: https://git.openjdk.org/jdk/pull/20072#pullrequestreview-2166183789
Re: RFR: 8335710: serviceability/dcmd/vm/SystemDumpMapTest.java and SystemMapTest.java fail on Linux Alpine after 8322475
On Tue, 9 Jul 2024 12:32:06 GMT, Thomas Stuefe wrote: > I dislike having no test that checks that standard OS-side segments are shown. > > Does Alpine have a [heap] segment at least? Could scan for that. Checked and Alpine processes have a `[heap]` segment. Could you try that? - PR Comment: https://git.openjdk.org/jdk/pull/20072#issuecomment-2217648113
Re: RFR: 8335688: Fix -Wzero-as-null-pointer-constant warnings from fflush calls in jvmti tests
On Thu, 4 Jul 2024 12:15:09 GMT, Kim Barrett wrote: > Please review this change to some jvmti tests, which were calling fflush with > an argument of 0. Most of these are in C++ code, where we change them to use > nullptr as the argument. A couple are in C, where we change them to use NULL. > This removes a bunch of -Wzero-as-null-pointer-constant when building with > that enabled. > > Testing: mach5 tier1 Looks good. - Marked as reviewed by coleenp (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20032#pullrequestreview-2166246812
Re: RFR: 8335688: Fix -Wzero-as-null-pointer-constant warnings from fflush calls in jvmti tests
On Fri, 5 Jul 2024 07:51:08 GMT, Julian Waters wrote: >> Please review this change to some jvmti tests, which were calling fflush with >> an argument of 0. Most of these are in C++ code, where we change them to use >> nullptr as the argument. A couple are in C, where we change them to use >> NULL. >> This removes a bunch of -Wzero-as-null-pointer-constant when building with >> that enabled. >> >> Testing: mach5 tier1 > > Marked as reviewed by jwaters (Committer). Thanks for reviews @TheShermanTanker and @coleenp - PR Comment: https://git.openjdk.org/jdk/pull/20032#issuecomment-2217703027
Integrated: 8335688: Fix -Wzero-as-null-pointer-constant warnings from fflush calls in jvmti tests
On Thu, 4 Jul 2024 12:15:09 GMT, Kim Barrett wrote: > Please review this change to some jvmti tests, which were calling fflush with > an argument of 0. Most of these are in C++ code, where we change them to use > nullptr as the argument. A couple are in C, where we change them to use NULL. > This removes a bunch of -Wzero-as-null-pointer-constant when building with > that enabled. > > Testing: mach5 tier1 This pull request has now been integrated. Changeset: 7e11fb70 Author:Kim Barrett URL: https://git.openjdk.org/jdk/commit/7e11fb702696df733ca89d325200f2e9414402d9 Stats: 175 lines in 25 files changed: 0 ins; 0 del; 175 mod 8335688: Fix -Wzero-as-null-pointer-constant warnings from fflush calls in jvmti tests Reviewed-by: jwaters, coleenp - PR: https://git.openjdk.org/jdk/pull/20032
Re: RFR: 8335619: Add an @apiNote to j.l.i.ClassFileTransformer to warn about recursive class loading and ClassCircularityErrors [v3]
> Since Java 5 the `java.lang.instrument` package provides services that allow > Java programming language agents to instrument (i.e. modify the bytecode) of > programs running on the Java Virtual Machine. The `java.lang.instrument` > functionality is based and implemented on top of the native Java Virtual > Machine Tool Interface (JVMTI) also introduced in Java 5. But because the > `java.lang.instrument` API is a pure Java API and uses Java classes to > instrument Java classes it imposes some usage restrictions which are not very > well documented in its API specification. > > E.g. the section on ["Bytecode > Instrumentation"](https://docs.oracle.com/en/java/javase/21/docs/specs/jvmti.html#bci) > in the JVMTI specification explicitly warns that special "*Care must be > taken to avoid perturbing dependencies, especially when instrumenting core > classes*". The risk of such "perturbing dependencies" is obviously much > higher in a Java API like `java.lang.instrument`, but a more detailed > explanation and warning is missing from its API documentation. > > The most evident class file transformation restriction is that while a class > A is being loaded and transformed it is not possible to use this same class > directly or transitively from the `ClassFileTransformer::transform()` method. > Violating this rule will result in a `ClassCircularityError` (the exact error > type is disputable as can be seen in [8164165: JVM throws incorrect exception > when ClassFileTransformer.transform() triggers class loading of class already > being loaded](https://bugs.openjdk.org/browse/JDK-8164165), but the result > would be a `LinkageError in any case). > > The risk to run into such a `ClassCircularityError` error increases with the > amount of code a transforming agent is transitively using from the > `transform()` method. Using popular libraries like ASM, ByteBuddy, etc. for > transformation further increases the probability of running into such issues, > especially if the agent aims to transform core JDK library classes. > > By default, the occurrence of a `ClassCircularityError` in > `ClassFileTransformer::transform()` will be handled gracefully with the only > consequence that the current transformation target will be loaded unmodified > (see `ClassFileTransformer` API spec: "*throwing an exception has the same > effect as returning null*"). But unfortunately, it can also have a subtle but > at the same time much more far-reaching consequence. If the > `ClassCircularityError` occurs during the resolution of a constant pool entry > in another, ... Volker Simonis has updated the pull request incrementally with one additional commit since the last revision: Addressed @AlanBateman's suggestions and updated copyright year - Changes: - all: https://git.openjdk.org/jdk/pull/20011/files - new: https://git.openjdk.org/jdk/pull/20011/files/080999e1..5fb85533 Webrevs: - full: https://webrevs.openjdk.org/?repo=jdk&pr=20011&range=02 - incr: https://webrevs.openjdk.org/?repo=jdk&pr=20011&range=01-02 Stats: 21 lines in 1 file changed: 6 ins; 2 del; 13 mod Patch: https://git.openjdk.org/jdk/pull/20011.diff Fetch: git fetch https://git.openjdk.org/jdk.git pull/20011/head:pull/20011 PR: https://git.openjdk.org/jdk/pull/20011
Re: RFR: 8335619: Add an @apiNote to j.l.i.ClassFileTransformer to warn about recursive class loading and ClassCircularityErrors [v2]
On Tue, 9 Jul 2024 12:17:25 GMT, Alan Bateman wrote: >> @AlanBateman would you mind having a quick look at this trivial, >> documentation-only PR and let me know if you're OK with it? >> >> Thank you and best regards, >> Volker > >> @AlanBateman would you mind having a quick look at this trivial, >> documentation-only PR and let me know if you're OK with it? > > (Catching up as I was away for a few days). > > I agree it would be useful to have an API note on this topic. There were > several reports of deadlocks, ClassCircularityError, and other hard to > diagnose issues when support for instrumentation was originally introduced in > JDK 5. I think agent maintainers learned to exclude many of the core classes > in java.base but that is always a moving target and there have been a few > more sightings/reports recently (maybe due to newer features doing more in > Java rather than in the VM). > > I agree that the ClassFileTransformer class description is the best place for > this. My initial reaction was to wonder about the j.l.instrument package > description but it is more focused on starting agents and the JAR file > attributes so I think your proposed location is best.. > > As regards the text then it might be better to start with a sentence to say > that the "transform" method may be called from critical points in JDK core > classes or called to transform JDK core classes. You've got some of this in > second sentence but I think better to lead with it before revealing one of > the several possible things that can happen. I think part of the lead in > needs to say "great care must be taken" or something like that to get across > the point that the ClassFileTransformer implementor has the potential to > cause many possible issues. For the same class and linkage error then > probably best to not use the phrase "transitively requires" as it's too close > "requires transitive" used in module declarations. > > Note that the `@jvms` tag can be used to reference the JVMS section. You'll > see a few examples in other classes. Thanks a lot for looking into this @AlanBateman. I've tried to integrate your suggestions into the new version of the PR. I already used the `@jvms` tag in the latest version of my PR based on @liach's suggestion. I also updated the other reference to the JVMS above the new `@apiNote` which didn't used the tag either (and was wrong because the JVMS sections have changed since Java 5 :) Please let me know if this sounds better now or if you see further room for improvement. - PR Comment: https://git.openjdk.org/jdk/pull/20011#issuecomment-2217856802
Re: RFR: 8335743: jhsdb jstack cannot print some information on the waiting thread [v2]
On Mon, 8 Jul 2024 12:16:51 GMT, KIRIYAMA Takuya wrote: >> This bug was introduced by JDK-8284161. "Object.wait (long timeoutMillis)" >> was changed to call "Object.wait0 (long timeoutMillis)" in JDK-8284161. >> >> When "jhdsb jstack" is executed, the stack and lock information are printed >> in "sun.jvm.hotspot.runtime.JavaVFrame.printLockInfo(PrintStream tty, int >> frameCount)". This method checks whether the method is java.lang.Object.wait >> (...) and then reports the waitstate only if the check passes. >> https://github.com/openjdk/jdk/blob/7bc8f9c150cbf457edf6144adba734ecd5ca5a0f/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/JavaVFrame.java#L103 >> >> >> if (getMethod().getName().asString().equals("wait") && >> >> getMethod().getMethodHolder().getName().asString().equals("java/lang/Object")) >> { >> >> >> However, after JDK-8284161, the waiting thread waits on >> "java.lang.Object.wait0 (long timeoutMillis)", so it no longer reports the >> waitstate. >> >> I changed "printLockInfo(PrintStream tty, int frameCount)" to check for >> "java.lang.Object.wait0 (...)". >> I confirmed that the lock information is correctly printed with this fix. >> I tested hotspot/jtreg/serviceability/sa/ and >> jdk/sun/tools/jhsdb/heapconfig/, and there were no regressions by this fix. >> >> Would anyone review this change, please? > > KIRIYAMA Takuya has updated the pull request incrementally with one > additional commit since the last revision: > > 8335743: jhsdb jstack cannot print some information on the waiting thread Marked as reviewed by kevinw (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/20049#pullrequestreview-2167070972
Re: RFR: 8072701: resume001 failed due to ERROR: timeout for waiting for a BreakpintEvent
On Tue, 9 Jul 2024 04:43:59 GMT, Chris Plummer wrote: > The test hits a breakpoint on thread2 with SUSPEND_EVENT_THREAD policy, so > only thread2 is suspended. It then does a vm.suspend(), which suspends all > threads and bumps the suspendCount of thread2 up to 2. It then does an > eventSet.resume(), which decrements the thread2 suspendCount to 1, so now all > threads are suspended with a suspendCount of 1. thread2 is then resumed and > we expect to hit the next thread2 breakpoint. The problem is that thread2 > can't hit the breakpoint until the main thread has proceeded far enough, and > if the vm.suspend() that suspended the main thread happens too quickly, it > won't have proceeded far enough, so thread2 never hits the breakpoint. > > Essentially we need a vm.resume() to allow the main thread to run, but we > need to do it in a way that does nullify part of what the test is testing > for. So in order to allow the vm.resume() but not subvert the intent of the > test, we first do a thread2.suspend() so the vm.resume() won't resume thread2. > > Testing in progress: tier1 and tier5 svc testing test/hotspot/jtreg/vmTestbase/nsk/jdi/ThreadReference/resume/resume001.java line 356: > 354: } > 355: > 356: // We need to resume the main thread because thread2 > might be blocked on it, This does not look correct to me. This is the last test scenario - thread2.resume should resumes the thread while vm is suspended. thread2 should not be blocked on main thread. Looking at the debuggee I suppose the blocking is possible during logging. I'd suggest to update the debugee and remove any logging between breakpoints 2 and 3 - PR Review Comment: https://git.openjdk.org/jdk/pull/20088#discussion_r1671037516
[jdk23] Integrated: 8335124: com/sun/management/ThreadMXBean/ThreadCpuTimeArray.java failed with CPU time out of expected range
On Wed, 3 Jul 2024 08:24:43 GMT, Kevin Walls wrote: > Clean backport to jdk23 of a simple test update, to try and avoid spurious > failures. This pull request has now been integrated. Changeset: 70ad622b Author:Kevin Walls URL: https://git.openjdk.org/jdk/commit/70ad622bc23fc5a1808466193fd6c7ea2f178ec9 Stats: 3 lines in 1 file changed: 2 ins; 1 del; 0 mod 8335124: com/sun/management/ThreadMXBean/ThreadCpuTimeArray.java failed with CPU time out of expected range Reviewed-by: cjplummer Backport-of: 79a3554e1da604627b3a010dc269c1bd914c79d3 - PR: https://git.openjdk.org/jdk/pull/1
Re: RFR: 8335966: Remove incorrect problem listing of java/lang/instrument/NativeMethodPrefixAgent.java in ProblemList-Virtual.txt
On Tue, 9 Jul 2024 08:39:39 GMT, Jaikiran Pai wrote: > Can I please get a review of this change which removes an incorrect entry > from `ProblemList-Virtual.txt`? > > As noted in https://bugs.openjdk.org/browse/JDK-8335966 this entry is a left > over which went unnoticed when we did the changes in > https://bugs.openjdk.org/browse/JDK-8333130. > > The refactored test `NativeMethodPrefixApp` has been functioning without the > errors for which `NativeMethodPrefixAgent` was previously problem listed. So > I haven't replaced that problem list entry and instead have just removed it. > > `NativeMethodPrefixApp` is known to run into an intermittent timeout with > virtual threads and that's tracked in > https://bugs.openjdk.org/browse/JDK-8334167 and I think it's not problem > listing worthy given the relatively low frequency of such timeouts. > Furthermore, in the coming days, I plan to propose a change to this > `NativeMethodPrefixApp` test which should fix the timeout failures. > > To verify that this removal doesn't cause any unexpected issues, I've > triggered a CI run of the relevant tier. Marked as reviewed by amenkov (Reviewer). - PR Review: https://git.openjdk.org/jdk/pull/20094#pullrequestreview-2167328097
Re: RFR: 8315884: New Object to ObjectMonitor mapping [v3]
On Mon, 8 Jul 2024 16:21:16 GMT, Axel Boldt-Christmas wrote: >> When inflating a monitor the `ObjectMonitor*` is written directly over the >> `markWord` and any overwritten data is displaced into a displaced >> `markWord`. This is problematic for concurrent GCs which needs extra care or >> looser semantics to use this displaced data. In Lilliput this data also >> contains the klass forcing this to be something that the GC has to take into >> account everywhere. >> >> This patch introduces an alternative solution where locking only uses the >> lock bits of the `markWord` and inflation does not override and displace the >> `markWord`. This is done by keeping associations between objects and >> `ObjectMonitor*` in an external hash table. Different caching techniques are >> used to speedup lookups from compiled code. >> >> A diagnostic VM option is introduced called `UseObjectMonitorTable`. It is >> only supported in combination with the LM_LIGHTWEIGHT locking mode (the >> default). >> >> This patch has been evaluated to be performance neutral when >> `UseObjectMonitorTable` is turned off (the default). >> >> Below is a more detailed explanation of this change and how `LM_LIGHTWEIGHT` >> and `UseObjectMonitorTable` works. >> >> # Cleanups >> >> Cleaned up displaced header usage for: >> * BasicLock >> * Contains some Zero changes >> * Renames one exported JVMCI field >> * ObjectMonitor >> * Updates comments and tests consistencies >> >> # Refactoring >> >> `ObjectMonitor::enter` has been refactored an a >> `ObjectMonitorContentionMark` witness object has been introduced to the >> signatures. Which signals that the contentions reference counter is being >> held. More details are given below in the section about deflation. >> >> The initial purpose of this was to allow `UseObjectMonitorTable` to interact >> more seamlessly with the `ObjectMonitor::enter` code. >> >> _There is even more `ObjectMonitor` refactoring which can be done here to >> create a more understandable and enforceable API. There are a handful of >> invariants / assumptions which are not always explicitly asserted which >> could be trivially abstracted and verified by the type system by using >> similar witness objects._ >> >> # LightweightSynchronizer >> >> Working on adapting and incorporating the following section as a comment in >> the source code >> >> ## Fast Locking >> >> CAS on locking bits in markWord. >> 0b00 (Fast Locked) <--> 0b01 (Unlocked) >> >> When locking and 0b00 (Fast Locked) is observed, it may be beneficial to >> avoid inflating by spinning a bit. >> >> If 0b10 (Inflated) is observed or there is to... > > Axel Boldt-Christmas has updated the pull request incrementally with two > additional commits since the last revision: > > - Add JVMCI symbol exports > - Revert "More graceful JVMCI VM option interaction" > >This reverts commit 2814350370cf142e130fe1d38610c646039f976d. This is really great work, Axel! I've been reading this code for a while, and have done one pass looking through the PR with a few comments. src/hotspot/share/opto/library_call.cpp line 4620: > 4618: Node *unlocked_val = _gvn.MakeConX(markWord::unlocked_value); > 4619: Node *chk_unlocked = _gvn.transform(new > CmpXNode(lmasked_header, unlocked_val)); > 4620: Node *test_not_unlocked = _gvn.transform(new > BoolNode(chk_unlocked, BoolTest::ne)); I don't really know what this does. Someone from the c2 compiler group should look at this. src/hotspot/share/runtime/arguments.cpp line 1830: > 1828: FLAG_SET_CMDLINE(LockingMode, LM_LIGHTWEIGHT); > 1829: warning("UseObjectMonitorTable requires LM_LIGHTWEIGHT"); > 1830: } Maybe we want this to have the opposite sense - turn off UseObjectMonitorTable if not LM_LIGHTWEIGHT? src/hotspot/share/runtime/javaThread.inline.hpp line 258: > 256: } > 257: > 258: _om_cache.clear(); This could be shorter, ie: if (UseObjectMonitorTable) _om_cache.clear(); I think the not having an assert was to make the caller unconditional, which is good. src/hotspot/share/runtime/lightweightSynchronizer.cpp line 393: > 391: > 392: ObjectMonitor* LightweightSynchronizer::get_or_insert_monitor(oop > object, JavaThread* current, const ObjectSynchronizer::InflateCause cause, > bool try_read) { > 393: assert(LockingMode == LM_LIGHTWEIGHT, "must be"); This assert should be assert(UseObjectMonitorTable not LM_LIGHTWEIGHT). src/hotspot/share/runtime/lightweightSynchronizer.cpp line 732: > 730: > 731: markWord mark = object->mark(); > 732: assert(!mark.is_unlocked(), "must be unlocked"); "must be locked" makes more sense. src/hotspot/share/runtime/lightweightSynchronizer.cpp line 763: > 761: assert(mark.has_monitor(), "must be"); > 762: // The monitor exists > 763: ObjectMonitor* monitor = ObjectSynchronizer::read_monitor(current, > object, mark); This looks in the table for the monitor in U
Re: RFR: 8335966: Remove incorrect problem listing of java/lang/instrument/NativeMethodPrefixAgent.java in ProblemList-Virtual.txt
On Tue, 9 Jul 2024 08:39:39 GMT, Jaikiran Pai wrote: > Can I please get a review of this change which removes an incorrect entry > from `ProblemList-Virtual.txt`? > > As noted in https://bugs.openjdk.org/browse/JDK-8335966 this entry is a left > over which went unnoticed when we did the changes in > https://bugs.openjdk.org/browse/JDK-8333130. > > The refactored test `NativeMethodPrefixApp` has been functioning without the > errors for which `NativeMethodPrefixAgent` was previously problem listed. So > I haven't replaced that problem list entry and instead have just removed it. > > `NativeMethodPrefixApp` is known to run into an intermittent timeout with > virtual threads and that's tracked in > https://bugs.openjdk.org/browse/JDK-8334167 and I think it's not problem > listing worthy given the relatively low frequency of such timeouts. > Furthermore, in the coming days, I plan to propose a change to this > `NativeMethodPrefixApp` test which should fix the timeout failures. > > To verify that this removal doesn't cause any unexpected issues, I've > triggered a CI run of the relevant tier. Thank you Kevin and Alex for the reviews. tier5, where this problem listing file is in use, completed without issues. I will go ahead and integrate this in the next few hours. - PR Comment: https://git.openjdk.org/jdk/pull/20094#issuecomment-2219305134
Integrated: 8335966: Remove incorrect problem listing of java/lang/instrument/NativeMethodPrefixAgent.java in ProblemList-Virtual.txt
On Tue, 9 Jul 2024 08:39:39 GMT, Jaikiran Pai wrote: > Can I please get a review of this change which removes an incorrect entry > from `ProblemList-Virtual.txt`? > > As noted in https://bugs.openjdk.org/browse/JDK-8335966 this entry is a left > over which went unnoticed when we did the changes in > https://bugs.openjdk.org/browse/JDK-8333130. > > The refactored test `NativeMethodPrefixApp` has been functioning without the > errors for which `NativeMethodPrefixAgent` was previously problem listed. So > I haven't replaced that problem list entry and instead have just removed it. > > `NativeMethodPrefixApp` is known to run into an intermittent timeout with > virtual threads and that's tracked in > https://bugs.openjdk.org/browse/JDK-8334167 and I think it's not problem > listing worthy given the relatively low frequency of such timeouts. > Furthermore, in the coming days, I plan to propose a change to this > `NativeMethodPrefixApp` test which should fix the timeout failures. > > To verify that this removal doesn't cause any unexpected issues, I've > triggered a CI run of the relevant tier. This pull request has now been integrated. Changeset: dcf4e0d5 Author:Jaikiran Pai URL: https://git.openjdk.org/jdk/commit/dcf4e0d51f392afe2711223484e932e3826e8864 Stats: 2 lines in 1 file changed: 0 ins; 2 del; 0 mod 8335966: Remove incorrect problem listing of java/lang/instrument/NativeMethodPrefixAgent.java in ProblemList-Virtual.txt Reviewed-by: kevinw, amenkov - PR: https://git.openjdk.org/jdk/pull/20094
Re: RFR: 8335743: jhsdb jstack cannot print some information on the waiting thread [v2]
On Mon, 8 Jul 2024 12:16:51 GMT, KIRIYAMA Takuya wrote: >> This bug was introduced by JDK-8284161. "Object.wait (long timeoutMillis)" >> was changed to call "Object.wait0 (long timeoutMillis)" in JDK-8284161. >> >> When "jhdsb jstack" is executed, the stack and lock information are printed >> in "sun.jvm.hotspot.runtime.JavaVFrame.printLockInfo(PrintStream tty, int >> frameCount)". This method checks whether the method is java.lang.Object.wait >> (...) and then reports the waitstate only if the check passes. >> https://github.com/openjdk/jdk/blob/7bc8f9c150cbf457edf6144adba734ecd5ca5a0f/src/jdk.hotspot.agent/share/classes/sun/jvm/hotspot/runtime/JavaVFrame.java#L103 >> >> >> if (getMethod().getName().asString().equals("wait") && >> >> getMethod().getMethodHolder().getName().asString().equals("java/lang/Object")) >> { >> >> >> However, after JDK-8284161, the waiting thread waits on >> "java.lang.Object.wait0 (long timeoutMillis)", so it no longer reports the >> waitstate. >> >> I changed "printLockInfo(PrintStream tty, int frameCount)" to check for >> "java.lang.Object.wait0 (...)". >> I confirmed that the lock information is correctly printed with this fix. >> I tested hotspot/jtreg/serviceability/sa/ and >> jdk/sun/tools/jhsdb/heapconfig/, and there were no regressions by this fix. >> >> Would anyone review this change, please? > > KIRIYAMA Takuya has updated the pull request incrementally with one > additional commit since the last revision: > > 8335743: jhsdb jstack cannot print some information on the waiting thread Update looks good. Thanks - Marked as reviewed by dholmes (Reviewer). PR Review: https://git.openjdk.org/jdk/pull/20049#pullrequestreview-2168167270
Re: RFR: 8072701: resume001 failed due to ERROR: timeout for waiting for a BreakpintEvent
On Tue, 9 Jul 2024 19:17:26 GMT, Alex Menkov wrote: > This does not look correct to me. > This is the last test scenario - thread2.resume should resumes the thread > while vm is suspended. > thread2 should not be blocked on main thread. > Looking at the debuggee I suppose the blocking is possible during logging. > I'd suggest to update the debugee and remove any logging between breakpoints > 2 and 3 It looks like the debuggee gets as far as the following: public void runt2() { log("method 'runt2' enter 1"); i1++; i2--; log("method 'run2t' exit"); return; } It prints the first log and hits a breakpoint setup on the 2nd line. The debugger resumes thread2 after this, but we never see the 2nd log. Whenever we see this failure, the following logs from the mainThread are always delayed (by a lot): debugee.stderr> **> mainThread: mainThread is out of: synchronized (lockingObject) { debugee.stderr> **> mainThread: waiting for an instruction from the debugger ... I think this delay is resulting in the the mainThread being in the middle of doing one of these logs when the vm.suspend() is done. This leaves mainThread suspended while holding a lock needed for doing logging (logging is just a simple System.err.prinln()). I'm trying to prove this by getting a debuggee thread dump when getting the 3rd Breakpoint event times out, but for some reason once I added this code I could no longer reproduce the problem (still trying though). I don't like the idea of avoiding this issue by getting rid of certain problematic logging. It seems error prone. Someone could add some new logging in the future. I'll see if there is a better solution than the vm.resume(). Perhaps I could just resume mainThread. However, I think with virtual threads I/O can be dependent on other threads like an "unparker" thread. Another solution might be to have the debugger and debuggee do an additional handshake so we can guarantee that mainThread is done with these two log statements. Currently when we get to the 2nd log statement, that just means the debuggee is waiting for a "quit" command from the debugger. We could at this point have the debuggee send a command to the debugger, and have the debugger wait for this command before it does the vm.suspend(). - PR Review Comment: https://git.openjdk.org/jdk/pull/20088#discussion_r1671707426