On Wed, 8 Jan 2025 11:53:43 GMT, Joachim Kern <jk...@openjdk.org> wrote:
> The test java/lang/ProcessHandle/InfoTest.java still fails sporadically on AIX. The test exclusion was removed through [JDK-8211847](https://bugs.openjdk.org/browse/JDK-8211847) under the assumption that the problem was gone, but that assumption turned out to be wrong.
>
> We can see an exception like:
>
>     java.lang.AssertionError: reported cputime less than expected: PT0.2S, actual: Optional[PT0.021179882S]
>         at org.testng.Assert.fail(Assert.java:99)
>         at InfoTest.test1(InfoTest.java:110)
>
> After a discussion with Roger Riggs and the team, we came to the following conclusion.
> The problem has two independent causes: one fundamental and one AIX-specific.
>
> The fundamental cause is as follows:
> Modern hardware provides many hardware threads (up to several hundred), which allow the worker threads of processes to run truly in parallel. To ensure that no worker thread monopolizes a hardware thread, the OS rolls it out (deschedules it) after a few ms at the latest to make room for another worker thread, possibly from another process. The OS continuously sums up the time each worker thread of a process is active and reports that sum as the process CPU time.
>
> It is easy to see that there is no fixed relationship between the CPU time of a process and the real time (wall time).
>
> If a system has many hardware threads and few worker threads, these are active almost all the time. If they are rolled out, they are immediately rolled back in due to a lack of competition. If a process has several worker threads, its CPU time will then increase faster than real time. In this case, CPU time > real time is to be expected, which is what the test assumes.
>
> However, if the same system is heavily loaded, i.e. many worker threads compete for each hardware thread, each individual worker thread can only become active relatively rarely.
> Even if a process has several worker threads, its total CPU time will be less than the elapsed real time. This is even more pronounced if the individual worker threads have to wait for each other via synchronization objects. Since this is the normal case, CPU time < real time usually applies.
>
> Therefore, such a test makes little sense in principle.
>
> The AIX-specific cause of the problem lies in the API used to get the CPU time. The `/proc/<pid>/psinfo` file is evaluated to obtain the CPU time. The /proc directory is only present on AIX for portability reasons. The data in it is only updated at long intervals. For example, the cpu time is only up...

Looks basically good to me. Thanks for improving it. I have minor questions and remarks.
I'd be fine with removing the dead code with this PR, too. Your choice.

I think we should use

src/java.base/aix/native/libjava/ProcessHandleImpl_aix.c line 167:

> 165:     pid_t the_pid = pid;
> 166:     struct procentry64 ProcessBuffer;
> 167:     struct fdsinfo64 FileDescBuffer;

How is `FileDescBuffer` used? I can only see its size being used. Wouldn't using `sizeof(struct fdsinfo64)` below be better?

src/java.base/aix/native/libjava/ProcessHandleImpl_aix.c line 169:

> 167:     struct fdsinfo64 FileDescBuffer;
> 168: 
> 169:     if (getprocs64(&ProcessBuffer, sizeof(ProcessBuffer), NULL, sizeof(FileDescBuffer), &the_pid, 1 ) <= 0) {

Extra whitespace before `)`.

-------------

PR Review: https://git.openjdk.org/jdk/pull/22966#pullrequestreview-2539498869
PR Comment: https://git.openjdk.org/jdk/pull/22966#issuecomment-2579776128
PR Review Comment: https://git.openjdk.org/jdk/pull/22966#discussion_r1908513238
PR Review Comment: https://git.openjdk.org/jdk/pull/22966#discussion_r1908514055
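The fundamental cause described in the quoted PR description (process CPU time and wall time are not tied to each other) can be observed directly. The following is a minimal C sketch, assuming a POSIX system that supports `CLOCK_PROCESS_CPUTIME_ID` and `CLOCK_MONOTONIC`; the names `time_sample`, `measure`, `wait_200ms` and `spin_200ms` are hypothetical helpers for illustration, not JDK code. It brackets a workload with both clocks: a workload that waits accumulates almost no CPU time, and a single spinning thread can accumulate at most as much CPU time as wall time, and less when the machine is loaded.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <time.h>

/* One measurement: process CPU time and wall-clock time, in seconds. */
typedef struct {
    double cpu_sec;
    double wall_sec;
} time_sample;

/* Seconds elapsed between two readings of the same clock. */
static double elapsed_sec(const struct timespec *start, const struct timespec *end) {
    return (double)(end->tv_sec - start->tv_sec)
         + (double)(end->tv_nsec - start->tv_nsec) / 1e9;
}

/* Hypothetical helper: runs `work` and reports the CPU time consumed by the
 * whole process versus the wall time that passed. The wall interval encloses
 * the CPU interval. */
static time_sample measure(void (*work)(void)) {
    struct timespec wall0, wall1, cpu0, cpu1;
    clock_gettime(CLOCK_MONOTONIC, &wall0);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &cpu0);
    work();
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &cpu1);
    clock_gettime(CLOCK_MONOTONIC, &wall1);
    time_sample s = { elapsed_sec(&cpu0, &cpu1), elapsed_sec(&wall0, &wall1) };
    return s;
}

/* A waiting thread (here: sleeping 200 ms) is not running, so it
 * accumulates almost no CPU time. */
static void wait_200ms(void) {
    struct timespec req = { 0, 200L * 1000L * 1000L };
    /* If interrupted, nanosleep stores the unslept remainder back into req. */
    while (nanosleep(&req, &req) == -1 && errno == EINTR) {
    }
}

/* A busy loop for 200 ms of wall time. For a single thread, CPU time cannot
 * exceed wall time; on a loaded machine it falls below it. */
static volatile unsigned long spin_counter;

static void spin_200ms(void) {
    struct timespec start, now;
    clock_gettime(CLOCK_MONOTONIC, &start);
    do {
        spin_counter++;
        clock_gettime(CLOCK_MONOTONIC, &now);
    } while (elapsed_sec(&start, &now) < 0.2);
}
```

For example, `measure(wait_200ms)` reports a wall time of about 0.2 s and a CPU time close to zero. For `measure(spin_200ms)`, only the upper bound `cpu_sec <= wall_sec` holds reliably; how close `cpu_sec` gets to `wall_sec` depends on how much competition the thread faces. This is why an assertion that demands a minimum CPU time, like the one in the quoted failure, is inherently load-dependent.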