On Thu, 9 Jan 2025 11:37:17 GMT, Joachim Kern <jk...@openjdk.org> wrote:
>> The test java/lang/ProcessHandle/InfoTest.java still fails sporadically on
>> AIX. The test exclusion was removed through
>> [JDK-8211847](https://bugs.openjdk.org/browse/JDK-8211847) under the
>> assumption that the problem was gone, but that assumption turned out to
>> be wrong.
>>
>> We can see an exception like:
>>
>> java.lang.AssertionError: reported cputime less than expected: PT0.2S, actual: Optional[PT0.021179882S]
>> at org.testng.Assert.fail(Assert.java:99)
>> at InfoTest.test1(InfoTest.java:110)
>>
>> After a discussion with Roger Riggs and the team, we came to the following
>> conclusion.
>> The problem has two independent causes: one fundamental and one
>> AIX-specific.
>>
>> The fundamental cause is as follows:
>> Modern hardware provides many hardware threads (up to several hundred),
>> which allow the worker threads of processes to run truly in parallel.
>> To ensure that a worker thread does not monopolize a hardware thread, the
>> OS preempts it after a few ms at the latest to make room for another
>> worker thread, possibly from another process.
>> The OS continuously adds up the time that each worker thread of a process
>> is active and reports the sum as the process CPU time.
>>
>> It follows that there is no fixed relationship between the CPU time of a
>> process and the real time (wall time).
>>
>> If a system has many hardware threads and few worker threads, the worker
>> threads are active almost all the time. If one is preempted, it is
>> rescheduled immediately because there is no competition. If a process has
>> several worker threads, the CPU time increases faster than the real
>> time. In this case, CPU time > real time is to be expected, which is what
>> the test wants.
>>
>> However, if the same system is heavily loaded, i.e. many worker threads
>> compete for one hardware thread, each individual worker thread becomes
>> active only relatively rarely. Even if a process has several worker
>> threads, the total CPU time will be less than the elapsed real time. This
>> is even more pronounced if the individual worker threads have to wait for
>> each other on synchronization objects. Since this is the normal case,
>> CPU time < real time usually applies.
>>
>> Therefore, such a test makes little sense in principle.
>>
>> The AIX-specific cause of the problem lies in the API used to get the CPU
>> time. The `/proc/<pid>/psinfo` file is evaluated to obtain the CPU time. The
>> `/proc` directory is only present on AIX for portability reasons. The data in
>> it is only updated at long...
>
> Joachim Kern has updated the pull request incrementally with two additional
> commits since the last revision:
>
> - remove extra white space
> - omit unused variable

Marked as reviewed by clanger (Reviewer).

-------------

PR Review: https://git.openjdk.org/jdk/pull/22966#pullrequestreview-2539880841
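
For readers who want to observe the CPU time versus wall time relationship described in the quoted text, here is a minimal sketch. It is not the code from InfoTest.java or from the pull request: the class `CpuVsWallTime` and its methods are hypothetical names chosen for illustration. It assumes only the standard `ProcessHandle.Info.totalCpuDuration()` API, which returns an `Optional<Duration>` that may be empty if the platform does not report a value.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical demo class; not part of InfoTest.java or the pull request.
final class CpuVsWallTime {

    /** Keeps the calling thread busy until roughly the given wall time has passed. */
    static long spin(Duration wall) {
        long deadline = System.nanoTime() + wall.toNanos();
        long iterations = 0;
        while (System.nanoTime() < deadline) {
            iterations++;
        }
        return iterations;
    }

    /** CPU time of the current process as reported by the OS, if available. */
    static Optional<Duration> cpuTime() {
        return ProcessHandle.current().info().totalCpuDuration();
    }

    /** Ratio of consumed CPU time to elapsed wall time. */
    static double cpuToWallRatio(Duration cpu, Duration wall) {
        return (double) cpu.toNanos() / wall.toNanos();
    }

    public static void main(String[] args) {
        Optional<Duration> before = cpuTime();
        long start = System.nanoTime();
        spin(Duration.ofMillis(500));
        Duration wall = Duration.ofNanos(System.nanoTime() - start);
        Optional<Duration> after = cpuTime();

        if (before.isEmpty() || after.isEmpty()) {
            System.out.println("CPU time not reported on this platform");
            return;
        }
        // Difference of two OS-reported snapshots; may lag behind real usage.
        Duration cpu = after.get().minus(before.get());
        System.out.printf("wall=%s cpu=%s ratio=%.2f%n",
                wall, cpu, cpuToWallRatio(cpu, wall));
    }
}
```

On an idle machine the printed ratio will typically be close to or above 1, while on a heavily loaded machine it can drop well below 1. The reported value can also lag if the OS refreshes its accounting data infrequently, which is the kind of effect the description attributes to `/proc/<pid>/psinfo` on AIX.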