On Thu, 9 Jan 2025 11:37:17 GMT, Joachim Kern <jk...@openjdk.org> wrote:

>> The test java/lang/ProcessHandle/InfoTest.java still fails sporadically on 
>> AIX. The test exclusion was removed through 
>> [JDK-8211847](https://bugs.openjdk.org/browse/JDK-8211847) under the 
>> assumption that the problem was gone, but that assumption turned out to be wrong.
>> 
>> We can see an exception like:
>> 
>> java.lang.AssertionError: reported cputime less than expected: PT0.2S, 
>> actual: Optional[PT0.021179882S]
>> at org.testng.Assert.fail(Assert.java:99)
>> at InfoTest.test1(InfoTest.java:110)
>> 
>> After a discussion with Roger Riggs and the team, we came to the following 
>> conclusion.
>> The problem has two independent causes: one fundamental and one 
>> AIX-specific.
>> 
>> The fundamental cause is as follows:
>> Modern hardware provides many hardware threads (up to several hundred), so 
>> the worker threads of the processes can run truly in parallel. 
>> To ensure that such a worker thread does not monopolize a hardware thread, 
>> the OS rolls it out (deschedules it) after a few ms at the latest 
>> to make room for another worker thread, possibly from another process.
>> The OS continuously adds up the time that each worker thread of a 
>> process is active; the sum is the process CPU time.
>> 
>> It is easy to see that there is no fixed relationship between the CPU time of a 
>> process and the real time (wall time).
>> 
>> If you have a system with many hardware threads and few worker threads, 
>> the worker threads are active almost all the time. If they are rolled out, they are 
>> immediately rolled back in because nothing competes with them. If a process has 
>> several worker threads, its CPU time will increase faster than the real 
>> time. In this case, CPU time > real time is to be expected, which is what 
>> the test wants.
>> 
>> However, if the same system is heavily loaded, i.e. a lot of 
>> worker threads compete for the same hardware thread, each individual worker 
>> thread becomes active only relatively rarely. Even if a process has 
>> several worker threads, its total CPU time will be less than the elapsed real 
>> time. This is even more pronounced if the individual worker threads have to 
>> wait for each other via synchronization objects. Since this is the normal 
>> case, CPU time < real time usually applies.
>> 
>> Therefore, such a test makes little sense in principle.
>> 
>> The AIX-specific cause of the problem lies in the API used to get the cpu 
>> time. The `/proc/<pid>/psinfo` file is evaluated to obtain the cpu time. The 
>> /proc directory is only present on AIX for portability reasons. The data in 
>> it is only updated at long...
>
> Joachim Kern has updated the pull request incrementally with two additional 
> commits since the last revision:
> 
>  - remove extra white space
>  - omit unused variable

Marked as reviewed by clanger (Reviewer).
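
For readers who want to see the CPU time versus wall time relationship from the quoted description on their own machine, here is a minimal sketch. It is not taken from the pull request or from InfoTest.java; the class name `CpuVsWallTime` and the helpers `cpuToWallRatio` and `spin` are made up for illustration. It assumes only the standard `ProcessHandle.Info.totalCpuDuration()` API, which may return an empty `Optional` on platforms that do not report CPU time.

```java
import java.time.Duration;
import java.util.Optional;

public class CpuVsWallTime {

    // Hypothetical helper: accumulated CPU time divided by elapsed wall time.
    // A value above 1.0 means several threads ran in parallel; a value below
    // 1.0 means the process was descheduled or waiting for part of the time.
    public static double cpuToWallRatio(Duration cpu, Duration wall) {
        if (wall.isZero() || wall.isNegative()) {
            throw new IllegalArgumentException("wall time must be positive");
        }
        return (double) cpu.toNanos() / wall.toNanos();
    }

    // Hypothetical helper: keep the current thread busy for roughly the
    // given amount of wall time and return the number of loop iterations.
    public static long spin(Duration wall) {
        long end = System.nanoTime() + wall.toNanos();
        long iterations = 0;
        while (System.nanoTime() < end) {
            iterations++;
        }
        return iterations;
    }

    public static void main(String[] args) {
        ProcessHandle self = ProcessHandle.current();

        Optional<Duration> before = self.info().totalCpuDuration();
        long startNanos = System.nanoTime();
        spin(Duration.ofMillis(500));
        Duration wall = Duration.ofNanos(System.nanoTime() - startNanos);
        Optional<Duration> after = self.info().totalCpuDuration();

        if (!before.isPresent() || !after.isPresent()) {
            System.out.println("CPU time is not reported on this platform");
            return;
        }

        // The reported value may lag behind reality if the OS refreshes
        // its accounting data only at intervals.
        Duration cpu = after.get().minus(before.get());
        System.out.printf("cpu=%s wall=%s ratio=%.3f%n",
                cpu, wall, cpuToWallRatio(cpu, wall));
    }
}
```

On an idle machine the printed ratio is typically close to or above 1.0; under heavy load, or when the OS updates its accounting data infrequently, it can be much lower, which is the situation the quoted description argues a test cannot rule out.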

-------------

PR Review: https://git.openjdk.org/jdk/pull/22966#pullrequestreview-2539880841
