On Wed, 8 Jan 2025 11:53:43 GMT, Joachim Kern <jk...@openjdk.org> wrote:

> The test java/lang/ProcessHandle/InfoTest.java still fails sporadically on 
> AIX. The test exclusion was removed through 
> [JDK-8211847](https://bugs.openjdk.org/browse/JDK-8211847) under the 
> assumption that the problem was gone, but that assumption turned out to be wrong.
> 
> We can see an exception like:
> 
> java.lang.AssertionError: reported cputime less than expected: PT0.2S, 
> actual: Optional[PT0.021179882S]
> at org.testng.Assert.fail(Assert.java:99)
> at InfoTest.test1(InfoTest.java:110)
> 
> After a discussion with Roger Riggs and the team, we came to the following 
> conclusion.
> The problem has two independent causes: one fundamental and one 
> AIX-specific.
> 
> The fundamental cause is as follows:
> Modern hardware provides many hardware threads (up to several hundred), so 
> the worker threads of processes can run truly in parallel. To keep a worker 
> thread from monopolizing a hardware thread, the OS preempts it after a few 
> ms at the latest to make room for another worker thread, possibly from 
> another process.
> The OS continuously sums the time that each worker thread of a process is 
> active and reports the total as the process cpu time.
> 
> It follows that there is no fixed relationship between the CPU time of a 
> process and the real time (wall time).
> 
> If you have a system with many hardware threads and few worker threads, these 
> are active almost all the time. If they are preempted, they are rescheduled 
> immediately because nothing competes with them. If a process has several 
> worker threads, its CPU time increases faster than the real time. In this 
> case, cpu time > real time is to be expected, which is what the test wants.
> 
> However, if the same system is heavily loaded, i.e. many worker threads 
> compete for one hardware thread, each individual worker thread becomes 
> active only rarely. Even if a process has several worker threads, its total 
> CPU time will be less than the elapsed real time. This is even more 
> pronounced if the individual worker threads have to wait for each other via 
> synchronization objects. Since this is the normal case, cpu time < real time 
> usually applies.
> 
> Therefore, such a test makes little sense in principle.
> 
> The AIX-specific cause of the problem lies in the API used to get the cpu 
> time. The `/proc/<pid>/psinfo` file is evaluated to obtain the cpu time. The 
> /proc directory is only present on AIX for portability reasons. The data in 
> it is only updated at long intervals. For example, the cpu time is only up...
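
To illustrate the quoted point that process CPU time and wall time are only loosely coupled, here is a minimal Java sketch. It is not taken from the PR or from InfoTest.java; the class name `CpuVsWallTime` and the helper `cpuToWallRatio` are hypothetical, and the only JDK API assumed is `ProcessHandle.Info.totalCpuDuration()`, which may return an empty `Optional` on some platforms.

```java
import java.time.Duration;
import java.util.Optional;

public class CpuVsWallTime {

    /** CPU time divided by wall time; below 1.0 means the process was mostly waiting. */
    static double cpuToWallRatio(Duration cpu, Duration wall) {
        return (double) cpu.toNanos() / (double) wall.toNanos();
    }

    public static void main(String[] args) {
        ProcessHandle self = ProcessHandle.current();

        long wallStart = System.nanoTime();
        Optional<Duration> cpuStart = self.info().totalCpuDuration();

        // Spin on a single thread for roughly 200 ms of wall time.
        long budget = Duration.ofMillis(200).toNanos();
        long spins = 0;
        while (System.nanoTime() - wallStart < budget) {
            spins++;
        }

        Optional<Duration> cpuEnd = self.info().totalCpuDuration();
        Duration wall = Duration.ofNanos(System.nanoTime() - wallStart);

        if (cpuStart.isEmpty() || cpuEnd.isEmpty()) {
            System.out.println("totalCpuDuration is not available on this platform");
            return;
        }

        // The reported CPU time may lag behind, depending on how the OS updates it.
        Duration cpu = cpuEnd.get().minus(cpuStart.get());
        System.out.printf("spins=%d wall=%s cpu=%s ratio=%.3f%n",
                spins, wall, cpu, cpuToWallRatio(cpu, wall));
    }
}
```

On an idle machine the printed ratio will typically be close to 1; under load, or when the OS refreshes the reported CPU time only at long intervals, it can be far lower. With the two durations from the quoted failure (PT0.021179882S against PT0.2S) the helper yields a ratio of about 0.106.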

Marked as reviewed by mbaesken (Reviewer).

> The /proc//psinfo file is evaluated to obtain the cpu time

Do you mean `/proc/<pid>/psinfo`?

Looks reasonable to me.
One question: do we still need unix_getParentPidAndTimings? It seems it was 
only called from ProcessHandleImpl_aix.c; see this grep of the old codebase 
before your change:

java.base/aix/native/libjava/ProcessHandleImpl_aix.c:165:    return unix_getParentPidAndTimings(env, pid, total, start);
java.base/unix/native/libjava/ProcessHandleImpl_unix.c:98: * implementations simply call back to unix_getParentPidAndTimings() and
java.base/unix/native/libjava/ProcessHandleImpl_unix.c:641:pid_t unix_getParentPidAndTimings(JNIEnv *env, pid_t pid,
java.base/unix/native/libjava/ProcessHandleImpl_unix.h:59:extern pid_t unix_getParentPidAndTimings(JNIEnv *env, pid_t pid,

I created an issue for this:
https://bugs.openjdk.org/browse/JDK-8347270
8347270: Remove unix_getParentPidAndTimings after JDK-8346880

-------------

PR Review: https://git.openjdk.org/jdk/pull/22966#pullrequestreview-2537265677
PR Comment: https://git.openjdk.org/jdk/pull/22966#issuecomment-2577541379
PR Comment: https://git.openjdk.org/jdk/pull/22966#issuecomment-2577552957
PR Comment: https://git.openjdk.org/jdk/pull/22966#issuecomment-2577776743
