xy720 opened a new pull request, #46959:
URL: https://github.com/apache/doris/pull/46959

   ### What problem does this PR solve?
   
   Issue Number: close #xxx
   
   Related PR: #28608
   
   Problem Summary:
   
   In TabletStatMgr, we call stream().parallel() or parallelStream() inside a 
ForkJoinTask. When parallel() or parallelStream() is called, the stream 
distributes the `ForEach` work across multiple threads. However, when the stream 
runs inside a ForkJoinTask, it tries to take worker threads from that task's 
ForkJoinPool. When the ForkJoinPool has only a few threads, thread contention is 
very likely and can ultimately lead to a deadlock.
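
The following sketch illustrates the behavior described above: a parallel stream started from inside a task of a custom ForkJoinPool runs its elements on that pool's own workers rather than on the common pool. It is an illustration only, not the TabletStatMgr code; the class name `NestedParallelStreamSketch` and the method `workerNames` are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

// Hypothetical illustration class; not part of Doris.
final class NestedParallelStreamSketch {

    // Runs a parallel stream from inside a task submitted to a small custom
    // pool and returns the names of the threads that processed its elements.
    static Set<String> workerNames(int parallelism) throws Exception {
        ForkJoinPool pool = new ForkJoinPool(parallelism);
        try {
            return pool.submit(() -> {
                Set<String> names = ConcurrentHashMap.newKeySet();
                // The stream is started on a worker of `pool`, so its subtasks
                // are forked into `pool` and compete for the same workers.
                IntStream.range(0, 1_000).parallel()
                        .forEach(i -> names.add(Thread.currentThread().getName()));
                return names;
            }).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Prints names of the custom pool's workers, not common-pool workers.
        System.out.println(workerNames(2));
    }
}
```

Because the inner stream shares workers with the outer tasks, a pool with very few threads leaves little room for both levels of work to make progress.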
   
   Here is the deadlock stack from a 4-core FE:
   
   ```
   "tablet stat mgr" #28 daemon prio=5 os_prio=0 cpu=12322.96ms elapsed=2159051.95s allocated=8527M defined_classes=5 tid=0x00007f4d241d6800 nid=0x24b6 in Object.wait()  [0x00007f4cfb37a000]
      java.lang.Thread.State: WAITING (on object monitor)
           at java.lang.Object.wait(java.base@11.0.24/Native Method)
           - waiting on <no object reference available>
           at java.util.concurrent.ForkJoinTask.externalAwaitDone(java.base@11.0.24/ForkJoinTask.java:330)
           - waiting to re-lock in wait() <0x00000005debf6e00> (a java.util.concurrent.ForkJoinTask$AdaptedRunnableAction)
           at java.util.concurrent.ForkJoinTask.doJoin(java.base@11.0.24/ForkJoinTask.java:398)
           at java.util.concurrent.ForkJoinTask.join(java.base@11.0.24/ForkJoinTask.java:721)
           at org.apache.doris.catalog.TabletStatMgr.runAfterCatalogReady(TabletStatMgr.java:85)
           at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58)
           at org.apache.doris.common.util.Daemon.run(Daemon.java:119)
   
      Locked ownable synchronizers:
           - None
   
   "ForkJoinPool-1-worker-13" #441579 daemon prio=5 os_prio=0 cpu=839.24ms elapsed=191462.96s allocated=356M defined_classes=0 tid=0x00007f4d88008000 nid=0xb2668 waiting on condition  [0x00007f4cf6807000]
      java.lang.Thread.State: TIMED_WAITING (parking)
           at jdk.internal.misc.Unsafe.park(java.base@11.0.24/Native Method)
           - parking to wait for  <0x00000005c4abe5a8> (a java.util.concurrent.ForkJoinPool)
           at java.util.concurrent.locks.LockSupport.parkUntil(java.base@11.0.24/LockSupport.java:275)
           at java.util.concurrent.ForkJoinPool.runWorker(java.base@11.0.24/ForkJoinPool.java:1619)
           at java.util.concurrent.ForkJoinWorkerThread.run(java.base@11.0.24/ForkJoinWorkerThread.java:183)
   
      Locked ownable synchronizers:
           - None
   
   "ForkJoinPool-2-worker-9" #444184 daemon prio=5 os_prio=0 cpu=2.16ms elapsed=179817.30s allocated=1076K defined_classes=0 tid=0x00007f4d60dc6000 nid=0xd4a06 waiting on condition  [0x00007f4ce315f000]
      java.lang.Thread.State: WAITING (parking)
           at jdk.internal.misc.Unsafe.park(java.base@11.0.24/Native Method)
           - parking to wait for  <0x00000005cc189d48> (a java.util.concurrent.ForkJoinPool)
           at java.util.concurrent.locks.LockSupport.park(java.base@11.0.24/LockSupport.java:194)
           at java.util.concurrent.ForkJoinPool.runWorker(java.base@11.0.24/ForkJoinPool.java:1628)
           at java.util.concurrent.ForkJoinWorkerThread.run(java.base@11.0.24/ForkJoinWorkerThread.java:183)
   
      Locked ownable synchronizers:
           - None
   
   "ForkJoinPool-2-worker-11" #444199 daemon prio=5 os_prio=0 cpu=1.27ms elapsed=179757.30s allocated=555K defined_classes=0 tid=0x00007f4d802a1800 nid=0xd4cd6 waiting on condition  [0x00007f4cdc32e000]
      java.lang.Thread.State: WAITING (parking)
           at jdk.internal.misc.Unsafe.park(java.base@11.0.24/Native Method)
           - parking to wait for  <0x00000005cc189d48> (a java.util.concurrent.ForkJoinPool)
           at java.util.concurrent.locks.LockSupport.park(java.base@11.0.24/LockSupport.java:194)
           at java.util.concurrent.ForkJoinPool.runWorker(java.base@11.0.24/ForkJoinPool.java:1628)
           at java.util.concurrent.ForkJoinWorkerThread.run(java.base@11.0.24/ForkJoinWorkerThread.java:183)
   
      Locked ownable synchronizers:
           - None
   ```
   
   This commit dynamically adjusts the ForkJoinPool thread count based on the 
number of backends.
   
   The thread count is at least 8 and at most 64, and it is rounded up to a 
multiple of 8.
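
A minimal sketch of the sizing rule above. It assumes the backend count is used directly as the target thread count before rounding, which the description does not state explicitly. `PoolSizeSketch`, `threadNum`, and the constant names are hypothetical, not identifiers from the Doris code base.

```java
// Hypothetical helper illustrating the described bounds; not the PR's code.
final class PoolSizeSketch {
    static final int MIN_THREADS = 8;
    static final int MAX_THREADS = 64;
    static final int STEP = 8;

    // Assumption: the backend count is the target thread count before rounding.
    static int threadNum(int backendCount) {
        // Cap the input to [0, MAX_THREADS] first so the rounding cannot overflow.
        int capped = Math.min(Math.max(backendCount, 0), MAX_THREADS);
        // Round up to the next multiple of STEP.
        int rounded = ((capped + STEP - 1) / STEP) * STEP;
        // Enforce the lower bound (covers 0 backends).
        return Math.max(MIN_THREADS, rounded);
    }

    public static void main(String[] args) {
        System.out.println(threadNum(3));   // prints 8
        System.out.println(threadNum(9));   // prints 16
        System.out.println(threadNum(200)); // prints 64
    }
}
```

Rounding to a multiple of 8 means the pool only needs to be resized when the backend count crosses one of those boundaries, not on every small change.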
   
   ### Release note
   
   None
   
   ### Check List (For Author)
   
   - Test <!-- At least one of them must be included. -->
       - [ ] Regression test
       - [ ] Unit Test
       - [ ] Manual test (add detailed scripts or steps below)
       - [ ] No need to test or manual test. Explain why:
           - [ ] This is a refactor/code format and no logic has been changed.
           - [x] Previous test can cover this change.
           - [ ] No code files have been changed.
           - [ ] Other reason <!-- Add your reason?  -->
   
   - Behavior changed:
       - [x] No.
       - [ ] Yes. <!-- Explain the behavior change -->
   
   - Does this need documentation?
       - [x] No.
       - [ ] Yes. <!-- Add document PR link here. eg: 
https://github.com/apache/doris-website/pull/1214 -->
   
   ### Check List (For Reviewer who merge this PR)
   
   - [ ] Confirm the release note
   - [ ] Confirm test cases
   - [ ] Confirm document
   - [ ] Add branch pick label <!-- Add branch pick label that this PR should 
merge into -->
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@doris.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org