keith-turner commented on code in PR #5129:
URL: https://github.com/apache/accumulo/pull/5129#discussion_r1873494144


##########
core/src/main/java/org/apache/accumulo/core/fate/Fate.java:
##########
@@ -372,6 +375,42 @@ public Fate(T environment, FateStore<T> store, boolean 
runDeadResCleaner,
             break;
           }
         }
+        queueSizeHistory.clear();
+      } else {
+        // The property did not change, but should it based on the queue size? 
Maintain
+        // the last X minutes of queue sizes. If the queue size is always 
larger than the number
+        // of Fate threads multiplied by some factor, then suggest that the
+        // MANAGER_FATE_THREADPOOL_SIZE be increased.
+        final long interval = Math.min(60, TimeUnit.MILLISECONDS
+            
.toMinutes(conf.getTimeInMillis(Property.MANAGER_FATE_QUEUE_CHECK_INTERVAL)));
+        if (interval == 0) {
+          queueSizeHistory.clear();
+        } else {
+          final int sizeFactor = 
conf.getCount(Property.MANAGER_FATE_QUEUE_CHECK_FACTOR);
+          if (queueSizeHistory.size() >= interval * 2) { // this task runs 
every 30s
+            final int warnThreshold = configured * sizeFactor;
+            boolean needMoreThreads = true;
+            for (Integer i : queueSizeHistory) {
+              if (i < warnThreshold) {
+                needMoreThreads = false;
+                break;
+              }
+            }
+            if (needMoreThreads) {
+              log.warn(
+                  "Fate queue size is {} times the number of Fate threads for 
the last {} minutes,"
+                      + " consider increasing property: {}",
+                  sizeFactor, interval, 
Property.MANAGER_FATE_THREADPOOL_SIZE.getKey());
+              // Clear the history so that we don't log for another 5 minutes.
+              queueSizeHistory.clear();
+            } else {
+              while (queueSizeHistory.size() >= interval * 2) {
+                queueSizeHistory.remove();
+              }
+            }
+          }
+          queueSizeHistory.add(workQueue.size());

Review Comment:
   Looking at how things work this queue size may always be zero (or something 
low, not sure what it will actually be).  When [this 
code](https://github.com/apache/accumulo/blob/a2ffa7b956a0ba2b920c0128df194ffd782745cc/core/src/main/java/org/apache/accumulo/core/fate/Fate.java#L109)
 places something on the queue it calls tryTransfer which should try to send 
something directly to a waiting thread and avoid queuing.
   
   The threads that run fate ops are in an infinite loop and just poll the 
transfer queue.  We could try to count the number of threads active in this 
[block of 
code](https://github.com/apache/accumulo/blob/a2ffa7b956a0ba2b920c0128df194ffd782745cc/core/src/main/java/org/apache/accumulo/core/fate/Fate.java#L160-L191)
 and use that to detect business for this check.  Could maybe use an atomic 
integer and leverage the existing try/finally in the code for the counting.
   
   This reminds of another problem I ran into related to this code.  A bit ago 
I was trying to use the fate thread pool metrics to understand what was going 
on with fate and was not getting anything useful out of those metrics.  Looking 
into it determined that because the way this code works and puts task in an 
infinite loop on the pool that was the reason the metrics were not useful.  If 
we added a counter of active fate threads for this PR it may be useful in 
follow on PR to turn off merics for the fate thread pool and add a gauge that 
exposes this counter.  That metrics would be useful, could see how many fate 
threads are busy at any given time.  Can not get that info from the current 
metrics.
   
   



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to