keith-turner commented on code in PR #5289:
URL: https://github.com/apache/accumulo/pull/5289#discussion_r1941544674


##########
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/MinorCompactionTask.java:
##########
@@ -49,13 +55,32 @@ class MinorCompactionTask implements Runnable {
     this.commitSession = commitSession;
     this.flushId = flushId;
     this.mincReason = mincReason;
+    this.pauseLimit = 
tablet.getContext().getTableConfiguration(tablet.extent.tableId())
+        .getCount(Property.TABLE_FILE_PAUSE);
+  }
+
+  private static void checkMinorCompactionFiles(String tableId, long 
pauseLimit,
+      long currentFileCount) throws AcceptableThriftTableOperationException {
+    if (pauseLimit > 0 && currentFileCount > pauseLimit) {
+      throw new AcceptableThriftTableOperationException(tableId, null, 
TableOperation.COMPACT,
+          TableOperationExceptionType.OTHER, "Attempted to perform minor 
compaction with "
+              + currentFileCount + " files, exceeding the configured limit of 
" + pauseLimit);
+    }
   }
 
   @Override
   public void run() {
     tablet.minorCompactionStarted();
     try {
       Span span = TraceUtil.startSpan(this.getClass(), "minorCompaction");
+      try {
+        long currentFileCount = tablet.getTabletMemory().getNumEntries();
+        checkMinorCompactionFiles(tablet.extent.tableId().canonical(), 
pauseLimit,
+            currentFileCount); // Add pause check here
+      } catch (AcceptableThriftTableOperationException e) {
+        log.warn("Minor compaction paused due to file count for tablet {}", 
tablet.extent);
+        return;

Review Comment:
   I would add the check 
[here](https://github.com/apache/accumulo/blob/ac8604d5630cd9131d50f362ef85bf106b0e6e81/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java#L584).
  If this code returns null then the minor compaction will never start. That 
code can consider the reason when making this decision.  We could return null 
in the case where `mincReason == SYSTEM` and the tablets file count would 
exceed the limit.  Probably do not want to prevent minor compactions for other 
reasons like log recovery or tablet close (tserver may have too many tablets 
and need to shed them).   Returning null at that point in the code avoids 
putting anything on the thread pool and reserving things in the tablet memory 
(as long as `prepareForMinC()` is not called that is) that need to be properly 
unreserved in the case where it can not minor compact, so it seems like a good 
place to intercept.
   
   There are some race conditions w/ the check at that location, the tablet 
could surge over the limit but would eventually pause in a stable system(one 
that is not migrating tablets).  The bulk import code has the same behavior, it 
can  surge over the limit but will eventually pause.   Avoiding the race 
condition completely would require doing the minor compaction and checking the 
file count in the conditional mutation that adds the file to the tablet.  This 
would be much more complex and not worth the effort IMO.  Doing the check 
before queing the compaction is good even if there is race condition because 
the tablet will eventually pause and when it does it will keep pausing until 
the file count goes down providing the needed back pressure to keep the tablet 
healthy.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to