keith-turner commented on code in PR #5289:
URL: https://github.com/apache/accumulo/pull/5289#discussion_r1941544674
##########
server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/MinorCompactionTask.java:
##########
@@ -49,13 +55,32 @@ class MinorCompactionTask implements Runnable {
this.commitSession = commitSession;
this.flushId = flushId;
this.mincReason = mincReason;
+ this.pauseLimit =
tablet.getContext().getTableConfiguration(tablet.extent.tableId())
+ .getCount(Property.TABLE_FILE_PAUSE);
+ }
+
+ private static void checkMinorCompactionFiles(String tableId, long
pauseLimit,
+ long currentFileCount) throws AcceptableThriftTableOperationException {
+ if (pauseLimit > 0 && currentFileCount > pauseLimit) {
+ throw new AcceptableThriftTableOperationException(tableId, null,
TableOperation.COMPACT,
+ TableOperationExceptionType.OTHER, "Attempted to perform minor
compaction with "
+ + currentFileCount + " files, exceeding the configured limit of
" + pauseLimit);
+ }
}
@Override
public void run() {
tablet.minorCompactionStarted();
try {
Span span = TraceUtil.startSpan(this.getClass(), "minorCompaction");
+ try {
+ long currentFileCount = tablet.getTabletMemory().getNumEntries();
+ checkMinorCompactionFiles(tablet.extent.tableId().canonical(),
pauseLimit,
+ currentFileCount); // Add pause check here
+ } catch (AcceptableThriftTableOperationException e) {
+ log.warn("Minor compaction paused due to file count for tablet {}",
tablet.extent);
+ return;
Review Comment:
I would add the check
[here](https://github.com/apache/accumulo/blob/ac8604d5630cd9131d50f362ef85bf106b0e6e81/server/tserver/src/main/java/org/apache/accumulo/tserver/tablet/Tablet.java#L584).
If this code returns null then the minor compaction will never start. That
code can consider the reason when making this decision. We could return null
in the case where `mincReason == SYSTEM` and the tablets file count would
exceed the limit. Probably do not want to prevent minor compactions for other
reasons like log recovery or tablet close (tserver may have too many tablets
and need to shed them). Returning null at that point in the code avoids
putting anything on the thread pool and reserving things in the tablet memory
(as long as `prepareForMinC()` is not called that is) that need to be properly
unreserved in the case where it can not minor compact, so it seems like a good
place to intercept.
There are some race conditions w/ the check at that location, the tablet
could surge over the limit but would eventually pause in a stable system(one
that is not migrating tablets). The bulk import code has the same behavior, it
can surge over the limit but will eventually pause. Avoiding the race
condition completely would require doing the minor compaction and checking the
file count in the conditional mutation that adds the file to the tablet. This
would be much more complex and not worth the effort IMO. Doing the check
before queing the compaction is good even if there is race condition because
the tablet will eventually pause and when it does it will keep pausing until
the file count goes down providing the needed back pressure to keep the tablet
healthy.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]