keith-turner commented on code in PR #5341:
URL: https://github.com/apache/accumulo/pull/5341#discussion_r1970457723
##########
server/manager/src/main/java/org/apache/accumulo/manager/tableOps/bulkVer2/LoadFiles.java:
##########
@@ -342,12 +341,22 @@ private long loadFiles(TableId tableId, Path bulkDir, LoadMappingIterator loadMa
loader.start(bulkDir, manager, tid, bulkInfo.setTime);
long t1 = System.currentTimeMillis();
+ KeyExtent prevLastExtent = null; // KeyExtent of last tablet from prior loadMapEntry
while (lmi.hasNext()) {
loadMapEntry = lmi.next();
- List<TabletMetadata> tablets =
- findOverlappingTablets(fmtTid, loadMapEntry.getKey(), tabletIter);
+ KeyExtent loadMapKey = loadMapEntry.getKey();
if (prevLastExtent != null && !loadMapKey.isPreviousExtent(prevLastExtent)) {
Review Comment:
I wonder whether a batch scanner would be better here to minimize the
overall number of RPCs made. It would be a large change to the code. Even if
we optimize how the scanner is used, the current code will make a lot of RPCs
in some cases (like importing into every 100th tablet in a million-tablet
table), and those RPCs will be made serially. A batch scanner would minimize
the number of RPCs made for these cases.
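To make the concern concrete, here is a rough back-of-the-envelope sketch, not a measurement and not Accumulo code. It assumes the worst case for the serial path (one lookup per targeted tablet) and assumes a batched lookup needs one request per group of adjacent metadata entries. The class name, method names, and the group size of 10,000 entries are all hypothetical and chosen only for illustration.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model only: counts lookups, it does not talk to Accumulo. */
class RpcEstimate {

  // ASSUMPTION: worst case for the serial path, one lookup per targeted tablet.
  static long serialLookups(int targets) {
    return targets;
  }

  // ASSUMPTION: batched ranges are grouped by a hypothetical block of
  // entriesPerGroup adjacent metadata entries, one request per distinct group.
  static long batchedLookups(int targets, int stride, int entriesPerGroup) {
    Set<Long> groups = new HashSet<>();
    for (int i = 0; i < targets; i++) {
      long tabletIndex = (long) i * stride; // every stride-th tablet
      groups.add(tabletIndex / entriesPerGroup);
    }
    return groups.size();
  }

  public static void main(String[] args) {
    // Every 100th tablet of a million-tablet table: 10,000 targeted tablets.
    int targets = 10_000;
    int stride = 100;
    int entriesPerGroup = 10_000; // hypothetical group size
    System.out.println("serial lookups:  " + serialLookups(targets));
    System.out.println("batched lookups: " + batchedLookups(targets, stride, entriesPerGroup));
  }
}
```

The real numbers depend on how the metadata is split and hosted, so this only shows why the two approaches could differ by orders of magnitude, not by how much they actually do.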
It would be good to gather some performance data before making large changes
to improve performance, to confirm they are needed. We cannot do this in 2.1,
but in main we could experiment with SplitMillionIT and try things like
importing into every 10th tablet for 1000 tablets, every 100th tablet for 1000
tablets, etc.
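As a sketch of how those experiment cases could be parameterized, the helper below lists which tablet indices an "every Nth tablet for M tablets" import would target. It is standalone and does not use SplitMillionIT or any Accumulo API; the class name, method name, and the 0-based tablet index are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: picks tablet indices for an "every Nth tablet" import. */
class ImportPattern {

  // Returns count tablet indices spaced stride apart, starting at index 0.
  static List<Integer> targets(int totalTablets, int stride, int count) {
    if (stride <= 0 || count < 0) {
      throw new IllegalArgumentException("stride must be > 0 and count >= 0");
    }
    if (count > 0 && (long) (count - 1) * stride >= totalTablets) {
      throw new IllegalArgumentException("pattern does not fit in the table");
    }
    List<Integer> out = new ArrayList<>(count);
    for (int i = 0; i < count; i++) {
      out.add(i * stride);
    }
    return out;
  }

  public static void main(String[] args) {
    int totalTablets = 1_000_000;
    // The two cases mentioned above: every 10th and every 100th tablet, 1000 tablets each.
    for (int stride : new int[] {10, 100}) {
      List<Integer> t = targets(totalTablets, stride, 1000);
      System.out.println("stride " + stride + ": " + t.size()
          + " tablets, last index " + t.get(t.size() - 1));
    }
  }
}
```

Timing the same import with each pattern would show whether the serial lookups are a measurable cost before committing to a larger change.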