voonhous commented on code in PR #18417:
URL: https://github.com/apache/hudi/pull/18417#discussion_r3013763411
##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
@@ -502,7 +527,10 @@ private List<StoragePathInfo>
listPartitionPathFiles(List<PartitionPath> partiti
return result;
} catch (IOException e) {
- throw new HoodieIOException("Failed to list partition paths", e);
+ throw new HoodieIOException("On " +
metaClient.getTableConfig().getTableName() + " Failed to list partition paths",
e);
Review Comment:
Nit:
```suggestion
throw new HoodieIOException("On " +
metaClient.getTableConfig().getTableName() + " Failed to list partition paths",
e);```
##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
@@ -502,7 +527,10 @@ private List<StoragePathInfo>
listPartitionPathFiles(List<PartitionPath> partiti
return result;
} catch (IOException e) {
- throw new HoodieIOException("Failed to list partition paths", e);
+ throw new HoodieIOException("On " +
metaClient.getTableConfig().getTableName() + " Failed to list partition paths",
e);
+ } finally {
+ log.debug("On {}, it took {} ms to fetch files for {} uncached
partitions via getAllFilesInPartitions",
+ metaClient.getTableConfig().getTableName(), timer.endTimer(),
missingPartitionPaths.size());
Review Comment:
Nit: feel free to ignore, timer includes close time here. I assume closing
is cheap, so this doesn't matter.
##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
@@ -357,18 +358,26 @@ private Map<PartitionPath, List<FileSlice>>
filterFiles(List<PartitionPath> part
.orElseGet(() ->
finalFileSystemView.getLatestFileSlices(partitionPath.path))
.collect(Collectors.toList())
));
+ } finally {
+ log.debug("On {} with query instant as {}, it took {}ms to filter {}
files into file slices across {} partitions",
+ metaClient.getTableConfig().getTableName(),
queryInstant.orElse("N/A"), timer.endTimer(),
+ allFiles.size(), partitions.size());
}
}
protected List<PartitionPath> listPartitionPaths(List<String>
relativePartitionPaths,
Types.RecordType
partitionFields,
Expression
partitionColumnPredicates) {
List<String> matchedPartitionPaths;
+ HoodieTimer timer = HoodieTimer.start();
try {
matchedPartitionPaths =
tableMetadata.getPartitionPathWithPathPrefixUsingFilterExpression(relativePartitionPaths,
partitionFields, partitionColumnPredicates);
} catch (IOException e) {
throw new HoodieIOException("Error fetching partition paths", e);
Review Comment:
Nit: Inconsistent error message enrichment, debug has table name, we should
also include table name here too for better debuggability. (Even though the
exception might contain some clues to this / have partition path.)
Ideally, let's make it the same exception message in
`listPartitionPathFiles`.
##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
@@ -385,7 +394,13 @@ protected List<PartitionPath>
listPartitionPaths(List<String> relativePartitionP
HoodieTimeline timelineToQuery = findInstantsInRange();
matchedPartitionPaths =
TimelineUtils.getWrittenPartitions(timelineToQuery);
} else {
Review Comment:
Is this intentional? Only the `getPartitionPathWithPathPrefixes` branch is
timed.
The `TimelineUtils.getWrittenPartitions(timelineToQuery)` branch
(incremental queries) gets no timing.
IIUC, the incremental-query partition listing is the not likely to be slow?
##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
@@ -357,18 +358,26 @@ private Map<PartitionPath, List<FileSlice>>
filterFiles(List<PartitionPath> part
.orElseGet(() ->
finalFileSystemView.getLatestFileSlices(partitionPath.path))
.collect(Collectors.toList())
));
+ } finally {
+ log.debug("On {} with query instant as {}, it took {}ms to filter {}
files into file slices across {} partitions",
+ metaClient.getTableConfig().getTableName(),
queryInstant.orElse("N/A"), timer.endTimer(),
+ allFiles.size(), partitions.size());
}
}
protected List<PartitionPath> listPartitionPaths(List<String>
relativePartitionPaths,
Types.RecordType
partitionFields,
Expression
partitionColumnPredicates) {
List<String> matchedPartitionPaths;
+ HoodieTimer timer = HoodieTimer.start();
try {
matchedPartitionPaths =
tableMetadata.getPartitionPathWithPathPrefixUsingFilterExpression(relativePartitionPaths,
partitionFields, partitionColumnPredicates);
} catch (IOException e) {
throw new HoodieIOException("Error fetching partition paths", e);
+ } finally {
+ log.debug("On {}, it took {} ms to list partition paths with {}
relativePartitionPaths and partition predicates",
Review Comment:
Did you mean to log `partitionColumnPredicates` here too after `and
partition predicates`.
##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
@@ -476,6 +491,12 @@ private List<StoragePathInfo>
listPartitionPathFiles(List<PartitionPath> partiti
Set<StoragePath> missingPartitionPaths =
CollectionUtils.diffSet(partitionPaths, cachedPartitionPaths.keySet());
+ if (missingPartitionPaths.isEmpty()) {
+ return cachedPartitionPaths.values().stream()
+ .flatMap(Collection::stream)
+ .collect(Collectors.toList());
+ }
+
Review Comment:
Early return is a behavioral change, this should be called out.
Good optimization, but the PR description says "No functional behavior is
intended to change".
Please update the description accordingly.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]