voonhous commented on code in PR #18417:
URL: https://github.com/apache/hudi/pull/18417#discussion_r3013763411


##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
@@ -502,7 +527,10 @@ private List<StoragePathInfo> 
listPartitionPathFiles(List<PartitionPath> partiti
 
       return result;
     } catch (IOException e) {
-      throw new HoodieIOException("Failed to list partition paths", e);
+      throw new HoodieIOException("On " + 
metaClient.getTableConfig().getTableName() + " Failed to list partition paths", 
e);

Review Comment:
   Nit:
   ```suggestion
         throw new HoodieIOException("On " + 
metaClient.getTableConfig().getTableName() + " Failed to list partition paths", 
e);```



##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
@@ -502,7 +527,10 @@ private List<StoragePathInfo> 
listPartitionPathFiles(List<PartitionPath> partiti
 
       return result;
     } catch (IOException e) {
-      throw new HoodieIOException("Failed to list partition paths", e);
+      throw new HoodieIOException("On " + 
metaClient.getTableConfig().getTableName() + " Failed to list partition paths", 
e);
+    } finally {
+      log.debug("On {}, it took {} ms to fetch files for {} uncached 
partitions via getAllFilesInPartitions",
+          metaClient.getTableConfig().getTableName(), timer.endTimer(), 
missingPartitionPaths.size());

Review Comment:
   Nit: feel free to ignore, timer includes close time here. I assume closing 
is cheap, so this doesn't matter. 



##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
@@ -357,18 +358,26 @@ private Map<PartitionPath, List<FileSlice>> 
filterFiles(List<PartitionPath> part
                       .orElseGet(() -> 
finalFileSystemView.getLatestFileSlices(partitionPath.path))
                       .collect(Collectors.toList())
           ));
+    } finally {
+      log.debug("On {} with query instant as {}, it took {}ms to filter {} 
files into file slices across {} partitions",
+          metaClient.getTableConfig().getTableName(), 
queryInstant.orElse("N/A"), timer.endTimer(),
+          allFiles.size(), partitions.size());
     }
   }
 
   protected List<PartitionPath> listPartitionPaths(List<String> 
relativePartitionPaths,
                                                    Types.RecordType 
partitionFields,
                                                    Expression 
partitionColumnPredicates) {
     List<String> matchedPartitionPaths;
+    HoodieTimer timer = HoodieTimer.start();
     try {
       matchedPartitionPaths = 
tableMetadata.getPartitionPathWithPathPrefixUsingFilterExpression(relativePartitionPaths,
           partitionFields, partitionColumnPredicates);
     } catch (IOException e) {
       throw new HoodieIOException("Error fetching partition paths", e);

Review Comment:
   Nit: Inconsistent error message enrichment, debug has table name, we should 
also include table name here too for better debuggability. (Even though the 
exception might contain some clues to this / have partition path.)
   
   Ideally, let's make it the same exception message in 
`listPartitionPathFiles`.



##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
@@ -385,7 +394,13 @@ protected List<PartitionPath> 
listPartitionPaths(List<String> relativePartitionP
           HoodieTimeline timelineToQuery = findInstantsInRange();
           matchedPartitionPaths = 
TimelineUtils.getWrittenPartitions(timelineToQuery);
         } else {

Review Comment:
   Is this intentional? Only the `getPartitionPathWithPathPrefixes` branch is 
timed. 
   
   The `TimelineUtils.getWrittenPartitions(timelineToQuery)` branch 
(incremental queries) gets no timing. 
   
   IIUC, the incremental-query partition listing is the not likely to be slow?



##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
@@ -357,18 +358,26 @@ private Map<PartitionPath, List<FileSlice>> 
filterFiles(List<PartitionPath> part
                       .orElseGet(() -> 
finalFileSystemView.getLatestFileSlices(partitionPath.path))
                       .collect(Collectors.toList())
           ));
+    } finally {
+      log.debug("On {} with query instant as {}, it took {}ms to filter {} 
files into file slices across {} partitions",
+          metaClient.getTableConfig().getTableName(), 
queryInstant.orElse("N/A"), timer.endTimer(),
+          allFiles.size(), partitions.size());
     }
   }
 
   protected List<PartitionPath> listPartitionPaths(List<String> 
relativePartitionPaths,
                                                    Types.RecordType 
partitionFields,
                                                    Expression 
partitionColumnPredicates) {
     List<String> matchedPartitionPaths;
+    HoodieTimer timer = HoodieTimer.start();
     try {
       matchedPartitionPaths = 
tableMetadata.getPartitionPathWithPathPrefixUsingFilterExpression(relativePartitionPaths,
           partitionFields, partitionColumnPredicates);
     } catch (IOException e) {
       throw new HoodieIOException("Error fetching partition paths", e);
+    } finally {
+      log.debug("On {}, it took {} ms to list partition paths with {} 
relativePartitionPaths and partition predicates",

Review Comment:
   Did you mean to log `partitionColumnPredicates` here too after `and 
partition predicates`.



##########
hudi-common/src/main/java/org/apache/hudi/BaseHoodieTableFileIndex.java:
##########
@@ -476,6 +491,12 @@ private List<StoragePathInfo> 
listPartitionPathFiles(List<PartitionPath> partiti
     Set<StoragePath> missingPartitionPaths =
         CollectionUtils.diffSet(partitionPaths, cachedPartitionPaths.keySet());
 
+    if (missingPartitionPaths.isEmpty()) {
+      return cachedPartitionPaths.values().stream()
+          .flatMap(Collection::stream)
+          .collect(Collectors.toList());
+    }
+

Review Comment:
   Early return is a behavioral change, this should be called out.
   
   Good optimization, but the PR description says "No functional behavior is 
intended to change". 
   
   Please update the description accordingly.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to