kkrugler commented on a change in pull request #7222:
URL: https://github.com/apache/pinot/pull/7222#discussion_r680164442



##########
File path: 
pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-common/src/main/java/org/apache/pinot/plugin/ingestion/batch/common/SegmentGenerationTaskRunner.java
##########
@@ -149,6 +155,9 @@ private SegmentNameGenerator getSegmentNameGenerator() {
             
Boolean.parseBoolean(segmentNameGeneratorConfigs.get(EXCLUDE_SEQUENCE_ID)),
             IngestionConfigUtils.getBatchSegmentIngestionType(tableConfig),
             
IngestionConfigUtils.getBatchSegmentIngestionFrequency(tableConfig), 
dateTimeFormatSpec);
+      case INPUT_FILE_SEGMENT_NAME_GENERATOR:
+        return new 
InputFileSegmentNameGenerator(segmentNameGeneratorConfigs.get(FILE_PATH_PATTERN),

Review comment:
       1. You are correct, we actually could set up the segment name here, 
because there's a new SegmentNameGenerator created for each segment. But I was 
following the current API convention, where the 
`SegmentNameGenerator.generateSegmentName()` method is called with per-segment 
parameters. Otherwise there's no reason for it to be an interface. If you want 
to change how segment naming is handled as part of this PR, I could get rid of 
the interface, and have a single SegmentNameGenerator class with a static 
method that takes the naming type, the config parameter map, and returns a 
segment name.
   2. Though I don't follow why this change would better handle movement of 
input files. The segment is generated once, not dynamically, and during the 
actual job generating segments the input files better not be moving around, as 
their paths are assumed to be stable for the duration of the job (e.g. with 
Hadoop, these paths are stored as the input data for the segment generation 
job).
   3. Regardless, there's still the issue of how to handle segment naming based 
on the original input file path, which was the use case behind issue #7090. I 
could make it work, but it's a bit tricky to extract paths from a URI.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to