kkrugler commented on a change in pull request #7222:
URL: https://github.com/apache/pinot/pull/7222#discussion_r680164442
##########
File path:
pinot-plugins/pinot-batch-ingestion/pinot-batch-ingestion-common/src/main/java/org/apache/pinot/plugin/ingestion/batch/common/SegmentGenerationTaskRunner.java
##########
@@ -149,6 +155,9 @@ private SegmentNameGenerator getSegmentNameGenerator() {
Boolean.parseBoolean(segmentNameGeneratorConfigs.get(EXCLUDE_SEQUENCE_ID)),
IngestionConfigUtils.getBatchSegmentIngestionType(tableConfig),
IngestionConfigUtils.getBatchSegmentIngestionFrequency(tableConfig),
dateTimeFormatSpec);
+ case INPUT_FILE_SEGMENT_NAME_GENERATOR:
+ return new
InputFileSegmentNameGenerator(segmentNameGeneratorConfigs.get(FILE_PATH_PATTERN),
Review comment:
1. You are correct, we actually could set up the segment name here,
because there's a new SegmentNameGenerator created for each segment. But I was
following the current API convention, where the
`SegmentNameGenerator.generateSegmentName()` method is called with per-segment
parameters. Otherwise there's no reason for it to be an interface. If you want
to change how segment naming is handled as part of this PR, I could get rid of
the interface, and have a single SegmentNameGenerator class with a static
method that takes the naming type, the config parameter map, and returns a
segment name.
2. Though I don't follow why this change would better handle movement of
input files. The segment is generated once, not dynamically, and during the
actual job generating segments the input files better not be moving around, as
their paths are assumed to be stable for the duration of the job (e.g. with
Hadoop, these paths are stored as the input data for the segment generation
job).
3. Regardless, there's still the issue of how to handle segment naming based
on the original input file path, which was the use case behind issue #7090. I
could make it work, but it's a bit tricky to extract paths from a URI.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]