[
https://issues.apache.org/jira/browse/KAFKA-13601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17508566#comment-17508566
]
Anil Dasari commented on KAFKA-13601:
-------------------------------------
The file name in the field partitioner uses the starting offset of the batch, so
there will be only one Parquet file per worker (consumer/partition) in the
destination (S3 in my case) if the worker dies before committing an offset. A new
or restarted worker would overwrite that Parquet file, because the start offset of
the partition remains the same.
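
To illustrate the overwrite, here is a minimal sketch. It assumes an object key built from topic, field directory, Kafka partition, and batch start offset. The class name, the objectKey helper, and the exact key layout are hypothetical and only stand in for whatever the connector really does, and a HashMap stands in for the S3 bucket.

```java
import java.util.HashMap;
import java.util.Map;

final class StartOffsetNaming {
    // Hypothetical key layout (an assumption, not the connector's actual code):
    // topics/<topic>/<field dir>/<topic>+<partition>+<start offset>.parquet
    static String objectKey(String topic, String fieldDir, int partition, long startOffset) {
        return String.format("topics/%s/%s/%s+%d+%010d.parquet",
                topic, fieldDir, topic, partition, startOffset);
    }

    public static void main(String[] args) {
        Map<String, String> bucket = new HashMap<>(); // stand-in for the S3 bucket
        long committedOffset = 100L;                  // last offset committed to Kafka

        // Worker A writes a batch starting at the committed offset, then dies
        // before the commit happens, so committedOffset does not advance.
        String keyA = objectKey("orders", "region=eu", 0, committedOffset);
        bucket.put(keyA, "written by worker A");

        // The restarted worker B resumes from the same committed offset and
        // therefore derives the same key.
        String keyB = objectKey("orders", "region=eu", 0, committedOffset);
        bucket.put(keyB, "written by worker B");

        System.out.println(keyA.equals(keyB));  // true
        System.out.println(bucket.size());      // 1
        System.out.println(bucket.get(keyA));   // written by worker B
    }
}
```

Because the key depends only on values that do not change until an offset is committed, the second write replaces the first one instead of creating a second file.
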
Please let me know if you have any questions.
> Add option to support sync offset commit in Kafka Connect Sink
> --------------------------------------------------------------
>
> Key: KAFKA-13601
> URL: https://issues.apache.org/jira/browse/KAFKA-13601
> Project: Kafka
> Issue Type: New Feature
> Components: KafkaConnect
> Reporter: Anil Dasari
> Priority: Major
>
> Exactly-once delivery in the S3 connector with scheduled rotation and the field
> partitioner can be achieved by committing consumer offsets synchronously after a
> message batch has been flushed to the sink successfully.
> Currently, WorkerSinkTask commits the consumer offsets asynchronously, at
> regular intervals of WorkerConfig.OFFSET_COMMIT_INTERVAL_MS_CONFIG:
> [https://github.com/apache/kafka/blob/371f14c3c12d2e341ac96bd52393b43a10acfa84/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L203]
> [https://github.com/apache/kafka/blob/371f14c3c12d2e341ac96bd52393b43a10acfa84/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L196]
> [https://github.com/apache/kafka/blob/371f14c3c12d2e341ac96bd52393b43a10acfa84/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/WorkerSinkTask.java#L354]
>
> Add a config option that lets the user select synchronous commit instead of
> the interval-based commit driven by WorkerConfig.OFFSET_COMMIT_INTERVAL_MS_CONFIG.
--
This message was sent by Atlassian Jira
(v8.20.1#820001)