[
https://issues.apache.org/jira/browse/KAFKA-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020100#comment-16020100
]
Randall Hauch edited comment on KAFKA-3821 at 5/22/17 8:13 PM:
---------------------------------------------------------------
The problem with the connector directly using {{OffsetStorageWriter}} is that
it cannot guarantee order relative to the source records that Kafka Connect is
already processing. In my use cases, the offset/partition should be updated as
part of the sequence of normal source records, and that order must be
maintained. The best and simplest example is a connector that wants to record
that it is still making progress in its source, but for whatever reason is not
producing any source records.
But imagine a case where the connector has just recorded an offset via
{{OffsetStorageWriter}} and then immediately produces a new {{SourceRecord}}
with a new offset. This order is important: if the offset of the
{{SourceRecord}} gets written before the connector's explicit write, the older
offset would then overwrite the newer one.
Of course, the opposite case is bad, too: imagine the connector producing a
{{SourceRecord}} that is enqueued and not immediately processed, while the
connector progresses a bit and wants to record its new offset. If it did the
latter by explicitly writing to the {{OffsetStorageWriter}}, that write might
happen before the offset in the {{SourceRecord}} is captured.
The bottom line is that connectors need to be able to specify the order of
{{SourceRecord}} objects and offset updates, and that likely means they all
need to be sent through the same poll mechanism.
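To make the ordering argument concrete, here is a minimal sketch of what "the
same poll mechanism" could look like. It is a toy model only and does not use
any Kafka Connect API: {{OrderedOffsetSketch}}, {{PollItem}}, {{offsetOnly}}
and {{process}} are hypothetical names invented for illustration. The
assumption is that a task returns one ordered list in which an item either
carries a value for a topic or carries only an offset, and that the framework
applies the items strictly in that order.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model only: none of these types exist in Kafka Connect.
final class OrderedOffsetSketch {

    /** One polled item: a normal record, or an offset-only update (value == null). */
    static final class PollItem {
        final String partition;
        final long offset;
        final String value; // null means "offset only, nothing to write to a topic"

        private PollItem(String partition, long offset, String value) {
            this.partition = partition;
            this.offset = offset;
            this.value = value;
        }

        static PollItem record(String partition, long offset, String value) {
            return new PollItem(partition, offset, value);
        }

        static PollItem offsetOnly(String partition, long offset) {
            return new PollItem(partition, offset, null);
        }
    }

    /**
     * Applies items strictly in the order the task returned them.
     * Values are appended to topicSink; every item updates the stored offset,
     * so the last item for a partition always determines its final offset.
     */
    static Map<String, Long> process(List<PollItem> polled, List<String> topicSink) {
        Map<String, Long> storedOffsets = new HashMap<>();
        for (PollItem item : polled) {
            if (item.value != null) {
                topicSink.add(item.value);
            }
            storedOffsets.put(item.partition, item.offset);
        }
        return storedOffsets;
    }
}
```

Because records and offset-only updates travel through one list, the connector
alone decides their relative order, and neither of the two races described
above can occur in this model.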
> Allow Kafka Connect source tasks to produce offset without writing to topics
> ----------------------------------------------------------------------------
>
> Key: KAFKA-3821
> URL: https://issues.apache.org/jira/browse/KAFKA-3821
> Project: Kafka
> Issue Type: Improvement
> Components: KafkaConnect
> Affects Versions: 0.9.0.1
> Reporter: Randall Hauch
> Labels: needs-kip
>
> Provide a way for a {{SourceTask}} implementation to record a new offset for
> a given partition without necessarily writing a source record to a topic.
> Consider a connector task that uses the same offset when producing an unknown
> number of {{SourceRecord}} objects (e.g., it is taking a snapshot of a
> database). Once the task completes those records, the connector wants to
> update the offsets (e.g., the snapshot is complete) but has no more records
> to be written to a topic. With this change, the task could simply supply an
> updated offset.