[ 
https://issues.apache.org/jira/browse/KAFKA-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16020100#comment-16020100
 ] 

Randall Hauch edited comment on KAFKA-3821 at 5/22/17 8:13 PM:
---------------------------------------------------------------

The problem with the connector using {{OffsetStorageWriter}} directly is that 
it cannot guarantee ordering relative to the source records that Kafka Connect 
is already processing. In my use cases, the offset/partition should be updated 
as part of the normal sequence of source records, and that order must be 
maintained.

The best and simplest example is a connector that wants to record that it is 
still making progress in its source but, for whatever reason, is not producing 
any source records.

But imagine a case where the connector has just recorded an offset via 
{{OffsetStorageWriter}} and then immediately produces a new {{SourceRecord}} 
with a new offset. This order is important: if the offset of the 
{{SourceRecord}} gets written before the connector's direct call, the older 
offset from that call could end up overwriting the newer one.

Of course, the opposite case is bad, too: imagine the connector producing a 
{{SourceRecord}} that is enqueued and not immediately processed, while the 
connector progresses a bit and wants to record its new offset. If it did the 
latter by writing explicitly to the {{OffsetStorageWriter}}, that write might 
happen before the offset in the {{SourceRecord}} is captured.

The bottom line is that connectors need to be able to specify the relative 
order of {{SourceRecord}} objects and offset updates, and that likely means 
both need to be sent through the same poll mechanism.
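
To make the ordering argument concrete, here is a rough, self-contained sketch 
of what "the same poll mechanism" could look like. None of this is Kafka 
Connect API: {{Item}}, {{Worker}}, and the convention that a null payload means 
"offset only" are hypothetical names invented purely for illustration. The 
point is only that when records and offset-only updates travel through one 
queue, the framework sees them in exactly the order the task chose.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Queue;

public class OrderedOffsetSketch {

    // Hypothetical stand-in for an item returned from poll(). A null payload
    // marks an offset-only update that writes nothing to a topic.
    static final class Item {
        final Map<String, Long> offset;
        final String payload;

        Item(Map<String, Long> offset, String payload) {
            this.offset = offset;
            this.payload = payload;
        }
    }

    // Hypothetical stand-in for the framework side: it drains items in the
    // order the task queued them and remembers the latest offset seen.
    static final class Worker {
        final List<String> written = new ArrayList<>();
        Map<String, Long> committedOffset = null;

        void process(Queue<Item> polled) {
            Item item;
            while ((item = polled.poll()) != null) {
                if (item.payload != null) {
                    written.add(item.payload);   // would be sent to a topic
                }
                committedOffset = item.offset;   // offsets advance in queue order
            }
        }
    }

    public static void main(String[] args) {
        Queue<Item> polled = new ArrayDeque<>();
        // Two snapshot rows that share the same offset ...
        polled.add(new Item(Collections.singletonMap("pos", 10L), "row-1"));
        polled.add(new Item(Collections.singletonMap("pos", 10L), "row-2"));
        // ... followed by an offset-only update once the snapshot is complete.
        polled.add(new Item(Collections.singletonMap("pos", 11L), null));

        Worker worker = new Worker();
        worker.process(polled);
        System.out.println(worker.written + " " + worker.committedOffset);
    }
}
```

Because the offset-only item sits behind the two rows in the same queue, it 
cannot be applied before them, which is the guarantee that a side channel such 
as a direct {{OffsetStorageWriter}} call cannot give.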



> Allow Kafka Connect source tasks to produce offset without writing to topics
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-3821
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3821
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>    Affects Versions: 0.9.0.1
>            Reporter: Randall Hauch
>              Labels: needs-kip
>
> Provide a way for a {{SourceTask}} implementation to record a new offset for 
> a given partition without necessarily writing a source record to a topic.
> Consider a connector task that uses the same offset when producing an unknown 
> number of {{SourceRecord}} objects (e.g., it is taking a snapshot of a 
> database). Once the task completes those records, the connector wants to 
> update the offsets (e.g., the snapshot is complete) but has no more records 
> to be written to a topic. With this change, the task could simply supply an 
> updated offset.


