[ 
https://issues.apache.org/jira/browse/KAFKA-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564900#comment-16564900
 ] 

Gunnar Morling commented on KAFKA-3821:
---------------------------------------

One other alternative came to my mind which avoids modelling 
{{OffsetSourceRecord}} as a sub-class of {{SourceRecord}}. There could be this 
new method:

{code}
interface SourceTask {
    default OffsetPosition getOffset() {
        return null;
    }
}
{code}

Kafka Connect would always call this method after calling {{poll()}}. 
{{OffsetPosition}} would be a container holding {{Map<String,?> 
sourcePartition}} and {{Map<String,?> sourceOffset}}. Since it is a default 
method returning null, existing tasks would not need to change, i.e. the 
change would be backwards-compatible.
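
To make the idea concrete, here is a minimal, self-contained sketch of how this could fit together. It is only an illustration under assumptions: {{OffsetPosition}} is the proposed type and does not exist in Kafka Connect, and {{SketchSourceTask}} and {{SketchWorker}} are hypothetical stand-ins (with plain strings as records) rather than the real {{SourceTask}} or worker classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical container from the proposal; not part of the Kafka Connect API.
final class OffsetPosition {
    private final Map<String, ?> sourcePartition;
    private final Map<String, ?> sourceOffset;

    OffsetPosition(Map<String, ?> sourcePartition, Map<String, ?> sourceOffset) {
        this.sourcePartition = sourcePartition;
        this.sourceOffset = sourceOffset;
    }

    Map<String, ?> sourcePartition() { return sourcePartition; }

    Map<String, ?> sourceOffset() { return sourceOffset; }
}

// Simplified stand-in for a source task; records are plain strings here.
interface SketchSourceTask {
    List<String> poll();

    // Tasks that do not override this keep today's behaviour.
    default OffsetPosition getOffset() {
        return null;
    }
}

// Hypothetical framework side: ask for the offset after every poll().
final class SketchWorker {
    final List<OffsetPosition> committable = new ArrayList<>();

    void pollOnce(SketchSourceTask task) {
        List<String> records = task.poll();
        // ... records would be dispatched to the producer here ...
        OffsetPosition position = task.getOffset();
        if (position != null) {
            // Remember the offset even if 'records' was empty.
            committable.add(position);
        }
    }
}
```

A task that never overrides {{getOffset()}} contributes nothing to {{committable}}, which is the backwards-compatible case; a task that does override it can have its offset recorded even when {{poll()}} returned an empty list.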

If a connector implements the new method, it can return its current source 
offset, without emitting another actual source record (by returning an empty 
list from {{poll()}}). This would address the two use cases we have in Debezium 
for this:

* Emit an offset indicating that an initial DB snapshot has been completed 
after the last snapshot record has been emitted
* Regularly emit the processed offsets from the source DB (e.g. MySQL binlog 
position), even if we don't emit any actual source records for a longer period 
of time. Currently it can happen due to filter configuration (the user is only 
interested in capturing some of the tables from their source DB) that we 
process the DB logs for a long time without a way to communicate the processed 
offsets to Kafka Connect. This causes large parts of the log to be 
reprocessed after a connector restart, and it also causes larger parts of the 
logs than needed to be retained in the source DB.
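
The second use case can be sketched as follows. This is a simplified illustration, not Debezium code: {{LogEvent}}, {{FilteringTaskSketch}} and the {{binlog_pos}} key are hypothetical names, records are plain strings, and {{currentOffset()}} stands in for the proposed {{getOffset()}}, returning only the offset map for brevity.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical change event read from the source DB log.
final class LogEvent {
    final long position;
    final String table;
    final String payload;

    LogEvent(long position, String table, String payload) {
        this.position = position;
        this.table = table;
        this.payload = payload;
    }
}

// Hypothetical task that only captures some tables but tracks every position.
final class FilteringTaskSketch {
    private final List<LogEvent> log;
    private final Set<String> capturedTables;
    private int next = 0;
    private long lastProcessedPosition = -1L;

    FilteringTaskSketch(List<LogEvent> log, Set<String> capturedTables) {
        this.log = log;
        this.capturedTables = capturedTables;
    }

    // Reads up to maxEvents log entries; filtered-out events yield no record.
    List<String> poll(int maxEvents) {
        List<String> records = new ArrayList<>();
        int end = Math.min(next + maxEvents, log.size());
        while (next < end) {
            LogEvent event = log.get(next++);
            // The position advances even when the event is filtered out.
            lastProcessedPosition = event.position;
            if (capturedTables.contains(event.table)) {
                records.add(event.payload);
            }
        }
        return records;
    }

    // Stand-in for the proposed getOffset(): null until something was read.
    Map<String, Long> currentOffset() {
        return lastProcessedPosition < 0
                ? null
                : Collections.singletonMap("binlog_pos", lastProcessedPosition);
    }
}
```

With a log that contains only events for uncaptured tables, {{poll()}} keeps returning empty lists while {{currentOffset()}} still moves forward, so a restart would not have to reprocess that part of the log.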

Would that be an acceptable way forward? I've come to think that modelling 
{{OffsetSourceRecord}} as a sub-class of {{SourceRecord}} just isn't right; it 
feels a bit like Java's {{Stack}} class, which extends {{Vector}} and that way 
exposes lots of methods which shouldn't exist in its API.

> Allow Kafka Connect source tasks to produce offset without writing to topics
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-3821
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3821
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>    Affects Versions: 0.9.0.1
>            Reporter: Randall Hauch
>            Priority: Major
>              Labels: needs-kip
>             Fix For: 2.1.0
>
>
> Provide a way for a {{SourceTask}} implementation to record a new offset for 
> a given partition without necessarily writing a source record to a topic.
> Consider a connector task that uses the same offset when producing an unknown 
> number of {{SourceRecord}} objects (e.g., it is taking a snapshot of a 
> database). Once the task completes those records, the connector wants to 
> update the offsets (e.g., the snapshot is complete) but has no more records 
> to be written to a topic. With this change, the task could simply supply an 
> updated offset.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
