[ https://issues.apache.org/jira/browse/KAFKA-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564900#comment-16564900 ]
Gunnar Morling commented on KAFKA-3821:
---------------------------------------

One other alternative came to my mind which avoids modelling {{OffsetSourceRecord}} as a sub-class of {{SourceRecord}}. There could be this new method:

{code}
interface SourceTask {
    default OffsetPosition getOffset() {
        return null;
    }
}
{code}

Kafka Connect would always call this method after calling {{poll()}}. {{OffsetPosition}} would be a container holding {{Map<String,?> sourcePartition}} and {{Map<String,?> sourceOffset}}. Since it is a default method returning null, the change would be backwards-compatible. A connector that implements the new method can return its current source offset without emitting another actual source record (it would return an empty list from {{poll()}}).

This would address the two use cases we have in Debezium for this:

* Emit an offset indicating that an initial DB snapshot has been completed, after the last snapshot record has been emitted
* Regularly emit the processed offsets from the source DB (e.g. MySQL binlog position), even if we don't emit any actual source records for a longer period of time. Currently it can happen due to filter configuration (the user is only interested in capturing some of the tables from their source DB) that we process the DB logs for a long time without a way to communicate the processed offsets to Kafka Connect. This causes large parts of the log to be reprocessed after a connector restart, and also causes larger parts of the logs than needed to be retained in the source DB.

Would that be an acceptable way forward? I've come to think that modelling {{OffsetSourceRecord}} just isn't right; it feels a bit like Java's {{Stack}} class, which extends {{Vector}} and that way exposes lots of methods which shouldn't exist in its API.
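
To make the proposed contract concrete, here is a minimal, self-contained sketch of how it might behave. It is not Kafka Connect code: {{SketchSourceTask}}, {{LegacyTask}}, {{FilteringTask}} and {{OffsetPositionDemo}} are hypothetical stand-ins, records are simplified to strings, and {{OffsetPosition}} is only the container described above, which does not exist in Kafka Connect today. The sketch assumes the framework calls {{getOffset()}} after each {{poll()}} and ignores a null result.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical container from the proposal; not part of Kafka Connect.
final class OffsetPosition {
    private final Map<String, ?> sourcePartition;
    private final Map<String, ?> sourceOffset;

    OffsetPosition(Map<String, ?> sourcePartition, Map<String, ?> sourceOffset) {
        this.sourcePartition = sourcePartition;
        this.sourceOffset = sourceOffset;
    }

    Map<String, ?> sourcePartition() { return sourcePartition; }
    Map<String, ?> sourceOffset() { return sourceOffset; }
}

// Simplified stand-in for SourceTask; records are plain strings here.
interface SketchSourceTask {
    List<String> poll();

    // Default keeps existing tasks working: null means "no offset to report".
    default OffsetPosition getOffset() { return null; }
}

// An existing task that knows nothing about the new method.
final class LegacyTask implements SketchSourceTask {
    @Override
    public List<String> poll() { return Collections.singletonList("record-1"); }
}

// A task whose filter drops every change, but which still advances in the log.
final class FilteringTask implements SketchSourceTask {
    private long binlogPosition = 0L;

    @Override
    public List<String> poll() {
        binlogPosition += 100; // pretend 100 bytes of log were read and filtered out
        return Collections.emptyList();
    }

    @Override
    public OffsetPosition getOffset() {
        return new OffsetPosition(
                Collections.singletonMap("server", "db1"),
                Collections.singletonMap("binlogPosition", binlogPosition));
    }
}

final class OffsetPositionDemo {
    // Mimics the assumed framework behaviour: poll first, then ask for the offset.
    static Map<String, ?> pollOnce(SketchSourceTask task) {
        task.poll();
        OffsetPosition position = task.getOffset();
        return position == null ? null : position.sourceOffset();
    }

    public static void main(String[] args) {
        System.out.println(pollOnce(new LegacyTask()));
        SketchSourceTask filtering = new FilteringTask();
        pollOnce(filtering);
        System.out.println(pollOnce(filtering));
    }
}
```

In this sketch the legacy task never reports an offset, while the filtering task reports progress on every cycle even though {{poll()}} returns no records. That corresponds to the second Debezium use case above.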

> Allow Kafka Connect source tasks to produce offset without writing to topics
> ----------------------------------------------------------------------------
>
>                 Key: KAFKA-3821
>                 URL: https://issues.apache.org/jira/browse/KAFKA-3821
>             Project: Kafka
>          Issue Type: Improvement
>          Components: KafkaConnect
>    Affects Versions: 0.9.0.1
>            Reporter: Randall Hauch
>            Priority: Major
>              Labels: needs-kip
>             Fix For: 2.1.0
>
>
> Provide a way for a {{SourceTask}} implementation to record a new offset for
> a given partition without necessarily writing a source record to a topic.
> Consider a connector task that uses the same offset when producing an unknown
> number of {{SourceRecord}} objects (e.g., it is taking a snapshot of a
> database). Once the task completes those records, the connector wants to
> update the offsets (e.g., the snapshot is complete) but has no more records
> to be written to a topic. With this change, the task could simply supply an
> updated offset.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)