I have a need for one of my SourceConnector implementations to configure a 
bunch of tasks and, when those are all “done”, request a task reconfiguration 
so that it can run a single task. Think: many tasks to make a snapshot of 
database tables, then, when those are all completed, reconfigure itself to 
start _one_ task to read the transaction log.

Unfortunately, I can’t figure out a way for the connector to “monitor” the 
progress of its tasks, especially when those tasks are distributed across the 
cluster. The only way I can think of to get around this is to have my connector 
start *one* task that performs the snapshot and then starts reading the 
transaction log. Unfortunately, that means to parallelize the snapshotting 
work, the task would need to manage its own threads. That’s possible, but 
undesirable for many reasons, not the least of which is that the work can’t be 
distributed as multiple tasks amongst the cluster of Kafka Connect workers.

On the other hand, a simple enhancement to Kafka Connect would make this very 
easy: add to the ConnectorContext a method that returned the 
OffsetStorageReader. The connector could start a thread to periodically poll 
the offsets for various partitions, and effectively watch the progress of the 
tasks. Not only that, the connector's 'taskConfigs(int)' method could use the 
OffsetStorageReader to read previously-recorded offsets to more intelligently 
configure its tasks. This seems very straightforward, backward compatible, and 
non-intrusive.
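To make the idea concrete, here is a minimal sketch of what that monitoring thread's logic might look like. The interfaces below are simplified stand-ins for the real `ConnectorContext` and `org.apache.kafka.connect.storage.OffsetStorageReader` types (the `offsetStorageReader()` accessor is the proposed addition, not an existing API), and the `"snapshotComplete"` offset key and `monitorSnapshot` helper are hypothetical names chosen for illustration:

```java
import java.util.*;

public class SnapshotMonitorSketch {

    // Simplified stand-in for org.apache.kafka.connect.storage.OffsetStorageReader.
    interface OffsetStorageReader {
        Map<String, Object> offset(Map<String, ?> partition);
    }

    // Simplified stand-in for ConnectorContext, extended (as proposed)
    // with a method that exposes the offset reader to the connector.
    interface ConnectorContext {
        void requestTaskReconfiguration();
        OffsetStorageReader offsetStorageReader(); // the proposed addition
    }

    // Check whether every snapshot partition has recorded a completion
    // marker in its offsets; if so, ask the runtime to reconfigure tasks
    // (e.g. down to a single transaction-log reader). A real connector
    // would call this periodically from a background thread.
    static void monitorSnapshot(ConnectorContext context,
                                List<Map<String, String>> partitions) {
        OffsetStorageReader reader = context.offsetStorageReader();
        boolean allDone = partitions.stream().allMatch(p -> {
            Map<String, Object> offset = reader.offset(p);
            return offset != null
                    && Boolean.TRUE.equals(offset.get("snapshotComplete"));
        });
        if (allDone) {
            context.requestTaskReconfiguration();
        }
    }

    public static void main(String[] args) {
        // In-memory fake offsets: both table partitions finished snapshotting.
        Map<Map<String, ?>, Map<String, Object>> store = new HashMap<>();
        Map<String, String> t1 = Map.of("table", "t1");
        Map<String, String> t2 = Map.of("table", "t2");
        store.put(t1, Map.of("snapshotComplete", true));
        store.put(t2, Map.of("snapshotComplete", true));

        boolean[] reconfigured = {false};
        ConnectorContext ctx = new ConnectorContext() {
            public void requestTaskReconfiguration() { reconfigured[0] = true; }
            public OffsetStorageReader offsetStorageReader() { return store::get; }
        };

        monitorSnapshot(ctx, List.of(t1, t2));
        System.out.println("reconfiguration requested: " + reconfigured[0]);
    }
}
```

The same reader could be consulted inside 'taskConfigs(int)' itself, so that on a restart the connector sees which partitions already completed their snapshot and configures only the remaining work.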

Is there any interest in this? If so, I can create an issue and work on a pull 
request.

Best regards,

Randall Hauch
