I have a need for one of my SourceConnector implementations to configure a bunch of tasks and, when those are all "done", request a task reconfiguration so that it can run a single task. Think: many tasks to take snapshots of database tables, then, when those are completed, reconfigure itself so that it then starts _one_ task to read the transaction log.
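To make the two-phase pattern concrete, here is a minimal sketch (all names invented; this is not real connector code) of how a connector's 'taskConfigs(int)' might return many snapshot task configs in the first phase and a single log-reading task config after the snapshot is done. In a real connector, marking the snapshot complete is where ConnectorContext.requestTaskReconfiguration() would be called so the framework asks for new task configs:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a connector whose task configs depend on whether
// the snapshot phase has finished.
public class TwoPhaseConnector {
    private volatile boolean snapshotCompleted = false;

    public void markSnapshotCompleted() {
        snapshotCompleted = true;
        // In a real connector, call context.requestTaskReconfiguration() here
        // so the framework invokes taskConfigs() again.
    }

    public List<Map<String, String>> taskConfigs(int maxTasks) {
        List<Map<String, String>> configs = new ArrayList<>();
        if (!snapshotCompleted) {
            // Phase 1: one snapshot task per chunk of tables, up to maxTasks,
            // so the snapshot work can be spread across the cluster.
            for (int i = 0; i < maxTasks; i++) {
                Map<String, String> cfg = new HashMap<>();
                cfg.put("mode", "snapshot");
                cfg.put("chunk", Integer.toString(i));
                configs.add(cfg);
            }
        } else {
            // Phase 2: a single task that reads the transaction log.
            Map<String, String> cfg = new HashMap<>();
            cfg.put("mode", "read-log");
            configs.add(cfg);
        }
        return configs;
    }
}
```

The missing piece, of course, is how the connector learns that the snapshot tasks are done, which is the subject of the rest of this message.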
Unfortunately, I can't figure out a way for the connector to "monitor" the progress of its tasks, especially when those tasks are distributed across the cluster. The only workaround I can think of is to have my connector start *one* task that performs the snapshot and then starts reading the transaction log. Unfortunately, that means that to parallelize the snapshotting work, the task would need to manage its own threads. That's possible, but undesirable for many reasons, not the least of which is that the work can't be distributed as multiple tasks amongst the cluster of Kafka Connect workers.

On the other hand, a simple enhancement to Kafka Connect would make this very easy: add to the ConnectorContext a method that returns the OffsetStorageReader. The connector could then start a thread to periodically poll the offsets for various source partitions and effectively watch the progress of its tasks. Not only that, the connector's 'taskConfigs(int)' method could use the OffsetStorageReader to read previously-recorded offsets and configure its tasks more intelligently.

This seems very straightforward, backward compatible, and non-intrusive. Is there any interest in this? If so, I can create an issue and work on a pull request.

Best regards,
Randall Hauch
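P.S. A rough sketch of the monitoring thread I have in mind, assuming the proposed enhancement existed. The OffsetReader interface below is an invented stand-in for the real OffsetStorageReader, and the "completed" offset field is just a convention my snapshot tasks would record; none of this is existing Kafka Connect API:

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: periodically poll committed offsets for the snapshot
// tasks' source partitions, and fire a callback once every partition reports
// completion. The callback would be context.requestTaskReconfiguration().
public class SnapshotMonitor {
    // Invented stand-in for OffsetStorageReader: look up the last committed
    // offset for a given source partition.
    public interface OffsetReader {
        Map<String, Object> offset(Map<String, String> partition);
    }

    private final OffsetReader reader;
    private final List<Map<String, String>> partitions;
    private final Runnable onComplete;
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    public SnapshotMonitor(OffsetReader reader,
                           List<Map<String, String>> partitions,
                           Runnable onComplete) {
        this.reader = reader;
        this.partitions = partitions;
        this.onComplete = onComplete;
    }

    // True when every snapshot task has recorded a "completed" flag
    // (an invented convention) in its committed offset.
    public boolean allSnapshotsDone() {
        for (Map<String, String> partition : partitions) {
            Map<String, Object> offset = reader.offset(partition);
            if (offset == null || !Boolean.TRUE.equals(offset.get("completed"))) {
                return false;
            }
        }
        return true;
    }

    public void start() {
        scheduler.scheduleAtFixedRate(() -> {
            if (allSnapshotsDone()) {
                onComplete.run();
                scheduler.shutdown();
            }
        }, 0, 5, TimeUnit.SECONDS);
    }
}
```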