Hi, I'm creating a Kafka source connector that loads data arriving as individual files, which are created continuously. My initial plan was to create one task per file; that would let the framework balance the work across all workers in a straightforward way. In the poll() method of the source task, I would read and return all records in the file, and when poll() reached the end of the file, it would terminate and the task would be "finished".
This notion of a task being "finished" and running out of things to do is where I ran into a problem: it doesn't seem to fit Connect's model. The worker thread calls poll() continuously on a source task, and there's no simple way in the framework to finish a task (for example, returning null from poll() just causes the worker thread to call poll() again after a short pause). From this, I believe that source tasks are supposed to produce an *infinite* stream of data, and that I should allocate the work between tasks in some fashion other than making each individual file its own task. Is this correct?

Thanks,
Gautam
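To make the alternative allocation concrete, here is a minimal, self-contained sketch of the kind of partitioning a Connector's taskConfigs(maxTasks) implementation might do: assign a (changing) set of files round-robin to a fixed number of long-lived tasks, each of which then polls its assigned files forever. The class and method names here are my own illustration, not part of the Connect API.

```java
import java.util.ArrayList;
import java.util.List;

public class FilePartitioner {
    // Round-robin assignment of files to at most maxTasks groups.
    // Each group would become the file list in one task's configuration,
    // so every task has a bounded, non-overlapping share of the work.
    public static List<List<String>> assign(List<String> files, int maxTasks) {
        int numGroups = Math.min(maxTasks, Math.max(1, files.size()));
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < numGroups; i++) {
            groups.add(new ArrayList<>());
        }
        for (int i = 0; i < files.size(); i++) {
            groups.get(i % numGroups).add(files.get(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        List<String> files = List.of("a.log", "b.log", "c.log", "d.log", "e.log");
        // With maxTasks = 2: [[a.log, c.log, e.log], [b.log, d.log]]
        System.out.println(assign(files, 2));
    }
}
```

With this shape, a task never "finishes": when it exhausts its current files it can return null (or an empty list) from poll() and pick up newly arrived files on a later call, while the connector can request a task reconfiguration when the file set changes.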