Hi,

I'm creating a Kafka source connector that loads data arriving as individual 
files, which are created continuously. My initial plan was to create one task 
per file - that would let the framework balance the work across all workers in 
a straightforward way. In the poll() method of the source task, I would read 
and return all the records in the file; once poll() reached the end of the 
file, it would terminate and the task would be "finished".
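For concreteness, the per-file behavior I had in mind looks roughly like this 
(a plain-Java sketch, not the real Connect SourceTask API - the class name and 
return types are simplified stand-ins):

```java
import java.util.List;

// Simplified stand-in for a per-file source task: poll() returns the
// file's records once, then null forever - i.e. the task is "finished".
class FileSourceTask {
    private final List<String> records;
    private boolean drained = false;

    FileSourceTask(List<String> records) {
        this.records = records;
    }

    // Returns all records on the first call, null on every call after.
    List<String> poll() {
        if (drained) {
            return null; // nothing left to do
        }
        drained = true;
        return records;
    }
}
```

The trouble, as described below, is that nothing in the framework treats that 
final null as "this task is done".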

This notion of a task being "finished" - of running out of things to do - is 
where I ran into a problem: it doesn't seem to fit Connect's model. The worker 
thread calls poll() on a source task continuously, and there's no simple way in 
the framework to finish a task (for example, returning null from poll() just 
causes the worker thread to call poll() again after a short pause).

From this, I believe that source tasks are supposed to produce an *infinite* 
stream of data - and that I should allocate the work between tasks in some 
fashion other than making each individual file a task.
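If that's right, one alternative I'm considering is a fixed number of 
long-lived tasks, each owning a subset of the files. A sketch of the 
assignment step (the class and method names here are my own, loosely modeled 
on what Connector.taskConfigs(maxTasks) would need to produce):

```java
import java.util.ArrayList;
import java.util.List;

// Round-robin assignment of files to a fixed number of tasks. Each task
// keeps its group of files and can poll forever as new files appear.
class FilePartitioner {
    static List<List<String>> assign(List<String> files, int maxTasks) {
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < maxTasks; i++) {
            groups.add(new ArrayList<>());
        }
        for (int i = 0; i < files.size(); i++) {
            groups.get(i % maxTasks).add(files.get(i));
        }
        return groups;
    }
}
```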

Is this correct?

Thanks,
Gautam
