Hi Sanjeev,

Just a couple of thoughts here. It seems to me that the BatchSource API is a bit complicated, and that the same behavior can be achieved using the existing functions framework.

- BatchSourceTrigger: can be implemented as a single-instance function that discovers the batch source tasks and publishes the discovered tasks to its output topic.
- BatchSource: can be implemented as a function that receives the batch source tasks and executes them.

So it seems this can be achieved within the existing framework by combining the two functions. That looks like a much cleaner approach, and it keeps the function and connector APIs relatively simple and consistent.

Thoughts?

- Sijie

On Wed, May 20, 2020 at 8:33 AM Sanjeev Kulkarni <sanjee...@gmail.com> wrote:

> Pinging the community about this. Would love feedback on this.
> Thanks!
>
> On Wed, May 13, 2020 at 10:34 PM Sanjeev Kulkarni <sanjee...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > The current interfaces for sources in Pulsar IO are geared towards
> > streaming sources where data is available on a continuous basis. There
> > exist a whole bunch of data sources where data is not available on a
> > continuous/streaming fashion, but rather arrives periodically/in spurts.
> > These set of 'Batch Sources' have a set of common characteristics that
> > might warrant framework level support in Pulsar IO.
> >
> > Jerry and myself have jotted down the ideas around this in PIP-65. Please
> > review it and let us know what you think.
> >
> > https://github.com/apache/pulsar/wiki/PIP-65:-Adapting-Pulsar-IO-Sources-to-support-Batch-Sources
> >
> > Thanks!
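
To make the two-function composition suggested in the reply above a bit more concrete, here is a rough sketch in plain Java. It deliberately does not use the Pulsar Functions API: `java.util.function.Function` stands in for a Pulsar function, and an in-memory queue stands in for the intermediate topic. The class name `BatchSourceSketch`, the `DISCOVER` and `EXECUTE` functions, and the task naming scheme are all hypothetical and only illustrate the data flow, not how PIP-65 or Pulsar IO would implement it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.function.Function;

public class BatchSourceSketch {

    // Hypothetical stand-in for the single-instance "trigger" function:
    // given a trigger event, it discovers the tasks that need to run.
    static final Function<String, List<String>> DISCOVER =
            trigger -> Arrays.asList(trigger + "/part-0", trigger + "/part-1");

    // Hypothetical stand-in for the "batch source" function:
    // it receives one task and executes it, producing one output record.
    static final Function<String, String> EXECUTE =
            task -> "ingested:" + task;

    // Wires the two functions together through an in-memory queue that
    // plays the role of the trigger function's output topic.
    static List<String> runOnce(String trigger) {
        Queue<String> taskTopic = new ArrayDeque<>();
        taskTopic.addAll(DISCOVER.apply(trigger));

        List<String> output = new ArrayList<>();
        String task;
        while ((task = taskTopic.poll()) != null) {
            output.add(EXECUTE.apply(task));
        }
        return output;
    }

    public static void main(String[] args) {
        // One trigger event fans out into two tasks, each executed once.
        System.out.println(runOnce("2020-05-20"));
    }
}
```

In a real deployment the queue would presumably be a Pulsar topic, the discovery function would run with a single instance, and the executing function could run with several instances that share the task topic; those details are assumptions and are not spelled out in the thread.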