Hi Sanjeev,

Just a couple of thoughts here. It seems to me that the BatchSource API is
a bit complicated, and that the same result can be achieved with the
existing functions framework.

- BatchSourceTrigger: can be implemented as a single-instance function that
discovers the batch source tasks and returns them, so the discovered tasks
are published to its output topic.
- BatchSource: can be implemented as a function that receives the batch
source tasks and executes each one.
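
To make the two pieces above concrete, here is a rough sketch in Python. It
does not use the Pulsar Functions SDK: the Topic class is an in-memory
stand-in for a Pulsar topic, and discover_tasks, execute_task, run_once and
the task naming are all hypothetical names made up for illustration.

```python
from collections import deque
from typing import Deque, Iterator, List


class Topic:
    """Hypothetical in-memory stand-in for a Pulsar topic (FIFO queue)."""

    def __init__(self) -> None:
        self._messages: Deque[str] = deque()

    def publish(self, message: str) -> None:
        self._messages.append(message)

    def drain(self) -> Iterator[str]:
        # Yield messages in publish order until the queue is empty.
        while self._messages:
            yield self._messages.popleft()


def discover_tasks(trigger_event: str) -> List[str]:
    """Trigger function (one instance): returns the discovered tasks.

    Assumption: discovery is faked here by deriving three task names
    from the trigger event; a real trigger would list files, partitions, etc.
    """
    return [f"{trigger_event}/part-{i}" for i in range(3)]


def execute_task(task: str) -> List[str]:
    """Worker function: executes one task and returns the records it read.

    Assumption: each task yields a single placeholder record.
    """
    return [f"record from {task}"]


def run_once(trigger_event: str) -> List[str]:
    """Wire the two functions together through an intermediate task topic."""
    task_topic = Topic()
    output_topic = Topic()

    # Function 1: discovered tasks go to the trigger's output topic.
    for task in discover_tasks(trigger_event):
        task_topic.publish(task)

    # Function 2: consumes the task topic and executes each task.
    for task in task_topic.drain():
        for record in execute_task(task):
            output_topic.publish(record)

    return list(output_topic.drain())


if __name__ == "__main__":
    for record in run_once("batch-2020-05-20"):
        print(record)
```

In a real deployment the second function could presumably run with several
instances sharing the task topic, while the first stays at one instance so
that tasks are only discovered once.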

So this can be achieved within the existing framework by combining two
functions. That seems like a much cleaner approach, and it keeps the
function & connector API relatively simple and consistent. Thoughts?

- Sijie

On Wed, May 20, 2020 at 8:33 AM Sanjeev Kulkarni <sanjee...@gmail.com>
wrote:

> Pinging the community about this. Would love feedback on this.
> Thanks!
>
> On Wed, May 13, 2020 at 10:34 PM Sanjeev Kulkarni <sanjee...@gmail.com>
> wrote:
>
> > Hi all,
> >
> > The current interfaces for sources in Pulsar IO are geared towards
> > streaming sources where data is available on a continuous basis. There
> > exist a whole bunch of data sources where data is not available in a
> > continuous/streaming fashion, but rather arrives periodically/in spurts.
> > This set of 'Batch Sources' has a set of common characteristics that
> > might warrant framework-level support in Pulsar IO.
> >
> > Jerry and I have jotted down the ideas around this in PIP-65. Please
> > review it and let us know what you think.
> >
> > https://github.com/apache/pulsar/wiki/PIP-65:-Adapting-Pulsar-IO-Sources-to-support-Batch-Sources
> >
> > Thanks!
> >
>
