Re: [DISCUSS] PIP-65: Adapting Pulsar IO Sources to support Batch Sources

2020-05-22 Thread Sanjeev Kulkarni
Hi Devin, The complexity here is orchestrating multiple functions that notionally form the same connector. Thus, stopping a batch connector would be equivalent to stopping two functions, with the corresponding complexity of dealing with failures in between the calls. The same goes for other calls. On Th

Re: [DISCUSS] PIP-65: Adapting Pulsar IO Sources to support Batch Sources

2020-05-22 Thread Sanjeev Kulkarni
Hi Sijie, As described in the proposal, there will be no change (either in the API or in the runtime) for the existing streaming sources at all. The changes proposed by this PIP are: 1. Adding an explicit API for writing batch sources. 2. Providing an executor-based implementation for the above. In th

Re: [DISCUSS] PIP-65: Adapting Pulsar IO Sources to support Batch Sources

2020-05-21 Thread Devin Bost
I apologize for not fully understanding the context here, but is the concern about using the existing function architecture the complexity of needing two sequential operations in a function flow to be synchronous with respect to transactions, such as to avoid race conditions and issues with paralle

Re: [DISCUSS] PIP-65: Adapting Pulsar IO Sources to support Batch Sources

2020-05-20 Thread Sijie Guo
Hi Jerry, I understand the concerns. I think it falls into a broader discussion of function composition. I am fine with the current proposal. But I wish that we don't introduce a lot of specialized code in the runtime just to handle this use case. It would be better if we can reuse the existing fu

Re: [DISCUSS] PIP-65: Adapting Pulsar IO Sources to support Batch Sources

2020-05-20 Thread Jerry Peng
Hi Sijie, We have considered a two-stage function as a way to implement a "batch" source. However, because there are two independent functions, it adds complexity to management, especially when there are failures. The two functions will need to be submitted and registered in an atomic fashion which can

Re: [DISCUSS] PIP-65: Adapting Pulsar IO Sources to support Batch Sources

2020-05-20 Thread Sijie Guo
Hi Sanjeev, Just a couple of thoughts here. It seems to me that the BatchSource API is a bit complicated and that it can be achieved by using the existing functions framework. - BatchSourceTrigger: can be implemented using a one-instance function. That is used for discovering the batch source tasks and re

Re: [DISCUSS] PIP-65: Adapting Pulsar IO Sources to support Batch Sources

2020-05-20 Thread Sanjeev Kulkarni
Pinging the community about this. Would love feedback on this. Thanks! On Wed, May 13, 2020 at 10:34 PM Sanjeev Kulkarni wrote: > Hi all, > > The current interfaces for sources in Pulsar IO are geared towards > streaming sources where data is available on a continuous basis. There > exist a whol

[DISCUSS] PIP-65: Adapting Pulsar IO Sources to support Batch Sources

2020-05-13 Thread Sanjeev Kulkarni
Hi all, The current interfaces for sources in Pulsar IO are geared towards streaming sources where data is available on a continuous basis. There exist a whole bunch of data sources where data is not available in a continuous/streaming fashion, but rather arrives periodically/in spurts. These set