Hi Devin,
The complexity here is orchestrating multiple functions that notionally
form the same connector. Thus stopping a batch connector would be the
equivalent of stopping two functions, with the corresponding complexity of
dealing with a failure in between the calls. The same goes for other calls.
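
To make the failure case concrete, here is a minimal sketch of what "stop"
looks like when one connector is really two functions. FunctionHandle,
TwoFunctionConnector and StopResult are hypothetical names made up for this
illustration; none of them are Pulsar APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical handle to one deployed function; not a Pulsar API.
interface FunctionHandle {
    String name();
    void stop() throws Exception;
}

final class TwoFunctionConnector {
    // Outcome of a stop request: which functions stopped and which did not.
    static final class StopResult {
        final List<String> stopped = new ArrayList<>();
        final List<String> failed = new ArrayList<>();

        // True when some functions stopped and others did not.
        boolean isPartial() {
            return !stopped.isEmpty() && !failed.isEmpty();
        }
    }

    // Stops each function in turn. The calls are independent, so a failure
    // on a later call leaves the connector half-stopped.
    static StopResult stopAll(List<FunctionHandle> functions) {
        StopResult result = new StopResult();
        for (FunctionHandle f : functions) {
            try {
                f.stop();
                result.stopped.add(f.name());
            } catch (Exception e) {
                result.failed.add(f.name());
            }
        }
        return result;
    }

    // Helper for the illustration: a handle that stops cleanly or throws.
    static FunctionHandle handle(String name, boolean failOnStop) {
        return new FunctionHandle() {
            public String name() {
                return name;
            }

            public void stop() throws Exception {
                if (failOnStop) {
                    throw new Exception("stop failed: " + name);
                }
            }
        };
    }
}
```

If the first stop succeeds and the second fails, isPartial() returns true,
and the caller has to decide whether to retry, roll back, or report a
half-stopped connector. The same question comes up for start, update and
delete.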
On Th
Hi Sijie,
As described in the proposal, there will be no change (either in the API or
in the runtime) for the existing streaming sources at all. The changes
proposed by this PIP are:
1. Adding an explicit API for writing a batch source.
2. Providing an executor-based implementation for the above.
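
As a rough illustration of how the two pieces could fit together, here is a
sketch only. The interface and class names (SketchBatchSource,
ExecutorBatchRunner) and the discover/read split are assumptions made for
this example, and "executor" is read loosely as a component that drives one
discover-then-read cycle; the actual API is the one defined in the PIP.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical batch source contract; names are illustrative only.
interface SketchBatchSource<T> {
    // Find the units of work that make up the current batch.
    List<String> discover();

    // Read all records belonging to one unit of work.
    List<T> read(String task);
}

final class ExecutorBatchRunner {
    // Runs one batch cycle: discover the tasks, read each one on a thread
    // pool, and return the records in task discovery order.
    static <T> List<T> runOnce(SketchBatchSource<T> source, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<T>>> pending = new ArrayList<>();
            for (String task : source.discover()) {
                Callable<List<T>> job = () -> source.read(task);
                pending.add(pool.submit(job));
            }
            List<T> records = new ArrayList<>();
            for (Future<List<T>> f : pending) {
                records.addAll(f.get());
            }
            return records;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("batch cycle interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The point of the sketch is that the connector author only writes discover
and read, while a single runner owns the lifecycle, so there is one thing to
start, stop and monitor.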
In th
I apologize for not fully understanding the context here, but is the
concern about using the existing function architecture the complexity of
needing two sequential operations in a function flow to be synchronous with
respect to transactions, such as to avoid race conditions and issues with
paralle
Hi Jerry,
I understand the concerns. I think it falls into a broader discussion of
function composition.
I am fine with the current proposal. But I wish that we don't introduce a
lot of specialized code in the runtime just to handle this use case. It
would be better if we can reuse the existing fu
Hi Sijie,
We have considered a two-stage function as a way to implement a "batch"
source; however, because there are two independent functions, it adds
complexity to management, especially when there are failures. The two
functions will need
to be submitted and registered in an atomic fashion which can
Hi Sanjeev,
Just a couple of thoughts here. It seems to me that the BatchSource API is
a bit complicated and that the same thing could be achieved using the
existing functions framework.
- BatchSourceTrigger: can be implemented using a one-instance function.
That is used for discovering the batch source tasks and re
Pinging the community about this. Would love feedback on this.
Thanks!
On Wed, May 13, 2020 at 10:34 PM Sanjeev Kulkarni
wrote:
> Hi all,
>
> The current interfaces for sources in Pulsar IO are geared towards
> streaming sources where data is available on a continuous basis. There
> exist a whol
Hi all,
The current interfaces for sources in Pulsar IO are geared towards
streaming sources where data is available on a continuous basis. There
exist a whole bunch of data sources where data is not available in a
continuous/streaming fashion, but rather arrives periodically/in spurts.
These set