Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-06-21 Thread Thomas Weise
Here is a summary of where we are at with the PR: * Added capability to construct sources at switch time through a factory interface. This can support all previously discussed scenarios. The simple case (sources with fixed start position) is still simple, but for scenarios that require deferred in

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-06-15 Thread Thomas Weise
Hi Arvid, Thanks for your reply --> On Mon, Jun 14, 2021 at 2:55 PM Arvid Heise wrote: > > Hi Thomas, > > Thanks for bringing this up. I think this is a tough nut to crack :/. > Imho 1 and 3 or 1+3 can work but it is ofc a pity if the source implementor > is not aware of HybridSource. I'm also w

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-06-14 Thread Arvid Heise
Hi Thomas, Thanks for bringing this up. I think this is a tough nut to crack :/. Imho 1 and 3 or 1+3 can work but it is ofc a pity if the source implementor is not aware of HybridSource. I'm also worried that we may not have a universal interface to specify start offset/time. I guess it also would

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-06-11 Thread Thomas Weise
Thanks for the suggestions and feedback on the PR. A variation of hybrid source that can switch back and forth was brought up before and it is something that will be eventually required. It was also suggested by Stephan that in the future there may be more than one implementation of hybrid source

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-06-07 Thread Steven Wu
> hybrid sounds to me more like the source would constantly switch back and forth Initially, the focus of hybrid source is more like a sequenced chain. But in the future it would be cool that hybrid sources can intelligently switch back and forth between historical data source (like Iceberg) and

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-06-07 Thread Arvid Heise
Sorry for joining the party so late, but it's such an interesting FLIP with a huge impact that I wanted to add my 2 cents. [1] I'm mirroring some basic question from the PR review to this thread because it's about the name: We could rename the thing to ConcatenatedSource(s), SourceSequence, or sim

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-06-06 Thread Steven Wu
> Converter function relies on the specific enumerator capabilities to set the new start position (e.g. fileSourceEnumerator.getEndTimestamp() and kafkaSourceEnumerator.setTimestampOffsetsInitializer(..) I guess the premise is that a converter is for a specific tuple of (upstream source, downstrea

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-06-06 Thread Thomas Weise
Hi Steven, Thank you for the thorough review of the PR and for bringing this back to the mailing list. All, I updated the FLIP-150 page to highlight aspects in which the PR deviates from the original proposal [1]. The goal would be to update the FLIP soon and bring it to a vote, as previously su

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-06-01 Thread Steven Wu
discussed the PR with Thosmas offline. Thomas, please correct me if I missed anything. Right now, the PR differs from the FLIP-150 doc regarding the converter. * Current PR uses the enumerator checkpoint state type as the input for the converter * FLIP-150 defines a new EndStateT interface. It see

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-05-20 Thread Thomas Weise
Hi Nicholas, Thanks for taking a look at the PR! 1. Regarding switching mechanism: There has been previous discussion in this thread regarding the pros and cons of how the switching can be exposed to the user. With fixed start positions, no special switching interface to transfer information be

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-05-17 Thread Nicholas Jiang
Hi Thomas, Sorry for later reply for your POC. I have reviewed the based abstract implementation of your pull request: https://github.com/apache/flink/pull/15924. IMO, for the switching mechanism, this level of abstraction is not concise enough, which doesn't make connector contribution easier.

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-04-25 Thread Becket Qin
Thanks for the clarification, Thomas. Yes, that makes sense to me. Cheers, Jiangjie (Becket) Qin On Mon, Apr 26, 2021 at 1:03 AM Thomas Weise wrote: > Hi Becket, > > I agree and am not planning to hard wire a specific combination of > sources (like S3 + Kafka). That also wouldn't help for the

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-04-25 Thread Thomas Weise
Hi Becket, I agree and am not planning to hard wire a specific combination of sources (like S3 + Kafka). That also wouldn't help for the use case I want to address, because there are customized connectors that we need to be able to plug in. Rather, the suggested simplification would be for the fl

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-04-24 Thread Becket Qin
Sorry for the late reply. Starting from a specific connector sounds reasonable to me. That said, I would suggest to keep the possibility of future generalization as much as possible. We have already seen some variation of source combinations from different users, HDFS + Kafka, S3 + Kafka, S3 + SQL

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-04-17 Thread Stephan Ewen
Thanks, Thomas! @Becket and @Nicholas - would you be ok with that approach? On Thu, Apr 15, 2021 at 6:30 PM Thomas Weise wrote: > Hi Stephan, > > Thanks for the feedback! > > I agree with the approach of starting with a simple implementation > that can address a well understood, significant po

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-04-15 Thread Thomas Weise
Hi Stephan, Thanks for the feedback! I agree with the approach of starting with a simple implementation that can address a well understood, significant portion of use cases. I'm planning to continue work on the prototype that I had shared. There is production level usage waiting for it fairly so

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-04-13 Thread Stephan Ewen
Thanks all for this discussion. Looks like there are lots of ideas and folks that are eager to do things, so let's see how we can get this moving. My take on this is the following: There will probably not be one Hybrid source, but possibly multiple ones, because of different strategies/requiremen

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-03-24 Thread Thomas Weise
Hi, As mentioned in my previous email, I had been working on a prototype for the hybrid source. You can find it at https://github.com/tweise/flink/pull/1 It contains: * Switching with configurable chain of sources * Fixed or dynamic start positions * Test with MockSource and FileSource The purp

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-03-14 Thread Thomas Weise
Hi Nicholas, Thanks for the reply. I had implemented a small PoC. It switches a configurable sequence of sources with predefined bounds. I'm using the unmodified MockSource for illustration. It does not require a "Switchable" interface. I looked at the code you shared and the delegation and signal

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-03-08 Thread Nicholas Jiang
Hi Kezhu, Thanks for your detailed points for the Hybrid Source. I follow your opinions and make a corresponding explanation as follows: 1.Would the Hybrid Source be possible to use this feature to switch/chain multiple homogeneous sources? "HybridSource" supports to switch/chain multiple homoge

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-03-08 Thread Nicholas Jiang
Hi Thomas, Thanks for your detailed reply for the design of Hybrid Source. I would reply to the questions you mentioned as follows: 1.Has there been related activity since the FLIP was created? Yes, the FLIP has the initial version of Hybrid Source implementation. You'd better refer to the reposit

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-02-04 Thread Kezhu Wang
Hi all, Thanks for starting the discussion! I think there are differences between start-position and checkpoint-state. Checkpoint states are actually intermediate progress-capture in reading data. Source clients don't care about it in coding phase, there are purely implementation details. While s

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-02-03 Thread Thomas Weise
Thanks for initiating this discussion and creating the proposal! I would like to contribute to this effort. Has there been related activity since the FLIP was created? If not, I would like to start work on a PoC to validate the design. Questions/comments: There could be more use cases for a hyb

Re: [DISCUSS]FLIP-150: Introduce Hybrid Source

2021-01-08 Thread Aljoscha Krettek
Hi Nicholas, Thanks for starting the discussion! I think we might be able to simplify this a bit and re-use existing functionality. There is already `Source.restoreEnumerator()` and `SplitEnumerator.snapshotState(). This seems to be roughly what the Hybrid Source needs. When the initial sou

[DISCUSS]FLIP-150: Introduce Hybrid Source

2020-11-03 Thread Nicholas Jiang
Hi devs, I'd like to start a new FLIP to introduce the Hybrid Source. The hybrid source is a source that contains a list of concrete sources. The hybrid source reads from each contained source in the defined order. It switches from source A to the next source B when source A finishes. In practice