Here is a summary of where we are at with the PR:
* Added capability to construct sources at switch time through a
factory interface. This can support all previously discussed
scenarios. The simple case (sources with fixed start position) is
still simple, but for scenarios that require deferred in
Hi Arvid,
Thanks for your reply -->
On Mon, Jun 14, 2021 at 2:55 PM Arvid Heise wrote:
>
> Hi Thomas,
>
> Thanks for bringing this up. I think this is a tough nut to crack :/.
> Imho 1 and 3 or 1+3 can work but it is ofc a pity if the source implementor
> is not aware of HybridSource. I'm also w
Hi Thomas,
Thanks for bringing this up. I think this is a tough nut to crack :/.
Imho 1 and 3 or 1+3 can work but it is ofc a pity if the source implementor
is not aware of HybridSource. I'm also worried that we may not have a
universal interface to specify start offset/time.
I guess it also would
Thanks for the suggestions and feedback on the PR.
A variation of hybrid source that can switch back and forth was
brought up before and it is something that will be eventually
required. It was also suggested by Stephan that in the future there
may be more than one implementation of hybrid source
> hybrid sounds to me more like the source would constantly switch back and
forth
Initially, the focus of hybrid source is more like a sequenced chain.
But in the future it would be cool that hybrid sources can intelligently
switch back and forth between historical data source (like Iceberg) and
Sorry for joining the party so late, but it's such an interesting FLIP with
a huge impact that I wanted to add my 2 cents. [1]
I'm mirroring some basic question from the PR review to this thread because
it's about the name:
We could rename the thing to ConcatenatedSource(s), SourceSequence, or
sim
> Converter function relies on the specific enumerator capabilities to set
the new start position (e.g.
fileSourceEnumerator.getEndTimestamp() and
kafkaSourceEnumerator.setTimestampOffsetsInitializer(..)
I guess the premise is that a converter is for a specific tuple of
(upstream source, downstrea
Hi Steven,
Thank you for the thorough review of the PR and for bringing this back
to the mailing list.
All,
I updated the FLIP-150 page to highlight aspects in which the PR
deviates from the original proposal [1]. The goal would be to update
the FLIP soon and bring it to a vote, as previously su
discussed the PR with Thosmas offline. Thomas, please correct me if I
missed anything.
Right now, the PR differs from the FLIP-150 doc regarding the converter.
* Current PR uses the enumerator checkpoint state type as the input for the
converter
* FLIP-150 defines a new EndStateT interface.
It see
Hi Nicholas,
Thanks for taking a look at the PR!
1. Regarding switching mechanism:
There has been previous discussion in this thread regarding the pros
and cons of how the switching can be exposed to the user.
With fixed start positions, no special switching interface to transfer
information be
Hi Thomas,
Sorry for later reply for your POC. I have reviewed the based abstract
implementation of your pull request:
https://github.com/apache/flink/pull/15924. IMO, for the switching
mechanism, this level of abstraction is not concise enough, which doesn't
make connector contribution easier.
Thanks for the clarification, Thomas. Yes, that makes sense to me.
Cheers,
Jiangjie (Becket) Qin
On Mon, Apr 26, 2021 at 1:03 AM Thomas Weise wrote:
> Hi Becket,
>
> I agree and am not planning to hard wire a specific combination of
> sources (like S3 + Kafka). That also wouldn't help for the
Hi Becket,
I agree and am not planning to hard wire a specific combination of
sources (like S3 + Kafka). That also wouldn't help for the use case I
want to address, because there are customized connectors that we need
to be able to plug in.
Rather, the suggested simplification would be for the fl
Sorry for the late reply. Starting from a specific connector sounds
reasonable to me.
That said, I would suggest to keep the possibility of future generalization
as much as possible. We have already seen some variation of source
combinations from different users, HDFS + Kafka, S3 + Kafka, S3 + SQL
Thanks, Thomas!
@Becket and @Nicholas - would you be ok with that approach?
On Thu, Apr 15, 2021 at 6:30 PM Thomas Weise wrote:
> Hi Stephan,
>
> Thanks for the feedback!
>
> I agree with the approach of starting with a simple implementation
> that can address a well understood, significant po
Hi Stephan,
Thanks for the feedback!
I agree with the approach of starting with a simple implementation
that can address a well understood, significant portion of use cases.
I'm planning to continue work on the prototype that I had shared.
There is production level usage waiting for it fairly so
Thanks all for this discussion. Looks like there are lots of ideas and
folks that are eager to do things, so let's see how we can get this moving.
My take on this is the following:
There will probably not be one Hybrid source, but possibly multiple ones,
because of different strategies/requiremen
Hi,
As mentioned in my previous email, I had been working on a prototype for
the hybrid source.
You can find it at https://github.com/tweise/flink/pull/1
It contains:
* Switching with configurable chain of sources
* Fixed or dynamic start positions
* Test with MockSource and FileSource
The purp
Hi Nicholas,
Thanks for the reply. I had implemented a small PoC. It switches a
configurable sequence of sources with predefined bounds. I'm using the
unmodified MockSource for illustration. It does not require a "Switchable"
interface. I looked at the code you shared and the delegation and signal
Hi Kezhu,
Thanks for your detailed points for the Hybrid Source. I follow your
opinions and make a corresponding explanation as follows:
1.Would the Hybrid Source be possible to use this feature to switch/chain
multiple homogeneous sources?
"HybridSource" supports to switch/chain multiple homoge
Hi Thomas,
Thanks for your detailed reply for the design of Hybrid Source. I would
reply to the questions you mentioned as follows:
1.Has there been related activity since the FLIP was created?
Yes, the FLIP has the initial version of Hybrid Source implementation. You'd
better refer to the reposit
Hi all,
Thanks for starting the discussion!
I think there are differences between start-position and checkpoint-state.
Checkpoint states
are actually intermediate progress-capture in reading data. Source clients
don't care about it
in coding phase, there are purely implementation details. While
s
Thanks for initiating this discussion and creating the proposal!
I would like to contribute to this effort. Has there been related activity
since the FLIP was created?
If not, I would like to start work on a PoC to validate the design.
Questions/comments:
There could be more use cases for a hyb
Hi Nicholas,
Thanks for starting the discussion!
I think we might be able to simplify this a bit and re-use existing
functionality.
There is already `Source.restoreEnumerator()` and
`SplitEnumerator.snapshotState(). This seems to be roughly what the
Hybrid Source needs. When the initial sou
Hi devs,
I'd like to start a new FLIP to introduce the Hybrid Source. The hybrid
source is a source that contains a list of concrete sources. The hybrid
source reads from each contained source in the defined order. It switches
from source A to the next source B when source A finishes.
In practice
25 matches
Mail list logo