Hey Steven,
Your conclusion at this point sounds reasonable to me. That being said, I
think we need to consider a bit more about the extensibility of Flink in
the future. I would be happy to drive some efforts in that direction. So
later on, the timestamp alignment of Iceberg may be able to levera
might be the same as => might NOT be the same as
On Fri, May 6, 2022 at 8:13 PM Steven Wu wrote:
> The conclusion of this discussion could be that we don't see much value in
> leveraging FLIP-182 with Iceberg source. That would totally be fine.
>
> For me, one big sticking point is the alignment
The conclusion of this discussion could be that we don't see much value in
leveraging FLIP-182 with Iceberg source. That would totally be fine.
For me, one big sticking point is the alignment timestamp for the (Iceberg)
source might be the same as the Flink application watermark.
On Thu, May 5, 2
Option 1 sounds reasonable but I would be tempted to wait for a second
motivational use case before generalizing the framework. However I wouldn’t
oppose this extension if others feel it’s useful and a good thing to do.
Piotrek
> Message written by Becket Qin on 06.05.2022,
> at 0
I think the key point here is essentially what information should Flink
expose to the user pluggables. Apparently split / local task watermark is
something many user pluggables would be interested in. Right now it is
calculated by the Flink framework but not exposed to user space, i.e.
SourceR
On Wed, May 4, 2022 at 11:03 AM Steven Wu wrote:
> Any opinion on a different timestamp for source alignment (vs Flink application
> watermark)? For Iceberg source, we might want to enforce alignment on Kafka
> timestamp but Flink application watermark may use event time field from
> payload.
I
It seems that the iceberg source benefits from performing the
alignment in the enumerator and holding back the splits until they
actually can be processed. That is probably true for any similar
source that assigns work in smaller increments as centralizing the
"ready to process" decision in the enu
Hi Steven,
Ok, thanks for the clarification. I'm not sure how much could be leveraged?
Maybe just re-using the watermark alignment configuration? Please correct
me if I'm wrong, but I think for the sole purpose of this use case, I don't
see a good motivation behind expanding our APIs. Clearly this
Piotr,
With FLIP-27, Iceberg source already implemented alignment by tracking
watermark and holding back split assignment when necessary.
The purpose of this discussion is to see if Iceberg source can leverage
some of the watermark alignment work from Flink framework.
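
As an illustration of the "hold back split assignment" idea, here is a minimal sketch. It is not the Iceberg or Flink implementation. The class `AlignedSplitAssigner`, its methods, and the `maxDriftMs` parameter are hypothetical names chosen for this example, and it assumes splits are added in timestamp order and that each reader reports its watermark to the enumerator.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Queue;

// Hypothetical sketch of enumerator-side alignment; not Iceberg or Flink code.
final class AlignedSplitAssigner {

    /** A split annotated with the smallest timestamp it contains. */
    static final class Split {
        final String id;
        final long minTimestampMs;

        Split(String id, long minTimestampMs) {
            this.id = id;
            this.minTimestampMs = minTimestampMs;
        }
    }

    private final long maxDriftMs;
    // Pending splits, assumed to be added in ascending minTimestampMs order.
    private final Queue<Split> pending = new ArrayDeque<>();
    // Latest watermark reported by each reader.
    private final Map<Integer, Long> readerWatermarks = new HashMap<>();

    AlignedSplitAssigner(long maxDriftMs) {
        this.maxDriftMs = maxDriftMs;
    }

    void addSplit(Split split) {
        pending.add(split);
    }

    void reportWatermark(int readerId, long watermarkMs) {
        // Watermarks only move forward.
        readerWatermarks.merge(readerId, watermarkMs, Math::max);
    }

    /** Minimum watermark across readers, or Long.MIN_VALUE if none reported yet. */
    long globalWatermark() {
        return readerWatermarks.values().stream()
                .mapToLong(Long::longValue)
                .min()
                .orElse(Long.MIN_VALUE);
    }

    /** Returns the next split, or empty if it must be held back (or none is pending). */
    Optional<Split> nextSplit() {
        Split head = pending.peek();
        if (head == null) {
            return Optional.empty();
        }
        long global = globalWatermark();
        // Hold the split back if it is too far ahead of the slowest reader.
        if (global != Long.MIN_VALUE && head.minTimestampMs > global + maxDriftMs) {
            return Optional.empty();
        }
        return Optional.of(pending.remove());
    }
}
```

In this sketch a reader that requests a split while the head of the queue is too far ahead simply gets nothing and has to ask again later.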
Thanks,
Steven
On Thu, May
Ok, I see. Thanks to both of you for the explanation.
Do we need changes to Apache Flink for this feature? Can it be implemented
in the Sources without changes in the framework? I presume source can
access min/max watermark from the split, so as long as it also knows
exactly which splits have fini
Piotr, thanks a lot for your feedback.
> I can see this being an issue if the existence of too many blocked splits
> is occupying too many resources.
This is not desirable. Eagerly assigning many splits to a reader can defeat
the benefits of pull-based dynamic split assignment. Iceberg readers
req
Hey Piotr,
I think the mechanism FLIP-182 provides is a reasonable default, which
ensures that watermark drift stays within an upper bound. However,
admittedly there are also other strategies for different purposes.
In the Iceberg case, I am not sure if a static strictly allowed watermark
dr
Hi Steven,
Isn't this redundant with FLIP-182 and FLIP-217? Can't Iceberg just emit
all splits and let FLIP-182/FLIP-217 handle the watermark alignment and
block the splits that are too far into the future? I can see this being an
issue if the existence of too many blocked splits is occupying too
add dev@ group to the thread as Thomas suggested
Arvid,
Scenario 3 (Dynamic assignment + temporary no split) in FLIP-180
(idleness) can happen to Iceberg source alignment, as readers can be
temporarily starved due to the holdback by the enumerator when assigning
new splits upon request.