Re: [DISCUSS] FLIP-276: Data Consistency of Streaming and Batch ETL in Flink and Table Store

Shammon FY Thu, 01 Dec 2022 02:49:29 -0800

Hi @Martijn

Thanks for your comments, and I'd like to reply to them

1. It sounds good to me, I'll update the content structure in FLIP later
and give the problems first.

2. "Each ETL job creates snapshots with checkpoint info on sink tables in
Table Store"  -> That reads like you're proposing that snapshots need to be
written to Table Store?

Yes. To support the data consistency in the FLIP, we need to get through
checkpoints in Flink and snapshots in store, this requires a close
combination of Flink and store implementation. In the first stage we plan
to implement it based on Flink and Table Store only, snapshots written to
external storage don't support consistency.

3. If you introduce a MetaService, it becomes the single point of failure
because it coordinates everything. But I can't find anything in the FLIP on
making the MetaService high available or how to deal with failovers there.

I think you raise a very important problem and I missed it in FLIP. The
MetaService is a single point and should support failover, we will do it in
future in the first stage we only support standalone mode, THX

4. The FLIP states under Rejected Alternatives "Currently watermark in
Flink cannot align data." which is not true, given that there is FLIP-182
https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources

Watermark alignment in FLIP-182 is different from requirements "watermark
align data" in our FLIP. FLIP-182 aims to fix watermark generation in
different sources for "slight imbalance or data skew", which means in some
cases the source must generate watermark even if they should not. When the
operator collects watermarks, the data processing is as described in our
FLIP, and the data cannot be aligned through the barrier like Checkpoint.

5. Given the MetaService role, it feels like this is introducing a tight
dependency between Flink and the Table Store. How pluggable is this
solution, given the changes that need to be made to Flink in order to
support this?

This is a good question, and I will try to expand it. Most of the work will
be completed in the Table Store, such as the new SplitEnumerator and Source
implementation. The changes in Flink are as followed:
1) Flink job should put its job id in context when creating source/sink to
help MetaService to create relationship between source and sink tables,
it's tiny
2) Notify a listener when job is terminated in Flink, and the listener
implementation in Table Store will send "delete event" to MetaService.
3) The changes are related to Flink Checkpoint includes
  a) Support triggering checkpoint with checkpoint id by SplitEnumerator
  b) Create the SplitEnumerator in Table Store with a strategy to perform
the specific checkpoint when all "SplitEnumerator"s in the job manager
trigger it.

Best,
Shammon

On Thu, Dec 1, 2022 at 3:43 PM Martijn Visser <martijnvis...@apache.org>
wrote:

> Hi all,
>
> A couple of first comments on this:
> 1. I'm missing the problem statement in the overall introduction. It
> immediately goes into proposal mode, I would like to first read what is the
> actual problem, before diving into solutions.
> 2. "Each ETL job creates snapshots with checkpoint info on sink tables in
> Table Store"  -> That reads like you're proposing that snapshots need to be
> written to Table Store?
> 3. If you introduce a MetaService, it becomes the single point of failure
> because it coordinates everything. But I can't find anything in the FLIP on
> making the MetaService high available or how to deal with failovers there.
> 4. The FLIP states under Rejected Alternatives "Currently watermark in
> Flink cannot align data." which is not true, given that there is FLIP-182
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-182%3A+Support+watermark+alignment+of+FLIP-27+Sources
>
> 5. Given the MetaService role, it feels like this is introducing a tight
> dependency between Flink and the Table Store. How pluggable is this
> solution, given the changes that need to be made to Flink in order to
> support this?
>
> Best regards,
>
> Martijn
>
>
> On Thu, Dec 1, 2022 at 4:49 AM Shammon FY <zjur...@gmail.com> wrote:
>
> > Hi devs:
> >
> > I'd like to start a discussion about FLIP-276: Data Consistency of
> > Streaming and Batch ETL in Flink and Table Store[1]. In the whole data
> > stream processing, there are consistency problems such as how to manage
> the
> > dependencies of multiple jobs and tables, how to define and handle E2E
> > delays, and how to ensure the data consistency of queries on flowing
> data?
> > This FLIP aims to support data consistency and answer these questions.
> >
> > I'v discussed the details of this FLIP with @Jingsong Lee and @libenchao
> > offline several times. We hope to support data consistency of queries on
> > tables, managing relationships between Flink jobs and tables and revising
> > tables on streaming in Flink and Table Store to improve the whole data
> > stream processing.
> >
> > Looking forward to your feedback.
> >
> > [1]
> >
> >
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-276%3A+Data+Consistency+of+Streaming+and+Batch+ETL+in+Flink+and+Table+Store
> >
> >
> > Best,
> > Shammon
> >
>

Re: [DISCUSS] FLIP-276: Data Consistency of Streaming and Batch ETL in Flink and Table Store

Reply via email to