Re: Identify watermark in the iceberg table properties

2021-08-22 Thread Steven Wu
I am also in favor of supporting publishing the Flink watermark as snapshot metadata in Flink sink by the committer operator. `IcebergFilesCommitter` can override the `public void processWatermark(Watermark mark)` to intercept the latest watermark value. There was also a recent discussion (FLIP-1

Re: Identify watermark in the iceberg table properties

2021-08-22 Thread Ryan Blue
I can comment on how Netflix did this. We added a watermark for arrival time to each commit, produced from the lowest processing time across partitions of the incoming data. That was added to the snapshot metadata for each commit, along with relevant context like the processing region because we ha

Re: Identify watermark in the iceberg table properties

2021-08-17 Thread Peidian Li
+1, we haven the same needs, hope the solution to this problem. Thanks.

Re: Identify watermark in the iceberg table properties

2021-08-16 Thread Anjali Norwood
+ Sundaram, as he may have some input. regards, Anjali. On Fri, Aug 13, 2021 at 2:41 AM 1 wrote: > Hi,all: > > I need to embed the iceberg table, which is regarded as real-time > table, into our workflow. That is to say, Flink writes data into Iceberg > table in real-time, I need something to

Identify watermark in the iceberg table properties

2021-08-13 Thread 1
Hi,all: I need to embed the iceberg table, which is regarded as real-time table, into our workflow. That is to say, Flink writes data into Iceberg table in real-time, I need something to indicate the data completeness on the ingestion path so that downstream batch consumer jobs can be trigge