I am also in favor of publishing the Flink watermark as snapshot
metadata from the Flink sink's committer operator. `IcebergFilesCommitter`
can override `public void processWatermark(Watermark mark)` to
intercept the latest watermark value.
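As a rough sketch of that idea (the class and the `flink.watermark` property name below are illustrative assumptions, not the actual `IcebergFilesCommitter` code): the committer remembers the last watermark it saw and attaches it to the Iceberg commit as a snapshot summary property.

```java
import org.apache.flink.streaming.api.operators.AbstractStreamOperator;
import org.apache.flink.streaming.api.watermark.Watermark;
import org.apache.iceberg.AppendFiles;
import org.apache.iceberg.Table;

// Illustrative sketch only, not the real IcebergFilesCommitter implementation.
public class WatermarkPublishingCommitter extends AbstractStreamOperator<Void> {

  // Latest watermark observed by this committer operator.
  private long currentWatermark = Long.MIN_VALUE;

  @Override
  public void processWatermark(Watermark mark) throws Exception {
    // Intercept the watermark value before the default handling continues.
    this.currentWatermark = mark.getTimestamp();
    super.processWatermark(mark);
  }

  // Called when the checkpoint's data files are committed to the Iceberg table.
  private void commitWithWatermark(Table table) {
    AppendFiles append = table.newAppend();
    // ... append the checkpoint's data files here ...
    // Publish the watermark as a snapshot summary property (property name is an assumption).
    append.set("flink.watermark", String.valueOf(currentWatermark));
    append.commit();
  }
}
```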
There was also a recent discussion (FLIP-1
I can comment on how Netflix did this. We added a watermark for arrival
time to each commit, produced from the lowest processing time across
partitions of the incoming data. That was added to the snapshot metadata
for each commit, along with relevant context like the processing region
because we ha
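A minimal sketch of that "lowest arrival time across partitions" computation, assuming a tracker that is fed the processing time of records as they arrive per partition (all names are illustrative, not Netflix's implementation):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ArrivalWatermarkTracker {
  // Latest arrival (processing) time observed for each incoming partition.
  private final Map<Integer, Long> arrivalTimePerPartition = new ConcurrentHashMap<>();

  public void recordArrival(int partition, long processingTimeMillis) {
    arrivalTimePerPartition.merge(partition, processingTimeMillis, Math::max);
  }

  // The commit-time watermark is the minimum across partitions: every partition
  // has delivered data at least up to this processing time.
  public long currentWatermark() {
    return arrivalTimePerPartition.values().stream()
        .mapToLong(Long::longValue)
        .min()
        .orElse(Long.MIN_VALUE);
  }
}
```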
+1, we have the same needs and hope for a solution to this problem. Thanks.
+ Sundaram, as he may have some input.
regards,
Anjali.
On Fri, Aug 13, 2021 at 2:41 AM 1 wrote:
> Hi all,
>
> I need to embed the iceberg table, which is regarded as real-time
> table, into our workflow. That is to say, Flink writes data into Iceberg
> table in real-time, I need something to
Hi all,
I need to embed the Iceberg table, which is regarded as a real-time table, into
our workflow. That is to say, Flink writes data into the Iceberg table in
real-time, and I need something to indicate data completeness on the ingestion
path so that downstream batch consumer jobs can be triggered.
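For illustration, a downstream trigger based on such metadata could look like the following sketch, which reads the watermark back from the latest snapshot's summary; the `flink.watermark` property name is only the assumption used in the committer sketch above, not an established convention.

```java
import java.util.Map;
import org.apache.iceberg.Snapshot;
import org.apache.iceberg.Table;

public class CompletenessCheck {
  // Returns true once the committed watermark has passed the end of the
  // interval (e.g. hour/day partition) the batch job wants to process.
  public static boolean isComplete(Table table, long intervalEndMillis) {
    Snapshot current = table.currentSnapshot();
    if (current == null) {
      return false; // nothing committed yet
    }
    Map<String, String> summary = current.summary();
    String watermark = summary.get("flink.watermark");
    return watermark != null && Long.parseLong(watermark) >= intervalEndMillis;
  }
}
```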