Re: [Structured Streaming] Robust watermarking calculation with future timestamps

2019-11-14 Thread Jungtaek Lim
(Dropping user@, as cross-posting a thread would bother both lists, and this seems more appropriate for dev@.) AFAIK there's no API for a custom watermark, and you're right that picking the max timestamp would introduce the issues you described. Other streaming frameworks may pick min timesta

[Structured Streaming] Robust watermarking calculation with future timestamps

2019-11-13 Thread Anastasios Zouzias
Hi all, we currently have the following issue with a Spark Structured Streaming (SS) application. The application reads messages from thousands of source systems and stores them in Kafka; Spark then aggregates them using SS with watermarking (15 minutes). The root problem is that a few of the source sy
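
To illustrate the failure mode named in the subject line, here is a minimal sketch in plain Python (not Spark code) of a watermark derived as the maximum event time seen so far minus the delay threshold, which is roughly how Structured Streaming computes it. The run_batches helper, the source names and the timestamps are hypothetical, and the model is simplified: one input stream, with the watermark advanced once at the end of each micro-batch.

```python
from datetime import datetime, timedelta

# The 15-minute watermark delay mentioned in the message above.
WATERMARK_DELAY = timedelta(minutes=15)


def run_batches(batches, delay=WATERMARK_DELAY):
    """Simulate a max-event-time watermark over a list of micro-batches.

    Each batch is a list of (source, event_time) pairs.
    Returns (final_watermark, dropped_events).
    """
    max_event_time = None
    watermark = None
    dropped = []
    for batch in batches:
        for source, event_time in batch:
            # Events older than the current watermark count as too late.
            if watermark is not None and event_time < watermark:
                dropped.append((source, event_time))
                continue
            if max_event_time is None or event_time > max_event_time:
                max_event_time = event_time
        # The watermark only moves forward, and only between batches.
        if max_event_time is not None:
            candidate = max_event_time - delay
            if watermark is None or candidate > watermark:
                watermark = candidate
    return watermark, dropped


now = datetime(2019, 11, 13, 12, 0)

# Hypothetical input: one source reports a timestamp a full day ahead.
batches = [
    [("source-a", now), ("source-bad", now + timedelta(days=1))],
    [("source-a", now + timedelta(minutes=1)),
     ("source-b", now + timedelta(minutes=2))],
]

watermark, dropped = run_batches(batches)
print("watermark:", watermark)
print("dropped:", dropped)
```

In this sketch the single future timestamp pushes the watermark roughly a day ahead, so every correctly timestamped event in the second batch is treated as late and dropped. One conceivable mitigation, again only an assumption and not something stated in the thread, is to filter out rows whose event time lies implausibly far ahead of processing time before the watermark is applied.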