Re: Structured Streaming Kafka - Weird behavior with performance and logs

2019-05-14 Thread Suket Arora
// Continuous trigger with one-second checkpointing interval
df.writeStream
  .format("console")
  .trigger(Trigger.Continuous("1 second"))
  .start()

On Tue, 14 May 2019 at 22:10, suket arora wrote:
> Hey Austin,
>
> If you truly want to process as a stream
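For reference, a self-contained sketch of what such a continuous-trigger pipeline reading from Kafka could look like; the bootstrap servers, topic name, and checkpoint path below are placeholders and not taken from the original thread:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder()
  .appName("ContinuousKafkaExample")
  .getOrCreate()

// Hypothetical Kafka source; broker address and topic are placeholders.
val input = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "localhost:9092")
  .option("subscribe", "events")
  .load()

// Continuous processing supports only map-like operations (select, filter, etc.);
// state is checkpointed every second, as specified by the trigger.
input.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
  .writeStream
  .format("console")
  .option("checkpointLocation", "/tmp/continuous-checkpoint")
  .trigger(Trigger.Continuous("1 second"))
  .start()
  .awaitTermination()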

Re: Handling of watermark in structured streaming

2019-05-14 Thread Suket Arora
val df = inputStream
  .withWatermark("eventtime", "20 seconds")
  .groupBy(col("sharedId"), window(col("eventtime"), "20 seconds", "10 seconds"))
  .count()

// ProcessingTime trigger with two-second micro-batch interval
df.writeStream
  .format("console")
  .trigger(Trigger.ProcessingTime("2 seconds"))
  .start()

On Tue, 14 May 201
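A fuller, hedged sketch of the same idea, with the input stream stubbed out so it runs end to end; the rate source and the derived "eventtime" and "sharedId" columns are stand-ins for the real input, not part of the original thread:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, window}
import org.apache.spark.sql.streaming.Trigger

val spark = SparkSession.builder()
  .appName("WatermarkWindowExample")
  .getOrCreate()

// Rate source used only so the example is self-contained; "eventtime" and
// "sharedId" are derived columns standing in for the real schema.
val inputStream = spark.readStream
  .format("rate")
  .option("rowsPerSecond", "5")
  .load()
  .withColumn("eventtime", col("timestamp"))
  .withColumn("sharedId", col("value") % 10)

// 20-second watermark on the event-time column, 20-second windows sliding every 10 seconds.
val counts = inputStream
  .withWatermark("eventtime", "20 seconds")
  .groupBy(col("sharedId"), window(col("eventtime"), "20 seconds", "10 seconds"))
  .count()

// Micro-batches are triggered every two seconds.
counts.writeStream
  .format("console")
  .outputMode("update")
  .trigger(Trigger.ProcessingTime("2 seconds"))
  .start()
  .awaitTermination()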

Re: Handling of watermark in structured streaming

2019-05-14 Thread suket arora
Hi Joe, As per the Spark Structured Streaming documentation, and I quote: "withWatermark must be called on the same column as the timestamp column used in the aggregate. For example, df.withWatermark("time", "1 min").groupBy("time2").count() is invalid in Append output mode, as watermark is defined on a different column from the aggregation column."
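To illustrate the quoted rule, a minimal sketch contrasting the invalid and valid forms; the column names "time" and "time2" come from the documentation example, and df is assumed to be a streaming DataFrame with those columns:

import org.apache.spark.sql.functions.{col, window}

// Invalid in Append mode: the watermark is defined on "time", but the
// aggregation groups by a different column, "time2".
// df.withWatermark("time", "1 min").groupBy("time2").count()

// Valid: the aggregation windows over the same column the watermark is defined on.
val valid = df
  .withWatermark("time", "1 min")
  .groupBy(window(col("time"), "5 minutes"))
  .count()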