Re: dropDuplicates and watermark in structured streaming

2020-02-28 Thread Tathagata Das
Why do you have two watermarks? Once you apply the watermark to a column (i.e., "time"), it can be used in all later operations as long as the column is preserved. So the above code should be equivalent to df.withWatermark("time","window size").dropDuplicates("id").groupBy(window("time","window siz
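
A minimal PySpark sketch of the single-watermark pattern described in this reply. The function name, the "10 minutes" default, and the final count() are illustrative assumptions; the original snippet is truncated and does not show which aggregation follows groupBy.

```python
def dedup_then_window_count(events, window_size="10 minutes"):
    """Lazily build: watermark -> dropDuplicates -> windowed count.

    `events` is assumed to be a streaming DataFrame with an `id` column
    and a timestamp column named `time` (names taken from the reply).
    """
    # Imported inside the function so the sketch loads without a Spark install.
    from pyspark.sql.functions import window

    return (
        events
        .withWatermark("time", window_size)    # watermark declared once, on "time"
        .dropDuplicates(["id"])                 # "time" column is still present afterwards
        .groupBy(window("time", window_size))   # reuses the same watermark, no second call
        .count()                                # assumed aggregation, not shown in the thread
    )
```

One caveat worth checking against the Structured Streaming guide for your Spark version: the guide describes watermark-bounded deduplication as deduplicating on both the unique id and the event-time column, so dropDuplicates(["id"]) on its own may keep deduplication state indefinitely.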

Re: dropDuplicates and watermark in structured streaming

2020-02-27 Thread Tathagata Das
1. Yes. All times are in event time, not processing time. So you may get 10AM event-time data at 11AM processing time, but it will still be compared against all data within 9-10AM event times. 2. Show us your code. On Thu, Feb 27, 2020 at 2:30 AM lec ssmi wrote: > Hi: > I'm new to structured stre
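
To illustrate point 1, here is a small pure-Python simulation of event-time deduplication under a watermark. It is a simplified sketch, not Spark's implementation: the function name and the sample records are hypothetical, and the watermark is advanced per record here, whereas Spark advances it between micro-batches.

```python
from datetime import datetime, timedelta


def dedup_by_event_time(records, delay):
    """Deduplicate (id, event_time) pairs given in arrival order.

    Only event time is consulted; arrival (processing) order affects
    nothing except how far the watermark has advanced so far.
    """
    seen = {}              # id -> event time of the accepted record
    max_event_time = None  # largest event time observed so far
    accepted = []
    for rec_id, event_time in records:
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
        watermark = max_event_time - delay
        # Drop state that has fallen behind the watermark.
        seen = {k: t for k, t in seen.items() if t >= watermark}
        if event_time < watermark:
            continue  # too late: older than the watermark
        if rec_id in seen:
            continue  # duplicate within the retained event-time range
        seen[rec_id] = event_time
        accepted.append((rec_id, event_time))
    return accepted


def at(hour, minute):
    # Hypothetical helper: a timestamp on an arbitrary fixed day.
    return datetime(2020, 2, 27, hour, minute)


# Arrival order differs from event-time order on purpose.
arrivals = [("a", at(9, 30)), ("b", at(9, 45)), ("a", at(9, 40))]
result = dedup_by_event_time(arrivals, delay=timedelta(hours=1))
```

The third record arrives last but carries a 9:40 event time, so it is compared against the 9:30 record with the same id and dropped as a duplicate.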