why do you have two watermarks? once you apply the watermark to a column
(i.e., "time"), it can be used in all later operations as long as the
column is preserved. So the above code should be equivalent to
df.withWarmark("time","window
size").dropDulplicates("id").groupBy(window("time","window siz
Such as :
df.withWarmark("time","window
size").dropDulplicates("id").withWatermark("time","real
watermark").groupBy(window("time","window size","window
size")).agg(count("id"))
can It make count(distinct id) success?
lec ssmi 于2020年2月28日周五 下午1:11写道:
> Such as :
> df.
Such as :
df.withWarmark("time","window
size").dropDulplicates("id").withWatermark("time","real
watermark").groupBy(window("time","window size","window
size")).agg(count("id"))
can It make count(distinct count) success?
Tathagata Das 于2020年2月28日周五 上午10:25写道:
> 1. Yes. All times
1. Yes. All times in event time, not processing time. So you may get 10AM
event time data at 11AM processing time, but it will still be compared
again all data within 9-10AM event times.
2. Show us your code.
On Thu, Feb 27, 2020 at 2:30 AM lec ssmi wrote:
> Hi:
> I'm new to structured stre
Hi:
I'm new to structured streaming. Because the built-in API cannot
perform the Count Distinct operation of Window, I want to use
dropDuplicates first, and then perform the window count.
But in the process of using, there are two problems:
1. Because it is streaming computing, in