The input dataset has multiple days worth of data, so I thought the
watermark should have been crossed. To debug, I changed the query to the
code below. My expectation was that since I am doing 1 day windows with
late arrivals permitted for 1 second, when it sees records for the next
day, it would
In append mode, the aggregation outputs a row only when the watermark has
been crossed and the corresponding aggregate is *final*, that is, will not
be updated any more.
See
http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#handling-late-data-and-watermarking
On Mon,
Hi,
I am running Spark 2.2 and trying out structured streaming. I have the
following code:
from pyspark.sql import functions as F
df=frame \
.withWatermark("timestamp","1 minute") \
.groupby(F.window("timestamp","1 day"),*groupby_cols) \
.agg(f.sum('bytes'))
query = frame.writeSt