Hi Suket,

Sorry, this was a typo in the pseudo-code I sent. Of course, what you suggested (using the same event-time attribute for both the watermark and the window) is what my code does in reality. Sorry for the confusion.
On 5/14/19 4:14 PM, suket arora wrote:
> Hi Joe,
> As per the Spark Structured Streaming documentation, and I quote:
> "withWatermark must be called on the same column as the timestamp column
> used in the aggregate. For example, df.withWatermark("time", "1
> min").groupBy("time2").count() is invalid in Append output mode, as
> watermark is defined on a different column from the aggregation column."
>
> And after referring to the following code:
>
> // Group the data by window and word and compute the count of each group
> val windowedCounts = words
>   .withWatermark("timestamp", "10 minutes")
>   .groupBy(
>     window($"timestamp", "10 minutes", "5 minutes"),
>     $"word")
>   .count()
>
> I would suggest you try the following code:
>
> df = inputStream
>   .withWatermark("eventtime", "20 seconds")
>   .groupBy($"sharedId", window($"eventtime", "20 seconds", "10 seconds"))
>
> And if this doesn't work, you can try a trigger on the query.

Can you maybe explain what you mean by "try trigger on query"? I don't understand that.

--
CU, Joe

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org
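For reference, a minimal sketch of the pattern the quoted documentation describes, with the watermark and the window defined on the same event-time column. Column and variable names (`inputStream`, `eventtime`, `sharedId`) follow the snippet above and are illustrative; this assumes a streaming DataFrame read from some source via an existing SparkSession:

```scala
import org.apache.spark.sql.functions.window

// Sketch only: `inputStream` is assumed to be a streaming DataFrame
// with columns `eventtime: Timestamp` and `sharedId: String`.
val windowedCounts = inputStream
  // The watermark column and the window column must be the SAME
  // event-time column, or the aggregation is invalid in Append mode.
  .withWatermark("eventtime", "20 seconds")
  .groupBy(
    $"sharedId",
    window($"eventtime", "20 seconds", "10 seconds"))
  .count()
```

In Append mode, a window's counts are only emitted once the watermark passes the end of that window, which is why Spark insists the two refer to the same column.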