Hi,

It's event time-based by default, as the only way to define a watermark
is the withWatermark operator, which takes an event-time column.

See 
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset@withWatermark(eventTime:String,delayThreshold:String):org.apache.spark.sql.Dataset[T]
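A minimal sketch of the operator (the rate source and the column name are just for illustration):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("watermark-demo").getOrCreate()

// A streaming Dataset with an event-time column; the built-in "rate"
// testing source emits rows of (timestamp, value).
val events = spark.readStream
  .format("rate")
  .load()
  .withColumnRenamed("timestamp", "eventTime")

// Watermark on the event-time column, tolerating 10 minutes of lateness
val watermarked = events.withWatermark("eventTime", "10 minutes")
```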

But...

Given that your initial Dataset may have no event-time column, you can
auto-generate one using current_date or current_timestamp (or some
other processing time-based expression), which effectively gives you
the other option, i.e. a processing time-based watermark.
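A sketch of that trick, assuming a streaming Dataset called `input` with no event-time column (the column name is made up):

```scala
import org.apache.spark.sql.functions.current_timestamp

// Attach a column that records the processing time of each row, then
// watermark on it, so the watermark effectively tracks processing time.
val withProcTime = input
  .withColumn("procTime", current_timestamp())
  .withWatermark("procTime", "0 seconds")
```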

And last but not least...

In the most generic solution, using
KeyValueGroupedDataset.flatMapGroupsWithState, you can use the
pre-defined timeout strategies (processing time- or event time-based)
or manage state expiry yourself. That's why they call it a solution
for "arbitrary stateful operations".

* 
http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.KeyValueGroupedDataset

* https://youtu.be/JAb4FIheP28
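A rough sketch of that API, assuming a streaming Dataset[Event] called `events`; the Event, SessionState, and SessionUpdate classes are made up for illustration:

```scala
import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout, OutputMode}

// Hypothetical event, state, and output types
case class Event(userId: String, eventTime: java.sql.Timestamp)
case class SessionState(count: Long)
case class SessionUpdate(userId: String, count: Long)

// Called once per key per trigger with the new events and the old state
def updateState(
    userId: String,
    newEvents: Iterator[Event],
    state: GroupState[SessionState]): Iterator[SessionUpdate] = {
  val old = state.getOption.getOrElse(SessionState(0))
  val updated = SessionState(old.count + newEvents.size)
  state.update(updated)
  // Pre-defined strategy: expire this key's state after 30 minutes
  // of processing-time inactivity
  state.setTimeoutDuration("30 minutes")
  Iterator(SessionUpdate(userId, updated.count))
}

val updates = events
  .groupByKey(_.userId)
  .flatMapGroupsWithState(
    OutputMode.Update,
    GroupStateTimeout.ProcessingTimeTimeout)(updateState)
```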

Pozdrawiam,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Spark Structured Streaming (Apache Spark 2.2+)
https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski


On Fri, Sep 1, 2017 at 8:15 PM, kant kodali <kanth...@gmail.com> wrote:
> Is watermark always set using processing time or event time or both?
