Hi,

Watermarks are event time-based by default, since the withWatermark operator is the only way to define one and it requires an event-time column.
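For illustration, here is a minimal sketch of the event-time case. It uses the built-in "rate" test source so it is self-contained; the column names ("timestamp") and the window/delay durations come from that source and are just example values:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("watermark-sketch")
  .getOrCreate()
import spark.implicits._

// The built-in "rate" source generates rows with an event-time
// "timestamp" column, so no real input is needed for this sketch.
val events = spark.readStream.format("rate").load()

// The watermark is defined on an event-time column; here Spark waits
// up to 10 minutes for late rows before finalizing each 5-minute window.
val windowed = events
  .withWatermark("timestamp", "10 minutes")
  .groupBy(window($"timestamp", "5 minutes"))
  .count()
```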
See http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Dataset@withWatermark(eventTime:String,delayThreshold:String):org.apache.spark.sql.Dataset[T]

But... given that your initial Dataset may have no event-time column, you can auto-generate one at processing time using current_date or current_timestamp (or some other way), which gives you the other option (processing time).

And last but not least... the most generic solution is KeyValueGroupedDataset.flatMapGroupsWithState, where you can use the predefined timeout strategies or write custom logic. That's why they call it a solution for "arbitrary stateful aggregations".

* http://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.KeyValueGroupedDataset
* https://youtu.be/JAb4FIheP28

Regards,
Jacek Laskowski
----
https://about.me/JacekLaskowski
Spark Structured Streaming (Apache Spark 2.2+) https://bit.ly/spark-structured-streaming
Mastering Apache Spark 2 https://bit.ly/mastering-apache-spark
Follow me at https://twitter.com/jaceklaskowski

On Fri, Sep 1, 2017 at 8:15 PM, kant kodali <kanth...@gmail.com> wrote:
> Is watermark always set using processing time or event time or both?

---------------------------------------------------------------------
To unsubscribe e-mail: user-unsubscr...@spark.apache.org