Perfect! Thanks a lot. On Mon, Apr 10, 2017 at 1:39 PM, Tathagata Das <tathagata.das1...@gmail.com> wrote:
> The trigger interval is optionally specified in the writeStream option > before start. > > val windowedCounts = words.groupBy( > window($"timestamp", "24 hours", "24 hours"), > $"word" > ).count() > .writeStream > .trigger(ProcessingTime("10 seconds")) // optional > .format("memory") > .queryName("tableName") > .start() > > See the full example here - https://github.com/apache/ > spark/blob/master/examples/src/main/scala/org/apache/ > spark/examples/sql/streaming/StructuredNetworkWordCountWindowed.scala > > > On Mon, Apr 10, 2017 at 12:55 PM, kant kodali <kanth...@gmail.com> wrote: > >> Thanks again! Looks like the update mode is not available in 2.1 (which >> seems to be the latest version as of today) and I am assuming there will be >> a way to specify trigger interval with the next release because with the >> following code I don't see a way to specify trigger interval. >> >> val windowedCounts = words.groupBy( >> window($"timestamp", "24 hours", "24 hours"), >> $"word").count() >> >> >> On Mon, Apr 10, 2017 at 12:32 PM, Michael Armbrust < >> mich...@databricks.com> wrote: >> >>> It sounds like you want a tumbling window (where the slide and duration >>> are the same). This is the default if you give only one interval. You >>> should set the output mode to "update" (i.e. output only the rows that have >>> been updated since the last trigger) and the trigger to "1 second". >>> >>> Try thinking about the batch query that would produce the answer you >>> want. Structured streaming will figure out an efficient way to compute >>> that answer incrementally as new data arrives. >>> >>> On Mon, Apr 10, 2017 at 12:20 PM, kant kodali <kanth...@gmail.com> >>> wrote: >>> >>>> Hi Michael, >>>> >>>> Thanks for the response. I guess I was thinking more in terms of the >>>> regular streaming model. so In this case I am little confused what my >>>> window interval and slide interval be for the following case? >>>> >>>> I need to hold a state (say a count) for 24 hours while capturing all >>>> its updates and produce results every second. I also need to reset the >>>> state (the count) back to zero every 24 hours. >>>> >>>> >>>> >>>> >>>> >>>> >>>> On Mon, Apr 10, 2017 at 11:49 AM, Michael Armbrust < >>>> mich...@databricks.com> wrote: >>>> >>>>> Nope, structured streaming eliminates the limitation that >>>>> micro-batching should affect the results of your streaming query. Trigger >>>>> is just an indication of how often you want to produce results (and if you >>>>> leave it blank we just run as quickly as possible). >>>>> >>>>> To control how tuples are grouped into a window, take a look at the >>>>> window >>>>> <http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#window-operations-on-event-time> >>>>> function. >>>>> >>>>> On Thu, Apr 6, 2017 at 10:26 AM, kant kodali <kanth...@gmail.com> >>>>> wrote: >>>>> >>>>>> Hi All, >>>>>> >>>>>> Is the trigger interval mentioned in this doc >>>>>> <http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html> >>>>>> the same as batch interval in structured streaming? For example I have a >>>>>> long running receiver(not kafka) which sends me a real time stream I want >>>>>> to use window interval, slide interval of 24 hours to create the Tumbling >>>>>> window effect but I want to process updates every second. >>>>>> >>>>>> Thanks! >>>>>> >>>>> >>>>> >>>> >>> >> >