Perfect! Thanks a lot.

On Mon, Apr 10, 2017 at 1:39 PM, Tathagata Das <tathagata.das1...@gmail.com>
wrote:

> The trigger interval is optionally specified in the writeStream option
> before start.
>
> val windowedCounts = words.groupBy(
>   window($"timestamp", "24 hours", "24 hours"),
>   $"word"
> ).count()
> .writeStream
> .trigger(ProcessingTime("10 seconds"))  // optional
> .format("memory")
> .queryName("tableName")
> .start()
>
> See the full example here - https://github.com/apache/spark/blob/master/examples/src/main/scala/org/apache/spark/examples/sql/streaming/StructuredNetworkWordCountWindowed.scala
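
A self-contained variant of the snippet above might look like the sketch below. The socket source on localhost:9999 is purely illustrative (substitute your own receiver), and in Spark 2.1 the trigger class is `ProcessingTime` from `org.apache.spark.sql.streaming`:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window
import org.apache.spark.sql.streaming.ProcessingTime

object WindowedWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder
      .appName("WindowedWordCount")
      .getOrCreate()
    import spark.implicits._

    // Illustrative source: lines of text plus the arrival timestamp.
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .option("includeTimestamp", true)
      .load()

    val words = lines.as[(String, java.sql.Timestamp)]
      .flatMap { case (line, ts) => line.split(" ").map(w => (w, ts)) }
      .toDF("word", "timestamp")

    // 24-hour tumbling window (window size == slide), results produced
    // every 10 seconds via the processing-time trigger.
    val query = words.groupBy(
        window($"timestamp", "24 hours", "24 hours"),
        $"word"
      ).count()
      .writeStream
      .outputMode("complete")  // the memory sink needs complete mode for aggregations in 2.1
      .trigger(ProcessingTime("10 seconds"))
      .format("memory")
      .queryName("tableName")
      .start()

    query.awaitTermination()
  }
}
```

The in-memory table can then be queried with `spark.sql("select * from tableName")` between triggers.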
>
>
> On Mon, Apr 10, 2017 at 12:55 PM, kant kodali <kanth...@gmail.com> wrote:
>
>> Thanks again! It looks like update mode is not available in 2.1 (which
>> seems to be the latest version as of today), and I am assuming there will be
>> a way to specify the trigger interval in the next release, because with the
>> following code I don't see a way to specify one.
>>
>> val windowedCounts = words.groupBy(
>>   window($"timestamp", "24 hours", "24 hours"),
>>   $"word").count()
>>
>>
>> On Mon, Apr 10, 2017 at 12:32 PM, Michael Armbrust <
>> mich...@databricks.com> wrote:
>>
>>> It sounds like you want a tumbling window (where the slide and duration
>>> are the same).  This is the default if you give only one interval.  You
>>> should set the output mode to "update" (i.e. output only the rows that have
>>> been updated since the last trigger) and the trigger to "1 second".
>>>
>>> Try thinking about the batch query that would produce the answer you
>>> want.  Structured streaming will figure out an efficient way to compute
>>> that answer incrementally as new data arrives.
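>>>
>>> As a sketch of that mental model: if `words` were a static DataFrame with the same columns, the batch query below gives the answer over all data seen so far, and Structured Streaming computes that same result incrementally. The data and column names are illustrative only:

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.window

val spark = SparkSession.builder
  .appName("BatchWindowedCounts")
  .master("local[*]")
  .getOrCreate()
import spark.implicits._

// Static stand-in for the streaming `words` table.
val words = Seq(
  ("spark", Timestamp.valueOf("2017-04-10 09:00:00")),
  ("spark", Timestamp.valueOf("2017-04-10 15:00:00")),
  ("kafka", Timestamp.valueOf("2017-04-11 09:00:00"))
).toDF("word", "timestamp")

// Giving only one interval makes window size == slide, i.e. a tumbling window.
val counts = words.groupBy(window($"timestamp", "24 hours"), $"word").count()
counts.show(truncate = false)
```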
>>>
>>> On Mon, Apr 10, 2017 at 12:20 PM, kant kodali <kanth...@gmail.com>
>>> wrote:
>>>
>>>> Hi Michael,
>>>>
>>>> Thanks for the response. I guess I was thinking more in terms of the
>>>> regular streaming model, so in this case I am a little confused about what
>>>> my window interval and slide interval should be for the following case:
>>>>
>>>> I need to hold a state (say a count) for 24 hours while capturing all
>>>> its updates and produce results every second. I also need to reset the
>>>> state (the count) back to zero every 24 hours.
>>>>
>>>> On Mon, Apr 10, 2017 at 11:49 AM, Michael Armbrust <
>>>> mich...@databricks.com> wrote:
>>>>
>>>>> Nope, structured streaming eliminates the limitation that
>>>>> micro-batching should affect the results of your streaming query.  Trigger
>>>>> is just an indication of how often you want to produce results (and if you
>>>>> leave it blank we just run as quickly as possible).
>>>>>
>>>>> To control how tuples are grouped into a window, take a look at the
>>>>> window
>>>>> <http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#window-operations-on-event-time>
>>>>> function.
>>>>>
>>>>> On Thu, Apr 6, 2017 at 10:26 AM, kant kodali <kanth...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> Is the trigger interval mentioned in this doc
>>>>>> <http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html>
>>>>>> the same as the batch interval in structured streaming? For example, I have
>>>>>> a long-running receiver (not Kafka) which sends me a real-time stream. I want
>>>>>> to use a window interval and slide interval of 24 hours to create the tumbling-
>>>>>> window effect, but I want to process updates every second.
>>>>>>
>>>>>> Thanks!
>>>>>>
>>>>>
>>>>>
>>>>
>>>
>>
>
