Broadcast variables are not saved in checkpoints, so you have to save them externally yourself and recover them before restarting the stream from a checkpoint.
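The Spark docs linked below in the thread describe recreating the broadcast through a lazily-instantiated singleton, fetched inside foreachRDD with the RDD's (fresh) context. A minimal plain-Python sketch of that pattern follows; note that `DummyContext` and its `broadcast()` are stand-ins for a real `SparkContext`, not Spark API, and the `BlacklistBroadcast` name and its contents are illustrative:

```python
import threading

class DummyContext:
    """Stand-in for SparkContext; broadcast() just wraps the value."""
    def broadcast(self, value):
        return {"value": value}

class BlacklistBroadcast:
    """Holds a single broadcast instance, rebuilt lazily after recovery."""
    _instance = None
    _lock = threading.Lock()

    @classmethod
    def get_instance(cls, sc):
        # Double-checked locking: after a restart from checkpoint the
        # class attribute is None again in the new JVM/process, so the
        # broadcast is rebuilt from the *current* (non-stale) context
        # that the caller passes in, e.g. rdd.context inside foreachRDD.
        if cls._instance is None:
            with cls._lock:
                if cls._instance is None:
                    cls._instance = sc.broadcast(["a", "b", "c"])
        return cls._instance
```

In real Spark code you would call `get_instance(rdd.context)` from inside `foreachRDD`, so recovery picks up the live context. As the thread notes, DStream-only transformations such as updateStateByKey / mapWithState give you no per-RDD hook, which is exactly the open question here.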
On Tue, Feb 7, 2017 at 3:55 PM, Amit Sela <amitsel...@gmail.com> wrote:
> I know this approach; the only thing is, it relies on the transformation being
> an RDD transformation as well, so it can be applied via foreachRDD,
> using the RDD's context to avoid a stale context after recovery/resume.
> My question is how to avoid a stale context in a DStream-only transformation
> such as updateStateByKey / mapWithState?
>
> On Tue, Feb 7, 2017 at 9:19 PM Shixiong(Ryan) Zhu <shixi...@databricks.com>
> wrote:
>
>> It's documented here:
>> http://spark.apache.org/docs/latest/streaming-programming-guide.html#accumulators-broadcast-variables-and-checkpoints
>>
>> On Tue, Feb 7, 2017 at 8:12 AM, Amit Sela <amitsel...@gmail.com> wrote:
>>
>> Hi all,
>>
>> I was wondering if anyone has ever used a broadcast variable within
>> an updateStateByKey op? Using it is straightforward, but I was wondering
>> how it'll work after resuming from checkpoint (using the rdd.context()
>> trick is not possible here)?
>>
>> Thanks,
>> Amit