Re: broadcast variable and accumulators issue while spark streaming checkpoint recovery

2015-07-29 Thread Tathagata Das
1. The same way, using static fields in a class.
2. Yes, the same way.
3. Yes, you can do that. To differentiate "first time" vs. "continue", you have to build your own semantics. For example, if the location in HDFS where you are supposed to store the offsets does not have any data, that means it's probably
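The "build your own semantics" idea in point 3 can be sketched as a small offset store: if the storage location is empty, treat the job as a first run; otherwise load the saved offsets. This is a minimal sketch only — it uses the local filesystem via `java.nio.file` as a stand-in for HDFS, and the class name, file format (`partition,offset` lines), and method names are all illustrative assumptions, not anything from the thread or the Spark API.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;

public class OffsetStore {
    // Hypothetical location; in a real job this would be an HDFS path.
    private final Path offsetFile;

    public OffsetStore(Path offsetFile) {
        this.offsetFile = offsetFile;
    }

    // No stored offsets means this is (probably) the first run.
    public boolean isFirstRun() throws IOException {
        return !Files.exists(offsetFile) || Files.size(offsetFile) == 0;
    }

    // Load "partition,offset" lines written at the end of the previous batch.
    public Map<Integer, Long> loadOffsets() throws IOException {
        Map<Integer, Long> offsets = new HashMap<>();
        if (!isFirstRun()) {
            for (String line : Files.readAllLines(offsetFile)) {
                String[] parts = line.split(",");
                offsets.put(Integer.parseInt(parts[0]), Long.parseLong(parts[1]));
            }
        }
        return offsets;
    }

    // Overwrite the stored offsets at the end of each batch.
    public void saveOffsets(Map<Integer, Long> offsets) throws IOException {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Integer, Long> e : offsets.entrySet()) {
            sb.append(e.getKey()).append(',').append(e.getValue()).append('\n');
        }
        Files.write(offsetFile, sb.toString().getBytes());
    }
}
```

On startup the driver would call `isFirstRun()` to decide between starting the Kafka stream from its default position and starting from `loadOffsets()`; at the end of each batch it would call `saveOffsets(...)`.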

Re: broadcast variable and accumulators issue while spark streaming checkpoint recovery

2015-07-29 Thread Shushant Arora
1. How do I do this in Java?
2. Can broadcast objects also be created in the same way after checkpointing?
3. Is it safe if I disable checkpointing and instead write the offsets to HDFS at the end of each batch in my own code, and somehow tell my job to use those offsets when creating the Kafka stream the first time? How can I specify

Re: broadcast variable and accumulators issue while spark streaming checkpoint recovery

2015-07-29 Thread Tathagata Das
Rather than using the accumulator directly, what you can do is something like this to lazily create an accumulator and use it (it will be lazily recreated if the driver restarts from a checkpoint): dstream.transform { rdd => val accum = SingletonObject.getOrCreateAccumulator() // single object method to
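The lazily-created singleton that TDD describes can be sketched in Java as a static field with a get-or-create accessor. This is only an illustration of the pattern: the class and method names are made up, and a plain `AtomicLong` stands in for a real Spark accumulator (which would be created from the `SparkContext` instead).

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch of the lazily-created singleton pattern, using an AtomicLong as a
// stand-in for a Spark accumulator. Names here are illustrative, not Spark API.
public class SingletonAccumulator {
    // Static field on the driver JVM: after a driver restart it is null
    // again, so the accumulator gets recreated lazily on first use.
    private static volatile AtomicLong instance = null;

    public static AtomicLong getOrCreate() {
        if (instance == null) {
            synchronized (SingletonAccumulator.class) {
                if (instance == null) {
                    // In a real job this would create the accumulator from
                    // the SparkContext rather than construct it directly.
                    instance = new AtomicLong(0L);
                }
            }
        }
        return instance;
    }
}
```

Inside a `transform` (or `foreachRDD`) body you would call `getOrCreate()` every batch; during normal operation it keeps returning the same instance, and after a restart from a checkpoint the static field is null again, so a fresh accumulator is transparently created.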