> > 3. In our use case we read from Kafka, do some mapping, and finally
> > persist the data to Cassandra as well as push it over a remote actor for
> > real-time updates in a dashboard. I used the approaches below:
> > - First tried a very naive way like stream.map(...).foreachRDD(push to
> >   actor). It did not work; the stage failed with an Akka exception.
> > - Second tried the akka.serialization.JavaSerializer.withSystem(system){...}
> >   approach. It did not work; the stage failed, but without any trace
> >   anywhere in the logs.
> > - Finally did rdd.collect to collect the output on the driver and then
> >   pushed it to the actor. That worked.
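[Editor's note: the collect-based workaround in the quoted point can be sketched roughly as below. The names `stream`, `parse`, and `dashboardActor` are illustrative assumptions, not taken from the thread.]

```scala
// Rough sketch only. Assumes `stream` is a DStream[String], `parse` maps a
// raw record to an event, and `dashboardActor` is an akka.actor.ActorRef
// created on the driver.

// Naive version (commented out): the foreach closure captures
// `dashboardActor`, and ActorRefs generally cannot be serialized into a
// task closure without their ActorSystem context, so task serialization
// fails with an Akka exception, as described in the quoted point.
// stream.map(parse).foreachRDD(rdd => rdd.foreach(dashboardActor ! _))

// Workaround that reportedly worked: collect each micro-batch to the
// driver first, then send from driver-side code. The actor is never
// referenced inside a shipped closure, so nothing Akka-related needs to
// be serialized. The cost is that every batch flows through the driver.
stream.map(parse).foreachRDD { rdd =>
  rdd.collect().foreach(dashboardActor ! _)
}
```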
Have you tried writing to Cassandra using the Hadoop output format instead of pushing it to an actor in the foreachRDD handler? I am sure that will be more efficient than collect-and-send-to-actor. In Calliope we do a similar thing. You can check out the examples here:
http://tuplejump.github.io/calliope/streaming.html
https://github.com/tuplejump/calliope/blob/develop/src/main/scala/com/tuplejump/calliope/examples/PersistDStream.scala

Hope that helps.

Regards,
Rohit

Founder & CEO, Tuplejump, Inc.
____________________________
www.tuplejump.com
The Data Engineering Platform

On Tue, Mar 11, 2014 at 11:39 PM, Sourav Chandra <sourav.chan...@livestream.com> wrote:

> Hi,
>
> I have some questions regarding usage patterns and debugging in
> Spark/Spark Streaming.
>
> 1. What are some common design patterns for using broadcast variables? In
> my application I created some, and also a scheduled task which periodically
> refreshes the variables. I want to know how people generally achieve this
> efficiently and in a modular way.
>
> 2. Sometimes an uncaught exception in the driver program/worker does not
> get traced anywhere. How can we debug this?
>
> 3. In our use case we read from Kafka, do some mapping, and finally
> persist the data to Cassandra as well as push it over a remote actor for
> real-time updates in a dashboard. I used the approaches below:
> - First tried a very naive way like stream.map(...).foreachRDD(push to
>   actor). It did not work; the stage failed with an Akka exception.
> - Second tried the akka.serialization.JavaSerializer.withSystem(system){...}
>   approach. It did not work; the stage failed, but without any trace
>   anywhere in the logs.
> - Finally did rdd.collect to collect the output on the driver and then
>   pushed it to the actor. That worked.
>
> I would like to know whether there is any efficient way of achieving this
> sort of use case.
>
> 4. Sometimes I see failed stages, but when I open the stage details it
> says the stage did not start. What does this mean?
> Looking forward to some interesting responses :)
>
> Thanks,
> --
>
> Sourav Chandra
>
> Senior Software Engineer
>
> · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · · ·
>
> sourav.chan...@livestream.com
>
> o: +91 80 4121 8723
>
> m: +91 988 699 3746
>
> skype: sourav.chandra
>
> Livestream
>
> "Ajmera Summit", First Floor, #3/D, 68 Ward, 3rd Cross, 7th C Main, 3rd
> Block, Koramangala Industrial Area,
>
> Bangalore 560034
>
> www.livestream.com
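[Editor's note: Rohit's suggestion of writing each micro-batch through a Hadoop output format can be sketched roughly as below. This is not Calliope's actual API; it uses Cassandra's own Hadoop `ColumnFamilyOutputFormat` directly, and the keyspace, table, and the `toKeyAndMutations` helper are illustrative assumptions, not from the thread.]

```scala
// Rough sketch, under stated assumptions: write each batch of `stream`
// (a DStream) to Cassandra from the executors via the Hadoop output
// format, instead of collect-ing to the driver and sending to an actor.
import org.apache.hadoop.conf.Configuration
import org.apache.cassandra.hadoop.{ColumnFamilyOutputFormat, ConfigHelper}

val conf = new Configuration()
ConfigHelper.setOutputInitialAddress(conf, "127.0.0.1")
ConfigHelper.setOutputRpcPort(conf, "9160")
// "my_keyspace" / "my_table" are placeholder names.
ConfigHelper.setOutputColumnFamily(conf, "my_keyspace", "my_table")
ConfigHelper.setOutputPartitioner(conf, "Murmur3Partitioner")

stream.foreachRDD { rdd =>
  rdd
    // `toKeyAndMutations` is a hypothetical helper mapping a record to
    // the (row key, mutations) pair the output format expects.
    .map(toKeyAndMutations)
    .saveAsNewAPIHadoopFile(
      "",                                  // path is unused by this format
      classOf[java.nio.ByteBuffer],
      classOf[java.util.List[org.apache.cassandra.thrift.Mutation]],
      classOf[ColumnFamilyOutputFormat],
      conf)
}
```

The writes then happen in parallel on the executors, so the driver never has to hold a whole batch in memory, which is the inefficiency Rohit points out in the collect-and-send approach.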