read kafka offset from spark checkpoint

2016-08-15 Thread Shifeng Xiao
Hi folks, we are using Kafka + Spark Streaming in our data pipeline, but sometimes we have to clean up the checkpoint from HDFS before we restart the Spark Streaming application; otherwise the application fails to start. That means we lose data whenever we clean up the checkpoint. Is there a way to read

restart spark streaming app

2016-08-12 Thread Shifeng Xiao
Hi folks, I am using Spark Streaming, and I am not clear whether there is a smart way to restart the app once it fails. Currently we just have one cron job that checks every 2 or 5 minutes whether the job is running and restarts the app when necessary. According to the Spark Streaming guide: - *YARN* - Yarn sup