This consumer pretty much covers all those scenarios you listed github.com/dibbhatt/kafka-spark-consumer Give it a try.
Thanks Best Regards On Thu, Sep 10, 2015 at 3:32 PM, Krzysztof Zarzycki <[email protected]> wrote: > Hi there, > I have a problem with fulfilling all my needs when using Spark Streaming > on Kafka. Let me enumerate my requirements: > 1. I want to have at-least-once/exactly-once processing. > 2. I want to have my application fault & simple stop tolerant. The Kafka > offsets need to be tracked between restarts. > 3. I want to be able to upgrade code of my application without losing > Kafka offsets. > > Now what my requirements imply according to my knowledge: > 1. implies using new Kafka DirectStream. > 2. implies using checkpointing. kafka DirectStream will write offsets to > the checkpoint as well. > 3. implies that checkpoints can't be used between controlled restarts. So > I need to install shutdownHook with ssc.stop(stopGracefully=true) (here is > a description how: > https://metabroadcast.com/blog/stop-your-spark-streaming-application-gracefully > ) > > Now my problems are: > 1. If I cannot make checkpoints between code upgrades, does it mean that > Spark does not help me at all with keeping my Kafka offsets? Does it mean, > that I have to implement my own storing to/initalization of offsets from > Zookeeper? > 2. When I set up shutdownHook and my any executor throws an exception, it > seems that application does not fail, but stuck in running state. Is that > because stopGracefully deadlocks on exceptions? How to overcome this > problem? Maybe I can avoid setting shutdownHook and there is other way to > stop gracefully your app? > > 3. If I somehow overcome 2., is it enough to just stop gracefully my app > to be able to upgrade code & not lose Kafka offsets? > > > Thank you a lot for your answers, > Krzysztof Zarzycki > > > >
