Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-03 Thread Cody Koeninger
Do you believe that all exceptions (including catastrophic ones like running out of heap space) should be caught and silently discarded? Do you believe that a database system that runs out of disk space should silently continue to accept writes? What I am trying to say is, when something is broken in a w…

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-02 Thread Dibyendu Bhattacharya
There are other ways to deal with the problem than shutting down the streaming job. You can monitor the lag in your consumer to see if the consumer is falling behind. If the lag is high enough that OffsetOutOfRange can happen, you can either increase the retention period or increase the consumer rate, or do both. What I a…
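(A minimal sketch of that lag-monitoring idea for the direct stream, using the public HasOffsetRanges API; the broker address, topic name, and logging are illustrative assumptions, not code from this thread.)

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.{HasOffsetRanges, KafkaUtils}

    // Sketch: log the offset range each batch processed, so an external monitor can
    // compare it with the brokers' latest offsets and alert when the lag approaches
    // the retention limit.
    object LagLoggingSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("lag-logging-sketch")
        val ssc = new StreamingContext(conf, Seconds(10))

        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // placeholder broker
        val topics = Set("test_stream")                                 // topic name from the thread

        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, topics)

        stream.foreachRDD { rdd =>
          // offsetRanges describes [fromOffset, untilOffset) per topic-partition for this batch.
          val ranges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
          ranges.foreach { r =>
            // Lag for this partition = (latest offset on the leader) - r.untilOffset;
            // the latest offset can be fetched with kafka.tools.GetOffsetShell or a SimpleConsumer.
            println(s"processed ${r.topic}-${r.partition}: ${r.fromOffset} -> ${r.untilOffset}")
          }
          // ... normal processing of rdd ...
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }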

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-02 Thread Cody Koeninger
I believe that what differentiates reliable systems is that individual components fail fast when their preconditions aren't met, and other components are responsible for monitoring them. If a user of the direct stream thinks that your approach of restarting and ignoring data loss is the ri…

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-02 Thread Dibyendu Bhattacharya
Well, even if you set the retention correctly and increase the consumer speed, OffsetOutOfRange can still occur depending on how your downstream processing behaves. And if that happens, there is no other way to recover the old messages. So the best bet here, from the streaming job's point of view, is to start from the earliest offset rather br…
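(For reference, with the direct stream the "start from the earliest offset" behaviour comes from auto.offset.reset and is only consulted when the stream is created without explicit fromOffsets; this spark-shell-style snippet is a sketch, and the broker address is a placeholder, not from this thread.)

    // Consulted only when createDirectStream is called without explicit fromOffsets;
    // an OffsetOutOfRange hit mid-batch during a fetch still fails the task, as
    // discussed elsewhere in this thread.
    val kafkaParams = Map(
      "metadata.broker.list" -> "broker1:9092", // placeholder
      "auto.offset.reset"    -> "smallest"      // Kafka 0.8 spelling of "earliest"
    )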

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-02 Thread Cody Koeninger
No, silently restarting from the earliest offset in the case of offset out of range exceptions during a streaming job is not the "correct way of recovery". If you do that, your users will be losing data without knowing why. It's more like a "way of ignoring the problem without actually addressin…

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-02 Thread Dibyendu Bhattacharya
The consumer which I mentioned does not silently throw away data. If the offset is out of range it starts from the earliest offset, and that is the correct way of recovering from this error. Dibyendu On Dec 2, 2015 9:56 PM, "Cody Koeninger" wrote: > Again, just to be clear, silently throwing away data because your…

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-02 Thread Cody Koeninger
Again, just to be clear, silently throwing away data because your system isn't working right is not the same as "recover from any Kafka leader changes and offset out of ranges issue". On Tue, Dec 1, 2015 at 11:27 PM, Dibyendu Bhattacharya <dibyendu.bhattach...@gmail.com> wrote: > Hi, if you us…

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-01 Thread Dibyendu Bhattacharya
Hi, if you use the receiver-based consumer which is available in spark-packages (http://spark-packages.org/package/dibbhatt/kafka-spark-consumer), it has built-in failure recovery and can recover from any Kafka leader changes and offset out of range issues. Here is the package from GitHub: …

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-01 Thread swetha kasireddy
How can those errors be avoided with the receiver-based approach? Suppose we are OK with at-least-once processing and use the receiver-based approach, which uses ZooKeeper and does not query Kafka directly; would these errors (Couldn't find leader offsets for Set([test_stream,5])) be avoided? On Tue, Dec 1, 2015 a…
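(A minimal sketch of the receiver-based setup being asked about, assuming placeholder ZooKeeper quorum, group id, and checkpoint path: the high-level consumer tracks offsets in ZooKeeper rather than computing leader offsets itself, and at-least-once delivery still needs checkpointing plus the write-ahead log.)

    import org.apache.spark.SparkConf
    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    // Sketch: receiver-based (high-level consumer) stream; offsets live in ZooKeeper.
    object ReceiverBasedSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("receiver-based-sketch")
          .set("spark.streaming.receiver.writeAheadLog.enable", "true") // needed for at-least-once
        val ssc = new StreamingContext(conf, Seconds(10))
        ssc.checkpoint("hdfs:///checkpoints/receiver-sketch") // placeholder path

        // topic -> number of consumer threads
        val topics = Map("test_stream" -> 1)
        val stream = KafkaUtils.createStream(
          ssc, "zk1:2181,zk2:2181", "my-consumer-group", topics, StorageLevel.MEMORY_AND_DISK_SER)

        stream.map(_._2).foreachRDD { rdd =>
          // process message values; offsets are committed to ZooKeeper by the receiver
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }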

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-01 Thread Cody Koeninger
KafkaRDD.scala, handleFetchErr On Tue, Dec 1, 2015 at 3:39 PM, swetha kasireddy wrote: > Hi Cody, > > How do we look at Option 2 (see the following)? Which portion of the code in > Spark Kafka Direct should we look at to handle this issue specific to our > requirements? > > > 2. Catch that exception and so…

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-01 Thread swetha kasireddy
The following is the Option 2 that I was talking about: 2. Catch that exception and somehow force things to "reset" for that partition. And how would it handle the offsets already calculated in the backlog (if there is one)? On Tue, Dec 1, 2015 at 1:39 PM, swetha kasireddy wrote: > Hi Cody, > > How t…
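(Not the handleFetchErr change Cody points at, just a hedged sketch of one way to "reset" a partition at restart time: before creating the direct stream, ask the broker for its earliest available offset and clamp any saved offset that has fallen out of range. The leader host/port handling is simplified and the saved-offset source is assumed.)

    import kafka.api.OffsetRequest
    import kafka.common.TopicAndPartition
    import kafka.consumer.SimpleConsumer

    // Sketch: clamp saved offsets to the earliest offsets still on the broker, so a
    // restart does not immediately hit OffsetOutOfRange. The skipped range is logged,
    // making the data loss explicit rather than silent.
    def clampToEarliest(
        savedOffsets: Map[TopicAndPartition, Long],
        leaderHost: String, // simplification: real code must look up the leader per partition
        leaderPort: Int = 9092): Map[TopicAndPartition, Long] = {
      val consumer = new SimpleConsumer(leaderHost, leaderPort, 10000, 64 * 1024, "offset-clamp")
      try {
        savedOffsets.map { case (tp, saved) =>
          val earliest = consumer.earliestOrLatestOffset(tp, OffsetRequest.EarliestTime, 0)
          if (saved < earliest) {
            println(s"WARN $tp: saved offset $saved < earliest $earliest; " +
              s"${earliest - saved} messages already deleted by retention")
            tp -> earliest
          } else {
            tp -> saved
          }
        }
      } finally {
        consumer.close()
      }
    }

    // The clamped map can then be passed as fromOffsets to KafkaUtils.createDirectStream.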

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-01 Thread swetha kasireddy
Hi Cody, How do we look at Option 2 (see the following)? Which portion of the code in Spark Kafka Direct should we look at to handle this issue specific to our requirements? 2. Catch that exception and somehow force things to "reset" for that partition. And how would it handle the offsets already calculated…

Re: Recovery for Spark Streaming Kafka Direct in case of issues with Kafka

2015-12-01 Thread Cody Koeninger
If you're consistently getting offset out of range exceptions, it's probably because messages are getting deleted before you've processed them. The only real way to deal with this is to give Kafka more retention, consume faster, or both. If you're just looking for a quick "fix" for an infrequent iss…
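(Concretely, the "more retention, consume faster" knobs look roughly like this; the settings are real Spark/Kafka options, but the values are illustrative, not recommendations from the thread.)

    import org.apache.spark.SparkConf

    // Sketch: consumer-side rate knobs for the direct stream.
    val conf = new SparkConf()
      // Cap how many records each Kafka partition contributes per batch, so one huge
      // backlog batch cannot stall the job while retention overtakes the offsets.
      .set("spark.streaming.kafka.maxRatePerPartition", "10000")
      // Spark 1.5+: adapt the ingest rate to the observed processing rate.
      .set("spark.streaming.backpressure.enabled", "true")

    // Broker side (not Spark): raise retention so unprocessed messages survive longer,
    // e.g. log.retention.hours=168 in server.properties, or per topic:
    //   kafka-topics.sh --zookeeper zk1:2181 --alter --topic test_stream --config retention.ms=604800000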