Re: spark streaming 1.3 with kafka connection timeout

2015-09-10 Thread Cody Koeninger
Again, that looks like you lost a kafka broker. Executors will retry failed tasks automatically up to the max failures. spark.streaming.kafka.maxRetries controls the number of times the driver will retry when attempting to get offsets. If your broker isn't up / rebalance hasn't finished after N
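[Editor's note: a minimal sketch of where the two retry knobs mentioned above live, assuming an ordinary SparkConf setup; the values shown are illustrative, not recommendations.]

import org.apache.spark.SparkConf;

public class RetryConf {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf()
            .setAppName("kafka-direct-stream")
            // Executor side: how many times a failed task (e.g. a fetch from a
            // lost leader) is retried before the stage fails. Default is 4.
            .set("spark.task.maxFailures", "8")
            // Driver side: how many times the driver retries when asking Kafka
            // for leader offsets for the direct stream. Default is 1.
            .set("spark.streaming.kafka.maxRetries", "3");
    }
}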

Re: spark streaming 1.3 with kafka connection timeout

2015-09-10 Thread Shushant Arora
My bad. I got that exception in the driver code of the same job, not in an executor. But it only mentions a socket close exception. org.apache.spark.SparkException: ArrayBuffer(java.io.EOFException: Received -1 when reading from channel, socket has likely been closed., org.apache.spark.SparkException: Couldn't fi

Re: spark streaming 1.3 with kafka connection timeout

2015-09-10 Thread Cody Koeninger
NotLeaderForPartitionException means you lost a kafka broker or had a rebalance... why did you say "I am getting Connection timeout in my code"? You've asked questions about this exact same situation before; the answer remains the same. On Thu, Sep 10, 2015 at 9:44 AM, Shushant Arora wrote: > St

Re: spark streaming 1.3 with kafka connection timeout

2015-09-10 Thread Shushant Arora
Stack trace is:
15/09/09 22:49:52 ERROR kafka.KafkaRDD: Lost leader for topic topicname partition 99, sleeping for 200ms
kafka.common.NotLeaderForPartitionException
        at sun.reflect.GeneratedConstructorAccessor26.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessor

Re: spark streaming 1.3 with kafka connection timeout

2015-09-10 Thread Cody Koeninger
Post the actual stack trace you're getting. On Thu, Sep 10, 2015 at 12:21 AM, Shushant Arora wrote: > Executors in spark streaming 1.3 fetch messages from kafka in batches, and > what happens when an executor takes longer to complete a fetch batch > > say in > > > directKafkaStream.foreachRDD(new

Re: spark streaming 1.3 with kafka

2015-09-01 Thread Shushant Arora
I feel the need of pause and resume in a streaming app :) Is there any limit on the number of queued jobs? If yes, what happens once that limit is reached? Does the job get killed? On Tue, Sep 1, 2015 at 10:02 PM, Cody Koeninger wrote: > Sounds like you'd be better off just failing if the external server is > down,

Re: spark streaming 1.3 with kafka

2015-09-01 Thread Cody Koeninger
Sounds like you'd be better off just failing if the external server is down, and scripting monitoring / restarting of your job. On Tue, Sep 1, 2015 at 11:19 AM, Shushant Arora wrote: > Since in my app , after processing the events I am posting the events to > some external server- if external se
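[Editor's note: a sketch of the fail-fast approach described above, assuming a hypothetical postToServer() helper for the external call. If a post fails, the task throws; Spark retries it up to spark.task.maxFailures, and the job eventually fails so a monitoring script can restart it.]

import java.util.Iterator;
import org.apache.spark.api.java.function.VoidFunction;

public class PostPartition implements VoidFunction<Iterator<byte[]>> {
    @Override
    public void call(Iterator<byte[]> events) throws Exception {
        while (events.hasNext()) {
            byte[] event = events.next();
            // postToServer() is an assumed helper; returns false on failure.
            if (!postToServer(event)) {
                // Throwing fails the task; after spark.task.maxFailures
                // retries the batch fails, which external monitoring can
                // detect and act on by restarting the job.
                throw new RuntimeException("external server down - failing task");
            }
        }
    }

    private boolean postToServer(byte[] event) {
        return true; // placeholder for the real HTTP/RPC call
    }
}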

Re: spark streaming 1.3 with kafka

2015-09-01 Thread Shushant Arora
Since in my app, after processing the events, I am posting them to some external server - if the external server is down, I want to back off consuming from kafka. But I can't stop and restart the consumer, since that needs manual effort. Backing off a few batches is also not possible - since decision o

Re: spark streaming 1.3 with kafka

2015-09-01 Thread Cody Koeninger
Honestly I'd concentrate more on getting your batches to finish in a timely fashion, so you won't even have the issue to begin with... On Tue, Sep 1, 2015 at 10:16 AM, Shushant Arora wrote: > What if I use custom checkpointing. So that I can take care of offsets > being checkpointed at end of ea

Re: spark streaming 1.3 with kafka

2015-09-01 Thread Cody Koeninger
No, if you start arbitrarily messing around with offset ranges after compute is called, things are going to get out of whack. E.g. checkpoints are no longer going to correspond to what you're actually processing. On Tue, Sep 1, 2015 at 10:04 AM, Shushant Arora wrote: > can I reset the range base

Re: spark streaming 1.3 with kafka

2015-09-01 Thread Shushant Arora
What if I use custom checkpointing, so that I can take care of offsets being checkpointed at the end of each batch? Will it be possible then to reset the offset? On Tue, Sep 1, 2015 at 8:42 PM, Cody Koeninger wrote: > No, if you start arbitrarily messing around with offset ranges after > compute is
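[Editor's note: the direct stream does expose the offsets it computed for each batch, so "custom checkpointing" is usually done by reading them via HasOffsetRanges and persisting them yourself after processing. A sketch, assuming directKafkaStream is a byte[]-keyed direct stream and saveOffsets() is your own storage helper (e.g. ZooKeeper or a database).]

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.streaming.kafka.HasOffsetRanges;
import org.apache.spark.streaming.kafka.OffsetRange;

directKafkaStream.foreachRDD(
    new Function<JavaPairRDD<byte[], byte[]>, Void>() {
        @Override
        public Void call(JavaPairRDD<byte[], byte[]> rdd) throws Exception {
            // The ranges were fixed when compute() created this batch's RDD.
            OffsetRange[] ranges = ((HasOffsetRanges) rdd.rdd()).offsetRanges();
            // ... process rdd first, then record where the batch ended ...
            saveOffsets(ranges); // hypothetical: persist topic/partition/untilOffset
            return null;
        }
    });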

Re: spark streaming 1.3 with kafka

2015-09-01 Thread Shushant Arora
Can I reset the range based on some condition, before calling transformations on the stream? Say, before calling: directKafkaStream.foreachRDD(new Function, Void>() { @Override public Void call(JavaRDD v1) throws Exception { v1.foreachPartition(new VoidFunction>{ @Override public void call(I
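[Editor's note: the snippet above lost its generic type parameters in the archive. A repaired, self-contained version of the same skeleton might look like this; the byte[] key/value types, broker address, batch interval, and topic name are all assumptions.]

import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Map;
import java.util.Set;

import kafka.serializer.DefaultDecoder;
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.api.java.function.VoidFunction;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaPairInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka.KafkaUtils;
import scala.Tuple2;

public class DirectStreamSkeleton {
    public static void main(String[] args) throws Exception {
        SparkConf conf = new SparkConf().setAppName("direct-stream-skeleton");
        JavaStreamingContext jssc = new JavaStreamingContext(conf, new Duration(30000));

        Map<String, String> kafkaParams = new HashMap<String, String>();
        kafkaParams.put("metadata.broker.list", "broker1:9092"); // assumed broker
        Set<String> topics = new HashSet<String>();
        topics.add("topicname"); // topic name from the stack trace above

        JavaPairInputDStream<byte[], byte[]> directKafkaStream =
            KafkaUtils.createDirectStream(jssc,
                byte[].class, byte[].class,
                DefaultDecoder.class, DefaultDecoder.class,
                kafkaParams, topics);

        directKafkaStream.foreachRDD(
            new Function<JavaPairRDD<byte[], byte[]>, Void>() {
                @Override
                public Void call(JavaPairRDD<byte[], byte[]> rdd) throws Exception {
                    rdd.foreachPartition(
                        new VoidFunction<Iterator<Tuple2<byte[], byte[]>>>() {
                            @Override
                            public void call(Iterator<Tuple2<byte[], byte[]>> it) {
                                while (it.hasNext()) {
                                    Tuple2<byte[], byte[]> msg = it.next();
                                    // process msg._1() (key) and msg._2() (value)
                                }
                            }
                        });
                    return null;
                }
            });

        jssc.start();
        jssc.awaitTermination();
    }
}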

Re: spark streaming 1.3 with kafka

2015-09-01 Thread Cody Koeninger
It's at the time compute() gets called, which should be near the time the batch should have been queued. On Tue, Sep 1, 2015 at 8:02 AM, Shushant Arora wrote: > Hi > > In spark streaming 1.3 with kafka - when does the driver fetch the latest offsets > for this run - at the start of each batch, or at the time when