Re: Instability issues with Spark 2.0.1 and Kafka 0.10

2016-11-13 Thread Ivan von Nagy
…spark.apache.org/docs/latest/streaming-kafka-0-10-integration.html#locationstrategies gives instructions on the default size of the cache and how to increase it.

On Sat, Nov 12, 2016 at 2:15 PM, Ivan von Nagy wrote:
> Hi Sean,
>
> Thanks for responding. We ha…
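The docs page referenced above describes a cache of Kafka consumers kept on executors. As a sketch, the relevant settings could be raised in spark-defaults.conf (property names from the Spark 2.x Kafka 0-10 integration docs; the values here are illustrative, not recommendations):

```properties
# spark-defaults.conf — cached Kafka consumer settings (Spark 2.x, kafka-0-10)
# Raise maxCapacity if many topic-partitions are processed per executor.
spark.streaming.kafka.consumer.cache.initialCapacity  16
spark.streaming.kafka.consumer.cache.maxCapacity      128
# Disabling the cache entirely is sometimes used as a diagnostic step.
spark.streaming.kafka.consumer.cache.enabled          true
```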

Re: Instability issues with Spark 2.0.1 and Kafka 0.10

2016-11-12 Thread Ivan von Nagy
…ible to things like GC pauses or coordination latency. It could also be that the number of consumers being simultaneously created on each executor causes a thundering herd problem during initial phases (which then causes job retries, which then causes more consumer churn, etc.).
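A common mitigation for that kind of thundering herd is to add jitter to retry delays so that restarted consumers do not all reconnect at the same instant. A minimal, generic sketch (plain Python, not Spark code; `backoff_schedule` is an illustrative helper, not part of any Kafka or Spark API):

```python
import random

def backoff_schedule(base=0.5, cap=30.0, retries=5, seed=None):
    """Full-jitter exponential backoff: each retry waits a random time in
    [0, min(cap, base * 2**attempt)], spreading reconnects over time."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, min(cap, base * 2 ** i)) for i in range(retries)]

# Deterministic example for inspection; real code would pass no seed.
delays = backoff_schedule(seed=42)
```

Each consumer sleeping for its own jittered delay before retrying keeps the broker from seeing a synchronized wave of reconnects after a failure.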

Re: Instability issues with Spark 2.0.1 and Kafka 0.10

2016-11-12 Thread Ivan von Nagy
…I can tell you are still not doing.

On Nov 10, 2016 7:43 PM, "Shixiong(Ryan) Zhu" wrote:
> Yeah, the KafkaRDD cannot be reused. It's better to document it.
>
> On Thu, Nov 10, 2016 at 8:26 AM, Ivan von Nagy wrote:

Re: Instability issues with Spark 2.0.1 and Kafka 0.10

2016-11-10 Thread Ivan von Nagy
…Do you want to keep arguing with me, or follow my advice and proceed with debugging any remaining issues after you make the changes I suggested?

On Mon, Nov 7, 2016 at 1:35 PM, Ivan von Nagy wrote:
> With our stream version, we update the offsets for only the parti…

Re: Instability issues with Spark 2.0.1 and Kafka 0.10

2016-11-07 Thread Ivan von Nagy
…auses, and then we can see if there are still issues.

As far as 0.8 vs 0.10, Spark doesn't require you to assign or subscribe to a topic in order to update offsets, Kafka does. If you don't like the new Kafka consumer…

Re: Instability issues with Spark 2.0.1 and Kafka 0.10

2016-11-04 Thread Ivan von Nagy
Yes, the parallel KafkaRDD uses the same consumer group, but each RDD uses a single distinct topic. For example, the group would be something like "storage-group", and the topics would be "storage-channel1" and "storage-channel2". In each thread a KafkaConsumer is started, assigned the partitions…
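The structure described above — one consumer per thread, all in the same group, each thread owning a distinct topic — can be sketched generically. Since Kafka consumers are not thread-safe, each thread must construct its own instance. This sketch uses a stand-in `StubConsumer` class (not the real Kafka client) purely to show the threading layout; the group and topic names mirror the example in the message:

```python
import threading

class StubConsumer:
    """Stand-in for a Kafka consumer; real code would use an actual client."""
    def __init__(self, group_id):
        self.group_id = group_id
        self.assigned = []

    def assign(self, partitions):
        # Manual assignment, mirroring assign() rather than subscribe().
        self.assigned = list(partitions)

def consume(topic, group_id, results, lock):
    # One consumer per thread: Kafka consumers are not thread-safe,
    # so instances are never shared across threads.
    consumer = StubConsumer(group_id)
    consumer.assign([(topic, p) for p in range(2)])  # assume 2 partitions/topic
    with lock:
        results[topic] = consumer.assigned

results, lock = {}, threading.Lock()
threads = [
    threading.Thread(target=consume, args=(t, "storage-group", results, lock))
    for t in ("storage-channel1", "storage-channel2")
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The key property is that group membership is shared while consumer instances are strictly thread-local.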