Re: Kafaka 0.8, 0.9 in Structured Streaming

2016-10-07 Thread Jeremy Smith
+1 We're on CDH, and it will probably be a while before they support Kafka 0.10. At the same time, we don't use their Spark and we're looking forward to upgrading to 2.0.x and using structured streaming. I was just going to write our own Kafka Source implementation which uses the existing KafkaRD

Re: Not all KafkaReceivers processing the data Why?

2016-09-14 Thread Jeremy Smith
Take a look at how the messages are actually distributed across the partitions. If the message keys have a low cardinality, you might get poor distribution (i.e. all the messages are actually only in two of the five partitions, leading to what you see in Spark). If you take a look at the Kafka dat

Re: Long running Spark Streaming Job increasing executing time per batch

2015-09-24 Thread Jeremy Smith
I found a similar issue happens when there is a memory leak in the spark application (or, in my case, one of the libraries that's used in the spark application). Gradually, unclaimed objects make their way into old or permanent generation space, reducing the available heap. It causes GC overhead