Re: Spark Streaming Specify Kafka Partition

2015-12-04 Thread Cody Koeninger
So createDirectStream will give you a JavaInputDStream of R, where R is the return type you chose for your message handler. If you want a JavaPairInputDStream, you may have to call .mapToPair in order to convert the stream, even if the type you chose for R was already Tuple2 (note that I try to s

Re: Spark Streaming Specify Kafka Partition

2015-12-03 Thread Alan Braithwaite
One quick newbie question, since I got another chance to look at this today. We're using Java for our Spark applications. The createDirectStream we were using previously [1] returns a JavaPairInputDStream, but the createDirectStream with fromOffsets expects an argument recordClass to pass into the

Re: Spark Streaming Specify Kafka Partition

2015-12-01 Thread Cody Koeninger
I actually haven't tried that, since I tend to do the offset lookups if necessary. It's possible that it will work; try it and let me know. Be aware that if you're doing a count() or take() operation directly on the RDD, it'll definitely give you the wrong result if you're using -1 for one of the
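One plausible reason for the count() warning above, sketched in plain Java: if the number of records in a batch is derived from offset arithmetic (end offset minus start offset) rather than by iterating the data, then a placeholder start offset such as -1 skews the result. The class name, method name, and all offset values below are made up for illustration; this is an assumption about the mechanism, not Spark's actual code.

```java
final class OffsetCountSketch {
    // Hypothetical stand-in for an offset range [fromOffset, untilOffset):
    // the record count is computed purely from the two offsets.
    static long rangeCount(long fromOffset, long untilOffset) {
        return untilOffset - fromOffset;
    }

    public static void main(String[] args) {
        long untilOffset = 500L;    // made-up end of the batch
        long realStart = 480L;      // made-up offset obtained from an explicit lookup
        long sentinelStart = -1L;   // placeholder meaning "latest", not a real offset

        // With a real starting offset the arithmetic matches the records read.
        System.out.println(rangeCount(realStart, untilOffset));
        // With the -1 placeholder the arithmetic no longer reflects what was read.
        System.out.println(rangeCount(sentinelStart, untilOffset));
    }
}
```

Under that assumption, looking up the real latest offset first and passing it explicitly keeps the arithmetic and the data in agreement.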

Re: Spark Streaming Specify Kafka Partition

2015-12-01 Thread Alan Braithwaite
Neat, thanks. If I specify something like -1 as the offset, will it consume from the latest offset or do I have to instrument that manually?
- Alan
On Tue, Dec 1, 2015 at 6:43 AM, Cody Koeninger wrote:
> Yes, there is a version of createDirectStream that lets you specify
> fromOffsets: Map[Top

Re: Spark Streaming Specify Kafka Partition

2015-12-01 Thread Cody Koeninger
Yes, there is a version of createDirectStream that lets you specify fromOffsets: Map[TopicAndPartition, Long]
On Mon, Nov 30, 2015 at 7:43 PM, Alan Braithwaite wrote:
> Is there any mechanism in the kafka streaming source to specify the exact
> partition id that we want a streaming job to consum
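A minimal Java sketch of the idea in this reply: restricting a job to one partition amounts to building a fromOffsets map that contains only that partition. To stay compilable without Kafka or Spark on the classpath, the sketch uses a hypothetical stand-in class, TopicPartitionKey, in place of kafka.common.TopicAndPartition; the topic name, partition id, and starting offset are also made-up values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

final class FromOffsetsSketch {
    // Builds a fromOffsets-style map that names exactly one partition,
    // so only that partition would be consumed.
    static Map<TopicPartitionKey, Long> singlePartition(String topic, int partition, long startOffset) {
        Map<TopicPartitionKey, Long> fromOffsets = new HashMap<>();
        fromOffsets.put(new TopicPartitionKey(topic, partition), startOffset);
        return fromOffsets;
    }

    public static void main(String[] args) {
        // "events", partition 3, and offset 1200 are illustrative values only.
        Map<TopicPartitionKey, Long> fromOffsets = singlePartition("events", 3, 1200L);
        System.out.println(fromOffsets);
        // In a real job, the equivalent Map<TopicAndPartition, Long> would be
        // passed as the fromOffsets argument of createDirectStream.
    }
}

// Hypothetical stand-in for kafka.common.TopicAndPartition (topic + partition id).
final class TopicPartitionKey {
    final String topic;
    final int partition;

    TopicPartitionKey(String topic, int partition) {
        this.topic = topic;
        this.partition = partition;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TopicPartitionKey)) return false;
        TopicPartitionKey other = (TopicPartitionKey) o;
        return partition == other.partition && topic.equals(other.topic);
    }

    @Override
    public int hashCode() {
        return Objects.hash(topic, partition);
    }

    @Override
    public String toString() {
        return topic + "-" + partition;
    }
}
```

Partitions left out of the map are simply not read by that stream, which is what makes this a workaround for the original question without a custom receiver.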

Spark Streaming Specify Kafka Partition

2015-11-30 Thread Alan Braithwaite
Is there any mechanism in the Kafka streaming source to specify the exact partition id that we want a streaming job to consume from? If not, is there a workaround besides writing our own custom receiver? Thanks, - Alan