Re: Kafka Streaming and Filtering > 3000 partitions

2015-10-21 Thread varun sharma
    // get any needed data from the offset range
    val topic = osr.topic
    val kafkaPartitionId = osr.partition
    val begin = osr.fromOffset
    val end = osr.untilOffset
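For context, a sketch of where an OffsetRange like osr comes from; stream is assumed to be a direct stream (see the createDirectStream sketch further down in the thread):

    import org.apache.spark.TaskContext
    import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

    stream.foreachRDD { rdd =>
      // Capture offset ranges on the driver, before any transformation
      // that would change the partitioning
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges

      rdd.foreachPartition { records =>
        // One RDD partition per Kafka topic-partition, so the task's
        // partition id indexes straight into offsetRanges
        val osr: OffsetRange = offsetRanges(TaskContext.get.partitionId)
        // ... read osr.topic, osr.partition, osr.fromOffset,
        // osr.untilOffset as in the snippet above ...
      }
    }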

Re: Kafka Streaming and Filtering > 3000 partitions

2015-10-21 Thread Cody Koeninger
    val begin = osr.fromOffset
    val end = osr.untilOffset

    // Now we know the topic name, we can filter something
    // Or could we have referenced the topic name from
    // offsetRanges(TaskContext.get.partitionId) earlier
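A sketch of the filtering step those comments describe, assuming a hypothetical filtersByTopic map from topic name to a per-record predicate; stream is the direct stream from the setup sketch below:

    import org.apache.spark.TaskContext
    import org.apache.spark.streaming.kafka.HasOffsetRanges

    // Hypothetical per-topic predicates, not from the original thread
    val filtersByTopic: Map[String, String => Boolean] = Map(
      "topicA" -> ((v: String) => v.contains("keep-me")),
      "topicB" -> ((v: String) => v.nonEmpty)
    )

    stream.foreachRDD { rdd =>
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      rdd.foreachPartition { records =>
        // Reference the topic name via the task's partition id,
        // as the quoted comment suggests
        val topic = offsetRanges(TaskContext.get.partitionId).topic
        val keep = filtersByTopic.getOrElse(topic, (_: String) => true)
        val kept = records.filter { case (_, v) => keep(v) }
        // hand the surviving records to whatever comes next
        kept.foreach(println)  // placeholder side effect
      }
    }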

RE: Kafka Streaming and Filtering > 3000 partitions

2015-10-21 Thread Dave Ariens
From: Cody Koeninger [mailto:c...@koeninger.org]
Sent: Wednesday, October 21, 2015 3:01 PM
To: Dave Ariens
Cc: user@spark.apache.org
Subject: Re: Kafka Streaming and Filtering > 3000 partitions

The rdd partitions are 1:1 with Kafka topic-partitions, so you can use offset ranges to figure out which topic a given rdd partition is for and proceed accordingly.

Re: Kafka Streaming and Filtering > 3000 partitions

2015-10-21 Thread Cody Koeninger
The rdd partitions are 1:1 with Kafka topic-partitions, so you can use offset ranges to figure out which topic a given rdd partition is for and proceed accordingly. See the Kafka integration guide in the Spark Streaming docs for more details, or https://github.com/koeninger/kafka-exactly-once
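For readers following along, a minimal setup sketch in the spirit of the integration guide; the broker address, topic names, and batch interval below are placeholder assumptions:

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    val conf = new SparkConf().setAppName("kafka-filter")
    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092")
    val topics = Set("topicA", "topicB")  // in practice, thousands

    // Receiver-less direct stream: one RDD partition per Kafka
    // topic-partition, with offsets tracked by Spark itself
    val stream = KafkaUtils.createDirectStream[
      String, String, StringDecoder, StringDecoder](ssc, kafkaParams, topics)

After wiring up the per-batch logic, ssc.start() and ssc.awaitTermination() kick off the stream.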

Kafka Streaming and Filtering > 3000 partitions

2015-10-21 Thread Dave Ariens
Hey folks, I have a very large number of Kafka topics (many thousands of partitions) that I want to consume, filter with topic-specific filters, then produce back to filtered topics in Kafka. Using the receiver-less approach with Spark 1.4.1 (described here
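Putting the replies above together, a sketch of the consume-filter-produce loop being asked about; filtersByTopic, the broker address, and the topic + "-filtered" output naming are illustrative assumptions, not from the thread:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
    import org.apache.spark.TaskContext
    import org.apache.spark.streaming.kafka.HasOffsetRanges

    stream.foreachRDD { rdd =>
      val offsetRanges = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
      rdd.foreachPartition { records =>
        val topic = offsetRanges(TaskContext.get.partitionId).topic
        val keep = filtersByTopic.getOrElse(topic, (_: String) => true)

        // One producer per task keeps the sketch simple; a lazily
        // created singleton per executor is cheaper in practice
        val props = new Properties()
        props.put("bootstrap.servers", "broker1:9092")
        props.put("key.serializer",
          "org.apache.kafka.common.serialization.StringSerializer")
        props.put("value.serializer",
          "org.apache.kafka.common.serialization.StringSerializer")
        val producer = new KafkaProducer[String, String](props)

        // Write the records that pass this topic's filter back out
        // to a per-topic filtered topic
        records.filter { case (_, v) => keep(v) }.foreach { case (k, v) =>
          producer.send(new ProducerRecord(topic + "-filtered", k, v))
        }
        producer.close()
      }
    }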