Looking at [1], it seems to recommend pulling from multiple Kafka topics in
order to parallelize the data received from Kafka across multiple nodes. I
notice in [2], however, that one of the createStream() functions takes a
groupId. Am I understanding correctly that creating multiple DStreams with
the same groupId allows data from a single topic to be partitioned across
many nodes? A rough sketch of what I have in mind follows the links below.

[1]
http://spark.apache.org/docs/1.2.0/streaming-programming-guide.html#level-of-parallelism-in-data-receiving
[2]
https://spark.apache.org/docs/1.2.0/api/scala/index.html#org.apache.spark.streaming.kafka.KafkaUtils$
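
For context, here is roughly the pattern I have in mind, based on the union
approach described in [1]. The ZooKeeper quorum, group id, and topic name
below are placeholders, and I'm assuming Kafka's consumer-group rebalancing
would spread the single topic's partitions across the receivers:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object SingleTopicMultiReceiver {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("single-topic-multi-receiver")
    val ssc  = new StreamingContext(conf, Seconds(5))

    // Placeholder connection details; substitute real values.
    val zkQuorum = "zk1:2181,zk2:2181"
    val groupId  = "my-consumer-group"
    val topics   = Map("my-topic" -> 1)  // one topic, one consumer thread per receiver

    // Several receivers in the same consumer group, each expected to land on a
    // different executor; the question is whether Kafka then balances the
    // topic's partitions across them.
    val numReceivers = 3
    val kafkaStreams = (1 to numReceivers).map { _ =>
      KafkaUtils.createStream(ssc, zkQuorum, groupId, topics)
    }
    val unified = ssc.union(kafkaStreams)

    unified.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}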
