Chris,
I tried the DStream.repartition mentioned in the document on parallelism in
receiving, and also set "spark.default.parallelism", but it still uses only
2 nodes in my cluster. I noticed there is another email thread on the same
topic:
http://apache-spark-user-list.1001560.n3.nabble.com/DStre
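For reference, here is a minimal sketch of roughly what I tried - the topic
name ("events"), ZooKeeper quorum ("zk:2181"), consumer group, thread count,
and batch interval are placeholders from my setup, not anything prescribed by
the docs:

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf()
  .setAppName("kafka-parallelism-test")
  .set("spark.default.parallelism", "8")  // set explicitly as well

val ssc = new StreamingContext(conf, Seconds(2))

// One receiver-based stream; the map value is the number of consumer
// threads inside that single receiver, not the number of receivers.
val stream = KafkaUtils.createStream(ssc, "zk:2181", "my-group", Map("events" -> 4))

// Redistribute the received blocks across the cluster before processing.
val repartitioned = stream.repartition(8)
repartitioned.count().print()

ssc.start()
ssc.awaitTermination()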
I like this consumer for what it promises - better control over offsets and
recovery from failures. If I understand this right, it still uses a single
worker process to read from Kafka (one thread per partition) - is there a
way to specify multiple worker processes (on different machines) to read
from Kafka?
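To make the question concrete: with the stock receiver, the only way I know
to get ingestion onto several machines is to create several streams and union
them, as in the sketch below (names and counts are again placeholders). What
I'm asking is whether this consumer has an equivalent.

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val conf = new SparkConf().setAppName("multi-receiver-sketch")
val ssc = new StreamingContext(conf, Seconds(2))

// Each createStream call gets its own receiver, which Spark places on some
// worker, so several calls spread the reading across machines.
val numReceivers = 4
val streams = (1 to numReceivers).map { _ =>
  KafkaUtils.createStream(ssc, "zk:2181", "my-group", Map("events" -> 1))
}

// Union the per-receiver streams back into one DStream for processing.
val unioned = ssc.union(streams)
unioned.count().print()

ssc.start()
ssc.awaitTermination()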