I assume you're using the receiver-based approach? Have you tried the createDirectStream API?
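A minimal sketch of the direct (receiver-less) approach, assuming Spark Streaming 1.3+; the broker list, topic name, and app name are placeholders. The direct API creates one Spark partition per Kafka partition, so an 80-partition topic produces 80 parallel tasks that the scheduler can spread across all executors instead of funneling everything through a couple of receivers:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("direct-kafka-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Placeholder broker list -- replace with your cluster's brokers.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")

    // One Spark partition per Kafka partition; no long-running receivers,
    // so consumption is distributed wherever the tasks are scheduled.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mytopic"))

    stream.map(_._2).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```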
https://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html

If you're sticking with the receiver-based approach, I think your only option is to create more consumer streams and union them. That doesn't give you control over where they run, but it should increase consumer parallelism.

On Wed, May 13, 2015 at 10:33 AM, hotdog <[email protected]> wrote:
> I'm using Spark Streaming integrated with streaming-kafka.
>
> My Kafka topic has 80 partitions, while my machines have 40 cores. I found
> that when the job is running, the Kafka consumer processes are deployed to
> only 2 machines, and the bandwidth on those 2 machines becomes very high.
>
> Is there any way to control where the Kafka consumers are dispatched?
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/force-the-kafka-consumer-process-to-different-machines-tp22872.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
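The receiver-based workaround above can be sketched as follows; the ZooKeeper quorum, group id, topic name, and the choice of numStreams are assumptions you would tune for your cluster. Each call to createStream allocates one receiver, so creating several streams gives Spark more receivers to place on different machines (placement itself is still up to the scheduler), and ssc.union merges them back into a single DStream:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object UnionStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-union-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Assumption: 8 streams -> 8 receivers. More receivers means more
    // machines can host a consumer, spreading the network load.
    val numStreams = 8
    val kafkaStreams = (1 to numStreams).map { _ =>
      // Placeholder ZK quorum, group id, and topic/thread map.
      KafkaUtils.createStream(ssc, "zk1:2181", "my-consumer-group", Map("mytopic" -> 10))
    }

    // Merge the per-receiver streams into one DStream for downstream processing.
    val unified = ssc.union(kafkaStreams)
    unified.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that each receiver occupies one core for its lifetime, so leave enough cores free for processing tasks.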
