I assume you're using the receiver-based approach? Have you tried the createDirectStream API?
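A minimal sketch of the direct (receiver-less) approach, assuming Spark Streaming 1.3+; the broker list, topic name, and app name are placeholders. The direct API creates one Spark partition per Kafka partition, so an 80-partition topic produces 80 parallel tasks that the scheduler can spread across all executors instead of funneling everything through a couple of receivers:

```scala
import kafka.serializer.StringDecoder
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object DirectStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("direct-kafka-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Placeholder broker list -- replace with your cluster's brokers.
    val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")

    // One Spark partition per Kafka partition; no long-running receivers,
    // so consumption is distributed wherever the tasks are scheduled.
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mytopic"))

    stream.map(_._2).count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```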
https://spark.apache.org/docs/1.3.0/streaming-kafka-integration.html

If you're sticking with the receiver-based approach, I think your only option is to create more consumer streams and union them. That doesn't give you control over where they run, but it should increase consumer parallelism.

On Wed, May 13, 2015 at 10:33 AM, hotdog <[email protected]> wrote:
> I'm using Spark Streaming integrated with streaming-kafka.
>
> My Kafka topic has 80 partitions, while my machines have 40 cores. I found
> that when the job is running, the Kafka consumer processes are deployed to
> only 2 machines, and the bandwidth on those 2 machines becomes very high.
>
> Is there any way to control where the Kafka consumers are dispatched?
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/force-the-kafka-consumer-process-to-different-machines-tp22872.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
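The receiver-based workaround above can be sketched as follows; the ZooKeeper quorum, group id, topic name, and the choice of numStreams are assumptions you would tune for your cluster. Each call to createStream allocates one receiver, so creating several streams gives Spark more receivers to place on different machines (placement itself is still up to the scheduler), and ssc.union merges them back into a single DStream:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

object UnionStreamSketch {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-union-sketch")
    val ssc = new StreamingContext(conf, Seconds(10))

    // Assumption: 8 streams -> 8 receivers. More receivers means more
    // machines can host a consumer, spreading the network load.
    val numStreams = 8
    val kafkaStreams = (1 to numStreams).map { _ =>
      // Placeholder ZK quorum, group id, and topic/thread map.
      KafkaUtils.createStream(ssc, "zk1:2181", "my-consumer-group", Map("mytopic" -> 10))
    }

    // Merge the per-receiver streams into one DStream for downstream processing.
    val unified = ssc.union(kafkaStreams)
    unified.count().print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

Note that each receiver occupies one core for its lifetime, so leave enough cores free for processing tasks.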
