Hi

I'm building a application the read from kafka stream event. In production
we've 5 consumers that share 10 partitions.
But on spark streaming kafka the master act as a consumer then distribute
the tasks to workers so I can have only 1 masters acting as consumer but I
need more because only 1 consumer means Lags.

Do you've any idea what I can do ? Another point is interresting the master
is not loaded at all I can get up more than 10 % CPU

I've tried to increase the queued.max.message.chunks on the kafka client to
read more records thinking it'll speed up the read but I only get 

ERROR consumer.ConsumerFetcherThread:
[ConsumerFetcherThread-SparkEC2_ip-10-138-59-194.ec2.internal-1410182950783-5c49c8e8-0-174167372],
Error in fetch Name: FetchRequest; Version: 0; CorrelationId: 73; ClientId:
SparkEC2-ConsumerFetcherThread-SparkEC2_ip-10-138-59-194.ec2.internal-1410182950783-5c49c8e8-0-174167372;
ReplicaId: -1; MaxWait: 100 ms; MinBytes: 1 bytes; RequestInfo: [IA2,7] ->
PartitionFetchInfo(929838589,1048576),[IA2,6] ->
PartitionFetchInfo(929515796,1048576),[IA2,9] ->
PartitionFetchInfo(929577946,1048576),[IA2,8] ->
PartitionFetchInfo(930751599,1048576),[IA2,2] ->
PartitionFetchInfo(926457704,1048576),[IA2,5] ->
PartitionFetchInfo(930774385,1048576),[IA2,0] ->
PartitionFetchInfo(929913213,1048576),[IA2,3] ->
PartitionFetchInfo(929268891,1048576),[IA2,4] ->
PartitionFetchInfo(929949877,1048576),[IA2,1] ->
PartitionFetchInfo(930063114,1048576)
java.lang.OutOfMemoryError: Java heap space

Is someone have ideas ?
Thanks



--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/How-to-scale-large-kafka-topic-tp13691.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
For additional commands, e-mail: user-h...@spark.apache.org

Reply via email to