To amortize the long latency across DCs, you may need to increase the socket buffer size to get higher throughput. See https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=27846330
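As a rough sketch of why the buffer size matters: a single TCP connection cannot move more than one window of data per round trip, so the socket buffer caps per-connection throughput on a long link. The numbers below are assumptions for illustration only: a 200 ms Asia-to-Europe round-trip time (not measured on this setup) and a 64 KB receive buffer (what I believe the 0.8 consumer's `socket.receive.buffer.bytes` defaults to; please check your version). The helper names are hypothetical.

```python
# Back-of-the-envelope sizing for a cross-DC TCP connection.
# All inputs are illustrative assumptions, not measurements.

def bdp_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Bandwidth-delay product: bytes that must be in flight to fill the link."""
    return bandwidth_bps * rtt_ms // 8000  # bits -> bytes, ms -> s

def max_throughput_bps(window_bytes: int, rtt_ms: int) -> float:
    """Upper bound for one connection that sends one window per round trip."""
    return window_bytes * 8 * 1000 / rtt_ms

RTT_MS = 200                 # assumed round-trip time between the two DCs
TARGET_BPS = 1_000_000_000   # the 1 Gbps rate mentioned in the question
BUFFER_BYTES = 64 * 1024     # assumed consumer socket receive buffer

needed = bdp_bytes(TARGET_BPS, RTT_MS)
ceiling = max_throughput_bps(BUFFER_BYTES, RTT_MS)

print(f"buffer needed to fill the link: {needed / 1e6:.1f} MB")
print(f"ceiling per connection with current buffer: {ceiling / 1e6:.2f} Mbps")
```

Under these assumptions each connection tops out at a few Mbps, so total throughput grows with the number of parallel connections until the buffers are enlarged. Note that the OS may also cap the socket buffer size, so the kernel limits may need raising as well.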
Thanks,

Jun


On Fri, May 9, 2014 at 4:06 AM, Sithiyavanich, Manawat (Agoda) <
manawat.sithiyavan...@agoda.com> wrote:

> Hi,
>
> I have been setting up 2 Kafka clusters located across DCs, one in
> Asia and another one in Europe. All of them are running on EC2 m1.xlarge
> machines.
> Our goal is to replicate data from the Asia cluster to the Europe cluster
> at a "high and stable" speed.
>
> I ended up trying the MirrorMaker provided by Kafka, set up in the Europe
> cluster, and here are the weird problems I have run into so far.
> Let N = number of partitions of a topic in the Asia cluster.
>
> 1) Using the Kafka-provided MirrorMaker, with --num.streams = N.
>
> It runs very slowly, at around 30 Mbps, while we expect around 1 Gbps,
> which is the rate we see in the Asia cluster when producers are writing
> data to the Kafka brokers.
>
> 2) I tried creating my own version of a consumer using the high-level API
> in Europe to consume data directly from the Asia cluster, and ended up
> with the same rate, around 30 to 40 Mbps.
>
> I designed the consumer this way:
>
> - 1 Consumer Group (1 kafka.javaapi.consumer.ConsumerConnector)
> - N threads to handle N streams from 1 Consumer Group
>
> 3) I tried creating another version of the consumer using the high-level
> API, and ended up with around 400 Mbps, which is much higher than the two
> approaches above.
>
> Here is my new design:
>
> - N Consumer Groups (N kafka.javaapi.consumer.ConsumerConnector) with the
> same group ID
> - N threads to handle N Consumer Groups
> - 1 Consumer Group handles 1 stream
>
> Also, the larger N is, the higher the speed I get from solution 3),
> while solutions 1) and 2) stay at 30 Mbps.
>
> I don't see the technical difference between solution 3) and the rest,
> so why is the consuming speed of solution 3) so hugely different from
> the others?
>
> Right now our N = 60, and each message in a topic in the Asia cluster has
> an average size of 50 KB.
>
> Regards
> Manawat