On May 27, 2015, at 11:23 AM, Joel Koshy <jjkosh...@gmail.com> wrote:
> That's right - it should not help significantly assuming even
> distribution of leaders and even distribution of partition volume
> (average inbound messages/sec).
>

Aditya, Joel,

Oh, right, that makes sense. If I had a 10-partition topic across 10 nodes where each leader handles 1/10th of the consumer traffic for that topic, I could change that and instead have a 100-partition topic across 100 nodes, and then each leader would only have to handle 1/100th of the consumer traffic for that topic.

-James

> Theo's use-case is a bit different though in which you want to avoid
> cross-zone consumer reads especially if you have a high fan-out in
> number of consumers.
>
> On Wed, May 27, 2015 at 05:56:56PM +0000, Aditya Auradkar wrote:
>> Is that necessarily the case? On a cluster hosting partitions, assuming the
>> leaders are evenly distributed, every node should receive a roughly equal
>> share of the traffic. It does help a lot when the consumer throughput of a
>> single partition exceeds the capacity of a single leader but at that point
>> the topic ideally needs more partitions.
>>
>> Aditya
>>
>> ________________________________________
>> From: James Cheng [jch...@tivo.com]
>> Sent: Wednesday, May 27, 2015 10:50 AM
>> To: users@kafka.apache.org
>> Subject: Re: Is fetching from in-sync replicas possible?
>>
>> On May 26, 2015, at 1:44 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
>>
>>>> Apologies if this question has been asked before. If I understand things
>>>> correctly a client can only fetch from the leader of a partition, not from
>>>> an (in-sync) replica. I have a use case where it would be very beneficial
>>>> if it were possible to fetch from a replica instead of just the leader, and
>>>> I wonder why it is not allowed? Are there any consistency problems with
>>>> allowing it, for example? Is there any way to configure Kafka to allow it?
>>>
>>> Yes this should be possible. I don't think there are any consistency
>>> issues (barring any bugs) since we never expose past the
>>> high-watermark and the follower HW is strictly <= leader HW. Can you
>>> file a jira for this?
>>>
>>
>> Wouldn't this allow Kafka to scale to handle a lot more consumer traffic?
>> Currently, consumers all have to read from the leader, which means that the
>> network/disk bandwidth of a particular leader is the bottleneck. If
>> consumers could read from in-sync replicas, then a single node no longer is
>> the bottleneck for reads. You could scale out your read capacity as far as
>> you want.
>>
>> -James
>>
>>
>>>> The use case is a Kafka cluster running in EC2 across three availability
>>>> zones.
>>>
>>> Out of curiosity - what's the typical latency (distribution) you see
>>> between zones?
>>>
>>> Joel
>>
>
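
For anyone following the arithmetic in this thread, here is a rough back-of-the-envelope sketch in Python. It is not Kafka code and uses no Kafka API: the function name and parameters are hypothetical, and the model assumes what Joel and Aditya assume above, namely that leaders and replicas are spread evenly across brokers and that every partition sees the same consumer volume. Under those assumptions, reading from replicas leaves each broker's share of the traffic unchanged, while adding partitions and brokers shrinks it.

```python
from fractions import Fraction


def per_broker_share(brokers: int, partitions: int, replication: int,
                     read_from_replicas: bool) -> Fraction:
    """Fraction of one topic's consumer traffic served by a single broker.

    Hypothetical model, not a Kafka API. Assumes leaders and replicas are
    evenly distributed and every partition carries equal consumer volume.
    """
    per_partition = Fraction(1, partitions)
    if read_from_replicas:
        # Each partition's reads are split evenly across its replicas,
        # and each broker hosts partitions * replication / brokers replicas.
        replicas_per_broker = Fraction(partitions * replication, brokers)
        return replicas_per_broker * per_partition / replication
    # Leader-only reads: each broker leads partitions / brokers partitions.
    leaders_per_broker = Fraction(partitions, brokers)
    return leaders_per_broker * per_partition


# 10 partitions on 10 brokers, replication factor 3 (an assumed value).
print(per_broker_share(10, 10, 3, read_from_replicas=False))   # 1/10
print(per_broker_share(10, 10, 3, read_from_replicas=True))    # 1/10
# 100 partitions on 100 brokers: each leader handles 1/100th.
print(per_broker_share(100, 100, 3, read_from_replicas=False))  # 1/100
```

The model says nothing about the cross-zone case Theo raised, where the benefit of replica reads comes from locality rather than from total per-broker load.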