On May 27, 2015, at 11:23 AM, Joel Koshy <jjkosh...@gmail.com> wrote:

> That's right - it should not help significantly assuming even
> distribution of leaders and even distribution of partition volume
> (average inbound messages/sec).
> 

Aditya, Joel,

Oh, right, that makes sense. If I had a 10-partition topic across 10 nodes
where each leader handles 1/10th of the consumer traffic for that topic, I
could change that and instead have a 100-partition topic across 100 nodes, and
then each leader would only have to handle 1/100th of the consumer traffic for
that topic.

-James
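
The arithmetic above can be checked with a rough sketch. It assumes leaders are
assigned round-robin (partition p led by broker p mod N) and that every
partition carries equal consumer traffic. Both are simplifications, and
leader_shares is a hypothetical helper, not part of any Kafka API.

```python
from collections import Counter
from fractions import Fraction


def leader_shares(num_partitions, num_brokers):
    """Return each broker's fraction of a topic's consumer traffic.

    Assumes round-robin leader assignment and equal traffic per partition.
    """
    # Count how many partitions each broker leads.
    led = Counter(p % num_brokers for p in range(num_partitions))
    # A broker's share is (partitions it leads) / (total partitions).
    return {b: Fraction(led.get(b, 0), num_partitions)
            for b in range(num_brokers)}


# 10 partitions on 10 nodes: the busiest leader serves 1/10 of the reads.
print(max(leader_shares(10, 10).values()))    # 1/10
# 100 partitions on 100 nodes: the busiest leader serves 1/100.
print(max(leader_shares(100, 100).values()))  # 1/100
```

With uneven leader distribution or skewed partition volume the shares would
differ, which is the caveat Joel raises.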


> Theo's use-case is a bit different though in which you want to avoid
> cross-zone consumer reads especially if you have a high fan-out in
> number of consumers.
> 
> On Wed, May 27, 2015 at 05:56:56PM +0000, Aditya Auradkar wrote:
>> Is that necessarily the case? On a cluster hosting partitions, assuming the 
>> leaders are evenly distributed, every node should receive a roughly equal 
>> share of the traffic. It does help a lot when the consumer throughput of a 
>> single partition exceeds the capacity of a single leader but at that point 
>> the topic ideally needs more partitions.
>> 
>> Aditya
>> 
>> ________________________________________
>> From: James Cheng [jch...@tivo.com]
>> Sent: Wednesday, May 27, 2015 10:50 AM
>> To: users@kafka.apache.org
>> Subject: Re: Is fetching from in-sync replicas possible?
>> 
>> On May 26, 2015, at 1:44 PM, Joel Koshy <jjkosh...@gmail.com> wrote:
>> 
>>>> Apologies if this question has been asked before. If I understand things
>>>> correctly a client can only fetch from the leader of a partition, not from
>>>> an (in-sync) replica. I have a use case where it would be very beneficial
>>>> if it were possible to fetch from a replica instead of just the leader, and
>>>> I wonder why it is not allowed? Are there any consistency problems with
>>>> allowing it, for example? Is there any way to configure Kafka to allow it?
>>> 
>>> Yes this should be possible.  I don't think there are any consistency
>>> issues (barring any bugs) since we never expose past the
>>> high-watermark and the follower HW is strictly <= leader HW. Can you
>>> file a jira for this?
>>> 
>> 
>> Wouldn't this allow Kafka to scale to handle a lot more consumer traffic? 
>> Currently, consumers all have to read from the leader, which means that the 
>> network/disk bandwidth of a particular leader is the bottleneck. If 
>> consumers could read from in-sync replicas, then a single node no longer is 
>> the bottleneck for reads. You could scale out your read capacity as far as 
>> you want.
>> 
>> -James
>> 
>> 
>>>> The use case is a Kafka cluster running in EC2 across three availability
>>>> zones.
>>> 
>>> Out of curiosity - what's the typical latency (distribution) you see
>>> between zones?
>>> 
>>> Joel
>> 
> 
