Hey Mickael,

Thanks for the comments. Responses below:

- I'm guessing the selector will be invoked after each rebalance so
> every time the consumer is assigned a partition it will be able to
> select it. Is that true?


I'm not sure it is necessary to do it after every rebalance, but certainly
the selector would be invoked after the partition is first assigned. Was
there a specific concern you had in mind?

- From the selector API, I'm not sure how the consumer will be able to
> address some of the choices mentioned in "Finding the preferred
> follower". Especially the available bandwidth and the load balancing.
> By only having the list of Nodes, a consumer can pick the nearest
> replica (assuming the rack field means anything to users) or balance
> its own bandwidth but that might not necessarily mean improved
> performance or a balanced load on the brokers.


The intent is to provide a minimal extension point. Users would have to
rely on external sources for their own custom selection logic. It is
similar to other interfaces exposed in the clients, such as Partitioner and
PartitionAssignor. The interface exposes only metadata about the
replication state, but nothing stops users from leveraging other
information to make better decisions. Does that seem reasonable?
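
For illustration, here is a minimal sketch of what a rack-aware selector could look like, assuming only a hook that receives the partition, the current leader, and the list of candidate replica nodes (the exact interface shape and any client-side rack config name are assumptions for the example, not the KIP's final API):

import java.util.List;
import java.util.Objects;
import org.apache.kafka.common.Node;
import org.apache.kafka.common.TopicPartition;

/**
 * Hypothetical rack-aware selector, sketched against an assumed hook shape.
 * Prefer a replica in the consumer's own rack; otherwise fall back to the leader.
 */
public class RackAwareReplicaSelector {

    private final String clientRack; // assumed to come from a client-side rack setting

    public RackAwareReplicaSelector(String clientRack) {
        this.clientRack = clientRack;
    }

    public Node select(TopicPartition partition, Node leader, List<Node> replicas) {
        for (Node replica : replicas) {
            if (Objects.equals(clientRack, replica.rack()))
                return replica; // closest replica by rack
        }
        return leader; // no replica in our rack, keep fetching from the leader
    }
}

Anything smarter than this (bandwidth, broker load, etc.) would plug external metrics into the same place, which is exactly the kind of logic the interface deliberately leaves to users.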

Thanks,
Jason



On Mon, Dec 10, 2018 at 11:41 AM Jason Gustafson <ja...@confluent.io> wrote:

> Hey Eno,
>
> Thanks for the comments. However, I'm a bit confused. I'm not suggesting
> we change Produce semantics in any way. All writes still go through the
> partition leader and nothing changes with respect to committing to the ISR.
> The main issue, as I've mentioned in the KIP, is the increased latency
> before a committed offset is exposed on followers.
>
> Perhaps I have misunderstood your question?
>
> Thanks,
> Jason
>
> On Mon, Dec 3, 2018 at 9:18 AM Eno Thereska <eno.there...@gmail.com>
> wrote:
>
>> Hi Jason,
>>
>> This is an interesting KIP. This will have massive implications for
>> consistency and serialization, since currently the leader for a partition
>> serializes requests. A few questions for now:
>>
>> - before we deal with the complexity, it'd be great to see a crisp example
>> in the motivation as to when this will have the most benefit for a
>> customer. In particular, although the customer might have a multi-DC
>> deployment, the DCs could still be close by in a region, so what is the
>> expected best-case scenario for a performance gain? E.g., if all DCs are
>> on
>> the east coast, say. Right now it's not clear to me.
>> - perhaps performance is not the right metric. Is the metric you are
>> optimizing for latency, throughput or cross-DC cost? (I believe it is
>> cross-DC cost from the KIP). Just wanted to double-check since I'm not
>> sure
>> latency would improve. Throughput could really improve from parallelism
>> (especially in cases when there is mostly consuming going on). So it could
>> be throughput as well.
>> - the proposal would probably lead to choosing a more complex consistency.
>> I tend to like the description Doug Terry has in his paper "Replicated
>> Data
>> Consistency Explained Through Baseball"
>>
>> https://www.microsoft.com/en-us/research/wp-content/uploads/2011/10/ConsistencyAndBaseballReport.pdf
>> .
>> To start with, could we get into scenarios where a client that has both a
>> producer and a consumer (e.g., Kafka streams) produces a record, then
>> attempts to consume it back and the consume() comes back with "record does
>> not exist"? That's fine, but could complicate application handling of such
>> scenarios.
>>
>> Thanks,
>> Eno
>>
>> On Mon, Dec 3, 2018 at 12:24 PM Mickael Maison <mickael.mai...@gmail.com>
>> wrote:
>>
>> > Hi Jason,
>> >
>> > Very cool KIP!
>> > A couple of questions:
>> > - I'm guessing the selector will be invoked after each rebalance so
>> > every time the consumer is assigned a partition it will be able to
>> > select it. Is that true?
>> >
>> > - From the selector API, I'm not sure how the consumer will be able to
>> > address some of the choices mentioned in "Finding the preferred
>> > follower". Especially the available bandwidth and the load balancing.
>> > By only having the list of Nodes, a consumer can pick the nearest
>> > replica (assuming the rack field means anything to users) or balance
>> > its own bandwidth but that might not necessarily mean improved
>> > performance or a balanced load on the brokers.
>> >
>> > Thanks
>> > On Mon, Dec 3, 2018 at 11:35 AM Stanislav Kozlovski
>> > <stanis...@confluent.io> wrote:
>> > >
>> > > Hey Jason,
>> > >
>> > > This is certainly a very exciting KIP.
>> > > I assume that no changes will be made to the offset commits and they
>> will
>> > > continue to be sent to the group coordinator?
>> > >
>> > > I also wanted to address metrics - have we considered any changes
>> there?
>> > I
>> > > imagine that it would be valuable for users to be able to
>> differentiate
>> > > between which consumers' partitions are fetched from replicas and
>> which
>> > > aren't. I guess that would need to be addressed both in the server's
>> > > fetcher lag metrics and in the consumers.
>> > >
>> > > Thanks,
>> > > Stanislav
>> > >
>> > > On Wed, Nov 28, 2018 at 10:08 PM Jun Rao <j...@confluent.io> wrote:
>> > >
>> > > > Hi, Jason,
>> > > >
>> > > > Thanks for the KIP. Looks good overall. A few minor comments below.
>> > > >
>> > > > 1. The section on handling FETCH_OFFSET_TOO_LARGE error says "Use
>> the
>> > > > OffsetForLeaderEpoch API to verify the current position with the
>> > leader".
>> > > > The OffsetForLeaderEpoch request returns log end offset if the
>> request
>> > > > leader epoch is the latest. So, we won't know the true high
>> watermark
>> > from
>> > > > that request. It seems that the consumer still needs to send
>> ListOffset
>> > > > request to the leader to obtain high watermark?
>> > > >
>> > > > 2. If a non in-sync replica receives a fetch request from a
>> consumer,
>> > > > should it return a new type of error like ReplicaNotInSync?
>> > > >
>> > > > 3. Could ReplicaSelector be closable?
>> > > >
>> > > > 4. Currently, the ISR propagation from the leader to the controller
>> > can be
>> > > > delayed up to 60 secs through
>> > ReplicaManager.IsrChangePropagationInterval.
>> > > > In that window, the consumer could still be consuming from a non
>> > in-sync
>> > > > replica. The relatively large delay is mostly for reducing the ZK
>> > writes
>> > > > and the watcher overhead. Not sure what's the best way to address
>> > this. We
>> > > > could potentially make this configurable.
>> > > >
>> > > > 5. It may be worth mentioning that, to take advantage of affinity,
>> one
>> > may
>> > > > also want to have a customized PartitionAssignor to have an affinity
>> > aware
>> > > > assignment in addition to a customized ReplicaSelector.
>> > > >
>> > > > Thanks,
>> > > >
>> > > > Jun
>> > > >
>> > > > On Wed, Nov 21, 2018 at 12:54 PM Jason Gustafson <
>> ja...@confluent.io>
>> > > > wrote:
>> > > >
>> > > > > Hi All,
>> > > > >
>> > > > > I've posted a KIP to add the often-requested support for fetching
>> > from
>> > > > > followers:
>> > > > >
>> > > > >
>> > > >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-392%3A+Allow+consumers+to+fetch+from+closest+replica
>> > > > > .
>> > > > > Please take a look and let me know what you think.
>> > > > >
>> > > > > Thanks,
>> > > > > Jason
>> > > > >
>> > > >
>> > >
>> > >
>> > > --
>> > > Best,
>> > > Stanislav
>> >
>>
>
