Hi, Andrey, Thanks for the update to the wiki. Just a few more minor comments.
1. "If *response_max_bytes* parameter is zero ("no limit"), the request is processed *exactly* as before." Instead of using 0, it seems it's more natural to use Int.MAX_INT to preserve the old behavior. 2. "For the first partition, server always fetches at least *message.max.bytes." *To be more precise, the server only fetches more bytes than *response_max_bytes *(and up *message.max.bytes*t) if there is a message whose size is larger than *response_max_bytes.* 3. "The solution is to continue fetching from first empty partition in round-robin fashion or to perform random shuffle of partitions." We probably want to make it clearer by saying that this is for ordering the partitions in the fetch request. 4. Just to make it clear. Could you include the new fetch request protocol in the wiki (e.g. https://cwiki.apache.org/confluence/display/KAFKA/KIP-4+-+Command+line+and+centralized+administrative+operations#KIP-4-Commandlineandcentralizedadministrativeoperations-MetadataRequest(version1)) and mark the new field? Jun On Mon, Aug 15, 2016 at 3:12 AM, Andrey L. Neporada < anepor...@yandex-team.ru> wrote: > Hi all! > > KIP-74 is updated to sync up with mail list discussion. > https://cwiki.apache.org/confluence/display/KAFKA/KIP-74%3A+ > Add+Fetch+Response+Size+Limit+in+Bytes > > Your feedback is highly appreciated. > > Thanks, > Andrey. > > > > On 12 Aug 2016, at 20:49, Andrey L. Neporada <anepor...@yandex-team.ru> > wrote: > > > > Hi! > >> On 12 Aug 2016, at 20:22, Jun Rao <j...@confluent.io> wrote: > >> > >> Hi, Andrey, > >> > >> Yes, I agree that it's more work for the client to do the round-robin > logic > >> since it has to be stateful. However, that applies to both the consumer > >> client and the replica fetch thread. I just feel that it's weird to use > one > >> strategy in the replica fetch thread and another in the consumer > client. An > >> alternative is to just let the leader broker shuffle the partitions > before > >> filling up the bytes. This alleviates the need for the clients to > maintain > >> additional states. It's not as deterministic as doing round-robin in the > >> client, but is probably good enough in practice. > >> > > > > Oh, I got the point. You are talking about high-level consumer client > API (i.e. Consumer.java). > > Yes, it makes sense to use the same strategy as in ReplicaFetcher thread > there. > > > > > >> For 2), I am not sure if we want to set the default limit of fetch > response > >> in the consumer to be 0. Today, sending a response of more than 2GB will > >> cause buffer overflow. How about defaulting it to 50BM? This should be > >> enough for the common case. For the default fetch response limit in the > >> replica fetcher, I'd recommend 10MB since there could be multiple > replica > >> fetcher threads to different brokers. > >> > > > > Ok, I’ll put this numbers in PR - we can discuss them later. > > > >> Thanks, > >> > >> Jun > >> > > > > Andrey. > > > >> > >> On Fri, Aug 12, 2016 at 9:19 AM, Andrey L. Neporada < > >> anepor...@yandex-team.ru> wrote: > >> > >>> Hi! > >>> > >>>> On 12 Aug 2016, at 18:32, Jun Rao <j...@confluent.io> wrote: > >>>> > >>>> Hi, Andrey, > >>>> > >>>> Why shouldn't the client library do reordering? It seems that if > >>>> ReplicaFetcher thread does round-robin, the consumer client should do > >>> that > >>>> too? > >>>> > >>> > >>> IMHO the client library is not a good place to implement such logic. > >>> For example, if we want to put round-robin logic in client lib, it will > >>> become stageful (since it should remember first empty partition > received > >>> from server). It should also remember some kind of context to > distinguish > >>> one end-user (or partition set) from another, etc. > >>> > >>> Personally, I'd expect from client library to submit requests as is. > >>> Anyway, I think it is difficult to come up with reordering logic that > will > >>> be good for all clients. > >>> > >>> So, I propose: > >>> > >>> 1) preserve per-partition limit in fetch request (as discussed) > >>> 2) make default limit of fetch response to be 0 (which means no limit > at > >>> all). So, by default, new fetch request should behave exactly like old > one > >>> 3) implement round-robin logic in ReplicaFetcherThread and use non-zero > >>> default fetch response limit there. > >>> 4) document that all end-users who want to use FetchRequest with actual > >>> response limit to implement some kind of round-robin to ensure > fairness (if > >>> they care about it) > >>> > >>> > >>>> Thanks, > >>>> > >>>> Jun > >>>> > >>> > >>> Thanks, > >>> Andrey. > >>> > >>> > >>>> On Fri, Aug 12, 2016 at 3:56 AM, Andrey L. Neporada < > >>>> anepor...@yandex-team.ru> wrote: > >>>> > >>>>> Hi! > >>>>> > >>>>>> On 12 Aug 2016, at 04:29, Jun Rao <j...@confluent.io> wrote: > >>>>>> > >>>>>> Hi, Andrey, > >>>>>> > >>>>>> One potential benefit of keeping the per partition limit is for > Kafka > >>>>>> stream. When reading messages from different partitions, KStream > >>> prefers > >>>>> to > >>>>>> read from partitions with smaller timestamps first and only advances > >>> the > >>>>>> KStream timestamp when it sees at least one message from every > >>> partition. > >>>>>> Being able to fill up multiple partitions in a single fetch response > >>> can > >>>>>> help KStream advance the timestamp quicker when there is backlog > from > >>> the > >>>>>> input. So, it's probably better if we just add a new response limit > >>> while > >>>>>> keeping the per partition limit. > >>>>>> > >>>>> > >>>>> Yes, this makes sense for me. > >>>>> > >>>>>> Also, for fairness across partitions, you mentioned "The solution > is to > >>>>>> start fetching from first empty partition in round-robin fashion or > to > >>>>>> perform random shuffle of partitions.". It seems the former is more > >>>>>> deterministic. Did you use that in your implementation and should we > >>>>>> recommend that for non-java clients as well? > >>>>>> > >>>>> > >>>>> In my initial implementation I just did random shuffling on server > side. > >>>>> Now I plan to use deterministic approach - do round-robin in > >>>>> ReplicaFetcher thread - I believe the client library itself > shouldn’t do > >>>>> any form of reordering. > >>>>> > >>>>> But we should definitely recommend some form of shuffling if client > >>> wants > >>>>> to limit response size. > >>>>> Not sure which shuffling method is better. > >>>>> > >>>>> > >>>>>> Thanks, > >>>>>> > >>>>>> Jun > >>>>>> > >>>>> > >>>>> Thanks, > >>>>> Andrey. > >>>>> > >>>>> > >>> > >>> > > > >