Hi Colin,

This is looking good to me. A few comments:
1. The fetch type seems unnecessary in the request and response schemas,
since it can be inferred from the sessionId/epoch.

2. I agree with Jun that a separate array for partitions to remove would
be more intuitive.

3. I'm not super thrilled with the cache configuration, since it seems to
tie us a bit too closely to the implementation. You've mostly convinced me
of the need for the slots config, but I wonder if we can at least do
without "min.incremental.fetch.session.eviction.ms"? For one, I think the
broker should reserve the right to evict sessions at will. We shouldn't be
stuck maintaining a small session at the expense of a much larger one just
to enforce this timeout. Internally, I think having some cache stickiness
to avoid thrashing makes sense, but static values are likely to be good
enough, and they let us retain some flexibility to change the behavior in
the future.

4. I think the word "incremental" is redundant in the config names. Maybe
it could just be "max.fetch.session.cache.slots", for example?

Thanks,
Jason

On Sat, Jan 20, 2018 at 12:54 PM, Colin McCabe <cmcc...@apache.org> wrote:

> On Fri, Jan 19, 2018, at 15:02, Jun Rao wrote:
> > Hi, Colin,
> >
> > Thanks for the KIP. Looks good to me overall. Just a couple more
> > comments.
> >
> > 1. As I mentioned earlier, it might be useful to add some metrics for
> > monitoring the usage of the session cache. For example, it would be
> > useful to know how many slots are being used (or unused), the total
> > number of partitions in the cached slots (to understand space usage),
> > the eviction rate (to see if there is any churn), etc.
>
> Thanks, Jun. Sorry -- I meant to address this earlier, but I forgot
> about it. I just added some proposed metrics to the KIP wiki.
>
> > 2. Using a max_bytes of 0 to represent the removal of a partition
> > seems unintuitive. Perhaps it's better to either add a flag per
> > partition or add a removed-partition list.
>
> Perhaps if we use a max_bytes of -1 to represent removal, it will be
> more intuitive? After all, -1 bytes is clearly not a valid amount of
> bytes to fetch. Or should we have a separate array of removed
> TopicPartitions?
>
> On a related note, in FetchResponse#PartitionData we have an "error"
> field, plus highWatermark, lastStableOffset, logStartOffset, etc. But
> when the "error" field is set, those other fields are not used. Perhaps
> we could save some space by just having a separate array of "partitions
> with errors." In the common case where there are no errors, this would
> save 2 bytes per partition, which could be quite significant in large
> responses.
>
> best,
> Colin
>
> > Jun
> >
> > On Thu, Jan 18, 2018 at 6:15 PM, Colin McCabe <cmcc...@apache.org>
> > wrote:
> >
> > > Hi all,
> > >
> > > I updated the KIP. There is also an implementation of this KIP here:
> > > https://github.com/apache/kafka/pull/4418
> > >
> > > The updated implementation simplifies a few things, and adds the
> > > ability to incrementally add or remove individual partitions in an
> > > incremental fetch request.
> > >
> > > best,
> > > Colin
> > >
> > > On Tue, Dec 19, 2017, at 19:28, Colin McCabe wrote:
> > > > Hi all,
> > > >
> > > > I'd like to start the vote on KIP-227: Incremental Fetch Requests.
> > > >
> > > > The KIP is here:
> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-227%3A+Introduce+Incremental+FetchRequests+to+Increase+Partition+Scalability
> > > >
> > > > and the discussion thread earlier:
> > > > https://www.mail-archive.com/dev@kafka.apache.org/msg83115.html
> > > >
> > > > thanks,
> > > > Colin
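
To make the schema discussion above concrete: below is a minimal Java
sketch of an incremental fetch request with an explicit removed-partitions
array, as Jun and Jason prefer over a max_bytes sentinel of 0 or -1. All
class and field names here are hypothetical, not the final KIP-227 wire
format; the sketch also reflects Jason's point 1, in that sessionId/epoch
carry the "incremental" signal without a separate fetch-type field.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical shape of an incremental fetch request. Names are
    // illustrative only; the real KIP-227 schema may differ.
    public class IncrementalFetchRequestSketch {

        static class TopicPartition {
            final String topic;
            final int partition;
            TopicPartition(String topic, int partition) {
                this.topic = topic;
                this.partition = partition;
            }
        }

        static class PartitionFetchState {
            final TopicPartition partition;
            final long fetchOffset;
            final int maxBytes;  // always a real byte limit, never a sentinel
            PartitionFetchState(TopicPartition partition, long fetchOffset,
                                int maxBytes) {
                this.partition = partition;
                this.fetchOffset = fetchOffset;
                this.maxBytes = maxBytes;
            }
        }

        final int sessionId;     // identifies the fetch session on the broker
        final int sessionEpoch;  // sessionId/epoch already imply "incremental"
        // Partitions added to, or updated in, the session.
        final List<PartitionFetchState> addedOrUpdated = new ArrayList<>();
        // Partitions dropped from the session: an explicit list, so removal
        // cannot be confused with a zero-byte fetch.
        final List<TopicPartition> removed = new ArrayList<>();

        IncrementalFetchRequestSketch(int sessionId, int sessionEpoch) {
            this.sessionId = sessionId;
            this.sessionEpoch = sessionEpoch;
        }
    }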
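Similarly, a sketch of Colin's response-side idea: moving the
per-partition error code into a separate, usually-empty array, so that
partitions that fetched cleanly don't carry it. Again, the names are
hypothetical, not the actual FetchResponse schema.

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical response layout in which healthy partitions carry no
    // error code; errors travel in a separate array that is empty in the
    // common case, saving 2 bytes per error-free partition.
    public class FetchResponseSketch {

        static class PartitionData {
            String topic;
            int partition;
            long highWatermark;
            long lastStableOffset;
            long logStartOffset;
            // ... record data follows; note: no per-partition error field
        }

        static class PartitionError {
            String topic;
            int partition;
            short errorCode;  // the 2 bytes saved per partition when absent
        }

        final List<PartitionData> partitions = new ArrayList<>();
        final List<PartitionError> partitionsWithErrors = new ArrayList<>();
    }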
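Jason's points 3 and 4 taken together suggest a single user-facing config
under the shorter name, plus a fixed internal stickiness value in place of
a "min.incremental.fetch.session.eviction.ms" setting. A sketch of that
arrangement follows; the default and the internal constant are invented
for illustration.

    // Hypothetical broker config following Jason's naming suggestion.
    public final class FetchSessionCacheConfigSketch {
        // The one user-facing knob: an upper bound on cached fetch sessions.
        public static final String MAX_FETCH_SESSION_CACHE_SLOTS =
                "max.fetch.session.cache.slots";
        public static final int MAX_FETCH_SESSION_CACHE_SLOTS_DEFAULT = 1000;

        // No eviction-timeout config. The broker keeps the right to evict
        // any session at will; a static internal value provides enough
        // stickiness to avoid cache thrashing, and it can be changed later
        // without deprecating a public config.
        static final long EVICTION_STICKINESS_MS = 120_000L;

        private FetchSessionCacheConfigSketch() { }
    }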
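Finally, for Jun's point 1, a dependency-free sketch of the counters that
would back the requested metrics: slot usage, cached partition count, and
eviction rate. The actual proposed metric names live on the KIP wiki, per
Colin's reply; this only illustrates the bookkeeping.

    import java.util.concurrent.atomic.AtomicInteger;
    import java.util.concurrent.atomic.AtomicLong;

    // Hypothetical counters behind the session-cache metrics Jun asks
    // for. A real broker would register these as gauges/meters with its
    // metrics registry; plain atomics keep the sketch self-contained.
    public class FetchSessionCacheMetricsSketch {
        private final AtomicInteger sessionsCached = new AtomicInteger();   // used slots
        private final AtomicInteger partitionsCached = new AtomicInteger(); // space usage
        private final AtomicLong evictions = new AtomicLong();              // churn

        public void onSessionCreated(int numPartitions) {
            sessionsCached.incrementAndGet();
            partitionsCached.addAndGet(numPartitions);
        }

        public void onSessionEvicted(int numPartitions) {
            sessionsCached.decrementAndGet();
            partitionsCached.addAndGet(-numPartitions);
            evictions.incrementAndGet();  // divide by uptime for a rate
        }
    }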