Hi, An option that is not currently covered in the KIP is to have a separate config max.block.ms, which is similar to the producer config with the same name. This came up during the KAFKA-2391 discussion. I think it's clear that we can't rely on request.timeout.ms, so the decision is between adding overloads or adding a new config. People seemed to be leaning towards the latter in KAFKA-2391, but Jason makes a good point that the overloads are more flexible. A couple of questions from me:
1. Do we need the additional flexibility? 2. If we do, do we need it for every blocking method? Ismael On Mon, Mar 19, 2018 at 5:03 PM, Richard Yu <yohan.richard...@gmail.com> wrote: > Hi Guozhang, > > I made some clarifications to KIP-266, namely: > 1. Stated more specifically that commitSync will accept user input. > 2. fetchCommittedOffsets(): Made its role in blocking more clear to the > reader. > 3. Sketched what would happen when time limit is exceeded. > > These changes should make the KIP easier to understand. > > Cheers, > Richard > > On Mon, Mar 19, 2018 at 9:33 AM, Guozhang Wang <wangg...@gmail.com> wrote: > > > Hi Richard, > > > > I made a pass over the KIP again, some more clarifications / comments: > > > > 1. seek() call itself is not blocking, only the following poll() call may > > be blocking as the actually metadata rq will happen. > > > > 2. I saw you did not include Consumer.partitionFor(), > > Consumer.OffsetAndTimestamp() and Consumer.listTopics() in your KIP. > After > > a second thought, I think this may be a better idea to not tackle them in > > the same KIP, and probably we should consider whether we would change the > > behavior or not in another discussion. So I agree to not include them. > > > > 3. In your wiki you mentioned "Another change shall be made to > > KafkaConsumer#poll(), due to its call to updateFetchPositions() which > > blocks indefinitely." This part may a bit obscure to most readers who's > not > > familiar with the KafkaConsumer internals, could you please add more > > elaborations. More specifically, I think the root causes of the public > APIs > > mentioned are a bit different while the KIP's explanation sounds like > they > > are due to the same reason: > > > > 3.1 fetchCommittedOffsets(): this internal call will block forever if the > > committed offsets cannot be fetched successfully and affect position() > and > > committed(). We need to break out of its internal while loop. > > 3.2 position() itself will while loop when offsets cannot be retrieved in > > the underlying async call. We need to break out this while loop. > > 3.3 commitSync() passed Long.MAX_VALUE as the timeout value, we should > take > > the user specified timeouts when applicable. > > > > > > > > Guozhang > > > > On Sat, Mar 17, 2018 at 4:44 PM, Richard Yu <yohan.richard...@gmail.com> > > wrote: > > > > > Actually, what I said above is inaccurate. In > > > testSeekAndCommitWithBrokerFailures, TestUtils.waitUntilTrue blocks, > not > > > seek. > > > My assumption is that seek did not update correctly. I will be digging > > > further into this. > > > > > > > > > > > > On Sat, Mar 17, 2018 at 4:16 PM, Richard Yu < > yohan.richard...@gmail.com> > > > wrote: > > > > > > > One more thing: when looking through tests, I have realized that > seek() > > > > methods can potentially block indefinitely. As you well know, seek() > is > > > > called when pollOnce() or position() is active. Thus, if position() > > > blocks > > > > indefinitely, then so would seek(). Should bounding seek() also be > > > included > > > > in this KIP? > > > > > > > > Thanks, Richard > > > > > > > > On Sat, Mar 17, 2018 at 1:16 PM, Richard Yu < > > yohan.richard...@gmail.com> > > > > wrote: > > > > > > > >> Thanks for the advice, Jason > > > >> > > > >> I have modified KIP-266 to include the java doc for committed() and > > > other > > > >> blocking methods, and I also > > > >> mentioned poll() which will also be bounded. Let me know if there is > > > >> anything else. :) > > > >> > > > >> Sincerely, Richard > > > >> > > > >> > > > >> > > > >> > > > >> > > > >> On Sat, Mar 17, 2018 at 12:00 PM, Jason Gustafson < > ja...@confluent.io > > > > > > >> wrote: > > > >> > > > >>> Hi Richard, > > > >>> > > > >>> Thanks for the updates. I'm really glad you picked this up. A > couple > > > >>> minor > > > >>> comments: > > > >>> > > > >>> 1. Can you list the full set of new APIs explicitly in the KIP? > > > >>> Currently I > > > >>> only see the javadoc for `position()`. > > > >>> > > > >>> 2. We should consider adding `TimeUnit` to the new methods to avoid > > > unit > > > >>> confusion. I know it's inconsistent with the poll() API, but I > think > > it > > > >>> was > > > >>> probably a mistake not to include it there, so better not to double > > > down > > > >>> on > > > >>> that mistake. And note that we do already have `close(long, > > TimeUnit)`. > > > >>> > > > >>> Other than that, I think the current KIP seems reasonable. > > > >>> > > > >>> Thanks, > > > >>> Jason > > > >>> > > > >>> On Wed, Mar 14, 2018 at 5:00 PM, Richard Yu < > > > yohan.richard...@gmail.com> > > > >>> wrote: > > > >>> > > > >>> > Note to all: I have included bounding commitSync() and > committed() > > in > > > >>> this > > > >>> > KIP. > > > >>> > > > > >>> > On Sun, Mar 11, 2018 at 5:05 PM, Richard Yu < > > > >>> yohan.richard...@gmail.com> > > > >>> > wrote: > > > >>> > > > > >>> > > Hi all, > > > >>> > > > > > >>> > > I updated the KIP where overloading position() is now the > favored > > > >>> > approach. > > > >>> > > Bounding position() using requestTimeoutMs has been listed as > > > >>> rejected. > > > >>> > > > > > >>> > > Any thoughts? > > > >>> > > > > > >>> > > On Tue, Mar 6, 2018 at 6:00 PM, Guozhang Wang < > > wangg...@gmail.com> > > > >>> > wrote: > > > >>> > > > > > >>> > >> I agree that adding the overloads is most flexible. But going > > for > > > >>> that > > > >>> > >> direction we'd do that for all the blocking call that I've > > listed > > > >>> above, > > > >>> > >> with this timeout value covering the end-to-end waiting time. > > > >>> > >> > > > >>> > >> > > > >>> > >> Guozhang > > > >>> > >> > > > >>> > >> On Tue, Mar 6, 2018 at 10:02 AM, Ted Yu <yuzhih...@gmail.com> > > > >>> wrote: > > > >>> > >> > > > >>> > >> > bq. The most flexible option is to add overloads to the > > consumer > > > >>> > >> > > > > >>> > >> > This option is flexible. > > > >>> > >> > > > > >>> > >> > Looking at the tail of SPARK-18057, Spark dev voiced the > same > > > >>> choice. > > > >>> > >> > > > > >>> > >> > +1 for adding overload with timeout parameter. > > > >>> > >> > > > > >>> > >> > Cheers > > > >>> > >> > > > > >>> > >> > On Mon, Mar 5, 2018 at 2:42 PM, Jason Gustafson < > > > >>> ja...@confluent.io> > > > >>> > >> > wrote: > > > >>> > >> > > > > >>> > >> > > @Guozhang I probably have suggested all options at some > > point > > > or > > > >>> > >> another, > > > >>> > >> > > including most recently, the current KIP! I was thinking > > that > > > >>> > >> practically > > > >>> > >> > > speaking, the request timeout defines how long the user is > > > >>> willing > > > >>> > to > > > >>> > >> > wait > > > >>> > >> > > for a response. The consumer doesn't really have a complex > > > send > > > >>> > >> process > > > >>> > >> > > like the producer for any of these APIs, so I wasn't sure > > how > > > >>> much > > > >>> > >> > benefit > > > >>> > >> > > there would be from having more granular control over > > timeouts > > > >>> (in > > > >>> > the > > > >>> > >> > end, > > > >>> > >> > > KIP-91 just adds a single timeout to control the whole > > send). > > > >>> That > > > >>> > >> said, > > > >>> > >> > it > > > >>> > >> > > might indeed be better to avoid overloading the config as > > you > > > >>> > suggest > > > >>> > >> > since > > > >>> > >> > > at least it avoids inconsistency with the producer's > usage. > > > >>> > >> > > > > > >>> > >> > > The most flexible option is to add overloads to the > consumer > > > so > > > >>> that > > > >>> > >> > users > > > >>> > >> > > can pass the timeout directly. I'm not sure if that is > more > > or > > > >>> less > > > >>> > >> > > annoying than a new config, but I've found config > timeouts a > > > >>> little > > > >>> > >> > > constraining in practice. For example, I could imagine > users > > > >>> wanting > > > >>> > >> to > > > >>> > >> > > wait longer for an offset commit operation than a position > > > >>> lookup; > > > >>> > if > > > >>> > >> the > > > >>> > >> > > latter isn't timely, users can just pause the partition > and > > > >>> continue > > > >>> > >> > > fetching on others. If you cannot commit offsets, however, > > it > > > >>> might > > > >>> > be > > > >>> > >> > > safer for an application to wait availability of the > > > coordinator > > > >>> > than > > > >>> > >> > > continuing. > > > >>> > >> > > > > > >>> > >> > > -Jason > > > >>> > >> > > > > > >>> > >> > > On Sun, Mar 4, 2018 at 10:14 PM, Guozhang Wang < > > > >>> wangg...@gmail.com> > > > >>> > >> > wrote: > > > >>> > >> > > > > > >>> > >> > > > Hello Richard, > > > >>> > >> > > > > > > >>> > >> > > > Thanks for the proposed KIP. I have a couple of general > > > >>> comments: > > > >>> > >> > > > > > > >>> > >> > > > 1. I'm not sure if piggy-backing the timeout exception > on > > > the > > > >>> > >> > > > existing requestTimeoutMs configured in " > > request.timeout.ms > > > " > > > >>> is a > > > >>> > >> good > > > >>> > >> > > > idea > > > >>> > >> > > > since a) it is a general config that applies for all > types > > > of > > > >>> > >> requests, > > > >>> > >> > > and > > > >>> > >> > > > 2) using it to cover all the phases of an API call, > > > including > > > >>> > >> network > > > >>> > >> > > round > > > >>> > >> > > > trip and potential metadata refresh is shown to not be a > > > good > > > >>> > idea, > > > >>> > >> as > > > >>> > >> > > > illustrated in KIP-91: > > > >>> > >> > > > > > > >>> > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > > >>> > >> > > > 91+Provide+Intuitive+User+Timeouts+in+The+Producer > > > >>> > >> > > > > > > >>> > >> > > > In fact, I think in KAFKA-4879 which is aimed for the > same > > > >>> issue > > > >>> > as > > > >>> > >> > > > KAFKA-6608, > > > >>> > >> > > > Jason has suggested we use a new config for the API. > Maybe > > > >>> this > > > >>> > >> would > > > >>> > >> > be > > > >>> > >> > > a > > > >>> > >> > > > more intuitive manner than reusing the > request.timeout.ms > > > >>> config. > > > >>> > >> > > > > > > >>> > >> > > > > > > >>> > >> > > > 2. Besides the Consumer.position() call, there are a > > couple > > > of > > > >>> > more > > > >>> > >> > > > blocking calls today that could result in infinite > > blocking: > > > >>> > >> > > > Consumer.commitSync() and Consumer.committed(), should > > they > > > be > > > >>> > >> > considered > > > >>> > >> > > > in this KIP as well? > > > >>> > >> > > > > > > >>> > >> > > > 3. There are a few other APIs that are today relying on > > > >>> > >> > > request.timeout.ms > > > >>> > >> > > > already for breaking the infinite blocking, namely > > > >>> > >> > > Consumer.partitionFor(), > > > >>> > >> > > > Consumer.OffsetAndTimestamp() and Consumer.listTopics(), > > if > > > >>> we are > > > >>> > >> > making > > > >>> > >> > > > the other blocking calls to be relying a new config as > > > >>> suggested > > > >>> > in > > > >>> > >> 1) > > > >>> > >> > > > above, should we also change the semantics of these API > > > >>> functions > > > >>> > >> for > > > >>> > >> > > > consistency? > > > >>> > >> > > > > > > >>> > >> > > > > > > >>> > >> > > > Guozhang > > > >>> > >> > > > > > > >>> > >> > > > > > > >>> > >> > > > > > > >>> > >> > > > > > > >>> > >> > > > On Sun, Mar 4, 2018 at 11:13 AM, Richard Yu < > > > >>> > >> > yohan.richard...@gmail.com> > > > >>> > >> > > > wrote: > > > >>> > >> > > > > > > >>> > >> > > > > Hi all, > > > >>> > >> > > > > > > > >>> > >> > > > > I would like to discuss a potential change which would > > be > > > >>> made > > > >>> > to > > > >>> > >> > > > > KafkaConsumer: > > > >>> > >> > > > > https://cwiki.apache.org/confluence/pages/viewpage. > > > >>> > >> > > > action?pageId=75974886 > > > >>> > >> > > > > > > > >>> > >> > > > > Thanks, > > > >>> > >> > > > > Richard Yu > > > >>> > >> > > > > > > > >>> > >> > > > > > > >>> > >> > > > > > > >>> > >> > > > > > > >>> > >> > > > -- > > > >>> > >> > > > -- Guozhang > > > >>> > >> > > > > > > >>> > >> > > > > > >>> > >> > > > > >>> > >> > > > >>> > >> > > > >>> > >> > > > >>> > >> -- > > > >>> > >> -- Guozhang > > > >>> > >> > > > >>> > > > > > >>> > > > > > >>> > > > > >>> > > > >> > > > >> > > > > > > > > > > > > > > > -- > > -- Guozhang > > >