Hi all, If there are no more comments, I'll start a vote soon.
Thanks, Kamal On Wed, Aug 14, 2024 at 5:05 PM Luke Chen <show...@gmail.com> wrote: > Hi Kamal, > > Thanks for the update. > LGTM. > > Luke > > On Wed, Aug 14, 2024 at 7:25 PM Kamal Chandraprakash < > kamal.chandraprak...@gmail.com> wrote: > > > Hi all, > > > > > I saw we added some new configs/metrics. > > > > I have removed the recent changes to the public interfaces to limit the > > scope of the KIP to minimum. PTAL. > > > > Thanks, > > Kamal > > > > On Wed, Aug 14, 2024 at 9:58 AM Kamal Chandraprakash < > > kamal.chandraprak...@gmail.com> wrote: > > > > > Hi Luke, > > > > > > > LC5 > > > Agree, this is a rare scenario. Given that we have a common pool of > > > request handler threads to accept > > > all the incoming requests and there are no quotas to handle for each > > > request. I'm OK with reusing the > > > same remote-log-reader threads for LIST_OFFSETS requests. There may be > > > noisy neighbor issues > > > in handling the LIST_OFFSETS and FETCH remote requests when we read > from > > > remote storage > > > aggressively and all the remote-log-reader threads are busy. > > > > > > > If so, maybe the additional config/metrics are not necessary? > > > Do you mean to have a separate thread pool and hardcode the num > threads? > > > > > > > LC6 and LC7 > > > Updated the KIP. > > > > > > Thanks, > > > Kamal > > > > > > On Wed, Aug 14, 2024 at 8:05 AM Luke Chen <show...@gmail.com> wrote: > > > > > >> Hi Kamal, > > >> > > >> Thanks for the response. > > >> > > >> I saw we added some new configs/metrics. Comments: > > >> > > >> LC5: Do you think this is a commonly happened issue that we need to > add > > a > > >> separate `remote.log.offset.reader.threads` for it? > > >> I thought this rarely happened. If so, maybe the additional > > config/metrics > > >> are not necessary? It makes the config more complicated. > > >> > > >> LC6: The config name: > > >> `remote.log.offset.read.max.pending.tasks` , should we be consistent > to > > >> use > > >> `reader`, instead of `read`? > > >> > > >> LC7: We should set a default value for the newly introduced configs > and > > >> written in KIP. > > >> > > >> Thanks. > > >> Luke > > >> > > >> On Tue, Aug 13, 2024 at 8:47 PM Kamal Chandraprakash < > > >> kamal.chandraprak...@gmail.com> wrote: > > >> > > >> > Hi Luke, > > >> > > > >> > Thanks for the review! > > >> > > > >> > > LC2: > > >> > a. If the time taken to process the request is less than 5 mins, > then > > >> the > > >> > Admin client will get a response. > > >> > b. If the time taken to process the request is more than 5 mins, > then > > >> the > > >> > Admin client will itself timeout the request due to the > > >> > default-api-timeout. > > >> > c. If the time taken to process the request is more than 6 mins, > then > > >> > the server will cancel the request in the DelayedRemoteListOffsets > > >> > purgatory (to be implemented) and > > >> > send TimeoutException back to the client if the client is > waiting > > >> for > > >> > the response. > > >> > > > >> > > LC3: > > >> > Updated the KIP. > > >> > > > >> > > LC4: > > >> > The consumer retries the LIST_OFFSETS request incase of > > failures/timeout > > >> > but not the AdminClient. So, I think this is a retry feature in the > > >> > Consumer. > > >> > > > >> > Updated the "Public Interfaces" section in the KIP > > >> > < > > >> > > > >> > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1075%3A+Introduce+delayed+remote+list+offsets+purgatory+to+make+LIST_OFFSETS+async > > >> > > > > >> > by adding more details. PTAL. > > >> > > > >> > Thanks, > > >> > Kamal > > >> > > > >> > > > >> > On Tue, Aug 13, 2024 at 2:03 PM Luke Chen <show...@gmail.com> > wrote: > > >> > > > >> > > Hi Kamal, > > >> > > > > >> > > Thanks for the update. > > >> > > LC1: I see. Thanks. > > >> > > LC2: What I still don't understand is what is the relationship > > between > > >> > > remote.list.offsets.request.timeout.ms V.S. > > >> > > request.timeout/default.api.timeout. > > >> > > Suppose we set request timeout to 30 seconds, > default.api.timeout=5 > > >> mins > > >> > > and remote.list.offsets.request.timeout.ms = 6 mins. > > >> > > So, when Admin sends a list offset request that needs to query > > remote > > >> > > storage, when will it throw timeout exception? 30 secs or 5 mins > or > > 6 > > >> > mins? > > >> > > We might need to make it clear in the KIP. > > >> > > > > >> > > LC3: > > >> > > "Admin sends only one request and wait for upto > default-api-timeout. > > >> (eg) > > >> > > If the admin is configured with default-api-timeout as 5 mins and > > >> > > request-timeout as 30 seconds. And, the server takes 50 seconds to > > >> > process > > >> > > the LIST_OFFSETS request, then the admin sends only one > LIST_OFFSETS > > >> > > request, then receives the request from server after 50 seconds." > > >> > > > > >> > > In the end of the section, it should be receives the "response" > from > > >> > > server? > > >> > > > > >> > > LC4: I found the different consumer and admin behavior when > setting > > >> > > "request.timeout" and "default.api.timeout" is confusing. Are they > > >> > expected > > >> > > or a bug? > > >> > > > > >> > > Thank you. > > >> > > Luke > > >> > > > > >> > > > > >> > > On Thu, Aug 8, 2024 at 10:06 PM Kamal Chandraprakash < > > >> > > kamal.chandraprak...@gmail.com> wrote: > > >> > > > > >> > > > Hi Luke, > > >> > > > > > >> > > > Thanks for the review! > > >> > > > > > >> > > > LC1: When the consumer starts to read data, then it might need > the > > >> > below > > >> > > > offsets: > > >> > > > earliest, latest, and last-committed-offset based on the > > >> > > > "auto.offset.reset" config. > > >> > > > > > >> > > > The earliest and latest offsets have special timestamps -2 and > -1, > > >> > those > > >> > > > timestamp corresponding offsets are cached in the > > >> > > > broker memory and get served immediately. The > > last-committed-offset > > >> is > > >> > > also > > >> > > > cached in the GroupMetadata and > > >> > > > gets served in the "OFFSET_FETCH" request. Unless the consumer > > >> > > > explicitly uses the KafkaConsumer#offsetForTimes API, > > >> > > > there won't be any delay in serving the data from the local log. > > >> > > > > > >> > > > In this KIP, we are trying to address the case in which multiple > > >> > > consumers > > >> > > > start at the same time and use the 'offsetForTimes` LIST_OFFSETS > > >> API, > > >> > > > assuming the remote requests are slow, then it should not block > > >> other > > >> > > > PRODUCE/FETCH requests. > > >> > > > > > >> > > > LC2: Sorry for the confusion. We were planning to introduce only > > the > > >> > > broker > > >> > > > config "remote.list.offsets.timeout.ms". > > >> > > > If we add the timeout in the ListOffsetsRequest.json, then when > > old > > >> > > clients > > >> > > > talk with the new broker, we don't have > > >> > > > a timeout to set on the server. Moved adding the timeout to the > > >> > > > ListOffsetsRequest to the rejected alternatives section. > > >> > > > > > >> > > > -- > > >> > > > Kamal > > >> > > > > > >> > > > > >> > > > >> > > > > > >