Hi all,

> I saw we added some new configs/metrics.

I have removed the recent changes to the public interfaces to limit the
scope of the KIP to minimum. PTAL.

Thanks,
Kamal

On Wed, Aug 14, 2024 at 9:58 AM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Hi Luke,
>
> > LC5
> Agree, this is a rare scenario. Given that we have a common pool of
> request handler threads to accept
> all the incoming requests and there are no quotas to handle for each
> request. I'm OK with reusing the
> same remote-log-reader threads for LIST_OFFSETS requests. There may be
> noisy neighbor issues
> in handling the LIST_OFFSETS and FETCH remote requests when we read from
> remote storage
> aggressively and all the remote-log-reader threads are busy.
>
> >  If so, maybe the additional config/metrics are not necessary?
> Do you mean to have a separate thread pool and hardcode the num threads?
>
> > LC6 and LC7
> Updated the KIP.
>
> Thanks,
> Kamal
>
> On Wed, Aug 14, 2024 at 8:05 AM Luke Chen <show...@gmail.com> wrote:
>
>> Hi Kamal,
>>
>> Thanks for the response.
>>
>> I saw we added some new configs/metrics. Comments:
>>
>> LC5: Do you think this is a commonly happened issue that we need to add a
>> separate `remote.log.offset.reader.threads` for it?
>> I thought this rarely happened. If so, maybe the additional config/metrics
>> are not necessary? It makes the config more complicated.
>>
>> LC6: The config name:
>> `remote.log.offset.read.max.pending.tasks` , should we be consistent to
>> use
>> `reader`, instead of `read`?
>>
>> LC7: We should set a default value for the newly introduced configs and
>> written in KIP.
>>
>> Thanks.
>> Luke
>>
>> On Tue, Aug 13, 2024 at 8:47 PM Kamal Chandraprakash <
>> kamal.chandraprak...@gmail.com> wrote:
>>
>> > Hi Luke,
>> >
>> > Thanks for the review!
>> >
>> > > LC2:
>> > a. If the time taken to process the request is less than 5 mins, then
>> the
>> > Admin client will get a response.
>> > b. If the time taken to process the request is more than 5 mins, then
>> the
>> > Admin client will itself timeout the request due to the
>> > default-api-timeout.
>> > c. If the time taken to process the request is more than 6 mins, then
>> > the server will cancel the request in the DelayedRemoteListOffsets
>> > purgatory (to be implemented) and
>> >     send TimeoutException back to the client if the client is waiting
>> for
>> > the response.
>> >
>> > > LC3:
>> > Updated the KIP.
>> >
>> > > LC4:
>> > The consumer retries the LIST_OFFSETS request incase of failures/timeout
>> > but not the AdminClient. So, I think this is a retry feature in the
>> > Consumer.
>> >
>> > Updated the "Public Interfaces" section in the KIP
>> > <
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1075%3A+Introduce+delayed+remote+list+offsets+purgatory+to+make+LIST_OFFSETS+async
>> > >
>> > by adding more details. PTAL.
>> >
>> > Thanks,
>> > Kamal
>> >
>> >
>> > On Tue, Aug 13, 2024 at 2:03 PM Luke Chen <show...@gmail.com> wrote:
>> >
>> > > Hi Kamal,
>> > >
>> > > Thanks for the update.
>> > > LC1: I see. Thanks.
>> > > LC2: What I still don't understand is what is the relationship between
>> > > remote.list.offsets.request.timeout.ms V.S.
>> > > request.timeout/default.api.timeout.
>> > > Suppose we set request timeout to 30 seconds, default.api.timeout=5
>> mins
>> > > and remote.list.offsets.request.timeout.ms = 6 mins.
>> > > So, when Admin sends a list offset request that needs to query remote
>> > > storage, when will it throw timeout exception? 30 secs or 5 mins or 6
>> > mins?
>> > > We might need to make it clear in the KIP.
>> > >
>> > > LC3:
>> > > "Admin sends only one request and wait for upto default-api-timeout.
>> (eg)
>> > > If the admin is configured with default-api-timeout as 5 mins and
>> > > request-timeout as 30 seconds. And, the server takes 50 seconds to
>> > process
>> > > the LIST_OFFSETS request, then the admin sends only one LIST_OFFSETS
>> > > request, then receives the request from server after 50 seconds."
>> > >
>> > > In the end of the section, it should be receives the "response" from
>> > > server?
>> > >
>> > > LC4: I found the different consumer and admin behavior when setting
>> > > "request.timeout" and "default.api.timeout" is confusing. Are they
>> > expected
>> > > or a bug?
>> > >
>> > > Thank you.
>> > > Luke
>> > >
>> > >
>> > > On Thu, Aug 8, 2024 at 10:06 PM Kamal Chandraprakash <
>> > > kamal.chandraprak...@gmail.com> wrote:
>> > >
>> > > > Hi Luke,
>> > > >
>> > > > Thanks for the review!
>> > > >
>> > > > LC1: When the consumer starts to read data, then it might need the
>> > below
>> > > > offsets:
>> > > > earliest, latest, and last-committed-offset based on the
>> > > > "auto.offset.reset" config.
>> > > >
>> > > > The earliest and latest offsets have special timestamps -2 and -1,
>> > those
>> > > > timestamp corresponding offsets are cached in the
>> > > > broker memory and get served immediately. The last-committed-offset
>> is
>> > > also
>> > > > cached in the GroupMetadata and
>> > > > gets served in the "OFFSET_FETCH" request. Unless the consumer
>> > > > explicitly uses the KafkaConsumer#offsetForTimes API,
>> > > > there won't be any delay in serving the data from the local log.
>> > > >
>> > > > In this KIP, we are trying to address the case in which multiple
>> > > consumers
>> > > > start at the same time and use the 'offsetForTimes` LIST_OFFSETS
>> API,
>> > > > assuming the remote requests are slow, then it should not block
>> other
>> > > > PRODUCE/FETCH requests.
>> > > >
>> > > > LC2: Sorry for the confusion. We were planning to introduce only the
>> > > broker
>> > > > config "remote.list.offsets.timeout.ms".
>> > > > If we add the timeout in the ListOffsetsRequest.json, then when old
>> > > clients
>> > > > talk with the new broker, we don't have
>> > > > a timeout to set on the server. Moved adding the timeout to the
>> > > > ListOffsetsRequest to the rejected alternatives section.
>> > > >
>> > > > --
>> > > > Kamal
>> > > >
>> > >
>> >
>>
>

Reply via email to