[VOTE] KIP-930: Tiered Storage Metrics

2023-07-24 Thread Abhijeet Kumar
Hi All,

I would like to start the vote for KIP-930 Tiered Storage Metrics.

The KIP is here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-930%3A+Tiered+Storage+Metrics

Regards
Abhijeet.


Re: [DISCUSS] KIP-930: Tiered Storage Metrics

2023-07-24 Thread Abhijeet Kumar
Hi Kamal,

As we discussed offline, I will rename this KIP so that it only captures
the aspect of renaming the previously added metrics to remove ambiguity.
I will create another KIP for RemoteIndexCache metrics and other relevant
tiered storage metrics.

On Tue, Jul 25, 2023 at 12:03 PM Kamal Chandraprakash <
kamal.chandraprak...@gmail.com> wrote:

> Hi Abhijeet,
>
> Thanks for the KIP!
>
> We are changing the metric names from what was proposed in KIP-405 and
> adding new metrics for RemoteIndexCache.
> In the KIP, it's not clear whether we are renaming the aggregate
> broker-level metrics for remote copy/fetch/failed-copy/failed-fetch.
>
> Are these metrics enough to monitor all the aspects of tiered storage?
>
> (e.g.)
> 1. Metrics to show the tier lag status by the number of pending
> segments/records.
> 2. Similar to the log-start-offset and log-end-offset metrics, should we
> expose local-log-start-offset and highest-offset-uploaded-to-remote-storage
> as metrics?
>
> Thanks,
> Kamal
>
> On Mon, Jul 24, 2023 at 2:08 PM Abhijeet Kumar  >
> wrote:
>
> > Hi All,
> >
> > I created KIP-930 for adding RemoteIndexCache stats and also to rename
> some
> > tiered storage metrics added as part of KIP-405
> > <
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage#KIP405:KafkaTieredStorage-NewMetrics
> > >
> > to remove ambiguity.
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-930%3A+Tiered+Storage+Metrics
> >
> > Feedback and suggestions are welcome.
> >
> > Regards,
> > Abhijeet.
> >
>


-- 
Abhijeet.


Re: [VOTE] KIP-930: Tiered Storage Metrics

2023-07-25 Thread Abhijeet Kumar
Please find the updated link to the KIP:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-930%3A+Rename+ambiguous+Tiered+Storage+Metrics

Updated the KIP as per our conversation on the discussion thread.

On Tue, Jul 25, 2023 at 11:29 AM Abhijeet Kumar 
wrote:

> Hi All,
>
> I would like to start the vote for KIP-930 Tiered Storage Metrics.
>
> The KIP is here:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-930%3A+Tiered+Storage+Metrics
>
> Regards
> Abhijeet.
>
>

-- 
Abhijeet.


Re: [VOTE] KIP-930: Tiered Storage Metrics

2023-07-30 Thread Abhijeet Kumar
Hi All,

This KIP is accepted with 3 +1 binding votes (Luke, Satish, Divij) and
2 +1 non-binding votes (Jorge, Kamal).

Thank you all for voting.


On Mon, Jul 31, 2023 at 10:53 AM Satish Duggana 
wrote:

> Thanks for voting on the KIP.
>
> I agree we should raise the KIPs early so that we can have a longer
> duration for other people to take a look.
>
> It seems whoever was involved in the earlier discussions for this
> change commented and voted. It is a minor KIP to rename existing
> metrics, and it has been there for ~a week. We can close this with
> vote results to be included in 3.6.0.
>
> ~Satish.
>
> On Tue, 25 Jul 2023 at 23:17, Divij Vaidya 
> wrote:
> >
> > Thank you for the KIP Abhijeet. Although we should avoid changing
> > customer-facing interfaces (such as metrics) after a KIP is accepted,
> > in this case, I think that the divergence is minimal and the right
> > thing to do in the longer run. Hence, I would consider this change as
> > a one-off exception and not a precedent for future changes.
> >
> > +1 (binding) from me.
> >
> > Also, I think we should leave the vote open longer for some duration
> > (at least 2 weeks) to give an opportunity for folks in the community
> > to add any thoughts that they might have. The KIP has been published
> > for only 1 day so far and interested folks may not have had a chance
> > to look into it yet.
> >
> > --
> > Divij Vaidya
> >
> >
> >
> > On Tue, Jul 25, 2023 at 6:45 PM Satish Duggana 
> wrote:
> > >
> > > +1 for the KIP.
> > >
> > > Thanks,
> > > Satish.
> > >
> > > On Tue, 25 Jul 2023 at 18:31, Kamal Chandraprakash
> > >  wrote:
> > > >
> > > > +1 (non-binding)
> > > >
> > > > --
> > > > Kamal
> > > >
> > > > On Tue, Jul 25, 2023 at 11:30 AM Abhijeet Kumar <
> abhijeet.cse@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I would like to start the vote for KIP-930 Tiered Storage Metrics.
> > > > >
> > > > > The KIP is here:
> > > > >
> > > > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-930%3A+Tiered+Storage+Metrics
> > > > >
> > > > > Regards
> > > > > Abhijeet.
> > > > >
>


-- 
Abhijeet.


[DISCUSS] KIP-956: Tiered Storage Quotas

2023-11-22 Thread Abhijeet Kumar
Hi All,

I have created KIP-956 for defining read and write quota for tiered storage.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-956+Tiered+Storage+Quotas

Feedback and suggestions are welcome.

Regards,
Abhijeet.


Re: [VOTE] KIP-1023: Follower fetch from tiered offset

2024-04-28 Thread Abhijeet Kumar
Hi All,

This KIP is accepted with 4 +1 binding votes (Luke, Christo, Jun, Satish)
and 2 +1 non-binding votes (Kamal, Omnia).

Thank you all for voting.

Abhijeet.



On Sat, Apr 27, 2024 at 5:50 AM Satish Duggana 
wrote:

> Thanks Abhijeet for the KIP.
> +1 from me.
>
> ~Satish
>
> On Fri, 26 Apr 2024 at 8:35 PM, Jun Rao  wrote:
>
> > Hi, Abhijeet,
> >
> > Thanks for the KIP. +1
> >
> > Jun
> >
> > On Thu, Apr 25, 2024 at 10:30 PM Abhijeet Kumar <
> > abhijeet.cse@gmail.com>
> > wrote:
> >
> > > Hi All,
> > >
> > > I would like to start the vote for KIP-1023 - Follower fetch from
> tiered
> > > offset
> > >
> > > The KIP is here:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1023%3A+Follower+fetch+from+tiered+offset
> > >
> > > Regards.
> > > Abhijeet.
> > >
> >
>


Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-07-23 Thread Abhijeet Kumar
Hi Divij,

Seems like there is some confusion about the new protocol for fetching from
tiered offset.
The scenario you are highlighting is where,
Leader's Log Start Offset = 0
Last Tiered Offset = 10

Following is the sequence of events that will happen:

1. Follower requests offset 0 from the leader
2. Assuming offset 0 is not available locally (to arrive at your scenario),
Leader returns OffsetMovedToTieredStorageException
3. Follower fetches the earliest pending upload offset and receives 11
4. Follower builds aux state from [0-10] and sets the fetch offset to 11
(This step corresponds to step 3 in your email)

At this stage, even if the leader has uploaded more data and the
last-tiered offset has changed (say to 15), it will not matter
because offset 11 should still be available on the leader and when the
follower requests data with fetch offset 11, the leader
will return with a valid partition data response which the follower can
consume and proceed further. Even if the offset 11 is not
available anymore, the follower will eventually be able to catch up with
the leader by resetting its fetch offset until the offset
is available on the leader's local log. Once it catches up, replication on
the follower can proceed.

Regards,
Abhijeet.



On Tue, Jul 2, 2024 at 3:55 PM Divij Vaidya  wrote:

> Hi folks.
>
> I am late to the party but I have a question on the proposal.
>
> How are we preventing a situation such as the following:
>
> 1. Empty follower asks leader for 0
> 2. Leader compares 0 with last-tiered-offset, and responds with 11 (where 10
> is last-tiered-offset) and an OffsetMovedToTieredStorageException
> 3. Follower builds aux state from [0-10] and sets the fetch offset to 11
> 4. But leader has already uploaded more data and now the new
> last-tiered-offset is 15
> 5. Go back to 2
>
> This could cause a cycle where the replica will be stuck trying to
> reconcile with the leader.
>
> --
> Divij Vaidya
>
>
>
> On Fri, Apr 26, 2024 at 7:28 AM Abhijeet Kumar  >
> wrote:
>
> > Thank you all for your comments. As all the comments in the thread are
> > addressed, I am starting a Vote thread for the KIP. Please have a look.
> >
> > Regards.
> > Abhijeet.
> >
> >
> >
> > On Thu, Apr 25, 2024 at 6:08 PM Luke Chen  wrote:
> >
> > > Hi, Abhijeet,
> > >
> > > Thanks for the update.
> > >
> > > I have no more comments.
> > >
> > > Luke
> > >
> > > On Thu, Apr 25, 2024 at 4:21 AM Jun Rao 
> > wrote:
> > >
> > > > Hi, Abhijeet,
> > > >
> > > > Thanks for the updated KIP. It looks good to me.
> > > >
> > > > Jun
> > > >
> > > > On Mon, Apr 22, 2024 at 12:08 PM Abhijeet Kumar <
> > > > abhijeet.cse@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > Please find my comments inline.
> > > > >
> > > > >
> > > > > On Thu, Apr 18, 2024 at 3:26 AM Jun Rao 
> > > > wrote:
> > > > >
> > > > > > Hi, Abhijeet,
> > > > > >
> > > > > > Thanks for the reply.
> > > > > >
> > > > > > 1. I am wondering if we could achieve the same result by just
> > > lowering
> > > > > > local.retention.ms and local.retention.bytes. This also allows
> the
> > > > newly
> > > > > > started follower to build up the local data before serving the
> > > consumer
> > > > > > traffic.
> > > > > >
> > > > >
> > > > > I am not sure I fully followed this. Do you mean we could lower the
> > > > > local.retention (by size and time)
> > > > > so that there is little data on the leader's local storage so that
> > the
> > > > > follower can quickly catch up with the leader?
> > > > >
> > > > > In that case, we will need to set small local retention across
> > brokers
> > > in
> > > > > the cluster. It will have the undesired
> > > > > effect where there will be increased remote log fetches for serving
> > > > consume
> > > > > requests, and this can cause
> > > > > degradations. Also, this behaviour (of increased remote fetches)
> will
> > > > > happen on all brokers at all times, whereas in
> > > > > the KIP we are restricting the behavior only to the newly
> > bootstrapped
> > > brokers and only until the time it fully catches up.

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-07-24 Thread Abhijeet Kumar
Hi Divij,

The rare scenario we are discussing is similar to an empty follower trying
to catch up with the leader for a topic that is not enabled with tiered
storage. Consider the following steps:

1. Follower requests offset 0 from the leader.
2. Offset 0 is no longer valid on the leader, as its log start offset is 10;
hence the leader throws an Out of Range error.
3. Follower fetches the earliest offset from the leader, gets 10, and
resets its fetch offset to 10.
4. Follower requests offset 10 from the leader, but offset 10 has since been
deleted on the leader and the new log start offset is 15. Hence the leader
throws an Out of Range error, and the follower goes back to step 3.

Even in this scenario, the follower may, in theory, never be able to catch
up. Since this is an existing problem that affects even regular replication
for topics without tiered storage, should we take it up separately?
I can add a small note in the KIP saying that this behavior for follower
catch-up is similar to the scenario when tiered storage is not enabled.
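For illustration, here is a toy simulation of that reset loop (invented
names, not broker code): the follower resets to the leader's log start
offset on every out-of-range error, so it catches up exactly when the log
start offset stops outrunning its retries.

```java
// Hypothetical sketch of the non-tiered out-of-range handling in steps 1-4.
public class OutOfRangeLoopSketch {

    /**
     * leaderLogStartOffsets[i] is the leader's log start offset observed on
     * the follower's i-th fetch attempt. Returns the attempt index on which
     * the follower's fetch offset was finally valid, or -1 if it never was
     * within the simulated attempts.
     */
    static int attemptsUntilCaughtUp(long[] leaderLogStartOffsets, long initialFetchOffset) {
        long fetchOffset = initialFetchOffset;
        for (int i = 0; i < leaderLogStartOffsets.length; i++) {
            if (fetchOffset >= leaderLogStartOffsets[i]) {
                return i;                           // fetch offset valid: replication proceeds
            }
            fetchOffset = leaderLogStartOffsets[i]; // out-of-range: reset to leader's LSO
        }
        return -1; // kept chasing a moving log start offset
    }

    public static void main(String[] args) {
        // Ask for 0, LSO=10 -> reset to 10; LSO moves to 15 -> reset to 15;
        // LSO unchanged on the next attempt -> caught up on attempt 2.
        System.out.println(attemptsUntilCaughtUp(new long[]{10, 15, 15}, 0)); // 2
    }
}
```

If the log start offset advances past the fetch offset on every attempt
(e.g. {10, 15, 20, 25}), the sketch returns -1: the theoretical
never-catch-up loop, which in practice requires very aggressive cleanup.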

Regards,
Abhijeet.



On Tue, Jul 23, 2024 at 4:49 PM Divij Vaidya 
wrote:

> Thank you for your response Abhijeet. You have understood the scenario
> correctly. For the purpose of discussion, please consider the latter case
> where offset 11 is not available on the leader anymore (it got cleaned
> locally since the last tiered offset is 15). In such a case, you
> mentioned, the follower will eventually be able to catch up with the leader
> by resetting its fetch offset until the offset is available on the leader's
> local log. Correct me if I am wrong but it is not guaranteed that it will
> eventually catch up because theoretically, every time it asks for a newer
> fetch offset, the leader may have deleted it locally. I understand that it
> is an edge case scenario which will only happen with configurations for
> small segment sizes and aggressive cleaning but nevertheless, it is a
> possible scenario.
>
> Do you agree that theoretically it is possible for the follower to loop
> such that it is never able to catch up?
>
> We can proceed with the KIP with an understanding that this scenario is
> rare and we are willing to accept the risk of it. In such a case, we should
> add a detection mechanism for such a scenario in the KIP, so that if we
> encounter this scenario, the user has a way to detect (and mitigate it).
> Alternatively, we can change the KIP design to ensure that we never
> encounter this scenario. Given the rarity of the scenario, I am ok with
> having a detection mechanism (metric?) in place and having this scenario
> documented as an acceptable risk in current design.
>
> --
> Divij Vaidya
>
>
>
> On Tue, Jul 23, 2024 at 11:55 AM Abhijeet Kumar <
> abhijeet.cse@gmail.com>
> wrote:
>
> > Hi Divij,
> >
> > Seems like there is some confusion about the new protocol for fetching
> from
> > tiered offset.
> > The scenario you are highlighting is where,
> > Leader's Log Start Offset = 0
> > Last Tiered Offset = 10
> >
> > Following is the sequence of events that will happen:
> >
> > 1. Follower requests offset 0 from the leader
> > 2. Assuming offset 0 is not available locally (to arrive at your
> scenario),
> > Leader returns OffsetMovedToTieredStorageException
> > 3. Follower fetches the earliest pending upload offset and receives 11
> > 4. Follower builds aux state from [0-10] and sets the fetch offset to 11
> > (This step corresponds to step 3 in your email)
> >
> > At this stage, even if the leader has uploaded more data and the
> > last-tiered offset has changed (say to 15), it will not matter
> > because offset 11 should still be available on the leader and when the
> > follower requests data with fetch offset 11, the leader
> > will return with a valid partition data response which the follower can
> > consume and proceed further. Even if the offset 11 is not
> > available anymore, the follower will eventually be able to catch up with
> > the leader by resetting its fetch offset until the offset
> > is available on the leader's local log. Once it catches up, replication
> on
> > the follower can proceed.
> >
> > Regards,
> > Abhijeet.
> >
> >
> >
> > On Tue, Jul 2, 2024 at 3:55 PM Divij Vaidya 
> > wrote:
> >
> > > Hi folks.
> > >
> > > I am late to the party but I have a question on the proposal.
> > >
> > > How are we preventing a situation such as the following:
> > >
> > > 1. Empty follower asks leader for 0
> > > 2. Leader compares 0 with last-tiered-offset, and responds with 11
> > (where 10
> > > is last-tiered

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-07-25 Thread Abhijeet Kumar
>
> I am ok with adding a note in the KIP but the note should say that it has
> an elevated risk for this scenario due to increased probability of having
> an aggressive local cleanup with Tiered Storage.
>

I would like to clarify that there is no elevated risk because of this KIP.
This risk already exists with the current protocol, where an empty follower
for a tiered-storage-enabled topic catches up with the leader by replicating
data beginning from the leader's local log start offset.

With this KIP, the follower instead starts from the last tiered offset,
which is usually well ahead of the leader's local log start offset and can
never be behind it. Hence, in this rare scenario, that offset is more likely
to be available in the leader's local log than the local log start offset
is.

Therefore, the risk is actually lower when the follower starts with the
last tiered offset instead of the local log start offset.

Abhijeet.



On Wed, Jul 24, 2024 at 7:26 PM Divij Vaidya 
wrote:

> The difference between the two scenarios you mentioned is that with Tiered
> Storage, the chances of hitting this scenario increases since a user is
> likely to have an aggressive setting for local disk data cleanup, which
> would not be the case in empty followers catching up in a non-tiered
> storage world.
>
> I am ok with adding a note in the KIP but the note should say that it has
> an elevated risk for this scenario due to increased probability of having
> an aggressive local cleanup with Tiered Storage.
>
> --
> Divij Vaidya
>
>
>
> On Wed, Jul 24, 2024 at 1:22 PM Abhijeet Kumar  >
> wrote:
>
> > Hi Divij,
> >
> > The rare scenario we are discussing is similar to an empty follower
> trying
> > to catch up with the leader for a topic that is not enabled with tiered
> > storage. Consider the following steps:
> >
> > 1. Follower requests offset 0 from the leader.
> > 2. Offset 0 is no more valid on the leader as its log start offset is 10,
> > hence leader throws Out of Range error
> > 3. Follower fetches the earliest offset from the leader and gets 10, then
> > resets its Fetch offset to 10.
> > 4. Follower requests offset 10 from the leader, but the previous log
> start
> > offset (10) is deleted from the leader and the new log start offset is
> 15.
> > Hence the leader throws an Out of Range error. The follower goes back to
> > step 3
> >
> > Even in this scenario, theoretically, the follower will never be able to
> > catch up. Since this is an existing problem, that affects even regular
> > replication for topics without tiered storage, should we take this up
> > separately?
> > I can add a small note in the KIP saying that this behavior for follower
> > catchup is similar to a scenario when tiered storage is not enabled.
> >
> > Regards,
> > Abhijeet.
> >
> >
> >
> > On Tue, Jul 23, 2024 at 4:49 PM Divij Vaidya 
> > wrote:
> >
> > > Thank you for your response Abhijeet. You have understood the scenario
> > > correctly. For the purpose of discussion, please consider the latter
> case
> > > where offset 11 is not available on the leader anymore (it got cleaned
> > > locally since the last tiered offset is 15). In such a case, you
> > > mentioned, the follower will eventually be able to catch up with the
> > leader
> > > by resetting its fetch offset until the offset is available on the
> > leader's
> > > local log. Correct me if I am wrong but it is not guaranteed that it
> will
> > > eventually catch up because theoretically, every time it asks for a
> newer
> > > fetch offset, the leader may have deleted it locally. I understand that
> > it
> > > is an edge case scenario which will only happen with configurations for
> > > small segment sizes and aggressive cleaning but nevertheless, it is a
> > > possible scenario.
> > >
> > > Do you agree that theoretically it is possible for the follower to loop
> > > such that it is never able to catch up?
> > >
> > > We can proceed with the KIP with an understanding that this scenario is
> > > rare and we are willing to accept the risk of it. In such a case, we
> > should
> > > add a detection mechanism for such a scenario in the KIP, so that if we
> > > encounter this scenario, the user has a way to detect (and mitigate
> it).
> > > Alternatively, we can change the KIP design to ensure that we never
> > > encounter this scenario. Given the rarity of the scenario, I am ok with
> > > having a detection mechanism (metric?) in place

Re: [DISCUSS] KIP-956: Tiered Storage Quotas

2024-02-02 Thread Abhijeet Kumar
Yes, it is part of V1. I have added it now.

I will follow up on the KIP comments in the next couple of days and
try to close the review in the next few weeks.

Regards.

On Thu, Feb 1, 2024 at 7:40 PM Francois Visconte
 wrote:
>
> Hi,
>
> I see that the ticket has been left untouched for a while now.
> Should it be included in the tiered storage v1?
> We've observed that lacking a way to throttle uploads to tiered storage has
> a major impact on
> producers and consumers when tiered storage access recovers (starving disk
> IOps/throughput or CPU).
> For this reason, I think this is an important feature and possibly worth
> including in v1?
>
> Regards,
>
>
> On Tue, Dec 5, 2023 at 8:43 PM Jun Rao  wrote:
>
> > Hi, Abhijeet,
> >
> > Thanks for the KIP. A few comments.
> >
> > 10. remote.log.manager.write.quota.default:
> > 10.1 For other configs, we
> > use replica.alter.log.dirs.io.max.bytes.per.second. To be consistent,
> > perhaps this can be sth like remote.log.manager.write.max.bytes.per.second.
> > 10.2 Could we list the new metrics associated with the new quota.
> > 10.3 Is this dynamically configurable? If so, could we document the impact
> > to tools like kafka-configs.sh and AdminClient?
> >
> > Jun
> >
> > On Tue, Nov 28, 2023 at 2:19 AM Luke Chen  wrote:
> >
> > > Hi Abhijeet,
> > >
> > > Thanks for the KIP!
> > > This is an important feature for tiered storage.
> > >
> > > Some comments:
> > > 1. Will we introduce new metrics for this tiered storage quotas?
> > > This is important because the admin can know the throttling status by
> > > checking the metrics while the remote write/read are slow, like the rate
> > of
> > > uploading/reading byte rate, the throttled time for upload/read... etc.
> > >
> > > 2. Could you give some examples for the throttling algorithm in the KIP
> > to
> > > explain it? That will make it much clearer.
> > >
> > > 3. To solve this problem, we can break down the RLMTask into two smaller
> > > tasks - one for segment upload and the other for handling expired
> > segments.
> > > How do we handle the situation when a segment is still waiting for
> > > offloading while this segment is expired and eligible to be deleted?
> > > Maybe it'll be easier to not block the RLMTask when quota exceeded, and
> > > just check it each time the RLMTask runs?
> > >
> > > Thank you.
> > > Luke
> > >
> > > On Wed, Nov 22, 2023 at 6:27 PM Abhijeet Kumar <
> > abhijeet.cse@gmail.com
> > > >
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > I have created KIP-956 for defining read and write quota for tiered
> > > > storage.
> > > >
> > > >
> > > >
> > >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-956+Tiered+Storage+Quotas
> > > >
> > > > Feedback and suggestions are welcome.
> > > >
> > > > Regards,
> > > > Abhijeet.
> > > >
> > >
> >



-- 
Abhijeet.


Re: [DISCUSS] KIP-956: Tiered Storage Quotas

2024-02-12 Thread Abhijeet Kumar
> 3. To solve this problem, we can break down the RLMTask into two smaller
> tasks - one for segment upload and the other for handling expired segments.
> How do we handle the situation when a segment is still waiting for
> offloading while this segment is expired and eligible to be deleted?
> Maybe it'll be easier to not block the RLMTask when quota exceeded, and
> just check it each time the RLMTask runs?
>

The concern here is that this may cause starvation for some topic
partitions. RLMTasks are added to the thread pool executor to run on a
specific schedule (every 30 secs). If we do not block the RLMTask when the
quota is exceeded and instead skip the run, the RLMTask for certain topic
partitions may never run, because every time it is scheduled to run, the
broker-level upload quota at that instant may have already been exhausted.

Breaking down the RLMTask into smaller tasks is already being proposed in
KIP-950. The two tasks will need some coordination at a topic-partition
level to handle cases like the one pointed out. For this particular case,
the upload task could skip uploading the segment if it is already eligible
for deletion.
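To illustrate the trade-off, here is a toy sketch (invented names, not
Kafka code) where a blocked upload eventually proceeds once the byte quota
refills, whereas a skipped run would have to wait for the next 30-second
schedule and might find the quota already exhausted again.

```java
// Illustrative sketch: blocking the upload on the quota versus skipping the
// run. A simple token bucket stands in for the broker-level upload quota;
// the clock is passed in explicitly so the behaviour is deterministic.
public class UploadQuotaSketch {

    /** A toy byte-rate quota: refills once per second up to bytesPerSec. */
    static class ByteQuota {
        private final long bytesPerSec;
        private long available;
        private long lastRefillMs;

        ByteQuota(long bytesPerSec, long nowMs) {
            this.bytesPerSec = bytesPerSec;
            this.available = bytesPerSec;
            this.lastRefillMs = nowMs;
        }

        synchronized boolean tryAcquire(long bytes, long nowMs) {
            long elapsedSec = (nowMs - lastRefillMs) / 1000;
            if (elapsedSec > 0) {
                available = Math.min(bytesPerSec, available + elapsedSec * bytesPerSec);
                lastRefillMs = nowMs;
            }
            if (bytes <= available) {
                available -= bytes;
                return true;
            }
            return false;
        }
    }

    /**
     * Blocking strategy: poll (once per simulated second here) until the
     * segment's bytes fit within the quota, then upload. Returns the
     * simulated wait in millis before the upload could proceed.
     */
    static long waitedMsBeforeUpload(ByteQuota quota, long segmentBytes, long startMs) {
        long now = startMs;
        while (!quota.tryAcquire(segmentBytes, now)) {
            now += 1000; // simulated wait for the bucket to refill
        }
        return now - startMs;
    }

    public static void main(String[] args) {
        ByteQuota quota = new ByteQuota(100, 0);
        quota.tryAcquire(100, 0); // quota already exhausted at t=0
        System.out.println(waitedMsBeforeUpload(quota, 100, 0)); // 1000
    }
}
```

With blocking, the wait is bounded by the refill rate; with skipping, an
unlucky partition can find the quota exhausted on every scheduled attempt,
which is the starvation case described above.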


> Thank you.
> Luke
>
> On Wed, Nov 22, 2023 at 6:27 PM Abhijeet Kumar 
> wrote:
>
> > Hi All,
> >
> > I have created KIP-956 for defining read and write quota for tiered
> > storage.
> >
> >
> >
https://cwiki.apache.org/confluence/display/KAFKA/KIP-956+Tiered+Storage+Quotas
> >
> > Feedback and suggestions are welcome.
> >
> > Regards,
> > Abhijeet.
> >



-- 
Abhijeet.


Re: [DISCUSS] KIP-956: Tiered Storage Quotas

2024-02-12 Thread Abhijeet Kumar
Comments inline

On Wed, Dec 6, 2023 at 1:12 AM Jun Rao  wrote:

> Hi, Abhijeet,
>
> Thanks for the KIP. A few comments.
>
> 10. remote.log.manager.write.quota.default:
> 10.1 For other configs, we
> use replica.alter.log.dirs.io.max.bytes.per.second. To be consistent,
> perhaps this can be sth like remote.log.manager.write.max.bytes.per.second.
>

This makes sense; we can rename the following configs to be consistent:

remote.log.manager.write.quota.default ->
remote.log.manager.write.max.bytes.per.second

remote.log.manager.read.quota.default ->
remote.log.manager.read.max.bytes.per.second



> 10.2 Could we list the new metrics associated with the new quota.
>

We will add the following metrics, as mentioned in the other response:
*RemoteFetchThrottleTime* - The amount of time needed to bring the observed
remote fetch rate within the read quota.
*RemoteCopyThrottleTime* - The amount of time needed to bring the observed
remote copy rate within the copy quota.

10.3 Is this dynamically configurable? If so, could we document the impact
> to tools like kafka-configs.sh and AdminClient?
>

Yes, the quotas are dynamically configurable. We will add them as Dynamic
Broker Configs. Users will be able to change the following configs using
either kafka-configs.sh or AdminClient by specifying the config name and
new value. For example:

Using kafka-configs.sh:

bin/kafka-configs.sh --bootstrap-server  --entity-type
brokers --entity-default --alter --add-config
remote.log.manager.write.max.bytes.per.second=52428800

Using AdminClient:

ConfigEntry configEntry = new
ConfigEntry("remote.log.manager.write.max.bytes.per.second", "52428800");
AlterConfigOp alterConfigOp = new AlterConfigOp(configEntry,
AlterConfigOp.OpType.SET);
List<AlterConfigOp> alterConfigOps =
Collections.singletonList(alterConfigOp);

ConfigResource resource = new ConfigResource(ConfigResource.Type.BROKER,
"");
Map<ConfigResource, Collection<AlterConfigOp>> updateConfig =
Collections.singletonMap(resource, alterConfigOps);
adminClient.incrementalAlterConfigs(updateConfig);


>
> Jun
>
> On Tue, Nov 28, 2023 at 2:19 AM Luke Chen  wrote:
>
> > Hi Abhijeet,
> >
> > Thanks for the KIP!
> > This is an important feature for tiered storage.
> >
> > Some comments:
> > 1. Will we introduce new metrics for this tiered storage quotas?
> > This is important because the admin can know the throttling status by
> > checking the metrics while the remote write/read are slow, like the rate
> of
> > uploading/reading byte rate, the throttled time for upload/read... etc.
> >
> > 2. Could you give some examples for the throttling algorithm in the KIP
> to
> > explain it? That will make it much clearer.
> >
> > 3. To solve this problem, we can break down the RLMTask into two smaller
> > tasks - one for segment upload and the other for handling expired
> segments.
> > How do we handle the situation when a segment is still waiting for
> > offloading while this segment is expired and eligible to be deleted?
> > Maybe it'll be easier to not block the RLMTask when quota exceeded, and
> > just check it each time the RLMTask runs?
> >
> > Thank you.
> > Luke
> >
> > On Wed, Nov 22, 2023 at 6:27 PM Abhijeet Kumar <
> abhijeet.cse@gmail.com
> > >
> > wrote:
> >
> > > Hi All,
> > >
> > > I have created KIP-956 for defining read and write quota for tiered
> > > storage.
> > >
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-956+Tiered+Storage+Quotas
> > >
> > > Feedback and suggestions are welcome.
> > >
> > > Regards,
> > > Abhijeet.
> > >
> >
>


-- 
Abhijeet.


Re: [DISCUSS] KIP-956: Tiered Storage Quotas

2024-02-27 Thread Abhijeet Kumar
Hi Jun,

The existing quota system for consumers is designed to throttle the
consumer if it exceeds the allowed fetch rate.
The additional quota we want to add works on the broker level. If the
broker-level remote read quota is being
exceeded, we prevent additional reads from the remote storage but do not
prevent local reads for the consumer.
If the consumer has specified other partitions to read, which can be served
from local, it can continue to read those
partitions. To elaborate: we check whether the quota is exceeded only when
we know a segment needs to be read from remote storage. If the quota is
exceeded, we simply skip that partition and move on to the other partitions
in the fetch request.
This way consumers can continue to read the local data as long as they have
not exceeded the client-level quota.
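A minimal sketch of that skip-on-quota behavior (names like PartitionData
and serveFetch are invented for illustration; this is not the actual
fetch-path code):

```java
// Hypothetical sketch: when the broker-level remote read quota is exhausted,
// remote-only partitions are skipped for this fetch while local partitions
// are still served.
import java.util.ArrayList;
import java.util.List;

public class RemoteFetchQuotaSketch {

    // Whether serving this partition's data would require a remote read.
    record PartitionData(String topicPartition, boolean servedFromRemote) {}

    /** Returns the partitions actually served in this fetch response. */
    static List<String> serveFetch(List<PartitionData> requested, boolean remoteQuotaExceeded) {
        List<String> served = new ArrayList<>();
        for (PartitionData p : requested) {
            if (p.servedFromRemote() && remoteQuotaExceeded) {
                continue; // skip this partition: remote read quota exhausted
            }
            served.add(p.topicPartition()); // local reads (or remote within quota) proceed
        }
        return served;
    }

    public static void main(String[] args) {
        List<PartitionData> req = List.of(
            new PartitionData("t0-0", true),   // needs remote storage
            new PartitionData("t0-1", false)); // served from the local log
        System.out.println(serveFetch(req, true)); // [t0-1]
    }
}
```

The point of the sketch is the asymmetry: exceeding the broker-level remote
quota never blocks local reads, unlike the client-level consumer quota.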

Also, when we choose the appropriate consumer-level quota, we would
typically look at what kind of local fetch
throughput is supported. If we were to reuse the same consumer quota, we
should also consider the throughput
the remote storage supports. The throughput supported by remote may be
more or less than the throughput supported
by local storage (when using a cloud provider, it depends on the plan chosen
by the user). The consumer quota has to be carefully
set considering both local and remote throughput. Instead, if we have a
separate quota, it makes things much simpler
for the user, since they already know what throughput their remote storage
supports.

(Also, thanks for pointing that out. I will update the KIP based on the
discussion.)

Regards,
Abhijeet.

On Tue, Feb 27, 2024 at 2:49 AM Jun Rao  wrote:

> Hi, Abhijeet,
>
> Sorry for the late reply. It seems that you haven't updated the KIP based
> on the discussion? One more comment.
>
> 11. Currently, we already have a quota system for both the producers and
> consumers. I can understand why we need an additional
> remote.log.manager.write.quota.default quota. For example, when tier
> storage is enabled for the first time, there could be a lot of segments
> that need to be written to the remote storage, even though there is no
> increase in the produced data. However, I am not sure about an
> additional remote.log.manager.read.quota.default. The KIP says that the
> reason is "This may happen when the majority of the consumers start reading
> from the earliest offset of their respective Kafka topics.". However, this
> can happen with or without tier storage and the current quota system for
> consumers is designed for solving this exact problem. Could you explain the
> usage of this additional quota?
>
> Thanks,
>
> Jun
>
> On Mon, Feb 12, 2024 at 11:08 AM Abhijeet Kumar <
> abhijeet.cse@gmail.com>
> wrote:
>
> > Comments inline
> >
> > On Wed, Dec 6, 2023 at 1:12 AM Jun Rao  wrote:
> >
> > > Hi, Abhijeet,
> > >
> > > Thanks for the KIP. A few comments.
> > >
> > > 10. remote.log.manager.write.quota.default:
> > > 10.1 For other configs, we
> > > use replica.alter.log.dirs.io.max.bytes.per.second. To be consistent,
> > > perhaps this can be sth like
> > remote.log.manager.write.max.bytes.per.second.
> > >
> >
> > This makes sense, we can rename the following configs to be consistent.
> >
> > Remote.log.manager.write.quota.default ->
> > remote.log.manager.write.max.bytes.per.second
> >
> > Remote.log.manager.read.quota.default ->
> > remote.log.manager.read.max.bytes.per.second.
> >
> >
> >
> > > 10.2 Could we list the new metrics associated with the new quota.
> > >
> >
> > We will add the following metrics as mentioned in the other response.
> > *RemoteFetchThrottleTime* - The amount of time needed to bring the
> observed
> > remote fetch rate within the read quota
> > *RemoteCopyThrottleTime *- The amount of time needed to bring the
> observed
> > remote copy rate with the copy quota.
> >
> > 10.3 Is this dynamically configurable? If so, could we document the
> impact
> > > to tools like kafka-configs.sh and AdminClient?
> > >
> >
> > Yes, the quotas are dynamically configurable. We will add them as Dynamic
> > Broker Configs. Users will be able to change
> > the following configs using either kafka-configs.sh or AdminClient by
> > specifying the config name and new value. For eg.
> >
> > Using kafka-configs.sh
> >
> > bin/kafka-configs.sh --bootstrap-server  --entity-type
> > brokers --entity-default --alter --add-config
> > remote.log.manager.write.max.bytes.per.second=52428800
> >
> > Using AdminClient
> >
> > ConfigEntry configEntry = new
> > ConfigEntry("remote.log.manager.write.max.bytes.per.second",

Re: [DISCUSS] KIP-956: Tiered Storage Quotas

2024-02-28 Thread Abhijeet Kumar
Hi Jun,

Clarified the meaning of the two metrics. Also updated the KIP.

kafka.log.remote:type=RemoteLogManager, name=RemoteFetchThrottleTime -> The
duration of time required at a given moment to bring the observed fetch
rate within the allowed limit, by preventing further reads.
kafka.log.remote:type=RemoteLogManager, name=RemoteCopyThrottleTime -> The
duration of time required at a given moment to bring the observed remote
copy rate within the allowed limit, by preventing further copies.
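For intuition, this kind of throttle time can be computed from the observed rate and the quota, along the lines of Kafka's client quota framework. The sketch below is purely illustrative (the function name and exact formula are assumptions, not the KIP's implementation):

```python
def throttle_time_ms(observed_rate, quota, window_ms):
    """Time to hold off further reads/copies so that the rate measured over
    the window drops back to the quota; zero when already within the quota."""
    if observed_rate <= quota:
        return 0.0
    # The excess accumulated over the window is drained at the quota rate.
    return (observed_rate - quota) / quota * window_ms

print(throttle_time_ms(100.0, 50.0, 1000))  # observed rate is 2x the quota
print(throttle_time_ms(40.0, 50.0, 1000))   # under quota, no throttling
```

The longer the observed rate stays above the quota, the larger the reported time needed to bring it back within the allowed limit.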

Regards.

On Wed, Feb 28, 2024 at 12:28 AM Jun Rao  wrote:

> Hi, Abhijeet,
>
> Thanks for the explanation. Makes sense to me now.
>
> Just a minor comment. Could you document the exact meaning of the following
> two metrics? For example, is the time accumulated? If so, is it from the
> start of the broker or over some window?
>
> kafka.log.remote:type=RemoteLogManager, name=RemoteFetchThrottleTime
> kafka.log.remote:type=RemoteLogManager, name=RemoteCopyThrottleTime
>
> Jun
>
> On Tue, Feb 27, 2024 at 1:39 AM Abhijeet Kumar  >
> wrote:
>
> > Hi Jun,
> >
> > The existing quota system for consumers is designed to throttle the
> > consumer if it exceeds the allowed fetch rate.
> > The additional quota we want to add works on the broker level. If the
> > broker-level remote read quota is being
> > exceeded, we prevent additional reads from the remote storage but do not
> > prevent local reads for the consumer.
> > If the consumer has specified other partitions to read, which can be
> served
> > from local, it can continue to read those
> > partitions. To elaborate more, we make a check for quota exceeded if we
> > know a segment needs to be read from
> > remote. If the quota is exceeded, we simply skip the partition and move
> to
> > other segments in the fetch request.
> > This way consumers can continue to read the local data as long as they
> have
> > not exceeded the client-level quota.
> >
> > Also, when we choose the appropriate consumer-level quota, we would
> > typically look at what kind of local fetch
> > throughput is supported. If we were to reuse the same consumer quota, we
> > should also consider the throughput
> > the remote storage supports. The throughput supported by remote may be
> > less/more than the throughput supported
> > by local (when using a cloud provider, it depends on the plan opted by
> the
> > user). The consumer quota has to be carefully
> > set considering both local and remote throughput. Instead, if we have a
> > separate quota, it makes things much simpler
> > for the user, since they already know what throughput their remote
> storage
> > supports.
> >
> > (Also, thanks for pointing out. I will update the KIP based on the
> > discussion)
> >
> > Regards,
> > Abhijeet.
> >
> > On Tue, Feb 27, 2024 at 2:49 AM Jun Rao 
> wrote:
> >
> > > Hi, Abhijeet,
> > >
> > > Sorry for the late reply. It seems that you haven't updated the KIP
> based
> > > on the discussion? One more comment.
> > >
> > > 11. Currently, we already have a quota system for both the producers
> and
> > > consumers. I can understand why we need an additional
> > > remote.log.manager.write.quota.default quota. For example, when tier
> > > storage is enabled for the first time, there could be a lot of segments
> > > that need to be written to the remote storage, even though there is no
> > > increase in the produced data. However, I am not sure about an
> > > additional remote.log.manager.read.quota.default. The KIP says that the
> > > reason is "This may happen when the majority of the consumers start
> > reading
> > > from the earliest offset of their respective Kafka topics.". However,
> > this
> > > can happen with or without tier storage and the current quota system
> for
> > > consumers is designed for solving this exact problem. Could you explain
> > the
> > > usage of this additional quota?
> > >
> > > Thanks,
> > >
> > > Jun
> > >
> > > On Mon, Feb 12, 2024 at 11:08 AM Abhijeet Kumar <
> > > abhijeet.cse@gmail.com>
> > > wrote:
> > >
> > > > Comments inline
> > > >
> > > > On Wed, Dec 6, 2023 at 1:12 AM Jun Rao 
> > wrote:
> > > >
> > > > > Hi, Abhijeet,
> > > > >
> > > > > Thanks for the KIP. A few comments.
> > > > >
> > > > > 10. remote.log.manager.write.quota.default:

Re: [DISCUSS] KIP-956: Tiered Storage Quotas

2024-03-02 Thread Abhijeet Kumar
Hi Jun,

Thanks for pointing it out. It makes sense to me. We can have the following
metrics instead. What do you think?

   - remote-(fetch|copy)-throttle-time-avg (The average time in ms remote
   fetches/copies was throttled by a broker)
   - remote-(fetch|copy)-throttle-time-max (The maximum time in ms remote
   fetches/copies was throttled by a broker)

These are similar to the fetch-throttle-time-avg and fetch-throttle-time-max
metrics we have for Kafka consumers.
The Avg and Max are computed over the (sliding) window as defined by the
configuration metrics.sample.window.ms on the server.
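To illustrate the sliding-window semantics, here is a simplified sketch of how a metrics library can bucket values into time-based samples and report avg/max over the retained samples (class and method names are assumptions, not Kafka's actual internals):

```python
class WindowedThrottleStat:
    """Records throttle-time values into time-bucketed samples and reports
    avg/max over the samples that are still inside the retained window."""

    def __init__(self, sample_window_ms=30000, num_samples=2):
        self.sample_window_ms = sample_window_ms
        self.num_samples = num_samples
        self.samples = []  # list of (bucket_start_ms, [values])

    def record(self, value, now_ms):
        bucket = now_ms - now_ms % self.sample_window_ms
        if not self.samples or self.samples[-1][0] != bucket:
            self.samples.append((bucket, []))
        # Purge samples that have fallen out of the retained window.
        cutoff = now_ms - self.sample_window_ms * self.num_samples
        self.samples = [s for s in self.samples if s[0] > cutoff]
        self.samples[-1][1].append(value)

    def _values(self):
        return [v for _, vals in self.samples for v in vals]

    def avg(self):
        vals = self._values()
        return sum(vals) / len(vals) if vals else 0.0

    def max(self):
        return max(self._values(), default=0.0)

s = WindowedThrottleStat()
s.record(100.0, 0)
s.record(300.0, 1000)
print(s.avg(), s.max())  # both values fall in the same sample bucket
```

Once a sample ages past the window, its values stop contributing, so the metric reflects recent throttling rather than the broker's whole lifetime.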

(Also, I will update the config and metric names to be consistent)

Regards.

On Thu, Feb 29, 2024 at 2:51 AM Jun Rao  wrote:

> Hi, Abhijeet,
>
> Thanks for the reply.
>
> The issue with recording the throttle time as a gauge is that it's
> transient. If the metric is not read immediately, the recorded value could
> be reset to 0. The admin won't realize that throttling has happened.
>
> For client quotas, the throttle time is tracked as the average
> throttle-time per user/client-id. This makes the metric less transient.
>
> Also, the configs use read/write whereas the metrics use fetch/copy. Could
> we make them consistent?
>
> Jun
>
> On Wed, Feb 28, 2024 at 6:49 AM Abhijeet Kumar  >
> wrote:
>
> > Hi Jun,
> >
> > Clarified the meaning of the two metrics. Also updated the KIP.
> >
> > kafka.log.remote:type=RemoteLogManager, name=RemoteFetchThrottleTime ->
> The
> > duration of time required at a given moment to bring the observed fetch
> > rate within the allowed limit, by preventing further reads.
> > kafka.log.remote:type=RemoteLogManager, name=RemoteCopyThrottleTime ->
> The
> > duration of time required at a given moment to bring the observed remote
> > copy rate within the allowed limit, by preventing further copies.
> >
> > Regards.
> >
> > On Wed, Feb 28, 2024 at 12:28 AM Jun Rao 
> wrote:
> >
> > > Hi, Abhijeet,
> > >
> > > Thanks for the explanation. Makes sense to me now.
> > >
> > > Just a minor comment. Could you document the exact meaning of the
> > following
> > > two metrics? For example, is the time accumulated? If so, is it from
> the
> > > start of the broker or over some window?
> > >
> > > kafka.log.remote:type=RemoteLogManager, name=RemoteFetchThrottleTime
> > > kafka.log.remote:type=RemoteLogManager, name=RemoteCopyThrottleTime
> > >
> > > Jun
> > >
> > > On Tue, Feb 27, 2024 at 1:39 AM Abhijeet Kumar <
> > abhijeet.cse@gmail.com
> > > >
> > > wrote:
> > >
> > > > Hi Jun,
> > > >
> > > > The existing quota system for consumers is designed to throttle the
> > > > consumer if it exceeds the allowed fetch rate.
> > > > The additional quota we want to add works on the broker level. If the
> > > > broker-level remote read quota is being
> > > > exceeded, we prevent additional reads from the remote storage but do
> > not
> > > > prevent local reads for the consumer.
> > > > If the consumer has specified other partitions to read, which can be
> > > served
> > > > from local, it can continue to read those
> > > > partitions. To elaborate more, we make a check for quota exceeded if
> we
> > > > know a segment needs to be read from
> > > > remote. If the quota is exceeded, we simply skip the partition and
> move
> > > to
> > > > other segments in the fetch request.
> > > > This way consumers can continue to read the local data as long as
> they
> > > have
> > > > not exceeded the client-level quota.
> > > >
> > > > Also, when we choose the appropriate consumer-level quota, we would
> > > > typically look at what kind of local fetch
> > > > throughput is supported. If we were to reuse the same consumer quota,
> > we
> > > > should also consider the throughput
> > > > the remote storage supports. The throughput supported by remote may
> be
> > > > less/more than the throughput supported
> > > > by local (when using a cloud provider, it depends on the plan opted
> by
> > > the
> > > > user). The consumer quota has to be carefully
> > > > set considering both local and remote throughput. Instead, if we
> have a
> > > > separate quota, it makes things much simpler
> > > > for the user, since they already know what throughput their remote
> > > storage
>

Re: [DISCUSS] KIP-956: Tiered Storage Quotas

2024-03-08 Thread Abhijeet Kumar
Thank you all for your comments. As all the comments in the thread are
addressed, I am starting a Vote thread for the KIP. Please have a look.

Regards.

On Thu, Mar 7, 2024 at 12:34 PM Luke Chen  wrote:

> Hi Abhijeet,
>
> Thanks for the update and the explanation.
> I had another look, and it LGTM now!
>
> Thanks.
> Luke
>
> On Tue, Mar 5, 2024 at 2:50 AM Jun Rao  wrote:
>
> > Hi, Abhijeet,
> >
> > Thanks for the reply. Sounds good to me.
> >
> > Jun
> >
> >
> > On Sat, Mar 2, 2024 at 7:40 PM Abhijeet Kumar <
> abhijeet.cse@gmail.com>
> > wrote:
> >
> > > Hi Jun,
> > >
> > > Thanks for pointing it out. It makes sense to me. We can have the
> > following
> > > metrics instead. What do you think?
> > >
> > >- remote-(fetch|copy)-throttle-time-avg (The average time in ms
> remote
> > >fetches/copies was throttled by a broker)
> > >- remote-(fetch|copy)-throttle-time-max (The maximum time in ms
> > remote
> > >fetches/copies was throttled by a broker)
> > >
> > > These are similar to fetch-throttle-time-avg and
> fetch-throttle-time-max
> > > metrics we have for Kafka consumers.
> > > The Avg and Max are computed over the (sliding) window as defined by
> the
> > > configuration metrics.sample.window.ms on the server.
> > >
> > > (Also, I will update the config and metric names to be consistent)
> > >
> > > Regards.
> > >
> > > On Thu, Feb 29, 2024 at 2:51 AM Jun Rao 
> > wrote:
> > >
> > > > Hi, Abhijeet,
> > > >
> > > > Thanks for the reply.
> > > >
> > > > The issue with recording the throttle time as a gauge is that it's
> > > > transient. If the metric is not read immediately, the recorded value
> > > could
> > > > be reset to 0. The admin won't realize that throttling has happened.
> > > >
> > > > For client quotas, the throttle time is tracked as the average
> > > > throttle-time per user/client-id. This makes the metric less
> transient.
> > > >
> > > > Also, the configs use read/write whereas the metrics use fetch/copy.
> > > Could
> > > > we make them consistent?
> > > >
> > > > Jun
> > > >
> > > > On Wed, Feb 28, 2024 at 6:49 AM Abhijeet Kumar <
> > > abhijeet.cse@gmail.com
> > > > >
> > > > wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > Clarified the meaning of the two metrics. Also updated the KIP.
> > > > >
> > > > > kafka.log.remote:type=RemoteLogManager,
> name=RemoteFetchThrottleTime
> > ->
> > > > The
> > > > > duration of time required at a given moment to bring the observed
> > fetch
> > > > > rate within the allowed limit, by preventing further reads.
> > > > > kafka.log.remote:type=RemoteLogManager, name=RemoteCopyThrottleTime
> > ->
> > > > The
> > > > > duration of time required at a given moment to bring the observed
> > > remote
> > > > > copy rate within the allowed limit, by preventing further copies.
> > > > >
> > > > > Regards.
> > > > >
> > > > > On Wed, Feb 28, 2024 at 12:28 AM Jun Rao  >
> > > > wrote:
> > > > >
> > > > > > Hi, Abhijeet,
> > > > > >
> > > > > > Thanks for the explanation. Makes sense to me now.
> > > > > >
> > > > > > Just a minor comment. Could you document the exact meaning of the
> > > > > following
> > > > > > two metrics? For example, is the time accumulated? If so, is it
> > from
> > > > the
> > > > > > start of the broker or over some window?
> > > > > >
> > > > > > kafka.log.remote:type=RemoteLogManager,
> > name=RemoteFetchThrottleTime
> > > > > > kafka.log.remote:type=RemoteLogManager,
> name=RemoteCopyThrottleTime
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Tue, Feb 27, 2024 at 1:39 AM Abhijeet Kumar <
> > > > > abhijeet.cse@gmail.com
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Jun,
> > > > > > >

[VOTE] KIP-956: Tiered Storage Quotas

2024-03-08 Thread Abhijeet Kumar
Hi All,

I would like to start the vote for KIP-956 - Tiered Storage Quotas

The KIP is here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-956+Tiered+Storage+Quotas

Regards.
Abhijeet.


Re: [DISCUSS] KIP-956: Tiered Storage Quotas

2024-03-16 Thread Abhijeet Kumar
Hi Jorge,

The config names were chosen to keep them consistent with the other existing
quota configs, such as
*replica.alter.log.dirs.io.max.bytes.per.second*, as pointed out by Jun in
the thread.

Also, we can revisit the names of the components during implementation,
since those are not exposed to the user.

Please let me know if you have any further concerns.

Regards,
Abhijeet.



On Mon, Mar 11, 2024 at 6:11 PM Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> Hi Abhijeet,
>
> Thanks for the KIP! Looks good to me. I just have a minor comments on
> naming:
>
> Would it be work to align the config names to existing quota names?
> e.g. `remote.log.manager.copy.byte.rate.quota` (or similar) instead of
> `remote.log.manager.copy.max.bytes.per.second`?
>
> Same for new components, could we use the same verbs as in the configs:
> - RLMCopyQuotaManager
> - RLMFetchQuotaManager
>
>
> On Fri, 8 Mar 2024 at 13:43, Abhijeet Kumar 
> wrote:
>
> > Thank you all for your comments. As all the comments in the thread are
> > addressed, I am starting a Vote thread for the KIP. Please have a look.
> >
> > Regards.
> >
> > On Thu, Mar 7, 2024 at 12:34 PM Luke Chen  wrote:
> >
> > > Hi Abhijeet,
> > >
> > > Thanks for the update and the explanation.
> > > I had another look, and it LGTM now!
> > >
> > > Thanks.
> > > Luke
> > >
> > > On Tue, Mar 5, 2024 at 2:50 AM Jun Rao 
> wrote:
> > >
> > > > Hi, Abhijeet,
> > > >
> > > > Thanks for the reply. Sounds good to me.
> > > >
> > > > Jun
> > > >
> > > >
> > > > On Sat, Mar 2, 2024 at 7:40 PM Abhijeet Kumar <
> > > abhijeet.cse@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Jun,
> > > > >
> > > > > Thanks for pointing it out. It makes sense to me. We can have the
> > > > following
> > > > > metrics instead. What do you think?
> > > > >
> > > > >- remote-(fetch|copy)-throttle-time-avg (The average time in ms
> > > remote
> > > > >fetches/copies was throttled by a broker)
> > > > >- remote-(fetch|copy)-throttle-time-max (The maximum time in ms
> > > > remote
> > > > >fetches/copies was throttled by a broker)
> > > > >
> > > > > These are similar to fetch-throttle-time-avg and
> > > fetch-throttle-time-max
> > > > > metrics we have for Kafka consumers.
> > > > > The Avg and Max are computed over the (sliding) window as defined
> by
> > > the
> > > > > configuration metrics.sample.window.ms on the server.
> > > > >
> > > > > (Also, I will update the config and metric names to be consistent)
> > > > >
> > > > > Regards.
> > > > >
> > > > > On Thu, Feb 29, 2024 at 2:51 AM Jun Rao 
> > > > wrote:
> > > > >
> > > > > > Hi, Abhijeet,
> > > > > >
> > > > > > Thanks for the reply.
> > > > > >
> > > > > > The issue with recording the throttle time as a gauge is that
> it's
> > > > > > transient. If the metric is not read immediately, the recorded
> > value
> > > > > could
> > > > > > be reset to 0. The admin won't realize that throttling has
> > happened.
> > > > > >
> > > > > > For client quotas, the throttle time is tracked as the average
> > > > > > throttle-time per user/client-id. This makes the metric less
> > > transient.
> > > > > >
> > > > > > Also, the configs use read/write whereas the metrics use
> > fetch/copy.
> > > > > Could
> > > > > > we make them consistent?
> > > > > >
> > > > > > Jun
> > > > > >
> > > > > > On Wed, Feb 28, 2024 at 6:49 AM Abhijeet Kumar <
> > > > > abhijeet.cse@gmail.com
> > > > > > >
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Jun,
> > > > > > >
> > > > > > > Clarified the meaning of the two metrics. Also updated the KIP.
> > > > > > >
> > > > > > > kafka.log.remote:type=RemoteLogManager,
> > > name=RemoteFetchThrottleTime
> >

Re: [VOTE] KIP-956: Tiered Storage Quotas

2024-03-19 Thread Abhijeet Kumar
Hi All,

This KIP is accepted with 3 +1 binding votes(Jun, Satish, Luke) and 2 +1
non-binding votes(Kamal, Jorge).

Thank you all for voting.

Regards.
Abhijeet.



On Tue, Mar 19, 2024 at 3:35 PM Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> Thanks Abhijeet! Looking forward to this one.
> +1 (non-binding).
>
> On Thu, 14 Mar 2024 at 06:08, Luke Chen  wrote:
>
> > Thanks for the KIP!
> > +1 from me.
> >
> > Luke
> >
> > On Sun, Mar 10, 2024 at 8:44 AM Satish Duggana  >
> > wrote:
> >
> > > Thanks Abhijeet for the KIP, +1 from me.
> > >
> > >
> > > On Sat, 9 Mar 2024 at 1:51 AM, Kamal Chandraprakash <
> > > kamal.chandraprak...@gmail.com> wrote:
> > >
> > > > +1 (non-binding), Thanks for the KIP, Abhijeet!
> > > >
> > > > --
> > > > Kamal
> > > >
> > > > On Fri, Mar 8, 2024 at 11:02 PM Jun Rao 
> > > wrote:
> > > >
> > > > > Hi, Abhijeet,
> > > > >
> > > > > Thanks for the KIP. +1
> > > > >
> > > > > Jun
> > > > >
> > > > > On Fri, Mar 8, 2024 at 3:44 AM Abhijeet Kumar <
> > > > abhijeet.cse@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I would like to start the vote for KIP-956 - Tiered Storage
> Quotas
> > > > > >
> > > > > > The KIP is here:
> > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-956+Tiered+Storage+Quotas
> > > > > >
> > > > > > Regards.
> > > > > > Abhijeet.
> > > > > >
> > > > >
> > > >
> > >
> >
>


[DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-03-27 Thread Abhijeet Kumar
Hi All,

I have created KIP-1023 to introduce follower fetch from tiered offset.
This feature will be helpful in significantly reducing Kafka
rebalance/rebuild times when the cluster is enabled with tiered storage.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-1023%3A+Follower+fetch+from+tiered+offset

Feedback and suggestions are welcome.

Regards,
Abhijeet.


Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-03-30 Thread Abhijeet Kumar
Hi Christo,

Thanks for reviewing the KIP.

The follower needs the earliest-pending-upload-offset (and the
corresponding leader epoch) from the leader.
This is the first offset the follower will have locally.

Regards,
Abhijeet.



On Fri, Mar 29, 2024 at 1:14 PM Christo Lolov 
wrote:

> Heya!
>
> First of all, thank you very much for the proposal, you have explained the
> problem you want solved very well - I think a faster bootstrap of an empty
> replica is definitely an improvement!
>
> For my understanding, which concrete offset do you want the leader to give
> back to a follower - earliest-pending-upload-offset or the
> latest-tiered-offset? If it is the second, then I believe KIP-1005 ought to
> already be exposing that offset as part of the ListOffsets API, no?
>
> Best,
> Christo
>
> On Wed, 27 Mar 2024 at 18:23, Abhijeet Kumar 
> wrote:
>
> > Hi All,
> >
> > I have created KIP-1023 to introduce follower fetch from tiered offset.
> > This feature will be helpful in significantly reducing Kafka
> > rebalance/rebuild times when the cluster is enabled with tiered storage.
> >
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1023%3A+Follower+fetch+from+tiered+offset
> >
> > Feedback and suggestions are welcome.
> >
> > Regards,
> > Abhijeet.
> >
>


Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-08 Thread Abhijeet Kumar
Hi Jun,

Thank you for taking the time to review the KIP. Please find my comments
inline.

On Fri, Apr 5, 2024 at 12:09 AM Jun Rao  wrote:

> Hi, Abhijeet,
>
> Thanks for the KIP. Left a few comments.
>
> 1. "A drawback of using the last-tiered-offset is that this new follower
> would possess only a limited number of locally stored segments. Should it
> ascend to the role of leader, there is a risk of needing to fetch these
> segments from the remote storage, potentially impacting broker
> performance."
> Since we support consumers fetching from followers, this is a potential
> issue on the follower side too. In theory, it's possible for a segment to
> be tiered immediately after rolling. In that case, there could be very
> little data after last-tiered-offset. It would be useful to think through
> how to address this issue.
>

We plan to have a follow-up KIP that will address the deprioritization of
these brokers both for leadership and for consumption (when fetching from
followers is allowed).


>
> 2. ListOffsetsRequest:
> 2.1 Typically, we need to bump up the version of the request if we add a
> new value for timestamp. See
>
> https://github.com/apache/kafka/pull/10760/files#diff-fac7080d67da905a80126d58fc1745c9a1409de7ef7d093c2ac66a888b134633
> .
>

Yes, let me update the KIP to include this change. We will need a new
timestamp corresponding to Earliest-Pending-Upload-Offset.


> 2.2 Since this changes the inter broker request protocol, it would be
> useful to have a section on upgrade (e.g. new IBP/metadata.version).
>
> Make sense. I will update the KIP to capture this.


> 3. "Instead of fetching Earliest-Pending-Upload-Offset, it could fetch the
> last-tiered-offset from the leader, and make a separate leader call to
> fetch leader epoch for the following offset."
> Why do we need to make a separate call for the leader epoch?
> ListOffsetsResponse includes both the offset and the corresponding epoch.
>
> I understand there is some confusion here. Let me try to explain this.

The follower needs to build the local data starting from the offset
Earliest-Pending-Upload-Offset. Hence it needs the offset and the
corresponding leader-epoch.
There are two ways to do this:
   1. Add support in ListOffsetsRequest to fetch this offset (and its
      leader epoch) from the leader.
   2. Or, fetch the tiered-offset (which is already supported) and add 1 to
      it to obtain the Earliest-Pending-Upload-Offset. However, we would
      still need the leader epoch for that offset, since there is no
      guarantee that the leader epoch for the Earliest-Pending-Upload-Offset
      is the same as the leader epoch for the tiered-offset. We may need
      another API call to the leader for this.

I prefer the first approach. Its only drawback is that it introduces one
more offset. The second approach avoids this problem but is a little more
complicated.
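To make the epoch caveat concrete: adding 1 to the tiered-offset may cross a leader-epoch boundary, so the epoch must be resolved against the leader's epoch lineage. A sketch, assuming an epoch cache of (epoch, start_offset) entries (the data shape is hypothetical):

```python
def epoch_for_offset(epoch_entries, offset):
    """epoch_entries: [(epoch, start_offset), ...] in ascending start_offset
    order; returns the epoch whose range contains the given offset."""
    epoch = None
    for e, start in epoch_entries:
        if start <= offset:
            epoch = e
        else:
            break
    return epoch

lineage = [(0, 0), (1, 100), (2, 250)]
# A tiered-offset of 99 lies in epoch 0, but 99 + 1 = 100 starts epoch 1,
# so the epoch of the tiered-offset cannot simply be reused.
print(epoch_for_offset(lineage, 99), epoch_for_offset(lineage, 100))
```

This is the lookup the follower would otherwise have to perform with a second call to the leader, which is why returning the offset and its epoch together in one response is simpler.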


> 4. "Check if the follower replica is empty and if the feature to use
> last-tiered-offset is enabled."
> Why do we need to check if the follower replica is empty?
>
>
We want to limit this new behavior only to new replicas. Replicas that
become out of ISR are excluded from this behavior change. Those will
continue with the existing behavior.
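A minimal sketch of that gating logic (hypothetical names; the actual check would live in the replica fetcher):

```python
def pick_fetch_start(replica_is_empty, feature_enabled,
                     earliest_pending_upload_offset, earliest_local_offset):
    """Only a brand-new (empty) replica with the feature enabled bootstraps
    from the tiered offset; replicas that merely fell out of ISR keep
    today's behaviour and fetch from the leader's earliest-local offset."""
    if replica_is_empty and feature_enabled:
        return earliest_pending_upload_offset
    return earliest_local_offset

print(pick_fetch_start(True, True, 1000, 10))   # new replica: start at 1000
print(pick_fetch_start(False, True, 1000, 10))  # out-of-ISR replica: start at 10
```

Restricting the change to empty replicas avoids wiping local state on replicas that only temporarily fell out of ISR.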


> 5. It can be confirmed by checking if the leader's Log-Start-Offset is the
> same as the Leader's Local-Log-Start-Offset.
> How does the follower know Local-Log-Start-Offset?
>

Missed this detail. The follower will need to call the leader API to fetch
the earliest-local offset for this.


> Jun
>
> On Sat, Mar 30, 2024 at 5:51 AM Abhijeet Kumar  >
> wrote:
>
> > Hi Christo,
> >
> > Thanks for reviewing the KIP.
> >
> > The follower needs the earliest-pending-upload-offset (and the
> > corresponding leader epoch) from the leader.
> > This is the first offset the follower will have locally.
> >
> > Regards,
> > Abhijeet.
> >
> >
> >
> > On Fri, Mar 29, 2024 at 1:14 PM Christo Lolov 
> > wrote:
> >
> > > Heya!
> > >
> > > First of all, thank you very much for the proposal, you have explained
> > the
> > > problem you want solved very well - I think a faster bootstrap of an
> > empty
> > > replica is definitely an improvement!
> > >
> > > For my understanding, which concrete offset do you want the leader to
> > give
> > > back to a follower - earliest-pending-upload-offset or the
> > > latest-tiered-offset? If it is the second, then I believe KIP-1005
> ought
> > to
> > > already be exposing that offset as part of the ListOffsets API, no?
> > >
> > > Best,
> 

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-08 Thread Abhijeet Kumar
 upgrade (e.g. new IBP/metadata.version).
> >
> > 3. "Instead of fetching Earliest-Pending-Upload-Offset, it could fetch
> the
> > last-tiered-offset from the leader, and make a separate leader call to
> > fetch leader epoch for the following offset."
> > Why do we need to make a separate call for the leader epoch?
> > ListOffsetsResponse includes both the offset and the corresponding epoch.
> >
> > 4. "Check if the follower replica is empty and if the feature to use
> > last-tiered-offset is enabled."
> > Why do we need to check if the follower replica is empty?
> >
> > 5. It can be confirmed by checking if the leader's Log-Start-Offset is
> the
> > same as the Leader's Local-Log-Start-Offset.
> > How does the follower know Local-Log-Start-Offset?
> >
> > Jun
> >
> > On Sat, Mar 30, 2024 at 5:51 AM Abhijeet Kumar <
> abhijeet.cse@gmail.com
> > >
> > wrote:
> >
> > > Hi Christo,
> > >
> > > Thanks for reviewing the KIP.
> > >
> > > The follower needs the earliest-pending-upload-offset (and the
> > > corresponding leader epoch) from the leader.
> > > This is the first offset the follower will have locally.
> > >
> > > Regards,
> > > Abhijeet.
> > >
> > >
> > >
> > > On Fri, Mar 29, 2024 at 1:14 PM Christo Lolov 
> > > wrote:
> > >
> > > > Heya!
> > > >
> > > > First of all, thank you very much for the proposal, you have
> explained
> > > the
> > > > problem you want solved very well - I think a faster bootstrap of an
> > > empty
> > > > replica is definitely an improvement!
> > > >
> > > > For my understanding, which concrete offset do you want the leader to
> > > give
> > > > back to a follower - earliest-pending-upload-offset or the
> > > > latest-tiered-offset? If it is the second, then I believe KIP-1005
> > ought
> > > to
> > > > already be exposing that offset as part of the ListOffsets API, no?
> > > >
> > > > Best,
> > > > Christo
> > > >
> > > > On Wed, 27 Mar 2024 at 18:23, Abhijeet Kumar <
> > abhijeet.cse@gmail.com
> > > >
> > > > wrote:
> > > >
> > > > > Hi All,
> > > > >
> > > > > I have created KIP-1023 to introduce follower fetch from tiered
> > offset.
> > > > > This feature will be helpful in significantly reducing Kafka
> > > > > rebalance/rebuild times when the cluster is enabled with tiered
> > > storage.
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1023%3A+Follower+fetch+from+tiered+offset
> > > > >
> > > > > Feedback and suggestions are welcome.
> > > > >
> > > > > Regards,
> > > > > Abhijeet.
> > > > >
> > > >
> > >
> >
>


Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-22 Thread Abhijeet Kumar
Hi Luke,

Thanks for your comments. Please find my responses inline.

On Tue, Apr 9, 2024 at 2:08 PM Luke Chen  wrote:

> Hi Abhijeet,
>
> Thanks for the KIP to improve the tiered storage feature!
>
> Questions:
> 1. We could also get the "pending-upload-offset" and epoch via remote log
> metadata, instead of adding a new API to fetch from the leader. Could you
> explain why you choose the later approach, instead of the former?
>

The remote log metadata could be tracking overlapping log segments. Because
of truncations during unclean leader election, the maximum offset across the
segments it tracks may not be the last-tiered-offset. Remote log metadata
alone is therefore not sufficient to identify the last-tiered-offset or the
pending-upload-offset.

Only the leader knows the correct lineage of offsets that is required to
identify the "pending-upload-offset".
That is the reason we chose to add a new API to fetch this information from
the leader.
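To illustrate why the raw metadata maximum can be wrong: after an unclean leader election, the metadata may retain segments from an abandoned epoch branch whose offsets exceed anything on the current leader's lineage. A sketch (the data shapes are hypothetical, and the real validation compares epoch ranges rather than a simple set membership):

```python
def last_tiered_offset(segments, leader_epochs):
    """segments: [(leader_epoch, start_offset, end_offset)];
    leader_epochs: set of epochs on the current leader's lineage.
    Segments from abandoned epoch branches must be ignored."""
    valid_ends = [end for epoch, _, end in segments if epoch in leader_epochs]
    return max(valid_ends, default=-1)

# Epoch 5 came from a fenced leader whose branch was later truncated away.
segments = [(0, 0, 99), (1, 100, 199), (5, 100, 300)]
print(max(end for _, _, end in segments))       # naive metadata max: 300
print(last_tiered_offset(segments, {0, 1, 2}))  # on the leader lineage: 199
```

Since only the leader holds the authoritative epoch lineage, fetching the offset from the leader avoids this class of error entirely.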


2.
> > We plan to have a follow-up KIP that will address both the
> deprioritization
> of these brokers from leadership, as well as
> for consumption (when fetching from followers is allowed).
>
> I agree with Jun that we might need to make it clear all possible drawbacks
> that could have. So, could we add the drawbacks that Jun mentioned about
> the performance issue when consumer fetch from follower?
>
>
Updated the KIP to call this out.


> 3. Could we add "Rejected Alternatives" section to the end of the KIP to
> add some of them?
> Like the "ListOffsetRequest" approach VS "Earliest-Pending-Upload-Offset"
> approach, or getting the "Earliest-Pending-Upload-Offset" from remote log
> metadata... etc.
>
> Added the section on Rejected Alternatives


> Thanks.
> Luke
>
>
> On Tue, Apr 9, 2024 at 2:25 PM Abhijeet Kumar 
> wrote:
>
> > Hi Christo,
> >
> > Please find my comments inline.
> >
> > On Fri, Apr 5, 2024 at 12:36 PM Christo Lolov 
> > wrote:
> >
> > > Hello Abhijeet and Jun,
> > >
> > > I have been mulling this KIP over a bit more in recent days!
> > >
> > > re: Jun
> > >
> > > I wasn't aware we apply 2.1 and 2.2 for reserving new timestamps - in
> > > retrospect it should have been fairly obvious. I would need to go an
> > update
> > > KIP-1005 myself then, thank you for giving the useful reference!
> > >
> > > 4. I think Abhijeet wants to rebuild state from latest-tiered-offset
> and
> > > fetch from latest-tiered-offset + 1 only for new replicas (or replicas
> > > which experienced a disk failure) to decrease the time a partition
> spends
> > > in under-replicated state. In other words, a follower which has just
> > fallen
> > > out of ISR, but has local data will continue using today's Tiered
> Storage
> > > replication protocol (i.e. fetching from earliest-local). I further
> > believe
> > > he has taken this approach so that local state of replicas which have
> > just
> > > fallen out of ISR isn't forcefully wiped thus leading to situation 1.
> > > Abhijeet, have I understood (and summarised) what you are proposing
> > > correctly?
> > >
> > > Yes, your understanding is correct. We want to limit the behavior
> changes
> > only to new replicas.
> >
> >
> > > 5. I think in today's Tiered Storage we know the leader's
> > log-start-offset
> > > from the FetchResponse and we can learn its local-log-start-offset from
> > the
> > > ListOffsets by asking for earliest-local timestamp (-4). But granted,
> > this
> > > ought to be added as an additional API call in the KIP.
> > >
> > >
> > Yes, I clarified this in my reply to Jun. I will add this missing detail
> in
> > the KIP.
> >
> >
> > > re: Abhijeet
> > >
> > > 101. I am still a bit confused as to why you want to include a new
> offset
> > > (i.e. pending-upload-offset) when you yourself mention that you could
> use
> > > an already existing offset (i.e. last-tiered-offset + 1). In essence,
> you
> > > end your Motivation with "In this KIP, we will focus only on the
> follower
> > > fetch protocol using the *last-tiered-offset*" and then in the
> following
> > > sections you talk about pending-upload-offset. I understand this might
> be
> > > classified as an implementation detail, but if you introduce a new
> offset
> > > (i.e. pending-upload-offset) you have to make a change to the
> ListOffsets
> > > 

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-22 Thread Abhijeet Kumar
Hi Jun,

Please find my comments inline.


On Thu, Apr 18, 2024 at 3:26 AM Jun Rao  wrote:

> Hi, Abhijeet,
>
> Thanks for the reply.
>
> 1. I am wondering if we could achieve the same result by just lowering
> local.retention.ms and local.retention.bytes. This also allows the newly
> started follower to build up the local data before serving the consumer
> traffic.
>

I am not sure I fully followed this. Do you mean we could lower
local.retention (by size and time) so that there is little data in the
leader's local storage, allowing the follower to catch up with the leader
quickly?

In that case, we will need to set small local retention across brokers in
the cluster. It will have the undesired
effect where there will be increased remote log fetches for serving consume
requests, and this can cause
degradations. Also, this behaviour (of increased remote fetches) will
happen on all brokers at all times, whereas in
the KIP we are restricting the behavior only to the newly bootstrapped
brokers and only until the time it fully builds
the local logs as per retention defined at the cluster level.
(Deprioritization of the broker could help reduce the impact
 even further)
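For context, the local retention settings discussed above are the standard KIP-405 topic/broker configs; a sketch with purely illustrative values:

```properties
# Illustrative values only. Lowering these cluster-wide shrinks every
# broker's local log, which increases remote fetches for consume requests.

# Keep roughly 1 hour of data on local disk:
local.retention.ms=3600000
# ...or cap the local log size per partition (1 GiB here):
local.retention.bytes=1073741824
# Total retention (local + remote) stays at 7 days:
retention.ms=604800000
```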


>
> 2. Have you updated the KIP?
>

The KIP has been updated now.


>
> Thanks,
>
> Jun
>
> On Tue, Apr 9, 2024 at 3:36 AM Satish Duggana 
> wrote:
>
> > +1 to Jun for adding the consumer fetching from a follower scenario
> > also to the existing section that talked about the drawback when a
> > node built with last-tiered-offset has become a leader. As Abhijeet
> > mentioned, we plan to have a follow-up KIP that will address these by
> > having a deprioritization of these brokers. The deprioritization of
> > those brokers can be removed once they catch up to the local log
> > retention.
> >
> > Thanks,
> > Satish.
> >
> > On Tue, 9 Apr 2024 at 14:08, Luke Chen  wrote:
> > >
> > > Hi Abhijeet,
> > >
> > > Thanks for the KIP to improve the tiered storage feature!
> > >
> > > Questions:
> > > 1. We could also get the "pending-upload-offset" and epoch via remote
> > > log metadata, instead of adding a new API to fetch from the leader.
> > > Could you explain why you chose the latter approach instead of the
> > > former?
> > > 2.
> > > > We plan to have a follow-up KIP that will address both the
> > > > deprioritization of these brokers from leadership, as well as
> > > > for consumption (when fetching from followers is allowed).
> > >
> > > I agree with Jun that we might need to make clear all the possible
> > > drawbacks this could have. So, could we add the drawbacks that Jun
> > > mentioned about the performance issue when consumers fetch from
> > > followers?
> > >
> > > 3. Could we add a "Rejected Alternatives" section to the end of the KIP
> > > to add some of them? Like the "ListOffsetRequest" approach vs. the
> > > "Earliest-Pending-Upload-Offset" approach, or getting the
> > > "Earliest-Pending-Upload-Offset" from remote log metadata... etc.
> > >
> > > Thanks.
> > > Luke
> > >
> > >
> > > On Tue, Apr 9, 2024 at 2:25 PM Abhijeet Kumar <
> > abhijeet.cse@gmail.com>
> > > wrote:
> > >
> > > > Hi Christo,
> > > >
> > > > Please find my comments inline.
> > > >
> > > > On Fri, Apr 5, 2024 at 12:36 PM Christo Lolov <
> christolo...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hello Abhijeet and Jun,
> > > > >
> > > > > I have been mulling this KIP over a bit more in recent days!
> > > > >
> > > > > re: Jun
> > > > >
> > > > > I wasn't aware we apply 2.1 and 2.2 for reserving new timestamps -
> in
> > > > > retrospect it should have been fairly obvious. I would need to go and
> > > > > update KIP-1005 myself then, thank you for giving the useful reference!
> > > > >
> > > > > 4. I think Abhijeet wants to rebuild state from latest-tiered-offset
> > > > > and fetch from latest-tiered-offset + 1 only for new replicas (or
> > > > > replicas which experienced a disk failure) to decrease the time a
> > > > > partition spends in under-replicated state. In other words, a
> > > > > follower which has just fallen out of ISR, but has 

Re: [DISCUSS] KIP-1023: Follower fetch from tiered offset

2024-04-25 Thread Abhijeet Kumar
Thank you all for your comments. As all the comments in the thread are
addressed, I am starting a Vote thread for the KIP. Please have a look.

Regards.
Abhijeet.



On Thu, Apr 25, 2024 at 6:08 PM Luke Chen  wrote:

> Hi, Abhijeet,
>
> Thanks for the update.
>
> I have no more comments.
>
> Luke
>
> On Thu, Apr 25, 2024 at 4:21 AM Jun Rao  wrote:
>
> > Hi, Abhijeet,
> >
> > Thanks for the updated KIP. It looks good to me.
> >
> > Jun
> >
> > On Mon, Apr 22, 2024 at 12:08 PM Abhijeet Kumar <
> > abhijeet.cse@gmail.com>
> > wrote:
> >
> > > Hi Jun,
> > >
> > > Please find my comments inline.
> > >
> > >
> > > On Thu, Apr 18, 2024 at 3:26 AM Jun Rao 
> > wrote:
> > >
> > > > Hi, Abhijeet,
> > > >
> > > > Thanks for the reply.
> > > >
> > > > 1. I am wondering if we could achieve the same result by just
> > > > lowering local.retention.ms and local.retention.bytes. This also
> > > > allows the newly started follower to build up the local data before
> > > > serving the consumer traffic.
> > > >
> > >
> > > I am not sure I fully followed this. Do you mean we could lower the
> > > local.retention (by size and time) so that there is little data on the
> > > leader's local storage and the follower can quickly catch up with the
> > > leader?
> > >
> > > In that case, we will need to set a small local retention across
> > > brokers in the cluster. That has the undesired effect of increasing
> > > remote log fetches for serving consume requests, which can cause
> > > degradations. Also, this behaviour (of increased remote fetches) would
> > > occur on all brokers at all times, whereas the KIP restricts it to
> > > newly bootstrapped brokers, and only until they fully build their local
> > > logs as per the retention defined at the cluster level.
> > > (Deprioritization of these brokers could reduce the impact even
> > > further.)
> > >
> > >
> > > >
> > > > 2. Have you updated the KIP?
> > > >
> > >
> > > The KIP has been updated now.
> > >
> > >
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Tue, Apr 9, 2024 at 3:36 AM Satish Duggana <
> > satish.dugg...@gmail.com>
> > > > wrote:
> > > >
> > > > > +1 to Jun for adding the consumer fetching from a follower scenario
> > > > > also to the existing section that talked about the drawback when a
> > > > > node built with last-tiered-offset has become a leader. As Abhijeet
> > > > > mentioned, we plan to have a follow-up KIP that will address these
> by
> > > > > having a deprioritization of these brokers. The deprioritization of
> > > > > those brokers can be removed once they catch up to the local log
> > > > > retention.
> > > > >
> > > > > Thanks,
> > > > > Satish.
> > > > >
> > > > > On Tue, 9 Apr 2024 at 14:08, Luke Chen  wrote:
> > > > > >
> > > > > > Hi Abhijeet,
> > > > > >
> > > > > > Thanks for the KIP to improve the tiered storage feature!
> > > > > >
> > > > > > Questions:
> > > > > > 1. We could also get the "pending-upload-offset" and epoch via
> > > > > > remote log metadata, instead of adding a new API to fetch from the
> > > > > > leader. Could you explain why you chose the latter approach
> > > > > > instead of the former?
> > > > > > 2.
> > > > > > > We plan to have a follow-up KIP that will address both the
> > > > > > > deprioritization of these brokers from leadership, as well as
> > > > > > > for consumption (when fetching from followers is allowed).
> > > > > >
> > > > > > I agree with Jun that we might need to make clear all the
> > > > > > possible drawbacks this could have. So, could we add the drawbacks
> > > > > > that Jun mentioned about the performance issue when consumer fetch fr

[VOTE] KIP-1023: Follower fetch from tiered offset

2024-04-25 Thread Abhijeet Kumar
Hi All,

I would like to start the vote for KIP-1023 - Follower fetch from tiered
offset

The KIP is here:
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1023%3A+Follower+fetch+from+tiered+offset

Regards.
Abhijeet.


[DISCUSS] KIP-930: Tiered Storage Metrics

2023-07-23 Thread Abhijeet Kumar
Hi All,

I created KIP-930 for adding RemoteIndexCache stats and also to rename some
tiered storage metrics added as part of KIP-405
to remove ambiguity.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-930%3A+Tiered+Storage+Metrics

Feedback and suggestions are welcome.

Regards,
Abhijeet.


[DISCUSS] KIP-930: Tiered Storage Metrics

2023-07-24 Thread Abhijeet Kumar
Hi All,

I created KIP-930 for adding RemoteIndexCache stats and also to rename some
tiered storage metrics added as part of KIP-405
to remove ambiguity.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-930%3A+Tiered+Storage+Metrics

Feedback and suggestions are welcome.

Regards,
Abhijeet.


[jira] [Created] (KAFKA-15245) Improve Tiered Storage Metrics

2023-07-24 Thread Abhijeet Kumar (Jira)
Abhijeet Kumar created KAFKA-15245:
--

 Summary: Improve Tiered Storage Metrics
 Key: KAFKA-15245
 URL: https://issues.apache.org/jira/browse/KAFKA-15245
 Project: Kafka
  Issue Type: Sub-task
Reporter: Abhijeet Kumar
Assignee: Abhijeet Kumar


Rename existing tiered storage metrics to remove ambiguity and add metrics for 
the RemoteIndexCache.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15260) RLM Task should wait until RLMM is initialized before copying segments to remote

2023-07-27 Thread Abhijeet Kumar (Jira)
Abhijeet Kumar created KAFKA-15260:
--

 Summary: RLM Task should wait until RLMM is initialized before 
copying segments to remote
 Key: KAFKA-15260
 URL: https://issues.apache.org/jira/browse/KAFKA-15260
 Project: Kafka
  Issue Type: Sub-task
Reporter: Abhijeet Kumar


The RLM task uploads segments to remote storage for its leader partitions.
After each upload it sends a 'COPY_SEGMENT_STARTED' message to the
topic-based RLMM (TBRLMM) topic and then waits for the TBRLMM to consume the
message before continuing.

If the RLMM is not initialized, the TBRLMM may not be able to consume the
message within the stipulated time; the call times out and the RLM task
retries later. It may take a few minutes for the TBRLMM to initialize, during
which the RLM task will keep timing out.

Instead, the RLM task should wait until the RLMM is initialized before
attempting to copy segments to remote storage.
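The fix described above amounts to gating the copy loop on RLMM initialization instead of repeatedly timing out. A minimal sketch of that gating logic; the function names and the simulated initializer below are hypothetical, not Kafka's actual API:

```python
import threading
import time

def wait_until_initialized(is_initialized, poll_interval_s=0.01, timeout_s=5.0):
    """Poll the (hypothetical) RLMM readiness flag instead of issuing
    copies that will time out and be retried over and over."""
    deadline = time.monotonic() + timeout_s
    while not is_initialized():
        if time.monotonic() >= deadline:
            raise TimeoutError("RLMM did not initialize in time")
        time.sleep(poll_interval_s)

# Simulate a TBRLMM that finishes initializing shortly after startup.
state = {"ready": False}
threading.Timer(0.05, lambda: state.update(ready=True)).start()

wait_until_initialized(lambda: state["ready"])
print("RLMM initialized; safe to start copying segments")
```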

 





[jira] [Created] (KAFKA-15261) ReplicaFetcher thread should not block if RLMM is not initialized

2023-07-27 Thread Abhijeet Kumar (Jira)
Abhijeet Kumar created KAFKA-15261:
--

 Summary: ReplicaFetcher thread should not block if RLMM is not 
initialized
 Key: KAFKA-15261
 URL: https://issues.apache.org/jira/browse/KAFKA-15261
 Project: Kafka
  Issue Type: Sub-task
Reporter: Abhijeet Kumar
Assignee: Abhijeet Kumar


While building remote log aux state, the replica fetcher fetches the remote log
segment metadata. If the TBRLMM is not initialized yet, the call blocks. Since
replica fetchers share a common lock, it prevents other replica fetchers from
running as well. Also, the same lock is shared in the LeaderAndIsr request
handling path, so those calls get blocked as well.

Instead, the replica fetcher should check whether the RLMM is initialized
before attempting to fetch the remote log segment metadata. If the RLMM is not
initialized, it should throw a retryable error so the fetch can be retried
later without blocking other operations.
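The non-blocking pattern proposed here, check readiness and surface a retryable error rather than blocking under a shared lock, can be sketched as follows (the RetriableError class and function are illustrative, not Kafka's actual classes):

```python
class RetriableError(Exception):
    """Signals that the caller should back off and retry instead of blocking."""

def fetch_remote_segment_metadata(rlmm_ready: bool, partition: str) -> dict:
    # Instead of blocking (while holding the lock shared by all replica
    # fetchers and the LeaderAndIsr handling path), fail fast with a
    # retryable error when the RLMM is not yet initialized.
    if not rlmm_ready:
        raise RetriableError(f"RLMM not yet initialized for {partition}")
    return {"partition": partition, "segments": []}  # placeholder metadata

try:
    fetch_remote_segment_metadata(rlmm_ready=False, partition="topicA-0")
except RetriableError as e:
    print("will retry later:", e)

meta = fetch_remote_segment_metadata(rlmm_ready=True, partition="topicA-0")
print(meta["partition"])  # -> topicA-0
```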





[jira] [Created] (KAFKA-15293) Update metrics doc to add tiered storage metrics

2023-08-02 Thread Abhijeet Kumar (Jira)
Abhijeet Kumar created KAFKA-15293:
--

 Summary: Update metrics doc to add tiered storage metrics
 Key: KAFKA-15293
 URL: https://issues.apache.org/jira/browse/KAFKA-15293
 Project: Kafka
  Issue Type: Sub-task
Reporter: Abhijeet Kumar








[jira] [Created] (KAFKA-15405) Create a new error code to indicate a resource is not ready yet

2023-08-25 Thread Abhijeet Kumar (Jira)
Abhijeet Kumar created KAFKA-15405:
--

 Summary: Create a new error code to indicate a resource is not 
ready yet
 Key: KAFKA-15405
 URL: https://issues.apache.org/jira/browse/KAFKA-15405
 Project: Kafka
  Issue Type: Task
Reporter: Abhijeet Kumar


We need a new error code to indicate to the client that the resource is not 
ready on the server yet and is still initializing. When the client receives 
this error, it should retry.
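On the client side, such an error code is handled with a plain retry-with-backoff loop. A hedged sketch, where the RESOURCE_NOT_READY code name and the helper below are hypothetical, not part of Kafka's protocol:

```python
import time

RESOURCE_NOT_READY = "RESOURCE_NOT_READY"  # hypothetical error code name

def call_with_retry(request_fn, max_attempts=5, backoff_s=0.01):
    """Retry while the server reports the resource is still initializing."""
    for attempt in range(max_attempts):
        code, result = request_fn()
        if code != RESOURCE_NOT_READY:
            return result
        time.sleep(backoff_s * (2 ** attempt))  # exponential backoff
    raise TimeoutError("resource never became ready")

# Simulated server: not ready for the first two calls, then succeeds.
calls = {"n": 0}
def fake_request():
    calls["n"] += 1
    if calls["n"] < 3:
        return RESOURCE_NOT_READY, None
    return "NONE", "ok"

print(call_with_retry(fake_request))  # -> ok
```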





[jira] [Resolved] (KAFKA-15261) ReplicaFetcher thread should not block if RLMM is not initialized

2023-09-05 Thread Abhijeet Kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhijeet Kumar resolved KAFKA-15261.

Resolution: Fixed

> ReplicaFetcher thread should not block if RLMM is not initialized
> -
>
> Key: KAFKA-15261
> URL: https://issues.apache.org/jira/browse/KAFKA-15261
> Project: Kafka
>  Issue Type: Sub-task
>    Reporter: Abhijeet Kumar
>        Assignee: Abhijeet Kumar
>Priority: Blocker
> Fix For: 3.6.0
>
>
> While building remote log aux state, the replica fetcher fetches the remote 
> log segment metadata. If the TBRLMM is not initialized yet, the call blocks. 
> Since replica fetchers share a common lock, it prevents other replica 
> fetchers from running as well. Also, the same lock is shared in the 
> LeaderAndIsr request handling path, so those calls get blocked as well.
> Instead, the replica fetcher should check whether the RLMM is initialized 
> before attempting to fetch the remote log segment metadata. If the RLMM is 
> not initialized, it should throw a retryable error so the fetch can be 
> retried later without blocking other operations.





[jira] [Resolved] (KAFKA-15181) Race condition on partition assigned to TopicBasedRemoteLogMetadataManager

2023-09-06 Thread Abhijeet Kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhijeet Kumar resolved KAFKA-15181.

Resolution: Fixed

> Race condition on partition assigned to TopicBasedRemoteLogMetadataManager 
> ---
>
> Key: KAFKA-15181
> URL: https://issues.apache.org/jira/browse/KAFKA-15181
> Project: Kafka
>  Issue Type: Sub-task
>  Components: core
>Reporter: Jorge Esteban Quilcate Otoya
>    Assignee: Abhijeet Kumar
>Priority: Major
>  Labels: tiered-storage
>
> TopicBasedRemoteLogMetadataManager (TBRLMM) uses a cache to be prepared 
> whenever partitions are assigned.
> When partitions are assigned to the TBRLMM instance, a consumer is started 
> to keep the cache up to date.
> If the cache hasn't finished building, TBRLMM fails to return remote 
> metadata about partitions that are stored on the backing topic, and it may 
> not recover from this failing state.
> A proposal to fix this issue would be to wait, after a partition is 
> assigned, for the consumer to catch up. Similar logic is used at the moment 
> when TBRLMM writes to the topic, using the send callback to wait for the 
> consumer to catch up. This logic can be reused whenever a partition is 
> assigned, so that when TBRLMM is marked as initialized, the cache is ready 
> to serve requests.
> Reference: https://github.com/aiven/kafka/issues/33
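The proposed fix, waiting for the cache-building consumer to catch up to the offset reported by the producer's send callback before marking the partition initialized, can be sketched as follows (all names below are hypothetical, not Kafka's actual API):

```python
import threading
import time

def wait_for_catch_up(consumed_offset, target_offset, poll_s=0.001, timeout_s=2.0):
    """Block until the cache-building consumer has read up to the offset
    the producer's send callback reported (target_offset)."""
    deadline = time.monotonic() + timeout_s
    while consumed_offset() < target_offset:
        if time.monotonic() >= deadline:
            raise TimeoutError("consumer did not catch up in time")
        time.sleep(poll_s)

# Simulate: the send callback reports offset 9 was written to the backing
# topic; a background consumer advances until it reaches that offset.
progress = {"offset": -1}

def consume():
    for offset in range(10):
        progress["offset"] = offset

t = threading.Thread(target=consume)
t.start()
wait_for_catch_up(lambda: progress["offset"], target_offset=9)
t.join()
print("cache caught up; partition can be marked initialized")
```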





[jira] [Created] (KAFKA-16706) Refactor ReplicationQuotaManager/RLMQuotaManager to eliminate code duplication

2024-05-12 Thread Abhijeet Kumar (Jira)
Abhijeet Kumar created KAFKA-16706:
--

 Summary: Refactor ReplicationQuotaManager/RLMQuotaManager to 
eliminate code duplication
 Key: KAFKA-16706
 URL: https://issues.apache.org/jira/browse/KAFKA-16706
 Project: Kafka
  Issue Type: Task
Reporter: Abhijeet Kumar


ReplicationQuotaManager and RLMQuotaManager implementations are similar. We 
should explore ways to refactor them to remove code duplication.





[jira] [Created] (KAFKA-17069) Remote copy throttle metrics

2024-07-03 Thread Abhijeet Kumar (Jira)
Abhijeet Kumar created KAFKA-17069:
--

 Summary: Remote copy throttle metrics 
 Key: KAFKA-17069
 URL: https://issues.apache.org/jira/browse/KAFKA-17069
 Project: Kafka
  Issue Type: Sub-task
Reporter: Abhijeet Kumar








[jira] [Created] (KAFKA-17108) ListOffset API changes

2024-07-10 Thread Abhijeet Kumar (Jira)
Abhijeet Kumar created KAFKA-17108:
--

 Summary: ListOffset API changes
 Key: KAFKA-17108
 URL: https://issues.apache.org/jira/browse/KAFKA-17108
 Project: Kafka
  Issue Type: Sub-task
Reporter: Abhijeet Kumar
Assignee: Abhijeet Kumar








[jira] [Resolved] (KAFKA-16853) Split RemoteLogManagerScheduledThreadPool

2024-07-17 Thread Abhijeet Kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhijeet Kumar resolved KAFKA-16853.

Resolution: Fixed

> Split RemoteLogManagerScheduledThreadPool
> -
>
> Key: KAFKA-16853
> URL: https://issues.apache.org/jira/browse/KAFKA-16853
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Christo Lolov
>        Assignee: Abhijeet Kumar
>Priority: Major
>
> *Summary*
> To begin with, create just the RemoteDataExpirationThreadPool and move 
> expiration to it. Keep all settings as if the only thread pool was the 
> RemoteLogManagerScheduledThreadPool. Ensure that the new thread pool is wired 
> correctly to the RemoteLogManager.





[jira] [Created] (KAFKA-17266) Add dynamic broker config to enable tiered offset follower fetch

2024-08-05 Thread Abhijeet Kumar (Jira)
Abhijeet Kumar created KAFKA-17266:
--

 Summary: Add dynamic broker config to enable tiered offset 
follower fetch
 Key: KAFKA-17266
 URL: https://issues.apache.org/jira/browse/KAFKA-17266
 Project: Kafka
  Issue Type: Sub-task
Reporter: Abhijeet Kumar








[jira] [Created] (KAFKA-17302) ReplicaFetcher changes for fetching from tiered offset

2024-08-08 Thread Abhijeet Kumar (Jira)
Abhijeet Kumar created KAFKA-17302:
--

 Summary: ReplicaFetcher changes for fetching from tiered offset
 Key: KAFKA-17302
 URL: https://issues.apache.org/jira/browse/KAFKA-17302
 Project: Kafka
  Issue Type: Sub-task
Reporter: Abhijeet Kumar
Assignee: Abhijeet Kumar








[jira] [Created] (KAFKA-17313) Fix object metric name for Tiered Storage quotas

2024-08-11 Thread Abhijeet Kumar (Jira)
Abhijeet Kumar created KAFKA-17313:
--

 Summary: Fix object metric name for Tiered Storage quotas
 Key: KAFKA-17313
 URL: https://issues.apache.org/jira/browse/KAFKA-17313
 Project: Kafka
  Issue Type: Task
Reporter: Abhijeet Kumar


Currently, the metric object name for the tiered storage quota metrics is set 
to `kafka.server:type=RemoteLogManager`, but it should instead be 
`kafka.log.remote:type=RemoteLogManager`.

This is because of a limitation: creating a SensorAccess requires an 
org.apache.kafka.common.metrics.Metrics instance, with which the prefix is 
fixed to kafka.server and cannot be changed. We would instead need to use 
KafkaMetricsGroup, but there is no way to create a SensorAccess with it.

 





[jira] [Created] (KAFKA-17344) Add Unit tests for empty follower fetch scenarios

2024-08-14 Thread Abhijeet Kumar (Jira)
Abhijeet Kumar created KAFKA-17344:
--

 Summary: Add Unit tests for empty follower fetch scenarios
 Key: KAFKA-17344
 URL: https://issues.apache.org/jira/browse/KAFKA-17344
 Project: Kafka
  Issue Type: Sub-task
Reporter: Abhijeet Kumar
Assignee: Abhijeet Kumar








[jira] [Created] (KAFKA-15236) Rename Remote Storage metrics to remove ambiguity

2023-07-22 Thread Abhijeet Kumar (Jira)
Abhijeet Kumar created KAFKA-15236:
--

 Summary: Rename Remote Storage metrics to remove ambiguity
 Key: KAFKA-15236
 URL: https://issues.apache.org/jira/browse/KAFKA-15236
 Project: Kafka
  Issue Type: Sub-task
Reporter: Abhijeet Kumar
Assignee: Abhijeet Kumar


As part of the Tiered Storage feature introduced in 
[KIP-405|https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage],
 we added several metrics related to reads from and writes to remote 
storage. The naming convention that was followed is confusing to users.

For example, in regular Kafka, BytesIn means bytes *_written_* to the log, and 
BytesOut means bytes *_read_* from the log. But with tiered storage, the 
concepts are reversed.
 * RemoteBytesIn means "Number of bytes *_read_* from remote storage per second"
 * RemoteBytesOut means "Number of bytes _*written*_ to remote storage per 
second"

We should rename the tiered storage related metrics to remove any ambiguity.
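To make the ambiguity concrete, here is a sketch of the direction semantics described above; the "renamed" values are illustrative direction-based names, not necessarily the final names chosen in KIP-930:

```python
# Direction semantics from the description above. "In/Out" is log-relative
# for the local metrics but storage-relative for the remote ones, hence the
# confusion. The "renamed" values are illustrative only.
metrics = {
    "BytesInPerSec":        {"direction": "written to local log"},
    "BytesOutPerSec":       {"direction": "read from local log"},
    "RemoteBytesInPerSec":  {"direction": "read from remote storage",
                             "renamed": "RemoteFetchBytesPerSec"},
    "RemoteBytesOutPerSec": {"direction": "written to remote storage",
                             "renamed": "RemoteCopyBytesPerSec"},
}

# The rename makes the operation (fetch vs. copy) explicit instead of
# relying on an In/Out convention whose meaning flips between tiers.
print(metrics["RemoteBytesInPerSec"]["renamed"])  # -> RemoteFetchBytesPerSec
```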


