Hi Satish,

I understand your point and I agree with it. TBH, I didn't take into
account the in-memory cache.
Surely, 10 KiB is a relatively arbitrary number. Lowering it to the
proposed 128 bytes won't probably change anything for most of the
implementations. Others could ask for an increase.
I've updated the KIP.
Thanks!

Ivan


On Tue, 13 Jun 2023 at 09:47, Satish Duggana <satish.dugg...@gmail.com>
wrote:

> Thanks Ivan for taking the feedback/suggestions into the KIP.
>
> 10kb limit as default is huge for each segment metadata. This should
> only be used for a few keys pointing to more values if needed from a
> different store. This is intended to be used for purposes like
> pointing to a specific cluster or a specific bucket etc. This is
> generally required for identifiers like uuid or other formats that can
> amount to a few 10s of bytes. So, I suggest having a default value of
> 128 bytes instead of 10kb. If it requires metadata with a few KBs then
> it is better to avoid keeping all of them inmemory with default topic
> based RLMM and have them in a cache with separate storage. In that
> case, it is better to think about why we need that large metadata with
> each segment.
>
> ~Satish.
>
>
> On Wed, 7 Jun 2023 at 23:38, Ivan Yurchenko <ivan0yurche...@gmail.com>
> wrote:
> >
> > Hi Satish,
> >
> > Thank you for your feedback.
> >
> > I've nothing against going from Map<String, byte[]> to byte[].
> > Serialization should not be a problem for RSM implementations: `Struct`,
> > `Schema` and other useful serde classes are distributed as a part of the
> > kafka-clients library.
> >
> > Also a good idea to add the size limiting setting, some
> > `remote.log.metadata.custom.metadata.max.size`. A sensible default may be
> > 10 KB, which is enough to host a struct with 10 long (almost) 1K symbol
> > ASCII strings.
> >
> > If a piece of custom metadata exceeds the limit, the execution of
> > RLMTask.copyLogSegmentsToRemote should be interrupted with an error
> message.
> >
> > Does this sound good?
> > If so, I'll update the KIP accordingly. And I think it may be time for
> the
> > vote after that.
> >
> > Best,
> > Ivan
> >
> >
> >
> > On Sat, 3 Jun 2023 at 17:13, Satish Duggana <satish.dugg...@gmail.com>
> > wrote:
> >
> > > Hi Ivan,
> > > Thanks for the KIP.
> > >
> > > The motivation of the KIP looks reasonable to me. It requires a way
> > > for RSM providers to add custom metadata with the existing
> > > RemoteLogSegmentMetadata. I remember we wanted to introduce a very
> > > similar change in the earlier proposals called
> > > RemoteLogMetadataContext. But we dropped that as we did not feel a
> > > strong need at that time and we wanted to revisit it if needed. But I
> > > see there is a clear need for this kind of custom metadata to keep
> > > with RemoteLogSegmentMetadata.
> > >
> > > It is better to introduce a new class for this custom metadata in
> > > RemoteLogSegmentMetadata like below for any changes in the future.
> > > RemoteLogSegmentMetadata will have this as an optional value.
> > >
> > > public class RemoteLogSegmentMetadata {
> > > ...
> > > public static class CustomMetadata {
> > >      private final byte[] value;
> > >     ...
> > > }
> > > ...
> > > }
> > >
> > > This is completely opaque and it is the RSM implementation provider's
> > > responsibility in serializing and deserializing the bytes. We can
> > > introduce a property to guard the size with a configurable property
> > > with a default value to avoid any unwanted large size values.
> > >
> > > Thanks,
> > > Satish.
> > >
> > > On Tue, 30 May 2023 at 10:59, Ivan Yurchenko <ivan0yurche...@gmail.com
> >
> > > wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I want to bring this to a conclusion (positive or negative), so if
> there
> > > > are no more questions in a couple of days, I'll put the KIP to the
> vote.
> > > >
> > > > Best,
> > > > Ivan
> > > >
> > > >
> > > > On Fri, 5 May 2023 at 18:42, Ivan Yurchenko <
> ivan0yurche...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi Alexandre,
> > > > >
> > > > > > combining custom
> > > > > > metadata with rlmMetadata increases coupling between Kafka and
> the
> > > > > > plugin.
> > > > >
> > > > > This is true. However, (if I understand your concern correctly,)
> > > > > rlmMetadata in the current form may be independent from RSM
> plugins,
> > > but
> > > > > data they point to are accessible only via the particular plugin
> (the
> > > one
> > > > > that wrote the data or a compatible one). It seems, this coupling
> > > already
> > > > > exists, but it is implicit. To make my point more concrete,
> imagine an
> > > S3
> > > > > RSM which maps RemoteLogSegmentMetadata objects to S3 object keys.
> This
> > > > > mapping logic is a part of the RSM plugin and without it the
> metadata
> > > is
> > > > > useless. I think it will not get worse if (to follow the example)
> the
> > > > > plugin makes the said S3 object keys explicit by adding them to the
> > > > > metadata. From the high level point of view, moving the custom
> > > metadata to
> > > > > a separate topic doesn't change the picture: it's still the plugin
> that
> > > > > binds the standard and custom metadata together.
> > > > >
> > > > >
> > > > > > For instance, the custom metadata may need to be modified
> > > > > > outside of Kafka, but the rlmMetadata would still be cached on
> > > brokers
> > > > > > independently of any update of custom metadata. Since both types
> of
> > > > > > metadata are authored by different systems, and are cached in
> > > > > > different layers, this may become a problem, or make plugin
> migration
> > > > > > more difficult. What do you think?
> > > > >
> > > > > This is indeed a problem. I think a solution to this would be to
> > > clearly
> > > > > state that metadata being modified outside of Kafka is out of
> scope and
> > > > > instruct the plugin authors that custom metadata could be provided
> only
> > > > > reactively from the copyLogSegmentData method and must remain
> immutable
> > > > > after that. Does it make sense?
> > > > >
> > > > >
> > > > > > Yes, you are right that the suggested alternative is to let the
> > > plugin
> > > > > > store its own metadata separately with a solution chosen by the
> admin
> > > > > > or plugin provider. For instance, it could be using a dedicated
> topic
> > > > > > if chosen to, or relying on an external key-value store.
> > > > >
> > > > > I see. Yes, this option always exists and doesn't even require a
> KIP.
> > > The
> > > > > biggest drawback I see is that a plugin will need to reimplement
> the
> > > > > consumer/producer + caching mechanics that will exist on the broker
> > > side
> > > > > for the standard remote metadata. I'd like to avoid this and this
> KIP
> > > is
> > > > > the best solution I see.
> > > > >
> > > > > Best,
> > > > > Ivan
> > > > >
> > > > >
> > > > >
> > > > > On Tue, 18 Apr 2023 at 13:02, Alexandre Dupriez <
> > > > > alexandre.dupr...@gmail.com> wrote:
> > > > >
> > > > >> Hi Ivan,
> > > > >>
> > > > >> Thanks for the follow-up.
> > > > >>
> > > > >> Yes, you are right that the suggested alternative is to let the
> plugin
> > > > >> store its own metadata separately with a solution chosen by the
> admin
> > > > >> or plugin provider. For instance, it could be using a dedicated
> topic
> > > > >> if chosen to, or relying on an external key-value store.
> > > > >>
> > > > >> I agree with you on the existing risks associated with running
> > > > >> third-party code inside Apache Kafka. That said, combining custom
> > > > >> metadata with rlmMetadata increases coupling between Kafka and the
> > > > >> plugin. For instance, the custom metadata may need to be modified
> > > > >> outside of Kafka, but the rlmMetadata would still be cached on
> brokers
> > > > >> independently of any update of custom metadata. Since both types
> of
> > > > >> metadata are authored by different systems, and are cached in
> > > > >> different layers, this may become a problem, or make plugin
> migration
> > > > >> more difficult. What do you think?
> > > > >>
> > > > >> I have a vague memory of this being discussed back when the tiered
> > > > >> storage KIP was started. Maybe Satish has more background on this.
> > > > >>
> > > > >> Thanks,
> > > > >> Alexandre
> > > > >>
> > > > >> Le lun. 17 avr. 2023 à 16:50, Ivan Yurchenko
> > > > >> <ivan0yurche...@gmail.com> a écrit :
> > > > >> >
> > > > >> > Hi Alexandre,
> > > > >> >
> > > > >> > Thank you for your feedback!
> > > > >> >
> > > > >> > > One question I would have is, what is the benefit of adding
> these
> > > > >> > > custom metadata in the rlmMetadata rather than letting the
> plugin
> > > > >> > > manage access and persistence to them?
> > > > >> >
> > > > >> > Could you please elaborate? Do I understand correctly that the
> idea
> > > is
> > > > >> that
> > > > >> > the plugin will have its own storage for those custom metadata,
> for
> > > > >> example
> > > > >> > a special topic?
> > > > >> >
> > > > >> > > It would be possible for a user
> > > > >> > > to use custom metadata large enough to adversely impact
> access to
> > > and
> > > > >> > > caching of the rlmMetadata by Kafka.
> > > > >> >
> > > > >> > Since the custom metadata is 100% under control of the RSM
> plugin,
> > > the
> > > > >> risk
> > > > >> > is as big as the risk of running a third-party code (i.e. the
> RSM
> > > > >> plugin).
> > > > >> > The cluster admin must make the decision if they trust it.
> > > > >> > To mitigate this risk and put it under control, the RSM plugin
> > > > >> > implementations could document what custom metadata they use and
> > > > >> estimate
> > > > >> > their size.
> > > > >> >
> > > > >> > Best,
> > > > >> > Ivan
> > > > >> >
> > > > >> >
> > > > >> > On Mon, 17 Apr 2023 at 18:14, Alexandre Dupriez <
> > > > >> alexandre.dupr...@gmail.com>
> > > > >> > wrote:
> > > > >> >
> > > > >> > > Hi Ivan,
> > > > >> > >
> > > > >> > > Thank you for the KIP.
> > > > >> > >
> > > > >> > > I think the KIP clearly explains the need for out-of-band
> metadata
> > > > >> > > authored and used by an implementation of the remote storage
> > > manager.
> > > > >> > > One question I would have is, what is the benefit of adding
> these
> > > > >> > > custom metadata in the rlmMetadata rather than letting the
> plugin
> > > > >> > > manage access and persistence to them?
> > > > >> > >
> > > > >> > > Maybe one disadvantage and potential risk with the approach
> > > proposed
> > > > >> > > in the KIP is that the rlmMetadata is not of a predefined,
> > > relatively
> > > > >> > > constant size (although corner cases with thousands of leader
> > > epochs
> > > > >> > > in the leader epoch map are possible). It would be possible
> for a
> > > user
> > > > >> > > to use custom metadata large enough to adversely impact
> access to
> > > and
> > > > >> > > caching of the rlmMetadata by Kafka.
> > > > >> > >
> > > > >> > > Thanks,
> > > > >> > > Alexandre
> > > > >> > >
> > > > >> > > Le jeu. 6 avr. 2023 à 16:03, hzh0425 <hzhka...@163.com> a
> écrit :
> > > > >> > > >
> > > > >> > > > I think it's a good idea as we may want to store remote
> > > segments in
> > > > >> > > different buckets
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > | |
> > > > >> > > > hzhka...@163.com
> > > > >> > > > |
> > > > >> > > > |
> > > > >> > > > 邮箱:hzhka...@163.com
> > > > >> > > > |
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > >
> > > > >> > > > ---- 回复的原邮件 ----
> > > > >> > > > | 发件人 | Ivan Yurchenko<ivan0yurche...@gmail.com> |
> > > > >> > > > | 日期 | 2023年04月06日 22:37 |
> > > > >> > > > | 收件人 | dev@kafka.apache.org<dev@kafka.apache.org> |
> > > > >> > > > | 抄送至 | |
> > > > >> > > > | 主题 | [DISCUSS] KIP-917: Additional custom metadata for
> remote
> > > log
> > > > >> > > segment |
> > > > >> > > > Hello!
> > > > >> > > >
> > > > >> > > > I would like to start the discussion thread on KIP-917:
> > > Additional
> > > > >> custom
> > > > >> > > > metadata for remote log segment [1]
> > > > >> > > > This KIP is fairly small and proposes to add a new field to
> the
> > > > >> remote
> > > > >> > > > segment metadata.
> > > > >> > > >
> > > > >> > > > Thank you!
> > > > >> > > >
> > > > >> > > > Best,
> > > > >> > > > Ivan
> > > > >> > > >
> > > > >> > > > [1]
> > > > >> > > >
> > > > >> > >
> > > > >>
> > >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-917%3A+Additional+custom+metadata+for+remote+log+segment
> > > > >> > >
> > > > >>
> > > > >
> > >
>

Reply via email to