Hi Luke,

> I saw you add the `RemoteLogSegmentMetadataRecord` and
> `RemoteLogSegmentMetadataSnapshotRecord`, I'd like to confirm we're
> not adding these records into any new API request/response, right?
Not really: the KIP only extends these records, they were added before. And
to your question: no, they remain internal and are used for serde purposes
only. That's basically a convenience to piggyback on the code generation
mechanism we already have; nothing new is added to the API.

> I think the serialization/deserialization should be the remote storage
> plugin provider's responsibility. And in Kafka's point of view, this
> customMetadata is just a bunch of byte array, we don't have to parse
> it.
> Is my understanding correct?
> If so, maybe you can add some comments into the KIP to mention it, to
> avoid confusion.

Yes, your understanding is correct. I updated the wording a bit to emphasize
it's the plugin's responsibility to serialize and deserialize these bytes.
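To make that concrete, here is roughly what I mean by the plugin owning the
serde. The helper below is purely illustrative (it is not part of the KIP or
of any existing plugin); a plugin could equally use the `Struct`/`Schema`
classes from kafka-clients, mentioned further down in the quoted history, or
any other format it likes. The broker only ever sees the resulting byte[].

    import java.nio.ByteBuffer;
    import java.nio.charset.StandardCharsets;

    // Hypothetical plugin-side helper. Kafka does not interpret these bytes;
    // it only stores them and hands them back to the plugin.
    class MyRsmCustomMetadataSerde {
        // For example, the object key the plugin used in the remote store.
        static byte[] serialize(String objectKey) {
            byte[] key = objectKey.getBytes(StandardCharsets.UTF_8);
            ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + key.length);
            buf.putInt(key.length);
            buf.put(key);
            return buf.array();
        }

        static String deserialize(byte[] value) {
            ByteBuffer buf = ByteBuffer.wrap(value);
            byte[] key = new byte[buf.getInt()];
            buf.get(key);
            return new String(key, StandardCharsets.UTF_8);
        }
    }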
Best,
Ivan

On Tue, 13 Jun 2023 at 05:30, Luke Chen <show...@gmail.com> wrote:
> Hi Ivan,
>
> I agree this is a good improvement for the remote storage plugin.
> One question from me,
> I saw you add the `RemoteLogSegmentMetadataRecord` and
> `RemoteLogSegmentMetadataSnapshotRecord`, I'd like to confirm we're
> not adding these records into any new API request/response, right?
> I think the serialization/deserialization should be the remote storage
> plugin provider's responsibility. And in Kafka's point of view, this
> customMetadata is just a bunch of byte array, we don't have to parse
> it.
> Is my understanding correct?
> If so, maybe you can add some comments into the KIP to mention it, to
> avoid confusion.
>
> Usually the "record" being added in the KIP will be the one affecting
> the public interface, ex: API request/response.
> So I'd like to confirm it.
>
> Thanks.
> Luke
>
> On Thu, Jun 8, 2023 at 2:09 AM Ivan Yurchenko <ivan0yurche...@gmail.com> wrote:
> >
> > Hi Satish,
> >
> > Thank you for your feedback.
> >
> > I've nothing against going from Map<String, byte[]> to byte[].
> > Serialization should not be a problem for RSM implementations: `Struct`,
> > `Schema` and other useful serde classes are distributed as a part of the
> > kafka-clients library.
> >
> > It's also a good idea to add a size-limiting setting, something like
> > `remote.log.metadata.custom.metadata.max.size`. A sensible default may be
> > 10 KB, which is enough to host a struct with 10 long (almost 1K-symbol)
> > ASCII strings.
> >
> > If a piece of custom metadata exceeds the limit, the execution of
> > RLMTask.copyLogSegmentsToRemote should be interrupted with an error
> > message.
> >
> > Does this sound good?
> > If so, I'll update the KIP accordingly. And I think it may be time for
> > the vote after that.
> >
> > Best,
> > Ivan
> >
> > On Sat, 3 Jun 2023 at 17:13, Satish Duggana <satish.dugg...@gmail.com> wrote:
> > >
> > > Hi Ivan,
> > > Thanks for the KIP.
> > >
> > > The motivation of the KIP looks reasonable to me. It requires a way
> > > for RSM providers to add custom metadata with the existing
> > > RemoteLogSegmentMetadata. I remember we wanted to introduce a very
> > > similar change in the earlier proposals called
> > > RemoteLogMetadataContext. But we dropped that as we did not feel a
> > > strong need at that time and we wanted to revisit it if needed. But I
> > > see there is a clear need for this kind of custom metadata to keep
> > > with RemoteLogSegmentMetadata.
> > >
> > > It is better to introduce a new class for this custom metadata in
> > > RemoteLogSegmentMetadata like below for any changes in the future.
> > > RemoteLogSegmentMetadata will have this as an optional value.
> > >
> > > public class RemoteLogSegmentMetadata {
> > >     ...
> > >     public static class CustomMetadata {
> > >         private final byte[] value;
> > >         ...
> > >     }
> > >     ...
> > > }
> > >
> > > This is completely opaque and it is the RSM implementation provider's
> > > responsibility to serialize and deserialize the bytes. We can
> > > introduce a configurable property with a default value to guard the
> > > size and avoid any unwanted large values.
> > >
> > > Thanks,
> > > Satish.
> > >
> > > On Tue, 30 May 2023 at 10:59, Ivan Yurchenko <ivan0yurche...@gmail.com> wrote:
> > > >
> > > > Hi all,
> > > >
> > > > I want to bring this to a conclusion (positive or negative), so if there
> > > > are no more questions in a couple of days, I'll put the KIP to the vote.
> > > >
> > > > Best,
> > > > Ivan
> > > >
> > > > On Fri, 5 May 2023 at 18:42, Ivan Yurchenko <ivan0yurche...@gmail.com> wrote:
> > > > >
> > > > > Hi Alexandre,
> > > > >
> > > > > > combining custom
> > > > > > metadata with rlmMetadata increases coupling between Kafka and the
> > > > > > plugin.
> > > > >
> > > > > This is true. However, (if I understand your concern correctly,)
> > > > > rlmMetadata in the current form may be independent from RSM plugins,
> > > > > but the data they point to are accessible only via the particular
> > > > > plugin (the one that wrote the data or a compatible one). It seems
> > > > > this coupling already exists, but it is implicit. To make my point
> > > > > more concrete, imagine an S3 RSM which maps RemoteLogSegmentMetadata
> > > > > objects to S3 object keys. This mapping logic is a part of the RSM
> > > > > plugin, and without it the metadata is useless. I think it will not
> > > > > get worse if (to follow the example) the plugin makes the said S3
> > > > > object keys explicit by adding them to the metadata. From a
> > > > > high-level point of view, moving the custom metadata to a separate
> > > > > topic doesn't change the picture: it's still the plugin that binds
> > > > > the standard and custom metadata together.
> > > > >
> > > > > > For instance, the custom metadata may need to be modified
> > > > > > outside of Kafka, but the rlmMetadata would still be cached on brokers
> > > > > > independently of any update of custom metadata. Since both types of
> > > > > > metadata are authored by different systems, and are cached in
> > > > > > different layers, this may become a problem, or make plugin migration
> > > > > > more difficult. What do you think?
> > > > >
> > > > > This is indeed a problem. I think a solution to this would be to
> > > > > clearly state that metadata being modified outside of Kafka is out of
> > > > > scope, and to instruct plugin authors that custom metadata can be
> > > > > provided only reactively from the copyLogSegmentData method and must
> > > > > remain immutable after that. Does it make sense?
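Put differently, the idea is that the custom metadata is produced exactly
once, as a by-product of the copy, and stays attached to the segment metadata
unchanged from then on. A rough sketch of that shape (names and signatures
here are illustrative only; the actual interfaces are the ones defined in the
KIP and the remote storage API, not this class):

    import java.nio.charset.StandardCharsets;
    import java.util.Optional;

    // Hypothetical RSM: custom metadata is returned from the copy call and
    // is never updated afterwards.
    class SketchRemoteStorageManager {

        // Same shape as the wrapper proposed by Satish above.
        static final class CustomMetadata {
            private final byte[] value;
            CustomMetadata(byte[] value) { this.value = value; }
            byte[] value() { return value; }
        }

        Optional<CustomMetadata> copyLogSegmentData(/* segment metadata and data omitted */) {
            // Plugin-specific upload; the resulting object key is the only
            // thing this sketch records as custom metadata.
            String objectKey = "my-bucket/segment-0001.log";
            byte[] bytes = objectKey.getBytes(StandardCharsets.UTF_8);
            return Optional.of(new CustomMetadata(bytes));
        }
    }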
> > > > >
> > > > > > Yes, you are right that the suggested alternative is to let the plugin
> > > > > > store its own metadata separately with a solution chosen by the admin
> > > > > > or plugin provider. For instance, it could be using a dedicated topic
> > > > > > if chosen to, or relying on an external key-value store.
> > > > >
> > > > > I see. Yes, this option always exists and doesn't even require a KIP.
> > > > > The biggest drawback I see is that a plugin will need to reimplement
> > > > > the consumer/producer + caching mechanics that will exist on the
> > > > > broker side for the standard remote metadata. I'd like to avoid this,
> > > > > and this KIP is the best solution I see.
> > > > >
> > > > > Best,
> > > > > Ivan
> > > > >
> > > > > On Tue, 18 Apr 2023 at 13:02, Alexandre Dupriez <alexandre.dupr...@gmail.com> wrote:
> > > > > >
> > > > > > Hi Ivan,
> > > > > >
> > > > > > Thanks for the follow-up.
> > > > > >
> > > > > > Yes, you are right that the suggested alternative is to let the plugin
> > > > > > store its own metadata separately with a solution chosen by the admin
> > > > > > or plugin provider. For instance, it could be using a dedicated topic
> > > > > > if chosen to, or relying on an external key-value store.
> > > > > >
> > > > > > I agree with you on the existing risks associated with running
> > > > > > third-party code inside Apache Kafka. That said, combining custom
> > > > > > metadata with rlmMetadata increases coupling between Kafka and the
> > > > > > plugin. For instance, the custom metadata may need to be modified
> > > > > > outside of Kafka, but the rlmMetadata would still be cached on brokers
> > > > > > independently of any update of custom metadata. Since both types of
> > > > > > metadata are authored by different systems, and are cached in
> > > > > > different layers, this may become a problem, or make plugin migration
> > > > > > more difficult. What do you think?
> > > > > >
> > > > > > I have a vague memory of this being discussed back when the tiered
> > > > > > storage KIP was started. Maybe Satish has more background on this.
> > > > > >
> > > > > > Thanks,
> > > > > > Alexandre
> > > > > >
> > > > > > On Mon, 17 Apr 2023 at 16:50, Ivan Yurchenko <ivan0yurche...@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi Alexandre,
> > > > > > >
> > > > > > > Thank you for your feedback!
> > > > > > >
> > > > > > > > One question I would have is, what is the benefit of adding these
> > > > > > > > custom metadata in the rlmMetadata rather than letting the plugin
> > > > > > > > manage access and persistence to them?
> > > > > > >
> > > > > > > Could you please elaborate? Do I understand correctly that the idea
> > > > > > > is that the plugin will have its own storage for those custom
> > > > > > > metadata, for example a special topic?
> > > > > > >
> > > > > > > > It would be possible for a user
> > > > > > > > to use custom metadata large enough to adversely impact access to and
> > > > > > > > caching of the rlmMetadata by Kafka.
> > > > > > >
> > > > > > > Since the custom metadata is 100% under the control of the RSM
> > > > > > > plugin, the risk is as big as the risk of running third-party code
> > > > > > > (i.e. the RSM plugin). The cluster admin must decide whether they
> > > > > > > trust it. To mitigate this risk and put it under control, the RSM
> > > > > > > plugin implementations could document what custom metadata they use
> > > > > > > and estimate their size.
> > > > > > >
> > > > > > > Best,
> > > > > > > Ivan
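The `remote.log.metadata.custom.metadata.max.size` setting discussed further
up in the thread is meant to bound exactly this size risk on the broker side.
A minimal sketch of such a guard, assuming the config name and the 10 KB
default from the discussion above; the class and exception here are
illustrative, not the actual implementation:

    // Illustrative only: reject oversized custom metadata instead of storing
    // it, so a misbehaving plugin cannot bloat the remote log metadata.
    class CustomMetadataSizeGuardSketch {
        static final String MAX_SIZE_CONFIG = "remote.log.metadata.custom.metadata.max.size";
        static final int DEFAULT_MAX_SIZE_BYTES = 10 * 1024; // proposed default from the thread

        static void validate(byte[] customMetadataValue, int maxSizeBytes) {
            if (customMetadataValue != null && customMetadataValue.length > maxSizeBytes) {
                // In the proposal, the segment copy task would be interrupted with an error.
                throw new IllegalStateException(
                    "Custom metadata size " + customMetadataValue.length
                        + " exceeds " + MAX_SIZE_CONFIG + " = " + maxSizeBytes);
            }
        }
    }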
> > > > > > >
> > > > > > > On Mon, 17 Apr 2023 at 18:14, Alexandre Dupriez <alexandre.dupr...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi Ivan,
> > > > > > > >
> > > > > > > > Thank you for the KIP.
> > > > > > > >
> > > > > > > > I think the KIP clearly explains the need for out-of-band metadata
> > > > > > > > authored and used by an implementation of the remote storage manager.
> > > > > > > > One question I would have is, what is the benefit of adding these
> > > > > > > > custom metadata in the rlmMetadata rather than letting the plugin
> > > > > > > > manage access and persistence to them?
> > > > > > > >
> > > > > > > > Maybe one disadvantage and potential risk with the approach proposed
> > > > > > > > in the KIP is that the rlmMetadata is not of a predefined, relatively
> > > > > > > > constant size (although corner cases with thousands of leader epochs
> > > > > > > > in the leader epoch map are possible). It would be possible for a user
> > > > > > > > to use custom metadata large enough to adversely impact access to and
> > > > > > > > caching of the rlmMetadata by Kafka.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Alexandre
> > > > > > > >
> > > > > > > > On Thu, 6 Apr 2023 at 16:03, hzh0425 <hzhka...@163.com> wrote:
> > > > > > > > >
> > > > > > > > > I think it's a good idea as we may want to store remote segments in
> > > > > > > > > different buckets.
> > > > > > > > >
> > > > > > > > > ---- Original message ----
> > > > > > > > > From: Ivan Yurchenko <ivan0yurche...@gmail.com>
> > > > > > > > > Date: 2023-04-06 22:37
> > > > > > > > > To: dev@kafka.apache.org
> > > > > > > > > Subject: [DISCUSS] KIP-917: Additional custom metadata for remote log segment
> > > > > > > > >
> > > > > > > > > Hello!
> > > > > > > > >
> > > > > > > > > I would like to start the discussion thread on KIP-917: Additional
> > > > > > > > > custom metadata for remote log segment [1].
> > > > > > > > > This KIP is fairly small and proposes to add a new field to the
> > > > > > > > > remote segment metadata.
> > > > > > > > >
> > > > > > > > > Thank you!
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Ivan
> > > > > > > > >
> > > > > > > > > [1]
> > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-917%3A+Additional+custom+metadata+for+remote+log+segment