Hey Matthias, Yes, you are totally right, “list.key/value.serializer.type" in ProducerConfigs. Removed!
And yes, now StreamsConfig just points to configs stored in CommonClientsConfig. I’ll update the KIP. I think we can continue with voting now. Best, Daniyar Yeralin > On Jul 23, 2019, at 9:08 PM, Matthias J. Sax <matth...@confluent.io> wrote: > >>> Just to make sure I understand the problem you're highlighting: >>> I guess the difference is that the serializer and deserializer that are >>> nested inside the serde also need to be configured? So, by default I'd have >>> to specify all six configs when I'm using Streams? > > That is not the problem. And you actually describe the solution for it > yourself: > >>> I guess in the Serde, it could make use of a package-protected constructor >>> for the serializer and deserializer that fixes the list type and inner type >>> to the serde-configured ones. Then, when you're configuring Streams, you >>> only need to specify the StreamsConfigs. > > > > > The problem is, that `ListSerde` is in package `clients` and thus > `ListSerde` cannot access `StreamsConfig`, and hence cannot use > `StreamsConfig#DEFAULT_LIST_KEY_SERDE_TYPE` (and others). Therefore, we > either need to hard-code strings literal for the config names (what does > not sound right) or add `CommonClientConfig#DEFAULT_LIST_KEY_SERDE_TYPE` > (and others). > > In StreamsConfig we would just redefine them for convenience: > >> public static final String DEFAULT_LIST_KEY_SERDE_TYPE = >> CommonClientConfig#DEFAULT_LIST_KEY_SERDE_TYPE; > > > Note that `TimeWindowSerde` is contained in `streams` package and thus > it can access `StreamsConfig` and > `StreamsConfig#DEFAULT_WINDOWED_KEY_SERDE_INNER_CLASS`. > > > > > Btw: I just realized that we actually don't need `ProducerConfig` > >> list.key/value.serializer.type > > because the list-type is irrelevant on write. We only need `inner` config. > > > > -Matthias > > > On 7/23/19 1:30 PM, John Roesler wrote: >> Hmm, that's a tricky situation. >> >> I think Daniyar was on the right track... Producer only cares about >> serializer configs, and Consumer only cares about deserializer configs. >> >> I didn't see the problem with your proposal: >> >> ProducerConfig: >>> list.key/value.serializer.type >>> list.key/value.serializer.inner >>> ConsumerConfig: >>> list.key/value.deserializer.type >>> list.key/value.deserializer.inner >>> StreamsConfig: >>> default.list.key/value.serde.type >>> default.list.key/value.serde.inner >> >> >> It seems like the key/value serde configs are a better analogy than the >> windowed serde. >> ProducerConfig: key.serializer >> ConsumerConfig: key.deserializer >> StreamsConfig: default.key.serde >> >> Just to make sure I understand the problem you're highlighting: >> I guess the difference is that the serializer and deserializer that are >> nested inside the serde also need to be configured? So, by default I'd have >> to specify all six configs when I'm using Streams? >> >> I guess in the Serde, it could make use of a package-protected constructor >> for the serializer and deserializer that fixes the list type and inner type >> to the serde-configured ones. Then, when you're configuring Streams, you >> only need to specify the StreamsConfigs. >> >> Does that work? >> -John >> >> >> On Tue, Jul 23, 2019 at 11:39 AM Development <d...@yeralin.net> wrote: >> >>> Bump >>> >>>> On Jul 22, 2019, at 11:26 AM, Development <d...@yeralin.net> wrote: >>>> >>>> Hey Matthias, >>>> >>>> It looks a little confusing, but I don’t have enough expertise to judge >>> on the configuration placement. >>>> >>>> If you think, it is fine I’ll go ahead with this approach. >>>> >>>> Best, >>>> Daniyar Yeralin >>>> >>>>> On Jul 19, 2019, at 5:49 PM, Matthias J. Sax <matth...@confluent.io> >>> wrote: >>>>> >>>>> Good point. >>>>> >>>>> I guess the simplest solution is, to actually add >>>>> >>>>>>> default.list.key/value.serde.type >>>>>>> default.list.key/value.serde.inner >>>>> >>>>> to both `CommonClientConfigs` and `StreamsConfig`. >>>>> >>>>> It's not super clean, but I think it's the best we can do. Thoughts? >>>>> >>>>> >>>>> -Matthias >>>>> >>>>> On 7/19/19 1:23 PM, Development wrote: >>>>>> Hi Matthias, >>>>>> >>>>>> I agree, ConsumerConfig did not seem like a right place for these >>> configurations. >>>>>> I’ll put them in ProducerConfig, ConsumerConfig, and StreamsConfig. >>>>>> >>>>>> However, I have a question. What should I do in "configure(Map<String, >>> ?> configs, boolean isKey)” methods? Which configurations should I try to >>> locate? I was comparing my (de)serializer implementations with >>> SessionWindows(De)serializer classes, and they use StreamsConfig class to >>> get either StreamsConfig.DEFAULT_WINDOWED_KEY_SERDE_INNER_CLASS : >>> StreamsConfig.DEFAULT_WINDOWED_VALUE_SERDE_INNER_CLASS >>>>>> >>>>>> In my case, as I mentioned earlier, StreamsConfig class is not >>> accessible from org.apache.kafka.common.serialization package. So, I can’t >>> utilize it. Any suggestions here? >>>>>> >>>>>> Best, >>>>>> Daniyar Yeralin >>>>>> >>>>>> >>>>>>> On Jul 18, 2019, at 8:46 PM, Matthias J. Sax <matth...@confluent.io> >>> wrote: >>>>>>> >>>>>>> Thanks! >>>>>>> >>>>>>> One minor question about the configs. The KIP adds three classes, a >>>>>>> Serializer, a Deserializer, and a Serde. >>>>>>> >>>>>>> Hence, would it make sense to add the corresponding configs to >>>>>>> `ConsumerConfig`, `ProducerConfig`, and `StreamsConfig` using slightly >>>>>>> different names each time? >>>>>>> >>>>>>> >>>>>>> Somethin like this: >>>>>>> >>>>>>> ProducerConfig: >>>>>>> >>>>>>> list.key/value.serializer.type >>>>>>> list.key/value.serializer.inner >>>>>>> >>>>>>> ConsumerConfig: >>>>>>> >>>>>>> list.key/value.deserializer.type >>>>>>> list.key/value.deserializer.inner >>>>>>> >>>>>>> StreamsConfig: >>>>>>> >>>>>>> default.list.key/value.serde.type >>>>>>> default.list.key/value.serde.inner >>>>>>> >>>>>>> >>>>>>> Adding `d.l.k/v.serde.t/i` to `CommonClientConfigs does not sound >>> right >>>>>>> to me. Also note, that it seems better to avoid the `default.` prefix >>>>>>> for consumers and producers because there is only one Serializer or >>>>>>> Deserializer anyway. Only for Streams, there are multiple and >>>>>>> StreamsConfig specifies the default one of an operator does not >>>>>>> overwrite it. >>>>>>> >>>>>>> Thoughts? >>>>>>> >>>>>>> >>>>>>> Also, the KIP should explicitly mention to what classed certain >>> configs >>>>>>> are added. Atm, the KIP only list parameter names, but does not state >>>>>>> where those are added. >>>>>>> >>>>>>> >>>>>>> -Matthias >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>> On 7/16/19 1:11 PM, Development wrote: >>>>>>>> Hi, >>>>>>>> >>>>>>>> Yes, totally forgot about the statement. KIP-466 is updated. >>>>>>>> >>>>>>>> Thank you so much John Roesler, Matthias J. Sax, Sophie Blee-Goldman >>> for your valuable input! >>>>>>>> >>>>>>>> I hope I did not cause too much trouble :) >>>>>>>> >>>>>>>> I’ll start the vote now. >>>>>>>> >>>>>>>> Best, >>>>>>>> Daniyar Yeralin >>>>>>>> >>>>>>>>> On Jul 16, 2019, at 3:17 PM, John Roesler <j...@confluent.io> >>> wrote: >>>>>>>>> >>>>>>>>> Hi Daniyar, >>>>>>>>> >>>>>>>>> Thanks for that update. I took a look, and I think this is in good >>> shape. >>>>>>>>> >>>>>>>>> One note, the statement "New method public static <T> Serde<List<T>> >>>>>>>>> ListSerde() in org.apache.kafka.common.serialization.Serdes class >>>>>>>>> (infers list implementation and inner serde from config file)" is >>>>>>>>> still present in the KIP, although I do it is was removed from the >>> PR. >>>>>>>>> >>>>>>>>> Once you remove that statement from the KIP, then I think this KIP >>> is >>>>>>>>> ready to go up for a vote! Then, we can really review the PR in >>>>>>>>> earnest and get this thing merged. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> -john >>>>>>>>> >>>>>>>>> On Tue, Jul 16, 2019 at 2:05 PM Development <d...@yeralin.net> >>> wrote: >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> Pushed new changes under my PR: >>> https://github.com/apache/kafka/pull/6592 < >>> https://github.com/apache/kafka/pull/6592> >>>>>>>>>> >>>>>>>>>> Feel free to put any comments in there. >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Daniyar Yeralin >>>>>>>>>> >>>>>>>>>>> On Jul 15, 2019, at 1:06 PM, Development <d...@yeralin.net> wrote: >>>>>>>>>>> >>>>>>>>>>> Hi John, >>>>>>>>>>> >>>>>>>>>>> I knew I was missing something. Yes, that makes sense now, I >>> removed all `listSerde()` methods, and left empty constructors instead. >>>>>>>>>>> >>>>>>>>>>> As per `CommonClientConfigs` I looked at the class, it doesn’t >>> have any properties related to serdes, and that bothers me a little. >>>>>>>>>>> >>>>>>>>>>> All properties like `default.key.serde` >>> `default.windowed.key.serde.*` are located in StreamsConfig. I don’t want >>> to create a confusion. >>>>>>>>>>> What also doesn’t make sense to me is that `WindowedSerdes` and >>> its (de)serializers are not located in >>> org.apache.kafka.common.serialization. I guess it kind of makes sense since >>> windowed serdes are only available for kafka streams, not vice versa. >>>>>>>>>>> >>>>>>>>>>> If everyone is okay to put list properties in >>> `CommonClientConfigs` class, I’ll go ahead and do that then. >>>>>>>>>>> >>>>>>>>>>> Thank you for your input! >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>> >>>>>>>>>>>> On Jul 15, 2019, at 11:45 AM, John Roesler <j...@confluent.io> >>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hi all, >>>>>>>>>>>> >>>>>>>>>>>> Regarding the placement, you might as well move the constants to >>> `org.apache.kafka.clients.CommonClientConfigs`, so that the constants and >>> the configs and the code are in the same module. >>>>>>>>>>>> >>>>>>>>>>>> Regarding the constructor... What Matthias said is correct: The >>> serde, serializer, and deserializer all need to have zero-arg constructors >>> so they can be instantiated reflectively by Kafka. However, the factory >>> method you proposed "New method public static <T> Serde<List<T>> >>> ListSerde()" is not a constructor, and is not required. It would be used >>> purely from the Java interface, but has the drawbacks I listed above. This >>> method, not the constructor, is what I proposed to remove from the KIP. >>>>>>>>>>>> >>>>>>>>>>>> Thanks, >>>>>>>>>>>> -John >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Mon, Jul 15, 2019 at 10:15 AM Development <d...@yeralin.net >>> <mailto:d...@yeralin.net>> wrote: >>>>>>>>>>>> One problem though. >>>>>>>>>>>> >>>>>>>>>>>> Since WindowedSerde (Windowed(De)Serializer) are so similar, I’m >>> trying to mimic the implementation of my ListSerde accordingly. >>>>>>>>>>>> >>>>>>>>>>>> I created couple constants under StreamsConfig: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> And trying to do similar construct: >>>>>>>>>>>> final String propertyName = isKey ? >>> StreamsConfig.DEFAULT_LIST_KEY_SERDE_INNER_CLASS : >>> StreamsConfig.DEFAULT_LIST_VALUE_SERDE_INNER_CLASS; >>>>>>>>>>>> But then found out that StreamsConfig is not accessible from >>> org.apache.kafka.common.serialization package while window serde >>> (de)serializers are located under org.apache.kafka.streams.kstream package. >>>>>>>>>>>> >>>>>>>>>>>> What should I do? Should I move my classes under >>> org.apache.kafka.streams.kstream package instead? >>>>>>>>>>>> >>>>>>>>>>>>> On Jul 15, 2019, at 10:45 AM, Development <d...@yeralin.net >>> <mailto:d...@yeralin.net>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Matthias, >>>>>>>>>>>>> >>>>>>>>>>>>> Thank you for your input. >>>>>>>>>>>>> >>>>>>>>>>>>> I updated the KIP, made it a little more readable. >>>>>>>>>>>>> >>>>>>>>>>>>> I think the configuration parameters strategy is finalized then. >>>>>>>>>>>>> >>>>>>>>>>>>> Do you have any other questions/concerns regarding this KIP? >>>>>>>>>>>>> >>>>>>>>>>>>> Meanwhile I’ll start doing appropriate code changes, and commit >>> them under my PR. >>>>>>>>>>>>> >>>>>>>>>>>>> Best, >>>>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>>>> >>>>>>>>>>>>>> On Jul 11, 2019, at 2:44 PM, Matthias J. Sax < >>> matth...@confluent.io <mailto:matth...@confluent.io>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Daniyar, >>>>>>>>>>>>>> >>>>>>>>>>>>>> thanks for the update to the KIP. It's in really good shape >>> and well >>>>>>>>>>>>>> written. >>>>>>>>>>>>>> >>>>>>>>>>>>>> About the default constructor question: >>>>>>>>>>>>>> >>>>>>>>>>>>>> All Serdes/Serializer/Deserializer classes need a default >>> constructor to >>>>>>>>>>>>>> create them easily via reflections when specifies in a config. >>> I >>>>>>>>>>>>>> understand that it is not super user friendly, but all >>> existing code >>>>>>>>>>>>>> works this way. Hence, it seems best to stick with the >>> established pattern. >>>>>>>>>>>>>> >>>>>>>>>>>>>> We have a similar issue with `TimeWindowedSerde` and >>>>>>>>>>>>>> `SessionWindowedSerde`, and I just recently did a PR to >>> improve user >>>>>>>>>>>>>> experience that address the exact issue John raised. (cf >>>>>>>>>>>>>> https://github.com/apache/kafka/pull/7067 < >>> https://github.com/apache/kafka/pull/7067>) >>>>>>>>>>>>>> >>>>>>>>>>>>>> Note, that if a user would instantiate the Serde manually, the >>> user >>>>>>>>>>>>>> would also need to call `configure()` to setup the inner >>> serdes. Kafka >>>>>>>>>>>>>> Streams would not setup those automatically and one might most >>> likely >>>>>>>>>>>>>> end-up with an NPE. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Coming back the KIP, and the parameter names. `WindowedSerdes` >>> are >>>>>>>>>>>>>> similar to `ListSerde` as they wrap another Serde. For >>> `WindowedSerdes`, >>>>>>>>>>>>>> we use the following parameter names: >>>>>>>>>>>>>> >>>>>>>>>>>>>> - default.windowed.key.serde.inner >>>>>>>>>>>>>> - default.windowed.value.serde.inner >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> It might be good to align the naming pattern. I would also >>> suggest to >>>>>>>>>>>>>> use `type` instead of `impl`? >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> default.key.list.serde.impl -> default.list.key.serde.type >>>>>>>>>>>>>> default.value.list.serde.impl -> >>> default.list.value.serde.type >>>>>>>>>>>>>> default.key.list.serde.element -> >>> default.list.key.serde.inner >>>>>>>>>>>>>> default.value.list.serde.element -> >>> default.list.value.serde.inner >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -Matthias >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> On 7/10/19 8:52 AM, Development wrote: >>>>>>>>>>>>>>> Hi John, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Yes, I do agree. That totally makes sense. The only thing is >>> that it goes against what Matthias suggested earlier: >>>>>>>>>>>>>>> "I think that ... `ListSerde` should have an default >>> constructor and it should be possible to pass in the `Class listClass` >>> information via a configuration. Otherwise, KafkaStreams cannot use it as >>> default serde.” >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> What do you think about that? I hope I’m not confusing >>> anything. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Jul 9, 2019, at 5:56 PM, John Roesler <j...@confluent.io >>> <mailto:j...@confluent.io>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Ah, my apologies, I must have just overlooked it. Thanks for >>> the update, too. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Just one more super-small question, do we need this variant: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> New method public static <T> Serde<List<T>> ListSerde() in >>> org.apache.kafka.common.serialization.Serdes class (infers list >>> implementation and inner serde from config file) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> It seems like this situation implies my config file is >>> already set up for the list serde, so passing this serde (e.g., in >>> Produced) would have the same effect as not specifying it. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I guess that it could be the case that you have the >>> `default.key/value.serde` set to something else, like StringSerde, but you >>> still have the `default.key/value.list.serde.impl/element` set. This seems >>> like it would result in more confusion than convenience, so my gut instinct >>> is maybe we shouldn't introduce the `ListSerde()` variant until people >>> actually request it later on. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thus, we'd just stick with fully config-driven or fully >>> source-code-driven, not half/half. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> What do you think? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> -John >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Tue, Jul 9, 2019 at 9:58 AM Development <d...@yeralin.net >>> <mailto:d...@yeralin.net> <mailto:d...@yeralin.net >>> <mailto:d...@yeralin.net>>> >>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi John, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I hope everyone had a great long weekend. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Regarding Java interfaces, I may not understand you >>> correctly, but I think I already listed them: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> So for Produced, you would use it in the following fashion, >>> for example: Produced.keySerde(Serdes.ListSerde(ArrayList.class, >>> Serdes.Integer())) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I also updated the KIP, and added a section “Serialization >>> Strategy” where I describe our logic of conditional serialization based on >>> the type of an inner serde. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thank you! >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Jun 26, 2019, at 11:44 AM, John Roesler < >>> j...@confluent.io <mailto:j...@confluent.io> <mailto:j...@confluent.io >>> <mailto:j...@confluent.io>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks for the update, Daniyar! >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> In addition to specifying the config interface, can you >>> also specify >>>>>>>>>>>>>>>>> the Java interface? Namely, if I need to pass an instance >>> of this >>>>>>>>>>>>>>>>> serde in to the DSL directly, as in Produced, Materialized, >>> etc., what >>>>>>>>>>>>>>>>> constructor(s) would I have available? Likewise with the >>> Serializer >>>>>>>>>>>>>>>>> and Deserailizer. I don't think you need to specify the >>> implementation >>>>>>>>>>>>>>>>> logic, since we've already discussed it here. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> If you also want to specify the serialized format of the >>> data records >>>>>>>>>>>>>>>>> in the KIP, it could be useful documentation, as well as >>> letting us >>>>>>>>>>>>>>>>> verify the schema for forward/backward compatibility >>> concerns, etc. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> John >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Wed, Jun 26, 2019 at 10:33 AM Development < >>> d...@yeralin.net <mailto:d...@yeralin.net> <mailto:d...@yeralin.net <mailto: >>> d...@yeralin.net>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hey, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Finally made updates to the KIP: >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization >>> < >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization> >>> < >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization >>> < >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization>> >>> < >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466:+Add+support+for+List%3CT%3E+serialization+and+deserialization >>> < >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466:+Add+support+for+List%3CT%3E+serialization+and+deserialization> >>> < >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466:+Add+support+for+List%3CT%3E+serialization+and+deserialization >>> < >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466:+Add+support+for+List%3CT%3E+serialization+and+deserialization >>>>>> >>>>>>>>>>>>>>>>> Sorry for the delay :) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thank You! >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax < >>> matth...@confluent.io <mailto:matth...@confluent.io> <mailto: >>> matth...@confluent.io <mailto:matth...@confluent.io>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yes, something like this. I did not think about good >>> configuration >>>>>>>>>>>>>>>>> parameter names yet. I am also not sure if I understand all >>> proposed >>>>>>>>>>>>>>>>> configs atm. But all configs should be listed and explained >>> in the KIP >>>>>>>>>>>>>>>>> anyway, and we can discuss further after you have updated >>> the KIP (I can >>>>>>>>>>>>>>>>> ask more detailed question if I have any). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -Matthias >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 6/21/19 2:05 PM, Development wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yes, you are right. ByteSerializer is not what I need to >>> have in a list >>>>>>>>>>>>>>>>> of primitives. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> As for the default constructor and configurability, just >>> want to make >>>>>>>>>>>>>>>>> sure. Is this what you have on your mind? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax < >>> matth...@confluent.io <mailto:matth...@confluent.io> <mailto: >>> matth...@confluent.io <mailto:matth...@confluent.io>> >>>>>>>>>>>>>>>>> <mailto:matth...@confluent.io <mailto:matth...@confluent.io> >>> <mailto:matth...@confluent.io <mailto:matth...@confluent.io>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks for the update! >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I think that `ListDeserializer`, `ListSerializer`, and >>> `ListSerde` >>>>>>>>>>>>>>>>> should have an default constructor and it should be >>> possible to pass in >>>>>>>>>>>>>>>>> the `Class listClass` information via a configuration. >>> Otherwise, >>>>>>>>>>>>>>>>> KafkaStreams cannot use it as default serde. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> For the primitive serializers: `BytesSerializer` is not >>> primitive IMHO, >>>>>>>>>>>>>>>>> as is it for `byte[]` with variable length -- it's for >>> arrays, not for >>>>>>>>>>>>>>>>> single `byte` (note, that `Bytes` is a Kafka class wrapping >>> `byte[]`). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> For tests, we can comment on the PR. No need to do this in >>> the KIP >>>>>>>>>>>>>>>>> discussion. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Can you also update the KIP? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -Matthias >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 6/21/19 11:29 AM, Development wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I made and pushed necessary commits, so we could review the >>> final >>>>>>>>>>>>>>>>> version under PR https://github.com/apache/kafka/pull/6592 >>> <https://github.com/apache/kafka/pull/6592> < >>> https://github.com/apache/kafka/pull/6592 < >>> https://github.com/apache/kafka/pull/6592>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I also need some advice on writing tests for this new >>> serde. So far I >>>>>>>>>>>>>>>>> only have two test cases (roundtrip and empty payload), I’m >>> not sure >>>>>>>>>>>>>>>>> if it is enough. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thank y’all for your help in this KIP :) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Jun 21, 2019, at 1:44 PM, John Roesler < >>> j...@confluent.io <mailto:j...@confluent.io> <mailto:j...@confluent.io >>> <mailto:j...@confluent.io>> >>>>>>>>>>>>>>>>> <mailto:j...@confluent.io <mailto:j...@confluent.io> >>> <mailto:j...@confluent.io <mailto:j...@confluent.io>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hey Daniyar, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Looks good to me! Thanks for considering it. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> -John >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Fri, Jun 21, 2019 at 9:04 AM Development < >>> d...@yeralin.net <mailto:d...@yeralin.net> <mailto:d...@yeralin.net <mailto: >>> d...@yeralin.net>> >>>>>>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> <mailto: >>> d...@yeralin.net <mailto:d...@yeralin.net>>> <mailto:d...@yeralin.net >>> <mailto:d...@yeralin.net> <mailto:d...@yeralin.net >>> <mailto:d...@yeralin.net>>>> >>> wrote: >>>>>>>>>>>>>>>>> Hey John and Matthias, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Yes, now I see it all. I’m storing lots of redundant >>> information. >>>>>>>>>>>>>>>>> Here is my final idea. Yes, now a user should pass a list >>> type. I >>>>>>>>>>>>>>>>> realized that’s the type is not really needed in >>> ListSerializer, but >>>>>>>>>>>>>>>>> only in ListDeserializer: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> In ListSerializer we will start storing sizes only if >>> serializer is >>>>>>>>>>>>>>>>> not a primitive serializer: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Then, in deserializer, we persist passed list type, so that >>> during >>>>>>>>>>>>>>>>> deserialization we could create an instance of it with >>> predefined >>>>>>>>>>>>>>>>> listSize for better performance. >>>>>>>>>>>>>>>>> We also try to locate a primitiveSize based on passed >>> deserializer. >>>>>>>>>>>>>>>>> If it is not there, then primitiveSize will be null. Which >>> means >>>>>>>>>>>>>>>>> that each entry’s size was encoded individually. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> This looks much cleaner and more concise. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> What do you think? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax < >>> matth...@confluent.io <mailto:matth...@confluent.io> <mailto: >>> matth...@confluent.io <mailto:matth...@confluent.io>> >>>>>>>>>>>>>>>>> <mailto:matth...@confluent.io <mailto:matth...@confluent.io> >>> <mailto:matth...@confluent.io <mailto:matth...@confluent.io>>> <mailto: >>> matth...@confluent.io <mailto:matth...@confluent.io> <mailto: >>> matth...@confluent.io <mailto:matth...@confluent.io>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> For encoding the list-type: I see John's point about >>> re-encoding the >>>>>>>>>>>>>>>>> list-type redundantly. However, I also don't like the idea >>> that the >>>>>>>>>>>>>>>>> Deserializer returns a fixed type... >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Maybe it's best allow users to specify the target list type >>> on >>>>>>>>>>>>>>>>> deserialization via config? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Similar for the primitive types: I don't think we need to >>> encode the >>>>>>>>>>>>>>>>> type size, but users could specify the type on the >>> deserializer (via a >>>>>>>>>>>>>>>>> config again)? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> About generics: nesting could be arbitrarily deep. Hence, I >>> doubt >>>>>>>>>>>>>>>>> we can >>>>>>>>>>>>>>>>> support this and a cast will be necessary at some point in >>> the user >>>>>>>>>>>>>>>>> code. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -Matthias >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 6/20/19 1:21 PM, John Roesler wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hey Daniyar, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks for looking at it! >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Something like your screenshot is more along the lines of >>> what I was >>>>>>>>>>>>>>>>> thinking. Sorry, but I didn't follow what you mean, how >>> would that not >>>>>>>>>>>>>>>>> be "vanilla java"? >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Unfortunately the deserializer needs more information, >>> though. For >>>>>>>>>>>>>>>>> example, what if the inner type is a Map<String,String>? >>> The serde >>>>>>>>>>>>>>>>> could >>>>>>>>>>>>>>>>> only be used to produce a LinkedList<Map>, thus, we'd still >>> need an >>>>>>>>>>>>>>>>> inner serde, like you have in the KIP (Serde<T> innerSerde). >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Something more like Serde<LinkedList<MyRecord>> = >>> Serdes.listSerde( >>>>>>>>>>>>>>>>> /**list type**/ LinkedList.class, >>>>>>>>>>>>>>>>> /**inner serde**/ new MyRecordSerde() >>>>>>>>>>>>>>>>> ) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> And in configuration, it's something like: >>>>>>>>>>>>>>>>> default.key.serde: org...ListSerde >>>>>>>>>>>>>>>>> default.key.list.serde.type: java.util.LinkedList >>>>>>>>>>>>>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> What do you think? >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> -John >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development < >>> d...@yeralin.net <mailto:d...@yeralin.net> <mailto:d...@yeralin.net <mailto: >>> d...@yeralin.net>> >>>>>>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> <mailto: >>> d...@yeralin.net <mailto:d...@yeralin.net>>> <mailto:d...@yeralin.net >>> <mailto:d...@yeralin.net> <mailto:d...@yeralin.net <mailto:d...@yeralin.net >>>>>> >>>>>>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> <mailto: >>> d...@yeralin.net <mailto:d...@yeralin.net>> <mailto:d...@yeralin.net >>> <mailto: >>> d...@yeralin.net> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>>>> >>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hey John, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I gave read about TypeReference. It could work for the list >>> serde. >>>>>>>>>>>>>>>>> However, it is not directly >>>>>>>>>>>>>>>>> supported: >>>>>>>>>>>>>>>>> https://github.com/FasterXML/jackson-databind/issues/1490 < >>> https://github.com/FasterXML/jackson-databind/issues/1490> < >>> https://github.com/FasterXML/jackson-databind/issues/1490 < >>> https://github.com/FasterXML/jackson-databind/issues/1490>> >>>>>>>>>>>>>>>>> <https://github.com/FasterXML/jackson-databind/issues/1490 >>> <https://github.com/FasterXML/jackson-databind/issues/1490> < >>> https://github.com/FasterXML/jackson-databind/issues/1490 < >>> https://github.com/FasterXML/jackson-databind/issues/1490>>> >>>>>>>>>>>>>>>>> The only way is to pass an actual class object into the >>> constructor, >>>>>>>>>>>>>>>>> something like: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> It could be an option, but not a pretty one. What do you >>> think of my >>>>>>>>>>>>>>>>> approach to use vanilla java and canonical class name? (As >>> described >>>>>>>>>>>>>>>>> previously) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Jun 20, 2019, at 2:45 PM, Development <d...@yeralin.net >>> <mailto:d...@yeralin.net> <mailto:d...@yeralin.net >>> <mailto:d...@yeralin.net>> >>>>>>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> <mailto: >>> d...@yeralin.net <mailto:d...@yeralin.net>>> <mailto:d...@yeralin.net >>> <mailto:d...@yeralin.net> <mailto:d...@yeralin.net <mailto:d...@yeralin.net >>>>>> >>>>>>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> <mailto: >>> d...@yeralin.net <mailto:d...@yeralin.net>> <mailto:d...@yeralin.net >>> <mailto: >>> d...@yeralin.net> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>>>> >>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi John, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thank you for your input! Yes, my idea looks a little bit >>> over >>>>>>>>>>>>>>>>> engineered :) >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I also wanted to see a feedback from Mathias as well since >>> he gave >>>>>>>>>>>>>>>>> me an idea about storing fixed/variable size entries. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Jun 18, 2019, at 6:06 PM, John Roesler < >>> j...@confluent.io <mailto:j...@confluent.io> <mailto:j...@confluent.io >>> <mailto:j...@confluent.io>> >>>>>>>>>>>>>>>>> <mailto:j...@confluent.io <mailto:j...@confluent.io> >>> <mailto:j...@confluent.io <mailto:j...@confluent.io>>> <mailto: >>> j...@confluent.io <mailto:j...@confluent.io> <mailto:j...@confluent.io >>> <mailto:j...@confluent.io>>> >>>>>>>>>>>>>>>>> <mailto:j...@confluent.io <mailto:j...@confluent.io> >>> <mailto:j...@confluent.io <mailto:j...@confluent.io>> <mailto: >>> j...@confluent.io <mailto:j...@confluent.io> <mailto:j...@confluent.io >>> <mailto:j...@confluent.io>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Hi Daniyar, >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> That's a very clever solution! >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> One observation is that, now, this is what we might call a >>>>>>>>>>>>>>>>> polymorphic >>>>>>>>>>>>>>>>> serde. That is, you're detecting the actual concrete type >>> and then >>>>>>>>>>>>>>>>> promising to produce the exact same concrete type on read. >>>>>>>>>>>>>>>>> There are >>>>>>>>>>>>>>>>> some inherent problems with this approach, which in general >>>>>>>>>>>>>>>>> require >>>>>>>>>>>>>>>>> some kind of schema registry (not necessarily Schema >>>>>>>>>>>>>>>>> Registry, just >>>>>>>>>>>>>>>>> any registry for schemas) to solve. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Notice that every serialized record has quite a bit of >>> duplicated >>>>>>>>>>>>>>>>> information: the concrete type as well as a byte to indicate >>>>>>>>>>>>>>>>> whether >>>>>>>>>>>>>>>>> the value type is a fixed size, and, if so, an integer to >>>>>>>>>>>>>>>>> indicate the >>>>>>>>>>>>>>>>> actual size. These constitute a schema, of sorts, because >>> they >>>>>>>>>>>>>>>>> tell us >>>>>>>>>>>>>>>>> later how exactly to deserialize the data. Unfortunately, >>> this >>>>>>>>>>>>>>>>> information is completely redundant. In all likelihood, the >>>>>>>>>>>>>>>>> information will be exactly the same for every record in the >>>>>>>>>>>>>>>>> topic. >>>>>>>>>>>>>>>>> This problem is essentially the core motivation for >>> serializations >>>>>>>>>>>>>>>>> like Avro: to move the schema outside of the serialization >>>>>>>>>>>>>>>>> itself, so >>>>>>>>>>>>>>>>> that the records won't contain so much redundant >>> information. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> In this light, I'm wondering if it makes sense to go back to >>>>>>>>>>>>>>>>> something >>>>>>>>>>>>>>>>> like what you had earlier in which you don't support >>> perfectly >>>>>>>>>>>>>>>>> preserving the concrete type for _this_ serde, but instead >>> just >>>>>>>>>>>>>>>>> support deserializing to _some_ List. Then, you could defer >>> full, >>>>>>>>>>>>>>>>> perfect, type preservation to serdes that have an external >>>>>>>>>>>>>>>>> system in >>>>>>>>>>>>>>>>> which to register their type information. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> There does exist an alternative, if we really do want to >>>>>>>>>>>>>>>>> preserve the >>>>>>>>>>>>>>>>> concrete type (which does seem kind of nice). You can add a >>>>>>>>>>>>>>>>> configuration option specifically for the serde to configure >>>>>>>>>>>>>>>>> what the >>>>>>>>>>>>>>>>> list type will be, and maybe what the element type is, as >>> well. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> As far as "related work" goes, you might be interested to >>> take >>>>>>>>>>>>>>>>> a look >>>>>>>>>>>>>>>>> at how Jackson can be configured to deserialize into a >>> specific, >>>>>>>>>>>>>>>>> arbitrarily nested, generically parameterized class >>> structure. >>>>>>>>>>>>>>>>> Specifically, you might find >>>>>>>>>>>>>>>>> >>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html >>> < >>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html> >>> < >>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html >>> < >>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html >>>>> >>>>>>>>>>>>>>>>> < >>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html >>> < >>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html> >>> < >>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html >>> < >>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html >>>>>> >>>>>>>>>>>>>>>>> interesting. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>> -John >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On Mon, Jun 17, 2019 at 12:38 PM Development < >>> d...@yeralin.net <mailto:d...@yeralin.net> <mailto:d...@yeralin.net <mailto: >>> d...@yeralin.net>> >>>>>>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> <mailto: >>> d...@yeralin.net <mailto:d...@yeralin.net>>> <mailto:d...@yeralin.net >>> <mailto:d...@yeralin.net> <mailto:d...@yeralin.net <mailto:d...@yeralin.net >>>>>> >>>>>>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> <mailto: >>> d...@yeralin.net <mailto:d...@yeralin.net>> <mailto:d...@yeralin.net >>> <mailto: >>> d...@yeralin.net> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>>>> >>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> bump >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>> >>> >> >