>> Just to make sure I understand the problem you're highlighting: >> I guess the difference is that the serializer and deserializer that are >> nested inside the serde also need to be configured? So, by default I'd have >> to specify all six configs when I'm using Streams?
That is not the problem. And you actually describe the solution for it yourself: >> I guess in the Serde, it could make use of a package-protected constructor >> for the serializer and deserializer that fixes the list type and inner type >> to the serde-configured ones. Then, when you're configuring Streams, you >> only need to specify the StreamsConfigs. The problem is, that `ListSerde` is in package `clients` and thus `ListSerde` cannot access `StreamsConfig`, and hence cannot use `StreamsConfig#DEFAULT_LIST_KEY_SERDE_TYPE` (and others). Therefore, we either need to hard-code strings literal for the config names (what does not sound right) or add `CommonClientConfig#DEFAULT_LIST_KEY_SERDE_TYPE` (and others). In StreamsConfig we would just redefine them for convenience: > public static final String DEFAULT_LIST_KEY_SERDE_TYPE = > CommonClientConfig#DEFAULT_LIST_KEY_SERDE_TYPE; Note that `TimeWindowSerde` is contained in `streams` package and thus it can access `StreamsConfig` and `StreamsConfig#DEFAULT_WINDOWED_KEY_SERDE_INNER_CLASS`. Btw: I just realized that we actually don't need `ProducerConfig` > list.key/value.serializer.type because the list-type is irrelevant on write. We only need `inner` config. -Matthias On 7/23/19 1:30 PM, John Roesler wrote: > Hmm, that's a tricky situation. > > I think Daniyar was on the right track... Producer only cares about > serializer configs, and Consumer only cares about deserializer configs. > > I didn't see the problem with your proposal: > > ProducerConfig: >> list.key/value.serializer.type >> list.key/value.serializer.inner >> ConsumerConfig: >> list.key/value.deserializer.type >> list.key/value.deserializer.inner >> StreamsConfig: >> default.list.key/value.serde.type >> default.list.key/value.serde.inner > > > It seems like the key/value serde configs are a better analogy than the > windowed serde. > ProducerConfig: key.serializer > ConsumerConfig: key.deserializer > StreamsConfig: default.key.serde > > Just to make sure I understand the problem you're highlighting: > I guess the difference is that the serializer and deserializer that are > nested inside the serde also need to be configured? So, by default I'd have > to specify all six configs when I'm using Streams? > > I guess in the Serde, it could make use of a package-protected constructor > for the serializer and deserializer that fixes the list type and inner type > to the serde-configured ones. Then, when you're configuring Streams, you > only need to specify the StreamsConfigs. > > Does that work? > -John > > > On Tue, Jul 23, 2019 at 11:39 AM Development <d...@yeralin.net> wrote: > >> Bump >> >>> On Jul 22, 2019, at 11:26 AM, Development <d...@yeralin.net> wrote: >>> >>> Hey Matthias, >>> >>> It looks a little confusing, but I don’t have enough expertise to judge >> on the configuration placement. >>> >>> If you think, it is fine I’ll go ahead with this approach. >>> >>> Best, >>> Daniyar Yeralin >>> >>>> On Jul 19, 2019, at 5:49 PM, Matthias J. Sax <matth...@confluent.io> >> wrote: >>>> >>>> Good point. >>>> >>>> I guess the simplest solution is, to actually add >>>> >>>>>> default.list.key/value.serde.type >>>>>> default.list.key/value.serde.inner >>>> >>>> to both `CommonClientConfigs` and `StreamsConfig`. >>>> >>>> It's not super clean, but I think it's the best we can do. Thoughts? >>>> >>>> >>>> -Matthias >>>> >>>> On 7/19/19 1:23 PM, Development wrote: >>>>> Hi Matthias, >>>>> >>>>> I agree, ConsumerConfig did not seem like a right place for these >> configurations. >>>>> I’ll put them in ProducerConfig, ConsumerConfig, and StreamsConfig. >>>>> >>>>> However, I have a question. What should I do in "configure(Map<String, >> ?> configs, boolean isKey)” methods? Which configurations should I try to >> locate? I was comparing my (de)serializer implementations with >> SessionWindows(De)serializer classes, and they use StreamsConfig class to >> get either StreamsConfig.DEFAULT_WINDOWED_KEY_SERDE_INNER_CLASS : >> StreamsConfig.DEFAULT_WINDOWED_VALUE_SERDE_INNER_CLASS >>>>> >>>>> In my case, as I mentioned earlier, StreamsConfig class is not >> accessible from org.apache.kafka.common.serialization package. So, I can’t >> utilize it. Any suggestions here? >>>>> >>>>> Best, >>>>> Daniyar Yeralin >>>>> >>>>> >>>>>> On Jul 18, 2019, at 8:46 PM, Matthias J. Sax <matth...@confluent.io> >> wrote: >>>>>> >>>>>> Thanks! >>>>>> >>>>>> One minor question about the configs. The KIP adds three classes, a >>>>>> Serializer, a Deserializer, and a Serde. >>>>>> >>>>>> Hence, would it make sense to add the corresponding configs to >>>>>> `ConsumerConfig`, `ProducerConfig`, and `StreamsConfig` using slightly >>>>>> different names each time? >>>>>> >>>>>> >>>>>> Somethin like this: >>>>>> >>>>>> ProducerConfig: >>>>>> >>>>>> list.key/value.serializer.type >>>>>> list.key/value.serializer.inner >>>>>> >>>>>> ConsumerConfig: >>>>>> >>>>>> list.key/value.deserializer.type >>>>>> list.key/value.deserializer.inner >>>>>> >>>>>> StreamsConfig: >>>>>> >>>>>> default.list.key/value.serde.type >>>>>> default.list.key/value.serde.inner >>>>>> >>>>>> >>>>>> Adding `d.l.k/v.serde.t/i` to `CommonClientConfigs does not sound >> right >>>>>> to me. Also note, that it seems better to avoid the `default.` prefix >>>>>> for consumers and producers because there is only one Serializer or >>>>>> Deserializer anyway. Only for Streams, there are multiple and >>>>>> StreamsConfig specifies the default one of an operator does not >>>>>> overwrite it. >>>>>> >>>>>> Thoughts? >>>>>> >>>>>> >>>>>> Also, the KIP should explicitly mention to what classed certain >> configs >>>>>> are added. Atm, the KIP only list parameter names, but does not state >>>>>> where those are added. >>>>>> >>>>>> >>>>>> -Matthias >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> On 7/16/19 1:11 PM, Development wrote: >>>>>>> Hi, >>>>>>> >>>>>>> Yes, totally forgot about the statement. KIP-466 is updated. >>>>>>> >>>>>>> Thank you so much John Roesler, Matthias J. Sax, Sophie Blee-Goldman >> for your valuable input! >>>>>>> >>>>>>> I hope I did not cause too much trouble :) >>>>>>> >>>>>>> I’ll start the vote now. >>>>>>> >>>>>>> Best, >>>>>>> Daniyar Yeralin >>>>>>> >>>>>>>> On Jul 16, 2019, at 3:17 PM, John Roesler <j...@confluent.io> >> wrote: >>>>>>>> >>>>>>>> Hi Daniyar, >>>>>>>> >>>>>>>> Thanks for that update. I took a look, and I think this is in good >> shape. >>>>>>>> >>>>>>>> One note, the statement "New method public static <T> Serde<List<T>> >>>>>>>> ListSerde() in org.apache.kafka.common.serialization.Serdes class >>>>>>>> (infers list implementation and inner serde from config file)" is >>>>>>>> still present in the KIP, although I do it is was removed from the >> PR. >>>>>>>> >>>>>>>> Once you remove that statement from the KIP, then I think this KIP >> is >>>>>>>> ready to go up for a vote! Then, we can really review the PR in >>>>>>>> earnest and get this thing merged. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> -john >>>>>>>> >>>>>>>> On Tue, Jul 16, 2019 at 2:05 PM Development <d...@yeralin.net> >> wrote: >>>>>>>>> >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> Pushed new changes under my PR: >> https://github.com/apache/kafka/pull/6592 < >> https://github.com/apache/kafka/pull/6592> >>>>>>>>> >>>>>>>>> Feel free to put any comments in there. >>>>>>>>> >>>>>>>>> Best, >>>>>>>>> Daniyar Yeralin >>>>>>>>> >>>>>>>>>> On Jul 15, 2019, at 1:06 PM, Development <d...@yeralin.net> wrote: >>>>>>>>>> >>>>>>>>>> Hi John, >>>>>>>>>> >>>>>>>>>> I knew I was missing something. Yes, that makes sense now, I >> removed all `listSerde()` methods, and left empty constructors instead. >>>>>>>>>> >>>>>>>>>> As per `CommonClientConfigs` I looked at the class, it doesn’t >> have any properties related to serdes, and that bothers me a little. >>>>>>>>>> >>>>>>>>>> All properties like `default.key.serde` >> `default.windowed.key.serde.*` are located in StreamsConfig. I don’t want >> to create a confusion. >>>>>>>>>> What also doesn’t make sense to me is that `WindowedSerdes` and >> its (de)serializers are not located in >> org.apache.kafka.common.serialization. I guess it kind of makes sense since >> windowed serdes are only available for kafka streams, not vice versa. >>>>>>>>>> >>>>>>>>>> If everyone is okay to put list properties in >> `CommonClientConfigs` class, I’ll go ahead and do that then. >>>>>>>>>> >>>>>>>>>> Thank you for your input! >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> Daniyar Yeralin >>>>>>>>>> >>>>>>>>>>> On Jul 15, 2019, at 11:45 AM, John Roesler <j...@confluent.io> >> wrote: >>>>>>>>>>> >>>>>>>>>>> Hi all, >>>>>>>>>>> >>>>>>>>>>> Regarding the placement, you might as well move the constants to >> `org.apache.kafka.clients.CommonClientConfigs`, so that the constants and >> the configs and the code are in the same module. >>>>>>>>>>> >>>>>>>>>>> Regarding the constructor... What Matthias said is correct: The >> serde, serializer, and deserializer all need to have zero-arg constructors >> so they can be instantiated reflectively by Kafka. However, the factory >> method you proposed "New method public static <T> Serde<List<T>> >> ListSerde()" is not a constructor, and is not required. It would be used >> purely from the Java interface, but has the drawbacks I listed above. This >> method, not the constructor, is what I proposed to remove from the KIP. >>>>>>>>>>> >>>>>>>>>>> Thanks, >>>>>>>>>>> -John >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Mon, Jul 15, 2019 at 10:15 AM Development <d...@yeralin.net >> <mailto:d...@yeralin.net>> wrote: >>>>>>>>>>> One problem though. >>>>>>>>>>> >>>>>>>>>>> Since WindowedSerde (Windowed(De)Serializer) are so similar, I’m >> trying to mimic the implementation of my ListSerde accordingly. >>>>>>>>>>> >>>>>>>>>>> I created couple constants under StreamsConfig: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> And trying to do similar construct: >>>>>>>>>>> final String propertyName = isKey ? >> StreamsConfig.DEFAULT_LIST_KEY_SERDE_INNER_CLASS : >> StreamsConfig.DEFAULT_LIST_VALUE_SERDE_INNER_CLASS; >>>>>>>>>>> But then found out that StreamsConfig is not accessible from >> org.apache.kafka.common.serialization package while window serde >> (de)serializers are located under org.apache.kafka.streams.kstream package. >>>>>>>>>>> >>>>>>>>>>> What should I do? Should I move my classes under >> org.apache.kafka.streams.kstream package instead? >>>>>>>>>>> >>>>>>>>>>>> On Jul 15, 2019, at 10:45 AM, Development <d...@yeralin.net >> <mailto:d...@yeralin.net>> wrote: >>>>>>>>>>>> >>>>>>>>>>>> Hi Matthias, >>>>>>>>>>>> >>>>>>>>>>>> Thank you for your input. >>>>>>>>>>>> >>>>>>>>>>>> I updated the KIP, made it a little more readable. >>>>>>>>>>>> >>>>>>>>>>>> I think the configuration parameters strategy is finalized then. >>>>>>>>>>>> >>>>>>>>>>>> Do you have any other questions/concerns regarding this KIP? >>>>>>>>>>>> >>>>>>>>>>>> Meanwhile I’ll start doing appropriate code changes, and commit >> them under my PR. >>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>>> >>>>>>>>>>>>> On Jul 11, 2019, at 2:44 PM, Matthias J. Sax < >> matth...@confluent.io <mailto:matth...@confluent.io>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Daniyar, >>>>>>>>>>>>> >>>>>>>>>>>>> thanks for the update to the KIP. It's in really good shape >> and well >>>>>>>>>>>>> written. >>>>>>>>>>>>> >>>>>>>>>>>>> About the default constructor question: >>>>>>>>>>>>> >>>>>>>>>>>>> All Serdes/Serializer/Deserializer classes need a default >> constructor to >>>>>>>>>>>>> create them easily via reflections when specifies in a config. >> I >>>>>>>>>>>>> understand that it is not super user friendly, but all >> existing code >>>>>>>>>>>>> works this way. Hence, it seems best to stick with the >> established pattern. >>>>>>>>>>>>> >>>>>>>>>>>>> We have a similar issue with `TimeWindowedSerde` and >>>>>>>>>>>>> `SessionWindowedSerde`, and I just recently did a PR to >> improve user >>>>>>>>>>>>> experience that address the exact issue John raised. (cf >>>>>>>>>>>>> https://github.com/apache/kafka/pull/7067 < >> https://github.com/apache/kafka/pull/7067>) >>>>>>>>>>>>> >>>>>>>>>>>>> Note, that if a user would instantiate the Serde manually, the >> user >>>>>>>>>>>>> would also need to call `configure()` to setup the inner >> serdes. Kafka >>>>>>>>>>>>> Streams would not setup those automatically and one might most >> likely >>>>>>>>>>>>> end-up with an NPE. >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Coming back the KIP, and the parameter names. `WindowedSerdes` >> are >>>>>>>>>>>>> similar to `ListSerde` as they wrap another Serde. For >> `WindowedSerdes`, >>>>>>>>>>>>> we use the following parameter names: >>>>>>>>>>>>> >>>>>>>>>>>>> - default.windowed.key.serde.inner >>>>>>>>>>>>> - default.windowed.value.serde.inner >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> It might be good to align the naming pattern. I would also >> suggest to >>>>>>>>>>>>> use `type` instead of `impl`? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> default.key.list.serde.impl -> default.list.key.serde.type >>>>>>>>>>>>> default.value.list.serde.impl -> >> default.list.value.serde.type >>>>>>>>>>>>> default.key.list.serde.element -> >> default.list.key.serde.inner >>>>>>>>>>>>> default.value.list.serde.element -> >> default.list.value.serde.inner >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> -Matthias >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On 7/10/19 8:52 AM, Development wrote: >>>>>>>>>>>>>> Hi John, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yes, I do agree. That totally makes sense. The only thing is >> that it goes against what Matthias suggested earlier: >>>>>>>>>>>>>> "I think that ... `ListSerde` should have an default >> constructor and it should be possible to pass in the `Class listClass` >> information via a configuration. Otherwise, KafkaStreams cannot use it as >> default serde.” >>>>>>>>>>>>>> >>>>>>>>>>>>>> What do you think about that? I hope I’m not confusing >> anything. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Best, >>>>>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Jul 9, 2019, at 5:56 PM, John Roesler <j...@confluent.io >> <mailto:j...@confluent.io>> wrote: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Ah, my apologies, I must have just overlooked it. Thanks for >> the update, too. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Just one more super-small question, do we need this variant: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> New method public static <T> Serde<List<T>> ListSerde() in >> org.apache.kafka.common.serialization.Serdes class (infers list >> implementation and inner serde from config file) >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> It seems like this situation implies my config file is >> already set up for the list serde, so passing this serde (e.g., in >> Produced) would have the same effect as not specifying it. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> I guess that it could be the case that you have the >> `default.key/value.serde` set to something else, like StringSerde, but you >> still have the `default.key/value.list.serde.impl/element` set. This seems >> like it would result in more confusion than convenience, so my gut instinct >> is maybe we shouldn't introduce the `ListSerde()` variant until people >> actually request it later on. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thus, we'd just stick with fully config-driven or fully >> source-code-driven, not half/half. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> What do you think? >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>> -John >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Tue, Jul 9, 2019 at 9:58 AM Development <d...@yeralin.net >> <mailto:d...@yeralin.net> <mailto:d...@yeralin.net >> <mailto:d...@yeralin.net>>> >> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi John, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I hope everyone had a great long weekend. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Regarding Java interfaces, I may not understand you >> correctly, but I think I already listed them: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> So for Produced, you would use it in the following fashion, >> for example: Produced.keySerde(Serdes.ListSerde(ArrayList.class, >> Serdes.Integer())) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I also updated the KIP, and added a section “Serialization >> Strategy” where I describe our logic of conditional serialization based on >> the type of an inner serde. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thank you! >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Jun 26, 2019, at 11:44 AM, John Roesler < >> j...@confluent.io <mailto:j...@confluent.io> <mailto:j...@confluent.io >> <mailto:j...@confluent.io>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks for the update, Daniyar! >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> In addition to specifying the config interface, can you >> also specify >>>>>>>>>>>>>>>> the Java interface? Namely, if I need to pass an instance >> of this >>>>>>>>>>>>>>>> serde in to the DSL directly, as in Produced, Materialized, >> etc., what >>>>>>>>>>>>>>>> constructor(s) would I have available? Likewise with the >> Serializer >>>>>>>>>>>>>>>> and Deserailizer. I don't think you need to specify the >> implementation >>>>>>>>>>>>>>>> logic, since we've already discussed it here. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> If you also want to specify the serialized format of the >> data records >>>>>>>>>>>>>>>> in the KIP, it could be useful documentation, as well as >> letting us >>>>>>>>>>>>>>>> verify the schema for forward/backward compatibility >> concerns, etc. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> John >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Wed, Jun 26, 2019 at 10:33 AM Development < >> d...@yeralin.net <mailto:d...@yeralin.net> <mailto:d...@yeralin.net <mailto: >> d...@yeralin.net>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hey, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Finally made updates to the KIP: >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization >> < >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization> >> < >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization >> < >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization>> >> < >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466:+Add+support+for+List%3CT%3E+serialization+and+deserialization >> < >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466:+Add+support+for+List%3CT%3E+serialization+and+deserialization> >> < >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466:+Add+support+for+List%3CT%3E+serialization+and+deserialization >> < >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466:+Add+support+for+List%3CT%3E+serialization+and+deserialization >>>>> >>>>>>>>>>>>>>>> Sorry for the delay :) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thank You! >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax < >> matth...@confluent.io <mailto:matth...@confluent.io> <mailto: >> matth...@confluent.io <mailto:matth...@confluent.io>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Yes, something like this. I did not think about good >> configuration >>>>>>>>>>>>>>>> parameter names yet. I am also not sure if I understand all >> proposed >>>>>>>>>>>>>>>> configs atm. But all configs should be listed and explained >> in the KIP >>>>>>>>>>>>>>>> anyway, and we can discuss further after you have updated >> the KIP (I can >>>>>>>>>>>>>>>> ask more detailed question if I have any). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -Matthias >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 6/21/19 2:05 PM, Development wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Yes, you are right. ByteSerializer is not what I need to >> have in a list >>>>>>>>>>>>>>>> of primitives. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> As for the default constructor and configurability, just >> want to make >>>>>>>>>>>>>>>> sure. Is this what you have on your mind? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax < >> matth...@confluent.io <mailto:matth...@confluent.io> <mailto: >> matth...@confluent.io <mailto:matth...@confluent.io>> >>>>>>>>>>>>>>>> <mailto:matth...@confluent.io <mailto:matth...@confluent.io> >> <mailto:matth...@confluent.io <mailto:matth...@confluent.io>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks for the update! >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I think that `ListDeserializer`, `ListSerializer`, and >> `ListSerde` >>>>>>>>>>>>>>>> should have an default constructor and it should be >> possible to pass in >>>>>>>>>>>>>>>> the `Class listClass` information via a configuration. >> Otherwise, >>>>>>>>>>>>>>>> KafkaStreams cannot use it as default serde. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> For the primitive serializers: `BytesSerializer` is not >> primitive IMHO, >>>>>>>>>>>>>>>> as is it for `byte[]` with variable length -- it's for >> arrays, not for >>>>>>>>>>>>>>>> single `byte` (note, that `Bytes` is a Kafka class wrapping >> `byte[]`). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> For tests, we can comment on the PR. No need to do this in >> the KIP >>>>>>>>>>>>>>>> discussion. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Can you also update the KIP? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -Matthias >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 6/21/19 11:29 AM, Development wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I made and pushed necessary commits, so we could review the >> final >>>>>>>>>>>>>>>> version under PR https://github.com/apache/kafka/pull/6592 >> <https://github.com/apache/kafka/pull/6592> < >> https://github.com/apache/kafka/pull/6592 < >> https://github.com/apache/kafka/pull/6592>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I also need some advice on writing tests for this new >> serde. So far I >>>>>>>>>>>>>>>> only have two test cases (roundtrip and empty payload), I’m >> not sure >>>>>>>>>>>>>>>> if it is enough. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thank y’all for your help in this KIP :) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Jun 21, 2019, at 1:44 PM, John Roesler < >> j...@confluent.io <mailto:j...@confluent.io> <mailto:j...@confluent.io >> <mailto:j...@confluent.io>> >>>>>>>>>>>>>>>> <mailto:j...@confluent.io <mailto:j...@confluent.io> >> <mailto:j...@confluent.io <mailto:j...@confluent.io>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hey Daniyar, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Looks good to me! Thanks for considering it. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> -John >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Jun 21, 2019 at 9:04 AM Development < >> d...@yeralin.net <mailto:d...@yeralin.net> <mailto:d...@yeralin.net <mailto: >> d...@yeralin.net>> >>>>>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> <mailto: >> d...@yeralin.net <mailto:d...@yeralin.net>>> <mailto:d...@yeralin.net >> <mailto:d...@yeralin.net> <mailto:d...@yeralin.net >> <mailto:d...@yeralin.net>>>> >> wrote: >>>>>>>>>>>>>>>> Hey John and Matthias, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Yes, now I see it all. I’m storing lots of redundant >> information. >>>>>>>>>>>>>>>> Here is my final idea. Yes, now a user should pass a list >> type. I >>>>>>>>>>>>>>>> realized that’s the type is not really needed in >> ListSerializer, but >>>>>>>>>>>>>>>> only in ListDeserializer: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> In ListSerializer we will start storing sizes only if >> serializer is >>>>>>>>>>>>>>>> not a primitive serializer: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Then, in deserializer, we persist passed list type, so that >> during >>>>>>>>>>>>>>>> deserialization we could create an instance of it with >> predefined >>>>>>>>>>>>>>>> listSize for better performance. >>>>>>>>>>>>>>>> We also try to locate a primitiveSize based on passed >> deserializer. >>>>>>>>>>>>>>>> If it is not there, then primitiveSize will be null. Which >> means >>>>>>>>>>>>>>>> that each entry’s size was encoded individually. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This looks much cleaner and more concise. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> What do you think? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax < >> matth...@confluent.io <mailto:matth...@confluent.io> <mailto: >> matth...@confluent.io <mailto:matth...@confluent.io>> >>>>>>>>>>>>>>>> <mailto:matth...@confluent.io <mailto:matth...@confluent.io> >> <mailto:matth...@confluent.io <mailto:matth...@confluent.io>>> <mailto: >> matth...@confluent.io <mailto:matth...@confluent.io> <mailto: >> matth...@confluent.io <mailto:matth...@confluent.io>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> For encoding the list-type: I see John's point about >> re-encoding the >>>>>>>>>>>>>>>> list-type redundantly. However, I also don't like the idea >> that the >>>>>>>>>>>>>>>> Deserializer returns a fixed type... >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Maybe it's best allow users to specify the target list type >> on >>>>>>>>>>>>>>>> deserialization via config? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Similar for the primitive types: I don't think we need to >> encode the >>>>>>>>>>>>>>>> type size, but users could specify the type on the >> deserializer (via a >>>>>>>>>>>>>>>> config again)? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> About generics: nesting could be arbitrarily deep. Hence, I >> doubt >>>>>>>>>>>>>>>> we can >>>>>>>>>>>>>>>> support this and a cast will be necessary at some point in >> the user >>>>>>>>>>>>>>>> code. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -Matthias >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On 6/20/19 1:21 PM, John Roesler wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hey Daniyar, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks for looking at it! >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Something like your screenshot is more along the lines of >> what I was >>>>>>>>>>>>>>>> thinking. Sorry, but I didn't follow what you mean, how >> would that not >>>>>>>>>>>>>>>> be "vanilla java"? >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Unfortunately the deserializer needs more information, >> though. For >>>>>>>>>>>>>>>> example, what if the inner type is a Map<String,String>? >> The serde >>>>>>>>>>>>>>>> could >>>>>>>>>>>>>>>> only be used to produce a LinkedList<Map>, thus, we'd still >> need an >>>>>>>>>>>>>>>> inner serde, like you have in the KIP (Serde<T> innerSerde). >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Something more like Serde<LinkedList<MyRecord>> = >> Serdes.listSerde( >>>>>>>>>>>>>>>> /**list type**/ LinkedList.class, >>>>>>>>>>>>>>>> /**inner serde**/ new MyRecordSerde() >>>>>>>>>>>>>>>> ) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> And in configuration, it's something like: >>>>>>>>>>>>>>>> default.key.serde: org...ListSerde >>>>>>>>>>>>>>>> default.key.list.serde.type: java.util.LinkedList >>>>>>>>>>>>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> What do you think? >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> -John >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development < >> d...@yeralin.net <mailto:d...@yeralin.net> <mailto:d...@yeralin.net <mailto: >> d...@yeralin.net>> >>>>>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> <mailto: >> d...@yeralin.net <mailto:d...@yeralin.net>>> <mailto:d...@yeralin.net >> <mailto:d...@yeralin.net> <mailto:d...@yeralin.net <mailto:d...@yeralin.net >>>>> >>>>>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> <mailto: >> d...@yeralin.net <mailto:d...@yeralin.net>> <mailto:d...@yeralin.net <mailto: >> d...@yeralin.net> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>>>> >> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hey John, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I gave read about TypeReference. It could work for the list >> serde. >>>>>>>>>>>>>>>> However, it is not directly >>>>>>>>>>>>>>>> supported: >>>>>>>>>>>>>>>> https://github.com/FasterXML/jackson-databind/issues/1490 < >> https://github.com/FasterXML/jackson-databind/issues/1490> < >> https://github.com/FasterXML/jackson-databind/issues/1490 < >> https://github.com/FasterXML/jackson-databind/issues/1490>> >>>>>>>>>>>>>>>> <https://github.com/FasterXML/jackson-databind/issues/1490 >> <https://github.com/FasterXML/jackson-databind/issues/1490> < >> https://github.com/FasterXML/jackson-databind/issues/1490 < >> https://github.com/FasterXML/jackson-databind/issues/1490>>> >>>>>>>>>>>>>>>> The only way is to pass an actual class object into the >> constructor, >>>>>>>>>>>>>>>> something like: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> It could be an option, but not a pretty one. What do you >> think of my >>>>>>>>>>>>>>>> approach to use vanilla java and canonical class name? (As >> described >>>>>>>>>>>>>>>> previously) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Jun 20, 2019, at 2:45 PM, Development <d...@yeralin.net >> <mailto:d...@yeralin.net> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>> >>>>>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> <mailto: >> d...@yeralin.net <mailto:d...@yeralin.net>>> <mailto:d...@yeralin.net >> <mailto:d...@yeralin.net> <mailto:d...@yeralin.net <mailto:d...@yeralin.net >>>>> >>>>>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> <mailto: >> d...@yeralin.net <mailto:d...@yeralin.net>> <mailto:d...@yeralin.net <mailto: >> d...@yeralin.net> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>>>> >> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi John, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thank you for your input! Yes, my idea looks a little bit >> over >>>>>>>>>>>>>>>> engineered :) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I also wanted to see a feedback from Mathias as well since >> he gave >>>>>>>>>>>>>>>> me an idea about storing fixed/variable size entries. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>>> Daniyar Yeralin >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Jun 18, 2019, at 6:06 PM, John Roesler < >> j...@confluent.io <mailto:j...@confluent.io> <mailto:j...@confluent.io >> <mailto:j...@confluent.io>> >>>>>>>>>>>>>>>> <mailto:j...@confluent.io <mailto:j...@confluent.io> >> <mailto:j...@confluent.io <mailto:j...@confluent.io>>> <mailto: >> j...@confluent.io <mailto:j...@confluent.io> <mailto:j...@confluent.io >> <mailto:j...@confluent.io>>> >>>>>>>>>>>>>>>> <mailto:j...@confluent.io <mailto:j...@confluent.io> >> <mailto:j...@confluent.io <mailto:j...@confluent.io>> <mailto: >> j...@confluent.io <mailto:j...@confluent.io> <mailto:j...@confluent.io >> <mailto:j...@confluent.io>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hi Daniyar, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> That's a very clever solution! >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> One observation is that, now, this is what we might call a >>>>>>>>>>>>>>>> polymorphic >>>>>>>>>>>>>>>> serde. That is, you're detecting the actual concrete type >> and then >>>>>>>>>>>>>>>> promising to produce the exact same concrete type on read. >>>>>>>>>>>>>>>> There are >>>>>>>>>>>>>>>> some inherent problems with this approach, which in general >>>>>>>>>>>>>>>> require >>>>>>>>>>>>>>>> some kind of schema registry (not necessarily Schema >>>>>>>>>>>>>>>> Registry, just >>>>>>>>>>>>>>>> any registry for schemas) to solve. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Notice that every serialized record has quite a bit of >> duplicated >>>>>>>>>>>>>>>> information: the concrete type as well as a byte to indicate >>>>>>>>>>>>>>>> whether >>>>>>>>>>>>>>>> the value type is a fixed size, and, if so, an integer to >>>>>>>>>>>>>>>> indicate the >>>>>>>>>>>>>>>> actual size. These constitute a schema, of sorts, because >> they >>>>>>>>>>>>>>>> tell us >>>>>>>>>>>>>>>> later how exactly to deserialize the data. Unfortunately, >> this >>>>>>>>>>>>>>>> information is completely redundant. In all likelihood, the >>>>>>>>>>>>>>>> information will be exactly the same for every record in the >>>>>>>>>>>>>>>> topic. >>>>>>>>>>>>>>>> This problem is essentially the core motivation for >> serializations >>>>>>>>>>>>>>>> like Avro: to move the schema outside of the serialization >>>>>>>>>>>>>>>> itself, so >>>>>>>>>>>>>>>> that the records won't contain so much redundant >> information. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> In this light, I'm wondering if it makes sense to go back to >>>>>>>>>>>>>>>> something >>>>>>>>>>>>>>>> like what you had earlier in which you don't support >> perfectly >>>>>>>>>>>>>>>> preserving the concrete type for _this_ serde, but instead >> just >>>>>>>>>>>>>>>> support deserializing to _some_ List. Then, you could defer >> full, >>>>>>>>>>>>>>>> perfect, type preservation to serdes that have an external >>>>>>>>>>>>>>>> system in >>>>>>>>>>>>>>>> which to register their type information. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> There does exist an alternative, if we really do want to >>>>>>>>>>>>>>>> preserve the >>>>>>>>>>>>>>>> concrete type (which does seem kind of nice). You can add a >>>>>>>>>>>>>>>> configuration option specifically for the serde to configure >>>>>>>>>>>>>>>> what the >>>>>>>>>>>>>>>> list type will be, and maybe what the element type is, as >> well. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> As far as "related work" goes, you might be interested to >> take >>>>>>>>>>>>>>>> a look >>>>>>>>>>>>>>>> at how Jackson can be configured to deserialize into a >> specific, >>>>>>>>>>>>>>>> arbitrarily nested, generically parameterized class >> structure. >>>>>>>>>>>>>>>> Specifically, you might find >>>>>>>>>>>>>>>> >> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html >> < >> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html> >> < >> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html >> < >> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html >>>> >>>>>>>>>>>>>>>> < >> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html >> < >> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html> >> < >> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html >> < >> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html >>>>> >>>>>>>>>>>>>>>> interesting. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>> -John >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Mon, Jun 17, 2019 at 12:38 PM Development < >> d...@yeralin.net <mailto:d...@yeralin.net> <mailto:d...@yeralin.net <mailto: >> d...@yeralin.net>> >>>>>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> <mailto: >> d...@yeralin.net <mailto:d...@yeralin.net>>> <mailto:d...@yeralin.net >> <mailto:d...@yeralin.net> <mailto:d...@yeralin.net <mailto:d...@yeralin.net >>>>> >>>>>>>>>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> <mailto: >> d...@yeralin.net <mailto:d...@yeralin.net>> <mailto:d...@yeralin.net <mailto: >> d...@yeralin.net> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>>>> >> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> bump >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>> >> >> >
signature.asc
Description: OpenPGP digital signature