Hi John, Yes, I do agree. That totally makes sense. The only thing is that it goes against what Matthias suggested earlier: "I think that ... `ListSerde` should have an default constructor and it should be possible to pass in the `Class listClass` information via a configuration. Otherwise, KafkaStreams cannot use it as default serde.”
What do you think about that? I hope I’m not confusing anything. Best, Daniyar Yeralin > On Jul 9, 2019, at 5:56 PM, John Roesler <j...@confluent.io> wrote: > > Ah, my apologies, I must have just overlooked it. Thanks for the update, too. > > Just one more super-small question, do we need this variant: > > > New method public static <T> Serde<List<T>> ListSerde() in > > org.apache.kafka.common.serialization.Serdes class (infers list > > implementation and inner serde from config file) > > It seems like this situation implies my config file is already set up for the > list serde, so passing this serde (e.g., in Produced) would have the same > effect as not specifying it. > > I guess that it could be the case that you have the `default.key/value.serde` > set to something else, like StringSerde, but you still have the > `default.key/value.list.serde.impl/element` set. This seems like it would > result in more confusion than convenience, so my gut instinct is maybe we > shouldn't introduce the `ListSerde()` variant until people actually request > it later on. > > Thus, we'd just stick with fully config-driven or fully source-code-driven, > not half/half. > > What do you think? > > Thanks, > -John > > > On Tue, Jul 9, 2019 at 9:58 AM Development <d...@yeralin.net > <mailto:d...@yeralin.net>> wrote: > > > > Hi John, > > > > I hope everyone had a great long weekend. > > > > Regarding Java interfaces, I may not understand you correctly, but I think > > I already listed them: > > > > So for Produced, you would use it in the following fashion, for example: > > Produced.keySerde(Serdes.ListSerde(ArrayList.class, Serdes.Integer())) > > > > I also updated the KIP, and added a section “Serialization Strategy” where > > I describe our logic of conditional serialization based on the type of an > > inner serde. > > > > Thank you! > > > > Best, > > Daniyar Yeralin > > > > On Jun 26, 2019, at 11:44 AM, John Roesler <j...@confluent.io > > <mailto:j...@confluent.io>> wrote: > > > > Thanks for the update, Daniyar! > > > > In addition to specifying the config interface, can you also specify > > the Java interface? Namely, if I need to pass an instance of this > > serde in to the DSL directly, as in Produced, Materialized, etc., what > > constructor(s) would I have available? Likewise with the Serializer > > and Deserailizer. I don't think you need to specify the implementation > > logic, since we've already discussed it here. > > > > If you also want to specify the serialized format of the data records > > in the KIP, it could be useful documentation, as well as letting us > > verify the schema for forward/backward compatibility concerns, etc. > > > > Thanks, > > John > > > > On Wed, Jun 26, 2019 at 10:33 AM Development <d...@yeralin.net > > <mailto:d...@yeralin.net>> wrote: > > > > > > Hey, > > > > Finally made updates to the KIP: > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization> > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-466:+Add+support+for+List%3CT%3E+serialization+and+deserialization > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-466:+Add+support+for+List%3CT%3E+serialization+and+deserialization>> > > Sorry for the delay :) > > > > Thank You! > > > > Best, > > Daniyar Yeralin > > > > On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <matth...@confluent.io > > <mailto:matth...@confluent.io>> wrote: > > > > Yes, something like this. I did not think about good configuration > > parameter names yet. I am also not sure if I understand all proposed > > configs atm. But all configs should be listed and explained in the KIP > > anyway, and we can discuss further after you have updated the KIP (I can > > ask more detailed question if I have any). > > > > > > -Matthias > > > > On 6/21/19 2:05 PM, Development wrote: > > > > Yes, you are right. ByteSerializer is not what I need to have in a list > > of primitives. > > > > As for the default constructor and configurability, just want to make > > sure. Is this what you have on your mind? > > > > Best, > > Daniyar Yeralin > > > > > > > > On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matth...@confluent.io > > <mailto:matth...@confluent.io> > > <mailto:matth...@confluent.io <mailto:matth...@confluent.io>>> wrote: > > > > Thanks for the update! > > > > I think that `ListDeserializer`, `ListSerializer`, and `ListSerde` > > should have an default constructor and it should be possible to pass in > > the `Class listClass` information via a configuration. Otherwise, > > KafkaStreams cannot use it as default serde. > > > > > > For the primitive serializers: `BytesSerializer` is not primitive IMHO, > > as is it for `byte[]` with variable length -- it's for arrays, not for > > single `byte` (note, that `Bytes` is a Kafka class wrapping `byte[]`). > > > > > > For tests, we can comment on the PR. No need to do this in the KIP > > discussion. > > > > > > Can you also update the KIP? > > > > > > > > -Matthias > > > > > > > > > > > > On 6/21/19 11:29 AM, Development wrote: > > > > I made and pushed necessary commits, so we could review the final > > version under PR https://github.com/apache/kafka/pull/6592 > > <https://github.com/apache/kafka/pull/6592> > > > > I also need some advice on writing tests for this new serde. So far I > > only have two test cases (roundtrip and empty payload), I’m not sure > > if it is enough. > > > > Thank y’all for your help in this KIP :) > > > > Best, > > Daniyar Yeralin > > > > > > On Jun 21, 2019, at 1:44 PM, John Roesler <j...@confluent.io > > <mailto:j...@confluent.io> > > <mailto:j...@confluent.io <mailto:j...@confluent.io>>> wrote: > > > > Hey Daniyar, > > > > Looks good to me! Thanks for considering it. > > > > Thanks, > > -John > > > > On Fri, Jun 21, 2019 at 9:04 AM Development <d...@yeralin.net > > <mailto:d...@yeralin.net> > > <mailto:d...@yeralin.net <mailto:d...@yeralin.net>> > > <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>> wrote: > > Hey John and Matthias, > > > > Yes, now I see it all. I’m storing lots of redundant information. > > Here is my final idea. Yes, now a user should pass a list type. I > > realized that’s the type is not really needed in ListSerializer, but > > only in ListDeserializer: > > > > > > In ListSerializer we will start storing sizes only if serializer is > > not a primitive serializer: > > > > > > Then, in deserializer, we persist passed list type, so that during > > deserialization we could create an instance of it with predefined > > listSize for better performance. > > We also try to locate a primitiveSize based on passed deserializer. > > If it is not there, then primitiveSize will be null. Which means > > that each entry’s size was encoded individually. > > > > > > This looks much cleaner and more concise. > > > > What do you think? > > > > Best, > > Daniyar Yeralin > > > > On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matth...@confluent.io > > <mailto:matth...@confluent.io> > > <mailto:matth...@confluent.io <mailto:matth...@confluent.io>> > > <mailto:matth...@confluent.io <mailto:matth...@confluent.io>>> wrote: > > > > For encoding the list-type: I see John's point about re-encoding the > > list-type redundantly. However, I also don't like the idea that the > > Deserializer returns a fixed type... > > > > Maybe it's best allow users to specify the target list type on > > deserialization via config? > > > > Similar for the primitive types: I don't think we need to encode the > > type size, but users could specify the type on the deserializer (via a > > config again)? > > > > > > About generics: nesting could be arbitrarily deep. Hence, I doubt > > we can > > support this and a cast will be necessary at some point in the user > > code. > > > > > > > > -Matthias > > > > > > > > On 6/20/19 1:21 PM, John Roesler wrote: > > > > Hey Daniyar, > > > > Thanks for looking at it! > > > > Something like your screenshot is more along the lines of what I was > > thinking. Sorry, but I didn't follow what you mean, how would that not > > be "vanilla java"? > > > > Unfortunately the deserializer needs more information, though. For > > example, what if the inner type is a Map<String,String>? The serde > > could > > only be used to produce a LinkedList<Map>, thus, we'd still need an > > inner serde, like you have in the KIP (Serde<T> innerSerde). > > > > Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde( > > /**list type**/ LinkedList.class, > > /**inner serde**/ new MyRecordSerde() > > ) > > > > And in configuration, it's something like: > > default.key.serde: org...ListSerde > > default.key.list.serde.type: java.util.LinkedList > > default.key.list.serde.inner: com.mycompany.MyRecordSerde > > > > > > What do you think? > > Thanks, > > -John > > > > On Thu, Jun 20, 2019 at 2:46 PM Development <d...@yeralin.net > > <mailto:d...@yeralin.net> > > <mailto:d...@yeralin.net <mailto:d...@yeralin.net>> > > <mailto:d...@yeralin.net <mailto:d...@yeralin.net>> > > <mailto:d...@yeralin.net <mailto:d...@yeralin.net> <mailto:d...@yeralin.net > > <mailto:d...@yeralin.net>>>> wrote: > > > > Hey John, > > > > I gave read about TypeReference. It could work for the list serde. > > However, it is not directly > > supported: > > https://github.com/FasterXML/jackson-databind/issues/1490 > > <https://github.com/FasterXML/jackson-databind/issues/1490> > > <https://github.com/FasterXML/jackson-databind/issues/1490 > > <https://github.com/FasterXML/jackson-databind/issues/1490>> > > The only way is to pass an actual class object into the constructor, > > something like: > > > > It could be an option, but not a pretty one. What do you think of my > > approach to use vanilla java and canonical class name? (As described > > previously) > > > > Best, > > Daniyar Yeralin > > > > On Jun 20, 2019, at 2:45 PM, Development <d...@yeralin.net > > <mailto:d...@yeralin.net> > > <mailto:d...@yeralin.net <mailto:d...@yeralin.net>> > > <mailto:d...@yeralin.net <mailto:d...@yeralin.net>> > > <mailto:d...@yeralin.net <mailto:d...@yeralin.net> > > <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>>> wrote: > > > > Hi John, > > > > Thank you for your input! Yes, my idea looks a little bit over > > engineered :) > > > > I also wanted to see a feedback from Mathias as well since he gave > > me an idea about storing fixed/variable size entries. > > > > Best, > > Daniyar Yeralin > > > > On Jun 18, 2019, at 6:06 PM, John Roesler <j...@confluent.io > > <mailto:j...@confluent.io> > > <mailto:j...@confluent.io <mailto:j...@confluent.io>> > > <mailto:j...@confluent.io <mailto:j...@confluent.io>> > > <mailto:j...@confluent.io <mailto:j...@confluent.io> > > <mailto:j...@confluent.io <mailto:j...@confluent.io>>>> wrote: > > > > Hi Daniyar, > > > > That's a very clever solution! > > > > One observation is that, now, this is what we might call a > > polymorphic > > serde. That is, you're detecting the actual concrete type and then > > promising to produce the exact same concrete type on read. > > There are > > some inherent problems with this approach, which in general > > require > > some kind of schema registry (not necessarily Schema > > Registry, just > > any registry for schemas) to solve. > > > > Notice that every serialized record has quite a bit of duplicated > > information: the concrete type as well as a byte to indicate > > whether > > the value type is a fixed size, and, if so, an integer to > > indicate the > > actual size. These constitute a schema, of sorts, because they > > tell us > > later how exactly to deserialize the data. Unfortunately, this > > information is completely redundant. In all likelihood, the > > information will be exactly the same for every record in the > > topic. > > This problem is essentially the core motivation for serializations > > like Avro: to move the schema outside of the serialization > > itself, so > > that the records won't contain so much redundant information. > > > > In this light, I'm wondering if it makes sense to go back to > > something > > like what you had earlier in which you don't support perfectly > > preserving the concrete type for _this_ serde, but instead just > > support deserializing to _some_ List. Then, you could defer full, > > perfect, type preservation to serdes that have an external > > system in > > which to register their type information. > > > > There does exist an alternative, if we really do want to > > preserve the > > concrete type (which does seem kind of nice). You can add a > > configuration option specifically for the serde to configure > > what the > > list type will be, and maybe what the element type is, as well. > > > > As far as "related work" goes, you might be interested to take > > a look > > at how Jackson can be configured to deserialize into a specific, > > arbitrarily nested, generically parameterized class structure. > > Specifically, you might find > > > > https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html > > > > <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html> > > <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html > > > > <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html>> > > interesting. > > > > Thanks, > > -John > > > > On Mon, Jun 17, 2019 at 12:38 PM Development <d...@yeralin.net > > <mailto:d...@yeralin.net> > > <mailto:d...@yeralin.net <mailto:d...@yeralin.net>> > > <mailto:d...@yeralin.net <mailto:d...@yeralin.net>> > > <mailto:d...@yeralin.net <mailto:d...@yeralin.net> > > <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>>> wrote: > > > > > > bump > > > > > > > > > > > > > > > >