Hey, Finally made updates to the KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization <https://cwiki.apache.org/confluence/display/KAFKA/KIP-466:+Add+support+for+List%3CT%3E+serialization+and+deserialization> Sorry for the delay :)
Thank You! Best, Daniyar Yeralin > On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <matth...@confluent.io> wrote: > > Yes, something like this. I did not think about good configuration > parameter names yet. I am also not sure if I understand all proposed > configs atm. But all configs should be listed and explained in the KIP > anyway, and we can discuss further after you have updated the KIP (I can > ask more detailed question if I have any). > > > -Matthias > > On 6/21/19 2:05 PM, Development wrote: >> Yes, you are right. ByteSerializer is not what I need to have in a list >> of primitives. >> >> As for the default constructor and configurability, just want to make >> sure. Is this what you have on your mind? >> >> Best, >> Daniyar Yeralin >> >> >> >>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matth...@confluent.io >>> <mailto:matth...@confluent.io>> wrote: >>> >>> Thanks for the update! >>> >>> I think that `ListDeserializer`, `ListSerializer`, and `ListSerde` >>> should have an default constructor and it should be possible to pass in >>> the `Class listClass` information via a configuration. Otherwise, >>> KafkaStreams cannot use it as default serde. >>> >>> >>> For the primitive serializers: `BytesSerializer` is not primitive IMHO, >>> as is it for `byte[]` with variable length -- it's for arrays, not for >>> single `byte` (note, that `Bytes` is a Kafka class wrapping `byte[]`). >>> >>> >>> For tests, we can comment on the PR. No need to do this in the KIP >>> discussion. >>> >>> >>> Can you also update the KIP? >>> >>> >>> >>> -Matthias >>> >>> >>> >>> >>> >>> On 6/21/19 11:29 AM, Development wrote: >>>> I made and pushed necessary commits, so we could review the final >>>> version under PR https://github.com/apache/kafka/pull/6592 >>>> >>>> I also need some advice on writing tests for this new serde. So far I >>>> only have two test cases (roundtrip and empty payload), I’m not sure >>>> if it is enough. >>>> >>>> Thank y’all for your help in this KIP :) >>>> >>>> Best, >>>> Daniyar Yeralin >>>> >>>> >>>>> On Jun 21, 2019, at 1:44 PM, John Roesler <j...@confluent.io >>>>> <mailto:j...@confluent.io>> wrote: >>>>> >>>>> Hey Daniyar, >>>>> >>>>> Looks good to me! Thanks for considering it. >>>>> >>>>> Thanks, >>>>> -John >>>>> >>>>> On Fri, Jun 21, 2019 at 9:04 AM Development <d...@yeralin.net >>>>> <mailto:d...@yeralin.net> <mailto:d...@yeralin.net>> wrote: >>>>> Hey John and Matthias, >>>>> >>>>> Yes, now I see it all. I’m storing lots of redundant information. >>>>> Here is my final idea. Yes, now a user should pass a list type. I >>>>> realized that’s the type is not really needed in ListSerializer, but >>>>> only in ListDeserializer: >>>>> >>>>> >>>>> In ListSerializer we will start storing sizes only if serializer is >>>>> not a primitive serializer: >>>>> >>>>> >>>>> Then, in deserializer, we persist passed list type, so that during >>>>> deserialization we could create an instance of it with predefined >>>>> listSize for better performance. >>>>> We also try to locate a primitiveSize based on passed deserializer. >>>>> If it is not there, then primitiveSize will be null. Which means >>>>> that each entry’s size was encoded individually. >>>>> >>>>> >>>>> This looks much cleaner and more concise. >>>>> >>>>> What do you think? >>>>> >>>>> Best, >>>>> Daniyar Yeralin >>>>> >>>>>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matth...@confluent.io >>>>>> <mailto:matth...@confluent.io> <mailto:matth...@confluent.io>> wrote: >>>>>> >>>>>> For encoding the list-type: I see John's point about re-encoding the >>>>>> list-type redundantly. However, I also don't like the idea that the >>>>>> Deserializer returns a fixed type... >>>>>> >>>>>> Maybe it's best allow users to specify the target list type on >>>>>> deserialization via config? >>>>>> >>>>>> Similar for the primitive types: I don't think we need to encode the >>>>>> type size, but users could specify the type on the deserializer (via a >>>>>> config again)? >>>>>> >>>>>> >>>>>> About generics: nesting could be arbitrarily deep. Hence, I doubt >>>>>> we can >>>>>> support this and a cast will be necessary at some point in the user >>>>>> code. >>>>>> >>>>>> >>>>>> >>>>>> -Matthias >>>>>> >>>>>> >>>>>> >>>>>> On 6/20/19 1:21 PM, John Roesler wrote: >>>>>>> Hey Daniyar, >>>>>>> >>>>>>> Thanks for looking at it! >>>>>>> >>>>>>> Something like your screenshot is more along the lines of what I was >>>>>>> thinking. Sorry, but I didn't follow what you mean, how would that not >>>>>>> be "vanilla java"? >>>>>>> >>>>>>> Unfortunately the deserializer needs more information, though. For >>>>>>> example, what if the inner type is a Map<String,String>? The serde >>>>>>> could >>>>>>> only be used to produce a LinkedList<Map>, thus, we'd still need an >>>>>>> inner serde, like you have in the KIP (Serde<T> innerSerde). >>>>>>> >>>>>>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde( >>>>>>> /**list type**/ LinkedList.class, >>>>>>> /**inner serde**/ new MyRecordSerde() >>>>>>> ) >>>>>>> >>>>>>> And in configuration, it's something like: >>>>>>> default.key.serde: org...ListSerde >>>>>>> default.key.list.serde.type: java.util.LinkedList >>>>>>> default.key.list.serde.inner: com.mycompany.MyRecordSerde >>>>>>> >>>>>>> >>>>>>> What do you think? >>>>>>> Thanks, >>>>>>> -John >>>>>>> >>>>>>> On Thu, Jun 20, 2019 at 2:46 PM Development <d...@yeralin.net >>>>>>> <mailto:d...@yeralin.net> <mailto:d...@yeralin.net> >>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>> wrote: >>>>>>> >>>>>>> Hey John, >>>>>>> >>>>>>> I gave read about TypeReference. It could work for the list serde. >>>>>>> However, it is not directly >>>>>>> supported: >>>>>>> https://github.com/FasterXML/jackson-databind/issues/1490 >>>>>>> <https://github.com/FasterXML/jackson-databind/issues/1490> >>>>>>> The only way is to pass an actual class object into the constructor, >>>>>>> something like: >>>>>>> >>>>>>> It could be an option, but not a pretty one. What do you think of my >>>>>>> approach to use vanilla java and canonical class name? (As described >>>>>>> previously) >>>>>>> >>>>>>> Best, >>>>>>> Daniyar Yeralin >>>>>>> >>>>>>>> On Jun 20, 2019, at 2:45 PM, Development <d...@yeralin.net >>>>>>>> <mailto:d...@yeralin.net> <mailto:d...@yeralin.net> >>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>> wrote: >>>>>>>> >>>>>>>> Hi John, >>>>>>>> >>>>>>>> Thank you for your input! Yes, my idea looks a little bit over >>>>>>>> engineered :) >>>>>>>> >>>>>>>> I also wanted to see a feedback from Mathias as well since he gave >>>>>>>> me an idea about storing fixed/variable size entries. >>>>>>>> >>>>>>>> Best, >>>>>>>> Daniyar Yeralin >>>>>>>> >>>>>>>>> On Jun 18, 2019, at 6:06 PM, John Roesler <j...@confluent.io >>>>>>>>> <mailto:j...@confluent.io> <mailto:j...@confluent.io> >>>>>>>>> <mailto:j...@confluent.io <mailto:j...@confluent.io>>> wrote: >>>>>>>>> >>>>>>>>> Hi Daniyar, >>>>>>>>> >>>>>>>>> That's a very clever solution! >>>>>>>>> >>>>>>>>> One observation is that, now, this is what we might call a >>>>>>>>> polymorphic >>>>>>>>> serde. That is, you're detecting the actual concrete type and then >>>>>>>>> promising to produce the exact same concrete type on read. >>>>>>>>> There are >>>>>>>>> some inherent problems with this approach, which in general >>>>>>>>> require >>>>>>>>> some kind of schema registry (not necessarily Schema >>>>>>>>> Registry, just >>>>>>>>> any registry for schemas) to solve. >>>>>>>>> >>>>>>>>> Notice that every serialized record has quite a bit of duplicated >>>>>>>>> information: the concrete type as well as a byte to indicate >>>>>>>>> whether >>>>>>>>> the value type is a fixed size, and, if so, an integer to >>>>>>>>> indicate the >>>>>>>>> actual size. These constitute a schema, of sorts, because they >>>>>>>>> tell us >>>>>>>>> later how exactly to deserialize the data. Unfortunately, this >>>>>>>>> information is completely redundant. In all likelihood, the >>>>>>>>> information will be exactly the same for every record in the >>>>>>>>> topic. >>>>>>>>> This problem is essentially the core motivation for serializations >>>>>>>>> like Avro: to move the schema outside of the serialization >>>>>>>>> itself, so >>>>>>>>> that the records won't contain so much redundant information. >>>>>>>>> >>>>>>>>> In this light, I'm wondering if it makes sense to go back to >>>>>>>>> something >>>>>>>>> like what you had earlier in which you don't support perfectly >>>>>>>>> preserving the concrete type for _this_ serde, but instead just >>>>>>>>> support deserializing to _some_ List. Then, you could defer full, >>>>>>>>> perfect, type preservation to serdes that have an external >>>>>>>>> system in >>>>>>>>> which to register their type information. >>>>>>>>> >>>>>>>>> There does exist an alternative, if we really do want to >>>>>>>>> preserve the >>>>>>>>> concrete type (which does seem kind of nice). You can add a >>>>>>>>> configuration option specifically for the serde to configure >>>>>>>>> what the >>>>>>>>> list type will be, and maybe what the element type is, as well. >>>>>>>>> >>>>>>>>> As far as "related work" goes, you might be interested to take >>>>>>>>> a look >>>>>>>>> at how Jackson can be configured to deserialize into a specific, >>>>>>>>> arbitrarily nested, generically parameterized class structure. >>>>>>>>> Specifically, you might find >>>>>>>>> >>>>>>>>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html >>>>>>>>> <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html> >>>>>>>>> interesting. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> -John >>>>>>>>> >>>>>>>>> On Mon, Jun 17, 2019 at 12:38 PM Development <d...@yeralin.net >>>>>>>>> <mailto:d...@yeralin.net> <mailto:d...@yeralin.net> >>>>>>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>> wrote: >>>>>>>>>> >>>>>>>>>> bump >>>>> >>>> >>>> >>> >> >