Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

John Roesler Fri, 21 Jun 2019 10:46:04 -0700

Hey Daniyar,

Looks good to me! Thanks for considering it.


Thanks,
-John

On Fri, Jun 21, 2019 at 9:04 AM Development <d...@yeralin.net> wrote:

> Hey John and Matthias,
>
> Yes, now I see it all. I’m storing lots of redundant information.
> Here is my final idea. Yes, now a user should pass a list type. I realized
> that’s the type is not really needed in ListSerializer, but only in
> ListDeserializer:
>
> In ListSerializer we will start storing sizes only if serializer is not a
> primitive serializer:
>
> Then, in deserializer, we persist passed list type, so that during
> deserialization we could create an instance of it with predefined listSize
> for better performance.
> We also try to locate a primitiveSize based on passed deserializer. If it
> is not there, then primitiveSize will be null. Which means that each
> entry’s size was encoded individually.
>
> This looks much cleaner and more concise.
>
> What do you think?
>
> Best,
> Daniyar Yeralin
>
> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matth...@confluent.io>
> wrote:
>
> For encoding the list-type: I see John's point about re-encoding the
> list-type redundantly. However, I also don't like the idea that the
> Deserializer returns a fixed type...
>
> Maybe it's best allow users to specify the target list type on
> deserialization via config?
>
> Similar for the primitive types: I don't think we need to encode the
> type size, but users could specify the type on the deserializer (via a
> config again)?
>
>
> About generics: nesting could be arbitrarily deep. Hence, I doubt we can
> support this and a cast will be necessary at some point in the user code.
>
>
>
> -Matthias
>
>
>
> On 6/20/19 1:21 PM, John Roesler wrote:
>
> Hey Daniyar,
>
> Thanks for looking at it!
>
> Something like your screenshot is more along the lines of what I was
> thinking. Sorry, but I didn't follow what you mean, how would that not
> be "vanilla java"?
>
> Unfortunately the deserializer needs more information, though. For
> example, what if the inner type is a Map<String,String>? The serde could
> only be used to produce a LinkedList<Map>, thus, we'd still need an
> inner serde, like you have in the KIP (Serde<T> innerSerde).
>
> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
>   /**list type**/ LinkedList.class,
>   /**inner serde**/ new MyRecordSerde()
> )
>
> And in configuration, it's something like:
> default.key.serde: org...ListSerde
> default.key.list.serde.type: java.util.LinkedList
> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>
>
> What do you think?
> Thanks,
> -John
>
> On Thu, Jun 20, 2019 at 2:46 PM Development <d...@yeralin.net
> <mailto:d...@yeralin.net <d...@yeralin.net>>> wrote:
>
>    Hey John,
>
>    I gave read about TypeReference. It could work for the list serde.
>    However, it is not directly
>    supported: https://github.com/FasterXML/jackson-databind/issues/1490
>    The only way is to pass an actual class object into the constructor,
>    something like:
>
>    It could be an option, but not a pretty one. What do you think of my
>    approach to use vanilla java and canonical class name? (As described
>    previously)
>
>    Best,
>    Daniyar Yeralin
>
>    On Jun 20, 2019, at 2:45 PM, Development <d...@yeralin.net
>    <mailto:d...@yeralin.net <d...@yeralin.net>>> wrote:
>
>    Hi John,
>
>    Thank you for your input! Yes, my idea looks a little bit over
>    engineered :)
>
>    I also wanted to see a feedback from Mathias as well since he gave
>    me an idea about storing fixed/variable size entries.
>
>    Best,
>    Daniyar Yeralin
>
>    On Jun 18, 2019, at 6:06 PM, John Roesler <j...@confluent.io
>    <mailto:j...@confluent.io <j...@confluent.io>>> wrote:
>
>    Hi Daniyar,
>
>    That's a very clever solution!
>
>    One observation is that, now, this is what we might call a
>    polymorphic
>    serde. That is, you're detecting the actual concrete type and then
>    promising to produce the exact same concrete type on read. There are
>    some inherent problems with this approach, which in general require
>    some kind of  schema registry (not necessarily Schema Registry, just
>    any registry for schemas) to solve.
>
>    Notice that every serialized record has quite a bit of duplicated
>    information: the concrete type as well as a byte to indicate whether
>    the value type is a fixed size, and, if so, an integer to
>    indicate the
>    actual size. These constitute a schema, of sorts, because they
>    tell us
>    later how exactly to deserialize the data. Unfortunately, this
>    information is completely redundant. In all likelihood, the
>    information will be exactly the same for every record in the topic.
>    This problem is essentially the core motivation for serializations
>    like Avro: to move the schema outside of the serialization itself, so
>    that the records won't contain so much redundant information.
>
>    In this light, I'm wondering if it makes sense to go back to
>    something
>    like what you had earlier in which you don't support perfectly
>    preserving the concrete type for _this_ serde, but instead just
>    support deserializing to _some_ List. Then, you could defer full,
>    perfect, type preservation to serdes that have an external system in
>    which to register their type information.
>
>    There does exist an alternative, if we really do want to preserve the
>    concrete type (which does seem kind of nice). You can add a
>    configuration option specifically for the serde to configure what the
>    list type will be, and maybe what the element type is, as well.
>
>    As far as "related work" goes, you might be interested to take a look
>    at how Jackson can be configured to deserialize into a specific,
>    arbitrarily nested, generically parameterized class structure.
>    Specifically, you might find
>
> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
>    interesting.
>
>    Thanks,
>    -John
>
>    On Mon, Jun 17, 2019 at 12:38 PM Development <d...@yeralin.net
>    <mailto:d...@yeralin.net <d...@yeralin.net>>> wrote:
>
>
>    bump
>
>
>

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Reply via email to