Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Development Wed, 10 Jul 2019 08:52:43 -0700

Hi John,

Yes, I do agree. That totally makes sense. The only thing is that it goes 
against what Matthias suggested earlier:
"I think that ... `ListSerde` should have an default constructor and it should 
be possible to pass in the `Class listClass` information via a configuration. 
Otherwise, KafkaStreams cannot use it as default serde.”


What do you think about that? I hope I’m not confusing anything.

Best,
Daniyar Yeralin

> On Jul 9, 2019, at 5:56 PM, John Roesler <[email protected]> wrote:
> 
> Ah, my apologies, I must have just overlooked it. Thanks for the update, too.
> 
> Just one more super-small question, do we need this variant: 
> 
> > New method public static <T> Serde<List<T>> ListSerde() in 
> > org.apache.kafka.common.serialization.Serdes class (infers list 
> > implementation and inner serde from config file)
> 
> It seems like this situation implies my config file is already set up for the 
> list serde, so passing this serde (e.g., in Produced) would have the same 
> effect as not specifying it. 
> 
> I guess that it could be the case that you have the `default.key/value.serde` 
> set to something else, like StringSerde, but you still have the 
> `default.key/value.list.serde.impl/element` set. This seems like it would 
> result in more confusion than convenience, so my gut instinct is maybe we 
> shouldn't introduce the `ListSerde()` variant until people actually request 
> it later on.
> 
> Thus, we'd just stick with fully config-driven or fully source-code-driven, 
> not half/half.
> 
> What do you think?
> 
> Thanks,
> -John
> 
> 
> On Tue, Jul 9, 2019 at 9:58 AM Development <[email protected] 
> <mailto:[email protected]>> wrote:
> >
> > Hi John,
> >
> > I hope everyone had a great long weekend.
> >
> > Regarding Java interfaces, I may not understand you correctly, but I think 
> > I already listed them:
> >
> > So for Produced, you would use it in the following fashion, for example: 
> > Produced.keySerde(Serdes.ListSerde(ArrayList.class, Serdes.Integer()))
> >
> > I also updated the KIP, and added a section “Serialization Strategy” where 
> > I describe our logic of conditional serialization based on the type of an 
> > inner serde.
> >
> > Thank you!
> >
> > Best,
> > Daniyar Yeralin
> >
> > On Jun 26, 2019, at 11:44 AM, John Roesler <[email protected] 
> > <mailto:[email protected]>> wrote:
> >
> > Thanks for the update, Daniyar!
> >
> > In addition to specifying the config interface, can you also specify
> > the Java interface? Namely, if I need to pass an instance of this
> > serde in to the DSL directly, as in Produced, Materialized, etc., what
> > constructor(s) would I have available? Likewise with the Serializer
> > and Deserailizer. I don't think you need to specify the implementation
> > logic, since we've already discussed it here.
> >
> > If you also want to specify the serialized format of the data records
> > in the KIP, it could be useful documentation, as well as letting us
> > verify the schema for forward/backward compatibility concerns, etc.
> >
> > Thanks,
> > John
> >
> > On Wed, Jun 26, 2019 at 10:33 AM Development <[email protected] 
> > <mailto:[email protected]>> wrote:
> >
> >
> > Hey,
> >
> > Finally made updates to the KIP: 
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
> >  
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization>
> >  
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-466:+Add+support+for+List%3CT%3E+serialization+and+deserialization
> >  
> > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-466:+Add+support+for+List%3CT%3E+serialization+and+deserialization>>
> > Sorry for the delay :)
> >
> > Thank You!
> >
> > Best,
> > Daniyar Yeralin
> >
> > On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <[email protected] 
> > <mailto:[email protected]>> wrote:
> >
> > Yes, something like this. I did not think about good configuration
> > parameter names yet. I am also not sure if I understand all proposed
> > configs atm. But all configs should be listed and explained in the KIP
> > anyway, and we can discuss further after you have updated the KIP (I can
> > ask more detailed question if I have any).
> >
> >
> > -Matthias
> >
> > On 6/21/19 2:05 PM, Development wrote:
> >
> > Yes, you are right. ByteSerializer is not what I need to have in a list
> > of primitives.
> >
> > As for the default constructor and configurability, just want to make
> > sure. Is this what you have on your mind?
> >
> > Best,
> > Daniyar Yeralin
> >
> >
> >
> > On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <[email protected] 
> > <mailto:[email protected]>
> > <mailto:[email protected] <mailto:[email protected]>>> wrote:
> >
> > Thanks for the update!
> >
> > I think that `ListDeserializer`, `ListSerializer`, and `ListSerde`
> > should have an default constructor and it should be possible to pass in
> > the `Class listClass` information via a configuration. Otherwise,
> > KafkaStreams cannot use it as default serde.
> >
> >
> > For the primitive serializers: `BytesSerializer` is not primitive IMHO,
> > as is it for `byte[]` with variable length -- it's for arrays, not for
> > single `byte` (note, that `Bytes` is a Kafka class wrapping `byte[]`).
> >
> >
> > For tests, we can comment on the PR. No need to do this in the KIP
> > discussion.
> >
> >
> > Can you also update the KIP?
> >
> >
> >
> > -Matthias
> >
> >
> >
> >
> >
> > On 6/21/19 11:29 AM, Development wrote:
> >
> > I made and pushed necessary commits, so we could review the final
> > version under PR https://github.com/apache/kafka/pull/6592 
> > <https://github.com/apache/kafka/pull/6592>
> >
> > I also need some advice on writing tests for this new serde. So far I
> > only have two test cases (roundtrip and empty payload), I’m not sure
> > if it is enough.
> >
> > Thank y’all for your help in this KIP :)
> >
> > Best,
> > Daniyar Yeralin
> >
> >
> > On Jun 21, 2019, at 1:44 PM, John Roesler <[email protected] 
> > <mailto:[email protected]>
> > <mailto:[email protected] <mailto:[email protected]>>> wrote:
> >
> > Hey Daniyar,
> >
> > Looks good to me! Thanks for considering it.
> >
> > Thanks,
> > -John
> >
> > On Fri, Jun 21, 2019 at 9:04 AM Development <[email protected] 
> > <mailto:[email protected]>
> > <mailto:[email protected] <mailto:[email protected]>> 
> > <mailto:[email protected] <mailto:[email protected]>>> wrote:
> > Hey John and Matthias,
> >
> > Yes, now I see it all. I’m storing lots of redundant information.
> > Here is my final idea. Yes, now a user should pass a list type. I
> > realized that’s the type is not really needed in ListSerializer, but
> > only in ListDeserializer:
> >
> >
> > In ListSerializer we will start storing sizes only if serializer is
> > not a primitive serializer:
> >
> >
> > Then, in deserializer, we persist passed list type, so that during
> > deserialization we could create an instance of it with predefined
> > listSize for better performance.
> > We also try to locate a primitiveSize based on passed deserializer.
> > If it is not there, then primitiveSize will be null. Which means
> > that each entry’s size was encoded individually.
> >
> >
> > This looks much cleaner and more concise.
> >
> > What do you think?
> >
> > Best,
> > Daniyar Yeralin
> >
> > On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <[email protected] 
> > <mailto:[email protected]>
> > <mailto:[email protected] <mailto:[email protected]>> 
> > <mailto:[email protected] <mailto:[email protected]>>> wrote:
> >
> > For encoding the list-type: I see John's point about re-encoding the
> > list-type redundantly. However, I also don't like the idea that the
> > Deserializer returns a fixed type...
> >
> > Maybe it's best allow users to specify the target list type on
> > deserialization via config?
> >
> > Similar for the primitive types: I don't think we need to encode the
> > type size, but users could specify the type on the deserializer (via a
> > config again)?
> >
> >
> > About generics: nesting could be arbitrarily deep. Hence, I doubt
> > we can
> > support this and a cast will be necessary at some point in the user
> > code.
> >
> >
> >
> > -Matthias
> >
> >
> >
> > On 6/20/19 1:21 PM, John Roesler wrote:
> >
> > Hey Daniyar,
> >
> > Thanks for looking at it!
> >
> > Something like your screenshot is more along the lines of what I was
> > thinking. Sorry, but I didn't follow what you mean, how would that not
> > be "vanilla java"?
> >
> > Unfortunately the deserializer needs more information, though. For
> > example, what if the inner type is a Map<String,String>? The serde
> > could
> > only be used to produce a LinkedList<Map>, thus, we'd still need an
> > inner serde, like you have in the KIP (Serde<T> innerSerde).
> >
> > Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
> > /**list type**/ LinkedList.class,
> > /**inner serde**/ new MyRecordSerde()
> > )
> >
> > And in configuration, it's something like:
> > default.key.serde: org...ListSerde
> > default.key.list.serde.type: java.util.LinkedList
> > default.key.list.serde.inner: com.mycompany.MyRecordSerde
> >
> >
> > What do you think?
> > Thanks,
> > -John
> >
> > On Thu, Jun 20, 2019 at 2:46 PM Development <[email protected] 
> > <mailto:[email protected]>
> > <mailto:[email protected] <mailto:[email protected]>> 
> > <mailto:[email protected] <mailto:[email protected]>>
> > <mailto:[email protected] <mailto:[email protected]> <mailto:[email protected] 
> > <mailto:[email protected]>>>> wrote:
> >
> >  Hey John,
> >
> >  I gave read about TypeReference. It could work for the list serde.
> >  However, it is not directly
> >  supported:
> > https://github.com/FasterXML/jackson-databind/issues/1490 
> > <https://github.com/FasterXML/jackson-databind/issues/1490>
> > <https://github.com/FasterXML/jackson-databind/issues/1490 
> > <https://github.com/FasterXML/jackson-databind/issues/1490>>
> >  The only way is to pass an actual class object into the constructor,
> >  something like:
> >
> >  It could be an option, but not a pretty one. What do you think of my
> >  approach to use vanilla java and canonical class name? (As described
> >  previously)
> >
> >  Best,
> >  Daniyar Yeralin
> >
> >  On Jun 20, 2019, at 2:45 PM, Development <[email protected] 
> > <mailto:[email protected]>
> > <mailto:[email protected] <mailto:[email protected]>> 
> > <mailto:[email protected] <mailto:[email protected]>>
> >  <mailto:[email protected] <mailto:[email protected]> 
> > <mailto:[email protected] <mailto:[email protected]>>>> wrote:
> >
> >  Hi John,
> >
> >  Thank you for your input! Yes, my idea looks a little bit over
> >  engineered :)
> >
> >  I also wanted to see a feedback from Mathias as well since he gave
> >  me an idea about storing fixed/variable size entries.
> >
> >  Best,
> >  Daniyar Yeralin
> >
> >  On Jun 18, 2019, at 6:06 PM, John Roesler <[email protected] 
> > <mailto:[email protected]>
> > <mailto:[email protected] <mailto:[email protected]>> 
> > <mailto:[email protected] <mailto:[email protected]>>
> >  <mailto:[email protected] <mailto:[email protected]> 
> > <mailto:[email protected] <mailto:[email protected]>>>> wrote:
> >
> >  Hi Daniyar,
> >
> >  That's a very clever solution!
> >
> >  One observation is that, now, this is what we might call a
> >  polymorphic
> >  serde. That is, you're detecting the actual concrete type and then
> >  promising to produce the exact same concrete type on read.
> > There are
> >  some inherent problems with this approach, which in general
> > require
> >  some kind of  schema registry (not necessarily Schema
> > Registry, just
> >  any registry for schemas) to solve.
> >
> >  Notice that every serialized record has quite a bit of duplicated
> >  information: the concrete type as well as a byte to indicate
> > whether
> >  the value type is a fixed size, and, if so, an integer to
> >  indicate the
> >  actual size. These constitute a schema, of sorts, because they
> >  tell us
> >  later how exactly to deserialize the data. Unfortunately, this
> >  information is completely redundant. In all likelihood, the
> >  information will be exactly the same for every record in the
> > topic.
> >  This problem is essentially the core motivation for serializations
> >  like Avro: to move the schema outside of the serialization
> > itself, so
> >  that the records won't contain so much redundant information.
> >
> >  In this light, I'm wondering if it makes sense to go back to
> >  something
> >  like what you had earlier in which you don't support perfectly
> >  preserving the concrete type for _this_ serde, but instead just
> >  support deserializing to _some_ List. Then, you could defer full,
> >  perfect, type preservation to serdes that have an external
> > system in
> >  which to register their type information.
> >
> >  There does exist an alternative, if we really do want to
> > preserve the
> >  concrete type (which does seem kind of nice). You can add a
> >  configuration option specifically for the serde to configure
> > what the
> >  list type will be, and maybe what the element type is, as well.
> >
> >  As far as "related work" goes, you might be interested to take
> > a look
> >  at how Jackson can be configured to deserialize into a specific,
> >  arbitrarily nested, generically parameterized class structure.
> >  Specifically, you might find
> >  
> > https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
> >  
> > <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html>
> > <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
> >  
> > <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html>>
> >  interesting.
> >
> >  Thanks,
> >  -John
> >
> >  On Mon, Jun 17, 2019 at 12:38 PM Development <[email protected] 
> > <mailto:[email protected]>
> > <mailto:[email protected] <mailto:[email protected]>> 
> > <mailto:[email protected] <mailto:[email protected]>>
> >  <mailto:[email protected] <mailto:[email protected]> 
> > <mailto:[email protected] <mailto:[email protected]>>>> wrote:
> >
> >
> >  bump
> >
> >
> >
> >
> >
> >
> >
> >

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Reply via email to