Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

John Roesler Mon, 15 Jul 2019 08:46:18 -0700

Hi all,

Regarding the placement, you might as well move the constants to
`org.apache.kafka.clients.CommonClientConfigs`, so that the constants and
the configs and the code are in the same module.


Regarding the constructor... What Matthias said is correct: The serde,
serializer, and deserializer all need to have zero-arg constructors so they
can be instantiated reflectively by Kafka. However, the factory method you
proposed "New method public static <T> Serde<List<T>> ListSerde()" is not a
constructor, and is not required. It would be used purely from the Java
interface, but has the drawbacks I listed above. This method, not the
constructor, is what I proposed to remove from the KIP.

Thanks,
-John


On Mon, Jul 15, 2019 at 10:15 AM Development <d...@yeralin.net> wrote:

> One problem though.
>
> Since WindowedSerde (Windowed(De)Serializer) are so similar, I’m trying to
> mimic the implementation of my ListSerde accordingly.
>
> I created couple constants under StreamsConfig:
>
>
>
> And trying to do similar construct:
>
> final String propertyName = isKey ? 
> StreamsConfig.DEFAULT_LIST_KEY_SERDE_INNER_CLASS : 
> StreamsConfig.DEFAULT_LIST_VALUE_SERDE_INNER_CLASS;
>
> But then found out that *StreamsConfig is not accessible* from
> *org.apache.kafka.common.serialization* package while window serde
> (de)serializers are located under *org.apache.kafka.streams.kstream*
> package.
>
> What should I do? Should I move my classes under 
> *org.apache.kafka.streams.kstream
> *package instead?
>
> On Jul 15, 2019, at 10:45 AM, Development <d...@yeralin.net> wrote:
>
> Hi Matthias,
>
> Thank you for your input.
>
> I updated the KIP, made it a little more readable.
>
> I think the configuration parameters strategy is finalized then.
>
> Do you have any other questions/concerns regarding this KIP?
>
> Meanwhile I’ll start doing appropriate code changes, and commit them under
> my PR.
>
> Best,
> Daniyar Yeralin
>
> On Jul 11, 2019, at 2:44 PM, Matthias J. Sax <matth...@confluent.io>
> wrote:
>
> Daniyar,
>
> thanks for the update to the KIP. It's in really good shape and well
> written.
>
> About the default constructor question:
>
> All Serdes/Serializer/Deserializer classes need a default constructor to
> create them easily via reflections when specifies in a config. I
> understand that it is not super user friendly, but all existing code
> works this way. Hence, it seems best to stick with the established pattern.
>
> We have a similar issue with `TimeWindowedSerde` and
> `SessionWindowedSerde`, and I just recently did a PR to improve user
> experience that address the exact issue John raised. (cf
> https://github.com/apache/kafka/pull/7067)
>
> Note, that if a user would instantiate the Serde manually, the user
> would also need to call `configure()` to setup the inner serdes. Kafka
> Streams would not setup those automatically and one might most likely
> end-up with an NPE.
>
>
> Coming back the KIP, and the parameter names. `WindowedSerdes` are
> similar to `ListSerde` as they wrap another Serde. For `WindowedSerdes`,
> we use the following parameter names:
>
> - default.windowed.key.serde.inner
> - default.windowed.value.serde.inner
>
>
> It might be good to align the naming pattern. I would also suggest to
> use `type` instead of `impl`?
>
>
> default.key.list.serde.impl  ->  default.list.key.serde.type
> default.value.list.serde.impl  ->  default.list.value.serde.type
> default.key.list.serde.element  ->  default.list.key.serde.inner
> default.value.list.serde.element  ->  default.list.value.serde.inner
>
>
>
> -Matthias
>
>
> On 7/10/19 8:52 AM, Development wrote:
>
> Hi John,
>
> Yes, I do agree. That totally makes sense. The only thing is that it goes
> against what Matthias suggested earlier:
> "I think that ... `ListSerde` should have an default constructor and it
> should be possible to pass in the `Class listClass` information via a
> configuration. Otherwise, KafkaStreams cannot use it as default serde.”
>
> What do you think about that? I hope I’m not confusing anything.
>
> Best,
> Daniyar Yeralin
>
> On Jul 9, 2019, at 5:56 PM, John Roesler <j...@confluent.io> wrote:
>
> Ah, my apologies, I must have just overlooked it. Thanks for the update,
> too.
>
> Just one more super-small question, do we need this variant:
>
> New method public static <T> Serde<List<T>> ListSerde() in
> org.apache.kafka.common.serialization.Serdes class (infers list
> implementation and inner serde from config file)
>
>
> It seems like this situation implies my config file is already set up for
> the list serde, so passing this serde (e.g., in Produced) would have the
> same effect as not specifying it.
>
> I guess that it could be the case that you have the
> `default.key/value.serde` set to something else, like StringSerde, but you
> still have the `default.key/value.list.serde.impl/element` set. This seems
> like it would result in more confusion than convenience, so my gut instinct
> is maybe we shouldn't introduce the `ListSerde()` variant until people
> actually request it later on.
>
> Thus, we'd just stick with fully config-driven or fully
> source-code-driven, not half/half.
>
> What do you think?
>
> Thanks,
> -John
>
>
> On Tue, Jul 9, 2019 at 9:58 AM Development <d...@yeralin.net <
> mailto:d...@yeralin.net <d...@yeralin.net>>> wrote:
>
>
> Hi John,
>
> I hope everyone had a great long weekend.
>
> Regarding Java interfaces, I may not understand you correctly, but I think
> I already listed them:
>
> So for Produced, you would use it in the following fashion, for example:
> Produced.keySerde(Serdes.ListSerde(ArrayList.class, Serdes.Integer()))
>
> I also updated the KIP, and added a section “Serialization Strategy” where
> I describe our logic of conditional serialization based on the type of an
> inner serde.
>
> Thank you!
>
> Best,
> Daniyar Yeralin
>
> On Jun 26, 2019, at 11:44 AM, John Roesler <j...@confluent.io <
> mailto:j...@confluent.io <j...@confluent.io>>> wrote:
>
> Thanks for the update, Daniyar!
>
> In addition to specifying the config interface, can you also specify
> the Java interface? Namely, if I need to pass an instance of this
> serde in to the DSL directly, as in Produced, Materialized, etc., what
> constructor(s) would I have available? Likewise with the Serializer
> and Deserailizer. I don't think you need to specify the implementation
> logic, since we've already discussed it here.
>
> If you also want to specify the serialized format of the data records
> in the KIP, it could be useful documentation, as well as letting us
> verify the schema for forward/backward compatibility concerns, etc.
>
> Thanks,
> John
>
> On Wed, Jun 26, 2019 at 10:33 AM Development <d...@yeralin.net <
> mailto:d...@yeralin.net <d...@yeralin.net>>> wrote:
>
>
> Hey,
>
> Finally made updates to the KIP:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization>
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466:+Add+support+for+List%3CT%3E+serialization+and+deserialization
> <
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466:+Add+support+for+List%3CT%3E+serialization+and+deserialization
> >>
> Sorry for the delay :)
>
> Thank You!
>
> Best,
> Daniyar Yeralin
>
> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <matth...@confluent.io <
> mailto:matth...@confluent.io <matth...@confluent.io>>> wrote:
>
> Yes, something like this. I did not think about good configuration
> parameter names yet. I am also not sure if I understand all proposed
> configs atm. But all configs should be listed and explained in the KIP
> anyway, and we can discuss further after you have updated the KIP (I can
> ask more detailed question if I have any).
>
>
> -Matthias
>
> On 6/21/19 2:05 PM, Development wrote:
>
> Yes, you are right. ByteSerializer is not what I need to have in a list
> of primitives.
>
> As for the default constructor and configurability, just want to make
> sure. Is this what you have on your mind?
>
> Best,
> Daniyar Yeralin
>
>
>
> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matth...@confluent.io <
> mailto:matth...@confluent.io <matth...@confluent.io>>
> <mailto:matth...@confluent.io <matth...@confluent.io> <
> mailto:matth...@confluent.io <matth...@confluent.io>>>> wrote:
>
> Thanks for the update!
>
> I think that `ListDeserializer`, `ListSerializer`, and `ListSerde`
> should have an default constructor and it should be possible to pass in
> the `Class listClass` information via a configuration. Otherwise,
> KafkaStreams cannot use it as default serde.
>
>
> For the primitive serializers: `BytesSerializer` is not primitive IMHO,
> as is it for `byte[]` with variable length -- it's for arrays, not for
> single `byte` (note, that `Bytes` is a Kafka class wrapping `byte[]`).
>
>
> For tests, we can comment on the PR. No need to do this in the KIP
> discussion.
>
>
> Can you also update the KIP?
>
>
>
> -Matthias
>
>
>
>
>
> On 6/21/19 11:29 AM, Development wrote:
>
> I made and pushed necessary commits, so we could review the final
> version under PR https://github.com/apache/kafka/pull/6592 <
> https://github.com/apache/kafka/pull/6592>
>
> I also need some advice on writing tests for this new serde. So far I
> only have two test cases (roundtrip and empty payload), I’m not sure
> if it is enough.
>
> Thank y’all for your help in this KIP :)
>
> Best,
> Daniyar Yeralin
>
>
> On Jun 21, 2019, at 1:44 PM, John Roesler <j...@confluent.io <
> mailto:j...@confluent.io <j...@confluent.io>>
> <mailto:j...@confluent.io <j...@confluent.io> <mailto:j...@confluent.io
> <j...@confluent.io>>>> wrote:
>
> Hey Daniyar,
>
> Looks good to me! Thanks for considering it.
>
> Thanks,
> -John
>
> On Fri, Jun 21, 2019 at 9:04 AM Development <d...@yeralin.net <
> mailto:d...@yeralin.net <d...@yeralin.net>>
> <mailto:d...@yeralin.net <d...@yeralin.net> <mailto:d...@yeralin.net
> <d...@yeralin.net>>> <mailto:d...@yeralin.net <d...@yeralin.net> <
> mailto:d...@yeralin.net <d...@yeralin.net>>>> wrote:
> Hey John and Matthias,
>
> Yes, now I see it all. I’m storing lots of redundant information.
> Here is my final idea. Yes, now a user should pass a list type. I
> realized that’s the type is not really needed in ListSerializer, but
> only in ListDeserializer:
>
>
> In ListSerializer we will start storing sizes only if serializer is
> not a primitive serializer:
>
>
> Then, in deserializer, we persist passed list type, so that during
> deserialization we could create an instance of it with predefined
> listSize for better performance.
> We also try to locate a primitiveSize based on passed deserializer.
> If it is not there, then primitiveSize will be null. Which means
> that each entry’s size was encoded individually.
>
>
> This looks much cleaner and more concise.
>
> What do you think?
>
> Best,
> Daniyar Yeralin
>
> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matth...@confluent.io <
> mailto:matth...@confluent.io <matth...@confluent.io>>
> <mailto:matth...@confluent.io <matth...@confluent.io> <
> mailto:matth...@confluent.io <matth...@confluent.io>>> <
> mailto:matth...@confluent.io <matth...@confluent.io> <
> mailto:matth...@confluent.io <matth...@confluent.io>>>> wrote:
>
> For encoding the list-type: I see John's point about re-encoding the
> list-type redundantly. However, I also don't like the idea that the
> Deserializer returns a fixed type...
>
> Maybe it's best allow users to specify the target list type on
> deserialization via config?
>
> Similar for the primitive types: I don't think we need to encode the
> type size, but users could specify the type on the deserializer (via a
> config again)?
>
>
> About generics: nesting could be arbitrarily deep. Hence, I doubt
> we can
> support this and a cast will be necessary at some point in the user
> code.
>
>
>
> -Matthias
>
>
>
> On 6/20/19 1:21 PM, John Roesler wrote:
>
> Hey Daniyar,
>
> Thanks for looking at it!
>
> Something like your screenshot is more along the lines of what I was
> thinking. Sorry, but I didn't follow what you mean, how would that not
> be "vanilla java"?
>
> Unfortunately the deserializer needs more information, though. For
> example, what if the inner type is a Map<String,String>? The serde
> could
> only be used to produce a LinkedList<Map>, thus, we'd still need an
> inner serde, like you have in the KIP (Serde<T> innerSerde).
>
> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde(
> /**list type**/ LinkedList.class,
> /**inner serde**/ new MyRecordSerde()
> )
>
> And in configuration, it's something like:
> default.key.serde: org...ListSerde
> default.key.list.serde.type: java.util.LinkedList
> default.key.list.serde.inner: com.mycompany.MyRecordSerde
>
>
> What do you think?
> Thanks,
> -John
>
> On Thu, Jun 20, 2019 at 2:46 PM Development <d...@yeralin.net <
> mailto:d...@yeralin.net <d...@yeralin.net>>
> <mailto:d...@yeralin.net <d...@yeralin.net> <mailto:d...@yeralin.net
> <d...@yeralin.net>>> <mailto:d...@yeralin.net <d...@yeralin.net> <
> mailto:d...@yeralin.net <d...@yeralin.net>>>
> <mailto:d...@yeralin.net <d...@yeralin.net> <mailto:d...@yeralin.net
> <d...@yeralin.net>> <mailto:d...@yeralin.net <d...@yeralin.net> <
> mailto:d...@yeralin.net <d...@yeralin.net>>>>> wrote:
>
> Hey John,
>
> I gave read about TypeReference. It could work for the list serde.
> However, it is not directly
> supported:
> https://github.com/FasterXML/jackson-databind/issues/1490 <
> https://github.com/FasterXML/jackson-databind/issues/1490>
> <https://github.com/FasterXML/jackson-databind/issues/1490 <
> https://github.com/FasterXML/jackson-databind/issues/1490>>
> The only way is to pass an actual class object into the constructor,
> something like:
>
> It could be an option, but not a pretty one. What do you think of my
> approach to use vanilla java and canonical class name? (As described
> previously)
>
> Best,
> Daniyar Yeralin
>
> On Jun 20, 2019, at 2:45 PM, Development <d...@yeralin.net <
> mailto:d...@yeralin.net <d...@yeralin.net>>
> <mailto:d...@yeralin.net <d...@yeralin.net> <mailto:d...@yeralin.net
> <d...@yeralin.net>>> <mailto:d...@yeralin.net <d...@yeralin.net> <
> mailto:d...@yeralin.net <d...@yeralin.net>>>
> <mailto:d...@yeralin.net <d...@yeralin.net> <mailto:d...@yeralin.net
> <d...@yeralin.net>> <mailto:d...@yeralin.net <d...@yeralin.net> <
> mailto:d...@yeralin.net <d...@yeralin.net>>>>> wrote:
>
> Hi John,
>
> Thank you for your input! Yes, my idea looks a little bit over
> engineered :)
>
> I also wanted to see a feedback from Mathias as well since he gave
> me an idea about storing fixed/variable size entries.
>
> Best,
> Daniyar Yeralin
>
> On Jun 18, 2019, at 6:06 PM, John Roesler <j...@confluent.io <
> mailto:j...@confluent.io <j...@confluent.io>>
> <mailto:j...@confluent.io <j...@confluent.io> <mailto:j...@confluent.io
> <j...@confluent.io>>> <mailto:j...@confluent.io <j...@confluent.io> <
> mailto:j...@confluent.io <j...@confluent.io>>>
> <mailto:j...@confluent.io <j...@confluent.io> <mailto:j...@confluent.io
> <j...@confluent.io>> <mailto:j...@confluent.io <j...@confluent.io> <
> mailto:j...@confluent.io <j...@confluent.io>>>>> wrote:
>
> Hi Daniyar,
>
> That's a very clever solution!
>
> One observation is that, now, this is what we might call a
> polymorphic
> serde. That is, you're detecting the actual concrete type and then
> promising to produce the exact same concrete type on read.
> There are
> some inherent problems with this approach, which in general
> require
> some kind of  schema registry (not necessarily Schema
> Registry, just
> any registry for schemas) to solve.
>
> Notice that every serialized record has quite a bit of duplicated
> information: the concrete type as well as a byte to indicate
> whether
> the value type is a fixed size, and, if so, an integer to
> indicate the
> actual size. These constitute a schema, of sorts, because they
> tell us
> later how exactly to deserialize the data. Unfortunately, this
> information is completely redundant. In all likelihood, the
> information will be exactly the same for every record in the
> topic.
> This problem is essentially the core motivation for serializations
> like Avro: to move the schema outside of the serialization
> itself, so
> that the records won't contain so much redundant information.
>
> In this light, I'm wondering if it makes sense to go back to
> something
> like what you had earlier in which you don't support perfectly
> preserving the concrete type for _this_ serde, but instead just
> support deserializing to _some_ List. Then, you could defer full,
> perfect, type preservation to serdes that have an external
> system in
> which to register their type information.
>
> There does exist an alternative, if we really do want to
> preserve the
> concrete type (which does seem kind of nice). You can add a
> configuration option specifically for the serde to configure
> what the
> list type will be, and maybe what the element type is, as well.
>
> As far as "related work" goes, you might be interested to take
> a look
> at how Jackson can be configured to deserialize into a specific,
> arbitrarily nested, generically parameterized class structure.
> Specifically, you might find
>
> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
> <
> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
> >
> <
> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
> <
> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html
> >>
> interesting.
>
> Thanks,
> -John
>
> On Mon, Jun 17, 2019 at 12:38 PM Development <d...@yeralin.net <
> mailto:d...@yeralin.net <d...@yeralin.net>>
> <mailto:d...@yeralin.net <d...@yeralin.net> <mailto:d...@yeralin.net
> <d...@yeralin.net>>> <mailto:d...@yeralin.net <d...@yeralin.net> <
> mailto:d...@yeralin.net <d...@yeralin.net>>>
> <mailto:d...@yeralin.net <d...@yeralin.net> <mailto:d...@yeralin.net
> <d...@yeralin.net>> <mailto:d...@yeralin.net <d...@yeralin.net> <
> mailto:d...@yeralin.net <d...@yeralin.net>>>>> wrote:
>
>
> bump
>
>
>
>
>
>
>
>
>
>
>
>
>
>

Re: [DISCUSS] KIP-466: Add support for List serialization and deserialization

Reply via email to