Re: Are default serdes in Kafka Streams doing more harm than good?

Stephane Maarek Sat, 16 Jun 2018 02:04:04 -0700

Thanks, Guozhang!

Let's pick this up in
https://issues.apache.org/jira/projects/KAFKA/issues/KAFKA-7066 and
https://github.com/apache/kafka/pull/5239


Cheers
Stephane

Kind regards,
Stephane

Stephane Maarek | Developer

+61 416 575 980
steph...@simplemachines.com.au
simplemachines.com.au
Level 2, 145 William Street, Sydney NSW 2010

On 16 June 2018 at 07:27, Guozhang Wang <wangg...@gmail.com> wrote:

> Hi Stephane,
>
> Serde calls only happen in the following cases:
>
> 1. When sending to an external topic or a repartition topic; this is
> covered in SinkNode.
> 2. When reading from an external topic; deserialization errors are covered
> by the DeserializationExceptionHandler interface, customizable in config.
> 3. When writing into a store, which accepts only serialized bytes (note
> that this includes sending to the changelog topic as well, if the store
> has logging enabled).
>
> So as of now only case 3) is not covered. There the serialization happens
> in the MeteredXXStores classes, which call the serde directly, i.e. it is
> not centralized in one class. We can add logic similar to that in SinkNode
> to catch the ClassCastException around the serde calls there.
>
>
> Guozhang
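
A minimal sketch of the idea in case 3): wrap the serde call so that a
ClassCastException is rethrown with the serializer class, the actual class of
the value, and the store name. This is not the actual Kafka Streams code. The
`Serializer` interface below is a simplified stand-in, and
`SerdeMismatchException`, `serializeOrExplain` and the store name
`counts-store` are hypothetical names used only for illustration.

```java
import java.nio.charset.StandardCharsets;

class SerdeCastSketch {

    // Simplified stand-in for a serializer interface (assumption, not the Kafka class).
    interface Serializer<T> {
        byte[] serialize(String name, T data);
    }

    // Serializer that only understands String values.
    static class StringSerializer implements Serializer<String> {
        @Override
        public byte[] serialize(String name, String data) {
            return data == null ? null : data.getBytes(StandardCharsets.UTF_8);
        }
    }

    // Hypothetical exception type that carries a more informative message.
    static class SerdeMismatchException extends RuntimeException {
        SerdeMismatchException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Calls the serializer; a ClassCastException caused by a value of the wrong
    // type is rethrown with the serializer class, value class and store name.
    @SuppressWarnings("unchecked")
    static byte[] serializeOrExplain(Serializer<?> serializer, String storeName, Object value) {
        try {
            return ((Serializer<Object>) serializer).serialize(storeName, value);
        } catch (ClassCastException e) {
            String actualType = value == null ? "null" : value.getClass().getName();
            throw new SerdeMismatchException(
                "Serializer " + serializer.getClass().getName()
                    + " is not compatible with value type " + actualType
                    + " for store '" + storeName + "'."
                    + " Check the default serde in the config, or pass an explicit serde.",
                e);
        }
    }

    public static void main(String[] args) {
        Serializer<String> serializer = new StringSerializer();
        // Matching type: serialization succeeds.
        System.out.println(serializeOrExplain(serializer, "counts-store", "hello").length);
        // Mismatching type: the error names the serializer and the offending type.
        try {
            serializeOrExplain(serializer, "counts-store", 42L);
        } catch (SerdeMismatchException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The try / catch sits at the call site rather than inside the serde, because
only the caller knows which store the value was headed for.
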
>
>
> On Fri, Jun 15, 2018 at 2:05 AM, Stephane Maarek <
> steph...@simplemachines.com.au> wrote:
>
>> 3) I've had trouble finding the proper place to catch the exception as
>> the stack trace is huge.
>>
>> I've found some "wanted" behaviour is implemented in SinkNode but not
>> elsewhere:
>> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/SinkNode.java#L93
>>
>> Overall it'd be ideal to catch that in the Serde classes, but they don't
>> expose the correct types.
>>
>> I'm happy to propose a PR, but I'm not sure where the try / catch should
>> go: too high in the trace and I lose the "key value serde" information,
>> too low and I don't cover all the cases (just like SinkNode).
>>
>> If you have any pointers, much appreciated :)
>> Stephane
>>
>>
>> On Fri., 15 Jun. 2018, 4:20 am Guozhang Wang, <wangg...@gmail.com> wrote:
>>
>>> 2) Pre-registering serdes and data types for Kafka topics as well as state
>>> stores could be a good feature to add.
>>>
>>> 3) For this, we can consider catching the ClassCastException in the serde
>>> callers and returning a more informative error.
>>>
>>>
>>> Guozhang
>>>
>>> On Wed, Jun 13, 2018 at 8:34 PM, Stephane Maarek <
>>> steph...@simplemachines.com.au> wrote:
>>>
>>> > Thanks Matthias and Guozhang
>>> >
>>> > 1) Having JSON, Protobuf or Avro across the entire topology makes sense.
>>> > I still wish the builder could take a 'defaultSerde' for keys and values,
>>> > to make types explicit throughout the topology, rather than a class name
>>> > as a string in a properties object. That might also help with Java types
>>> > through the topology, as we could then infer that the default Serde<T>
>>> > implies T as the operators are chained.
>>> >
>>> > 1*) I still think that as soon as a 'count' or any 'window' happens, the
>>> > user needs to override the default serde, which can be confusing for end
>>> > users.
>>> >
>>> > 2) I very much agree that a type-to-serde map could be very useful.
>>> >
>>> > 2*) Big Scala user here, but this will unfortunately affect maybe 10
>>> > percent of users. Java is still where people try most things out. Still
>>> > very excited for that release!
>>> >
>>> > 3) I haven't dug through the code, but how easy would it be to indicate
>>> > to the end user that a default serde was used during a runtime error?
>>> > This could be a very quick KIP-less win for developers.
>>> >
>>> > On Thu., 14 Jun. 2018, 12:28 am Guozhang Wang, <wangg...@gmail.com> wrote:
>>> >
>>> > > Hello Stéphane,
>>> > >
>>> > > Good question :) There have been some discussions about the default
>>> > > serdes in the community in the past. My two cents:
>>> > >
>>> > > 1) When a user tries out Streams for the first time, she is likely to
>>> > > use some primitive-typed data in her first POC app, in which case the
>>> > > data types of the intermediate streams can change frequently, and hence
>>> > > a default serde would not help much but may introduce confusion. On the
>>> > > other hand, in real production environments users are likely to use a
>>> > > data schema system like Avro / Protobuf, and hence their declared serdes
>>> > > may well be consistent. For example, if you are using Avro with
>>> > > GenericRecord, then all the value types throughout your topology may be
>>> > > of the same type, so just declaring a
>>> > > `Serdes<GenericRecord, GenericRecord>` would help. Over time, this is
>>> > > indeed what we have seen from practical user scenarios.
>>> > >
>>> > > 2) So to me the question is, for top-of-the-funnel adoption, could we
>>> > > make the OOTB experience with serdes better for users? We've discussed
>>> > > some ideas around this topic, like improving our typing system so that
>>> > > users can specify serdes per type (for primitive types we can
>>> > > pre-register a list of default ones as well), and the library can infer
>>> > > the data types and choose which serde to use automatically. However, in
>>> > > Java, type erasure makes this tricky (I think it is still the case in
>>> > > Java 8), and we cannot always make it work. That's where we paused
>>> > > investigating further. Note that in the coming 2.0 release we have a
>>> > > Scala API for Streams where default serdes are indeed dropped, since
>>> > > with Scala we can safely rely on implicit type inference to override
>>> > > the serdes automatically.
>>> > >
>>> > >
>>> > >
>>> > > Guozhang
>>> > >
>>> > >
>>> > > On Tue, Jun 12, 2018 at 6:32 PM, Stephane Maarek <
>>> > > steph...@simplemachines.com.au> wrote:
>>> > >
>>> > > > Hi
>>> > > >
>>> > > > Coming from a user perspective, I see a lot of beginners not
>>> > > > understanding the need for serdes and misusing the default serde
>>> > > > settings.
>>> > > >
>>> > > > I believe default serdes do more harm than good. At best, they save a
>>> > > > bit of boilerplate code but hide the complexity of serde happening at
>>> > > > each step. At worst, they generate confusion and make debugging
>>> > > > tremendously hard as the errors thrown at runtime don't indicate that
>>> > > > the serde being used is the default one.
>>> > > >
>>> > > > What do you think of deprecating them as well as any API that does
>>> > > > not use explicit serde?
>>> > > >
>>> > > > I know this may be a "tough change", but in my opinion it'll allow for
>>> > > > more explicit development and easier debugging.
>>> > > >
>>> > > > Regards
>>> > > > Stéphane
>>> > > >
>>> > >
>>> > >
>>> > >
>>> > > --
>>> > > -- Guozhang
>>> > >
>>> >
>>>
>>>
>>>
>>> --
>>> -- Guozhang
>>>
>>
>
>
> --
> -- Guozhang
>
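
On the type-erasure point in Guozhang's first reply, here is a minimal sketch
of why a per-type serde registry cannot always pick the right serde
automatically in Java. The registry, its string labels and the `lookup` and
`sameRuntimeClass` helpers are hypothetical and only for illustration; none of
this is part of Kafka Streams.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ErasureSketch {

    // Hypothetical registry: runtime class -> label of the serde to use.
    static final Map<Class<?>, String> SERDE_BY_TYPE = new HashMap<>();

    static {
        SERDE_BY_TYPE.put(String.class, "string-serde");
        SERDE_BY_TYPE.put(Long.class, "long-serde");
    }

    // Picks a serde label based on the runtime class of the value.
    static String lookup(Object value) {
        return SERDE_BY_TYPE.getOrDefault(value.getClass(), "no serde registered");
    }

    // True when both objects have the same runtime class.
    static boolean sameRuntimeClass(Object a, Object b) {
        return a.getClass().equals(b.getClass());
    }

    public static void main(String[] args) {
        // Primitive wrappers and String keep their type at runtime, so lookup works.
        System.out.println(lookup("a"));
        System.out.println(lookup(42L));

        // Generic type arguments are erased: both lists are plain ArrayList at
        // runtime, so a registry keyed by Class cannot tell them apart.
        List<String> names = new ArrayList<>();
        List<Long> counts = new ArrayList<>();
        System.out.println(sameRuntimeClass(names, counts));
        System.out.println(lookup(names));
    }
}
```

The same limitation applies to the operators' own type parameters: at runtime
the library cannot see which K and V a stream was declared with, so it has to
be told.
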
