Thanks Guozhang! Let's pick this up in https://issues.apache.org/jira/projects/KAFKA/issues/KAFKA-7066 and https://github.com/apache/kafka/pull/5239
Cheers
Stephane

Kind regards,
Stephane

Stephane Maarek | Developer
+61 416 575 980
steph...@simplemachines.com.au
simplemachines.com.au
Level 2, 145 William Street, Sydney NSW 2010

On 16 June 2018 at 07:27, Guozhang Wang <wangg...@gmail.com> wrote:

> Hi Stephane,
>
> The serdes only happen in the following cases:
>
> 1. when sending to an external topic or repartition topic; this is
>    covered in SinkNode.
> 2. when reading from an external topic; we cover deserialization
>    errors in the DeserializationExceptionHandler interface,
>    customizable in config.
> 3. when writing into the store, which accepts only serialized bytes
>    (note this includes sending to the changelog topic as well if the
>    store has logging enabled).
>
> So as of now only case 3) is not captured, and the serdes happen in
> the MeteredXXStores, which call the serde, i.e. they are not
> centralized in one class. We can add logic similar to that in SinkNode
> to capture ClassCastException in the serde calls there.
>
>
> Guozhang
>
>
> On Fri, Jun 15, 2018 at 2:05 AM, Stephane Maarek <steph...@simplemachines.com.au> wrote:
>
>> 3) I've had trouble finding the proper place to catch the exception,
>> as the stack trace is huge.
>>
>> I've found some "wanted" behaviour is implemented in SinkNode but not
>> elsewhere:
>> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/processor/internals/SinkNode.java#L93
>>
>> Overall it'd be ideal to catch that in the Serde classes, but they
>> don't expose the correct types.
>>
>> I'm happy to propose a PR but not sure where the correct try / catch
>> should go... too high in the trace and I lose the "key value serde"
>> information, and too low in the trace and I don't encompass all the
>> cases (just like SinkNode).
>>
>> If you have any pointers, much appreciated :)
>> Stephane
>>
>>
>> On Fri., 15 Jun. 2018, 4:20 am Guozhang Wang, <wangg...@gmail.com> wrote:
>>
>>> 2) Pre-registering serdes and data types for Kafka topics as well as
>>> state stores could be a good feature to add.
>>>
>>> 3) For this, we can consider capturing the ClassCastException in
>>> serde callers and returning a more informative error.
>>>
>>>
>>> Guozhang
>>>
>>> On Wed, Jun 13, 2018 at 8:34 PM, Stephane Maarek <steph...@simplemachines.com.au> wrote:
>>>
>>> > Thanks Matthias and Guozhang
>>> >
>>> > 1) Regarding having JSON, Protobuf or Avro across the entire
>>> > topology: this makes sense. I still wish the builder could take a
>>> > 'defaultSerde' for values and keys to make types explicit
>>> > throughout the topology, vs a class as a string in a properties
>>> > object. That might also help with Java types through the topology,
>>> > as we could then infer that the default serde<T> implies T as the
>>> > operators are chained.
>>> >
>>> > 1*) I still think that as soon as a 'count' or any 'window'
>>> > happens, the user needs to override the default serde, which can
>>> > be confusing for end users.
>>> >
>>> > 2) I very much agree a type and serde map could be very useful.
>>> >
>>> > 2*) Big Scala user here, but this will affect maybe 10 percent of
>>> > the users, unfortunately. Java is still where people try most
>>> > things out. Still very excited for that release!
>>> >
>>> > 3) I haven't dug through the code, but how easy would it be to
>>> > indicate to the end user that a default serde was used during a
>>> > runtime error? This could be a very quick KIP-less win for the
>>> > developers.
>>> >
>>> > On Thu., 14 Jun. 2018, 12:28 am Guozhang Wang, <wangg...@gmail.com> wrote:
>>> >
>>> > > Hello Stéphane,
>>> > >
>>> > > Good question :) There have been some discussions about the
>>> > > default serdes in the past in the community. My two cents about
>>> > > this:
>>> > >
>>> > > 1) When a user tries out Streams for the first time, she is
>>> > > likely to use some primitive typed data in her first POC app, in
>>> > > which case the data types of the intermediate streams can change
>>> > > frequently, and hence a default serde would not help much and
>>> > > may introduce confusion; on the other hand, in a real production
>>> > > environment users are likely to use some data schema system like
>>> > > Avro / Protobuf, and hence their declared serde may well be
>>> > > consistent. For example, if you are using Avro with
>>> > > GenericRecord, then all the value types throughout your topology
>>> > > may be of the same type, so just declaring a
>>> > > `Serdes<GenericRecord, GenericRecord>` would help. Over time,
>>> > > this is indeed what we have seen from practical user scenarios.
>>> > >
>>> > > 2) So to me the question is, for top-of-the-funnel adoption,
>>> > > could we make the OOTB experience with serdes better for users?
>>> > > We've discussed some ideas around this topic, like improving our
>>> > > typing system so that users can specify some serdes per type
>>> > > (for primitive types we can pre-register a list of default ones
>>> > > as well), and the library can infer the data types and choose
>>> > > which serde to use automatically. However, Java type erasure
>>> > > makes this tricky (I think it is still the case in Java 8), and
>>> > > we cannot always make it work. And that's where we paused on
>>> > > investigating further. Note that in the coming 2.0 release we
>>> > > have a Scala API for Streams where default serdes are indeed
>>> > > dropped, since with Scala we can safely rely on implicit typing
>>> > > inference to override the serdes automatically.
>>> > >
>>> > >
>>> > >
>>> > > Guozhang
>>> > >
>>> > >
>>> > > On Tue, Jun 12, 2018 at 6:32 PM, Stephane Maarek <steph...@simplemachines.com.au> wrote:
>>> > >
>>> > > > Hi
>>> > > >
>>> > > > Coming from a user perspective, I see a lot of beginners not
>>> > > > understanding the need for serdes and misusing the default
>>> > > > serde settings.
>>> > > >
>>> > > > I believe default serdes do more harm than good. At best, they
>>> > > > save a bit of boilerplate code but hide the complexity of
>>> > > > serde happening at each step. At worst, they generate
>>> > > > confusion and make debugging tremendously hard, as the errors
>>> > > > thrown at runtime don't indicate that the serde being used is
>>> > > > the default one.
>>> > > >
>>> > > > What do you think of deprecating them, as well as any API that
>>> > > > does not use an explicit serde?
>>> > > >
>>> > > > I know this may be a "tough change", but in my opinion it'll
>>> > > > allow for more explicit development and easier debugging.
>>> > > >
>>> > > > Regards
>>> > > > Stéphane
>>> > > >
>>> > >
>>> > >
>>> > > --
>>> > > -- Guozhang
>>> > >
>>> >
>>>
>>>
>>> --
>>> -- Guozhang
>>>
>>
>
> --
> -- Guozhang
>
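For readers following the thread, the idea discussed above (catch the ClassCastException at the point where the serde is called and rethrow it with a more informative message) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code from the linked JIRA or PR: `SerdeErrorSketch`, its nested `Serializer` interface, `StringSerializer`, and `serializeOrExplain` are hypothetical names declared locally so the example runs without Kafka on the classpath. The nested interface only mirrors the `serialize(String topic, T data)` shape of Kafka's serializer interface.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, self-contained sketch; none of these types are Kafka classes.
final class SerdeErrorSketch {
    private SerdeErrorSketch() {}

    // Local stand-in with the same method shape as a Kafka-style serializer.
    interface Serializer<T> {
        byte[] serialize(String topic, T data);
    }

    // Hypothetical serializer that only accepts String values.
    static final class StringSerializer implements Serializer<String> {
        @Override
        public byte[] serialize(String topic, String data) {
            return data == null ? null : data.getBytes(StandardCharsets.UTF_8);
        }
    }

    // Calls the serializer and turns a bare ClassCastException into an error
    // that names the serializer class and the actual value type.
    @SuppressWarnings("unchecked")
    static byte[] serializeOrExplain(String topic, Serializer<?> serializer, Object value) {
        try {
            // Generics are erased at runtime, so a type mismatch only surfaces
            // as a ClassCastException inside the serializer call itself.
            return ((Serializer<Object>) serializer).serialize(topic, value);
        } catch (ClassCastException e) {
            String valueType = value == null ? "null" : value.getClass().getName();
            throw new IllegalStateException(
                "Serializer " + serializer.getClass().getName()
                    + " is not compatible with the actual value type " + valueType
                    + " (topic: " + topic + "). If no serde was passed explicitly,"
                    + " the configured default serde is probably in use.", e);
        }
    }

    public static void main(String[] args) {
        Serializer<String> serializer = new StringSerializer();
        // Matching type: serialization succeeds.
        System.out.println(serializeOrExplain("demo-topic", serializer, "hello").length);
        // Mismatched type: the wrapper reports which serializer and type clashed.
        try {
            serializeOrExplain("demo-topic", serializer, 42);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The wrapper sits at the call site because, as noted in the thread, that is where both the serde and the actual value are known; catching higher in the stack loses the serde information.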