Re: [DISCUSS] KIP-598: Augment TopologyDescription with store and source / sink serde information

Guozhang Wang Mon, 18 May 2020 21:25:18 -0700

Bruno, Matthias:

Thanks for your inputs. After some thoughts I've decide to update my
proposal in the following way:


1. Store#serdes() would return a "Map<String, String>"

2. Topology's description would be independent of whether it is generated
from `StreamsBuilder#build(props)` or `StreamsBuilder#build()`, and if the
serde is not known we would use "<unknown>" as the default value.

3. Add `List<String> TopologyDescription#sourceTopics() / sinkTopics() /
repartitionTopics() / changelogTopics()` and for pattern / topic-extractor
we would use fixed format of "<pattern:regex>" and
"<dynamic:extractor-class-name>".


I will try to implement this in my existing PR and after I've confirmed it
works, I will update the final wiki for voting.


Guozhang


On Mon, May 18, 2020 at 9:13 PM Guozhang Wang <[email protected]> wrote:

> Hello Andy,
>
> Thanks a lot for your comments! I do not mind at all :)
>
> I think that's a valid point, what I have in mind is to expose an
> interface which can be optionally overridden in the overridden describe()
> call:
>
> Topology#describe(final TopologyDescriber)
>
> Interface TopologyDescriber {
>
>     default describeSerde(final Serde);
>
>     default describeSerializer(final Serializer);
>
>     default describeDeserializer(final Serializer);
> }
>
> And we would expose a DefaultTopologyDescriber class that just print the
> serde class names -- and if it is a wrapper serde, also print its inner
> serde name.
>
> Guozhang
>
>
> On Mon, May 11, 2020 at 12:13 PM Andy Coates <[email protected]> wrote:
>
>> Hi Guozhang,
>>
>> Thanks for writing this up. I’m very interested to see this, so I hope
>> you don’t mind me commenting.
>>
>> I’ve only really one comment to make, and that’s on the text printed for
>> the serde classes:
>>
>> As I understand it, the name will either come from the passed in config,
>> or may default to “unknown”, or may be obtained from the instances passed
>> while building the topology. It’s this latter case that interests me.
>> Where you have an actual serde instance could we not output more
>> information?
>>
>> The examples use simple (de)serialization classes such as
>> `LongDeseriailizer` where the name alone imparts all the information the
>> user is likely to need. However, users may provide there own custom
>> serialisers and such serialisers may contain state that is important, e.g.
>> the serialiser may know the schema of the data being serialized.  May there
>> be benefit from taking the `toString()` representation of the serialiser?
>>
>> Of course, this would require adding suitable `toString` impls to our own
>> stock serialisers, but may ultimately prove more versatile in the future.
>> The downside is potential to corrupt the topology description, e.g. a
>> toString that includes new lines etc.
>>
>> Thanks,
>>
>> Andy
>>
>>
>>
>> > On 4 May 2020, at 19:27, Bruno Cadonna <[email protected]> wrote:
>> >
>> > Hi Guozhang,
>> >
>> > Thank you for the KIP!
>> >
>> > Exposing also the inner types of the wrapper serdes would be
>> > important. For debugging as Matthias has already mentioned and to see
>> > more easily changes that are applied to a topology.
>> >
>> > I am also +1 on the `toJson()` method to easily access the topology
>> > description programmatically and to make the description backward
>> > compatible.
>> >
>> > Regarding `List<String> serdeNames();`, I would be in favour of a more
>> > expressive return type, like a Map that assigns labels to Serde names.
>> > For example, for key and value serdes the label could be "key" and
>> > "value". Or something similar.
>> >
>> > Best,
>> > Bruno
>> >
>> >
>> >
>> > On Mon, May 4, 2020 at 2:25 AM Guozhang Wang <[email protected]>
>> wrote:
>> >>
>> >> Hello Matthias John, thanks for your comments!! Replied them inline.
>> >>
>> >> I think there are a couple open questions that I'd like to hear your
>> >> opinions on with the context:
>> >>
>> >> a. For stores's serdes, the reason I proposed to expose a set of serde
>> >> names instead of a pair of key / value serdes is for future possible
>> store
>> >> types which may not be key-values. I admit it could just be
>> over-killing
>> >> here so if you have a strong preference on the latter, I could be
>> convinced
>> >> to change that part but I'd want to make the original motivation clear.
>> >>
>> >> b. I think I'm convinced that I'd just augment the `toString` result
>> >> regardless of which func generated the Topology (and hence its
>> >> TopologyDescription), note this would mean that we break the
>> compatibility
>> >> of the current `toString` function. As a remedy for that, we will also
>> >> expose a `toJson` function to programmatical purposes.
>> >>
>> >> Guozhang
>> >>
>> >>
>> >>> (1) In the new TopologyDescription output, the line for the
>> >>> windowed-count processor is:
>> >>>
>> >>>> Processor: myname (stores: [(myname-store, serdes:
>> >> [SessionWindowedSerde, FullChangeSerde])])
>> >>>
>> >>> For this case, both Serdes are wrappers and user would actually only
>> >>> specified wrapped Serdes for the key and value. Can we do anything
>> about
>> >>> this? Otherwise, there might still be a runtime `ClassCastException`
>> >>> that a user cannot easily debug.
>> >>>
>> >>>
>> >>> (2) Nit: The JavaDocs of `Processor#storeSet()` seems to be incorrect
>> >>> (it says "The names of all connected stores." -- guess it's c&p
>> error)?
>> >>>
>> >> Yes! Fixed.
>> >>
>> >>>
>> >>> (3) The KIP mentioned to add `Store#changelogTopic()` method, but the
>> >>> output of `TopologyDescription#toString()` does not contain it. I
>> think
>> >>> it might be good do add it, too?
>> >>>
>> >> Yes, that's right. I'm going to add to the example as well.
>> >>
>> >>>
>> >>> (4) The KIP also list
>> https://issues.apache.org/jira/browse/KAFKA-9913
>> >>> but it seems not to address it yet?
>> >>>
>> >> I actually did intent to have it addressed; the proposal includes:
>> >>
>> >> a. Return the set of source / sink nodes of a sub-topology, and their
>> >> corresponding source / sink topics could be accessed.
>> >> b. Return the set of stores of a sub-topology, and their corresponding
>> >> changelog topics could be accessed.
>> >>
>> >> The reason I did not choose to just expose the set of all topics
>> directly,
>> >> but indirectly, is stated in the wiki:
>> >>
>> >> "the reason we did not expose APIs for topic names directly is that for
>> >> source nodes, it is possible to have Pattern and for sink nodes, it is
>> >> possible to have topic-extractors, and hence it's better to let users
>> >> leveraging on the lower-level APIs to construct the topic names
>> >> programmatically themselves."
>> >>
>> >>>
>> >>> (5) As John, I also noticed that `List<String> Store#sedersNames()` is
>> >>> not a great API. I am not sure if I understand your reply thought.
>> >>> AFAIK, there is no exiting API
>> >>>
>> >>>> List<Serde> StoreBuilder#serdes()
>> >>>
>> >>> (cf
>> >>>
>> >>
>> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/StoreBuilder.java
>> >> )
>> >>>
>> >> Ah yes, that's added as part of this KIP.
>> >>
>> >>>
>> >>> (6) Atm, we return `String` type for the Serdes. Do we think it's
>> >>> sufficient? Just want to double check.
>> >>
>> >> The reason is that we can only get the serde-name at the time of the
>> >> topology since it may be from config and hence there's no serde object
>> >> actually.
>> >>
>> >>> (10) Can we avoid coupling this KIP’s behavior to the choice of
>> ‘build’
>> >> method? I.e., can we return the improved description even when people
>> just
>> >> call ‘build()’?
>> >>
>> >> Yes, as I replied in the above comment to yours, I've changed my mind
>> to
>> >> just return the augmented description no matter of the function; and
>> will
>> >> expose toJson() for future compatibilities. I've not yet updated the
>> wiki
>> >> yet.
>> >>
>> >>> Clearly, we need a placeholder if no serde is specified. How about
>> >> “unknown”, or the name of the config keys,
>> >> “default.key.serde”/“default.value.serde”?
>> >>
>> >> I think if `build(props)` is used, we can use the name of the
>> configured
>> >> values; otherwise since we do not know the config yet we'd have to use
>> >> "unknown".
>> >>
>> >>> I still have some deep reservation about the ‘build(Parameters)’
>> method
>> >> itself. I don’t really want to side-track this conversation with all my
>> >> concerns if we can avoid it, though. It seems like justification enough
>> >> that introducing dramatically different behavior based in on seemingly
>> >> minor differences in api calls will be a source of mystery and
>> complexity
>> >> for users.
>> >>
>> >>> I.e., I’m characterizing a completely different string format as
>> >> “dramatically different”, as opposed to just having a placeholder
>> string.
>> >>
>> >>> (11) Regarding the wrapper serdes, I bet we can capture and print the
>> >> inner types as well.
>> >>
>> >> Ack, I can do that.
>> >>
>> >> On Sat, May 2, 2020 at 8:19 AM John Roesler <[email protected]>
>> wrote:
>> >>
>> >>> Hi all,
>> >>>
>> >>> I’ve been sitting on another concern about this proposal. Since
>> Matthias
>> >>> has just submitted a few questions, perhaps I can pile on two more
>> this
>> >>> round.
>> >>>
>> >>> (10) Can we avoid coupling this KIP’s behavior to the choice of
>> ‘build’
>> >>> method? I.e., can we return the improved description even when people
>> just
>> >>> call ‘build()’?
>> >>>
>> >>> Clearly, we need a placeholder if no serde is specified. How about
>> >>> “unknown”, or the name of the config keys,
>> >>> “default.key.serde”/“default.value.serde”?
>> >>>
>> >>> I still have some deep reservation about the ‘build(Parameters)’
>> method
>> >>> itself. I don’t really want to side-track this conversation with all
>> my
>> >>> concerns if we can avoid it, though. It seems like justification
>> enough
>> >>> that introducing dramatically different behavior based in on seemingly
>> >>> minor differences in api calls will be a source of mystery and
>> complexity
>> >>> for users.
>> >>>
>> >>> I.e., I’m characterizing a completely different string format as
>> >>> “dramatically different”, as opposed to just having a placeholder
>> string.
>> >>>
>> >>> (11) Regarding the wrapper serdes, I bet we can capture and print the
>> >>> inner types as well.
>> >>>
>> >>> Thanks again for the KIP!
>> >>> -John
>> >>>
>> >>>
>> >>>
>> >>> On Thu, Apr 30, 2020, at 19:10, Matthias J. Sax wrote:
>> >>>> Guozhang,
>> >>>>
>> >>>> thanks for the KIP!
>> >>>>
>> >>>> Couple of comments/questions.
>> >>>>
>> >>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> -Matthias
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> On 4/25/20 1:24 PM, Guozhang Wang wrote:
>> >>>>> Hi John,
>> >>>>>
>> >>>>> Thanks for the review! Replied inline.
>> >>>>>
>> >>>>> On Fri, Apr 24, 2020 at 8:09 PM John Roesler <[email protected]>
>> >>> wrote:
>> >>>>>
>> >>>>>> Hi Guozhang,
>> >>>>>>
>> >>>>>> Thanks for the KIP! I took a quick look, and I'm really happy to
>> see
>> >>> this
>> >>>>>> underway.
>> >>>>>>
>> >>>>>> Some quick questions:
>> >>>>>>
>> >>>>>> 1.  Can you elaborate on the reason that stores just have a list of
>> >>>>>> serdes, whereas
>> >>>>>> other components have an explicit key/value serde?
>> >>>>>>
>> >>>>>
>> >>>>> This is because of the existing API "List<Serde>
>> >>> StoreBuilder#serdes()".
>> >>>>> Although both of its implementations would return two serdes (one
>> for
>> >>> key
>> >>>>> and one for value), the API is more general to return a list. And
>> >>> hence the
>> >>>>> TopologyDescription#Store which gets them directly from
>> StoreBuilder is
>> >>>>> exposing the same API.
>> >>>>>
>> >>>>> 1.5. A side-effect of this seems to be that the string-formatted
>> serde
>> >>>>>> description is
>> >>>>>> different, depending on whether the serdes are listed on a store
>> or a
>> >>>>>> topic. Just an
>> >>>>>> observation.
>> >>>>>>
>> >>>>>
>> >>>>> Yes I agree. I think we can probably change the "List<Serde>
>> >>>>> StoreBuilder#serdes()" signature as well (which would be a breaking
>> >>> change
>> >>>>> though, so we should do that via deprecation), but I'm a bit
>> concerned
>> >>>>> since it was designed for future store types which may not be of K-V
>> >>> format
>> >>>>> any more.
>> >>>>>
>> >>>>>
>> >>>>>> 2. You mentioned the key compatibility concern in my mind. We do
>> know
>> >>> that
>> >>>>>> such
>> >>>>>> use cases exist. Namely, our own tests and
>> >>>>>> https://zz85.github.io/kafka-streams-viz/
>> >>>>>> I'm wondering if we can add a forward-compatible machine-readable
>> >>> format
>> >>>>>> to the
>> >>>>>> KIP, so that even though we must break the parsers right now, maybe
>> >>> we'll
>> >>>>>> never
>> >>>>>> have to break them again. For example, I'm thinking of a "toJson"
>> >>> method
>> >>>>>> on the
>> >>>>>> TopologyDescription that formats the entire topology description
>> as a
>> >>> json
>> >>>>>> string.
>> >>>>>>
>> >>>>>>
>> >>>>> Yes, I also have concerns about that (as described in the
>> compatibility
>> >>>>> section). One proposal I have is that we ONLY augment the toString
>> >>> result
>> >>>>> if the TopologyDescription is from a Topology built from
>> >>>>> `StreamsBuilder#build(Properties)`, which is only recently added and
>> >>> hence
>> >>>>> most old usage would not get the benefits of it. But after thinking
>> >>> about
>> >>>>> this a bit more, I'm now more inclined to just always augment the
>> >>> string,
>> >>>>> and also add a `toJson` method in addition to `toString`.
>> >>>>>
>> >>>>>
>> >>>>>> Thanks again!
>> >>>>>> -John
>> >>>>>>
>> >>>>>> On Fri, Apr 24, 2020, at 00:26, Guozhang Wang wrote:
>> >>>>>>> Hello folks,
>> >>>>>>>
>> >>>>>>> I'd like to start the discussion for KIP-598:
>> >>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>
>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=148648762
>> >>>>>>>
>> >>>>>>> It proposes to augment the topology description's sub-classes with
>> >>> store
>> >>>>>>> and source / sink serde information. Let me know what you think,
>> >>> thanks!
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> -- Guozhang
>> >>>>>>>
>> >>>>>>
>> >>>>>
>> >>>>>
>> >>>>
>> >>>>
>> >>>> Attachments:
>> >>>> * signature.asc
>> >>>
>> >>
>> >>
>> >> --
>> >> -- Guozhang
>>
>>
>
> --
> -- Guozhang
>


-- 
-- Guozhang

Re: [DISCUSS] KIP-598: Augment TopologyDescription with store and source / sink serde information

Reply via email to