Hello Bill, I've made another pass over the wiki page and it lgtm now. Thanks!
Guozhang On Fri, Sep 14, 2018 at 3:05 PM, John Roesler <j...@confluent.io> wrote: > Hey Bill (and Guozhang), > > Apologies. I misunderstood what Guozhang was getting at in his earlier > remark. > > I'm a +1 on the current proposal. > > Thanks, > -John > > On Fri, Sep 14, 2018 at 3:10 PM Bill Bejeck <bbej...@gmail.com> wrote: > > > All, > > > > Thanks for the comments, I'll respond to all of your comments in the > > recieved order. > > > > > > Guozhang, > > > > > And if [join-name] not specified, stay the same, which is: > > > * [previous-processor-name]-repartition for both Stream-Stream (S-S) > join > > and S-T join > > > > I believe the current approach is > > [appId]-[previous-processor-name]-repartition, but the main point is if > no > > name is provided, the current naming scheme will remain in place > > > > > I think it is more natural to rename it to > > > * [app-id]-[join-name]-left-repartition > > > * [app-id]-[join-name]-right-repartition > > > > I agree and I'll update the KIP accordingly > > > > > 2 I'd suggest to use the name to also define the corresponding > processor > > > > I'll address this in conjunction with comments from Matthias and your > > second round of comments. > > > > > > >3. Could you also specify how this would affect the optimization for > > > > merging multiple repartition topics? > > > > With an optimization where we reduce the number of repartition > > topics, if the user has not named the repartition topics, we'll generate > a > > name for > > the optimized repartition topic. In the case where users have > > provided names for the repartition topics, > > we'll reuse the name from the first of the named repartition > > topics. > > However, optimizing a topology by reducing the number of > > repartition topics may require an application reset, but that is beyond > the > > scope of this KIP. > > > > I'll update the KIP with this case > > > > I've also addressed 4.a and 4.b in the "Compatibility, > > Deprecation and Migration plan" section > > > > > > Matthias > > > > > 1) We should either use `[app-id]-this|other-[join-name]-repartition` > or > > > `app-id]-[join-name]-left|right-repartition` but we should not change > > > the pattern depending if the user specifies a name of not. > > > > I've updated the KIP to say in the case of Joins and a name is provided > > we'll go with `app-id-name-left|right-repartition` in all cases if the > name > > is not provided > > we will continue to use the current naming process. > > > > > 2) I didn't see why we would need to do this in this KIP. KIP-307 seems > > > to be orthogonal, and thus KIP-372 should not change any processor > > > names, but KIP-307 should define a holistic strategy for all processor. > > > Otherwise, we might up with different strategies or revert what we > > > decide in this KIP if it's not compatible with KIP-307. > > > > I agree and have updated the name of this KIP to be more precise. > > > > Eno > > > > > I know we don't normally have a "Related work" section in KIPs, but > > > sometimes I find it useful to see what others have done in similar > cases. > > > Since this will be important for rolling re-deployments, I wonder what > > > other frameworks like Flink (or Samza) have done in these cases. > Perhaps > > > they have done nothing, in which case it's fine to do this from first > > > principles, but IMO it would be good to know just to make sure we're > > > heading in the right direction. > > > > > Also I don't get a good feel for how much work this will be for an end > > user > > > who is doing the rolling deployment, perhaps an end-to-end example > would > > > help. > > > > I can't speak for the other projects, so I'll have to take a look and > see. > > > > As for getting a feel for how much work this will be for an end user, > could > > you look at the > > updated KIP and see if it clears things up? > > > > > > Matthias > > > > > (1) For `Grouped` should we add `with(String name, Serde<K> key, > > > Serde<V> value)` to allow specifying all parameters at once? > > > > > Produced/Consumed/Serialized etc follow a similar pattern. There is one > > > static method for each config parameter, plus one static method that > > > accepts all parameters. Might be more consistent if we follow this > > pattern. > > > > > (2) It seems that `Serialized` is only used in `groupBy` and > > > `groupByKey` -- because both methods accepting `Serialized` parameter > > > are deprecated and replaced with methods accepting `Grouped`, it seems > > > that we also want to deprecate `Serialized`? > > > > > (3) About naming repartition topics: thinking about this once more, I > > > actually prefer to use `left|right` instead of `this|other` :) > > > > For 1, I agree and I'll update the KIP. > > > > For 2, Yes that seems to make sense and I'll update the KIP > > > > Part 3 addressed in earlier comments and the KIP is clarified. > > > > Guozhang > > > > > Just to clarify on 2): currently KIP-307 do not have proposed APIs for > > > `groupBy/groupByKey` naming schemes, and for joins its current proposal > > is > > > to extend ValueJoiner with Named and hence this part is what I meant to > > > have "overlaps". > > > > > Thinking about it a bit more, since Joined is only used for S-S and S-T > > > joins but not T-T joins, having the naming schemes on Joined would not > be > > > sufficient, and extending ValueJoiner would indeed be a good choice. > > > > > As for groupBy, since it is using KeyValueMapper which is supposed to > be > > > extended with Named in KIP-307, it does not require extending to > > processor > > > nodes as well. > > > > > Given this, I'm fine with limiting the scope to only repartition > topics. > > > > I agree with limiting the scope of this KIP to only repartition topics > and > > have updated the KIP > > > > John > > > > > I think it's slightly out of scope for this KIP, but I'm not sure it's > > > right to add a name to ValueJoiner or KeyValueMapper. > > > Both of those are "functional interfaces", that is, they are basically > > > named functions. > > > > > > It seems like we should preserve this property both to provide a clean > > > separation between the operation *logic* and the operator *config*. > > > > > > It's a good point that `Joined` doesn't appear in the KTable join. > > However, > > > KTable join does have the variant that accepts "Materialized". > > > This makes it similar to the grouping operators that currently accept > > > "Serialized", namely the operator is already configurable, but only in > > > a narrow way (we can only configure the materialization of the table or > > the > > > serialization of the grouping). > > > > > > Consider as an alternative the KStream.join operation that takes a > config > > > object called "Joined". This implies no more than that we are > > > configuring the join operation. If we are deciding to allow adding a > name > > > to the operation, it's natural to add it to the operation's config. > > > > > > Conversely, we also want to name the KTable.join operation, not the > > > materialization itself (which already has a name with separate > > semantics). > > > > > > To me, this suggests that, just like we propose to subsume the > Serialized > > > config for grouping into a more general config called Grouped, we > should > > > also want to do something similar and replace `KTable#join(KTable<>, > > > ValueJoiner<>, Materialized<>)` with one taking a more general > > > configuration, maybe: > > > `KTable#join(KTable<>, ValueJoiner<>, TableJoined<>)`? > > > > I'm not sure I completely follow what you are suggesting. But it seems > > like you are in favor of combining the naming of join > > > > and grouping operation and processors with this KIP. > > > > I'll say that I don't agree we should do the renaming in this KIP and > > restrict it to naming repartition topics > > > > > > 1. It gives us a good head start for users to be able to do rolling > > updates once repartition topics have been named > > 2. Undertaking any KIP changes almost always introduces some > uncertainty > > with respect to what changes will introduce, so a smaller scope helps > > ensure the success of the KIP > > 3. We have an existing proposal KIP-307 for processor naming and I > think > > it's best to do it all in one singular KIP so we don't have to undo > any > > inconsistent previous work. > > > > So I'll have to agree with Matthias and Guozhang in limiting the scope of > > this KIP to naming repartition topics. > > > > To that end, I'll also say we defer adding another general configuration > > object `TableJoined` to the work of naming operations and processors in > > KIP-307 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Thu, Sep 13, 2018 at 6:49 PM John Roesler <j...@confluent.io> wrote: > > > > > Hey all, > > > > > > I think it's slightly out of scope for this KIP, but I'm not sure it's > > > right to add a name to ValueJoiner or KeyValueMapper. > > > Both of those are "functional interfaces", that is, they are basically > > > named functions. > > > > > > It seems like we should preserve this property both to provide a clean > > > separation between the operation *logic* and the operator *config*. > > > > > > It's a good point that `Joined` doesn't appear in the KTable join. > > However, > > > KTable join does have the variant that accepts "Materialized". > > > This makes it similar to the grouping operators that currently accept > > > "Serialized", namely the operator is already configurable, but only in > > > a narrow way (we can only configure the materialization of the table or > > the > > > serialization of the grouping). > > > > > > Consider as an alternative the KStream.join operation that takes a > config > > > object called "Joined". This implies no more than that we are > > > configuring the join operation. If we are deciding to allow adding a > name > > > to the operation, it's natural to add it to the operation's config. > > > > > > Conversely, we also want to name the KTable.join operation, not the > > > materialization itself (which already has a name with separate > > semantics). > > > > > > To me, this suggests that, just like we propose to subsume the > Serialized > > > config for grouping into a more general config called Grouped, we > should > > > also want to do something similar and replace `KTable#join(KTable<>, > > > ValueJoiner<>, Materialized<>)` with one taking a more general > > > configuration, maybe: > > > `KTable#join(KTable<>, ValueJoiner<>, TableJoined<>)`? > > > > > > Does this make sense? > > > > > > Thanks, > > > -John > > > > > > On Thu, Sep 13, 2018 at 3:45 PM Guozhang Wang <wangg...@gmail.com> > > wrote: > > > > > > > Just to clarify on 2): currently KIP-307 do not have proposed APIs > for > > > > `groupBy/groupByKey` naming schemes, and for joins its current > proposal > > > is > > > > to extend ValueJoiner with Named and hence this part is what I meant > to > > > > have "overlaps". > > > > > > > > Thinking about it a bit more, since Joined is only used for S-S and > S-T > > > > joins but not T-T joins, having the naming schemes on Joined would > not > > be > > > > sufficient, and extending ValueJoiner would indeed be a good choice. > > > > > > > > As for groupBy, since it is using KeyValueMapper which is supposed to > > be > > > > extended with Named in KIP-307, it does not require extending to > > > processor > > > > nodes as well. > > > > > > > > > > > > Given this, I'm fine with limiting the scope to only repartition > > topics. > > > > > > > > > > > > Guozhang > > > > > > > > On Wed, Sep 12, 2018 at 10:22 PM, Matthias J. Sax < > > matth...@confluent.io > > > > > > > > wrote: > > > > > > > > > Follow up comments: > > > > > > > > > > 1) We should either use `[app-id]-this|other-[join- > name]-repartition` > > > or > > > > > `app-id]-[join-name]-left|right-repartition` but we should not > change > > > > > the pattern depending if the user specifies a name of not. I am > fine > > > > > with both patterns---just want to make sure with stick with one. > > > > > > > > > > 2) I didn't see why we would need to do this in this KIP. KIP-307 > > seems > > > > > to be orthogonal, and thus KIP-372 should not change any processor > > > > > names, but KIP-307 should define a holistic strategy for all > > processor. > > > > > Otherwise, we might up with different strategies or revert what we > > > > > decide in this KIP if it's not compatible with KIP-307. > > > > > > > > > > > > > > > -Matthias > > > > > > > > > > > > > > > On 9/12/18 6:28 PM, Guozhang Wang wrote: > > > > > > Hello Bill, > > > > > > > > > > > > I made a pass over your proposal and here are some questions: > > > > > > > > > > > > 1. For Joined names, the current proposal is to define the > > > repartition > > > > > > topic names as > > > > > > > > > > > > * [app-id]-this-[join-name]-repartition > > > > > > > > > > > > * [app-id]-other-[join-name]-repartition > > > > > > > > > > > > > > > > > > And if [join-name] not specified, stay the same, which is: > > > > > > > > > > > > * [previous-processor-name]-repartition for both Stream-Stream > > (S-S) > > > > > join > > > > > > and S-T join > > > > > > > > > > > > I think it is more natural to rename it to > > > > > > > > > > > > * [app-id]-[join-name]-left-repartition > > > > > > > > > > > > * [app-id]-[join-name]-right-repartition > > > > > > > > > > > > > > > > > > 2. I'd suggest to use the name to also define the corresponding > > > > processor > > > > > > names accordingly, in addition to the repartition topic names. > Note > > > > that > > > > > > for joins, this may be overlapping with KIP-307 > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > > > > 307%3A+Allow+to+define+custom+processor+names+with+KStreams+DSL> > > > > > > as > > > > > > it also have proposals for defining processor names for join > > > operators > > > > as > > > > > > well. > > > > > > > > > > > > 3. Could you also specify how this would affect the optimization > > for > > > > > > merging multiple repartition topics? > > > > > > > > > > > > 4. In the "Compatibility, Deprecation, and Migration Plan" > section, > > > > could > > > > > > you also mention the following scenarios, if any of the upgrade > > path > > > > > would > > > > > > be changed: > > > > > > > > > > > > a) changing user DSL code: under which scenarios users can now > do > > a > > > > > > rolling bounce instead of resetting applications. > > > > > > > > > > > > b) upgrading from older version to new version, with all the > names > > > > > > specified, and with optimization turned on. E.g. say we have the > > code > > > > > > written in 2.1 with all names specified, and now upgrading to 2.2 > > > with > > > > > new > > > > > > optimizations that may potentially change the repartition topics. > > Is > > > > that > > > > > > always safe to do? > > > > > > > > > > > > > > > > > > > > > > > > Guozhang > > > > > > > > > > > > > > > > > > On Wed, Sep 12, 2018 at 4:52 PM, Bill Bejeck <bbej...@gmail.com> > > > > wrote: > > > > > > > > > > > >> All I'd like to start a discussion on KIP-372 for the naming of > > > joins > > > > > and > > > > > >> grouping operations in Kafka Streams. > > > > > >> > > > > > >> The KIP page can be found here: > > > > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP- > > > > > >> 372%3A+Naming+Joins+and+Grouping > > > > > >> > > > > > >> I look forward to feedback and comments. > > > > > >> > > > > > >> Thanks, > > > > > >> Bill > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > -- Guozhang > > > > > > > > > > -- -- Guozhang