Re: [DISCUSS] KIP-307: Allow to define custom processor names with KStreams DSL

John Roesler Fri, 14 Dec 2018 10:25:00 -0800

Hi Florian,

Sorry about the run-around of rejecting the original proposal,
only to return to it later on. Hopefully, it's more encouraging
than frustrating that we're coming around to your initial way of
thinking.


Thanks!
-John

On Thu, Dec 13, 2018 at 4:28 PM Florian Hussonnois <[email protected]>
wrote:

> Hi all,
>
> Thanks again. I agree with your propositions.
> Also IMHO, overloading all methods (filter, map) to accept a new control
> object seems to provide a more natural development experience for users.
>
> Actually, this was the first proposition for this KIP, but we have rejected
> it because this solution led to adding a lot of new methods.
> As you mentioned it, the API has evolve since the creation of this KIP -
> some existing control objects already allow to customize internal names. We
> should so keep on that strategy.
>
> If everyone is OK with that, I will update the KIP and the PR accordingly;
>
> Thanks.
>
> Le jeu. 13 déc. 2018 à 18:08, John Roesler <[email protected]> a écrit :
>
> > Hi again, all,
> >
> > Matthias, I agree with you.
> >
> > Florian, thanks for your response.
> >
> > I think your proposal is the best way to address the ask for hiding the
> > name() getter. But I'd like to question that ask and instead propose that
> > we just make the name() getter part of the public API.
> >
> > The desire to "hide" the getters causes a lot of complexity in our code
> > base, and it will become completely impractical with the mixin strategy
> of
> > Named.
> >
> > If we were to switch strategies back to mixing Named in to the control
> > objects rather than the functions, then the path forward becomes quite
> > clear.
> >
> > On the other hand, it seems harmless for anyone who wants to be able to
> > query the name from a control object after setting it, so my vote would
> be
> > simply to keep the Named interface as:
> >
> > public interface Named<T extends Named<T>> {
> >   String name();
> >   T withName(String name);
> > }
> >
> > Under this proposal, we only mix Named in to the control objects, which
> > means we have no need of default implementations anymore (because we can
> > update all the control objects concurrently with adding this interface to
> > them).
> >
> > This does hinge on switching over to a control-object-only strategy,
> which
> > introduces the need to add about 50 new control object classes, which
> would
> > only serve to implement Named. As a middle ground, maybe we could just
> add
> > one generic control object class, like:
> >
> > public class NamedOperation implements Named<NamedOperation> {
> >   private final String name;
> >   private NamedOperation(final String name) { this.name = name; }
> >   public static NamedOperation name(final String name) {
> >     return new NamedOperation(name);
> >   }
> >   public String name() { return name; }
> >   public NamedOperation withName(final String name) {
> >     return new NamedOperation(name);
> >   }
> > }
> >
> > And then, we'd add overloads for all the methods that don't have control
> > objects already (for example, filter() ):
> >
> > // existing
> > KStream<K, V> filter(Predicate<? super K, ? super V> predicate);
> >
> > // new
> > KStream<K, V> filter(Predicate<? super K, ? super V> predicate,
> > NamedOperation named);
> >
> > Additionally, in regard to Matthias's point about existing control
> objects
> > with naming semantics, they would extend Named (but not NamedOperation)
> for
> > uniformity.
> >
> > You provided a good approach to hide the getter with your SettableName
> > class; I think what you proposed is the only way we could hide the name.
> > In the end, though, it's a lot of complexity added (control object class
> > hierarchy, inheritance, mutable state, internal casting) for something of
> > dubious value: to be able to hide the name from someone *after they
> > themselves have set it*.
> >
> > Although it'll be a pain, perhaps Matthias's suggestion to enumerate all
> > the API methods is the best way to be sure we all agree on what's going
> to
> > happen.
> >
> > Thanks again for wrangling with this issue,
> > -John
> >
> > On Thu, Dec 13, 2018 at 9:03 AM Matthias J. Sax <[email protected]>
> > wrote:
> >
> > > Just catching up on this discussion.
> > >
> > > My overall personal take is, that I am not a big fan of the interface
> > > `Named` that is used as a factory. I would rather prefer to add a
> > > control object parameter to all methods that don't have one yet. This
> > > KIP was started a while ago, and we added new naming capabilities in
> the
> > > meantime. Guozhang's example in the PR comment about naming in
> > > stream-stream join shows, that we might end up in a confusion situation
> > > for users if we use `Named`. Also, in 2.1, user can already name as
> > > repartition-/changelog-topics and stores. Thus, KIP-307 boils down to
> > > provide non-functional naming?
> > >
> > > Hence, for all methods that allow to specify names already, I don't see
> > > any reason to change them, but use the existing API to also name the
> > > processor(s) instead of allowing uses to specify a new name.
> > >
> > > About the inconsistency in method naming. I agree, that `as` is very
> > > generic and maybe not the best choice.
> > >
> > > I think it might be helpful, to have a table overview in the KIP, that
> > > list all existing static/non-static methods that allow to specify a
> > > name, plus a columns with the new suggested naming for those methods?
> > >
> > > Thoughts?
> > >
> > >
> > > -Matthias
> > >
> > >
> > > On 12/12/18 12:45 AM, Florian Hussonnois wrote:
> > > > Thank you very much for your feedbacks.
> > > >
> > > > Currently, there is still lot of discussions regarding the Named
> > > interface.
> > > > On the one hand we should provided consistency over the stream API
> and
> > on
> > > > the other hand we should not break the semantic as John point it up.
> > > >
> > > > Guozhang, I'm sorry, but I'm little bit confused, maybe I missed
> > > something.
> > > > In your comment you have suggested that :
> > > > * Produced/Consumed/Suppressed should extends Named
> > > > * Named should have a private-package method to get the specified
> > > processor
> > > > name internally (processorName())
> > > > * Finally we should end up with something like :  Named -> XXX ->
> > > > XXXInternal or Named -> Produced -> ProducedInternal
> > > >
> > > > The objective behind that is to :
> > > > * consolidate the internal method processorName()
> > > > * consolidate the method withName that exists now existing into
> > Produced,
> > > > Consumed and Suppressed.
> > > >
> > > > But, Named is an interface so we can't define a private-package
> method
> > on
> > > > it. Also, for example Produced and ProducedInternal are not in the
> same
> > > > package so having a private-package method doesn't really help.
> > > > In addition, if we add the withName method into Named interface this
> > can
> > > > become confusing for developers because action interfaces
> (ValueMapper,
> > > > Reducer, etc) extend it.
> > > > The interface would look like :
> > > >
> > > > public interface Named<T extends Named<T>> {
> > > >     default String name() {
> > > >         return null;
> > > >     }
> > > >     default Named<T> withName(final String name) {
> > > >         return null;
> > > >     }
> > > > ...
> > > > }
> > > >
> > > > So maybe instead of adding another method to Named we could create a
> > new
> > > > package-private class that could be extended by
> > > > Produced/Consumed/Joined/Suppressed. For exemple,
> > > > class SettableName<T extends SettableName<T>> implements Named {
> > > >
> > > >     protected String processorName;
> > > >
> > > >     SettableName(final SettableName settable) {
> > > >         this(Objects.requireNonNull(settable, "settable can't be
> > > > null").name());
> > > >     }
> > > >
> > > >     SettableName(final String processorName) {
> > > >         this.processorName = processorName;
> > > >     }
> > > >
> > > >     @Override
> > > >     public String name() {
> > > >         return processorName;
> > > >     }
> > > >     public T withName(final String processorName) {
> > > >         this.processorName = processorName;
> > > >         return (T)this;
> > > >     }
> > > > }
> > > >
> > > > In that way, we will get : public class Produced implements
> > > > SettableName<Produced> { ...
> > > >
> > > > WDYT?
> > > >
> > > >
> > > > Le mar. 11 déc. 2018 à 02:46, Guozhang Wang <[email protected]> a
> > > écrit :
> > > >
> > > >> I had one meta comment on the PR:
> > > >> https://github.com/apache/kafka/pull/5909#discussion_r240447153
> > > >>
> > > >> On Mon, Dec 10, 2018 at 5:22 PM John Roesler <[email protected]>
> > wrote:
> > > >>
> > > >>> Hi Florian,
> > > >>>
> > > >>> I hope it's ok if I ask a few questions at this late stage...
> > > >>>
> > > >>> Comment 1 ======
> > > >>>
> > > >>> It seems like the proposal is to add a new "Named" interface that
> is
> > > >>> intended to be mixed in with the existing API objects at various
> > > points.
> > > >>>
> > > >>> Just to preface some of my comments, it looks like your KIP was
> > created
> > > >>> quite a while ago, so the API may have changed somewhat since you
> > > >> started.
> > > >>>
> > > >>> As I see the API, there are a few different kinds of DSL method
> > > >> arguments:
> > > >>> * functions: things like Initializer, Aggregator, ValueJoiner,
> > > >>> ForEachAction... All of these are essentially Streams-flavored
> > Function
> > > >>> interfaces with different arities, type bounds, and semantics.
> > > >>> * config objects: things like Produced, Consumed, Joined,
> Grouped...
> > > >> These
> > > >>> are containers for configurations, where the target of the
> > > configuration
> > > >> is
> > > >>> the operation itself
> > > >>> * raw configurations: things like a raw topic-name string and
> > > >> Materialized:
> > > >>> These are configurations for operations that have no config object,
> > and
> > > >> for
> > > >>> various reasons, we didn't make one. The distinguishing feature is
> > that
> > > >> the
> > > >>> target of the configuration is not the operation itself, but some
> > > aspect
> > > >> of
> > > >>> it. For example, in Materialized, we are not setting the caching
> > > behavior
> > > >>> of, for example, an aggregation; we're setting the caching behavior
> > of
> > > a
> > > >>> materialized state store attached to the aggregation.
> > > >>>
> > > >>> It seems like choosing to mix the Named interface in with the
> > functions
> > > >> has
> > > >>> a couple of unfortunate side-effects:
> > > >>> * Aggregator is not the only function passed to any of the relevant
> > > >>> aggregate methods, so it seems a little arbitrary to pick that
> > function
> > > >>> over Initializer or Merger.
> > > >>> * As you noted, branch() takes an array of Predicate, so we just
> > ignore
> > > >> the
> > > >>> provided name(s), even though Predicate names are used elsewhere.
> > > >>> * Not all things that we want to name have function arguments,
> > notably
> > > >>> source and sink, so we'd switch paradigms and use the config object
> > > >>> instead.
> > > >>> * Adding an extra method to the function interfaces means that
> those
> > > are
> > > >> no
> > > >>> longer SAM interfaces. You proposed to add a default
> implementation,
> > so
> > > >> we
> > > >>> could still pass a lambda if we don't want to set the name, but if
> we
> > > >> *do*
> > > >>> want to set the name, we can no longer use lambdas.
> > > >>>
> > > >>> I think the obvious other choice would be to mix Named in with the
> > > config
> > > >>> objects instead, but this has one main downside of its own...
> > > >>> * not every operator we wish to name has a config object. I don't
> > know
> > > if
> > > >>> everyone involved is comfortable with adding a config object to
> every
> > > >>> operator that's missing one.
> > > >>>
> > > >>> Personally, I favor moving toward a more consistent state that's
> > > forward
> > > >>> compatible with any further changes we wish to make. I *think* that
> > > >> giving
> > > >>> every operator two forms (one with no config and one with a config
> > > >> object)
> > > >>> would be such an API.
> > > >>>
> > > >>> Comment 2 =========
> > > >>>
> > > >>> Finally, just a minor comment: the static method in Named wouldn't
> > work
> > > >>> properly as defined. Assuming that we mix Named in with Produced,
> for
> > > >>> example, we'd need to be able to use it like:
> > > >>>>  kStream.to("out", Produced.with("myOut"))
> > > >>> This doesn't work because with() returns a Named, but we need a
> > > Produced.
> > > >>>
> > > >>> We can pull off a builder method in the interface, but not a static
> > > >> method.
> > > >>> To define a builder method in the interface that returns an
> instance
> > of
> > > >> the
> > > >>> concrete subtype, you have to use the "curiously recurring generic"
> > > >>> pattern.
> > > >>>
> > > >>> It would look like:
> > > >>>
> > > >>> public interface Named<N extends Named<N>> {
> > > >>>   String name();
> > > >>>   N withName(String name);
> > > >>> }
> > > >>>
> > > >>> You can see where the name of the pattern comes from ;)
> > > >>> An implementation would then look like:
> > > >>>
> > > >>> public class Produced implements Named<Produced> {
> > > >>>   String name() { return name; }
> > > >>>   Produced withName(final String name) { this.name = name; return
> > > this;
> > > >> }
> > > >>> }
> > > >>>
> > > >>> Note that the generic parameter gets filled in properly in the
> > > >> implementing
> > > >>> class, so that you get the right return type out.
> > > >>>
> > > >>> It doesn't work at all with a static factory method at the
> interface
> > > >> level,
> > > >>> so it would be up to Produced to define a static factory if it
> wants
> > to
> > > >>> present one.
> > > >>>
> > > >>> ======
> > > >>>
> > > >>> Those are my two feedbacks!
> > > >>>
> > > >>> I hope you find this helpful, rather than frustrating. I'm sorry I
> > > didn't
> > > >>> get a chance to comment sooner.
> > > >>>
> > > >>> Thanks for the KIP, I think it will be much nicer to be able to
> name
> > > the
> > > >>> processor nodes.
> > > >>>
> > > >>> -John
> > > >>>
> > > >>> On Tue, Nov 27, 2018 at 6:34 PM Guozhang Wang <[email protected]>
> > > >> wrote:
> > > >>>
> > > >>>> Hi Florian,
> > > >>>>
> > > >>>> I've made a pass over the PR. There are some comments that are
> > related
> > > >> to
> > > >>>> the function names which may be affecting the KIP wiki page, but
> > > >> overall
> > > >>> I
> > > >>>> think it looks good already.
> > > >>>>
> > > >>>>
> > > >>>> Guozhang
> > > >>>>
> > > >>>>
> > > >>>> On Fri, Nov 16, 2018 at 4:21 PM Guozhang Wang <[email protected]
> >
> > > >>> wrote:
> > > >>>>
> > > >>>>> Thanks Florian! I will take a look at the PR.
> > > >>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> On Mon, Nov 12, 2018 at 2:44 PM Florian Hussonnois <
> > > >>>> [email protected]>
> > > >>>>> wrote:
> > > >>>>>
> > > >>>>>> Hi Matthias,
> > > >>>>>>
> > > >>>>>> Sorry I was absent for a while. I have started a new PR for this
> > > >> KIP.
> > > >>> It
> > > >>>>>> is
> > > >>>>>> still in progress for now. I'm working on it.
> > > >>>>>> https://github.com/apache/kafka/pull/5909
> > > >>>>>>
> > > >>>>>> Le ven. 19 oct. 2018 à 20:13, Matthias J. Sax <
> > > >> [email protected]>
> > > >>> a
> > > >>>>>> écrit :
> > > >>>>>>
> > > >>>>>>> What is the status of this KIP?
> > > >>>>>>>
> > > >>>>>>> -Matthias
> > > >>>>>>>
> > > >>>>>>> On 7/19/18 5:17 PM, Guozhang Wang wrote:
> > > >>>>>>>> Hello Florian,
> > > >>>>>>>>
> > > >>>>>>>> Sorry for being late... Found myself keep apologizing for late
> > > >>>> replies
> > > >>>>>>>> these days. But I do want to push this KIP's progress forward
> > > >> as I
> > > >>>>>> see it
> > > >>>>>>>> very important and helpful feature for extensibility.
> > > >>>>>>>>
> > > >>>>>>>> About the exceptions, I've gone through them and hopefully it
> is
> > > >>> an
> > > >>>>>>>> exhaustive list:
> > > >>>>>>>>
> > > >>>>>>>> 1. KTable#toStream()
> > > >>>>>>>> 2. KStream#merge(KStream)
> > > >>>>>>>> 3. KStream#process() / transform() / transformValues()
> > > >>>>>>>> 4. KGroupedTable / KGroupedStream#count()
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> Here's my reasoning:
> > > >>>>>>>>
> > > >>>>>>>> * It is okay not letting users to override the name for 1/2,
> > > >> since
> > > >>>>>> they
> > > >>>>>>> are
> > > >>>>>>>> too trivial to be useful for debugging, plus their processor
> > > >> names
> > > >>>>>> would
> > > >>>>>>>> not determine any related topic / store names.
> > > >>>>>>>> * For 3, I'd vote for adding overloaded functions with Named.
> > > >>>>>>>> * For 4, if users really want to name the processor she can
> call
> > > >>>>>>>> aggregate() instead, so I think it is okay to skip this case.
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> Guozhang
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> On Fri, Jul 6, 2018 at 3:06 PM, Florian Hussonnois <
> > > >>>>>>> [email protected]>
> > > >>>>>>>> wrote:
> > > >>>>>>>>
> > > >>>>>>>>> Hi,
> > > >>>>>>>>>
> > > >>>>>>>>> The option #3 seems to be a good alternative and I find the
> API
> > > >>>> more
> > > >>>>>>>>> elegant (thanks John).
> > > >>>>>>>>>
> > > >>>>>>>>> But, we still have the need to overload some methods either
> > > >>> because
> > > >>>>>>> they do
> > > >>>>>>>>> not accept an action instance or because they are translated
> to
> > > >>>>>> multiple
> > > >>>>>>>>> processors.
> > > >>>>>>>>>
> > > >>>>>>>>> For example, this is the case for methods branch() and
> merge().
> > > >>> We
> > > >>>>>> could
> > > >>>>>>>>> introduce a new interface Named (or maybe a different name ?)
> > > >>> with
> > > >>>> a
> > > >>>>>>> method
> > > >>>>>>>>> name(). All action interfaces could extend this one to
> > > >> implement
> > > >>>> the
> > > >>>>>>> option
> > > >>>>>>>>> 3).
> > > >>>>>>>>> This would result by having the following overloads  :
> > > >>>>>>>>>
> > > >>>>>>>>> Stream<K, V> merge(final Named name, final KStream<K, V>
> > > >> stream);
> > > >>>>>>>>> KStream<K, V>[] branch(final Named name, final Predicate<?
> > > >> super
> > > >>>> K, ?
> > > >>>>>>> super
> > > >>>>>>>>> V>... predicates)
> > > >>>>>>>>>
> > > >>>>>>>>> N.B : The list above is  not exhaustive
> > > >>>>>>>>>
> > > >>>>>>>>> ---------
> > > >>>>>>>>> user's code will become :
> > > >>>>>>>>>
> > > >>>>>>>>>         KStream<String, Integer> stream =
> > > >> builder.stream("test");
> > > >>>>>>>>>         KStream<String, Integer>[] branches =
> > > >>>>>>>>> stream.branch(Named.with("BRANCH-STREAM-ON-VALUE"),
> > > >>>>>>>>>                 Predicate.named("STREAM-PAIR-VALUE", (k, v)
> ->
> > > >> v
> > > >>> %
> > > >>>> 2
> > > >>>>>> ==
> > > >>>>>>>>> 0),
> > > >>>>>>>>>                 Predicate.named("STREAM-IMPAIR-VALUE", (k, v)
> > > >> ->
> > > >>> v
> > > >>>> %
> > > >>>>>> 2
> > > >>>>>>> !=
> > > >>>>>>>>> 0));
> > > >>>>>>>>>
> > > >>>>>>>>>         branches[0].to("pair");
> > > >>>>>>>>>         branches[1].to("impair");
> > > >>>>>>>>> ---------
> > > >>>>>>>>>
> > > >>>>>>>>> This is a mix of the options 3) and 1)
> > > >>>>>>>>>
> > > >>>>>>>>> Le ven. 6 juil. 2018 à 22:58, Guozhang Wang <
> > > >> [email protected]>
> > > >>> a
> > > >>>>>>> écrit :
> > > >>>>>>>>>
> > > >>>>>>>>>> Hi folks, just to summarize the options we have so far:
> > > >>>>>>>>>>
> > > >>>>>>>>>> 1) Add a new "as" for KTable / KStream, plus adding new
> fields
> > > >>> for
> > > >>>>>>>>>> operators-returns-void control objects (the current wiki's
> > > >>>>>> proposal).
> > > >>>>>>>>>>
> > > >>>>>>>>>> Pros: no more overloads.
> > > >>>>>>>>>> Cons: a bit departing with the current high-level API design
> > > >> of
> > > >>>> the
> > > >>>>>>> DSL,
> > > >>>>>>>>>> plus, the inconsistency between operators-returns-void and
> > > >>>>>>>>>> operators-not-return-voids.
> > > >>>>>>>>>>
> > > >>>>>>>>>> 2) Add overloaded functions for all operators, that accepts
> a
> > > >>> new
> > > >>>>>>> control
> > > >>>>>>>>>> object "Described".
> > > >>>>>>>>>>
> > > >>>>>>>>>> Pros: consistent with current APIs.
> > > >>>>>>>>>> Cons: lots of overloaded functions to add.
> > > >>>>>>>>>>
> > > >>>>>>>>>> 3) Add another default function in the interface (thank you
> > > >> J8!)
> > > >>>> as
> > > >>>>>>> John
> > > >>>>>>>>>> proposed.
> > > >>>>>>>>>>
> > > >>>>>>>>>> Pros: no overloaded functions, no "Described".
> > > >>>>>>>>>> Cons: do we lose lambda functions really (seems not if we
> > > >>> provide
> > > >>>> a
> > > >>>>>>>>> "named"
> > > >>>>>>>>>> for each func)? Plus "Described" may be more extensible
> than a
> > > >>>>>> single
> > > >>>>>>>>>> `String`.
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>> My principle of considering which one is better depends
> > > >>> primarily
> > > >>>> on
> > > >>>>>>> "how
> > > >>>>>>>>>> to make advanced users easily use the additional API, while
> > > >>>> keeping
> > > >>>>>> it
> > > >>>>>>>>>> hidden from normal users who do not care at all". For that
> > > >>>> purpose I
> > > >>>>>>>>> think
> > > >>>>>>>>>> 3) > 1) > 2).
> > > >>>>>>>>>>
> > > >>>>>>>>>> One caveat though, is that changing the interface would not
> be
> > > >>>>>>>>>> binary-compatible though source-compatible, right? I.e.
> users
> > > >>> need
> > > >>>>>> to
> > > >>>>>>>>>> recompile their code though no changes needed.
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>> Another note: for 3), if we really want to keep
> extensibility
> > > >> of
> > > >>>>>>>>> Described
> > > >>>>>>>>>> we could do sth. like:
> > > >>>>>>>>>>
> > > >>>>>>>>>> ---------
> > > >>>>>>>>>>
> > > >>>>>>>>>> public interface Predicate<K, V> {
> > > >>>>>>>>>>     // existing method
> > > >>>>>>>>>>     boolean test(final K key, final V value);
> > > >>>>>>>>>>
> > > >>>>>>>>>>     // new default method adds the ability to name the
> > > >> predicate
> > > >>>>>>>>>>     default Described described() {
> > > >>>>>>>>>>         return new Described(null);
> > > >>>>>>>>>>     }
> > > >>>>>>>>>> }
> > > >>>>>>>>>>
> > > >>>>>>>>>> ----------
> > > >>>>>>>>>>
> > > >>>>>>>>>> where user's code becomes:
> > > >>>>>>>>>>
> > > >>>>>>>>>> stream.filter(named("key", (k, v) -> true));   // note
> `named`
> > > >>> now
> > > >>>>>> just
> > > >>>>>>>>>> sets a Described("key") in "described()".
> > > >>>>>>>>>>
> > > >>>>>>>>>> stream.filter(described(Described.as("key", /* any other
> fancy
> > > >>>>>>>>> parameters
> > > >>>>>>>>>> in the future*/), (k, v) -> true));
> > > >>>>>>>>>> ----------
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>> I feel it is not much likely that we'd need to extend it
> > > >> further
> > > >>>> in
> > > >>>>>> the
> > > >>>>>>>>>> future, so just a `String` would be good enough. But just
> > > >>> listing
> > > >>>>>> all
> > > >>>>>>>>>> possibilities here.
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>> Guozhang
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>> On Fri, Jul 6, 2018 at 8:19 AM, John Roesler <
> > > >> [email protected]
> > > >>>>
> > > >>>>>>> wrote:
> > > >>>>>>>>>>
> > > >>>>>>>>>>> Hi Florian,
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Sorry I'm late to the party, but I missed the message
> > > >>> originally.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Regarding the names, it's probably a good idea to stick to
> > > >> the
> > > >>>> same
> > > >>>>>>>>>>> character set we're currently using: letters, numbers, and
> > > >>>> hyphens.
> > > >>>>>>> The
> > > >>>>>>>>>>> names are used in Kafka topics, files and folders, and
> > > >> RocksDB
> > > >>>>>>>>> databases,
> > > >>>>>>>>>>> and we also need them to work with the file systems of
> > > >> Windows,
> > > >>>>>> Linux,
> > > >>>>>>>>>> and
> > > >>>>>>>>>>> MacOS. My opinion is that with a situation like that, it's
> > > >>> better
> > > >>>>>> to
> > > >>>>>>> be
> > > >>>>>>>>>>> conservative. It might also be a good idea to impose an
> upper
> > > >>>>>> limit on
> > > >>>>>>>>>> name
> > > >>>>>>>>>>> length to avoid running afoul of any of those systems.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> ---
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> It seems like there's a small debate between 1) adding a
> new
> > > >>>>>> method to
> > > >>>>>>>>>>> KStream (and maybe KTable) to modify its name after the
> fact,
> > > >>> or
> > > >>>> 2)
> > > >>>>>>>>>>> piggy-backing on the config objects where they exist and
> > > >> adding
> > > >>>> one
> > > >>>>>>>>> where
> > > >>>>>>>>>>> they don't. To me, #2 is the better alternative even though
> > > >> it
> > > >>>>>>> produces
> > > >>>>>>>>>>> more overloads and may be a bit awkward in places.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> The reason is simply that #1 is a high-level departure from
> > > >> the
> > > >>>>>>>>>>> graph-building paradigm we're using in the DSL. Consider:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Graph.node1(config).node2(config)
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> vs
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Graph.node1().config().node2().config()
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> We could have done either, but we picked the former. I
> think
> > > >>> it's
> > > >>>>>>>>>> probably
> > > >>>>>>>>>>> a good goal to try and stick to it so that developers can
> > > >>> develop
> > > >>>>>> and
> > > >>>>>>>>>> rely
> > > >>>>>>>>>>> on their instincts for how the DSL will behave.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> I do want to present one alternative to adding new config
> > > >>>> objects:
> > > >>>>>> we
> > > >>>>>>>>> can
> > > >>>>>>>>>>> just add a "name()" method to all our "action" interfaces.
> > > >> For
> > > >>>>>>> example,
> > > >>>>>>>>>>> I'll demonstrate how we can add a "name" to Predicate and
> > > >> then
> > > >>>> use
> > > >>>>>> it
> > > >>>>>>>>> to
> > > >>>>>>>>>>> name a "KStream#filter" DSL operator:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> public interface Predicate<K, V> {
> > > >>>>>>>>>>>     // existing method
> > > >>>>>>>>>>>     boolean test(final K key, final V value);
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>     // new default method adds the ability to name the
> > > >>> predicate
> > > >>>>>>>>>>>     default String name() {
> > > >>>>>>>>>>>         return null;
> > > >>>>>>>>>>>     }
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>     // new static factory method adds the ability to wrap
> > > >>> lambda
> > > >>>>>>>>>> predicates
> > > >>>>>>>>>>> with a named predicate
> > > >>>>>>>>>>>     static <K, V> Predicate<K, V> named(final String name,
> > > >>> final
> > > >>>>>>>>>>> Predicate<K, V> predicate) {
> > > >>>>>>>>>>>         return new Predicate<K, V>() {
> > > >>>>>>>>>>>             @Override
> > > >>>>>>>>>>>             public boolean test(final K key, final V
> value) {
> > > >>>>>>>>>>>                 return predicate.test(key, value);
> > > >>>>>>>>>>>             }
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>             @Override
> > > >>>>>>>>>>>             public String name() {
> > > >>>>>>>>>>>                 return name;
> > > >>>>>>>>>>>             }
> > > >>>>>>>>>>>         };
> > > >>>>>>>>>>>     }
> > > >>>>>>>>>>> }
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Then, here's how it would look to use it:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> // Anonymous predicates continue to work just fine
> > > >>>>>>>>>>> stream.filter((k, v) -> true);
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> // Devs can swap in a Predicate that implements the name()
> > > >>>> method.
> > > >>>>>>>>>>> stream.filter(new Predicate<Object, Object>() {
> > > >>>>>>>>>>>     @Override
> > > >>>>>>>>>>>     public boolean test(final Object key, final Object
> > > >> value) {
> > > >>>>>>>>>>>         return true;
> > > >>>>>>>>>>>     }
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>     @Override
> > > >>>>>>>>>>>     public String name() {
> > > >>>>>>>>>>>         return "hey";
> > > >>>>>>>>>>>     }
> > > >>>>>>>>>>> });
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> // Or they can wrap their existing lambda using the static
> > > >>>> factory
> > > >>>>>>>>> method
> > > >>>>>>>>>>> stream.filter(named("key", (k, v) -> true));
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Just a thought.
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Overall, I think it's really valuable to be able to name
> the
> > > >>>>>>>>> processors,
> > > >>>>>>>>>>> for all the reasons you mentioned in the KIP. So thank you
> > > >> for
> > > >>>>>>>>>> introducing
> > > >>>>>>>>>>> this!
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> Thanks,
> > > >>>>>>>>>>> -John
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> On Thu, Jul 5, 2018 at 4:53 PM Florian Hussonnois <
> > > >>>>>>>>> [email protected]
> > > >>>>>>>>>>>
> > > >>>>>>>>>>> wrote:
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>> Hi, thank you very much for all you suggestions. I've
> > > >> started
> > > >>> to
> > > >>>>>>>>> update
> > > >>>>>>>>>>> the
> > > >>>>>>>>>>>> KIP (
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > >>>>>>>>>>>
> > > >> 307%3A+Allow+to+define+custom+processor+names+with+KStreams+DSL
> > > >>>>>>>>>>>> ).
> > > >>>>>>>>>>>> Also, I propose to rename the Processed class into
> > > >> Described -
> > > >>>>>> this
> > > >>>>>>>>>> will
> > > >>>>>>>>>>> be
> > > >>>>>>>>>>>> more meaningful (but this is just a detail).
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> I'm OK to not enforcing uppercase for specific names but
> > > >>> should
> > > >>>> we
> > > >>>>>>>>>> allow
> > > >>>>>>>>>>>> arbitrary names with whitespaces for example ? Currently,
> I
> > > >>>> can't
> > > >>>>>>>>> tell
> > > >>>>>>>>>> if
> > > >>>>>>>>>>>> this can lead to some side effects ?
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> Le lun. 11 juin 2018 à 01:31, Matthias J. Sax <
> > > >>>>>> [email protected]
> > > >>>>>>>>>>
> > > >>>>>>>>>> a
> > > >>>>>>>>>>>> écrit :
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>>> Just catching up on this thread.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> I like the general idea. Couple of comments:
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>  - I think that adding `Processed` (or maybe a different
> > > >>> name?)
> > > >>>>>> is
> > > >>>>>>>>> a
> > > >>>>>>>>>>>>> valid proposal for stateless operators that only have a
> > > >>> single
> > > >>>>>>>>>> overload
> > > >>>>>>>>>>>>> atm. It would align with the overall API design.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>  - for all methods with multiple existing overloads, we
> can
> > > >>>>>>>>> consider
> > > >>>>>>>>>> to
> > > >>>>>>>>>>>>> extend `Consumed`, `Produced`, `Materialized` etc to take
> > > >> an
> > > >>>>>>>>>> additional
> > > >>>>>>>>>>>>> processor name (not sure atm how elegant this is; we
> would
> > > >>> need
> > > >>>>>> to
> > > >>>>>>>>>>>>> "play" with the API a little bit; the advantage would be,
> > > >>> that
> > > >>>> we
> > > >>>>>>>>> do
> > > >>>>>>>>>>> not
> > > >>>>>>>>>>>>> add more overloads what seems to be key for this KIP)
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>  - operators return void: while I agree that the "name
> > > >> first"
> > > >>>>>>>>>> chaining
> > > >>>>>>>>>>>>> idea is not very intuitive, it might still work, if we
> name
> > > >>> the
> > > >>>>>>>>>> method
> > > >>>>>>>>>>>>> correctly (again, we would need to "play" with the API a
> > > >>> little
> > > >>>>>> bit
> > > >>>>>>>>>> to
> > > >>>>>>>>>>>> see)
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>  - for DSL operators that are translated to multiple
> nodes:
> > > >>> it
> > > >>>>>>>>> might
> > > >>>>>>>>>>>>> make sense to use the specified operator name as prefix
> and
> > > >>> add
> > > >>>>>>>>>>>>> reasonable suffixes. For example, a join translates into
> 5
> > > >>>>>>>>> operators
> > > >>>>>>>>>>>>> that could be name "name-left-store-processor",
> > > >>>>>>>>>>>>> "name-left-join-processor", "name-right-store-processor",
> > > >>>>>>>>>>>>> "name-right-join-processor", and
> > > >> "name-join-merge-processor"
> > > >>>> (or
> > > >>>>>>>>>>>>> similar). Maybe just using numbers might also work.
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>  - I think, we should strip the number suffixes if a user
> > > >>>>>> provides
> > > >>>>>>>>>>> names
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>  - enforcing upper case seems to be tricky: for example,
> we
> > > >>> do
> > > >>>>>> not
> > > >>>>>>>>>>>>> enforce upper case for store names and we cannot easily
> > > >>> change
> > > >>>> it
> > > >>>>>>>>> as
> > > >>>>>>>>>> it
> > > >>>>>>>>>>>>> would break compatibility -- thus, for consistency
> reasons
> > > >> we
> > > >>>>>> might
> > > >>>>>>>>>> not
> > > >>>>>>>>>>>>> want to do this
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>  - for better understand of the impact of the KIP, it
> would
> > > >>> be
> > > >>>>>>>>> quite
> > > >>>>>>>>>>>>> helpful if you would list all method names that are
> > > >> affected
> > > >>> in
> > > >>>>>> the
> > > >>>>>>>>>> KIP
> > > >>>>>>>>>>>>> (ie, list all newly added overloads)
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> -Matthias
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>> On 5/31/18 6:40 PM, Guozhang Wang wrote:
> > > >>>>>>>>>>>>>> Hi Florian,
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Re 1: I think changing the KStreamImpl / KTableImpl to
> > > >> allow
> > > >>>>>>>>>>> modifying
> > > >>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>> processor name after the operator is fine as long as we
> do
> > > >>> the
> > > >>>>>>>>>> check
> > > >>>>>>>>>>>>> again
> > > >>>>>>>>>>>>>> when modifying that. In fact, we are having some
> topology
> > > >>>>>>>>>>> optimization
> > > >>>>>>>>>>>>>> going on which may modify processor names in the final
> > > >>>> topology
> > > >>>>>>>>>>>> anyways (
> > > >>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4983).
> Semantically
> > > >> I
> > > >>>>>> think
> > > >>>>>>>>>> it
> > > >>>>>>>>>>> is
> > > >>>>>>>>>>>>>> easier to understand to developers than "deciding the
> > > >>>> processor
> > > >>>>>>>>>> name
> > > >>>>>>>>>>>> for
> > > >>>>>>>>>>>>>> the next operator".
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Re 2: Yeah I'm thinking that for operators that
> translates
> > > >>> to
> > > >>>>>>>>>>> multiple
> > > >>>>>>>>>>>>>> processor names, we can still use the provided "hint" to
> > > >>> name
> > > >>>>>> the
> > > >>>>>>>>>>>>> processor
> > > >>>>>>>>>>>>>> names, e.g. for Joins we can name them as
> `join-foo-this`
> > > >>> and
> > > >>>>>>>>>>>>>> `join-foo-that` etc if user calls `as("foo")`.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Re 3: The motivation I had about removing the suffix is
> > > >> that
> > > >>>> it
> > > >>>>>>>>> has
> > > >>>>>>>>>>>> huge
> > > >>>>>>>>>>>>>> restrictions on topology compatibilities: consider if
> user
> > > >>>> code
> > > >>>>>>>>>>> added a
> > > >>>>>>>>>>>>> new
> > > >>>>>>>>>>>>>> operator, or library does some optimization to remove
> some
> > > >>>>>>>>>> operators,
> > > >>>>>>>>>>>> the
> > > >>>>>>>>>>>>>> suffix indexing may be changed for a large amount of the
> > > >>>>>>>>> processor
> > > >>>>>>>>>>>> names:
> > > >>>>>>>>>>>>>> this will in turn change the internal state store names,
> > > >> as
> > > >>>> well
> > > >>>>>>>>> as
> > > >>>>>>>>>>>>>> internal topic names as well, making the new application
> > > >>>>>> topology
> > > >>>>>>>>>> to
> > > >>>>>>>>>>> be
> > > >>>>>>>>>>>>>> incompatible with the ones. One rationale I had about
> this
> > > >>> KIP
> > > >>>>>> is
> > > >>>>>>>>>>> that
> > > >>>>>>>>>>>>>> aligned this effort, moving forward we can allow users
> to
> > > >>>>>>>>> customize
> > > >>>>>>>>>>>>>> internal names so that they can still be reused even
> with
> > > >>>>>>>>> topology
> > > >>>>>>>>>>>>> changes
> > > >>>>>>>>>>>>>> (e.g. KIP-230), so I think removing the suffix index
> would
> > > >>> be
> > > >>>>>>>>> more
> > > >>>>>>>>>>>>>> applicable in the long run.
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> Guozhang
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>> On Thu, May 31, 2018 at 3:08 PM, Florian Hussonnois <
> > > >>>>>>>>>>>>> [email protected]>
> > > >>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Hi ,
> > > >>>>>>>>>>>>>>> Thank you very much for your feedback.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> 1/
> > > >>>>>>>>>>>>>>> I agree that overloading most of the methods with a
> > > >>> Processed
> > > >>>>>> is
> > > >>>>>>>>>> not
> > > >>>>>>>>>>>>> ideal.
> > > >>>>>>>>>>>>>>> I've started modifying the KStream API and I got to the
> > > >>> same
> > > >>>>>>>>>>>> conclusion.
> > > >>>>>>>>>>>>>>> Also ading a new method directly to KStreamImpl and
> > > >>>> KTableImpl
> > > >>>>>>>>>>> classes
> > > >>>>>>>>>>>>>>> seems to be a better option.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> However a processor name cannot be redefined after
> > > >> calling
> > > >>> an
> > > >>>>>>>>>>> operator
> > > >>>>>>>>>>>>> (or
> > > >>>>>>>>>>>>>>> maybe I miss something in the code).
> > > >>>>>>>>>>>>>>> From my understanding, this will only set the KStream
> > > >> name
> > > >>>>>>>>>> property
> > > >>>>>>>>>>>> not
> > > >>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>> processor name previsouly added to the topology
> builder -
> > > >>>>>>>>> leading
> > > >>>>>>>>>> to
> > > >>>>>>>>>>>>>>> InvalidTopology exception.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> So the new method should actually defines the name of
> the
> > > >>>> next
> > > >>>>>>>>>>>>> processor :
> > > >>>>>>>>>>>>>>> Below is an example :
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> *stream.as <http://stream.as
> > > >>>>>>>>>>> (Processed.name("MAPPE_TO_UPPERCASE")*
> > > >>>>>>>>>>>>>>> *          .map( (k, v) -> KeyValue.pair(k,
> > > >>>> v.toUpperCase()))*
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> I think this approach could solve the cases for methods
> > > >>>>>>>>> returning
> > > >>>>>>>>>>>> void ?
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Regarding this new method we have two possible
> > > >>>> implementations
> > > >>>>>> :
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>    1. Adding a method like : withName(String
> > > >> processorName)
> > > >>>>>>>>>>>>>>>    2. or adding a method accepting an Processed object
> :
> > > >>>>>>>>>>>> as(Processed).
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> I think solution 2. is preferable as the Processed
> class
> > > >>>> could
> > > >>>>>>>>> be
> > > >>>>>>>>>>>>> enriched
> > > >>>>>>>>>>>>>>> further (in futur).
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> 2/
> > > >>>>>>>>>>>>>>> As Guozhang said some operators add internal
> processors.
> > > >>>>>>>>>>>>>>> For example the branch() method create one
> KStreamBranch
> > > >>>>>>>>> processor
> > > >>>>>>>>>>> to
> > > >>>>>>>>>>>>> route
> > > >>>>>>>>>>>>>>> records and one KStreamPassThrough processor for each
> > > >>> branch.
> > > >>>>>>>>>>>>>>> In that situation only the parent processor can be
> named.
> > > >>> For
> > > >>>>>>>>>>> children
> > > >>>>>>>>>>>>>>> processors we could keep the current behaviour that
> add a
> > > >>>>>> suffix
> > > >>>>>>>>>>> (i.e
> > > >>>>>>>>>>>>>>> KSTREAM-BRANCHCHILD-)
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> This also the case for the join() method that result to
> > > >>>> adding
> > > >>>>>>>>>>>> multiple
> > > >>>>>>>>>>>>>>> processors to the topology (windowing, left/right joins
> > > >>> and a
> > > >>>>>>>>>> merge
> > > >>>>>>>>>>>>>>> processor).
> > > >>>>>>>>>>>>>>> I think, like for the branch method users could only
> > > >>> define a
> > > >>>>>>>>>>>> processor
> > > >>>>>>>>>>>>>>> name prefix.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> 3/
> > > >>>>>>>>>>>>>>> I think we should  still added a suffix like
> > > >> "-0000000000"
> > > >>> to
> > > >>>>>>>>>>>> processor
> > > >>>>>>>>>>>>>>> name and enforce uppercases as this will keep some
> > > >>>> consistency
> > > >>>>>>>>>> with
> > > >>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>> ones generated by the API.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> 4/
> > > >>>>>>>>>>>>>>> Yes, the KTable interface should be modified like
> KStream
> > > >>> to
> > > >>>>>>>>> allow
> > > >>>>>>>>>>>>> custom
> > > >>>>>>>>>>>>>>> processor names definition.
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> Le jeu. 31 mai 2018 à 19:18, Damian Guy <
> > > >>>> [email protected]>
> > > >>>>>>>>> a
> > > >>>>>>>>>>>> écrit
> > > >>>>>>>>>>>>> :
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Hi Florian,
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Thanks for the KIP. What about KTable and other DSL
> > > >>>>>> interfaces?
> > > >>>>>>>>>>> Will
> > > >>>>>>>>>>>>> they
> > > >>>>>>>>>>>>>>>> not want to be able to do the same thing?
> > > >>>>>>>>>>>>>>>> It would be good to see a complete set of the public
> API
> > > >>>>>>>>> changes.
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> Cheers,
> > > >>>>>>>>>>>>>>>> Damian
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>> On Wed, 30 May 2018 at 19:45 Guozhang Wang <
> > > >>>>>> [email protected]
> > > >>>>>>>>>>
> > > >>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Hello Florian,
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Thanks for the KIP. I have some meta feedbacks on the
> > > >>>>>>>>> proposal:
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> 1. You mentioned that this `Processed` object will be
> > > >>> added
> > > >>>>>>>>> to a
> > > >>>>>>>>>>> new
> > > >>>>>>>>>>>>>>>>> overloaded variant of all the stateless operators,
> what
> > > >>>> about
> > > >>>>>>>>>> the
> > > >>>>>>>>>>>>>>>> stateful
> > > >>>>>>>>>>>>>>>>> operators? Would like to hear your opinions if you
> have
> > > >>>>>>>>> thought
> > > >>>>>>>>>>>> about
> > > >>>>>>>>>>>>>>>> that:
> > > >>>>>>>>>>>>>>>>> note for stateful operators they will usually be
> mapped
> > > >>> to
> > > >>>>>>>>>>> multiple
> > > >>>>>>>>>>>>>>>>> processor node names, so we probably need to come up
> > > >> with
> > > >>>>>> some
> > > >>>>>>>>>>> ways
> > > >>>>>>>>>>>> to
> > > >>>>>>>>>>>>>>>>> define all their names.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> 2. I share the same concern with Bill as for adding
> > > >> lots
> > > >>> of
> > > >>>>>>>>> new
> > > >>>>>>>>>>>>>>> overload
> > > >>>>>>>>>>>>>>>>> functions into the stateless operators, as we have
> just
> > > >>>> spent
> > > >>>>>>>>>>> quite
> > > >>>>>>>>>>>>>>> some
> > > >>>>>>>>>>>>>>>>> effort in trimming them since 1.0.0 release. If the
> > > >> goal
> > > >>> is
> > > >>>>>> to
> > > >>>>>>>>>>> just
> > > >>>>>>>>>>>>>>>> provide
> > > >>>>>>>>>>>>>>>>> some "hints" on the generated processor node names,
> not
> > > >>>>>>>>> strictly
> > > >>>>>>>>>>>>>>>> enforcing
> > > >>>>>>>>>>>>>>>>> the exact names that to be generated, then how about
> we
> > > >>>> just
> > > >>>>>>>>>> add a
> > > >>>>>>>>>>>> new
> > > >>>>>>>>>>>>>>>>> function to `KStream` and `KTable` classes like:
> > > >>>>>>>>>> "as(Processed)",
> > > >>>>>>>>>>>> with
> > > >>>>>>>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>> semantics as "the latest operators that generate this
> > > >>>> KStream
> > > >>>>>>>>> /
> > > >>>>>>>>>>>> KTable
> > > >>>>>>>>>>>>>>>> will
> > > >>>>>>>>>>>>>>>>> be named accordingly to this hint".
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> The only caveat, is that for all operators like
> > > >>>> `KStream#to`
> > > >>>>>>>>> and
> > > >>>>>>>>>>>>>>>>> `KStream#print` that returns void, this alternative
> > > >> would
> > > >>>> not
> > > >>>>>>>>>>> work.
> > > >>>>>>>>>>>>> But
> > > >>>>>>>>>>>>>>>> for
> > > >>>>>>>>>>>>>>>>> the current operators:
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> a. KStream#print,
> > > >>>>>>>>>>>>>>>>> b. KStream#foreach,
> > > >>>>>>>>>>>>>>>>> c. KStream#to,
> > > >>>>>>>>>>>>>>>>> d. KStream#process
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> I personally felt that except `KStream#process` users
> > > >>> would
> > > >>>>>>>>> not
> > > >>>>>>>>>>>>> usually
> > > >>>>>>>>>>>>>>>>> bother to override their names, and for
> > > >> `KStream#process`
> > > >>>> we
> > > >>>>>>>>>> could
> > > >>>>>>>>>>>> add
> > > >>>>>>>>>>>>>>> an
> > > >>>>>>>>>>>>>>>>> overload variant with the additional Processed
> object.
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> 3. In your example, the processor names are still
> added
> > > >>>> with
> > > >>>>>> a
> > > >>>>>>>>>>>> suffix
> > > >>>>>>>>>>>>>>>> like
> > > >>>>>>>>>>>>>>>>> "
> > > >>>>>>>>>>>>>>>>> -0000000000", is this intentional? If yes, why (I
> > > >> thought
> > > >>>>>> with
> > > >>>>>>>>>>> user
> > > >>>>>>>>>>>>>>>>> specified processor name hints we will not add suffix
> > > >> to
> > > >>>>>>>>>>> distinguish
> > > >>>>>>>>>>>>>>>>> different nodes of the same type any more)?
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> Guozhang
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> On Tue, May 29, 2018 at 6:47 AM, Bill Bejeck <
> > > >>>>>>>>> [email protected]
> > > >>>>>>>>>>>
> > > >>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Hi Florian,
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Thanks for the KIP.  I think being able to add more
> > > >>>> context
> > > >>>>>>>>> to
> > > >>>>>>>>>>> the
> > > >>>>>>>>>>>>>>>>>> processor names would be useful.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> I like the idea of adding a "withProcessorName" to
> > > >>>> Produced,
> > > >>>>>>>>>>>> Consumed
> > > >>>>>>>>>>>>>>>> and
> > > >>>>>>>>>>>>>>>>>> Joined.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> But instead of adding the "Processed" parameter to a
> > > >>> large
> > > >>>>>>>>>>>> percentage
> > > >>>>>>>>>>>>>>>> of
> > > >>>>>>>>>>>>>>>>>> the methods, which would result in overloaded
> methods
> > > >>>> (which
> > > >>>>>>>>> we
> > > >>>>>>>>>>>>>>> removed
> > > >>>>>>>>>>>>>>>>>> quite a bit with KIP-182) what do you think of
> adding
> > > >> a
> > > >>>>>>>>> method
> > > >>>>>>>>>>>>>>>>>> to the AbstractStream class "withName(String
> > > >>>>>> processorName)"?
> > > >>>>>>>>>> BTW
> > > >>>>>>>>>>>> I"m
> > > >>>>>>>>>>>>>>>> not
> > > >>>>>>>>>>>>>>>>>> married to the method name, it's the best I can do
> off
> > > >>> the
> > > >>>>>>>>> top
> > > >>>>>>>>>> of
> > > >>>>>>>>>>>> my
> > > >>>>>>>>>>>>>>>>> head.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> For the methods that return void, we'd have to add a
> > > >>>>>>>>> parameter,
> > > >>>>>>>>>>> but
> > > >>>>>>>>>>>>>>>> that
> > > >>>>>>>>>>>>>>>>>> would at least cut down on the number of overloaded
> > > >>>> methods
> > > >>>>>>>>> in
> > > >>>>>>>>>>> the
> > > >>>>>>>>>>>>>>> API.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Just my 2 cents.
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> Thanks,
> > > >>>>>>>>>>>>>>>>>> Bill
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> On Sun, May 27, 2018 at 4:13 PM, Florian Hussonnois
> <
> > > >>>>>>>>>>>>>>>>> [email protected]
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>> wrote:
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> Hi,
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> I would like to start a new discussion on following
> > > >>> KIP :
> > > >>>>>>>>>>>>>>>>>>>
> > > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>
> > > >> 307%3A+Allow+to+define+custom+processor+names+with+KStreams+DSL
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> This is still a draft.
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>> Looking forward for your feedback.
> > > >>>>>>>>>>>>>>>>>>> --
> > > >>>>>>>>>>>>>>>>>>> Florian HUSSONNOIS
> > > >>>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>> --
> > > >>>>>>>>>>>>>>>>> -- Guozhang
> > > >>>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>> --
> > > >>>>>>>>>>>>>>> Florian HUSSONNOIS
> > > >>>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>>
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>> --
> > > >>>>>>>>>>>> Florian HUSSONNOIS
> > > >>>>>>>>>>>>
> > > >>>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>>
> > > >>>>>>>>>> --
> > > >>>>>>>>>> -- Guozhang
> > > >>>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>>
> > > >>>>>>>>> --
> > > >>>>>>>>> Florian HUSSONNOIS
> > > >>>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>> --
> > > >>>>>> Florian HUSSONNOIS
> > > >>>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> --
> > > >>>>> -- Guozhang
> > > >>>>>
> > > >>>>
> > > >>>>
> > > >>>> --
> > > >>>> -- Guozhang
> > > >>>>
> > > >>>
> > > >>
> > > >>
> > > >> --
> > > >> -- Guozhang
> > > >>
> > > >
> > > >
> > >
> > >
> >
>
>
> --
> Florian HUSSONNOIS
>

Re: [DISCUSS] KIP-307: Allow to define custom processor names with KStreams DSL

Reply via email to