Re: [DISCUSS] KIP-307: Allow to define custom processor names with KStreams DSL

Matthias J. Sax Sat, 12 Jan 2019 18:25:34 -0800

Just catching up on this KIP again.

One nit. The KIP says:


> In addition, the generated names have a few disadvantages to guarantee 
> topology compatibilities. In fact, adding a new operator, using a 
> third-library doing some optimization to remove some operators or upgrading 
> to a new KafkaStreams version with internal API changes may changed suffix 
> indexing for a large amount of the processor names. This will in turn change 
> the internal state store names, as well as internal topic names as well.
> 

This is not true any longer (I guess it was true, when the KIP was
initially proposed), because all stores/internal-topics can be named
since 2.1 release. I would suggest to remove the paragraph.

Overall, I like the Named/NamedOperation design.

What is unclear to me thought is, why we need new overloads for methods
that accept `Materialized`. To be more precise, I think it make sense to
add an overload that only takes `Named`, but not one that takes both
`Named` and `Materialized`. For example:

KGroupedStream#count() // exists
KGroupedStream#count(Materialized) // exits
KGroupedStream#count(Named) // added (makes sense to me)
KGroupedStream#count(Named, Materialized) // added -- why?

I would prefer to use `Materialized` to name the processor for this
case, too. Can you elaborate on the motivation?


-Matthias

On 1/11/19 3:39 PM, Florian Hussonnois wrote:
> Hi Guozhang,
> 
> I have updated the PR as well as the KIP. I should add more unit tests to
> covers all new methods.
> 
> However, I still have one test in failure. The reason is that using
> Joined.name() in both potential repartition topic and processor nodes leads
> to topology-incompatible.
> How should we deal with that ?
> 
> Thanks,
> 
> Le jeu. 10 janv. 2019 à 01:21, Guozhang Wang <wangg...@gmail.com> a écrit :
> 
>> Hello Florian,
>>
>> Just checking if have read about my previous email and if you feel happy
>> about it. We have the 2.2 KIP freeze deadline at 24th this month, while the
>> PR itself is getting quite close. So it'll be great if we can get the
>> agreement on it and get it into 2.2.0 release.
>>
>>
>> Guozhang
>>
>>
>> On Mon, Dec 17, 2018 at 2:39 PM Guozhang Wang <wangg...@gmail.com> wrote:
>>
>>> Hi Florian / John,
>>>
>>> Just wanted to throw a couple minor thoughts on the current proposal:
>>>
>>> 1) Regarding the interface / function name, I'd propose we call the
>>> interface `NamedOperation` which would be implemented by Produced /
>>> Consumed / Printed / Joined / Grouped / Suppressed (note I intentionally
>>> exclude Materialized here since its semantics is quite), and have the
>>> default class that implements `NamedOperation` as `Named`, which would be
>>> used in our adding overload functions. The main reason is to have
>>> consistency in naming.
>>>
>>> 2) As a minor tweak, I think it's better to use Joined.name() in both its
>>> possibly generate repartition topic, as well as the map processor used for
>>> group-by (currently this name is only used for the repartition topic).
>>>
>>>
>>> Florian: if you think this proposal makes sense, please feel free to go
>>> ahead and update the PR; after we made a first pass on it and feels
>>> confident about it, we can go ahead with the VOTING process. About the
>>> implementation of 2) above, this may be out of your implementation scope,
>>> so feel free to leave it out side your PR while Bill who originally worked
>>> on the Grouped KIP can make a follow-up PR for it.
>>>
>>> Guozhang
>>>
>>> On Fri, Dec 14, 2018 at 9:43 PM Guozhang Wang <wangg...@gmail.com> wrote:
>>>
>>>> Hello Florian,
>>>>
>>>> Really appreciate you for your patience.
>>>>
>>>> I know that we've discussed about the approach to adding overloaded
>>>> functions and rejected it early on. But looking deeper into the current PR
>>>> I realized that this approach has a danger of great API confusions to users
>>>> (I tried to explain my thoughts in the PR, but it was not very clear) ---
>>>> the basic idea is that, today we already have a few existing control
>>>> classes including Grouped, Joined, Suppressed that allow users to specify
>>>> serdes etc, while also a "name" which can then be used to define the
>>>> processor name / internal topic names in the topology (the static function
>>>> names are not consistent, which I think we should fix as well). And Named
>>>> interface, by extending the lambda function interfaces like ValueJoiner /
>>>> Predicate etc opens the door for another way to specify the names again.
>>>>
>>>> So in order to achieve consistency, we are left with generally two
>>>> options:
>>>>
>>>> 1) only allow users to specify names via the lambda interfaces that
>>>> extends Named interface. This means we'd better remove the naming mechanism
>>>> from the existing control objects to keep consistency.
>>>>
>>>> 2) only allow users to specify names via control classes, and we
>>>> introduce a new class (Named) for those which do not have one yet --- this
>>>> leads to the overloaded functions.
>>>>
>>>> I did a quick count on the num.of overloaded functions, and summing from
>>>> KTable (8) / KStream (15) / KGroupedStream (6) / KGroupedTable (6) /
>>>> TimeWindowedKStream (6) / SessionWindowedKStream (6) we got about 47
>>>> overloaded functions (our guess was pretty close!) -- note this is based on
>>>> John's proposal that we can let existing Grouped / Joined to extend Named
>>>> and hence we only need overloaded functions with a default NamedOperation
>>>> for those operators that do not have a control classes already.
>>>>
>>>> Thinking about this approach I feel it is not too bad compared with
>>>> either 1) above, which would require us to deprecate lot of public
>>>> functions around name(), or having a mixed mechanism for naming, which
>>>> could lead to very confusing behavior to users. Additionally, for most
>>>> users who would only want to specify the names for those stateful
>>>> operations which have internal topics / state stores and hence are more
>>>> keen to upgrade compatibility, those added overloads would be not-often
>>>> used functions for them anyways. And by letting existing control classes to
>>>> extend Named, we can have a unified method name for static constructor as
>>>> well.
>>>>
>>>>
>>>>
>>>> Guozhang
>>>>
>>>>
>>>> On Fri, Dec 14, 2018 at 10:24 AM John Roesler <j...@confluent.io> wrote:
>>>>
>>>>> Hi Florian,
>>>>>
>>>>> Sorry about the run-around of rejecting the original proposal,
>>>>> only to return to it later on. Hopefully, it's more encouraging
>>>>> than frustrating that we're coming around to your initial way of
>>>>> thinking.
>>>>>
>>>>> Thanks!
>>>>> -John
>>>>>
>>>>> On Thu, Dec 13, 2018 at 4:28 PM Florian Hussonnois <
>>>>> fhussonn...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Thanks again. I agree with your propositions.
>>>>>> Also IMHO, overloading all methods (filter, map) to accept a new
>>>>> control
>>>>>> object seems to provide a more natural development experience for
>>>>> users.
>>>>>>
>>>>>> Actually, this was the first proposition for this KIP, but we have
>>>>> rejected
>>>>>> it because this solution led to adding a lot of new methods.
>>>>>> As you mentioned it, the API has evolve since the creation of this
>>>>> KIP -
>>>>>> some existing control objects already allow to customize internal
>>>>> names. We
>>>>>> should so keep on that strategy.
>>>>>>
>>>>>> If everyone is OK with that, I will update the KIP and the PR
>>>>> accordingly;
>>>>>>
>>>>>> Thanks.
>>>>>>
>>>>>> Le jeu. 13 déc. 2018 à 18:08, John Roesler <j...@confluent.io> a
>>>>> écrit :
>>>>>>
>>>>>>> Hi again, all,
>>>>>>>
>>>>>>> Matthias, I agree with you.
>>>>>>>
>>>>>>> Florian, thanks for your response.
>>>>>>>
>>>>>>> I think your proposal is the best way to address the ask for hiding
>>>>> the
>>>>>>> name() getter. But I'd like to question that ask and instead
>>>>> propose that
>>>>>>> we just make the name() getter part of the public API.
>>>>>>>
>>>>>>> The desire to "hide" the getters causes a lot of complexity in our
>>>>> code
>>>>>>> base, and it will become completely impractical with the mixin
>>>>> strategy
>>>>>> of
>>>>>>> Named.
>>>>>>>
>>>>>>> If we were to switch strategies back to mixing Named in to the
>>>>> control
>>>>>>> objects rather than the functions, then the path forward becomes
>>>>> quite
>>>>>>> clear.
>>>>>>>
>>>>>>> On the other hand, it seems harmless for anyone who wants to be
>>>>> able to
>>>>>>> query the name from a control object after setting it, so my vote
>>>>> would
>>>>>> be
>>>>>>> simply to keep the Named interface as:
>>>>>>>
>>>>>>> public interface Named<T extends Named<T>> {
>>>>>>>   String name();
>>>>>>>   T withName(String name);
>>>>>>> }
>>>>>>>
>>>>>>> Under this proposal, we only mix Named in to the control objects,
>>>>> which
>>>>>>> means we have no need of default implementations anymore (because
>>>>> we can
>>>>>>> update all the control objects concurrently with adding this
>>>>> interface to
>>>>>>> them).
>>>>>>>
>>>>>>> This does hinge on switching over to a control-object-only strategy,
>>>>>> which
>>>>>>> introduces the need to add about 50 new control object classes,
>>>>> which
>>>>>> would
>>>>>>> only serve to implement Named. As a middle ground, maybe we could
>>>>> just
>>>>>> add
>>>>>>> one generic control object class, like:
>>>>>>>
>>>>>>> public class NamedOperation implements Named<NamedOperation> {
>>>>>>>   private final String name;
>>>>>>>   private NamedOperation(final String name) { this.name = name; }
>>>>>>>   public static NamedOperation name(final String name) {
>>>>>>>     return new NamedOperation(name);
>>>>>>>   }
>>>>>>>   public String name() { return name; }
>>>>>>>   public NamedOperation withName(final String name) {
>>>>>>>     return new NamedOperation(name);
>>>>>>>   }
>>>>>>> }
>>>>>>>
>>>>>>> And then, we'd add overloads for all the methods that don't have
>>>>> control
>>>>>>> objects already (for example, filter() ):
>>>>>>>
>>>>>>> // existing
>>>>>>> KStream<K, V> filter(Predicate<? super K, ? super V> predicate);
>>>>>>>
>>>>>>> // new
>>>>>>> KStream<K, V> filter(Predicate<? super K, ? super V> predicate,
>>>>>>> NamedOperation named);
>>>>>>>
>>>>>>> Additionally, in regard to Matthias's point about existing control
>>>>>> objects
>>>>>>> with naming semantics, they would extend Named (but not
>>>>> NamedOperation)
>>>>>> for
>>>>>>> uniformity.
>>>>>>>
>>>>>>> You provided a good approach to hide the getter with your
>>>>> SettableName
>>>>>>> class; I think what you proposed is the only way we could hide the
>>>>> name.
>>>>>>> In the end, though, it's a lot of complexity added (control object
>>>>> class
>>>>>>> hierarchy, inheritance, mutable state, internal casting) for
>>>>> something of
>>>>>>> dubious value: to be able to hide the name from someone *after they
>>>>>>> themselves have set it*.
>>>>>>>
>>>>>>> Although it'll be a pain, perhaps Matthias's suggestion to
>>>>> enumerate all
>>>>>>> the API methods is the best way to be sure we all agree on what's
>>>>> going
>>>>>> to
>>>>>>> happen.
>>>>>>>
>>>>>>> Thanks again for wrangling with this issue,
>>>>>>> -John
>>>>>>>
>>>>>>> On Thu, Dec 13, 2018 at 9:03 AM Matthias J. Sax <
>>>>> matth...@confluent.io>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Just catching up on this discussion.
>>>>>>>>
>>>>>>>> My overall personal take is, that I am not a big fan of the
>>>>> interface
>>>>>>>> `Named` that is used as a factory. I would rather prefer to add a
>>>>>>>> control object parameter to all methods that don't have one yet.
>>>>> This
>>>>>>>> KIP was started a while ago, and we added new naming capabilities
>>>>> in
>>>>>> the
>>>>>>>> meantime. Guozhang's example in the PR comment about naming in
>>>>>>>> stream-stream join shows, that we might end up in a confusion
>>>>> situation
>>>>>>>> for users if we use `Named`. Also, in 2.1, user can already name
>>>>> as
>>>>>>>> repartition-/changelog-topics and stores. Thus, KIP-307 boils
>>>>> down to
>>>>>>>> provide non-functional naming?
>>>>>>>>
>>>>>>>> Hence, for all methods that allow to specify names already, I
>>>>> don't see
>>>>>>>> any reason to change them, but use the existing API to also name
>>>>> the
>>>>>>>> processor(s) instead of allowing uses to specify a new name.
>>>>>>>>
>>>>>>>> About the inconsistency in method naming. I agree, that `as` is
>>>>> very
>>>>>>>> generic and maybe not the best choice.
>>>>>>>>
>>>>>>>> I think it might be helpful, to have a table overview in the KIP,
>>>>> that
>>>>>>>> list all existing static/non-static methods that allow to specify
>>>>> a
>>>>>>>> name, plus a columns with the new suggested naming for those
>>>>> methods?
>>>>>>>>
>>>>>>>> Thoughts?
>>>>>>>>
>>>>>>>>
>>>>>>>> -Matthias
>>>>>>>>
>>>>>>>>
>>>>>>>> On 12/12/18 12:45 AM, Florian Hussonnois wrote:
>>>>>>>>> Thank you very much for your feedbacks.
>>>>>>>>>
>>>>>>>>> Currently, there is still lot of discussions regarding the Named
>>>>>>>> interface.
>>>>>>>>> On the one hand we should provided consistency over the stream
>>>>> API
>>>>>> and
>>>>>>> on
>>>>>>>>> the other hand we should not break the semantic as John point
>>>>> it up.
>>>>>>>>>
>>>>>>>>> Guozhang, I'm sorry, but I'm little bit confused, maybe I missed
>>>>>>>> something.
>>>>>>>>> In your comment you have suggested that :
>>>>>>>>> * Produced/Consumed/Suppressed should extends Named
>>>>>>>>> * Named should have a private-package method to get the
>>>>> specified
>>>>>>>> processor
>>>>>>>>> name internally (processorName())
>>>>>>>>> * Finally we should end up with something like :  Named -> XXX
>>>>> ->
>>>>>>>>> XXXInternal or Named -> Produced -> ProducedInternal
>>>>>>>>>
>>>>>>>>> The objective behind that is to :
>>>>>>>>> * consolidate the internal method processorName()
>>>>>>>>> * consolidate the method withName that exists now existing into
>>>>>>> Produced,
>>>>>>>>> Consumed and Suppressed.
>>>>>>>>>
>>>>>>>>> But, Named is an interface so we can't define a private-package
>>>>>> method
>>>>>>> on
>>>>>>>>> it. Also, for example Produced and ProducedInternal are not in
>>>>> the
>>>>>> same
>>>>>>>>> package so having a private-package method doesn't really help.
>>>>>>>>> In addition, if we add the withName method into Named interface
>>>>> this
>>>>>>> can
>>>>>>>>> become confusing for developers because action interfaces
>>>>>> (ValueMapper,
>>>>>>>>> Reducer, etc) extend it.
>>>>>>>>> The interface would look like :
>>>>>>>>>
>>>>>>>>> public interface Named<T extends Named<T>> {
>>>>>>>>>     default String name() {
>>>>>>>>>         return null;
>>>>>>>>>     }
>>>>>>>>>     default Named<T> withName(final String name) {
>>>>>>>>>         return null;
>>>>>>>>>     }
>>>>>>>>> ...
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> So maybe instead of adding another method to Named we could
>>>>> create a
>>>>>>> new
>>>>>>>>> package-private class that could be extended by
>>>>>>>>> Produced/Consumed/Joined/Suppressed. For exemple,
>>>>>>>>> class SettableName<T extends SettableName<T>> implements Named {
>>>>>>>>>
>>>>>>>>>     protected String processorName;
>>>>>>>>>
>>>>>>>>>     SettableName(final SettableName settable) {
>>>>>>>>>         this(Objects.requireNonNull(settable, "settable can't be
>>>>>>>>> null").name());
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>     SettableName(final String processorName) {
>>>>>>>>>         this.processorName = processorName;
>>>>>>>>>     }
>>>>>>>>>
>>>>>>>>>     @Override
>>>>>>>>>     public String name() {
>>>>>>>>>         return processorName;
>>>>>>>>>     }
>>>>>>>>>     public T withName(final String processorName) {
>>>>>>>>>         this.processorName = processorName;
>>>>>>>>>         return (T)this;
>>>>>>>>>     }
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>> In that way, we will get : public class Produced implements
>>>>>>>>> SettableName<Produced> { ...
>>>>>>>>>
>>>>>>>>> WDYT?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Le mar. 11 déc. 2018 à 02:46, Guozhang Wang <wangg...@gmail.com>
>>>>> a
>>>>>>>> écrit :
>>>>>>>>>
>>>>>>>>>> I had one meta comment on the PR:
>>>>>>>>>>
>>>>> https://github.com/apache/kafka/pull/5909#discussion_r240447153
>>>>>>>>>>
>>>>>>>>>> On Mon, Dec 10, 2018 at 5:22 PM John Roesler <
>>>>> j...@confluent.io>
>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Florian,
>>>>>>>>>>>
>>>>>>>>>>> I hope it's ok if I ask a few questions at this late stage...
>>>>>>>>>>>
>>>>>>>>>>> Comment 1 ======
>>>>>>>>>>>
>>>>>>>>>>> It seems like the proposal is to add a new "Named" interface
>>>>> that
>>>>>> is
>>>>>>>>>>> intended to be mixed in with the existing API objects at
>>>>> various
>>>>>>>> points.
>>>>>>>>>>>
>>>>>>>>>>> Just to preface some of my comments, it looks like your KIP
>>>>> was
>>>>>>> created
>>>>>>>>>>> quite a while ago, so the API may have changed somewhat since
>>>>> you
>>>>>>>>>> started.
>>>>>>>>>>>
>>>>>>>>>>> As I see the API, there are a few different kinds of DSL
>>>>> method
>>>>>>>>>> arguments:
>>>>>>>>>>> * functions: things like Initializer, Aggregator, ValueJoiner,
>>>>>>>>>>> ForEachAction... All of these are essentially Streams-flavored
>>>>>>> Function
>>>>>>>>>>> interfaces with different arities, type bounds, and semantics.
>>>>>>>>>>> * config objects: things like Produced, Consumed, Joined,
>>>>>> Grouped...
>>>>>>>>>> These
>>>>>>>>>>> are containers for configurations, where the target of the
>>>>>>>> configuration
>>>>>>>>>> is
>>>>>>>>>>> the operation itself
>>>>>>>>>>> * raw configurations: things like a raw topic-name string and
>>>>>>>>>> Materialized:
>>>>>>>>>>> These are configurations for operations that have no config
>>>>> object,
>>>>>>> and
>>>>>>>>>> for
>>>>>>>>>>> various reasons, we didn't make one. The distinguishing
>>>>> feature is
>>>>>>> that
>>>>>>>>>> the
>>>>>>>>>>> target of the configuration is not the operation itself, but
>>>>> some
>>>>>>>> aspect
>>>>>>>>>> of
>>>>>>>>>>> it. For example, in Materialized, we are not setting the
>>>>> caching
>>>>>>>> behavior
>>>>>>>>>>> of, for example, an aggregation; we're setting the caching
>>>>> behavior
>>>>>>> of
>>>>>>>> a
>>>>>>>>>>> materialized state store attached to the aggregation.
>>>>>>>>>>>
>>>>>>>>>>> It seems like choosing to mix the Named interface in with the
>>>>>>> functions
>>>>>>>>>> has
>>>>>>>>>>> a couple of unfortunate side-effects:
>>>>>>>>>>> * Aggregator is not the only function passed to any of the
>>>>> relevant
>>>>>>>>>>> aggregate methods, so it seems a little arbitrary to pick that
>>>>>>> function
>>>>>>>>>>> over Initializer or Merger.
>>>>>>>>>>> * As you noted, branch() takes an array of Predicate, so we
>>>>> just
>>>>>>> ignore
>>>>>>>>>> the
>>>>>>>>>>> provided name(s), even though Predicate names are used
>>>>> elsewhere.
>>>>>>>>>>> * Not all things that we want to name have function arguments,
>>>>>>> notably
>>>>>>>>>>> source and sink, so we'd switch paradigms and use the config
>>>>> object
>>>>>>>>>>> instead.
>>>>>>>>>>> * Adding an extra method to the function interfaces means that
>>>>>> those
>>>>>>>> are
>>>>>>>>>> no
>>>>>>>>>>> longer SAM interfaces. You proposed to add a default
>>>>>> implementation,
>>>>>>> so
>>>>>>>>>> we
>>>>>>>>>>> could still pass a lambda if we don't want to set the name,
>>>>> but if
>>>>>> we
>>>>>>>>>> *do*
>>>>>>>>>>> want to set the name, we can no longer use lambdas.
>>>>>>>>>>>
>>>>>>>>>>> I think the obvious other choice would be to mix Named in
>>>>> with the
>>>>>>>> config
>>>>>>>>>>> objects instead, but this has one main downside of its own...
>>>>>>>>>>> * not every operator we wish to name has a config object. I
>>>>> don't
>>>>>>> know
>>>>>>>> if
>>>>>>>>>>> everyone involved is comfortable with adding a config object
>>>>> to
>>>>>> every
>>>>>>>>>>> operator that's missing one.
>>>>>>>>>>>
>>>>>>>>>>> Personally, I favor moving toward a more consistent state
>>>>> that's
>>>>>>>> forward
>>>>>>>>>>> compatible with any further changes we wish to make. I
>>>>> *think* that
>>>>>>>>>> giving
>>>>>>>>>>> every operator two forms (one with no config and one with a
>>>>> config
>>>>>>>>>> object)
>>>>>>>>>>> would be such an API.
>>>>>>>>>>>
>>>>>>>>>>> Comment 2 =========
>>>>>>>>>>>
>>>>>>>>>>> Finally, just a minor comment: the static method in Named
>>>>> wouldn't
>>>>>>> work
>>>>>>>>>>> properly as defined. Assuming that we mix Named in with
>>>>> Produced,
>>>>>> for
>>>>>>>>>>> example, we'd need to be able to use it like:
>>>>>>>>>>>>  kStream.to("out", Produced.with("myOut"))
>>>>>>>>>>> This doesn't work because with() returns a Named, but we need
>>>>> a
>>>>>>>> Produced.
>>>>>>>>>>>
>>>>>>>>>>> We can pull off a builder method in the interface, but not a
>>>>> static
>>>>>>>>>> method.
>>>>>>>>>>> To define a builder method in the interface that returns an
>>>>>> instance
>>>>>>> of
>>>>>>>>>> the
>>>>>>>>>>> concrete subtype, you have to use the "curiously recurring
>>>>> generic"
>>>>>>>>>>> pattern.
>>>>>>>>>>>
>>>>>>>>>>> It would look like:
>>>>>>>>>>>
>>>>>>>>>>> public interface Named<N extends Named<N>> {
>>>>>>>>>>>   String name();
>>>>>>>>>>>   N withName(String name);
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> You can see where the name of the pattern comes from ;)
>>>>>>>>>>> An implementation would then look like:
>>>>>>>>>>>
>>>>>>>>>>> public class Produced implements Named<Produced> {
>>>>>>>>>>>   String name() { return name; }
>>>>>>>>>>>   Produced withName(final String name) { this.name = name;
>>>>> return
>>>>>>>> this;
>>>>>>>>>> }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> Note that the generic parameter gets filled in properly in the
>>>>>>>>>> implementing
>>>>>>>>>>> class, so that you get the right return type out.
>>>>>>>>>>>
>>>>>>>>>>> It doesn't work at all with a static factory method at the
>>>>>> interface
>>>>>>>>>> level,
>>>>>>>>>>> so it would be up to Produced to define a static factory if it
>>>>>> wants
>>>>>>> to
>>>>>>>>>>> present one.
>>>>>>>>>>>
>>>>>>>>>>> ======
>>>>>>>>>>>
>>>>>>>>>>> Those are my two feedbacks!
>>>>>>>>>>>
>>>>>>>>>>> I hope you find this helpful, rather than frustrating. I'm
>>>>> sorry I
>>>>>>>> didn't
>>>>>>>>>>> get a chance to comment sooner.
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the KIP, I think it will be much nicer to be able
>>>>> to
>>>>>> name
>>>>>>>> the
>>>>>>>>>>> processor nodes.
>>>>>>>>>>>
>>>>>>>>>>> -John
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Nov 27, 2018 at 6:34 PM Guozhang Wang <
>>>>> wangg...@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Florian,
>>>>>>>>>>>>
>>>>>>>>>>>> I've made a pass over the PR. There are some comments that
>>>>> are
>>>>>>> related
>>>>>>>>>> to
>>>>>>>>>>>> the function names which may be affecting the KIP wiki page,
>>>>> but
>>>>>>>>>> overall
>>>>>>>>>>> I
>>>>>>>>>>>> think it looks good already.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Nov 16, 2018 at 4:21 PM Guozhang Wang <
>>>>> wangg...@gmail.com
>>>>>>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks Florian! I will take a look at the PR.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Nov 12, 2018 at 2:44 PM Florian Hussonnois <
>>>>>>>>>>>> fhussonn...@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi Matthias,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Sorry I was absent for a while. I have started a new PR
>>>>> for this
>>>>>>>>>> KIP.
>>>>>>>>>>> It
>>>>>>>>>>>>>> is
>>>>>>>>>>>>>> still in progress for now. I'm working on it.
>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/5909
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Le ven. 19 oct. 2018 à 20:13, Matthias J. Sax <
>>>>>>>>>> matth...@confluent.io>
>>>>>>>>>>> a
>>>>>>>>>>>>>> écrit :
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> What is the status of this KIP?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 7/19/18 5:17 PM, Guozhang Wang wrote:
>>>>>>>>>>>>>>>> Hello Florian,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sorry for being late... Found myself keep apologizing
>>>>> for late
>>>>>>>>>>>> replies
>>>>>>>>>>>>>>>> these days. But I do want to push this KIP's progress
>>>>> forward
>>>>>>>>>> as I
>>>>>>>>>>>>>> see it
>>>>>>>>>>>>>>>> very important and helpful feature for extensibility.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> About the exceptions, I've gone through them and
>>>>> hopefully it
>>>>>> is
>>>>>>>>>>> an
>>>>>>>>>>>>>>>> exhaustive list:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> 1. KTable#toStream()
>>>>>>>>>>>>>>>> 2. KStream#merge(KStream)
>>>>>>>>>>>>>>>> 3. KStream#process() / transform() / transformValues()
>>>>>>>>>>>>>>>> 4. KGroupedTable / KGroupedStream#count()
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Here's my reasoning:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> * It is okay not letting users to override the name for
>>>>> 1/2,
>>>>>>>>>> since
>>>>>>>>>>>>>> they
>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>> too trivial to be useful for debugging, plus their
>>>>> processor
>>>>>>>>>> names
>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>> not determine any related topic / store names.
>>>>>>>>>>>>>>>> * For 3, I'd vote for adding overloaded functions with
>>>>> Named.
>>>>>>>>>>>>>>>> * For 4, if users really want to name the processor she
>>>>> can
>>>>>> call
>>>>>>>>>>>>>>>> aggregate() instead, so I think it is okay to skip this
>>>>> case.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Jul 6, 2018 at 3:06 PM, Florian Hussonnois <
>>>>>>>>>>>>>>> fhussonn...@gmail.com>
>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> The option #3 seems to be a good alternative and I find
>>>>> the
>>>>>> API
>>>>>>>>>>>> more
>>>>>>>>>>>>>>>>> elegant (thanks John).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> But, we still have the need to overload some methods
>>>>> either
>>>>>>>>>>> because
>>>>>>>>>>>>>>> they do
>>>>>>>>>>>>>>>>> not accept an action instance or because they are
>>>>> translated
>>>>>> to
>>>>>>>>>>>>>> multiple
>>>>>>>>>>>>>>>>> processors.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> For example, this is the case for methods branch() and
>>>>>> merge().
>>>>>>>>>>> We
>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>> introduce a new interface Named (or maybe a different
>>>>> name ?)
>>>>>>>>>>> with
>>>>>>>>>>>> a
>>>>>>>>>>>>>>> method
>>>>>>>>>>>>>>>>> name(). All action interfaces could extend this one to
>>>>>>>>>> implement
>>>>>>>>>>>> the
>>>>>>>>>>>>>>> option
>>>>>>>>>>>>>>>>> 3).
>>>>>>>>>>>>>>>>> This would result by having the following overloads  :
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Stream<K, V> merge(final Named name, final KStream<K, V>
>>>>>>>>>> stream);
>>>>>>>>>>>>>>>>> KStream<K, V>[] branch(final Named name, final
>>>>> Predicate<?
>>>>>>>>>> super
>>>>>>>>>>>> K, ?
>>>>>>>>>>>>>>> super
>>>>>>>>>>>>>>>>> V>... predicates)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> N.B : The list above is  not exhaustive
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> ---------
>>>>>>>>>>>>>>>>> user's code will become :
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         KStream<String, Integer> stream =
>>>>>>>>>> builder.stream("test");
>>>>>>>>>>>>>>>>>         KStream<String, Integer>[] branches =
>>>>>>>>>>>>>>>>> stream.branch(Named.with("BRANCH-STREAM-ON-VALUE"),
>>>>>>>>>>>>>>>>>                 Predicate.named("STREAM-PAIR-VALUE",
>>>>> (k, v)
>>>>>> ->
>>>>>>>>>> v
>>>>>>>>>>> %
>>>>>>>>>>>> 2
>>>>>>>>>>>>>> ==
>>>>>>>>>>>>>>>>> 0),
>>>>>>>>>>>>>>>>>                 Predicate.named("STREAM-IMPAIR-VALUE",
>>>>> (k, v)
>>>>>>>>>> ->
>>>>>>>>>>> v
>>>>>>>>>>>> %
>>>>>>>>>>>>>> 2
>>>>>>>>>>>>>>> !=
>>>>>>>>>>>>>>>>> 0));
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>         branches[0].to("pair");
>>>>>>>>>>>>>>>>>         branches[1].to("impair");
>>>>>>>>>>>>>>>>> ---------
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> This is a mix of the options 3) and 1)
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Le ven. 6 juil. 2018 à 22:58, Guozhang Wang <
>>>>>>>>>> wangg...@gmail.com>
>>>>>>>>>>> a
>>>>>>>>>>>>>>> écrit :
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi folks, just to summarize the options we have so far:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1) Add a new "as" for KTable / KStream, plus adding new
>>>>>> fields
>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>> operators-returns-void control objects (the current
>>>>> wiki's
>>>>>>>>>>>>>> proposal).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Pros: no more overloads.
>>>>>>>>>>>>>>>>>> Cons: a bit departing with the current high-level API
>>>>> design
>>>>>>>>>> of
>>>>>>>>>>>> the
>>>>>>>>>>>>>>> DSL,
>>>>>>>>>>>>>>>>>> plus, the inconsistency between operators-returns-void
>>>>> and
>>>>>>>>>>>>>>>>>> operators-not-return-voids.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 2) Add overloaded functions for all operators, that
>>>>> accepts
>>>>>> a
>>>>>>>>>>> new
>>>>>>>>>>>>>>> control
>>>>>>>>>>>>>>>>>> object "Described".
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Pros: consistent with current APIs.
>>>>>>>>>>>>>>>>>> Cons: lots of overloaded functions to add.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 3) Add another default function in the interface
>>>>> (thank you
>>>>>>>>>> J8!)
>>>>>>>>>>>> as
>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>>>> proposed.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Pros: no overloaded functions, no "Described".
>>>>>>>>>>>>>>>>>> Cons: do we lose lambda functions really (seems not if
>>>>> we
>>>>>>>>>>> provide
>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>> "named"
>>>>>>>>>>>>>>>>>> for each func)? Plus "Described" may be more extensible
>>>>>> than a
>>>>>>>>>>>>>> single
>>>>>>>>>>>>>>>>>> `String`.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> My principle of considering which one is better depends
>>>>>>>>>>> primarily
>>>>>>>>>>>> on
>>>>>>>>>>>>>>> "how
>>>>>>>>>>>>>>>>>> to make advanced users easily use the additional API,
>>>>> while
>>>>>>>>>>>> keeping
>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>> hidden from normal users who do not care at all". For
>>>>> that
>>>>>>>>>>>> purpose I
>>>>>>>>>>>>>>>>> think
>>>>>>>>>>>>>>>>>> 3) > 1) > 2).
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> One caveat though, is that changing the interface
>>>>> would not
>>>>>> be
>>>>>>>>>>>>>>>>>> binary-compatible though source-compatible, right? I.e.
>>>>>> users
>>>>>>>>>>> need
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> recompile their code though no changes needed.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Another note: for 3), if we really want to keep
>>>>>> extensibility
>>>>>>>>>> of
>>>>>>>>>>>>>>>>> Described
>>>>>>>>>>>>>>>>>> we could do sth. like:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ---------
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> public interface Predicate<K, V> {
>>>>>>>>>>>>>>>>>>     // existing method
>>>>>>>>>>>>>>>>>>     boolean test(final K key, final V value);
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>     // new default method adds the ability to name the
>>>>>>>>>> predicate
>>>>>>>>>>>>>>>>>>     default Described described() {
>>>>>>>>>>>>>>>>>>         return new Described(null);
>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> ----------
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> where user's code becomes:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> stream.filter(named("key", (k, v) -> true));   // note
>>>>>> `named`
>>>>>>>>>>> now
>>>>>>>>>>>>>> just
>>>>>>>>>>>>>>>>>> sets a Described("key") in "described()".
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> stream.filter(described(Described.as("key", /* any
>>>>> other
>>>>>> fancy
>>>>>>>>>>>>>>>>> parameters
>>>>>>>>>>>>>>>>>> in the future*/), (k, v) -> true));
>>>>>>>>>>>>>>>>>> ----------
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I feel it is not much likely that we'd need to extend
>>>>> it
>>>>>>>>>> further
>>>>>>>>>>>> in
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> future, so just a `String` would be good enough. But
>>>>> just
>>>>>>>>>>> listing
>>>>>>>>>>>>>> all
>>>>>>>>>>>>>>>>>> possibilities here.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, Jul 6, 2018 at 8:19 AM, John Roesler <
>>>>>>>>>> j...@confluent.io
>>>>>>>>>>>>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi Florian,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Sorry I'm late to the party, but I missed the message
>>>>>>>>>>> originally.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Regarding the names, it's probably a good idea to
>>>>> stick to
>>>>>>>>>> the
>>>>>>>>>>>> same
>>>>>>>>>>>>>>>>>>> character set we're currently using: letters,
>>>>> numbers, and
>>>>>>>>>>>> hyphens.
>>>>>>>>>>>>>>> The
>>>>>>>>>>>>>>>>>>> names are used in Kafka topics, files and folders, and
>>>>>>>>>> RocksDB
>>>>>>>>>>>>>>>>> databases,
>>>>>>>>>>>>>>>>>>> and we also need them to work with the file systems of
>>>>>>>>>> Windows,
>>>>>>>>>>>>>> Linux,
>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>> MacOS. My opinion is that with a situation like that,
>>>>> it's
>>>>>>>>>>> better
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>> conservative. It might also be a good idea to impose
>>>>> an
>>>>>> upper
>>>>>>>>>>>>>> limit on
>>>>>>>>>>>>>>>>>> name
>>>>>>>>>>>>>>>>>>> length to avoid running afoul of any of those systems.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> It seems like there's a small debate between 1)
>>>>> adding a
>>>>>> new
>>>>>>>>>>>>>> method to
>>>>>>>>>>>>>>>>>>> KStream (and maybe KTable) to modify its name after
>>>>> the
>>>>>> fact,
>>>>>>>>>>> or
>>>>>>>>>>>> 2)
>>>>>>>>>>>>>>>>>>> piggy-backing on the config objects where they exist
>>>>> and
>>>>>>>>>> adding
>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>> where
>>>>>>>>>>>>>>>>>>> they don't. To me, #2 is the better alternative even
>>>>> though
>>>>>>>>>> it
>>>>>>>>>>>>>>> produces
>>>>>>>>>>>>>>>>>>> more overloads and may be a bit awkward in places.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> The reason is simply that #1 is a high-level
>>>>> departure from
>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> graph-building paradigm we're using in the DSL.
>>>>> Consider:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Graph.node1(config).node2(config)
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> vs
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Graph.node1().config().node2().config()
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> We could have done either, but we picked the former. I
>>>>>> think
>>>>>>>>>>> it's
>>>>>>>>>>>>>>>>>> probably
>>>>>>>>>>>>>>>>>>> a good goal to try and stick to it so that developers
>>>>> can
>>>>>>>>>>> develop
>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>> rely
>>>>>>>>>>>>>>>>>>> on their instincts for how the DSL will behave.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I do want to present one alternative to adding new
>>>>> config
>>>>>>>>>>>> objects:
>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>>>> just add a "name()" method to all our "action"
>>>>> interfaces.
>>>>>>>>>> For
>>>>>>>>>>>>>>> example,
>>>>>>>>>>>>>>>>>>> I'll demonstrate how we can add a "name" to Predicate
>>>>> and
>>>>>>>>>> then
>>>>>>>>>>>> use
>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>> name a "KStream#filter" DSL operator:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> public interface Predicate<K, V> {
>>>>>>>>>>>>>>>>>>>     // existing method
>>>>>>>>>>>>>>>>>>>     boolean test(final K key, final V value);
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     // new default method adds the ability to name the
>>>>>>>>>>> predicate
>>>>>>>>>>>>>>>>>>>     default String name() {
>>>>>>>>>>>>>>>>>>>         return null;
>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     // new static factory method adds the ability to
>>>>> wrap
>>>>>>>>>>> lambda
>>>>>>>>>>>>>>>>>> predicates
>>>>>>>>>>>>>>>>>>> with a named predicate
>>>>>>>>>>>>>>>>>>>     static <K, V> Predicate<K, V> named(final String
>>>>> name,
>>>>>>>>>>> final
>>>>>>>>>>>>>>>>>>> Predicate<K, V> predicate) {
>>>>>>>>>>>>>>>>>>>         return new Predicate<K, V>() {
>>>>>>>>>>>>>>>>>>>             @Override
>>>>>>>>>>>>>>>>>>>             public boolean test(final K key, final V
>>>>>> value) {
>>>>>>>>>>>>>>>>>>>                 return predicate.test(key, value);
>>>>>>>>>>>>>>>>>>>             }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>             @Override
>>>>>>>>>>>>>>>>>>>             public String name() {
>>>>>>>>>>>>>>>>>>>                 return name;
>>>>>>>>>>>>>>>>>>>             }
>>>>>>>>>>>>>>>>>>>         };
>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Then, here's how it would look to use it:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> // Anonymous predicates continue to work just fine
>>>>>>>>>>>>>>>>>>> stream.filter((k, v) -> true);
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> // Devs can swap in a Predicate that implements the
>>>>> name()
>>>>>>>>>>>> method.
>>>>>>>>>>>>>>>>>>> stream.filter(new Predicate<Object, Object>() {
>>>>>>>>>>>>>>>>>>>     @Override
>>>>>>>>>>>>>>>>>>>     public boolean test(final Object key, final Object
>>>>>>>>>> value) {
>>>>>>>>>>>>>>>>>>>         return true;
>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>     @Override
>>>>>>>>>>>>>>>>>>>     public String name() {
>>>>>>>>>>>>>>>>>>>         return "hey";
>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>> });
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> // Or they can wrap their existing lambda using the
>>>>> static
>>>>>>>>>>>> factory
>>>>>>>>>>>>>>>>> method
>>>>>>>>>>>>>>>>>>> stream.filter(named("key", (k, v) -> true));
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Just a thought.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Overall, I think it's really valuable to be able to
>>>>> name
>>>>>> the
>>>>>>>>>>>>>>>>> processors,
>>>>>>>>>>>>>>>>>>> for all the reasons you mentioned in the KIP. So
>>>>> thank you
>>>>>>>>>> for
>>>>>>>>>>>>>>>>>> introducing
>>>>>>>>>>>>>>>>>>> this!
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Thu, Jul 5, 2018 at 4:53 PM Florian Hussonnois <
>>>>>>>>>>>>>>>>> fhussonn...@gmail.com
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hi, thank you very much for all you suggestions. I've
>>>>>>>>>> started
>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> update
>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>> KIP (
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>> 307%3A+Allow+to+define+custom+processor+names+with+KStreams+DSL
>>>>>>>>>>>>>>>>>>>> ).
>>>>>>>>>>>>>>>>>>>> Also, I propose to rename the Processed class into
>>>>>>>>>> Described -
>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>> will
>>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>> more meaningful (but this is just a detail).
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I'm OK to not enforcing uppercase for specific names
>>>>> but
>>>>>>>>>>> should
>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>> allow
>>>>>>>>>>>>>>>>>>>> arbitrary names with whitespaces for example ?
>>>>> Currently,
>>>>>> I
>>>>>>>>>>>> can't
>>>>>>>>>>>>>>>>> tell
>>>>>>>>>>>>>>>>>> if
>>>>>>>>>>>>>>>>>>>> this can lead to some side effects ?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Le lun. 11 juin 2018 à 01:31, Matthias J. Sax <
>>>>>>>>>>>>>> matth...@confluent.io
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>> écrit :
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Just catching up on this thread.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> I like the general idea. Couple of comments:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>  - I think that adding `Processed` (or maybe a
>>>>> different
>>>>>>>>>>> name?)
>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>> valid proposal for stateless operators that only
>>>>> have a
>>>>>>>>>>> single
>>>>>>>>>>>>>>>>>> overload
>>>>>>>>>>>>>>>>>>>>> atm. It would align with the overall API design.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>  - for all methods with multiple existing
>>>>> overloads, we
>>>>>> can
>>>>>>>>>>>>>>>>> consider
>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>> extend `Consumed`, `Produced`, `Materialized` etc
>>>>> to take
>>>>>>>>>> an
>>>>>>>>>>>>>>>>>> additional
>>>>>>>>>>>>>>>>>>>>> processor name (not sure atm how elegant this is; we
>>>>>> would
>>>>>>>>>>> need
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>> "play" with the API a little bit; the advantage
>>>>> would be,
>>>>>>>>>>> that
>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>> do
>>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>> add more overloads what seems to be key for this
>>>>> KIP)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>  - operators return void: while I agree that the
>>>>> "name
>>>>>>>>>> first"
>>>>>>>>>>>>>>>>>> chaining
>>>>>>>>>>>>>>>>>>>>> idea is not very intuitive, it might still work, if
>>>>> we
>>>>>> name
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> method
>>>>>>>>>>>>>>>>>>>>> correctly (again, we would need to "play" with the
>>>>> API a
>>>>>>>>>>> little
>>>>>>>>>>>>>> bit
>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>> see)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>  - for DSL operators that are translated to multiple
>>>>>> nodes:
>>>>>>>>>>> it
>>>>>>>>>>>>>>>>> might
>>>>>>>>>>>>>>>>>>>>> make sense to use the specified operator name as
>>>>> prefix
>>>>>> and
>>>>>>>>>>> add
>>>>>>>>>>>>>>>>>>>>> reasonable suffixes. For example, a join translates
>>>>> into
>>>>>> 5
>>>>>>>>>>>>>>>>> operators
>>>>>>>>>>>>>>>>>>>>> that could be name "name-left-store-processor",
>>>>>>>>>>>>>>>>>>>>> "name-left-join-processor",
>>>>> "name-right-store-processor",
>>>>>>>>>>>>>>>>>>>>> "name-right-join-processor", and
>>>>>>>>>> "name-join-merge-processor"
>>>>>>>>>>>> (or
>>>>>>>>>>>>>>>>>>>>> similar). Maybe just using numbers might also work.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>  - I think, we should strip the number suffixes if
>>>>> a user
>>>>>>>>>>>>>> provides
>>>>>>>>>>>>>>>>>>> names
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>  - enforcing upper case seems to be tricky: for
>>>>> example,
>>>>>> we
>>>>>>>>>>> do
>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>> enforce upper case for store names and we cannot
>>>>> easily
>>>>>>>>>>> change
>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>> would break compatibility -- thus, for consistency
>>>>>> reasons
>>>>>>>>>> we
>>>>>>>>>>>>>> might
>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>> want to do this
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>  - for better understand of the impact of the KIP,
>>>>> it
>>>>>> would
>>>>>>>>>>> be
>>>>>>>>>>>>>>>>> quite
>>>>>>>>>>>>>>>>>>>>> helpful if you would list all method names that are
>>>>>>>>>> affected
>>>>>>>>>>> in
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> KIP
>>>>>>>>>>>>>>>>>>>>> (ie, list all newly added overloads)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On 5/31/18 6:40 PM, Guozhang Wang wrote:
>>>>>>>>>>>>>>>>>>>>>> Hi Florian,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Re 1: I think changing the KStreamImpl /
>>>>> KTableImpl to
>>>>>>>>>> allow
>>>>>>>>>>>>>>>>>>> modifying
>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> processor name after the operator is fine as long
>>>>> as we
>>>>>> do
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>> check
>>>>>>>>>>>>>>>>>>>>> again
>>>>>>>>>>>>>>>>>>>>>> when modifying that. In fact, we are having some
>>>>>> topology
>>>>>>>>>>>>>>>>>>> optimization
>>>>>>>>>>>>>>>>>>>>>> going on which may modify processor names in the
>>>>> final
>>>>>>>>>>>> topology
>>>>>>>>>>>>>>>>>>>> anyways (
>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4983).
>>>>>> Semantically
>>>>>>>>>> I
>>>>>>>>>>>>>> think
>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>> easier to understand to developers than "deciding
>>>>> the
>>>>>>>>>>>> processor
>>>>>>>>>>>>>>>>>> name
>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>> the next operator".
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Re 2: Yeah I'm thinking that for operators that
>>>>>> translates
>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>> multiple
>>>>>>>>>>>>>>>>>>>>>> processor names, we can still use the provided
>>>>> "hint" to
>>>>>>>>>>> name
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> processor
>>>>>>>>>>>>>>>>>>>>>> names, e.g. for Joins we can name them as
>>>>>> `join-foo-this`
>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>> `join-foo-that` etc if user calls `as("foo")`.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Re 3: The motivation I had about removing the
>>>>> suffix is
>>>>>>>>>> that
>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>>>>>>> huge
>>>>>>>>>>>>>>>>>>>>>> restrictions on topology compatibilities: consider
>>>>> if
>>>>>> user
>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>>>> added a
>>>>>>>>>>>>>>>>>>>>> new
>>>>>>>>>>>>>>>>>>>>>> operator, or library does some optimization to
>>>>> remove
>>>>>> some
>>>>>>>>>>>>>>>>>> operators,
>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> suffix indexing may be changed for a large amount
>>>>> of the
>>>>>>>>>>>>>>>>> processor
>>>>>>>>>>>>>>>>>>>> names:
>>>>>>>>>>>>>>>>>>>>>> this will in turn change the internal state store
>>>>> names,
>>>>>>>>>> as
>>>>>>>>>>>> well
>>>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>>>>>> internal topic names as well, making the new
>>>>> application
>>>>>>>>>>>>>> topology
>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>> incompatible with the ones. One rationale I had
>>>>> about
>>>>>> this
>>>>>>>>>>> KIP
>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>> aligned this effort, moving forward we can allow
>>>>> users
>>>>>> to
>>>>>>>>>>>>>>>>> customize
>>>>>>>>>>>>>>>>>>>>>> internal names so that they can still be reused
>>>>> even
>>>>>> with
>>>>>>>>>>>>>>>>> topology
>>>>>>>>>>>>>>>>>>>>> changes
>>>>>>>>>>>>>>>>>>>>>> (e.g. KIP-230), so I think removing the suffix
>>>>> index
>>>>>> would
>>>>>>>>>>> be
>>>>>>>>>>>>>>>>> more
>>>>>>>>>>>>>>>>>>>>>> applicable in the long run.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Thu, May 31, 2018 at 3:08 PM, Florian
>>>>> Hussonnois <
>>>>>>>>>>>>>>>>>>>>> fhussonn...@gmail.com>
>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi ,
>>>>>>>>>>>>>>>>>>>>>>> Thank you very much for your feedback.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> 1/
>>>>>>>>>>>>>>>>>>>>>>> I agree that overloading most of the methods with
>>>>> a
>>>>>>>>>>> Processed
>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>> ideal.
>>>>>>>>>>>>>>>>>>>>>>> I've started modifying the KStream API and I got
>>>>> to the
>>>>>>>>>>> same
>>>>>>>>>>>>>>>>>>>> conclusion.
>>>>>>>>>>>>>>>>>>>>>>> Also ading a new method directly to KStreamImpl
>>>>> and
>>>>>>>>>>>> KTableImpl
>>>>>>>>>>>>>>>>>>> classes
>>>>>>>>>>>>>>>>>>>>>>> seems to be a better option.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> However a processor name cannot be redefined after
>>>>>>>>>> calling
>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>> operator
>>>>>>>>>>>>>>>>>>>>> (or
>>>>>>>>>>>>>>>>>>>>>>> maybe I miss something in the code).
>>>>>>>>>>>>>>>>>>>>>>> From my understanding, this will only set the
>>>>> KStream
>>>>>>>>>> name
>>>>>>>>>>>>>>>>>> property
>>>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> processor name previsouly added to the topology
>>>>>> builder -
>>>>>>>>>>>>>>>>> leading
>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>> InvalidTopology exception.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> So the new method should actually defines the
>>>>> name of
>>>>>> the
>>>>>>>>>>>> next
>>>>>>>>>>>>>>>>>>>>> processor :
>>>>>>>>>>>>>>>>>>>>>>> Below is an example :
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> *stream.as <http://stream.as
>>>>>>>>>>>>>>>>>>> (Processed.name("MAPPE_TO_UPPERCASE")*
>>>>>>>>>>>>>>>>>>>>>>> *          .map( (k, v) -> KeyValue.pair(k,
>>>>>>>>>>>> v.toUpperCase()))*
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I think this approach could solve the cases for
>>>>> methods
>>>>>>>>>>>>>>>>> returning
>>>>>>>>>>>>>>>>>>>> void ?
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Regarding this new method we have two possible
>>>>>>>>>>>> implementations
>>>>>>>>>>>>>> :
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>    1. Adding a method like : withName(String
>>>>>>>>>> processorName)
>>>>>>>>>>>>>>>>>>>>>>>    2. or adding a method accepting an Processed
>>>>> object
>>>>>> :
>>>>>>>>>>>>>>>>>>>> as(Processed).
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I think solution 2. is preferable as the Processed
>>>>>> class
>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>> enriched
>>>>>>>>>>>>>>>>>>>>>>> further (in futur).
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> 2/
>>>>>>>>>>>>>>>>>>>>>>> As Guozhang said some operators add internal
>>>>>> processors.
>>>>>>>>>>>>>>>>>>>>>>> For example the branch() method create one
>>>>>> KStreamBranch
>>>>>>>>>>>>>>>>> processor
>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>> route
>>>>>>>>>>>>>>>>>>>>>>> records and one KStreamPassThrough processor for
>>>>> each
>>>>>>>>>>> branch.
>>>>>>>>>>>>>>>>>>>>>>> In that situation only the parent processor can be
>>>>>> named.
>>>>>>>>>>> For
>>>>>>>>>>>>>>>>>>> children
>>>>>>>>>>>>>>>>>>>>>>> processors we could keep the current behaviour
>>>>> that
>>>>>> add a
>>>>>>>>>>>>>> suffix
>>>>>>>>>>>>>>>>>>> (i.e
>>>>>>>>>>>>>>>>>>>>>>> KSTREAM-BRANCHCHILD-)
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> This also the case for the join() method that
>>>>> result to
>>>>>>>>>>>> adding
>>>>>>>>>>>>>>>>>>>> multiple
>>>>>>>>>>>>>>>>>>>>>>> processors to the topology (windowing, left/right
>>>>> joins
>>>>>>>>>>> and a
>>>>>>>>>>>>>>>>>> merge
>>>>>>>>>>>>>>>>>>>>>>> processor).
>>>>>>>>>>>>>>>>>>>>>>> I think, like for the branch method users could
>>>>> only
>>>>>>>>>>> define a
>>>>>>>>>>>>>>>>>>>> processor
>>>>>>>>>>>>>>>>>>>>>>> name prefix.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> 3/
>>>>>>>>>>>>>>>>>>>>>>> I think we should  still added a suffix like
>>>>>>>>>> "-0000000000"
>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>> processor
>>>>>>>>>>>>>>>>>>>>>>> name and enforce uppercases as this will keep some
>>>>>>>>>>>> consistency
>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> ones generated by the API.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> 4/
>>>>>>>>>>>>>>>>>>>>>>> Yes, the KTable interface should be modified like
>>>>>> KStream
>>>>>>>>>>> to
>>>>>>>>>>>>>>>>> allow
>>>>>>>>>>>>>>>>>>>>> custom
>>>>>>>>>>>>>>>>>>>>>>> processor names definition.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Le jeu. 31 mai 2018 à 19:18, Damian Guy <
>>>>>>>>>>>> damian....@gmail.com>
>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>> écrit
>>>>>>>>>>>>>>>>>>>>> :
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hi Florian,
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP. What about KTable and other
>>>>> DSL
>>>>>>>>>>>>>> interfaces?
>>>>>>>>>>>>>>>>>>> Will
>>>>>>>>>>>>>>>>>>>>> they
>>>>>>>>>>>>>>>>>>>>>>>> not want to be able to do the same thing?
>>>>>>>>>>>>>>>>>>>>>>>> It would be good to see a complete set of the
>>>>> public
>>>>>> API
>>>>>>>>>>>>>>>>> changes.
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>> Damian
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 May 2018 at 19:45 Guozhang Wang <
>>>>>>>>>>>>>> wangg...@gmail.com
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Hello Florian,
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP. I have some meta feedbacks
>>>>> on the
>>>>>>>>>>>>>>>>> proposal:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> 1. You mentioned that this `Processed` object
>>>>> will be
>>>>>>>>>>> added
>>>>>>>>>>>>>>>>> to a
>>>>>>>>>>>>>>>>>>> new
>>>>>>>>>>>>>>>>>>>>>>>>> overloaded variant of all the stateless
>>>>> operators,
>>>>>> what
>>>>>>>>>>>> about
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>> stateful
>>>>>>>>>>>>>>>>>>>>>>>>> operators? Would like to hear your opinions if
>>>>> you
>>>>>> have
>>>>>>>>>>>>>>>>> thought
>>>>>>>>>>>>>>>>>>>> about
>>>>>>>>>>>>>>>>>>>>>>>> that:
>>>>>>>>>>>>>>>>>>>>>>>>> note for stateful operators they will usually be
>>>>>> mapped
>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>> multiple
>>>>>>>>>>>>>>>>>>>>>>>>> processor node names, so we probably need to
>>>>> come up
>>>>>>>>>> with
>>>>>>>>>>>>>> some
>>>>>>>>>>>>>>>>>>> ways
>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>> define all their names.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> 2. I share the same concern with Bill as for
>>>>> adding
>>>>>>>>>> lots
>>>>>>>>>>> of
>>>>>>>>>>>>>>>>> new
>>>>>>>>>>>>>>>>>>>>>>> overload
>>>>>>>>>>>>>>>>>>>>>>>>> functions into the stateless operators, as we
>>>>> have
>>>>>> just
>>>>>>>>>>>> spent
>>>>>>>>>>>>>>>>>>> quite
>>>>>>>>>>>>>>>>>>>>>>> some
>>>>>>>>>>>>>>>>>>>>>>>>> effort in trimming them since 1.0.0 release. If
>>>>> the
>>>>>>>>>> goal
>>>>>>>>>>> is
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>> just
>>>>>>>>>>>>>>>>>>>>>>>> provide
>>>>>>>>>>>>>>>>>>>>>>>>> some "hints" on the generated processor node
>>>>> names,
>>>>>> not
>>>>>>>>>>>>>>>>> strictly
>>>>>>>>>>>>>>>>>>>>>>>> enforcing
>>>>>>>>>>>>>>>>>>>>>>>>> the exact names that to be generated, then how
>>>>> about
>>>>>> we
>>>>>>>>>>>> just
>>>>>>>>>>>>>>>>>> add a
>>>>>>>>>>>>>>>>>>>> new
>>>>>>>>>>>>>>>>>>>>>>>>> function to `KStream` and `KTable` classes like:
>>>>>>>>>>>>>>>>>> "as(Processed)",
>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>> semantics as "the latest operators that
>>>>> generate this
>>>>>>>>>>>> KStream
>>>>>>>>>>>>>>>>> /
>>>>>>>>>>>>>>>>>>>> KTable
>>>>>>>>>>>>>>>>>>>>>>>> will
>>>>>>>>>>>>>>>>>>>>>>>>> be named accordingly to this hint".
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> The only caveat, is that for all operators like
>>>>>>>>>>>> `KStream#to`
>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>> `KStream#print` that returns void, this
>>>>> alternative
>>>>>>>>>> would
>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>> work.
>>>>>>>>>>>>>>>>>>>>> But
>>>>>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>>>> the current operators:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> a. KStream#print,
>>>>>>>>>>>>>>>>>>>>>>>>> b. KStream#foreach,
>>>>>>>>>>>>>>>>>>>>>>>>> c. KStream#to,
>>>>>>>>>>>>>>>>>>>>>>>>> d. KStream#process
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I personally felt that except `KStream#process`
>>>>> users
>>>>>>>>>>> would
>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>> usually
>>>>>>>>>>>>>>>>>>>>>>>>> bother to override their names, and for
>>>>>>>>>> `KStream#process`
>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>>>>> add
>>>>>>>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>>> overload variant with the additional Processed
>>>>>> object.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> 3. In your example, the processor names are
>>>>> still
>>>>>> added
>>>>>>>>>>>> with
>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>> suffix
>>>>>>>>>>>>>>>>>>>>>>>> like
>>>>>>>>>>>>>>>>>>>>>>>>> "
>>>>>>>>>>>>>>>>>>>>>>>>> -0000000000", is this intentional? If yes, why
>>>>> (I
>>>>>>>>>> thought
>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>> user
>>>>>>>>>>>>>>>>>>>>>>>>> specified processor name hints we will not add
>>>>> suffix
>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>> distinguish
>>>>>>>>>>>>>>>>>>>>>>>>> different nodes of the same type any more)?
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, May 29, 2018 at 6:47 AM, Bill Bejeck <
>>>>>>>>>>>>>>>>> bbej...@gmail.com
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Florian,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP.  I think being able to add
>>>>> more
>>>>>>>>>>>> context
>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>> processor names would be useful.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> I like the idea of adding a
>>>>> "withProcessorName" to
>>>>>>>>>>>> Produced,
>>>>>>>>>>>>>>>>>>>> Consumed
>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>> Joined.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> But instead of adding the "Processed"
>>>>> parameter to a
>>>>>>>>>>> large
>>>>>>>>>>>>>>>>>>>> percentage
>>>>>>>>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>>>>> the methods, which would result in overloaded
>>>>>> methods
>>>>>>>>>>>> (which
>>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>>>>> removed
>>>>>>>>>>>>>>>>>>>>>>>>>> quite a bit with KIP-182) what do you think of
>>>>>> adding
>>>>>>>>>> a
>>>>>>>>>>>>>>>>> method
>>>>>>>>>>>>>>>>>>>>>>>>>> to the AbstractStream class "withName(String
>>>>>>>>>>>>>> processorName)"?
>>>>>>>>>>>>>>>>>> BTW
>>>>>>>>>>>>>>>>>>>> I"m
>>>>>>>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>>>>>>> married to the method name, it's the best I
>>>>> can do
>>>>>> off
>>>>>>>>>>> the
>>>>>>>>>>>>>>>>> top
>>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>> my
>>>>>>>>>>>>>>>>>>>>>>>>> head.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> For the methods that return void, we'd have to
>>>>> add a
>>>>>>>>>>>>>>>>> parameter,
>>>>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>> would at least cut down on the number of
>>>>> overloaded
>>>>>>>>>>>> methods
>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> API.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Just my 2 cents.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>> Bill
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, May 27, 2018 at 4:13 PM, Florian
>>>>> Hussonnois
>>>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>> fhussonn...@gmail.com
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I would like to start a new discussion on
>>>>> following
>>>>>>>>>>> KIP :
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>> 307%3A+Allow+to+define+custom+processor+names+with+KStreams+DSL
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> This is still a draft.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward for your feedback.
>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>> Florian HUSSONNOIS
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>> Florian HUSSONNOIS
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> Florian HUSSONNOIS
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> Florian HUSSONNOIS
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> Florian HUSSONNOIS
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> --
>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> --
>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> -- Guozhang
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Florian HUSSONNOIS
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> -- Guozhang
>>>>
>>>
>>>
>>> --
>>> -- Guozhang
>>>
>>
>>
>> --
>> -- Guozhang
>>
> 
>

signature.asc
Description: OpenPGP digital signature

Re: [DISCUSS] KIP-307: Allow to define custom processor names with KStreams DSL

Reply via email to