Re: [DISCUSS] KIP-307: Allow to define custom processor names with KStreams DSL

Matthias J. Sax Wed, 16 Jan 2019 14:41:14 -0800

Thanks for the details John.

While I understand your argument that it is no optimal to use
`Materialized` to set the processor name, I still slightly prefer this
option, because adding more overloads seems to be even worse to me.


But I would also not block this KIP if the majority of people prefer to
add overloads instead of extending `Materialized`.


However, I cannot follow your argument about `NamedOperation#name()`
getter method. So far, all configuration classes don't have getters and
it seems to be inconsistent to add a single one now. We also don't need
any cast IMHO, as we would use the same construct as we do for all other
config classed via `NamedInternal` to access the name:

> final String name = new NamedInternal(named).name();

Maybe, it would have been better to add getters from the beginning on
(even if I think it was the right decision to not add getters). However,
this ship have sailed and if we want to add getters to avoid the
`XxxInternal()` construct, we should do it for all classes -- however,
what would a user gain if we do this? It would just be a lot of "noise"
IMHO.


@Florian: I would suggest to start a VOTE if you want to get this into
2.2 release. The open questions seem to be minor and I think we can
resolve them in parallel to the vote.



-Matthias


On 1/16/19 12:59 PM, John Roesler wrote:
> Hi Matthias,
> 
> One thing that we discussed earlier was to avoid creating ambiguity by
> conflating config objects that configure an operation (like Grouped) with
> config objects that configure an aspect of the operation (like
> Materialized).
> 
> It is natural for the Grouped config to extend Named, as doing so indicates
> that grouping operations can be named (I.e., the name applies to the
> operation itself, which in turn makes it reasonable to use the operation's
> name as a component in the related processors' and topics' names).
> 
> But what would it mean for Materialized to extend Named? Materialized only
> configures the materialization of an operation's result, not the operation
> itself, so I guess it would mean the name applies to the result of the
> operation? It doesn't really work.
> 
> Adding config objects to the DSL was an attempt to avoid overload bloat as
> more aspects of operations need to be configured.
> However, we made a mistake with Materialized, since (as noted) it doesn't
> configure the operation itself, but just one aspect of it.
> We basically bagged a bunch of parameters into one, without solving the
> problem structurally, and this is the result:
> As soon as we need to configure a *different* aspect of the operation, we
> again need to add a new overload, and the cycle begins again.
> 
> The proper solution here is to add an eponymous config object to each
> stateful operation, one which mixes in or composes the Materialized aspect
> config and the Named aspect config. But this is a large API change, and we
> decided on the middle ground of just adding Named as an optional parameter
> via new overloads for now.
> 
> A similar compromise was to go ahead and add a Named overload directly to
> all the operators that currently have no config object.
> Again, the proper thing would be to add a new config class for each
> individual operation, but it seemed like a drastic change.
> We basically said that right now, we don't think we'll ever need to
> configure another aspect of those operators than the name, and we're
> acknowledging that if we do, we'll have to created a small mess to clean up.
> It's really just a generalization of the same problem with Materialized
> operations.
> 
> To answer your question about the Named interface:
> The primary reason is that Named is an aspect that is meant to be mixed in
> with other config objects.
> For example, Grouped can extend Named.
> If we followed the pattern you've referenced, we would have a public
> interface Named with only the setter and a private class NamedInternal with
> the setter and getter.
> But would Grouped be a subclass of NamedInternal?
> Then, we could only have one kind of aspect mixin, since Java doesn't have
> multiple class inheritance, or we'd have to decide if the next thing should
> be a superclass of Named or a subclass of Named and a superclass of Grouped.
> Plus, in the implementation, instead of just casting Grouped to
> GroupedInternal (which is already unclean design), we'd also be casting
> Grouped to NamedInternal, which is super confusing.
> 
> It's far cleaner all around just to use the type system "the right way",
> which is what we've proposed.
> Any config class can mix in the Named aspect, and it inherits a contract to
> supply both the setter and the getter.
> Our implementation can actually avoid any casting in this usage, since we
> can just call grouped.name() to get the name, instead of something like
> ((NamedInternal) grouped).name().
> 
> Plus, what harm does it do to let people get back the configuration
> property that they *just set* on the config object?
> It doesn't break encapsulation.
> It would certainly make writing tests a lot easier for everyone.
> 
> All around, I would advocate for moving toward this design for all the
> config interfaces, as I've previously demonstrated how we've made an
> intractable mess out of the window config hierarchy by trying to be clever
> and hiding the getters.
> 
> I hope this helps,
> -John
> 
> 
> On Wed, Jan 16, 2019 at 12:59 AM Matthias J. Sax <[email protected]>
> wrote:
> 
>> While I understand that it should be possible to specify store name and
>> processor name independent from each other, it's still unclear to me,
>> why we cannot use the `Materialized` parameter to specify the processor
>> name:
>>
>>> // only set the node name
>>> #count(Named.as("processorName"));
>>>
>>> // only set the store name
>>> #count(Materialized.as("storeName"));
>>>
>>> // set both
>>> #count(Materialized.as("storeName").withName("processorName"));
>>
>> This this case, it might be good to rename `withName` to
>> `withProcessorName` to avoid confusion with the store name.
>>
>> However, why do we need this:
>>
>>> #count(Materialized.as("storeName"), Named.as("processorName"));
>>
>> I would prefer to not add this overload.
>>
>>
>>
>> Strictly, we could also avoid `#count(Named)`, and set the processor
>> name only via:
>>
>>> #count(Materialized.as(null).withName("processorName"));
>>
>> I admit, it's a little clumsy, but would save us one more overload.
>>
>>
>>
>> One more comment that I forgot last time: why do we add the getter
>> `Named#name()`? All other configuration classes only define setters and
>> we add getters only in the internal implementation.
>>
>>
>> -Matthias
>>
>> On 1/13/19 4:22 AM, Florian Hussonnois wrote:
>>> Hi Matthias,
>>>
>>> The reason for overloading the methods with Materialized parameter is
>>> regarding the semantic of this class.
>>> The Materialized class allow to name a queryable store. if a name is set
>>> then it will be used both to name the state-store and  the
>> changelog-topic.
>>> If no name is given, then the provided Named will be used.
>>> This allow to name the operation without having a queriable store.
>>>
>>> So if my analysis is correct, we will end up with :
>>>
>>>                                   Generated  | Named   | Joined /
>> Grouped
>>> |  Materialized
>>>
>> -------------------------------------------------------------------------------------------------
>>> Node                     |               X       |     X       |    X
>>>                  |
>>>
>> -------------------------------------------------------------------------------------------------
>>> Repartition Topic   |               X       |              |    X
>>>              |
>>>
>> -------------------------------------------------------------------------------------------------
>>> Queryable Store    |                        |              |
>>>               |     X
>>>
>> -------------------------------------------------------------------------------------------------
>>> State store             |               X      |     X      |     X
>>>                |     X
>>>
>> -------------------------------------------------------------------------------------------------
>>> Changelog Topic    |              X       |      X    |      X
>>>          |     X
>>>
>> -------------------------------------------------------------------------------------------------
>>>
>>> Le dim. 13 janv. 2019 à 03:23, Matthias J. Sax <[email protected]> a
>>> écrit :
>>>
>>>> Just catching up on this KIP again.
>>>>
>>>> One nit. The KIP says:
>>>>
>>>>> In addition, the generated names have a few disadvantages to guarantee
>>>> topology compatibilities. In fact, adding a new operator, using a
>>>> third-library doing some optimization to remove some operators or
>> upgrading
>>>> to a new KafkaStreams version with internal API changes may changed
>> suffix
>>>> indexing for a large amount of the processor names. This will in turn
>>>> change the internal state store names, as well as internal topic names
>> as
>>>> well.
>>>>>
>>>>
>>>> This is not true any longer (I guess it was true, when the KIP was
>>>> initially proposed), because all stores/internal-topics can be named
>>>> since 2.1 release. I would suggest to remove the paragraph.
>>>>
>>>> Overall, I like the Named/NamedOperation design.
>>>>
>>>> What is unclear to me thought is, why we need new overloads for methods
>>>> that accept `Materialized`. To be more precise, I think it make sense to
>>>> add an overload that only takes `Named`, but not one that takes both
>>>> `Named` and `Materialized`. For example:
>>>>
>>>> KGroupedStream#count() // exists
>>>> KGroupedStream#count(Materialized) // exits
>>>> KGroupedStream#count(Named) // added (makes sense to me)
>>>> KGroupedStream#count(Named, Materialized) // added -- why?
>>>>
>>>> I would prefer to use `Materialized` to name the processor for this
>>>> case, too. Can you elaborate on the motivation?
>>>>
>>>>
>>>> -Matthias
>>>>
>>>> On 1/11/19 3:39 PM, Florian Hussonnois wrote:
>>>>> Hi Guozhang,
>>>>>
>>>>> I have updated the PR as well as the KIP. I should add more unit tests
>> to
>>>>> covers all new methods.
>>>>>
>>>>> However, I still have one test in failure. The reason is that using
>>>>> Joined.name() in both potential repartition topic and processor nodes
>>>> leads
>>>>> to topology-incompatible.
>>>>> How should we deal with that ?
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Le jeu. 10 janv. 2019 à 01:21, Guozhang Wang <[email protected]> a
>>>> écrit :
>>>>>
>>>>>> Hello Florian,
>>>>>>
>>>>>> Just checking if have read about my previous email and if you feel
>> happy
>>>>>> about it. We have the 2.2 KIP freeze deadline at 24th this month,
>> while
>>>> the
>>>>>> PR itself is getting quite close. So it'll be great if we can get the
>>>>>> agreement on it and get it into 2.2.0 release.
>>>>>>
>>>>>>
>>>>>> Guozhang
>>>>>>
>>>>>>
>>>>>> On Mon, Dec 17, 2018 at 2:39 PM Guozhang Wang <[email protected]>
>>>> wrote:
>>>>>>
>>>>>>> Hi Florian / John,
>>>>>>>
>>>>>>> Just wanted to throw a couple minor thoughts on the current proposal:
>>>>>>>
>>>>>>> 1) Regarding the interface / function name, I'd propose we call the
>>>>>>> interface `NamedOperation` which would be implemented by Produced /
>>>>>>> Consumed / Printed / Joined / Grouped / Suppressed (note I
>>>> intentionally
>>>>>>> exclude Materialized here since its semantics is quite), and have the
>>>>>>> default class that implements `NamedOperation` as `Named`, which
>> would
>>>> be
>>>>>>> used in our adding overload functions. The main reason is to have
>>>>>>> consistency in naming.
>>>>>>>
>>>>>>> 2) As a minor tweak, I think it's better to use Joined.name() in both
>>>> its
>>>>>>> possibly generate repartition topic, as well as the map processor
>> used
>>>> for
>>>>>>> group-by (currently this name is only used for the repartition
>> topic).
>>>>>>>
>>>>>>>
>>>>>>> Florian: if you think this proposal makes sense, please feel free to
>> go
>>>>>>> ahead and update the PR; after we made a first pass on it and feels
>>>>>>> confident about it, we can go ahead with the VOTING process. About
>> the
>>>>>>> implementation of 2) above, this may be out of your implementation
>>>> scope,
>>>>>>> so feel free to leave it out side your PR while Bill who originally
>>>> worked
>>>>>>> on the Grouped KIP can make a follow-up PR for it.
>>>>>>>
>>>>>>> Guozhang
>>>>>>>
>>>>>>> On Fri, Dec 14, 2018 at 9:43 PM Guozhang Wang <[email protected]>
>>>> wrote:
>>>>>>>
>>>>>>>> Hello Florian,
>>>>>>>>
>>>>>>>> Really appreciate you for your patience.
>>>>>>>>
>>>>>>>> I know that we've discussed about the approach to adding overloaded
>>>>>>>> functions and rejected it early on. But looking deeper into the
>>>> current PR
>>>>>>>> I realized that this approach has a danger of great API confusions
>> to
>>>> users
>>>>>>>> (I tried to explain my thoughts in the PR, but it was not very
>> clear)
>>>> ---
>>>>>>>> the basic idea is that, today we already have a few existing control
>>>>>>>> classes including Grouped, Joined, Suppressed that allow users to
>>>> specify
>>>>>>>> serdes etc, while also a "name" which can then be used to define the
>>>>>>>> processor name / internal topic names in the topology (the static
>>>> function
>>>>>>>> names are not consistent, which I think we should fix as well). And
>>>> Named
>>>>>>>> interface, by extending the lambda function interfaces like
>>>> ValueJoiner /
>>>>>>>> Predicate etc opens the door for another way to specify the names
>>>> again.
>>>>>>>>
>>>>>>>> So in order to achieve consistency, we are left with generally two
>>>>>>>> options:
>>>>>>>>
>>>>>>>> 1) only allow users to specify names via the lambda interfaces that
>>>>>>>> extends Named interface. This means we'd better remove the naming
>>>> mechanism
>>>>>>>> from the existing control objects to keep consistency.
>>>>>>>>
>>>>>>>> 2) only allow users to specify names via control classes, and we
>>>>>>>> introduce a new class (Named) for those which do not have one yet
>> ---
>>>> this
>>>>>>>> leads to the overloaded functions.
>>>>>>>>
>>>>>>>> I did a quick count on the num.of overloaded functions, and summing
>>>> from
>>>>>>>> KTable (8) / KStream (15) / KGroupedStream (6) / KGroupedTable (6) /
>>>>>>>> TimeWindowedKStream (6) / SessionWindowedKStream (6) we got about 47
>>>>>>>> overloaded functions (our guess was pretty close!) -- note this is
>>>> based on
>>>>>>>> John's proposal that we can let existing Grouped / Joined to extend
>>>> Named
>>>>>>>> and hence we only need overloaded functions with a default
>>>> NamedOperation
>>>>>>>> for those operators that do not have a control classes already.
>>>>>>>>
>>>>>>>> Thinking about this approach I feel it is not too bad compared with
>>>>>>>> either 1) above, which would require us to deprecate lot of public
>>>>>>>> functions around name(), or having a mixed mechanism for naming,
>> which
>>>>>>>> could lead to very confusing behavior to users. Additionally, for
>> most
>>>>>>>> users who would only want to specify the names for those stateful
>>>>>>>> operations which have internal topics / state stores and hence are
>>>> more
>>>>>>>> keen to upgrade compatibility, those added overloads would be
>>>> not-often
>>>>>>>> used functions for them anyways. And by letting existing control
>>>> classes to
>>>>>>>> extend Named, we can have a unified method name for static
>>>> constructor as
>>>>>>>> well.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Guozhang
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Dec 14, 2018 at 10:24 AM John Roesler <[email protected]>
>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Florian,
>>>>>>>>>
>>>>>>>>> Sorry about the run-around of rejecting the original proposal,
>>>>>>>>> only to return to it later on. Hopefully, it's more encouraging
>>>>>>>>> than frustrating that we're coming around to your initial way of
>>>>>>>>> thinking.
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> -John
>>>>>>>>>
>>>>>>>>> On Thu, Dec 13, 2018 at 4:28 PM Florian Hussonnois <
>>>>>>>>> [email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> Thanks again. I agree with your propositions.
>>>>>>>>>> Also IMHO, overloading all methods (filter, map) to accept a new
>>>>>>>>> control
>>>>>>>>>> object seems to provide a more natural development experience for
>>>>>>>>> users.
>>>>>>>>>>
>>>>>>>>>> Actually, this was the first proposition for this KIP, but we have
>>>>>>>>> rejected
>>>>>>>>>> it because this solution led to adding a lot of new methods.
>>>>>>>>>> As you mentioned it, the API has evolve since the creation of this
>>>>>>>>> KIP -
>>>>>>>>>> some existing control objects already allow to customize internal
>>>>>>>>> names. We
>>>>>>>>>> should so keep on that strategy.
>>>>>>>>>>
>>>>>>>>>> If everyone is OK with that, I will update the KIP and the PR
>>>>>>>>> accordingly;
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>>>>
>>>>>>>>>> Le jeu. 13 déc. 2018 à 18:08, John Roesler <[email protected]> a
>>>>>>>>> écrit :
>>>>>>>>>>
>>>>>>>>>>> Hi again, all,
>>>>>>>>>>>
>>>>>>>>>>> Matthias, I agree with you.
>>>>>>>>>>>
>>>>>>>>>>> Florian, thanks for your response.
>>>>>>>>>>>
>>>>>>>>>>> I think your proposal is the best way to address the ask for
>> hiding
>>>>>>>>> the
>>>>>>>>>>> name() getter. But I'd like to question that ask and instead
>>>>>>>>> propose that
>>>>>>>>>>> we just make the name() getter part of the public API.
>>>>>>>>>>>
>>>>>>>>>>> The desire to "hide" the getters causes a lot of complexity in
>> our
>>>>>>>>> code
>>>>>>>>>>> base, and it will become completely impractical with the mixin
>>>>>>>>> strategy
>>>>>>>>>> of
>>>>>>>>>>> Named.
>>>>>>>>>>>
>>>>>>>>>>> If we were to switch strategies back to mixing Named in to the
>>>>>>>>> control
>>>>>>>>>>> objects rather than the functions, then the path forward becomes
>>>>>>>>> quite
>>>>>>>>>>> clear.
>>>>>>>>>>>
>>>>>>>>>>> On the other hand, it seems harmless for anyone who wants to be
>>>>>>>>> able to
>>>>>>>>>>> query the name from a control object after setting it, so my vote
>>>>>>>>> would
>>>>>>>>>> be
>>>>>>>>>>> simply to keep the Named interface as:
>>>>>>>>>>>
>>>>>>>>>>> public interface Named<T extends Named<T>> {
>>>>>>>>>>>   String name();
>>>>>>>>>>>   T withName(String name);
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> Under this proposal, we only mix Named in to the control objects,
>>>>>>>>> which
>>>>>>>>>>> means we have no need of default implementations anymore (because
>>>>>>>>> we can
>>>>>>>>>>> update all the control objects concurrently with adding this
>>>>>>>>> interface to
>>>>>>>>>>> them).
>>>>>>>>>>>
>>>>>>>>>>> This does hinge on switching over to a control-object-only
>>>> strategy,
>>>>>>>>>> which
>>>>>>>>>>> introduces the need to add about 50 new control object classes,
>>>>>>>>> which
>>>>>>>>>> would
>>>>>>>>>>> only serve to implement Named. As a middle ground, maybe we could
>>>>>>>>> just
>>>>>>>>>> add
>>>>>>>>>>> one generic control object class, like:
>>>>>>>>>>>
>>>>>>>>>>> public class NamedOperation implements Named<NamedOperation> {
>>>>>>>>>>>   private final String name;
>>>>>>>>>>>   private NamedOperation(final String name) { this.name = name;
>> }
>>>>>>>>>>>   public static NamedOperation name(final String name) {
>>>>>>>>>>>     return new NamedOperation(name);
>>>>>>>>>>>   }
>>>>>>>>>>>   public String name() { return name; }
>>>>>>>>>>>   public NamedOperation withName(final String name) {
>>>>>>>>>>>     return new NamedOperation(name);
>>>>>>>>>>>   }
>>>>>>>>>>> }
>>>>>>>>>>>
>>>>>>>>>>> And then, we'd add overloads for all the methods that don't have
>>>>>>>>> control
>>>>>>>>>>> objects already (for example, filter() ):
>>>>>>>>>>>
>>>>>>>>>>> // existing
>>>>>>>>>>> KStream<K, V> filter(Predicate<? super K, ? super V> predicate);
>>>>>>>>>>>
>>>>>>>>>>> // new
>>>>>>>>>>> KStream<K, V> filter(Predicate<? super K, ? super V> predicate,
>>>>>>>>>>> NamedOperation named);
>>>>>>>>>>>
>>>>>>>>>>> Additionally, in regard to Matthias's point about existing
>> control
>>>>>>>>>> objects
>>>>>>>>>>> with naming semantics, they would extend Named (but not
>>>>>>>>> NamedOperation)
>>>>>>>>>> for
>>>>>>>>>>> uniformity.
>>>>>>>>>>>
>>>>>>>>>>> You provided a good approach to hide the getter with your
>>>>>>>>> SettableName
>>>>>>>>>>> class; I think what you proposed is the only way we could hide
>> the
>>>>>>>>> name.
>>>>>>>>>>> In the end, though, it's a lot of complexity added (control
>> object
>>>>>>>>> class
>>>>>>>>>>> hierarchy, inheritance, mutable state, internal casting) for
>>>>>>>>> something of
>>>>>>>>>>> dubious value: to be able to hide the name from someone *after
>> they
>>>>>>>>>>> themselves have set it*.
>>>>>>>>>>>
>>>>>>>>>>> Although it'll be a pain, perhaps Matthias's suggestion to
>>>>>>>>> enumerate all
>>>>>>>>>>> the API methods is the best way to be sure we all agree on what's
>>>>>>>>> going
>>>>>>>>>> to
>>>>>>>>>>> happen.
>>>>>>>>>>>
>>>>>>>>>>> Thanks again for wrangling with this issue,
>>>>>>>>>>> -John
>>>>>>>>>>>
>>>>>>>>>>> On Thu, Dec 13, 2018 at 9:03 AM Matthias J. Sax <
>>>>>>>>> [email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Just catching up on this discussion.
>>>>>>>>>>>>
>>>>>>>>>>>> My overall personal take is, that I am not a big fan of the
>>>>>>>>> interface
>>>>>>>>>>>> `Named` that is used as a factory. I would rather prefer to add
>> a
>>>>>>>>>>>> control object parameter to all methods that don't have one yet.
>>>>>>>>> This
>>>>>>>>>>>> KIP was started a while ago, and we added new naming
>> capabilities
>>>>>>>>> in
>>>>>>>>>> the
>>>>>>>>>>>> meantime. Guozhang's example in the PR comment about naming in
>>>>>>>>>>>> stream-stream join shows, that we might end up in a confusion
>>>>>>>>> situation
>>>>>>>>>>>> for users if we use `Named`. Also, in 2.1, user can already name
>>>>>>>>> as
>>>>>>>>>>>> repartition-/changelog-topics and stores. Thus, KIP-307 boils
>>>>>>>>> down to
>>>>>>>>>>>> provide non-functional naming?
>>>>>>>>>>>>
>>>>>>>>>>>> Hence, for all methods that allow to specify names already, I
>>>>>>>>> don't see
>>>>>>>>>>>> any reason to change them, but use the existing API to also name
>>>>>>>>> the
>>>>>>>>>>>> processor(s) instead of allowing uses to specify a new name.
>>>>>>>>>>>>
>>>>>>>>>>>> About the inconsistency in method naming. I agree, that `as` is
>>>>>>>>> very
>>>>>>>>>>>> generic and maybe not the best choice.
>>>>>>>>>>>>
>>>>>>>>>>>> I think it might be helpful, to have a table overview in the
>> KIP,
>>>>>>>>> that
>>>>>>>>>>>> list all existing static/non-static methods that allow to
>> specify
>>>>>>>>> a
>>>>>>>>>>>> name, plus a columns with the new suggested naming for those
>>>>>>>>> methods?
>>>>>>>>>>>>
>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On 12/12/18 12:45 AM, Florian Hussonnois wrote:
>>>>>>>>>>>>> Thank you very much for your feedbacks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Currently, there is still lot of discussions regarding the
>> Named
>>>>>>>>>>>> interface.
>>>>>>>>>>>>> On the one hand we should provided consistency over the stream
>>>>>>>>> API
>>>>>>>>>> and
>>>>>>>>>>> on
>>>>>>>>>>>>> the other hand we should not break the semantic as John point
>>>>>>>>> it up.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Guozhang, I'm sorry, but I'm little bit confused, maybe I
>> missed
>>>>>>>>>>>> something.
>>>>>>>>>>>>> In your comment you have suggested that :
>>>>>>>>>>>>> * Produced/Consumed/Suppressed should extends Named
>>>>>>>>>>>>> * Named should have a private-package method to get the
>>>>>>>>> specified
>>>>>>>>>>>> processor
>>>>>>>>>>>>> name internally (processorName())
>>>>>>>>>>>>> * Finally we should end up with something like :  Named -> XXX
>>>>>>>>> ->
>>>>>>>>>>>>> XXXInternal or Named -> Produced -> ProducedInternal
>>>>>>>>>>>>>
>>>>>>>>>>>>> The objective behind that is to :
>>>>>>>>>>>>> * consolidate the internal method processorName()
>>>>>>>>>>>>> * consolidate the method withName that exists now existing into
>>>>>>>>>>> Produced,
>>>>>>>>>>>>> Consumed and Suppressed.
>>>>>>>>>>>>>
>>>>>>>>>>>>> But, Named is an interface so we can't define a private-package
>>>>>>>>>> method
>>>>>>>>>>> on
>>>>>>>>>>>>> it. Also, for example Produced and ProducedInternal are not in
>>>>>>>>> the
>>>>>>>>>> same
>>>>>>>>>>>>> package so having a private-package method doesn't really help.
>>>>>>>>>>>>> In addition, if we add the withName method into Named interface
>>>>>>>>> this
>>>>>>>>>>> can
>>>>>>>>>>>>> become confusing for developers because action interfaces
>>>>>>>>>> (ValueMapper,
>>>>>>>>>>>>> Reducer, etc) extend it.
>>>>>>>>>>>>> The interface would look like :
>>>>>>>>>>>>>
>>>>>>>>>>>>> public interface Named<T extends Named<T>> {
>>>>>>>>>>>>>     default String name() {
>>>>>>>>>>>>>         return null;
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>     default Named<T> withName(final String name) {
>>>>>>>>>>>>>         return null;
>>>>>>>>>>>>>     }
>>>>>>>>>>>>> ...
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> So maybe instead of adding another method to Named we could
>>>>>>>>> create a
>>>>>>>>>>> new
>>>>>>>>>>>>> package-private class that could be extended by
>>>>>>>>>>>>> Produced/Consumed/Joined/Suppressed. For exemple,
>>>>>>>>>>>>> class SettableName<T extends SettableName<T>> implements Named
>> {
>>>>>>>>>>>>>
>>>>>>>>>>>>>     protected String processorName;
>>>>>>>>>>>>>
>>>>>>>>>>>>>     SettableName(final SettableName settable) {
>>>>>>>>>>>>>         this(Objects.requireNonNull(settable, "settable can't
>> be
>>>>>>>>>>>>> null").name());
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>
>>>>>>>>>>>>>     SettableName(final String processorName) {
>>>>>>>>>>>>>         this.processorName = processorName;
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>
>>>>>>>>>>>>>     @Override
>>>>>>>>>>>>>     public String name() {
>>>>>>>>>>>>>         return processorName;
>>>>>>>>>>>>>     }
>>>>>>>>>>>>>     public T withName(final String processorName) {
>>>>>>>>>>>>>         this.processorName = processorName;
>>>>>>>>>>>>>         return (T)this;
>>>>>>>>>>>>>     }
>>>>>>>>>>>>> }
>>>>>>>>>>>>>
>>>>>>>>>>>>> In that way, we will get : public class Produced implements
>>>>>>>>>>>>> SettableName<Produced> { ...
>>>>>>>>>>>>>
>>>>>>>>>>>>> WDYT?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Le mar. 11 déc. 2018 à 02:46, Guozhang Wang <
>> [email protected]>
>>>>>>>>> a
>>>>>>>>>>>> écrit :
>>>>>>>>>>>>>
>>>>>>>>>>>>>> I had one meta comment on the PR:
>>>>>>>>>>>>>>
>>>>>>>>> https://github.com/apache/kafka/pull/5909#discussion_r240447153
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Dec 10, 2018 at 5:22 PM John Roesler <
>>>>>>>>> [email protected]>
>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi Florian,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I hope it's ok if I ask a few questions at this late stage...
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Comment 1 ======
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It seems like the proposal is to add a new "Named" interface
>>>>>>>>> that
>>>>>>>>>> is
>>>>>>>>>>>>>>> intended to be mixed in with the existing API objects at
>>>>>>>>> various
>>>>>>>>>>>> points.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Just to preface some of my comments, it looks like your KIP
>>>>>>>>> was
>>>>>>>>>>> created
>>>>>>>>>>>>>>> quite a while ago, so the API may have changed somewhat since
>>>>>>>>> you
>>>>>>>>>>>>>> started.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> As I see the API, there are a few different kinds of DSL
>>>>>>>>> method
>>>>>>>>>>>>>> arguments:
>>>>>>>>>>>>>>> * functions: things like Initializer, Aggregator,
>> ValueJoiner,
>>>>>>>>>>>>>>> ForEachAction... All of these are essentially
>> Streams-flavored
>>>>>>>>>>> Function
>>>>>>>>>>>>>>> interfaces with different arities, type bounds, and
>> semantics.
>>>>>>>>>>>>>>> * config objects: things like Produced, Consumed, Joined,
>>>>>>>>>> Grouped...
>>>>>>>>>>>>>> These
>>>>>>>>>>>>>>> are containers for configurations, where the target of the
>>>>>>>>>>>> configuration
>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>> the operation itself
>>>>>>>>>>>>>>> * raw configurations: things like a raw topic-name string and
>>>>>>>>>>>>>> Materialized:
>>>>>>>>>>>>>>> These are configurations for operations that have no config
>>>>>>>>> object,
>>>>>>>>>>> and
>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>> various reasons, we didn't make one. The distinguishing
>>>>>>>>> feature is
>>>>>>>>>>> that
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> target of the configuration is not the operation itself, but
>>>>>>>>> some
>>>>>>>>>>>> aspect
>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>> it. For example, in Materialized, we are not setting the
>>>>>>>>> caching
>>>>>>>>>>>> behavior
>>>>>>>>>>>>>>> of, for example, an aggregation; we're setting the caching
>>>>>>>>> behavior
>>>>>>>>>>> of
>>>>>>>>>>>> a
>>>>>>>>>>>>>>> materialized state store attached to the aggregation.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It seems like choosing to mix the Named interface in with the
>>>>>>>>>>> functions
>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>> a couple of unfortunate side-effects:
>>>>>>>>>>>>>>> * Aggregator is not the only function passed to any of the
>>>>>>>>> relevant
>>>>>>>>>>>>>>> aggregate methods, so it seems a little arbitrary to pick
>> that
>>>>>>>>>>> function
>>>>>>>>>>>>>>> over Initializer or Merger.
>>>>>>>>>>>>>>> * As you noted, branch() takes an array of Predicate, so we
>>>>>>>>> just
>>>>>>>>>>> ignore
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> provided name(s), even though Predicate names are used
>>>>>>>>> elsewhere.
>>>>>>>>>>>>>>> * Not all things that we want to name have function
>> arguments,
>>>>>>>>>>> notably
>>>>>>>>>>>>>>> source and sink, so we'd switch paradigms and use the config
>>>>>>>>> object
>>>>>>>>>>>>>>> instead.
>>>>>>>>>>>>>>> * Adding an extra method to the function interfaces means
>> that
>>>>>>>>>> those
>>>>>>>>>>>> are
>>>>>>>>>>>>>> no
>>>>>>>>>>>>>>> longer SAM interfaces. You proposed to add a default
>>>>>>>>>> implementation,
>>>>>>>>>>> so
>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>> could still pass a lambda if we don't want to set the name,
>>>>>>>>> but if
>>>>>>>>>> we
>>>>>>>>>>>>>> *do*
>>>>>>>>>>>>>>> want to set the name, we can no longer use lambdas.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I think the obvious other choice would be to mix Named in
>>>>>>>>> with the
>>>>>>>>>>>> config
>>>>>>>>>>>>>>> objects instead, but this has one main downside of its own...
>>>>>>>>>>>>>>> * not every operator we wish to name has a config object. I
>>>>>>>>> don't
>>>>>>>>>>> know
>>>>>>>>>>>> if
>>>>>>>>>>>>>>> everyone involved is comfortable with adding a config object
>>>>>>>>> to
>>>>>>>>>> every
>>>>>>>>>>>>>>> operator that's missing one.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Personally, I favor moving toward a more consistent state
>>>>>>>>> that's
>>>>>>>>>>>> forward
>>>>>>>>>>>>>>> compatible with any further changes we wish to make. I
>>>>>>>>> *think* that
>>>>>>>>>>>>>> giving
>>>>>>>>>>>>>>> every operator two forms (one with no config and one with a
>>>>>>>>> config
>>>>>>>>>>>>>> object)
>>>>>>>>>>>>>>> would be such an API.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Comment 2 =========
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Finally, just a minor comment: the static method in Named
>>>>>>>>> wouldn't
>>>>>>>>>>> work
>>>>>>>>>>>>>>> properly as defined. Assuming that we mix Named in with
>>>>>>>>> Produced,
>>>>>>>>>> for
>>>>>>>>>>>>>>> example, we'd need to be able to use it like:
>>>>>>>>>>>>>>>>  kStream.to("out", Produced.with("myOut"))
>>>>>>>>>>>>>>> This doesn't work because with() returns a Named, but we need
>>>>>>>>> a
>>>>>>>>>>>> Produced.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> We can pull off a builder method in the interface, but not a
>>>>>>>>> static
>>>>>>>>>>>>>> method.
>>>>>>>>>>>>>>> To define a builder method in the interface that returns an
>>>>>>>>>> instance
>>>>>>>>>>> of
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>> concrete subtype, you have to use the "curiously recurring
>>>>>>>>> generic"
>>>>>>>>>>>>>>> pattern.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It would look like:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> public interface Named<N extends Named<N>> {
>>>>>>>>>>>>>>>   String name();
>>>>>>>>>>>>>>>   N withName(String name);
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> You can see where the name of the pattern comes from ;)
>>>>>>>>>>>>>>> An implementation would then look like:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> public class Produced implements Named<Produced> {
>>>>>>>>>>>>>>>   String name() { return name; }
>>>>>>>>>>>>>>>   Produced withName(final String name) { this.name = name;
>>>>>>>>> return
>>>>>>>>>>>> this;
>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Note that the generic parameter gets filled in properly in
>> the
>>>>>>>>>>>>>> implementing
>>>>>>>>>>>>>>> class, so that you get the right return type out.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> It doesn't work at all with a static factory method at the
>>>>>>>>>> interface
>>>>>>>>>>>>>> level,
>>>>>>>>>>>>>>> so it would be up to Produced to define a static factory if
>> it
>>>>>>>>>> wants
>>>>>>>>>>> to
>>>>>>>>>>>>>>> present one.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> ======
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Those are my two feedbacks!
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I hope you find this helpful, rather than frustrating. I'm
>>>>>>>>> sorry I
>>>>>>>>>>>> didn't
>>>>>>>>>>>>>>> get a chance to comment sooner.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thanks for the KIP, I think it will be much nicer to be able
>>>>>>>>> to
>>>>>>>>>> name
>>>>>>>>>>>> the
>>>>>>>>>>>>>>> processor nodes.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Tue, Nov 27, 2018 at 6:34 PM Guozhang Wang <
>>>>>>>>> [email protected]>
>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hi Florian,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I've made a pass over the PR. There are some comments that
>>>>>>>>> are
>>>>>>>>>>> related
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>> the function names which may be affecting the KIP wiki page,
>>>>>>>>> but
>>>>>>>>>>>>>> overall
>>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>> think it looks good already.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Nov 16, 2018 at 4:21 PM Guozhang Wang <
>>>>>>>>> [email protected]
>>>>>>>>>>>
>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks Florian! I will take a look at the PR.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Mon, Nov 12, 2018 at 2:44 PM Florian Hussonnois <
>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hi Matthias,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Sorry I was absent for a while. I have started a new PR
>>>>>>>>> for this
>>>>>>>>>>>>>> KIP.
>>>>>>>>>>>>>>> It
>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>> still in progress for now. I'm working on it.
>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/5909
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Le ven. 19 oct. 2018 à 20:13, Matthias J. Sax <
>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>> écrit :
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> What is the status of this KIP?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 7/19/18 5:17 PM, Guozhang Wang wrote:
>>>>>>>>>>>>>>>>>>>> Hello Florian,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Sorry for being late... Found myself keep apologizing
>>>>>>>>> for late
>>>>>>>>>>>>>>>> replies
>>>>>>>>>>>>>>>>>>>> these days. But I do want to push this KIP's progress
>>>>>>>>> forward
>>>>>>>>>>>>>> as I
>>>>>>>>>>>>>>>>>> see it
>>>>>>>>>>>>>>>>>>>> very important and helpful feature for extensibility.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> About the exceptions, I've gone through them and
>>>>>>>>> hopefully it
>>>>>>>>>> is
>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>> exhaustive list:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> 1. KTable#toStream()
>>>>>>>>>>>>>>>>>>>> 2. KStream#merge(KStream)
>>>>>>>>>>>>>>>>>>>> 3. KStream#process() / transform() / transformValues()
>>>>>>>>>>>>>>>>>>>> 4. KGroupedTable / KGroupedStream#count()
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Here's my reasoning:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> * It is okay not letting users to override the name for
>>>>>>>>> 1/2,
>>>>>>>>>>>>>> since
>>>>>>>>>>>>>>>>>> they
>>>>>>>>>>>>>>>>>>> are
>>>>>>>>>>>>>>>>>>>> too trivial to be useful for debugging, plus their
>>>>>>>>> processor
>>>>>>>>>>>>>> names
>>>>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>>>> not determine any related topic / store names.
>>>>>>>>>>>>>>>>>>>> * For 3, I'd vote for adding overloaded functions with
>>>>>>>>> Named.
>>>>>>>>>>>>>>>>>>>> * For 4, if users really want to name the processor she
>>>>>>>>> can
>>>>>>>>>> call
>>>>>>>>>>>>>>>>>>>> aggregate() instead, so I think it is okay to skip this
>>>>>>>>> case.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> On Fri, Jul 6, 2018 at 3:06 PM, Florian Hussonnois <
>>>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> The option #3 seems to be a good alternative and I find
>>>>>>>>> the
>>>>>>>>>> API
>>>>>>>>>>>>>>>> more
>>>>>>>>>>>>>>>>>>>>> elegant (thanks John).
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> But, we still have the need to overload some methods
>>>>>>>>> either
>>>>>>>>>>>>>>> because
>>>>>>>>>>>>>>>>>>> they do
>>>>>>>>>>>>>>>>>>>>> not accept an action instance or because they are
>>>>>>>>> translated
>>>>>>>>>> to
>>>>>>>>>>>>>>>>>> multiple
>>>>>>>>>>>>>>>>>>>>> processors.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> For example, this is the case for methods branch() and
>>>>>>>>>> merge().
>>>>>>>>>>>>>>> We
>>>>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>>>>>> introduce a new interface Named (or maybe a different
>>>>>>>>> name ?)
>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>> method
>>>>>>>>>>>>>>>>>>>>> name(). All action interfaces could extend this one to
>>>>>>>>>>>>>> implement
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> option
>>>>>>>>>>>>>>>>>>>>> 3).
>>>>>>>>>>>>>>>>>>>>> This would result by having the following overloads  :
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Stream<K, V> merge(final Named name, final KStream<K,
>> V>
>>>>>>>>>>>>>> stream);
>>>>>>>>>>>>>>>>>>>>> KStream<K, V>[] branch(final Named name, final
>>>>>>>>> Predicate<?
>>>>>>>>>>>>>> super
>>>>>>>>>>>>>>>> K, ?
>>>>>>>>>>>>>>>>>>> super
>>>>>>>>>>>>>>>>>>>>> V>... predicates)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> N.B : The list above is  not exhaustive
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> ---------
>>>>>>>>>>>>>>>>>>>>> user's code will become :
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>         KStream<String, Integer> stream =
>>>>>>>>>>>>>> builder.stream("test");
>>>>>>>>>>>>>>>>>>>>>         KStream<String, Integer>[] branches =
>>>>>>>>>>>>>>>>>>>>> stream.branch(Named.with("BRANCH-STREAM-ON-VALUE"),
>>>>>>>>>>>>>>>>>>>>>                 Predicate.named("STREAM-PAIR-VALUE",
>>>>>>>>> (k, v)
>>>>>>>>>> ->
>>>>>>>>>>>>>> v
>>>>>>>>>>>>>>> %
>>>>>>>>>>>>>>>> 2
>>>>>>>>>>>>>>>>>> ==
>>>>>>>>>>>>>>>>>>>>> 0),
>>>>>>>>>>>>>>>>>>>>>                 Predicate.named("STREAM-IMPAIR-VALUE",
>>>>>>>>> (k, v)
>>>>>>>>>>>>>> ->
>>>>>>>>>>>>>>> v
>>>>>>>>>>>>>>>> %
>>>>>>>>>>>>>>>>>> 2
>>>>>>>>>>>>>>>>>>> !=
>>>>>>>>>>>>>>>>>>>>> 0));
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>         branches[0].to("pair");
>>>>>>>>>>>>>>>>>>>>>         branches[1].to("impair");
>>>>>>>>>>>>>>>>>>>>> ---------
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> This is a mix of the options 3) and 1)
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Le ven. 6 juil. 2018 à 22:58, Guozhang Wang <
>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>> écrit :
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi folks, just to summarize the options we have so
>> far:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 1) Add a new "as" for KTable / KStream, plus adding
>> new
>>>>>>>>>> fields
>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>> operators-returns-void control objects (the current
>>>>>>>>> wiki's
>>>>>>>>>>>>>>>>>> proposal).
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Pros: no more overloads.
>>>>>>>>>>>>>>>>>>>>>> Cons: a bit departing with the current high-level API
>>>>>>>>> design
>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>> DSL,
>>>>>>>>>>>>>>>>>>>>>> plus, the inconsistency between operators-returns-void
>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>> operators-not-return-voids.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 2) Add overloaded functions for all operators, that
>>>>>>>>> accepts
>>>>>>>>>> a
>>>>>>>>>>>>>>> new
>>>>>>>>>>>>>>>>>>> control
>>>>>>>>>>>>>>>>>>>>>> object "Described".
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Pros: consistent with current APIs.
>>>>>>>>>>>>>>>>>>>>>> Cons: lots of overloaded functions to add.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> 3) Add another default function in the interface
>>>>>>>>> (thank you
>>>>>>>>>>>>>> J8!)
>>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>>> John
>>>>>>>>>>>>>>>>>>>>>> proposed.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Pros: no overloaded functions, no "Described".
>>>>>>>>>>>>>>>>>>>>>> Cons: do we lose lambda functions really (seems not if
>>>>>>>>> we
>>>>>>>>>>>>>>> provide
>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>> "named"
>>>>>>>>>>>>>>>>>>>>>> for each func)? Plus "Described" may be more
>> extensible
>>>>>>>>>> than a
>>>>>>>>>>>>>>>>>> single
>>>>>>>>>>>>>>>>>>>>>> `String`.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> My principle of considering which one is better
>> depends
>>>>>>>>>>>>>>> primarily
>>>>>>>>>>>>>>>> on
>>>>>>>>>>>>>>>>>>> "how
>>>>>>>>>>>>>>>>>>>>>> to make advanced users easily use the additional API,
>>>>>>>>> while
>>>>>>>>>>>>>>>> keeping
>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>> hidden from normal users who do not care at all". For
>>>>>>>>> that
>>>>>>>>>>>>>>>> purpose I
>>>>>>>>>>>>>>>>>>>>> think
>>>>>>>>>>>>>>>>>>>>>> 3) > 1) > 2).
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> One caveat though, is that changing the interface
>>>>>>>>> would not
>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>> binary-compatible though source-compatible, right?
>> I.e.
>>>>>>>>>> users
>>>>>>>>>>>>>>> need
>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>> recompile their code though no changes needed.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Another note: for 3), if we really want to keep
>>>>>>>>>> extensibility
>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>> Described
>>>>>>>>>>>>>>>>>>>>>> we could do sth. like:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ---------
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> public interface Predicate<K, V> {
>>>>>>>>>>>>>>>>>>>>>>     // existing method
>>>>>>>>>>>>>>>>>>>>>>     boolean test(final K key, final V value);
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>     // new default method adds the ability to name the
>>>>>>>>>>>>>> predicate
>>>>>>>>>>>>>>>>>>>>>>     default Described described() {
>>>>>>>>>>>>>>>>>>>>>>         return new Described(null);
>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> ----------
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> where user's code becomes:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> stream.filter(named("key", (k, v) -> true));   // note
>>>>>>>>>> `named`
>>>>>>>>>>>>>>> now
>>>>>>>>>>>>>>>>>> just
>>>>>>>>>>>>>>>>>>>>>> sets a Described("key") in "described()".
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> stream.filter(described(Described.as("key", /* any
>>>>>>>>> other
>>>>>>>>>> fancy
>>>>>>>>>>>>>>>>>>>>> parameters
>>>>>>>>>>>>>>>>>>>>>> in the future*/), (k, v) -> true));
>>>>>>>>>>>>>>>>>>>>>> ----------
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I feel it is not much likely that we'd need to extend
>>>>>>>>> it
>>>>>>>>>>>>>> further
>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> future, so just a `String` would be good enough. But
>>>>>>>>> just
>>>>>>>>>>>>>>> listing
>>>>>>>>>>>>>>>>>> all
>>>>>>>>>>>>>>>>>>>>>> possibilities here.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> On Fri, Jul 6, 2018 at 8:19 AM, John Roesler <
>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Hi Florian,
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Sorry I'm late to the party, but I missed the message
>>>>>>>>>>>>>>> originally.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Regarding the names, it's probably a good idea to
>>>>>>>>> stick to
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>> same
>>>>>>>>>>>>>>>>>>>>>>> character set we're currently using: letters,
>>>>>>>>> numbers, and
>>>>>>>>>>>>>>>> hyphens.
>>>>>>>>>>>>>>>>>>> The
>>>>>>>>>>>>>>>>>>>>>>> names are used in Kafka topics, files and folders,
>> and
>>>>>>>>>>>>>> RocksDB
>>>>>>>>>>>>>>>>>>>>> databases,
>>>>>>>>>>>>>>>>>>>>>>> and we also need them to work with the file systems
>> of
>>>>>>>>>>>>>> Windows,
>>>>>>>>>>>>>>>>>> Linux,
>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>> MacOS. My opinion is that with a situation like that,
>>>>>>>>> it's
>>>>>>>>>>>>>>> better
>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>>> conservative. It might also be a good idea to impose
>>>>>>>>> an
>>>>>>>>>> upper
>>>>>>>>>>>>>>>>>> limit on
>>>>>>>>>>>>>>>>>>>>>> name
>>>>>>>>>>>>>>>>>>>>>>> length to avoid running afoul of any of those
>> systems.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> ---
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> It seems like there's a small debate between 1)
>>>>>>>>> adding a
>>>>>>>>>> new
>>>>>>>>>>>>>>>>>> method to
>>>>>>>>>>>>>>>>>>>>>>> KStream (and maybe KTable) to modify its name after
>>>>>>>>> the
>>>>>>>>>> fact,
>>>>>>>>>>>>>>> or
>>>>>>>>>>>>>>>> 2)
>>>>>>>>>>>>>>>>>>>>>>> piggy-backing on the config objects where they exist
>>>>>>>>> and
>>>>>>>>>>>>>> adding
>>>>>>>>>>>>>>>> one
>>>>>>>>>>>>>>>>>>>>> where
>>>>>>>>>>>>>>>>>>>>>>> they don't. To me, #2 is the better alternative even
>>>>>>>>> though
>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>> produces
>>>>>>>>>>>>>>>>>>>>>>> more overloads and may be a bit awkward in places.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> The reason is simply that #1 is a high-level
>>>>>>>>> departure from
>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>> graph-building paradigm we're using in the DSL.
>>>>>>>>> Consider:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Graph.node1(config).node2(config)
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> vs
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Graph.node1().config().node2().config()
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> We could have done either, but we picked the former.
>> I
>>>>>>>>>> think
>>>>>>>>>>>>>>> it's
>>>>>>>>>>>>>>>>>>>>>> probably
>>>>>>>>>>>>>>>>>>>>>>> a good goal to try and stick to it so that developers
>>>>>>>>> can
>>>>>>>>>>>>>>> develop
>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>> rely
>>>>>>>>>>>>>>>>>>>>>>> on their instincts for how the DSL will behave.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> I do want to present one alternative to adding new
>>>>>>>>> config
>>>>>>>>>>>>>>>> objects:
>>>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>>> can
>>>>>>>>>>>>>>>>>>>>>>> just add a "name()" method to all our "action"
>>>>>>>>> interfaces.
>>>>>>>>>>>>>> For
>>>>>>>>>>>>>>>>>>> example,
>>>>>>>>>>>>>>>>>>>>>>> I'll demonstrate how we can add a "name" to Predicate
>>>>>>>>> and
>>>>>>>>>>>>>> then
>>>>>>>>>>>>>>>> use
>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>> name a "KStream#filter" DSL operator:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> public interface Predicate<K, V> {
>>>>>>>>>>>>>>>>>>>>>>>     // existing method
>>>>>>>>>>>>>>>>>>>>>>>     boolean test(final K key, final V value);
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     // new default method adds the ability to name
>> the
>>>>>>>>>>>>>>> predicate
>>>>>>>>>>>>>>>>>>>>>>>     default String name() {
>>>>>>>>>>>>>>>>>>>>>>>         return null;
>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     // new static factory method adds the ability to
>>>>>>>>> wrap
>>>>>>>>>>>>>>> lambda
>>>>>>>>>>>>>>>>>>>>>> predicates
>>>>>>>>>>>>>>>>>>>>>>> with a named predicate
>>>>>>>>>>>>>>>>>>>>>>>     static <K, V> Predicate<K, V> named(final String
>>>>>>>>> name,
>>>>>>>>>>>>>>> final
>>>>>>>>>>>>>>>>>>>>>>> Predicate<K, V> predicate) {
>>>>>>>>>>>>>>>>>>>>>>>         return new Predicate<K, V>() {
>>>>>>>>>>>>>>>>>>>>>>>             @Override
>>>>>>>>>>>>>>>>>>>>>>>             public boolean test(final K key, final V
>>>>>>>>>> value) {
>>>>>>>>>>>>>>>>>>>>>>>                 return predicate.test(key, value);
>>>>>>>>>>>>>>>>>>>>>>>             }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>             @Override
>>>>>>>>>>>>>>>>>>>>>>>             public String name() {
>>>>>>>>>>>>>>>>>>>>>>>                 return name;
>>>>>>>>>>>>>>>>>>>>>>>             }
>>>>>>>>>>>>>>>>>>>>>>>         };
>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>> }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Then, here's how it would look to use it:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> // Anonymous predicates continue to work just fine
>>>>>>>>>>>>>>>>>>>>>>> stream.filter((k, v) -> true);
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> // Devs can swap in a Predicate that implements the
>>>>>>>>> name()
>>>>>>>>>>>>>>>> method.
>>>>>>>>>>>>>>>>>>>>>>> stream.filter(new Predicate<Object, Object>() {
>>>>>>>>>>>>>>>>>>>>>>>     @Override
>>>>>>>>>>>>>>>>>>>>>>>     public boolean test(final Object key, final
>> Object
>>>>>>>>>>>>>> value) {
>>>>>>>>>>>>>>>>>>>>>>>         return true;
>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>     @Override
>>>>>>>>>>>>>>>>>>>>>>>     public String name() {
>>>>>>>>>>>>>>>>>>>>>>>         return "hey";
>>>>>>>>>>>>>>>>>>>>>>>     }
>>>>>>>>>>>>>>>>>>>>>>> });
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> // Or they can wrap their existing lambda using the
>>>>>>>>> static
>>>>>>>>>>>>>>>> factory
>>>>>>>>>>>>>>>>>>>>> method
>>>>>>>>>>>>>>>>>>>>>>> stream.filter(named("key", (k, v) -> true));
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Just a thought.
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Overall, I think it's really valuable to be able to
>>>>>>>>> name
>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> processors,
>>>>>>>>>>>>>>>>>>>>>>> for all the reasons you mentioned in the KIP. So
>>>>>>>>> thank you
>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>> introducing
>>>>>>>>>>>>>>>>>>>>>>> this!
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>> -John
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> On Thu, Jul 5, 2018 at 4:53 PM Florian Hussonnois <
>>>>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Hi, thank you very much for all you suggestions.
>> I've
>>>>>>>>>>>>>> started
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>> update
>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>> KIP (
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>> 307%3A+Allow+to+define+custom+processor+names+with+KStreams+DSL
>>>>>>>>>>>>>>>>>>>>>>>> ).
>>>>>>>>>>>>>>>>>>>>>>>> Also, I propose to rename the Processed class into
>>>>>>>>>>>>>> Described -
>>>>>>>>>>>>>>>>>> this
>>>>>>>>>>>>>>>>>>>>>> will
>>>>>>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>>>> more meaningful (but this is just a detail).
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> I'm OK to not enforcing uppercase for specific names
>>>>>>>>> but
>>>>>>>>>>>>>>> should
>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>>>> allow
>>>>>>>>>>>>>>>>>>>>>>>> arbitrary names with whitespaces for example ?
>>>>>>>>> Currently,
>>>>>>>>>> I
>>>>>>>>>>>>>>>> can't
>>>>>>>>>>>>>>>>>>>>> tell
>>>>>>>>>>>>>>>>>>>>>> if
>>>>>>>>>>>>>>>>>>>>>>>> this can lead to some side effects ?
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> Le lun. 11 juin 2018 à 01:31, Matthias J. Sax <
>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>>> écrit :
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> Just catching up on this thread.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> I like the general idea. Couple of comments:
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>  - I think that adding `Processed` (or maybe a
>>>>>>>>> different
>>>>>>>>>>>>>>> name?)
>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>>>> valid proposal for stateless operators that only
>>>>>>>>> have a
>>>>>>>>>>>>>>> single
>>>>>>>>>>>>>>>>>>>>>> overload
>>>>>>>>>>>>>>>>>>>>>>>>> atm. It would align with the overall API design.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>  - for all methods with multiple existing
>>>>>>>>> overloads, we
>>>>>>>>>> can
>>>>>>>>>>>>>>>>>>>>> consider
>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>> extend `Consumed`, `Produced`, `Materialized` etc
>>>>>>>>> to take
>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>> additional
>>>>>>>>>>>>>>>>>>>>>>>>> processor name (not sure atm how elegant this is;
>> we
>>>>>>>>>> would
>>>>>>>>>>>>>>> need
>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>> "play" with the API a little bit; the advantage
>>>>>>>>> would be,
>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>>> do
>>>>>>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>>>>>> add more overloads what seems to be key for this
>>>>>>>>> KIP)
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>  - operators return void: while I agree that the
>>>>>>>>> "name
>>>>>>>>>>>>>> first"
>>>>>>>>>>>>>>>>>>>>>> chaining
>>>>>>>>>>>>>>>>>>>>>>>>> idea is not very intuitive, it might still work, if
>>>>>>>>> we
>>>>>>>>>> name
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> method
>>>>>>>>>>>>>>>>>>>>>>>>> correctly (again, we would need to "play" with the
>>>>>>>>> API a
>>>>>>>>>>>>>>> little
>>>>>>>>>>>>>>>>>> bit
>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>> see)
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>  - for DSL operators that are translated to
>> multiple
>>>>>>>>>> nodes:
>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>> might
>>>>>>>>>>>>>>>>>>>>>>>>> make sense to use the specified operator name as
>>>>>>>>> prefix
>>>>>>>>>> and
>>>>>>>>>>>>>>> add
>>>>>>>>>>>>>>>>>>>>>>>>> reasonable suffixes. For example, a join translates
>>>>>>>>> into
>>>>>>>>>> 5
>>>>>>>>>>>>>>>>>>>>> operators
>>>>>>>>>>>>>>>>>>>>>>>>> that could be name "name-left-store-processor",
>>>>>>>>>>>>>>>>>>>>>>>>> "name-left-join-processor",
>>>>>>>>> "name-right-store-processor",
>>>>>>>>>>>>>>>>>>>>>>>>> "name-right-join-processor", and
>>>>>>>>>>>>>> "name-join-merge-processor"
>>>>>>>>>>>>>>>> (or
>>>>>>>>>>>>>>>>>>>>>>>>> similar). Maybe just using numbers might also work.
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>  - I think, we should strip the number suffixes if
>>>>>>>>> a user
>>>>>>>>>>>>>>>>>> provides
>>>>>>>>>>>>>>>>>>>>>>> names
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>  - enforcing upper case seems to be tricky: for
>>>>>>>>> example,
>>>>>>>>>> we
>>>>>>>>>>>>>>> do
>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>>>>>> enforce upper case for store names and we cannot
>>>>>>>>> easily
>>>>>>>>>>>>>>> change
>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>>>> would break compatibility -- thus, for consistency
>>>>>>>>>> reasons
>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>> might
>>>>>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>>>>>> want to do this
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>  - for better understand of the impact of the KIP,
>>>>>>>>> it
>>>>>>>>>> would
>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>> quite
>>>>>>>>>>>>>>>>>>>>>>>>> helpful if you would list all method names that are
>>>>>>>>>>>>>> affected
>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> KIP
>>>>>>>>>>>>>>>>>>>>>>>>> (ie, list all newly added overloads)
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> -Matthias
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>> On 5/31/18 6:40 PM, Guozhang Wang wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Florian,
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Re 1: I think changing the KStreamImpl /
>>>>>>>>> KTableImpl to
>>>>>>>>>>>>>> allow
>>>>>>>>>>>>>>>>>>>>>>> modifying
>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>> processor name after the operator is fine as long
>>>>>>>>> as we
>>>>>>>>>> do
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>> check
>>>>>>>>>>>>>>>>>>>>>>>>> again
>>>>>>>>>>>>>>>>>>>>>>>>>> when modifying that. In fact, we are having some
>>>>>>>>>> topology
>>>>>>>>>>>>>>>>>>>>>>> optimization
>>>>>>>>>>>>>>>>>>>>>>>>>> going on which may modify processor names in the
>>>>>>>>> final
>>>>>>>>>>>>>>>> topology
>>>>>>>>>>>>>>>>>>>>>>>> anyways (
>>>>>>>>>>>>>>>>>>>>>>>>>> https://github.com/apache/kafka/pull/4983).
>>>>>>>>>> Semantically
>>>>>>>>>>>>>> I
>>>>>>>>>>>>>>>>>> think
>>>>>>>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>>>>> easier to understand to developers than "deciding
>>>>>>>>> the
>>>>>>>>>>>>>>>> processor
>>>>>>>>>>>>>>>>>>>>>> name
>>>>>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>>>>> the next operator".
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Re 2: Yeah I'm thinking that for operators that
>>>>>>>>>> translates
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>> multiple
>>>>>>>>>>>>>>>>>>>>>>>>>> processor names, we can still use the provided
>>>>>>>>> "hint" to
>>>>>>>>>>>>>>> name
>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>> processor
>>>>>>>>>>>>>>>>>>>>>>>>>> names, e.g. for Joins we can name them as
>>>>>>>>>> `join-foo-this`
>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>> `join-foo-that` etc if user calls `as("foo")`.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Re 3: The motivation I had about removing the
>>>>>>>>> suffix is
>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>> it
>>>>>>>>>>>>>>>>>>>>> has
>>>>>>>>>>>>>>>>>>>>>>>> huge
>>>>>>>>>>>>>>>>>>>>>>>>>> restrictions on topology compatibilities: consider
>>>>>>>>> if
>>>>>>>>>> user
>>>>>>>>>>>>>>>> code
>>>>>>>>>>>>>>>>>>>>>>> added a
>>>>>>>>>>>>>>>>>>>>>>>>> new
>>>>>>>>>>>>>>>>>>>>>>>>>> operator, or library does some optimization to
>>>>>>>>> remove
>>>>>>>>>> some
>>>>>>>>>>>>>>>>>>>>>> operators,
>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>> suffix indexing may be changed for a large amount
>>>>>>>>> of the
>>>>>>>>>>>>>>>>>>>>> processor
>>>>>>>>>>>>>>>>>>>>>>>> names:
>>>>>>>>>>>>>>>>>>>>>>>>>> this will in turn change the internal state store
>>>>>>>>> names,
>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>> well
>>>>>>>>>>>>>>>>>>>>> as
>>>>>>>>>>>>>>>>>>>>>>>>>> internal topic names as well, making the new
>>>>>>>>> application
>>>>>>>>>>>>>>>>>> topology
>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>>>>>> incompatible with the ones. One rationale I had
>>>>>>>>> about
>>>>>>>>>> this
>>>>>>>>>>>>>>> KIP
>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>> aligned this effort, moving forward we can allow
>>>>>>>>> users
>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>> customize
>>>>>>>>>>>>>>>>>>>>>>>>>> internal names so that they can still be reused
>>>>>>>>> even
>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>> topology
>>>>>>>>>>>>>>>>>>>>>>>>> changes
>>>>>>>>>>>>>>>>>>>>>>>>>> (e.g. KIP-230), so I think removing the suffix
>>>>>>>>> index
>>>>>>>>>> would
>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>> more
>>>>>>>>>>>>>>>>>>>>>>>>>> applicable in the long run.
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>> On Thu, May 31, 2018 at 3:08 PM, Florian
>>>>>>>>> Hussonnois <
>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi ,
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thank you very much for your feedback.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> 1/
>>>>>>>>>>>>>>>>>>>>>>>>>>> I agree that overloading most of the methods with
>>>>>>>>> a
>>>>>>>>>>>>>>> Processed
>>>>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>>>>>> ideal.
>>>>>>>>>>>>>>>>>>>>>>>>>>> I've started modifying the KStream API and I got
>>>>>>>>> to the
>>>>>>>>>>>>>>> same
>>>>>>>>>>>>>>>>>>>>>>>> conclusion.
>>>>>>>>>>>>>>>>>>>>>>>>>>> Also ading a new method directly to KStreamImpl
>>>>>>>>> and
>>>>>>>>>>>>>>>> KTableImpl
>>>>>>>>>>>>>>>>>>>>>>> classes
>>>>>>>>>>>>>>>>>>>>>>>>>>> seems to be a better option.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> However a processor name cannot be redefined
>> after
>>>>>>>>>>>>>> calling
>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>> operator
>>>>>>>>>>>>>>>>>>>>>>>>> (or
>>>>>>>>>>>>>>>>>>>>>>>>>>> maybe I miss something in the code).
>>>>>>>>>>>>>>>>>>>>>>>>>>> From my understanding, this will only set the
>>>>>>>>> KStream
>>>>>>>>>>>>>> name
>>>>>>>>>>>>>>>>>>>>>> property
>>>>>>>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>> processor name previsouly added to the topology
>>>>>>>>>> builder -
>>>>>>>>>>>>>>>>>>>>> leading
>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>> InvalidTopology exception.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> So the new method should actually defines the
>>>>>>>>> name of
>>>>>>>>>> the
>>>>>>>>>>>>>>>> next
>>>>>>>>>>>>>>>>>>>>>>>>> processor :
>>>>>>>>>>>>>>>>>>>>>>>>>>> Below is an example :
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> *stream.as <http://stream.as
>>>>>>>>>>>>>>>>>>>>>>> (Processed.name("MAPPE_TO_UPPERCASE")*
>>>>>>>>>>>>>>>>>>>>>>>>>>> *          .map( (k, v) -> KeyValue.pair(k,
>>>>>>>>>>>>>>>> v.toUpperCase()))*
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I think this approach could solve the cases for
>>>>>>>>> methods
>>>>>>>>>>>>>>>>>>>>> returning
>>>>>>>>>>>>>>>>>>>>>>>> void ?
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Regarding this new method we have two possible
>>>>>>>>>>>>>>>> implementations
>>>>>>>>>>>>>>>>>> :
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>    1. Adding a method like : withName(String
>>>>>>>>>>>>>> processorName)
>>>>>>>>>>>>>>>>>>>>>>>>>>>    2. or adding a method accepting an Processed
>>>>>>>>> object
>>>>>>>>>> :
>>>>>>>>>>>>>>>>>>>>>>>> as(Processed).
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> I think solution 2. is preferable as the
>> Processed
>>>>>>>>>> class
>>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>>>>>> be
>>>>>>>>>>>>>>>>>>>>>>>>> enriched
>>>>>>>>>>>>>>>>>>>>>>>>>>> further (in futur).
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> 2/
>>>>>>>>>>>>>>>>>>>>>>>>>>> As Guozhang said some operators add internal
>>>>>>>>>> processors.
>>>>>>>>>>>>>>>>>>>>>>>>>>> For example the branch() method create one
>>>>>>>>>> KStreamBranch
>>>>>>>>>>>>>>>>>>>>> processor
>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>> route
>>>>>>>>>>>>>>>>>>>>>>>>>>> records and one KStreamPassThrough processor for
>>>>>>>>> each
>>>>>>>>>>>>>>> branch.
>>>>>>>>>>>>>>>>>>>>>>>>>>> In that situation only the parent processor can
>> be
>>>>>>>>>> named.
>>>>>>>>>>>>>>> For
>>>>>>>>>>>>>>>>>>>>>>> children
>>>>>>>>>>>>>>>>>>>>>>>>>>> processors we could keep the current behaviour
>>>>>>>>> that
>>>>>>>>>> add a
>>>>>>>>>>>>>>>>>> suffix
>>>>>>>>>>>>>>>>>>>>>>> (i.e
>>>>>>>>>>>>>>>>>>>>>>>>>>> KSTREAM-BRANCHCHILD-)
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> This also the case for the join() method that
>>>>>>>>> result to
>>>>>>>>>>>>>>>> adding
>>>>>>>>>>>>>>>>>>>>>>>> multiple
>>>>>>>>>>>>>>>>>>>>>>>>>>> processors to the topology (windowing, left/right
>>>>>>>>> joins
>>>>>>>>>>>>>>> and a
>>>>>>>>>>>>>>>>>>>>>> merge
>>>>>>>>>>>>>>>>>>>>>>>>>>> processor).
>>>>>>>>>>>>>>>>>>>>>>>>>>> I think, like for the branch method users could
>>>>>>>>> only
>>>>>>>>>>>>>>> define a
>>>>>>>>>>>>>>>>>>>>>>>> processor
>>>>>>>>>>>>>>>>>>>>>>>>>>> name prefix.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> 3/
>>>>>>>>>>>>>>>>>>>>>>>>>>> I think we should  still added a suffix like
>>>>>>>>>>>>>> "-0000000000"
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>> processor
>>>>>>>>>>>>>>>>>>>>>>>>>>> name and enforce uppercases as this will keep
>> some
>>>>>>>>>>>>>>>> consistency
>>>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>> ones generated by the API.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> 4/
>>>>>>>>>>>>>>>>>>>>>>>>>>> Yes, the KTable interface should be modified like
>>>>>>>>>> KStream
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>> allow
>>>>>>>>>>>>>>>>>>>>>>>>> custom
>>>>>>>>>>>>>>>>>>>>>>>>>>> processor names definition.
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> Le jeu. 31 mai 2018 à 19:18, Damian Guy <
>>>>>>>>>>>>>>>> [email protected]>
>>>>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>>> écrit
>>>>>>>>>>>>>>>>>>>>>>>>> :
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Florian,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP. What about KTable and other
>>>>>>>>> DSL
>>>>>>>>>>>>>>>>>> interfaces?
>>>>>>>>>>>>>>>>>>>>>>> Will
>>>>>>>>>>>>>>>>>>>>>>>>> they
>>>>>>>>>>>>>>>>>>>>>>>>>>>> not want to be able to do the same thing?
>>>>>>>>>>>>>>>>>>>>>>>>>>>> It would be good to see a complete set of the
>>>>>>>>> public
>>>>>>>>>> API
>>>>>>>>>>>>>>>>>>>>> changes.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Cheers,
>>>>>>>>>>>>>>>>>>>>>>>>>>>> Damian
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Wed, 30 May 2018 at 19:45 Guozhang Wang <
>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hello Florian,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP. I have some meta feedbacks
>>>>>>>>> on the
>>>>>>>>>>>>>>>>>>>>> proposal:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 1. You mentioned that this `Processed` object
>>>>>>>>> will be
>>>>>>>>>>>>>>> added
>>>>>>>>>>>>>>>>>>>>> to a
>>>>>>>>>>>>>>>>>>>>>>> new
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> overloaded variant of all the stateless
>>>>>>>>> operators,
>>>>>>>>>> what
>>>>>>>>>>>>>>>> about
>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>> stateful
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> operators? Would like to hear your opinions if
>>>>>>>>> you
>>>>>>>>>> have
>>>>>>>>>>>>>>>>>>>>> thought
>>>>>>>>>>>>>>>>>>>>>>>> about
>>>>>>>>>>>>>>>>>>>>>>>>>>>> that:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> note for stateful operators they will usually
>> be
>>>>>>>>>> mapped
>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>> multiple
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> processor node names, so we probably need to
>>>>>>>>> come up
>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>> some
>>>>>>>>>>>>>>>>>>>>>>> ways
>>>>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> define all their names.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 2. I share the same concern with Bill as for
>>>>>>>>> adding
>>>>>>>>>>>>>> lots
>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>> new
>>>>>>>>>>>>>>>>>>>>>>>>>>> overload
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> functions into the stateless operators, as we
>>>>>>>>> have
>>>>>>>>>> just
>>>>>>>>>>>>>>>> spent
>>>>>>>>>>>>>>>>>>>>>>> quite
>>>>>>>>>>>>>>>>>>>>>>>>>>> some
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> effort in trimming them since 1.0.0 release. If
>>>>>>>>> the
>>>>>>>>>>>>>> goal
>>>>>>>>>>>>>>> is
>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>> just
>>>>>>>>>>>>>>>>>>>>>>>>>>>> provide
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> some "hints" on the generated processor node
>>>>>>>>> names,
>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>> strictly
>>>>>>>>>>>>>>>>>>>>>>>>>>>> enforcing
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the exact names that to be generated, then how
>>>>>>>>> about
>>>>>>>>>> we
>>>>>>>>>>>>>>>> just
>>>>>>>>>>>>>>>>>>>>>> add a
>>>>>>>>>>>>>>>>>>>>>>>> new
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> function to `KStream` and `KTable` classes
>> like:
>>>>>>>>>>>>>>>>>>>>>> "as(Processed)",
>>>>>>>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> semantics as "the latest operators that
>>>>>>>>> generate this
>>>>>>>>>>>>>>>> KStream
>>>>>>>>>>>>>>>>>>>>> /
>>>>>>>>>>>>>>>>>>>>>>>> KTable
>>>>>>>>>>>>>>>>>>>>>>>>>>>> will
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> be named accordingly to this hint".
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> The only caveat, is that for all operators like
>>>>>>>>>>>>>>>> `KStream#to`
>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> `KStream#print` that returns void, this
>>>>>>>>> alternative
>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>>>> work.
>>>>>>>>>>>>>>>>>>>>>>>>> But
>>>>>>>>>>>>>>>>>>>>>>>>>>>> for
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the current operators:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> a. KStream#print,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> b. KStream#foreach,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> c. KStream#to,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> d. KStream#process
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I personally felt that except `KStream#process`
>>>>>>>>> users
>>>>>>>>>>>>>>> would
>>>>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>>>>>> usually
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> bother to override their names, and for
>>>>>>>>>>>>>> `KStream#process`
>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>>>> could
>>>>>>>>>>>>>>>>>>>>>>>> add
>>>>>>>>>>>>>>>>>>>>>>>>>>> an
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> overload variant with the additional Processed
>>>>>>>>>> object.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3. In your example, the processor names are
>>>>>>>>> still
>>>>>>>>>> added
>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>>>>> suffix
>>>>>>>>>>>>>>>>>>>>>>>>>>>> like
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> "
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -0000000000", is this intentional? If yes, why
>>>>>>>>> (I
>>>>>>>>>>>>>> thought
>>>>>>>>>>>>>>>>>> with
>>>>>>>>>>>>>>>>>>>>>>> user
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> specified processor name hints we will not add
>>>>>>>>> suffix
>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>> distinguish
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> different nodes of the same type any more)?
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Guozhang
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Tue, May 29, 2018 at 6:47 AM, Bill Bejeck <
>>>>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi Florian,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks for the KIP.  I think being able to add
>>>>>>>>> more
>>>>>>>>>>>>>>>> context
>>>>>>>>>>>>>>>>>>>>> to
>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> processor names would be useful.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I like the idea of adding a
>>>>>>>>> "withProcessorName" to
>>>>>>>>>>>>>>>> Produced,
>>>>>>>>>>>>>>>>>>>>>>>> Consumed
>>>>>>>>>>>>>>>>>>>>>>>>>>>> and
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Joined.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> But instead of adding the "Processed"
>>>>>>>>> parameter to a
>>>>>>>>>>>>>>> large
>>>>>>>>>>>>>>>>>>>>>>>> percentage
>>>>>>>>>>>>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> the methods, which would result in overloaded
>>>>>>>>>> methods
>>>>>>>>>>>>>>>> (which
>>>>>>>>>>>>>>>>>>>>> we
>>>>>>>>>>>>>>>>>>>>>>>>>>> removed
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> quite a bit with KIP-182) what do you think of
>>>>>>>>>> adding
>>>>>>>>>>>>>> a
>>>>>>>>>>>>>>>>>>>>> method
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> to the AbstractStream class "withName(String
>>>>>>>>>>>>>>>>>> processorName)"?
>>>>>>>>>>>>>>>>>>>>>> BTW
>>>>>>>>>>>>>>>>>>>>>>>> I"m
>>>>>>>>>>>>>>>>>>>>>>>>>>>> not
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> married to the method name, it's the best I
>>>>>>>>> can do
>>>>>>>>>> off
>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>> top
>>>>>>>>>>>>>>>>>>>>>> of
>>>>>>>>>>>>>>>>>>>>>>>> my
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> head.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> For the methods that return void, we'd have to
>>>>>>>>> add a
>>>>>>>>>>>>>>>>>>>>> parameter,
>>>>>>>>>>>>>>>>>>>>>>> but
>>>>>>>>>>>>>>>>>>>>>>>>>>>> that
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> would at least cut down on the number of
>>>>>>>>> overloaded
>>>>>>>>>>>>>>>> methods
>>>>>>>>>>>>>>>>>>>>> in
>>>>>>>>>>>>>>>>>>>>>>> the
>>>>>>>>>>>>>>>>>>>>>>>>>>> API.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Just my 2 cents.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Bill
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> On Sun, May 27, 2018 at 4:13 PM, Florian
>>>>>>>>> Hussonnois
>>>>>>>>>> <
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> [email protected]
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I would like to start a new discussion on
>>>>>>>>> following
>>>>>>>>>>>>>>> KIP :
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>> 307%3A+Allow+to+define+custom+processor+names+with+KStreams+DSL
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> This is still a draft.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Looking forward for your feedback.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Florian HUSSONNOIS
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>>>>> Florian HUSSONNOIS
>>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>>>> Florian HUSSONNOIS
>>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>> Florian HUSSONNOIS
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> Florian HUSSONNOIS
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> --
>>>>>>>>>>>>>> -- Guozhang
>>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Florian HUSSONNOIS
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> -- Guozhang
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> -- Guozhang
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> -- Guozhang
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
>

signature.asc
Description: OpenPGP digital signature

Re: [DISCUSS] KIP-307: Allow to define custom processor names with KStreams DSL

Reply via email to