Hi again, all,

I have started the voting thread. Please cast your votes (or voice
your objections)! The vote will remain open at least 72 hours. Once it
closes, I can send the PR pretty quickly.

Thanks for all you help ironing out the details on this feature.
-John

On Mon, Jul 15, 2019 at 5:09 PM John Roesler <j...@confluent.io> wrote:
>
> Hey all,
>
> It sounds like there's general agreement now on this KIP, so I updated
> the KIP to fit in with Guozhang's overall proposed package structure.
> Specifically, the proposed name for the new Processor interface is
> "org.apache.kafka.streams.processor.api.Processor".
>
> If there are no objections, then I plan to start the vote tomorrow!
>
> Thanks, all, for your contributions.
> -John
>
> On Thu, Jul 11, 2019 at 1:50 PM Matthias J. Sax <matth...@confluent.io> wrote:
> >
> > Side remark:
> >
> > > Now that "flat transform" is a specific
> > >> part of the API it seems okay to steer folks in that direction (to never
> > >> use context.process in a transformer), but it should be called out
> > >> explicitly in javadocs.  Currently Transformer (which is used for both
> > >> transform() and flatTransform() ) doesn't really call out the ambiguity:
> >
> > Would you want to do a PR for address this? We are always eager to
> > improve the JavaDocs!
> >
> >
> > -Matthias
> >
> > On 7/7/19 11:26 AM, Paul Whalen wrote:
> > > First of all, +1 on the whole idea, my team has run into (admittedly 
> > > minor,
> > > but definitely annoying) issues because of the weaker typing.  We're heavy
> > > users of the PAPI and have Processors that, while not hundreds of lines
> > > long, are certainly quite hefty and call context.forward() in many places.
> > >
> > > After reading the KIP and discussion a few times, I've convinced myself
> > > that any initial concerns I had aren't really concerns at all (state store
> > > types, for one).  One thing I will mention:  changing *Transformer* to 
> > > have
> > > ProcessorContext<Void, Void> gave me pause, because I have code that does
> > > context.forward in transformers.  Now that "flat transform" is a specific
> > > part of the API it seems okay to steer folks in that direction (to never
> > > use context.process in a transformer), but it should be called out
> > > explicitly in javadocs.  Currently Transformer (which is used for both
> > > transform() and flatTransform() ) doesn't really call out the ambiguity:
> > > https://github.com/apache/kafka/blob/ca641b3e2e48c14ff308181c775775408f5f35f7/streams/src/main/java/org/apache/kafka/streams/kstream/Transformer.java#L75-L77,
> > > and for migrating users (from before flatTransform) it could be confusing.
> > >
> > > Side note, I'd like to plug KIP-401 (there is a discussion thread and a
> > > voting thread) which also relates to using the PAPI.  It seems like there
> > > is some interest and it is in a votable state with the majority of
> > > implementation complete.
> > >
> > > Paul
> > >
> > > On Fri, Jun 28, 2019 at 2:02 PM Bill Bejeck <bbej...@gmail.com> wrote:
> > >
> > >> Sorry for coming late to the party.
> > >>
> > >> As for the naming I'm in favor of RecordProcessor as well.
> > >>
> > >> I agree that we should not take on doing all of the package movements as
> > >> part of this KIP, especially as John has pointed out, it will be an
> > >> opportunity to discuss some clean-up on individual classes which I 
> > >> envision
> > >> becoming another somewhat involved process.
> > >>
> > >> For the end goal, if possible, here's what I propose.
> > >>
> > >>    1. We keep the scope of the KIP the same, *but we only implement* *it 
> > >> in
> > >>    phases*
> > >>    2. Phase one could include what Guozhang had proposed earlier namely
> > >>    1. > 1.a) modifying ProcessorContext only with the output types on
> > >>       forward.
> > >>       > 1.b) modifying Transformer signature to have generics of
> > >>       ProcessorContext,
> > >>       > and then lift the restricting of not using punctuate: if user did
> > >>       not
> > >>       > follow the enforced typing and just code without generics, they
> > >>       will get
> > >>       > warning at compile time and get run-time error if they forward
> > >>       wrong-typed
> > >>       > records, which I think would be acceptable.
> > >>    3. Then we could tackle other pieces in an incremental manner as we 
> > >> see
> > >>    what makes sense
> > >>
> > >> Just my 2cents
> > >>
> > >> -Bill
> > >>
> > >> On Mon, Jun 24, 2019 at 10:22 PM Guozhang Wang <wangg...@gmail.com> 
> > >> wrote:
> > >>
> > >>> Hi John,
> > >>>
> > >>> Yeah I think we should not do all the repackaging as part of this KIP as
> > >>> well (we can just do the movement of the Processor / ProcessorSupplier),
> > >>> but I think we need to discuss the end goal here since otherwise we may
> > >> do
> > >>> the repackaging of Processor in this KIP, but only later on realizing
> > >> that
> > >>> other re-packagings are not our favorite solutions.
> > >>>
> > >>>
> > >>> Guozhang
> > >>>
> > >>> On Mon, Jun 24, 2019 at 3:06 PM John Roesler <j...@confluent.io> wrote:
> > >>>
> > >>>> Hey Guozhang,
> > >>>>
> > >>>> Thanks for the idea! I'm wondering if we could take a middle ground
> > >>>> and take your proposed layout as a "roadmap", while only actually
> > >>>> moving the classes that are already involved in this KIP.
> > >>>>
> > >>>> The reason I ask is not just to control the scope of this KIP, but
> > >>>> also, I think that if we move other classes to new packages, we might
> > >>>> also want to take the opportunity to clean up other things about them.
> > >>>> But each one of those would become a discussion point of its own, so
> > >>>> it seems the discussion would become intractable. FWIW, I do like your
> > >>>> idea for precisely this reason, it creates opportunities for us to
> > >>>> consider other changes that we are simply not able to make without
> > >>>> breaking source compatibility.
> > >>>>
> > >>>> If the others feel "kind of favorable" with this overall vision, maybe
> > >>>> we can make one or more Jira tickets to capture it, and then just
> > >>>> alter _this_ proposal to `processor.api.Processor` (etc).
> > >>>>
> > >>>> WDYT?
> > >>>> -John
> > >>>>
> > >>>> On Sun, Jun 23, 2019 at 7:17 PM Guozhang Wang <wangg...@gmail.com>
> > >>> wrote:
> > >>>>>
> > >>>>> Hello John,
> > >>>>>
> > >>>>> Thanks for your detailed explanation, I've done some quick checks on
> > >>> some
> > >>>>> existing examples that heavily used Processor and the results also
> > >>> makes
> > >>>> me
> > >>>>> worried about my previous statements that "the breakage would not be
> > >>>> big".
> > >>>>> I agree we should maintain compatibility.
> > >>>>>
> > >>>>> About the naming itself, I'm actually a bit inclined into
> > >> sub-packages
> > >>>> than
> > >>>>> renamed new classes, and my motivations are that our current
> > >> packaging
> > >>> is
> > >>>>> already quite coarsen grained and sometimes ill-placed, and hence
> > >> maybe
> > >>>> we
> > >>>>> can take this change along with some clean up on packages (but again,
> > >>> we
> > >>>>> should follow the deprecate - removal path). What I'm thinking is:
> > >>>>>
> > >>>>> -------------------
> > >>>>>
> > >>>>> processor/: StateRestoreCallback/AbstractNotifyingRestoreCallback,
> > >>>> (deprecated
> > >>>>> later, same meaning for other cross-throughs), ProcessContest,
> > >>>>> RecordContext, Punctuator, PunctuationType, To, Cancellable (are the
> > >>> only
> > >>>>> things left)
> > >>>>>
> > >>>>> (new) processor/api/: Processor, ProcessorSupplier (and of course,
> > >>> these
> > >>>>> two classes can be strong typed)
> > >>>>>
> > >>>>> state/: StateStore, BatchingStateRestoreCallback,
> > >>>>> AbstractNotifyingBatchingRestoreCallback (moved from processor/),
> > >>>>> PartitionGrouper, WindowStoreIterator, StateSerdes (this one can be
> > >>> moved
> > >>>>> into state/internals), TimestampedByteStore (we can move this to
> > >>>> internals
> > >>>>> since store types would use vat by default, see below),
> > >>> ValueAndTimestamp
> > >>>>>
> > >>>>> (new) state/factory/: Stores, StoreBuilder, StoreSupplier; *BUT* the
> > >>> new
> > >>>>> Stores would not have timestampedXXBuilder APIs since the default
> > >>>>> StoreSupplier / StoreBuilder value types are ValueAndTimestamp
> > >> already.
> > >>>>>
> > >>>>> (new) state/queryable/: QueryableStoreType, QueryableStoreTypes,
> > >>> HostInfo
> > >>>>>
> > >>>>> (new) state/keyValue/: KeyValueXXX classes, and also the same for
> > >>>>> state/sessionWindow and state/timeWindow; *BUT* here we use
> > >>>>> ValueAndTimestamp as value types of those APIs directly, and also
> > >>>>> TimestampedKeyValue/WindowStore would be deprecated.
> > >>>>>
> > >>>>> (new) kstream/api/: KStream, KTable, GroupedKStream (renamed from
> > >>>>> KGroupedStream), GroupedKTable (renamed from KGroupedTable),
> > >>>>> TimeWindowedKStream, SessionWindowedKStream, GlobalKTable
> > >>>>>
> > >>>>> (new) kstream/operator/: Aggregator, ForeachFunction,  ... , Merger
> > >> and
> > >>>>> Grouped, Joined, Materialized, ... , Printed and Transformer,
> > >>>>> TransformerSupplier.
> > >>>>>
> > >>>>> (new) kstream/window/: Window, Windows, Windowed, TimeWindows,
> > >>>>> SessionWindows, UnlimitedWindows, JoinWindows, WindowedSerdes,
> > >>>>> Time/SessionWindowedSerialized/Deserializer.
> > >>>>>
> > >>>>> (new) configure/: RocksDBConfigSetter, TopicNameExtractor,
> > >>>>> TimestampExtractor, UsePreviousTimeOnInvalidTimestamp,
> > >>>>> WallclockTimestampExtractor, ExtractRecordMetadataTimestamp,
> > >>>>> FailOnInvalidTimestamp, LogAndSkipOnInvalidTimestamp,
> > >>>> StateRestoreListener,
> > >>>>>
> > >>>>> (new) metadata/: StreamsMetadata, ThreadMetadata, TaskMetadata,
> > >> TaskId
> > >>>>>
> > >>>>> Still, any xxx/internals packages are declared as inner classes, but
> > >>>> other
> > >>>>> xxx/yyy packages are declared as public APIs.
> > >>>>>
> > >>>>> -------------------
> > >>>>>
> > >>>>> This is a very wild thought and I can totally understand if people
> > >> feel
> > >>>>> this is too much since it definitely enlarges the scope of this KIP a
> > >>> lot
> > >>>>> :) just trying to play a devil's advocate here to do major
> > >> refactoring
> > >>>> and
> > >>>>> avoid renaming Processor classes.
> > >>>>>
> > >>>>>
> > >>>>> Guozhang
> > >>>>>
> > >>>>>
> > >>>>> On Fri, Jun 21, 2019 at 9:51 PM Matthias J. Sax <
> > >> matth...@confluent.io
> > >>>>
> > >>>>> wrote:
> > >>>>>>
> > >>>>>> I think `RecordProcessor` is a good name.
> > >>>>>>
> > >>>>>>
> > >>>>>> -Matthias
> > >>>>>>
> > >>>>>> On 6/21/19 5:09 PM, John Roesler wrote:
> > >>>>>>> After kicking the naming around a bit more, it seems like any
> > >>> package
> > >>>>>>> name change is a bit "weird" because it fragments the package and
> > >>>>>>> directory structure. If we can come up with a reasonable name for
> > >>> the
> > >>>>>>> interface after all, it seems like the better choice.
> > >>>>>>>
> > >>>>>>> The real challenge is that the existing name "Processor" seems
> > >> just
> > >>>>>>> about perfect. In picking a new name, we need to consider the
> > >>>> ultimate
> > >>>>>>> state, after the deprecation period, when we entirely remove
> > >>>>>>> Processor. In this context, TypedProcessor seems a little odd to
> > >>> me,
> > >>>>>>> because it seems to imply that there should also be an "untyped
> > >>>>>>> processor".
> > >>>>>>>
> > >>>>>>> After kicking around a few other ideas, what does everyone think
> > >>>> about
> > >>>>>>> "RecordProcessor"? I _think_ maybe it stands on its own just
> > >> fine,
> > >>>>>>> because it's a thing that processes... records?
> > >>>>>>>
> > >>>>>>> If others agree with this, I can change the proposal to
> > >>>> RecordProcessor.
> > >>>>>>>
> > >>>>>>> Thanks,
> > >>>>>>> -John
> > >>>>>>>
> > >>>>>>> On Fri, Jun 21, 2019 at 6:42 PM John Roesler <j...@confluent.io>
> > >>>> wrote:
> > >>>>>>>>
> > >>>>>>>> Hi all,
> > >>>>>>>>
> > >>>>>>>> I've updated the KIP with the feedback so far.
> > >>>>>>>>
> > >>>>>>>> The naming question is still the biggest (only?) outstanding
> > >>> issue.
> > >>>> It
> > >>>>>>>> would be good to hear some more thoughts on it.
> > >>>>>>>>
> > >>>>>>>> As we stand now, there's one vote for changing the package name
> > >> to
> > >>>>>>>> something like 'typedprocessor', one for changing the interface
> > >> to
> > >>>>>>>> TypedProcessor (as in the PoC), and one for just changing the
> > >>>>>>>> Processor interface in-place, breaking source compatibility.
> > >>>>>>>>
> > >>>>>>>> How can we resolve this decision?
> > >>>>>>>>
> > >>>>>>>> Thanks,
> > >>>>>>>> -John
> > >>>>>>>>
> > >>>>>>>> On Thu, Jun 20, 2019 at 5:44 PM John Roesler <j...@confluent.io
> > >>>
> > >>>> wrote:
> > >>>>>>>>>
> > >>>>>>>>> Thanks for the feedback, Guozhang and Matthias,
> > >>>>>>>>>
> > >>>>>>>>> Regarding motivation: I'll update the wiki. Briefly:
> > >>>>>>>>> * Any processor can benefit. Imagine a pure user of the
> > >>>> ProcessorAPI
> > >>>>>>>>> who has very complex processing logic. I have seen several
> > >>>> processor
> > >>>>>>>>> implementation that are hundreds of lines long and call
> > >>>>>>>>> `context.forward` in many different locations and branches. In
> > >>>> such an
> > >>>>>>>>> implementation, it would be very easy to have a bug in a rarely
> > >>>> used
> > >>>>>>>>> branch that forwards the wrong kind of value. This would
> > >>>> structurally
> > >>>>>>>>> prevent that from happening.
> > >>>>>>>>> * Also, anyone who heavily uses the ProcessorAPI would likely
> > >>> have
> > >>>>>>>>> developed helper methods to wire together processors, just as
> > >> we
> > >>>> have
> > >>>>>>>>> in the DSL implementation. This change would enable them to
> > >>> ensure
> > >>>> at
> > >>>>>>>>> compile time that they are actually wiring together compatible
> > >>>> types.
> > >>>>>>>>> This was actually _my_ original motivation, since I found it
> > >> very
> > >>>>>>>>> difficult and time consuming to follow the Streams DSL internal
> > >>>>>>>>> builders.
> > >>>>>>>>>
> > >>>>>>>>> Regarding breaking the source compatibility of Processor: I
> > >> would
> > >>>>>>>>> _love_ to side-step the naming problem, but I really don't know
> > >>> if
> > >>>>>>>>> it's excusable to break compatibility. I suspect that our
> > >> oldest
> > >>>> and
> > >>>>>>>>> dearest friends are using the ProcessorAPI in some form or
> > >>> another,
> > >>>>>>>>> and all their source code would break. It sucks to have to
> > >>> create a
> > >>>>>>>>> whole new interface to get around this, but it feels like the
> > >>> right
> > >>>>>>>>> thing to do. Would be nice to get even more feedback on this
> > >>> point,
> > >>>>>>>>> though.
> > >>>>>>>>>
> > >>>>>>>>> Regarding the types of stores, as I said in my response to
> > >>> Sophie,
> > >>>>>>>>> it's not an issue.
> > >>>>>>>>>
> > >>>>>>>>> Regarding the change to StreamsBuilder, it doesn't pin the
> > >> types
> > >>> in
> > >>>>>>>>> any way, since all the types are bounded by Object only, and
> > >>> there
> > >>>> are
> > >>>>>>>>> no extra constraints between arguments (each type is used only
> > >>>> once in
> > >>>>>>>>> one argument). But maybe I missed the point you were asking
> > >>> about.
> > >>>>>>>>> Since the type takes generic paramters, we should allow users
> > >> to
> > >>>> pass
> > >>>>>>>>> in parameterized arguments. Otherwise, they would _have to_
> > >> give
> > >>>> us a
> > >>>>>>>>> raw type, and they would be forced to get a "rawtyes" warning
> > >>> from
> > >>>> the
> > >>>>>>>>> compiler. So, it's our obligation in any API that accepts a
> > >>>>>>>>> parameterized-type parameter to allow people to actually pass a
> > >>>>>>>>> parameterized type, even if we don't actually use the
> > >> parameters.
> > >>>>>>>>>
> > >>>>>>>>> The naming question is a complex one, as I took pains to detail
> > >>>>>>>>> previously. Please don't just pick out one minor point, call it
> > >>>> weak,
> > >>>>>>>>> and then claim that it invalidates the whole decision. I don't
> > >>>> think
> > >>>>>>>>> there's a clear best choice, so I'm more than happy for someone
> > >>> to
> > >>>>>>>>> advocate for renaming the class instead of the package. Can you
> > >>>>>>>>> provide some reasons why you think that would be better?
> > >>>>>>>>>
> > >>>>>>>>> Regarding the deprecated methods, you're absolutely right. I'll
> > >>>>> update the KIP.
> > >>>>>>>>>
> > >>>>>>>>> Thanks again for all the feedback!
> > >>>>>>>>> -John
> > >>>>>>>>>
> > >>>>>>>>> On Thu, Jun 20, 2019 at 4:34 PM Matthias J. Sax <
> > >>>> matth...@confluent.io>
> > >>>>> wrote:
> > >>>>>>>>>>
> > >>>>>>>>>> Just want to second what Sophie said about the stores. The
> > >> type
> > >>>> of a
> > >>>>>>>>>> used stores is completely independent of input/output types.
> > >>>>>>>>>>
> > >>>>>>>>>> This related to change `addGlobalStore()` method. Why do you
> > >>> want
> > >>>> to
> > >>>>> pin
> > >>>>>>>>>> the types? In fact, people request the ability to filter() and
> > >>>> maybe
> > >>>>>>>>>> even map() the data before they are put into the global store.
> > >>>>> Limiting
> > >>>>>>>>>> the types seems to be a step backward here?
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> Also, the pack name is questionable.
> > >>>>>>>>>>
> > >>>>>>>>>>> This wouldn't be the first project to do something like
> > >> this...
> > >>>>>>>>>>
> > >>>>>>>>>> Not a strong argument. I would actually propose to not a a new
> > >>>>> package,
> > >>>>>>>>>> but just a new class `TypedProcessor`.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> For `ProcessorContext#forward` methods -- some of those
> > >> methods
> > >>>> are
> > >>>>>>>>>> already deprecated. While the will still be affected, it would
> > >>> be
> > >>>>> worth
> > >>>>>>>>>> to mark them as deprecated in the wiki page, too.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> @Guozhang: I dont' think we should break source compatibility
> > >>> in a
> > >>>>> minor
> > >>>>>>>>>> release.
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> -Matthias
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>>>>> On 6/20/19 1:43 PM, Guozhang Wang wrote:
> > >>>>>>>>>>> Hi John,
> > >>>>>>>>>>>
> > >>>>>>>>>>> Thanks for KIP! I've a few comments below:
> > >>>>>>>>>>>
> > >>>>>>>>>>> 1. So far the "Motivation" section is very general, and the
> > >>> only
> > >>>>> concrete
> > >>>>>>>>>>> example that I have in mind is `TransformValues#punctuate`.
> > >> Do
> > >>> we
> > >>>>> have any
> > >>>>>>>>>>> other concrete issues that drive this KIP? If not then I feel
> > >>>>> better to
> > >>>>>>>>>>> narrow the scope of this KIP to:
> > >>>>>>>>>>>
> > >>>>>>>>>>> 1.a) modifying ProcessorContext only with the output types on
> > >>>>> forward.
> > >>>>>>>>>>> 1.b) modifying Transformer signature to have generics of
> > >>>>> ProcessorContext,
> > >>>>>>>>>>> and then lift the restricting of not using punctuate: if user
> > >>> did
> > >>>>> not
> > >>>>>>>>>>> follow the enforced typing and just code without generics,
> > >> they
> > >>>>> will get
> > >>>>>>>>>>> warning at compile time and get run-time error if they
> > >> forward
> > >>>>> wrong-typed
> > >>>>>>>>>>> records, which I think would be acceptable.
> > >>>>>>>>>>>
> > >>>>>>>>>>> I feel this would be a good solution for this specific issue;
> > >>>>> again, feel
> > >>>>>>>>>>> free to update the wiki page with other known issues that
> > >>> cannot
> > >>>> be
> > >>>>>>>>>>> resolved.
> > >>>>>>>>>>>
> > >>>>>>>>>>> 2. If, we want to go with the current scope then my next
> > >>> question
> > >>>>> would be,
> > >>>>>>>>>>> how much breakage we would introducing if we just modify the
> > >>>>> Processor
> > >>>>>>>>>>> signature directly? My feeling is that DSL users would be
> > >> most
> > >>>>> likely not
> > >>>>>>>>>>> affected and PAPI users only need to modify a few lines on
> > >>> class
> > >>>>>>>>>>> declaration. I feel it worth doing some research on this part
> > >>> and
> > >>>>> then
> > >>>>>>>>>>> decide if we really want to bite the bullet of duplicated
> > >>>> Processor
> > >>>>> /
> > >>>>>>>>>>> ProcessorSupplier classes for maintaining compatibility.
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> Guozhang
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>> On Wed, Jun 19, 2019 at 12:21 PM John Roesler <
> > >>> j...@confluent.io
> > >>>>>
> > >>>>> wrote:
> > >>>>>>>>>>>
> > >>>>>>>>>>>> Hi all,
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> In response to the feedback so far, I changed the package
> > >> name
> > >>>> from
> > >>>>>>>>>>>> `processor2` to `processor.generic`.
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> Thanks,
> > >>>>>>>>>>>> -John
> > >>>>>>>>>>>>
> > >>>>>>>>>>>> On Mon, Jun 17, 2019 at 4:49 PM John Roesler <
> > >>> j...@confluent.io
> > >>>>>
> > >>>>> wrote:
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> Thanks for the feedback, Sophie!
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I actually felt a little uneasy when I wrote that remark,
> > >>>> because
> > >>>>> it's
> > >>>>>>>>>>>>> not restricted at all in the API, it's just available to
> > >> you
> > >>> if
> > >>>>> you
> > >>>>>>>>>>>>> choose to give your stores and context the same parameters.
> > >>>> So, I
> > >>>>>>>>>>>>> think your use case is valid, and also perfectly
> > >> permissable
> > >>>>> under the
> > >>>>>>>>>>>>> current KIP. Sorry for sowing confusion on my own
> > >> discussion
> > >>>>> thread!
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I'm not crazy about the package name, either. I went with
> > >> it
> > >>>> only
> > >>>>>>>>>>>>> because there's seemingly nothing special about the new
> > >>> package
> > >>>>> except
> > >>>>>>>>>>>>> that it can't have the same name as the old one. Otherwise,
> > >>> the
> > >>>>>>>>>>>>> existing "processor" and "Processor" names for the package
> > >>> and
> > >>>>> class
> > >>>>>>>>>>>>> are perfectly satisfying. Rather than pile on additional
> > >>>>> semantics, it
> > >>>>>>>>>>>>> seemed cleaner to just add a number to the package name.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> This wouldn't be the first project to do something like
> > >>> this...
> > >>>>> Apache
> > >>>>>>>>>>>>> Commons, for example, has added a "2" to the end of some of
> > >>>> their
> > >>>>>>>>>>>>> packages for exactly the same reason.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> I'm open to any suggestions. For example, we could do
> > >>> something
> > >>>>> like
> > >>>>>>>>>>>>> org.apache.kafka.streams.typedprocessor.Processor or
> > >>>>>>>>>>>>> org.apache.kafka.streams.processor.typed.Processor , which
> > >>>> would
> > >>>>> have
> > >>>>>>>>>>>>> just about the same effect. One microscopic thought is
> > >> that,
> > >>> if
> > >>>>>>>>>>>>> there's another interface in the "processor" package that
> > >> we
> > >>>> wish
> > >>>>> to
> > >>>>>>>>>>>>> do the same thing to, would _could_ pile it in to
> > >>> "processor2",
> > >>>>> but we
> > >>>>>>>>>>>>> couldn't do the same if we use a package that has "typed"
> > >> in
> > >>>> the
> > >>>>> name,
> > >>>>>>>>>>>>> unless that change is _also_ related to types in some way.
> > >>> But
> > >>>>> this
> > >>>>>>>>>>>>> seems like a very minor concern.
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> What's your preference?
> > >>>>>>>>>>>>> -John
> > >>>>>>>>>>>>>
> > >>>>>>>>>>>>> On Mon, Jun 17, 2019 at 3:56 PM Sophie Blee-Goldman <
> > >>>>> sop...@confluent.io>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> Hey John, thanks for writing this up! I like the proposal
> > >>> but
> > >>>>> there's
> > >>>>>>>>>>>> one
> > >>>>>>>>>>>>>> point that I think may be too restrictive:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> "A processor that happens to use a typed store is actually
> > >>>>> emitting the
> > >>>>>>>>>>>>>> same types that it is storing."
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I can imagine someone could want to leverage this new type
> > >>>> safety
> > >>>>>>>>>>>> without
> > >>>>>>>>>>>>>> also limiting how they can interact with/use their store.
> > >> As
> > >>>> an
> > >>>>>>>>>>>> (admittedly
> > >>>>>>>>>>>>>> contrived) example, say you have an input stream of
> > >>> purchases
> > >>>> of
> > >>>>> a
> > >>>>>>>>>>>> certain
> > >>>>>>>>>>>>>> type (entertainment, food, etc), and on seeing a new
> > >> record
> > >>>> you
> > >>>>> want to
> > >>>>>>>>>>>>>> output how many types of purchase a shopper has made more
> > >>>> than 5
> > >>>>>>>>>>>> purchases
> > >>>>>>>>>>>>>> of in the last month. Your state store will probably be
> > >>>> holding
> > >>>>> some
> > >>>>>>>>>>>> more
> > >>>>>>>>>>>>>> complicated PurchaseHistory object (keyed by user), but
> > >> your
> > >>>>> output is
> > >>>>>>>>>>>> just
> > >>>>>>>>>>>>>> a <User, Long>
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I'm also not crazy about "processor2" as the package name
> > >>> ...
> > >>>>> not sure
> > >>>>>>>>>>>> what
> > >>>>>>>>>>>>>> a better one would be though (something with "typed"?)
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> On Mon, Jun 17, 2019 at 12:47 PM John Roesler <
> > >>>> j...@confluent.io>
> > >>>>>>>>>>>> wrote:
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Hi all,
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> I'd like to propose KIP-478 (
> > >>>>>>>>>>>> https://cwiki.apache.org/confluence/x/2SkLBw
> > >>>>>>>>>>>>>>> ).
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> This proposal would add output type bounds to the
> > >> Processor
> > >>>>> interface
> > >>>>>>>>>>>>>>> in Kafka Streams, which enables static checking of a
> > >> number
> > >>>> of
> > >>>>> useful
> > >>>>>>>>>>>>>>> properties:
> > >>>>>>>>>>>>>>> * A processor B that consumes the output of processor A
> > >> is
> > >>>>> actually
> > >>>>>>>>>>>>>>> expecting the same types that processor A produces.
> > >>>>>>>>>>>>>>> * A processor that happens to use a typed store is
> > >> actually
> > >>>>> emitting
> > >>>>>>>>>>>>>>> the same types that it is storing.
> > >>>>>>>>>>>>>>> * A processor is simply forwarding the expected types in
> > >>> all
> > >>>>> code
> > >>>>>>>>>>>> paths.
> > >>>>>>>>>>>>>>> * Processors added via the Streams DSL, which are not
> > >>>> permitted
> > >>>>> to
> > >>>>>>>>>>>>>>> forward results at all are statically prevented from
> > >> doing
> > >>> so
> > >>>>> by the
> > >>>>>>>>>>>>>>> compiler
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Internally, we can use the above properties to achieve a
> > >>> much
> > >>>>> higher
> > >>>>>>>>>>>>>>> level of confidence in the Streams DSL implementation's
> > >>>>> correctness.
> > >>>>>>>>>>>>>>> Actually, while doing the POC, I found a few bugs and
> > >>>> mistakes,
> > >>>>> which
> > >>>>>>>>>>>>>>> become structurally impossible with KIP-478.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Additionally, the stronger types dramatically improve the
> > >>>>>>>>>>>>>>> self-documentation of our Streams internal
> > >> implementations,
> > >>>>> which
> > >>>>>>>>>>>>>>> makes it much easier for new contributors to ramp up with
> > >>>>> confidence.
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>> Thanks so much for your consideration!
> > >>>>>>>>>>>>>>> -John
> > >>>>>>>>>>>>>>>
> > >>>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>>
> > >>>>>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> -- Guozhang
> > >>>>
> > >>>
> > >>>
> > >>> --
> > >>> -- Guozhang
> > >>>
> > >>
> > >
> >

Reply via email to