Hi again, all, I have started the voting thread. Please cast your votes (or voice your objections)! The vote will remain open at least 72 hours. Once it closes, I can send the PR pretty quickly.
Thanks for all you help ironing out the details on this feature. -John On Mon, Jul 15, 2019 at 5:09 PM John Roesler <j...@confluent.io> wrote: > > Hey all, > > It sounds like there's general agreement now on this KIP, so I updated > the KIP to fit in with Guozhang's overall proposed package structure. > Specifically, the proposed name for the new Processor interface is > "org.apache.kafka.streams.processor.api.Processor". > > If there are no objections, then I plan to start the vote tomorrow! > > Thanks, all, for your contributions. > -John > > On Thu, Jul 11, 2019 at 1:50 PM Matthias J. Sax <matth...@confluent.io> wrote: > > > > Side remark: > > > > > Now that "flat transform" is a specific > > >> part of the API it seems okay to steer folks in that direction (to never > > >> use context.process in a transformer), but it should be called out > > >> explicitly in javadocs. Currently Transformer (which is used for both > > >> transform() and flatTransform() ) doesn't really call out the ambiguity: > > > > Would you want to do a PR for address this? We are always eager to > > improve the JavaDocs! > > > > > > -Matthias > > > > On 7/7/19 11:26 AM, Paul Whalen wrote: > > > First of all, +1 on the whole idea, my team has run into (admittedly > > > minor, > > > but definitely annoying) issues because of the weaker typing. We're heavy > > > users of the PAPI and have Processors that, while not hundreds of lines > > > long, are certainly quite hefty and call context.forward() in many places. > > > > > > After reading the KIP and discussion a few times, I've convinced myself > > > that any initial concerns I had aren't really concerns at all (state store > > > types, for one). One thing I will mention: changing *Transformer* to > > > have > > > ProcessorContext<Void, Void> gave me pause, because I have code that does > > > context.forward in transformers. Now that "flat transform" is a specific > > > part of the API it seems okay to steer folks in that direction (to never > > > use context.process in a transformer), but it should be called out > > > explicitly in javadocs. Currently Transformer (which is used for both > > > transform() and flatTransform() ) doesn't really call out the ambiguity: > > > https://github.com/apache/kafka/blob/ca641b3e2e48c14ff308181c775775408f5f35f7/streams/src/main/java/org/apache/kafka/streams/kstream/Transformer.java#L75-L77, > > > and for migrating users (from before flatTransform) it could be confusing. > > > > > > Side note, I'd like to plug KIP-401 (there is a discussion thread and a > > > voting thread) which also relates to using the PAPI. It seems like there > > > is some interest and it is in a votable state with the majority of > > > implementation complete. > > > > > > Paul > > > > > > On Fri, Jun 28, 2019 at 2:02 PM Bill Bejeck <bbej...@gmail.com> wrote: > > > > > >> Sorry for coming late to the party. > > >> > > >> As for the naming I'm in favor of RecordProcessor as well. > > >> > > >> I agree that we should not take on doing all of the package movements as > > >> part of this KIP, especially as John has pointed out, it will be an > > >> opportunity to discuss some clean-up on individual classes which I > > >> envision > > >> becoming another somewhat involved process. > > >> > > >> For the end goal, if possible, here's what I propose. > > >> > > >> 1. We keep the scope of the KIP the same, *but we only implement* *it > > >> in > > >> phases* > > >> 2. Phase one could include what Guozhang had proposed earlier namely > > >> 1. > 1.a) modifying ProcessorContext only with the output types on > > >> forward. > > >> > 1.b) modifying Transformer signature to have generics of > > >> ProcessorContext, > > >> > and then lift the restricting of not using punctuate: if user did > > >> not > > >> > follow the enforced typing and just code without generics, they > > >> will get > > >> > warning at compile time and get run-time error if they forward > > >> wrong-typed > > >> > records, which I think would be acceptable. > > >> 3. Then we could tackle other pieces in an incremental manner as we > > >> see > > >> what makes sense > > >> > > >> Just my 2cents > > >> > > >> -Bill > > >> > > >> On Mon, Jun 24, 2019 at 10:22 PM Guozhang Wang <wangg...@gmail.com> > > >> wrote: > > >> > > >>> Hi John, > > >>> > > >>> Yeah I think we should not do all the repackaging as part of this KIP as > > >>> well (we can just do the movement of the Processor / ProcessorSupplier), > > >>> but I think we need to discuss the end goal here since otherwise we may > > >> do > > >>> the repackaging of Processor in this KIP, but only later on realizing > > >> that > > >>> other re-packagings are not our favorite solutions. > > >>> > > >>> > > >>> Guozhang > > >>> > > >>> On Mon, Jun 24, 2019 at 3:06 PM John Roesler <j...@confluent.io> wrote: > > >>> > > >>>> Hey Guozhang, > > >>>> > > >>>> Thanks for the idea! I'm wondering if we could take a middle ground > > >>>> and take your proposed layout as a "roadmap", while only actually > > >>>> moving the classes that are already involved in this KIP. > > >>>> > > >>>> The reason I ask is not just to control the scope of this KIP, but > > >>>> also, I think that if we move other classes to new packages, we might > > >>>> also want to take the opportunity to clean up other things about them. > > >>>> But each one of those would become a discussion point of its own, so > > >>>> it seems the discussion would become intractable. FWIW, I do like your > > >>>> idea for precisely this reason, it creates opportunities for us to > > >>>> consider other changes that we are simply not able to make without > > >>>> breaking source compatibility. > > >>>> > > >>>> If the others feel "kind of favorable" with this overall vision, maybe > > >>>> we can make one or more Jira tickets to capture it, and then just > > >>>> alter _this_ proposal to `processor.api.Processor` (etc). > > >>>> > > >>>> WDYT? > > >>>> -John > > >>>> > > >>>> On Sun, Jun 23, 2019 at 7:17 PM Guozhang Wang <wangg...@gmail.com> > > >>> wrote: > > >>>>> > > >>>>> Hello John, > > >>>>> > > >>>>> Thanks for your detailed explanation, I've done some quick checks on > > >>> some > > >>>>> existing examples that heavily used Processor and the results also > > >>> makes > > >>>> me > > >>>>> worried about my previous statements that "the breakage would not be > > >>>> big". > > >>>>> I agree we should maintain compatibility. > > >>>>> > > >>>>> About the naming itself, I'm actually a bit inclined into > > >> sub-packages > > >>>> than > > >>>>> renamed new classes, and my motivations are that our current > > >> packaging > > >>> is > > >>>>> already quite coarsen grained and sometimes ill-placed, and hence > > >> maybe > > >>>> we > > >>>>> can take this change along with some clean up on packages (but again, > > >>> we > > >>>>> should follow the deprecate - removal path). What I'm thinking is: > > >>>>> > > >>>>> ------------------- > > >>>>> > > >>>>> processor/: StateRestoreCallback/AbstractNotifyingRestoreCallback, > > >>>> (deprecated > > >>>>> later, same meaning for other cross-throughs), ProcessContest, > > >>>>> RecordContext, Punctuator, PunctuationType, To, Cancellable (are the > > >>> only > > >>>>> things left) > > >>>>> > > >>>>> (new) processor/api/: Processor, ProcessorSupplier (and of course, > > >>> these > > >>>>> two classes can be strong typed) > > >>>>> > > >>>>> state/: StateStore, BatchingStateRestoreCallback, > > >>>>> AbstractNotifyingBatchingRestoreCallback (moved from processor/), > > >>>>> PartitionGrouper, WindowStoreIterator, StateSerdes (this one can be > > >>> moved > > >>>>> into state/internals), TimestampedByteStore (we can move this to > > >>>> internals > > >>>>> since store types would use vat by default, see below), > > >>> ValueAndTimestamp > > >>>>> > > >>>>> (new) state/factory/: Stores, StoreBuilder, StoreSupplier; *BUT* the > > >>> new > > >>>>> Stores would not have timestampedXXBuilder APIs since the default > > >>>>> StoreSupplier / StoreBuilder value types are ValueAndTimestamp > > >> already. > > >>>>> > > >>>>> (new) state/queryable/: QueryableStoreType, QueryableStoreTypes, > > >>> HostInfo > > >>>>> > > >>>>> (new) state/keyValue/: KeyValueXXX classes, and also the same for > > >>>>> state/sessionWindow and state/timeWindow; *BUT* here we use > > >>>>> ValueAndTimestamp as value types of those APIs directly, and also > > >>>>> TimestampedKeyValue/WindowStore would be deprecated. > > >>>>> > > >>>>> (new) kstream/api/: KStream, KTable, GroupedKStream (renamed from > > >>>>> KGroupedStream), GroupedKTable (renamed from KGroupedTable), > > >>>>> TimeWindowedKStream, SessionWindowedKStream, GlobalKTable > > >>>>> > > >>>>> (new) kstream/operator/: Aggregator, ForeachFunction, ... , Merger > > >> and > > >>>>> Grouped, Joined, Materialized, ... , Printed and Transformer, > > >>>>> TransformerSupplier. > > >>>>> > > >>>>> (new) kstream/window/: Window, Windows, Windowed, TimeWindows, > > >>>>> SessionWindows, UnlimitedWindows, JoinWindows, WindowedSerdes, > > >>>>> Time/SessionWindowedSerialized/Deserializer. > > >>>>> > > >>>>> (new) configure/: RocksDBConfigSetter, TopicNameExtractor, > > >>>>> TimestampExtractor, UsePreviousTimeOnInvalidTimestamp, > > >>>>> WallclockTimestampExtractor, ExtractRecordMetadataTimestamp, > > >>>>> FailOnInvalidTimestamp, LogAndSkipOnInvalidTimestamp, > > >>>> StateRestoreListener, > > >>>>> > > >>>>> (new) metadata/: StreamsMetadata, ThreadMetadata, TaskMetadata, > > >> TaskId > > >>>>> > > >>>>> Still, any xxx/internals packages are declared as inner classes, but > > >>>> other > > >>>>> xxx/yyy packages are declared as public APIs. > > >>>>> > > >>>>> ------------------- > > >>>>> > > >>>>> This is a very wild thought and I can totally understand if people > > >> feel > > >>>>> this is too much since it definitely enlarges the scope of this KIP a > > >>> lot > > >>>>> :) just trying to play a devil's advocate here to do major > > >> refactoring > > >>>> and > > >>>>> avoid renaming Processor classes. > > >>>>> > > >>>>> > > >>>>> Guozhang > > >>>>> > > >>>>> > > >>>>> On Fri, Jun 21, 2019 at 9:51 PM Matthias J. Sax < > > >> matth...@confluent.io > > >>>> > > >>>>> wrote: > > >>>>>> > > >>>>>> I think `RecordProcessor` is a good name. > > >>>>>> > > >>>>>> > > >>>>>> -Matthias > > >>>>>> > > >>>>>> On 6/21/19 5:09 PM, John Roesler wrote: > > >>>>>>> After kicking the naming around a bit more, it seems like any > > >>> package > > >>>>>>> name change is a bit "weird" because it fragments the package and > > >>>>>>> directory structure. If we can come up with a reasonable name for > > >>> the > > >>>>>>> interface after all, it seems like the better choice. > > >>>>>>> > > >>>>>>> The real challenge is that the existing name "Processor" seems > > >> just > > >>>>>>> about perfect. In picking a new name, we need to consider the > > >>>> ultimate > > >>>>>>> state, after the deprecation period, when we entirely remove > > >>>>>>> Processor. In this context, TypedProcessor seems a little odd to > > >>> me, > > >>>>>>> because it seems to imply that there should also be an "untyped > > >>>>>>> processor". > > >>>>>>> > > >>>>>>> After kicking around a few other ideas, what does everyone think > > >>>> about > > >>>>>>> "RecordProcessor"? I _think_ maybe it stands on its own just > > >> fine, > > >>>>>>> because it's a thing that processes... records? > > >>>>>>> > > >>>>>>> If others agree with this, I can change the proposal to > > >>>> RecordProcessor. > > >>>>>>> > > >>>>>>> Thanks, > > >>>>>>> -John > > >>>>>>> > > >>>>>>> On Fri, Jun 21, 2019 at 6:42 PM John Roesler <j...@confluent.io> > > >>>> wrote: > > >>>>>>>> > > >>>>>>>> Hi all, > > >>>>>>>> > > >>>>>>>> I've updated the KIP with the feedback so far. > > >>>>>>>> > > >>>>>>>> The naming question is still the biggest (only?) outstanding > > >>> issue. > > >>>> It > > >>>>>>>> would be good to hear some more thoughts on it. > > >>>>>>>> > > >>>>>>>> As we stand now, there's one vote for changing the package name > > >> to > > >>>>>>>> something like 'typedprocessor', one for changing the interface > > >> to > > >>>>>>>> TypedProcessor (as in the PoC), and one for just changing the > > >>>>>>>> Processor interface in-place, breaking source compatibility. > > >>>>>>>> > > >>>>>>>> How can we resolve this decision? > > >>>>>>>> > > >>>>>>>> Thanks, > > >>>>>>>> -John > > >>>>>>>> > > >>>>>>>> On Thu, Jun 20, 2019 at 5:44 PM John Roesler <j...@confluent.io > > >>> > > >>>> wrote: > > >>>>>>>>> > > >>>>>>>>> Thanks for the feedback, Guozhang and Matthias, > > >>>>>>>>> > > >>>>>>>>> Regarding motivation: I'll update the wiki. Briefly: > > >>>>>>>>> * Any processor can benefit. Imagine a pure user of the > > >>>> ProcessorAPI > > >>>>>>>>> who has very complex processing logic. I have seen several > > >>>> processor > > >>>>>>>>> implementation that are hundreds of lines long and call > > >>>>>>>>> `context.forward` in many different locations and branches. In > > >>>> such an > > >>>>>>>>> implementation, it would be very easy to have a bug in a rarely > > >>>> used > > >>>>>>>>> branch that forwards the wrong kind of value. This would > > >>>> structurally > > >>>>>>>>> prevent that from happening. > > >>>>>>>>> * Also, anyone who heavily uses the ProcessorAPI would likely > > >>> have > > >>>>>>>>> developed helper methods to wire together processors, just as > > >> we > > >>>> have > > >>>>>>>>> in the DSL implementation. This change would enable them to > > >>> ensure > > >>>> at > > >>>>>>>>> compile time that they are actually wiring together compatible > > >>>> types. > > >>>>>>>>> This was actually _my_ original motivation, since I found it > > >> very > > >>>>>>>>> difficult and time consuming to follow the Streams DSL internal > > >>>>>>>>> builders. > > >>>>>>>>> > > >>>>>>>>> Regarding breaking the source compatibility of Processor: I > > >> would > > >>>>>>>>> _love_ to side-step the naming problem, but I really don't know > > >>> if > > >>>>>>>>> it's excusable to break compatibility. I suspect that our > > >> oldest > > >>>> and > > >>>>>>>>> dearest friends are using the ProcessorAPI in some form or > > >>> another, > > >>>>>>>>> and all their source code would break. It sucks to have to > > >>> create a > > >>>>>>>>> whole new interface to get around this, but it feels like the > > >>> right > > >>>>>>>>> thing to do. Would be nice to get even more feedback on this > > >>> point, > > >>>>>>>>> though. > > >>>>>>>>> > > >>>>>>>>> Regarding the types of stores, as I said in my response to > > >>> Sophie, > > >>>>>>>>> it's not an issue. > > >>>>>>>>> > > >>>>>>>>> Regarding the change to StreamsBuilder, it doesn't pin the > > >> types > > >>> in > > >>>>>>>>> any way, since all the types are bounded by Object only, and > > >>> there > > >>>> are > > >>>>>>>>> no extra constraints between arguments (each type is used only > > >>>> once in > > >>>>>>>>> one argument). But maybe I missed the point you were asking > > >>> about. > > >>>>>>>>> Since the type takes generic paramters, we should allow users > > >> to > > >>>> pass > > >>>>>>>>> in parameterized arguments. Otherwise, they would _have to_ > > >> give > > >>>> us a > > >>>>>>>>> raw type, and they would be forced to get a "rawtyes" warning > > >>> from > > >>>> the > > >>>>>>>>> compiler. So, it's our obligation in any API that accepts a > > >>>>>>>>> parameterized-type parameter to allow people to actually pass a > > >>>>>>>>> parameterized type, even if we don't actually use the > > >> parameters. > > >>>>>>>>> > > >>>>>>>>> The naming question is a complex one, as I took pains to detail > > >>>>>>>>> previously. Please don't just pick out one minor point, call it > > >>>> weak, > > >>>>>>>>> and then claim that it invalidates the whole decision. I don't > > >>>> think > > >>>>>>>>> there's a clear best choice, so I'm more than happy for someone > > >>> to > > >>>>>>>>> advocate for renaming the class instead of the package. Can you > > >>>>>>>>> provide some reasons why you think that would be better? > > >>>>>>>>> > > >>>>>>>>> Regarding the deprecated methods, you're absolutely right. I'll > > >>>>> update the KIP. > > >>>>>>>>> > > >>>>>>>>> Thanks again for all the feedback! > > >>>>>>>>> -John > > >>>>>>>>> > > >>>>>>>>> On Thu, Jun 20, 2019 at 4:34 PM Matthias J. Sax < > > >>>> matth...@confluent.io> > > >>>>> wrote: > > >>>>>>>>>> > > >>>>>>>>>> Just want to second what Sophie said about the stores. The > > >> type > > >>>> of a > > >>>>>>>>>> used stores is completely independent of input/output types. > > >>>>>>>>>> > > >>>>>>>>>> This related to change `addGlobalStore()` method. Why do you > > >>> want > > >>>> to > > >>>>> pin > > >>>>>>>>>> the types? In fact, people request the ability to filter() and > > >>>> maybe > > >>>>>>>>>> even map() the data before they are put into the global store. > > >>>>> Limiting > > >>>>>>>>>> the types seems to be a step backward here? > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> Also, the pack name is questionable. > > >>>>>>>>>> > > >>>>>>>>>>> This wouldn't be the first project to do something like > > >> this... > > >>>>>>>>>> > > >>>>>>>>>> Not a strong argument. I would actually propose to not a a new > > >>>>> package, > > >>>>>>>>>> but just a new class `TypedProcessor`. > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> For `ProcessorContext#forward` methods -- some of those > > >> methods > > >>>> are > > >>>>>>>>>> already deprecated. While the will still be affected, it would > > >>> be > > >>>>> worth > > >>>>>>>>>> to mark them as deprecated in the wiki page, too. > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> @Guozhang: I dont' think we should break source compatibility > > >>> in a > > >>>>> minor > > >>>>>>>>>> release. > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> -Matthias > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> > > >>>>>>>>>> On 6/20/19 1:43 PM, Guozhang Wang wrote: > > >>>>>>>>>>> Hi John, > > >>>>>>>>>>> > > >>>>>>>>>>> Thanks for KIP! I've a few comments below: > > >>>>>>>>>>> > > >>>>>>>>>>> 1. So far the "Motivation" section is very general, and the > > >>> only > > >>>>> concrete > > >>>>>>>>>>> example that I have in mind is `TransformValues#punctuate`. > > >> Do > > >>> we > > >>>>> have any > > >>>>>>>>>>> other concrete issues that drive this KIP? If not then I feel > > >>>>> better to > > >>>>>>>>>>> narrow the scope of this KIP to: > > >>>>>>>>>>> > > >>>>>>>>>>> 1.a) modifying ProcessorContext only with the output types on > > >>>>> forward. > > >>>>>>>>>>> 1.b) modifying Transformer signature to have generics of > > >>>>> ProcessorContext, > > >>>>>>>>>>> and then lift the restricting of not using punctuate: if user > > >>> did > > >>>>> not > > >>>>>>>>>>> follow the enforced typing and just code without generics, > > >> they > > >>>>> will get > > >>>>>>>>>>> warning at compile time and get run-time error if they > > >> forward > > >>>>> wrong-typed > > >>>>>>>>>>> records, which I think would be acceptable. > > >>>>>>>>>>> > > >>>>>>>>>>> I feel this would be a good solution for this specific issue; > > >>>>> again, feel > > >>>>>>>>>>> free to update the wiki page with other known issues that > > >>> cannot > > >>>> be > > >>>>>>>>>>> resolved. > > >>>>>>>>>>> > > >>>>>>>>>>> 2. If, we want to go with the current scope then my next > > >>> question > > >>>>> would be, > > >>>>>>>>>>> how much breakage we would introducing if we just modify the > > >>>>> Processor > > >>>>>>>>>>> signature directly? My feeling is that DSL users would be > > >> most > > >>>>> likely not > > >>>>>>>>>>> affected and PAPI users only need to modify a few lines on > > >>> class > > >>>>>>>>>>> declaration. I feel it worth doing some research on this part > > >>> and > > >>>>> then > > >>>>>>>>>>> decide if we really want to bite the bullet of duplicated > > >>>> Processor > > >>>>> / > > >>>>>>>>>>> ProcessorSupplier classes for maintaining compatibility. > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> Guozhang > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> On Wed, Jun 19, 2019 at 12:21 PM John Roesler < > > >>> j...@confluent.io > > >>>>> > > >>>>> wrote: > > >>>>>>>>>>> > > >>>>>>>>>>>> Hi all, > > >>>>>>>>>>>> > > >>>>>>>>>>>> In response to the feedback so far, I changed the package > > >> name > > >>>> from > > >>>>>>>>>>>> `processor2` to `processor.generic`. > > >>>>>>>>>>>> > > >>>>>>>>>>>> Thanks, > > >>>>>>>>>>>> -John > > >>>>>>>>>>>> > > >>>>>>>>>>>> On Mon, Jun 17, 2019 at 4:49 PM John Roesler < > > >>> j...@confluent.io > > >>>>> > > >>>>> wrote: > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> Thanks for the feedback, Sophie! > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> I actually felt a little uneasy when I wrote that remark, > > >>>> because > > >>>>> it's > > >>>>>>>>>>>>> not restricted at all in the API, it's just available to > > >> you > > >>> if > > >>>>> you > > >>>>>>>>>>>>> choose to give your stores and context the same parameters. > > >>>> So, I > > >>>>>>>>>>>>> think your use case is valid, and also perfectly > > >> permissable > > >>>>> under the > > >>>>>>>>>>>>> current KIP. Sorry for sowing confusion on my own > > >> discussion > > >>>>> thread! > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> I'm not crazy about the package name, either. I went with > > >> it > > >>>> only > > >>>>>>>>>>>>> because there's seemingly nothing special about the new > > >>> package > > >>>>> except > > >>>>>>>>>>>>> that it can't have the same name as the old one. Otherwise, > > >>> the > > >>>>>>>>>>>>> existing "processor" and "Processor" names for the package > > >>> and > > >>>>> class > > >>>>>>>>>>>>> are perfectly satisfying. Rather than pile on additional > > >>>>> semantics, it > > >>>>>>>>>>>>> seemed cleaner to just add a number to the package name. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> This wouldn't be the first project to do something like > > >>> this... > > >>>>> Apache > > >>>>>>>>>>>>> Commons, for example, has added a "2" to the end of some of > > >>>> their > > >>>>>>>>>>>>> packages for exactly the same reason. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> I'm open to any suggestions. For example, we could do > > >>> something > > >>>>> like > > >>>>>>>>>>>>> org.apache.kafka.streams.typedprocessor.Processor or > > >>>>>>>>>>>>> org.apache.kafka.streams.processor.typed.Processor , which > > >>>> would > > >>>>> have > > >>>>>>>>>>>>> just about the same effect. One microscopic thought is > > >> that, > > >>> if > > >>>>>>>>>>>>> there's another interface in the "processor" package that > > >> we > > >>>> wish > > >>>>> to > > >>>>>>>>>>>>> do the same thing to, would _could_ pile it in to > > >>> "processor2", > > >>>>> but we > > >>>>>>>>>>>>> couldn't do the same if we use a package that has "typed" > > >> in > > >>>> the > > >>>>> name, > > >>>>>>>>>>>>> unless that change is _also_ related to types in some way. > > >>> But > > >>>>> this > > >>>>>>>>>>>>> seems like a very minor concern. > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> What's your preference? > > >>>>>>>>>>>>> -John > > >>>>>>>>>>>>> > > >>>>>>>>>>>>> On Mon, Jun 17, 2019 at 3:56 PM Sophie Blee-Goldman < > > >>>>> sop...@confluent.io> > > >>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> Hey John, thanks for writing this up! I like the proposal > > >>> but > > >>>>> there's > > >>>>>>>>>>>> one > > >>>>>>>>>>>>>> point that I think may be too restrictive: > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> "A processor that happens to use a typed store is actually > > >>>>> emitting the > > >>>>>>>>>>>>>> same types that it is storing." > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> I can imagine someone could want to leverage this new type > > >>>> safety > > >>>>>>>>>>>> without > > >>>>>>>>>>>>>> also limiting how they can interact with/use their store. > > >> As > > >>>> an > > >>>>>>>>>>>> (admittedly > > >>>>>>>>>>>>>> contrived) example, say you have an input stream of > > >>> purchases > > >>>> of > > >>>>> a > > >>>>>>>>>>>> certain > > >>>>>>>>>>>>>> type (entertainment, food, etc), and on seeing a new > > >> record > > >>>> you > > >>>>> want to > > >>>>>>>>>>>>>> output how many types of purchase a shopper has made more > > >>>> than 5 > > >>>>>>>>>>>> purchases > > >>>>>>>>>>>>>> of in the last month. Your state store will probably be > > >>>> holding > > >>>>> some > > >>>>>>>>>>>> more > > >>>>>>>>>>>>>> complicated PurchaseHistory object (keyed by user), but > > >> your > > >>>>> output is > > >>>>>>>>>>>> just > > >>>>>>>>>>>>>> a <User, Long> > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> I'm also not crazy about "processor2" as the package name > > >>> ... > > >>>>> not sure > > >>>>>>>>>>>> what > > >>>>>>>>>>>>>> a better one would be though (something with "typed"?) > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>> On Mon, Jun 17, 2019 at 12:47 PM John Roesler < > > >>>> j...@confluent.io> > > >>>>>>>>>>>> wrote: > > >>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> Hi all, > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> I'd like to propose KIP-478 ( > > >>>>>>>>>>>> https://cwiki.apache.org/confluence/x/2SkLBw > > >>>>>>>>>>>>>>> ). > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> This proposal would add output type bounds to the > > >> Processor > > >>>>> interface > > >>>>>>>>>>>>>>> in Kafka Streams, which enables static checking of a > > >> number > > >>>> of > > >>>>> useful > > >>>>>>>>>>>>>>> properties: > > >>>>>>>>>>>>>>> * A processor B that consumes the output of processor A > > >> is > > >>>>> actually > > >>>>>>>>>>>>>>> expecting the same types that processor A produces. > > >>>>>>>>>>>>>>> * A processor that happens to use a typed store is > > >> actually > > >>>>> emitting > > >>>>>>>>>>>>>>> the same types that it is storing. > > >>>>>>>>>>>>>>> * A processor is simply forwarding the expected types in > > >>> all > > >>>>> code > > >>>>>>>>>>>> paths. > > >>>>>>>>>>>>>>> * Processors added via the Streams DSL, which are not > > >>>> permitted > > >>>>> to > > >>>>>>>>>>>>>>> forward results at all are statically prevented from > > >> doing > > >>> so > > >>>>> by the > > >>>>>>>>>>>>>>> compiler > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> Internally, we can use the above properties to achieve a > > >>> much > > >>>>> higher > > >>>>>>>>>>>>>>> level of confidence in the Streams DSL implementation's > > >>>>> correctness. > > >>>>>>>>>>>>>>> Actually, while doing the POC, I found a few bugs and > > >>>> mistakes, > > >>>>> which > > >>>>>>>>>>>>>>> become structurally impossible with KIP-478. > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> Additionally, the stronger types dramatically improve the > > >>>>>>>>>>>>>>> self-documentation of our Streams internal > > >> implementations, > > >>>>> which > > >>>>>>>>>>>>>>> makes it much easier for new contributors to ramp up with > > >>>>> confidence. > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>>>>> Thanks so much for your consideration! > > >>>>>>>>>>>>>>> -John > > >>>>>>>>>>>>>>> > > >>>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>>> > > >>>>>>>>>> > > >>>>>> > > >>>>> > > >>>>> > > >>>>> -- > > >>>>> -- Guozhang > > >>>> > > >>> > > >>> > > >>> -- > > >>> -- Guozhang > > >>> > > >> > > > > >