Thanks for the details! Also talked to Guozhang about a potential upgrade path. This KIP seems not to put us into an bad position to provide a clean upgrade path if we change the `ProcessorContext` in the future.
Thus, I think we can move forward. -Matthias On 7/24/19 3:32 PM, John Roesler wrote: > Hey again Matthias, > > I think it might help to frame the evaluation of the Context question if we > have a "spitball" proposal for what change we might want to make to the > context. > > Currently, the ProcessorContext is referenced in the following public > interfaces: > > org.apache.kafka.streams.errors.DeserializationExceptionHandler#handle > org.apache.kafka.streams.kstream.Transformer#init > org.apache.kafka.streams.kstream.ValueTransformer#init > org.apache.kafka.streams.kstream.ValueTransformerWithKey#init > org.apache.kafka.streams.processor.Processor#init > org.apache.kafka.streams.processor.StateStore#init > > We can sub-divide the ProcessorContext into broad categories: > General Information: > * a handle on the config > * information about the execution context (what is the task id, the > application id, etc) > Record Information: > * extra information about the current record > Store Support: > * the ability to register state restore callbacks > Processor Support: > * the ability to schedule punctuations > * the ability to get registered state stores > * the ability to schedule punctuations > * the ability to forward records > * the ability to request commits > > We could imagine slicing the Processor Context into four new component > interfaces, and making ProcessorContext just implement them. Then, we could > mix-and-match those new component interfaces for use elsewhere. > > E.g.,: > org.apache.kafka.streams.errors.DeserializationExceptionHandler#handle > * only gets the informational context > > org.apache.kafka.streams.kstream.Transformer#init > org.apache.kafka.streams.kstream.ValueTransformer#init > org.apache.kafka.streams.kstream.ValueTransformerWithKey#init > * information context > * the ability to get registered state stores > Also > * the ability to schedule punctuations > * restricted ability to forward (only obeying the rules of the particular > interface, for example) > Or maybe just: > * no ability to foraward > * the ability to schedule special punctuators that can return one or more > keys or values when fired > > org.apache.kafka.streams.processor.Processor#init > * all the contexts, except the ability to register state restore callbacks > > org.apache.kafka.streams.processor.StateStore#init > * information contexts > * the ability to register state restore callbacks > * maybe punctuations and forwards, could be discussed further > > > The operative question for us right now is whether there is a clean path to > something like this from the current KIP, or whether we'd be forced to > deprecate an interface we are only just now adding. Note that the only > interfaces we're adding right now are : > * org.apache.kafka.streams.processor.api.Processor > * org.apache.kafka.streams.processor.api.ProcessorSupplier > And the only thing we need to make the above spitball proposal compatible > with these proposed interfaces is to deprecate the ability to register > state restore callbacks from the ProcessorContext. > > Otherwise, we would at that time be able to propose new Transformer > interfaces that take (e.g.) TransformerContexts, likewise with > DeserializationExceptionHandler and StateStore. > > In other words, I _think_ that we have a clean migration path to address > the Context problem in follow-on work. But clearly my motivation may be > corrupt. What do you think? > > Thanks, > -John > > > On Wed, Jul 24, 2019 at 5:06 PM John Roesler <j...@confluent.io> wrote: > >> Hey Matthias, >> >> I agree, it's worth double-checking to make sure that the upgrade path >> would be smooth. There's no point in putting ourselves in an awkward jam. >> I'll look into it and report back. >> >> Regarding the global store logic, I confirmed that the "state update >> processor" shouldn't be forwarding anything, so we can safely bound its >> output type to `Void`. I've updated the KIP. >> >> Thanks, >> -John >> >> On Wed, Jul 24, 2019 at 3:08 PM Matthias J. Sax <matth...@confluent.io> >> wrote: >> >>> If we don't fix the `ProcessorContext` now, how would an upgrade path >>> look like? >>> >>> We woudl deprecate existing `init()` and add a new `init()`, and during >>> runtime need to call both? This sound rather error prone to me and might >>> be confusing to users? Hence, it might be beneficial to fix it right now. >>> >>> If my concerns are not valid, and we think that the upgrade path will >>> smooth, we can of course do a follow up KIP. Another possibility would >>> be, to still do an extra KIP but ensure that both KIPs are contained in >>> the same release. >>> >>> WDYT? >>> >>> >>> -Matthias >>> >>> On 7/24/19 11:55 AM, John Roesler wrote: >>>> Hey Matthias, >>>> >>>> Thanks for the review! >>>> >>>> I agree about ProcessorContext, it could certainly be split up to >>> improve >>>> compile-time clues about what is or is not permitted (like, do you just >>>> want to be able to see the extra record context vs. forawrding vs. >>>> registering state stores, as you said). But, similar to the ideas around >>>> transforms, we can hopefully make that a separate design effort outside >>> of >>>> this KIP. Is that ok with you? >>>> >>>> Note that, unlike the current Processor API, KIP-478 proposes to >>> provide a >>>> default no-op implementation of init(), which means we can deprecate it >>>> later and replace it with one taking a cleaner "context" abstraction, as >>>> you proposed. >>>> >>>> It's just that the typing change as proposed is already a very large >>> design >>>> and implementation scope. I fear that adding in new flavors of >>>> ProcessorContext would make is much harder to actually consider the >>> design, >>>> and certainly stretch out the implementation phase as well. >>>> >>>> Regarding the documentation of non-goals, that's very good feedback. >>> I'll >>>> update the KIP. >>>> >>>> Regarding addGlobalStore... I'll look into it. >>>> >>>> Thanks! >>>> -John >>>> >>>> >>>> >>>> On Wed, Jul 24, 2019 at 12:27 PM Matthias J. Sax <matth...@confluent.io >>>> >>>> wrote: >>>> >>>>> I have concerns about the latest proposal from Guozhang. However, as >>>>> John said it's beyond the scope of this KIP and thus, I don't go into >>>>> details. I agree thought, that the current "transformer APIs" are not >>>>> ideal and could be improved. >>>>> >>>>> >>>>> An orthogonal though is that we should split the current >>>>> `ProcessorContext` into multiple interfaces. Atm, the context can be >>> use >>>>> to: >>>>> >>>>> - access metadata >>>>> - schedule punctuation >>>>> - get state stores >>>>> - register state stores >>>>> - forward output data >>>>> >>>>> (1) registering state stores is only required if one implements a >>> custom >>>>> store, but not for a regular `Processor` implementation -- hence, it's >>> a >>>>> leaking abstraction >>>>> >>>>> (2) for `ValueTransformer` and `flatValueTransformer` we don't want to >>>>> allow forwarding key-value pairs, and hence need to throw an RTE for >>>>> this case atm >>>>> >>>>> (3) Why do we expose `keySerde()`, `valueSerde()`, and `stateDir()` >>>>> explicitly? We have already `appConfigs()` to allow users to access the >>>>> configuration. >>>>> >>>>> Overall, it seems that `ProcessorContext` is rather convoluted. >>> Because, >>>>> we add a new `Processor` abstraction, it seems like a good opportunity >>>>> to improve the interface and to not pass `ProcessroContext` into the >>> new >>>>> `Processor#init()` method, but an improved interface. >>>>> >>>>> Thoughts? >>>>> >>>>> >>>>> >>>>> One more nits about the KIP: >>>>> >>>>> I think, we should clearly state, that this change does not provide >>> type >>>>> safety for PAPI users. The following example would compile without any >>>>> errors or warning, even if the types don't match: >>>>> >>>>>> Topology t = new Topology(); >>>>>> t.addSource("s", ...); >>>>>> t.addProcessor("p1", new ProcessorSupplier<KIn, VIn, FooKey, >>>>> BarValue>()..., "s"); >>>>>> t.addProcessor("p2", new ProcessorSupplier<NotFooKey, NotBarValue, >>> KOut, >>>>> VOut>()..., "p1"); >>>>> >>>>> Just want to make sure users understand the impact/scope of the change, >>>>> especially what is _not_ achieved. >>>>> >>>>> >>>>> About `addGlobalStore()` -- should the return types be `Void` similar >>> to >>>>> `KStream#process()`? >>>>> >>>>> >>>>> >>>>> -Matthias >>>>> >>>>> >>>>> On 7/24/19 9:11 AM, Guozhang Wang wrote: >>>>>> Sounds good to me, thanks John! >>>>>> >>>>>> >>>>>> Guozhang >>>>>> >>>>>> On Wed, Jul 24, 2019 at 7:40 AM John Roesler <j...@confluent.io> >>> wrote: >>>>>> >>>>>>> Hey Guozhang, >>>>>>> >>>>>>> Thanks for the thought! It sounds related to what I was thinking in >>>>>>> https://issues.apache.org/jira/browse/KAFKA-8396 , but a little >>>>> "extra"... >>>>>>> >>>>>>> I proposed to eliminate ValueTransformer, but I believe you're >>> right; we >>>>>>> could eliminate Transformer also and just use Processor in the >>>>> transform() >>>>>>> methods. >>>>>>> >>>>>>> To your first bullet, regarding transform/flatTransform... I'd argue >>>>> that >>>>>>> the difference isn't material and if we switch to just using >>>>>>> context.forward instead of returns, then we just need one and people >>> can >>>>>>> call forward as much as they want. It certainly warrants further >>>>>>> discussion, though... >>>>>>> >>>>>>> To the second point, yes, I'm thinking that we can eschew the >>>>>>> ValueTransformer and instead do something like ignore the forwarded >>> key >>>>> or >>>>>>> check the key for serial identity, etc. >>>>>>> >>>>>>> The ultimate advantage of these ideas is that we reduce the number of >>>>>>> interface variants and we also give people just one way to pass >>> values >>>>>>> forward instead of two. >>>>>>> >>>>>>> Of course, it's beyond the scope of this KIP, but this KIP is a >>>>>>> precondition for these further improvements. >>>>>>> >>>>>>> I'm copying your comment onto the ticket for posterity. >>>>>>> >>>>>>> Thanks! >>>>>>> -John >>>>>>> >>>>>>> On Tue, Jul 23, 2019 at 5:38 PM Guozhang Wang <wangg...@gmail.com> >>>>> wrote: >>>>>>> >>>>>>>> Hi John, >>>>>>>> >>>>>>>> Just a wild thought about Transformer: now with the new >>> Processor<KIn, >>>>>>>> KOut, VIn, VOut>#init(ProcessorContext<KOut, VOut>), do we still >>> need a >>>>>>>> Transformer (and even ValueTransformer / ValueTransformerWithKey)? >>>>>>>> >>>>>>>> What if: >>>>>>>> >>>>>>>> * We just make KStream#transform to get a ProcessorSupplier as well, >>>>> and >>>>>>>> inside `process()` we check that at most one `context.forward()` is >>>>>>> called, >>>>>>>> and then take it as the return value. >>>>>>>> * We would still use ValueTransformer for KStream#transformValue, >>> or we >>>>>>> can >>>>>>>> also use a `ProcessorSupplier where we allow at most one >>>>>>>> `context.forward()` AND we ignore whatever passed in as key but just >>>>> use >>>>>>>> the original key. >>>>>>>> >>>>>>>> >>>>>>>> Guozhang >>>>>>>> >>>>>>>> >>>>>>>> On Tue, Jul 16, 2019 at 9:03 AM John Roesler <j...@confluent.io> >>>>> wrote: >>>>>>>> >>>>>>>>> Hi again, all, >>>>>>>>> >>>>>>>>> I have started the voting thread. Please cast your votes (or voice >>>>>>>>> your objections)! The vote will remain open at least 72 hours. >>> Once it >>>>>>>>> closes, I can send the PR pretty quickly. >>>>>>>>> >>>>>>>>> Thanks for all you help ironing out the details on this feature. >>>>>>>>> -John >>>>>>>>> >>>>>>>>> On Mon, Jul 15, 2019 at 5:09 PM John Roesler <j...@confluent.io> >>>>>>> wrote: >>>>>>>>>> >>>>>>>>>> Hey all, >>>>>>>>>> >>>>>>>>>> It sounds like there's general agreement now on this KIP, so I >>>>>>> updated >>>>>>>>>> the KIP to fit in with Guozhang's overall proposed package >>> structure. >>>>>>>>>> Specifically, the proposed name for the new Processor interface is >>>>>>>>>> "org.apache.kafka.streams.processor.api.Processor". >>>>>>>>>> >>>>>>>>>> If there are no objections, then I plan to start the vote >>> tomorrow! >>>>>>>>>> >>>>>>>>>> Thanks, all, for your contributions. >>>>>>>>>> -John >>>>>>>>>> >>>>>>>>>> On Thu, Jul 11, 2019 at 1:50 PM Matthias J. Sax < >>>>>>> matth...@confluent.io >>>>>>>>> >>>>>>>>> wrote: >>>>>>>>>>> >>>>>>>>>>> Side remark: >>>>>>>>>>> >>>>>>>>>>>> Now that "flat transform" is a specific >>>>>>>>>>>>> part of the API it seems okay to steer folks in that direction >>>>>>> (to >>>>>>>>> never >>>>>>>>>>>>> use context.process in a transformer), but it should be called >>>>>>> out >>>>>>>>>>>>> explicitly in javadocs. Currently Transformer (which is used >>>>>>> for >>>>>>>>> both >>>>>>>>>>>>> transform() and flatTransform() ) doesn't really call out the >>>>>>>>> ambiguity: >>>>>>>>>>> >>>>>>>>>>> Would you want to do a PR for address this? We are always eager >>> to >>>>>>>>>>> improve the JavaDocs! >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> -Matthias >>>>>>>>>>> >>>>>>>>>>> On 7/7/19 11:26 AM, Paul Whalen wrote: >>>>>>>>>>>> First of all, +1 on the whole idea, my team has run into >>>>>>>> (admittedly >>>>>>>>> minor, >>>>>>>>>>>> but definitely annoying) issues because of the weaker typing. >>>>>>>> We're >>>>>>>>> heavy >>>>>>>>>>>> users of the PAPI and have Processors that, while not hundreds >>> of >>>>>>>>> lines >>>>>>>>>>>> long, are certainly quite hefty and call context.forward() in >>>>>>> many >>>>>>>>> places. >>>>>>>>>>>> >>>>>>>>>>>> After reading the KIP and discussion a few times, I've convinced >>>>>>>>> myself >>>>>>>>>>>> that any initial concerns I had aren't really concerns at all >>>>>>>> (state >>>>>>>>> store >>>>>>>>>>>> types, for one). One thing I will mention: changing >>>>>>> *Transformer* >>>>>>>>> to have >>>>>>>>>>>> ProcessorContext<Void, Void> gave me pause, because I have code >>>>>>>> that >>>>>>>>> does >>>>>>>>>>>> context.forward in transformers. Now that "flat transform" is a >>>>>>>>> specific >>>>>>>>>>>> part of the API it seems okay to steer folks in that direction >>>>>>> (to >>>>>>>>> never >>>>>>>>>>>> use context.process in a transformer), but it should be called >>>>>>> out >>>>>>>>>>>> explicitly in javadocs. Currently Transformer (which is used >>> for >>>>>>>>> both >>>>>>>>>>>> transform() and flatTransform() ) doesn't really call out the >>>>>>>>> ambiguity: >>>>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>> >>> https://github.com/apache/kafka/blob/ca641b3e2e48c14ff308181c775775408f5f35f7/streams/src/main/java/org/apache/kafka/streams/kstream/Transformer.java#L75-L77 >>>>>>>>> , >>>>>>>>>>>> and for migrating users (from before flatTransform) it could be >>>>>>>>> confusing. >>>>>>>>>>>> >>>>>>>>>>>> Side note, I'd like to plug KIP-401 (there is a discussion >>> thread >>>>>>>>> and a >>>>>>>>>>>> voting thread) which also relates to using the PAPI. It seems >>>>>>> like >>>>>>>>> there >>>>>>>>>>>> is some interest and it is in a votable state with the majority >>>>>>> of >>>>>>>>>>>> implementation complete. >>>>>>>>>>>> >>>>>>>>>>>> Paul >>>>>>>>>>>> >>>>>>>>>>>> On Fri, Jun 28, 2019 at 2:02 PM Bill Bejeck <bbej...@gmail.com> >>>>>>>>> wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Sorry for coming late to the party. >>>>>>>>>>>>> >>>>>>>>>>>>> As for the naming I'm in favor of RecordProcessor as well. >>>>>>>>>>>>> >>>>>>>>>>>>> I agree that we should not take on doing all of the package >>>>>>>>> movements as >>>>>>>>>>>>> part of this KIP, especially as John has pointed out, it will >>> be >>>>>>>> an >>>>>>>>>>>>> opportunity to discuss some clean-up on individual classes >>>>>>> which I >>>>>>>>> envision >>>>>>>>>>>>> becoming another somewhat involved process. >>>>>>>>>>>>> >>>>>>>>>>>>> For the end goal, if possible, here's what I propose. >>>>>>>>>>>>> >>>>>>>>>>>>> 1. We keep the scope of the KIP the same, *but we only >>>>>>>>> implement* *it in >>>>>>>>>>>>> phases* >>>>>>>>>>>>> 2. Phase one could include what Guozhang had proposed >>> earlier >>>>>>>>> namely >>>>>>>>>>>>> 1. > 1.a) modifying ProcessorContext only with the output >>>>>>> types >>>>>>>>> on >>>>>>>>>>>>> forward. >>>>>>>>>>>>> > 1.b) modifying Transformer signature to have generics >>> of >>>>>>>>>>>>> ProcessorContext, >>>>>>>>>>>>> > and then lift the restricting of not using punctuate: >>> if >>>>>>>>> user did >>>>>>>>>>>>> not >>>>>>>>>>>>> > follow the enforced typing and just code without >>>>>>> generics, >>>>>>>>> they >>>>>>>>>>>>> will get >>>>>>>>>>>>> > warning at compile time and get run-time error if they >>>>>>>>> forward >>>>>>>>>>>>> wrong-typed >>>>>>>>>>>>> > records, which I think would be acceptable. >>>>>>>>>>>>> 3. Then we could tackle other pieces in an incremental >>> manner >>>>>>>> as >>>>>>>>> we see >>>>>>>>>>>>> what makes sense >>>>>>>>>>>>> >>>>>>>>>>>>> Just my 2cents >>>>>>>>>>>>> >>>>>>>>>>>>> -Bill >>>>>>>>>>>>> >>>>>>>>>>>>> On Mon, Jun 24, 2019 at 10:22 PM Guozhang Wang < >>>>>>>> wangg...@gmail.com> >>>>>>>>> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Hi John, >>>>>>>>>>>>>> >>>>>>>>>>>>>> Yeah I think we should not do all the repackaging as part of >>>>>>> this >>>>>>>>> KIP as >>>>>>>>>>>>>> well (we can just do the movement of the Processor / >>>>>>>>> ProcessorSupplier), >>>>>>>>>>>>>> but I think we need to discuss the end goal here since >>>>>>> otherwise >>>>>>>>> we may >>>>>>>>>>>>> do >>>>>>>>>>>>>> the repackaging of Processor in this KIP, but only later on >>>>>>>>> realizing >>>>>>>>>>>>> that >>>>>>>>>>>>>> other re-packagings are not our favorite solutions. >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Guozhang >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Mon, Jun 24, 2019 at 3:06 PM John Roesler < >>>>>>> j...@confluent.io> >>>>>>>>> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hey Guozhang, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks for the idea! I'm wondering if we could take a middle >>>>>>>>> ground >>>>>>>>>>>>>>> and take your proposed layout as a "roadmap", while only >>>>>>>> actually >>>>>>>>>>>>>>> moving the classes that are already involved in this KIP. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> The reason I ask is not just to control the scope of this >>> KIP, >>>>>>>> but >>>>>>>>>>>>>>> also, I think that if we move other classes to new packages, >>>>>>> we >>>>>>>>> might >>>>>>>>>>>>>>> also want to take the opportunity to clean up other things >>>>>>> about >>>>>>>>> them. >>>>>>>>>>>>>>> But each one of those would become a discussion point of its >>>>>>>> own, >>>>>>>>> so >>>>>>>>>>>>>>> it seems the discussion would become intractable. FWIW, I do >>>>>>>> like >>>>>>>>> your >>>>>>>>>>>>>>> idea for precisely this reason, it creates opportunities for >>>>>>> us >>>>>>>> to >>>>>>>>>>>>>>> consider other changes that we are simply not able to make >>>>>>>> without >>>>>>>>>>>>>>> breaking source compatibility. >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> If the others feel "kind of favorable" with this overall >>>>>>> vision, >>>>>>>>> maybe >>>>>>>>>>>>>>> we can make one or more Jira tickets to capture it, and then >>>>>>>> just >>>>>>>>>>>>>>> alter _this_ proposal to `processor.api.Processor` (etc). >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> WDYT? >>>>>>>>>>>>>>> -John >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On Sun, Jun 23, 2019 at 7:17 PM Guozhang Wang < >>>>>>>> wangg...@gmail.com >>>>>>>>>> >>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Hello John, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Thanks for your detailed explanation, I've done some quick >>>>>>>>> checks on >>>>>>>>>>>>>> some >>>>>>>>>>>>>>>> existing examples that heavily used Processor and the >>> results >>>>>>>>> also >>>>>>>>>>>>>> makes >>>>>>>>>>>>>>> me >>>>>>>>>>>>>>>> worried about my previous statements that "the breakage >>> would >>>>>>>>> not be >>>>>>>>>>>>>>> big". >>>>>>>>>>>>>>>> I agree we should maintain compatibility. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> About the naming itself, I'm actually a bit inclined into >>>>>>>>>>>>> sub-packages >>>>>>>>>>>>>>> than >>>>>>>>>>>>>>>> renamed new classes, and my motivations are that our current >>>>>>>>>>>>> packaging >>>>>>>>>>>>>> is >>>>>>>>>>>>>>>> already quite coarsen grained and sometimes ill-placed, and >>>>>>>> hence >>>>>>>>>>>>> maybe >>>>>>>>>>>>>>> we >>>>>>>>>>>>>>>> can take this change along with some clean up on packages >>>>>>> (but >>>>>>>>> again, >>>>>>>>>>>>>> we >>>>>>>>>>>>>>>> should follow the deprecate - removal path). What I'm >>>>>>> thinking >>>>>>>>> is: >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ------------------- >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> processor/: >>>>>>>>> StateRestoreCallback/AbstractNotifyingRestoreCallback, >>>>>>>>>>>>>>> (deprecated >>>>>>>>>>>>>>>> later, same meaning for other cross-throughs), >>>>>>> ProcessContest, >>>>>>>>>>>>>>>> RecordContext, Punctuator, PunctuationType, To, Cancellable >>>>>>>> (are >>>>>>>>> the >>>>>>>>>>>>>> only >>>>>>>>>>>>>>>> things left) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> (new) processor/api/: Processor, ProcessorSupplier (and of >>>>>>>>> course, >>>>>>>>>>>>>> these >>>>>>>>>>>>>>>> two classes can be strong typed) >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> state/: StateStore, BatchingStateRestoreCallback, >>>>>>>>>>>>>>>> AbstractNotifyingBatchingRestoreCallback (moved from >>>>>>>> processor/), >>>>>>>>>>>>>>>> PartitionGrouper, WindowStoreIterator, StateSerdes (this one >>>>>>>> can >>>>>>>>> be >>>>>>>>>>>>>> moved >>>>>>>>>>>>>>>> into state/internals), TimestampedByteStore (we can move >>> this >>>>>>>> to >>>>>>>>>>>>>>> internals >>>>>>>>>>>>>>>> since store types would use vat by default, see below), >>>>>>>>>>>>>> ValueAndTimestamp >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> (new) state/factory/: Stores, StoreBuilder, StoreSupplier; >>>>>>>> *BUT* >>>>>>>>> the >>>>>>>>>>>>>> new >>>>>>>>>>>>>>>> Stores would not have timestampedXXBuilder APIs since the >>>>>>>> default >>>>>>>>>>>>>>>> StoreSupplier / StoreBuilder value types are >>>>>>> ValueAndTimestamp >>>>>>>>>>>>> already. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> (new) state/queryable/: QueryableStoreType, >>>>>>>> QueryableStoreTypes, >>>>>>>>>>>>>> HostInfo >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> (new) state/keyValue/: KeyValueXXX classes, and also the >>> same >>>>>>>> for >>>>>>>>>>>>>>>> state/sessionWindow and state/timeWindow; *BUT* here we use >>>>>>>>>>>>>>>> ValueAndTimestamp as value types of those APIs directly, and >>>>>>>> also >>>>>>>>>>>>>>>> TimestampedKeyValue/WindowStore would be deprecated. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> (new) kstream/api/: KStream, KTable, GroupedKStream (renamed >>>>>>>> from >>>>>>>>>>>>>>>> KGroupedStream), GroupedKTable (renamed from KGroupedTable), >>>>>>>>>>>>>>>> TimeWindowedKStream, SessionWindowedKStream, GlobalKTable >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> (new) kstream/operator/: Aggregator, ForeachFunction, ... , >>>>>>>>> Merger >>>>>>>>>>>>> and >>>>>>>>>>>>>>>> Grouped, Joined, Materialized, ... , Printed and >>> Transformer, >>>>>>>>>>>>>>>> TransformerSupplier. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> (new) kstream/window/: Window, Windows, Windowed, >>>>>>> TimeWindows, >>>>>>>>>>>>>>>> SessionWindows, UnlimitedWindows, JoinWindows, >>>>>>> WindowedSerdes, >>>>>>>>>>>>>>>> Time/SessionWindowedSerialized/Deserializer. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> (new) configure/: RocksDBConfigSetter, TopicNameExtractor, >>>>>>>>>>>>>>>> TimestampExtractor, UsePreviousTimeOnInvalidTimestamp, >>>>>>>>>>>>>>>> WallclockTimestampExtractor, ExtractRecordMetadataTimestamp, >>>>>>>>>>>>>>>> FailOnInvalidTimestamp, LogAndSkipOnInvalidTimestamp, >>>>>>>>>>>>>>> StateRestoreListener, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> (new) metadata/: StreamsMetadata, ThreadMetadata, >>>>>>> TaskMetadata, >>>>>>>>>>>>> TaskId >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Still, any xxx/internals packages are declared as inner >>>>>>>> classes, >>>>>>>>> but >>>>>>>>>>>>>>> other >>>>>>>>>>>>>>>> xxx/yyy packages are declared as public APIs. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> ------------------- >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This is a very wild thought and I can totally understand if >>>>>>>>> people >>>>>>>>>>>>> feel >>>>>>>>>>>>>>>> this is too much since it definitely enlarges the scope of >>>>>>> this >>>>>>>>> KIP a >>>>>>>>>>>>>> lot >>>>>>>>>>>>>>>> :) just trying to play a devil's advocate here to do major >>>>>>>>>>>>> refactoring >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>> avoid renaming Processor classes. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Guozhang >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> On Fri, Jun 21, 2019 at 9:51 PM Matthias J. Sax < >>>>>>>>>>>>> matth...@confluent.io >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> I think `RecordProcessor` is a good name. >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> -Matthias >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> On 6/21/19 5:09 PM, John Roesler wrote: >>>>>>>>>>>>>>>>>> After kicking the naming around a bit more, it seems like >>>>>>> any >>>>>>>>>>>>>> package >>>>>>>>>>>>>>>>>> name change is a bit "weird" because it fragments the >>>>>>> package >>>>>>>>> and >>>>>>>>>>>>>>>>>> directory structure. If we can come up with a reasonable >>>>>>> name >>>>>>>>> for >>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>> interface after all, it seems like the better choice. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> The real challenge is that the existing name "Processor" >>>>>>>> seems >>>>>>>>>>>>> just >>>>>>>>>>>>>>>>>> about perfect. In picking a new name, we need to consider >>>>>>> the >>>>>>>>>>>>>>> ultimate >>>>>>>>>>>>>>>>>> state, after the deprecation period, when we entirely >>>>>>> remove >>>>>>>>>>>>>>>>>> Processor. In this context, TypedProcessor seems a little >>>>>>> odd >>>>>>>>> to >>>>>>>>>>>>>> me, >>>>>>>>>>>>>>>>>> because it seems to imply that there should also be an >>>>>>>> "untyped >>>>>>>>>>>>>>>>>> processor". >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> After kicking around a few other ideas, what does everyone >>>>>>>>> think >>>>>>>>>>>>>>> about >>>>>>>>>>>>>>>>>> "RecordProcessor"? I _think_ maybe it stands on its own >>>>>>> just >>>>>>>>>>>>> fine, >>>>>>>>>>>>>>>>>> because it's a thing that processes... records? >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> If others agree with this, I can change the proposal to >>>>>>>>>>>>>>> RecordProcessor. >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>> -John >>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>> On Fri, Jun 21, 2019 at 6:42 PM John Roesler < >>>>>>>>> j...@confluent.io> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> I've updated the KIP with the feedback so far. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> The naming question is still the biggest (only?) >>>>>>> outstanding >>>>>>>>>>>>>> issue. >>>>>>>>>>>>>>> It >>>>>>>>>>>>>>>>>>> would be good to hear some more thoughts on it. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> As we stand now, there's one vote for changing the >>> package >>>>>>>>> name >>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>> something like 'typedprocessor', one for changing the >>>>>>>>> interface >>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>> TypedProcessor (as in the PoC), and one for just changing >>>>>>>> the >>>>>>>>>>>>>>>>>>> Processor interface in-place, breaking source >>>>>>> compatibility. >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> How can we resolve this decision? >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>> -John >>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> On Thu, Jun 20, 2019 at 5:44 PM John Roesler < >>>>>>>>> j...@confluent.io >>>>>>>>>>>>>> >>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks for the feedback, Guozhang and Matthias, >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Regarding motivation: I'll update the wiki. Briefly: >>>>>>>>>>>>>>>>>>>> * Any processor can benefit. Imagine a pure user of the >>>>>>>>>>>>>>> ProcessorAPI >>>>>>>>>>>>>>>>>>>> who has very complex processing logic. I have seen >>>>>>> several >>>>>>>>>>>>>>> processor >>>>>>>>>>>>>>>>>>>> implementation that are hundreds of lines long and call >>>>>>>>>>>>>>>>>>>> `context.forward` in many different locations and >>>>>>> branches. >>>>>>>>> In >>>>>>>>>>>>>>> such an >>>>>>>>>>>>>>>>>>>> implementation, it would be very easy to have a bug in a >>>>>>>>> rarely >>>>>>>>>>>>>>> used >>>>>>>>>>>>>>>>>>>> branch that forwards the wrong kind of value. This would >>>>>>>>>>>>>>> structurally >>>>>>>>>>>>>>>>>>>> prevent that from happening. >>>>>>>>>>>>>>>>>>>> * Also, anyone who heavily uses the ProcessorAPI would >>>>>>>> likely >>>>>>>>>>>>>> have >>>>>>>>>>>>>>>>>>>> developed helper methods to wire together processors, >>>>>>> just >>>>>>>> as >>>>>>>>>>>>> we >>>>>>>>>>>>>>> have >>>>>>>>>>>>>>>>>>>> in the DSL implementation. This change would enable them >>>>>>> to >>>>>>>>>>>>>> ensure >>>>>>>>>>>>>>> at >>>>>>>>>>>>>>>>>>>> compile time that they are actually wiring together >>>>>>>>> compatible >>>>>>>>>>>>>>> types. >>>>>>>>>>>>>>>>>>>> This was actually _my_ original motivation, since I >>> found >>>>>>>> it >>>>>>>>>>>>> very >>>>>>>>>>>>>>>>>>>> difficult and time consuming to follow the Streams DSL >>>>>>>>> internal >>>>>>>>>>>>>>>>>>>> builders. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Regarding breaking the source compatibility of >>>>>>> Processor: I >>>>>>>>>>>>> would >>>>>>>>>>>>>>>>>>>> _love_ to side-step the naming problem, but I really >>>>>>> don't >>>>>>>>> know >>>>>>>>>>>>>> if >>>>>>>>>>>>>>>>>>>> it's excusable to break compatibility. I suspect that >>> our >>>>>>>>>>>>> oldest >>>>>>>>>>>>>>> and >>>>>>>>>>>>>>>>>>>> dearest friends are using the ProcessorAPI in some form >>>>>>> or >>>>>>>>>>>>>> another, >>>>>>>>>>>>>>>>>>>> and all their source code would break. It sucks to have >>>>>>> to >>>>>>>>>>>>>> create a >>>>>>>>>>>>>>>>>>>> whole new interface to get around this, but it feels >>> like >>>>>>>> the >>>>>>>>>>>>>> right >>>>>>>>>>>>>>>>>>>> thing to do. Would be nice to get even more feedback on >>>>>>>> this >>>>>>>>>>>>>> point, >>>>>>>>>>>>>>>>>>>> though. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Regarding the types of stores, as I said in my response >>>>>>> to >>>>>>>>>>>>>> Sophie, >>>>>>>>>>>>>>>>>>>> it's not an issue. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Regarding the change to StreamsBuilder, it doesn't pin >>>>>>> the >>>>>>>>>>>>> types >>>>>>>>>>>>>> in >>>>>>>>>>>>>>>>>>>> any way, since all the types are bounded by Object only, >>>>>>>> and >>>>>>>>>>>>>> there >>>>>>>>>>>>>>> are >>>>>>>>>>>>>>>>>>>> no extra constraints between arguments (each type is >>> used >>>>>>>>> only >>>>>>>>>>>>>>> once in >>>>>>>>>>>>>>>>>>>> one argument). But maybe I missed the point you were >>>>>>> asking >>>>>>>>>>>>>> about. >>>>>>>>>>>>>>>>>>>> Since the type takes generic paramters, we should allow >>>>>>>> users >>>>>>>>>>>>> to >>>>>>>>>>>>>>> pass >>>>>>>>>>>>>>>>>>>> in parameterized arguments. Otherwise, they would _have >>>>>>> to_ >>>>>>>>>>>>> give >>>>>>>>>>>>>>> us a >>>>>>>>>>>>>>>>>>>> raw type, and they would be forced to get a "rawtyes" >>>>>>>> warning >>>>>>>>>>>>>> from >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>> compiler. So, it's our obligation in any API that >>>>>>> accepts a >>>>>>>>>>>>>>>>>>>> parameterized-type parameter to allow people to actually >>>>>>>>> pass a >>>>>>>>>>>>>>>>>>>> parameterized type, even if we don't actually use the >>>>>>>>>>>>> parameters. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> The naming question is a complex one, as I took pains to >>>>>>>>> detail >>>>>>>>>>>>>>>>>>>> previously. Please don't just pick out one minor point, >>>>>>>> call >>>>>>>>> it >>>>>>>>>>>>>>> weak, >>>>>>>>>>>>>>>>>>>> and then claim that it invalidates the whole decision. I >>>>>>>>> don't >>>>>>>>>>>>>>> think >>>>>>>>>>>>>>>>>>>> there's a clear best choice, so I'm more than happy for >>>>>>>>> someone >>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>> advocate for renaming the class instead of the package. >>>>>>> Can >>>>>>>>> you >>>>>>>>>>>>>>>>>>>> provide some reasons why you think that would be better? >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Regarding the deprecated methods, you're absolutely >>>>>>> right. >>>>>>>>> I'll >>>>>>>>>>>>>>>> update the KIP. >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> Thanks again for all the feedback! >>>>>>>>>>>>>>>>>>>> -John >>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>> On Thu, Jun 20, 2019 at 4:34 PM Matthias J. Sax < >>>>>>>>>>>>>>> matth...@confluent.io> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Just want to second what Sophie said about the stores. >>>>>>> The >>>>>>>>>>>>> type >>>>>>>>>>>>>>> of a >>>>>>>>>>>>>>>>>>>>> used stores is completely independent of input/output >>>>>>>> types. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> This related to change `addGlobalStore()` method. Why >>> do >>>>>>>> you >>>>>>>>>>>>>> want >>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>> pin >>>>>>>>>>>>>>>>>>>>> the types? In fact, people request the ability to >>>>>>> filter() >>>>>>>>> and >>>>>>>>>>>>>>> maybe >>>>>>>>>>>>>>>>>>>>> even map() the data before they are put into the global >>>>>>>>> store. >>>>>>>>>>>>>>>> Limiting >>>>>>>>>>>>>>>>>>>>> the types seems to be a step backward here? >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Also, the pack name is questionable. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> This wouldn't be the first project to do something >>> like >>>>>>>>>>>>> this... >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> Not a strong argument. I would actually propose to not >>>>>>> a a >>>>>>>>> new >>>>>>>>>>>>>>>> package, >>>>>>>>>>>>>>>>>>>>> but just a new class `TypedProcessor`. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> For `ProcessorContext#forward` methods -- some of those >>>>>>>>>>>>> methods >>>>>>>>>>>>>>> are >>>>>>>>>>>>>>>>>>>>> already deprecated. While the will still be affected, >>> it >>>>>>>>> would >>>>>>>>>>>>>> be >>>>>>>>>>>>>>>> worth >>>>>>>>>>>>>>>>>>>>> to mark them as deprecated in the wiki page, too. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> @Guozhang: I dont' think we should break source >>>>>>>>> compatibility >>>>>>>>>>>>>> in a >>>>>>>>>>>>>>>> minor >>>>>>>>>>>>>>>>>>>>> release. >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> -Matthias >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> On 6/20/19 1:43 PM, Guozhang Wang wrote: >>>>>>>>>>>>>>>>>>>>>> Hi John, >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Thanks for KIP! I've a few comments below: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 1. So far the "Motivation" section is very general, >>> and >>>>>>>> the >>>>>>>>>>>>>> only >>>>>>>>>>>>>>>> concrete >>>>>>>>>>>>>>>>>>>>>> example that I have in mind is >>>>>>>> `TransformValues#punctuate`. >>>>>>>>>>>>> Do >>>>>>>>>>>>>> we >>>>>>>>>>>>>>>> have any >>>>>>>>>>>>>>>>>>>>>> other concrete issues that drive this KIP? If not then >>>>>>> I >>>>>>>>> feel >>>>>>>>>>>>>>>> better to >>>>>>>>>>>>>>>>>>>>>> narrow the scope of this KIP to: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 1.a) modifying ProcessorContext only with the output >>>>>>>> types >>>>>>>>> on >>>>>>>>>>>>>>>> forward. >>>>>>>>>>>>>>>>>>>>>> 1.b) modifying Transformer signature to have generics >>>>>>> of >>>>>>>>>>>>>>>> ProcessorContext, >>>>>>>>>>>>>>>>>>>>>> and then lift the restricting of not using punctuate: >>>>>>> if >>>>>>>>> user >>>>>>>>>>>>>> did >>>>>>>>>>>>>>>> not >>>>>>>>>>>>>>>>>>>>>> follow the enforced typing and just code without >>>>>>>> generics, >>>>>>>>>>>>> they >>>>>>>>>>>>>>>> will get >>>>>>>>>>>>>>>>>>>>>> warning at compile time and get run-time error if they >>>>>>>>>>>>> forward >>>>>>>>>>>>>>>> wrong-typed >>>>>>>>>>>>>>>>>>>>>> records, which I think would be acceptable. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> I feel this would be a good solution for this specific >>>>>>>>> issue; >>>>>>>>>>>>>>>> again, feel >>>>>>>>>>>>>>>>>>>>>> free to update the wiki page with other known issues >>>>>>> that >>>>>>>>>>>>>> cannot >>>>>>>>>>>>>>> be >>>>>>>>>>>>>>>>>>>>>> resolved. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> 2. If, we want to go with the current scope then my >>>>>>> next >>>>>>>>>>>>>> question >>>>>>>>>>>>>>>> would be, >>>>>>>>>>>>>>>>>>>>>> how much breakage we would introducing if we just >>>>>>> modify >>>>>>>>> the >>>>>>>>>>>>>>>> Processor >>>>>>>>>>>>>>>>>>>>>> signature directly? My feeling is that DSL users would >>>>>>> be >>>>>>>>>>>>> most >>>>>>>>>>>>>>>> likely not >>>>>>>>>>>>>>>>>>>>>> affected and PAPI users only need to modify a few >>> lines >>>>>>>> on >>>>>>>>>>>>>> class >>>>>>>>>>>>>>>>>>>>>> declaration. I feel it worth doing some research on >>>>>>> this >>>>>>>>> part >>>>>>>>>>>>>> and >>>>>>>>>>>>>>>> then >>>>>>>>>>>>>>>>>>>>>> decide if we really want to bite the bullet of >>>>>>> duplicated >>>>>>>>>>>>>>> Processor >>>>>>>>>>>>>>>> / >>>>>>>>>>>>>>>>>>>>>> ProcessorSupplier classes for maintaining >>>>>>> compatibility. >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> Guozhang >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> On Wed, Jun 19, 2019 at 12:21 PM John Roesler < >>>>>>>>>>>>>> j...@confluent.io >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> In response to the feedback so far, I changed the >>>>>>>> package >>>>>>>>>>>>> name >>>>>>>>>>>>>>> from >>>>>>>>>>>>>>>>>>>>>>> `processor2` to `processor.generic`. >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> Thanks, >>>>>>>>>>>>>>>>>>>>>>> -John >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 17, 2019 at 4:49 PM John Roesler < >>>>>>>>>>>>>> j...@confluent.io >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> Thanks for the feedback, Sophie! >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> I actually felt a little uneasy when I wrote that >>>>>>>> remark, >>>>>>>>>>>>>>> because >>>>>>>>>>>>>>>> it's >>>>>>>>>>>>>>>>>>>>>>>> not restricted at all in the API, it's just >>> available >>>>>>>> to >>>>>>>>>>>>> you >>>>>>>>>>>>>> if >>>>>>>>>>>>>>>> you >>>>>>>>>>>>>>>>>>>>>>>> choose to give your stores and context the same >>>>>>>>> parameters. >>>>>>>>>>>>>>> So, I >>>>>>>>>>>>>>>>>>>>>>>> think your use case is valid, and also perfectly >>>>>>>>>>>>> permissable >>>>>>>>>>>>>>>> under the >>>>>>>>>>>>>>>>>>>>>>>> current KIP. Sorry for sowing confusion on my own >>>>>>>>>>>>> discussion >>>>>>>>>>>>>>>> thread! >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> I'm not crazy about the package name, either. I went >>>>>>>> with >>>>>>>>>>>>> it >>>>>>>>>>>>>>> only >>>>>>>>>>>>>>>>>>>>>>>> because there's seemingly nothing special about the >>>>>>> new >>>>>>>>>>>>>> package >>>>>>>>>>>>>>>> except >>>>>>>>>>>>>>>>>>>>>>>> that it can't have the same name as the old one. >>>>>>>>> Otherwise, >>>>>>>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>>> existing "processor" and "Processor" names for the >>>>>>>>> package >>>>>>>>>>>>>> and >>>>>>>>>>>>>>>> class >>>>>>>>>>>>>>>>>>>>>>>> are perfectly satisfying. Rather than pile on >>>>>>>> additional >>>>>>>>>>>>>>>> semantics, it >>>>>>>>>>>>>>>>>>>>>>>> seemed cleaner to just add a number to the package >>>>>>>> name. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> This wouldn't be the first project to do something >>>>>>> like >>>>>>>>>>>>>> this... >>>>>>>>>>>>>>>> Apache >>>>>>>>>>>>>>>>>>>>>>>> Commons, for example, has added a "2" to the end of >>>>>>>> some >>>>>>>>> of >>>>>>>>>>>>>>> their >>>>>>>>>>>>>>>>>>>>>>>> packages for exactly the same reason. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> I'm open to any suggestions. For example, we could >>> do >>>>>>>>>>>>>> something >>>>>>>>>>>>>>>> like >>>>>>>>>>>>>>>>>>>>>>>> org.apache.kafka.streams.typedprocessor.Processor or >>>>>>>>>>>>>>>>>>>>>>>> org.apache.kafka.streams.processor.typed.Processor , >>>>>>>>> which >>>>>>>>>>>>>>> would >>>>>>>>>>>>>>>> have >>>>>>>>>>>>>>>>>>>>>>>> just about the same effect. One microscopic thought >>>>>>> is >>>>>>>>>>>>> that, >>>>>>>>>>>>>> if >>>>>>>>>>>>>>>>>>>>>>>> there's another interface in the "processor" package >>>>>>>> that >>>>>>>>>>>>> we >>>>>>>>>>>>>>> wish >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>>> do the same thing to, would _could_ pile it in to >>>>>>>>>>>>>> "processor2", >>>>>>>>>>>>>>>> but we >>>>>>>>>>>>>>>>>>>>>>>> couldn't do the same if we use a package that has >>>>>>>> "typed" >>>>>>>>>>>>> in >>>>>>>>>>>>>>> the >>>>>>>>>>>>>>>> name, >>>>>>>>>>>>>>>>>>>>>>>> unless that change is _also_ related to types in >>> some >>>>>>>>> way. >>>>>>>>>>>>>> But >>>>>>>>>>>>>>>> this >>>>>>>>>>>>>>>>>>>>>>>> seems like a very minor concern. >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> What's your preference? >>>>>>>>>>>>>>>>>>>>>>>> -John >>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 17, 2019 at 3:56 PM Sophie Blee-Goldman >>> < >>>>>>>>>>>>>>>> sop...@confluent.io> >>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> Hey John, thanks for writing this up! I like the >>>>>>>>> proposal >>>>>>>>>>>>>> but >>>>>>>>>>>>>>>> there's >>>>>>>>>>>>>>>>>>>>>>> one >>>>>>>>>>>>>>>>>>>>>>>>> point that I think may be too restrictive: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> "A processor that happens to use a typed store is >>>>>>>>> actually >>>>>>>>>>>>>>>> emitting the >>>>>>>>>>>>>>>>>>>>>>>>> same types that it is storing." >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> I can imagine someone could want to leverage this >>>>>>> new >>>>>>>>> type >>>>>>>>>>>>>>> safety >>>>>>>>>>>>>>>>>>>>>>> without >>>>>>>>>>>>>>>>>>>>>>>>> also limiting how they can interact with/use their >>>>>>>>> store. >>>>>>>>>>>>> As >>>>>>>>>>>>>>> an >>>>>>>>>>>>>>>>>>>>>>> (admittedly >>>>>>>>>>>>>>>>>>>>>>>>> contrived) example, say you have an input stream of >>>>>>>>>>>>>> purchases >>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>> a >>>>>>>>>>>>>>>>>>>>>>> certain >>>>>>>>>>>>>>>>>>>>>>>>> type (entertainment, food, etc), and on seeing a >>> new >>>>>>>>>>>>> record >>>>>>>>>>>>>>> you >>>>>>>>>>>>>>>> want to >>>>>>>>>>>>>>>>>>>>>>>>> output how many types of purchase a shopper has >>> made >>>>>>>>> more >>>>>>>>>>>>>>> than 5 >>>>>>>>>>>>>>>>>>>>>>> purchases >>>>>>>>>>>>>>>>>>>>>>>>> of in the last month. Your state store will >>> probably >>>>>>>> be >>>>>>>>>>>>>>> holding >>>>>>>>>>>>>>>> some >>>>>>>>>>>>>>>>>>>>>>> more >>>>>>>>>>>>>>>>>>>>>>>>> complicated PurchaseHistory object (keyed by user), >>>>>>>> but >>>>>>>>>>>>> your >>>>>>>>>>>>>>>> output is >>>>>>>>>>>>>>>>>>>>>>> just >>>>>>>>>>>>>>>>>>>>>>>>> a <User, Long> >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> I'm also not crazy about "processor2" as the >>> package >>>>>>>>> name >>>>>>>>>>>>>> ... >>>>>>>>>>>>>>>> not sure >>>>>>>>>>>>>>>>>>>>>>> what >>>>>>>>>>>>>>>>>>>>>>>>> a better one would be though (something with >>>>>>> "typed"?) >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>> On Mon, Jun 17, 2019 at 12:47 PM John Roesler < >>>>>>>>>>>>>>> j...@confluent.io> >>>>>>>>>>>>>>>>>>>>>>> wrote: >>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Hi all, >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> I'd like to propose KIP-478 ( >>>>>>>>>>>>>>>>>>>>>>> https://cwiki.apache.org/confluence/x/2SkLBw >>>>>>>>>>>>>>>>>>>>>>>>>> ). >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> This proposal would add output type bounds to the >>>>>>>>>>>>> Processor >>>>>>>>>>>>>>>> interface >>>>>>>>>>>>>>>>>>>>>>>>>> in Kafka Streams, which enables static checking of >>>>>>> a >>>>>>>>>>>>> number >>>>>>>>>>>>>>> of >>>>>>>>>>>>>>>> useful >>>>>>>>>>>>>>>>>>>>>>>>>> properties: >>>>>>>>>>>>>>>>>>>>>>>>>> * A processor B that consumes the output of >>>>>>>> processor A >>>>>>>>>>>>> is >>>>>>>>>>>>>>>> actually >>>>>>>>>>>>>>>>>>>>>>>>>> expecting the same types that processor A >>> produces. >>>>>>>>>>>>>>>>>>>>>>>>>> * A processor that happens to use a typed store is >>>>>>>>>>>>> actually >>>>>>>>>>>>>>>> emitting >>>>>>>>>>>>>>>>>>>>>>>>>> the same types that it is storing. >>>>>>>>>>>>>>>>>>>>>>>>>> * A processor is simply forwarding the expected >>>>>>> types >>>>>>>>> in >>>>>>>>>>>>>> all >>>>>>>>>>>>>>>> code >>>>>>>>>>>>>>>>>>>>>>> paths. >>>>>>>>>>>>>>>>>>>>>>>>>> * Processors added via the Streams DSL, which are >>>>>>> not >>>>>>>>>>>>>>> permitted >>>>>>>>>>>>>>>> to >>>>>>>>>>>>>>>>>>>>>>>>>> forward results at all are statically prevented >>>>>>> from >>>>>>>>>>>>> doing >>>>>>>>>>>>>> so >>>>>>>>>>>>>>>> by the >>>>>>>>>>>>>>>>>>>>>>>>>> compiler >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Internally, we can use the above properties to >>>>>>>> achieve >>>>>>>>> a >>>>>>>>>>>>>> much >>>>>>>>>>>>>>>> higher >>>>>>>>>>>>>>>>>>>>>>>>>> level of confidence in the Streams DSL >>>>>>>> implementation's >>>>>>>>>>>>>>>> correctness. >>>>>>>>>>>>>>>>>>>>>>>>>> Actually, while doing the POC, I found a few bugs >>>>>>> and >>>>>>>>>>>>>>> mistakes, >>>>>>>>>>>>>>>> which >>>>>>>>>>>>>>>>>>>>>>>>>> become structurally impossible with KIP-478. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Additionally, the stronger types dramatically >>>>>>> improve >>>>>>>>> the >>>>>>>>>>>>>>>>>>>>>>>>>> self-documentation of our Streams internal >>>>>>>>>>>>> implementations, >>>>>>>>>>>>>>>> which >>>>>>>>>>>>>>>>>>>>>>>>>> makes it much easier for new contributors to ramp >>>>>>> up >>>>>>>>> with >>>>>>>>>>>>>>>> confidence. >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>> Thanks so much for your consideration! >>>>>>>>>>>>>>>>>>>>>>>>>> -John >>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> -- >>>>>>>>>>>>>>>> -- Guozhang >>>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> -- >>>>>>>>>>>>>> -- Guozhang >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> -- >>>>>>>> -- Guozhang >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>> >>> >>> >
signature.asc
Description: OpenPGP digital signature