After kicking the naming around a bit more, it seems like any package name change is a bit "weird" because it fragments the package and directory structure. If we can come up with a reasonable name for the interface after all, it seems like the better choice.
The real challenge is that the existing name "Processor" seems just about perfect. In picking a new name, we need to consider the ultimate state, after the deprecation period, when we entirely remove Processor. In this context, TypedProcessor seems a little odd to me, because it seems to imply that there should also be an "untyped processor". After kicking around a few other ideas, what does everyone think about "RecordProcessor"? I _think_ maybe it stands on its own just fine, because it's a thing that processes... records? If others agree with this, I can change the proposal to RecordProcessor. Thanks, -John On Fri, Jun 21, 2019 at 6:42 PM John Roesler <j...@confluent.io> wrote: > > Hi all, > > I've updated the KIP with the feedback so far. > > The naming question is still the biggest (only?) outstanding issue. It > would be good to hear some more thoughts on it. > > As we stand now, there's one vote for changing the package name to > something like 'typedprocessor', one for changing the interface to > TypedProcessor (as in the PoC), and one for just changing the > Processor interface in-place, breaking source compatibility. > > How can we resolve this decision? > > Thanks, > -John > > On Thu, Jun 20, 2019 at 5:44 PM John Roesler <j...@confluent.io> wrote: > > > > Thanks for the feedback, Guozhang and Matthias, > > > > Regarding motivation: I'll update the wiki. Briefly: > > * Any processor can benefit. Imagine a pure user of the ProcessorAPI > > who has very complex processing logic. I have seen several processor > > implementation that are hundreds of lines long and call > > `context.forward` in many different locations and branches. In such an > > implementation, it would be very easy to have a bug in a rarely used > > branch that forwards the wrong kind of value. This would structurally > > prevent that from happening. > > * Also, anyone who heavily uses the ProcessorAPI would likely have > > developed helper methods to wire together processors, just as we have > > in the DSL implementation. This change would enable them to ensure at > > compile time that they are actually wiring together compatible types. > > This was actually _my_ original motivation, since I found it very > > difficult and time consuming to follow the Streams DSL internal > > builders. > > > > Regarding breaking the source compatibility of Processor: I would > > _love_ to side-step the naming problem, but I really don't know if > > it's excusable to break compatibility. I suspect that our oldest and > > dearest friends are using the ProcessorAPI in some form or another, > > and all their source code would break. It sucks to have to create a > > whole new interface to get around this, but it feels like the right > > thing to do. Would be nice to get even more feedback on this point, > > though. > > > > Regarding the types of stores, as I said in my response to Sophie, > > it's not an issue. > > > > Regarding the change to StreamsBuilder, it doesn't pin the types in > > any way, since all the types are bounded by Object only, and there are > > no extra constraints between arguments (each type is used only once in > > one argument). But maybe I missed the point you were asking about. > > Since the type takes generic paramters, we should allow users to pass > > in parameterized arguments. Otherwise, they would _have to_ give us a > > raw type, and they would be forced to get a "rawtyes" warning from the > > compiler. So, it's our obligation in any API that accepts a > > parameterized-type parameter to allow people to actually pass a > > parameterized type, even if we don't actually use the parameters. > > > > The naming question is a complex one, as I took pains to detail > > previously. Please don't just pick out one minor point, call it weak, > > and then claim that it invalidates the whole decision. I don't think > > there's a clear best choice, so I'm more than happy for someone to > > advocate for renaming the class instead of the package. Can you > > provide some reasons why you think that would be better? > > > > Regarding the deprecated methods, you're absolutely right. I'll update the > > KIP. > > > > Thanks again for all the feedback! > > -John > > > > On Thu, Jun 20, 2019 at 4:34 PM Matthias J. Sax <matth...@confluent.io> > > wrote: > > > > > > Just want to second what Sophie said about the stores. The type of a > > > used stores is completely independent of input/output types. > > > > > > This related to change `addGlobalStore()` method. Why do you want to pin > > > the types? In fact, people request the ability to filter() and maybe > > > even map() the data before they are put into the global store. Limiting > > > the types seems to be a step backward here? > > > > > > > > > > > > Also, the pack name is questionable. > > > > > > > This wouldn't be the first project to do something like this... > > > > > > Not a strong argument. I would actually propose to not a a new package, > > > but just a new class `TypedProcessor`. > > > > > > > > > For `ProcessorContext#forward` methods -- some of those methods are > > > already deprecated. While the will still be affected, it would be worth > > > to mark them as deprecated in the wiki page, too. > > > > > > > > > @Guozhang: I dont' think we should break source compatibility in a minor > > > release. > > > > > > > > > > > > -Matthias > > > > > > > > > > > > On 6/20/19 1:43 PM, Guozhang Wang wrote: > > > > Hi John, > > > > > > > > Thanks for KIP! I've a few comments below: > > > > > > > > 1. So far the "Motivation" section is very general, and the only > > > > concrete > > > > example that I have in mind is `TransformValues#punctuate`. Do we have > > > > any > > > > other concrete issues that drive this KIP? If not then I feel better to > > > > narrow the scope of this KIP to: > > > > > > > > 1.a) modifying ProcessorContext only with the output types on forward. > > > > 1.b) modifying Transformer signature to have generics of > > > > ProcessorContext, > > > > and then lift the restricting of not using punctuate: if user did not > > > > follow the enforced typing and just code without generics, they will get > > > > warning at compile time and get run-time error if they forward > > > > wrong-typed > > > > records, which I think would be acceptable. > > > > > > > > I feel this would be a good solution for this specific issue; again, > > > > feel > > > > free to update the wiki page with other known issues that cannot be > > > > resolved. > > > > > > > > 2. If, we want to go with the current scope then my next question would > > > > be, > > > > how much breakage we would introducing if we just modify the Processor > > > > signature directly? My feeling is that DSL users would be most likely > > > > not > > > > affected and PAPI users only need to modify a few lines on class > > > > declaration. I feel it worth doing some research on this part and then > > > > decide if we really want to bite the bullet of duplicated Processor / > > > > ProcessorSupplier classes for maintaining compatibility. > > > > > > > > > > > > Guozhang > > > > > > > > > > > > > > > > On Wed, Jun 19, 2019 at 12:21 PM John Roesler <j...@confluent.io> wrote: > > > > > > > >> Hi all, > > > >> > > > >> In response to the feedback so far, I changed the package name from > > > >> `processor2` to `processor.generic`. > > > >> > > > >> Thanks, > > > >> -John > > > >> > > > >> On Mon, Jun 17, 2019 at 4:49 PM John Roesler <j...@confluent.io> wrote: > > > >>> > > > >>> Thanks for the feedback, Sophie! > > > >>> > > > >>> I actually felt a little uneasy when I wrote that remark, because it's > > > >>> not restricted at all in the API, it's just available to you if you > > > >>> choose to give your stores and context the same parameters. So, I > > > >>> think your use case is valid, and also perfectly permissable under the > > > >>> current KIP. Sorry for sowing confusion on my own discussion thread! > > > >>> > > > >>> I'm not crazy about the package name, either. I went with it only > > > >>> because there's seemingly nothing special about the new package except > > > >>> that it can't have the same name as the old one. Otherwise, the > > > >>> existing "processor" and "Processor" names for the package and class > > > >>> are perfectly satisfying. Rather than pile on additional semantics, it > > > >>> seemed cleaner to just add a number to the package name. > > > >>> > > > >>> This wouldn't be the first project to do something like this... Apache > > > >>> Commons, for example, has added a "2" to the end of some of their > > > >>> packages for exactly the same reason. > > > >>> > > > >>> I'm open to any suggestions. For example, we could do something like > > > >>> org.apache.kafka.streams.typedprocessor.Processor or > > > >>> org.apache.kafka.streams.processor.typed.Processor , which would have > > > >>> just about the same effect. One microscopic thought is that, if > > > >>> there's another interface in the "processor" package that we wish to > > > >>> do the same thing to, would _could_ pile it in to "processor2", but we > > > >>> couldn't do the same if we use a package that has "typed" in the name, > > > >>> unless that change is _also_ related to types in some way. But this > > > >>> seems like a very minor concern. > > > >>> > > > >>> What's your preference? > > > >>> -John > > > >>> > > > >>> On Mon, Jun 17, 2019 at 3:56 PM Sophie Blee-Goldman > > > >>> <sop...@confluent.io> > > > >> wrote: > > > >>>> > > > >>>> Hey John, thanks for writing this up! I like the proposal but there's > > > >> one > > > >>>> point that I think may be too restrictive: > > > >>>> > > > >>>> "A processor that happens to use a typed store is actually emitting > > > >>>> the > > > >>>> same types that it is storing." > > > >>>> > > > >>>> I can imagine someone could want to leverage this new type safety > > > >> without > > > >>>> also limiting how they can interact with/use their store. As an > > > >> (admittedly > > > >>>> contrived) example, say you have an input stream of purchases of a > > > >> certain > > > >>>> type (entertainment, food, etc), and on seeing a new record you want > > > >>>> to > > > >>>> output how many types of purchase a shopper has made more than 5 > > > >> purchases > > > >>>> of in the last month. Your state store will probably be holding some > > > >> more > > > >>>> complicated PurchaseHistory object (keyed by user), but your output > > > >>>> is > > > >> just > > > >>>> a <User, Long> > > > >>>> > > > >>>> I'm also not crazy about "processor2" as the package name ... not > > > >>>> sure > > > >> what > > > >>>> a better one would be though (something with "typed"?) > > > >>>> > > > >>>> On Mon, Jun 17, 2019 at 12:47 PM John Roesler <j...@confluent.io> > > > >> wrote: > > > >>>> > > > >>>>> Hi all, > > > >>>>> > > > >>>>> I'd like to propose KIP-478 ( > > > >> https://cwiki.apache.org/confluence/x/2SkLBw > > > >>>>> ). > > > >>>>> > > > >>>>> This proposal would add output type bounds to the Processor > > > >>>>> interface > > > >>>>> in Kafka Streams, which enables static checking of a number of > > > >>>>> useful > > > >>>>> properties: > > > >>>>> * A processor B that consumes the output of processor A is actually > > > >>>>> expecting the same types that processor A produces. > > > >>>>> * A processor that happens to use a typed store is actually emitting > > > >>>>> the same types that it is storing. > > > >>>>> * A processor is simply forwarding the expected types in all code > > > >> paths. > > > >>>>> * Processors added via the Streams DSL, which are not permitted to > > > >>>>> forward results at all are statically prevented from doing so by the > > > >>>>> compiler > > > >>>>> > > > >>>>> Internally, we can use the above properties to achieve a much higher > > > >>>>> level of confidence in the Streams DSL implementation's correctness. > > > >>>>> Actually, while doing the POC, I found a few bugs and mistakes, > > > >>>>> which > > > >>>>> become structurally impossible with KIP-478. > > > >>>>> > > > >>>>> Additionally, the stronger types dramatically improve the > > > >>>>> self-documentation of our Streams internal implementations, which > > > >>>>> makes it much easier for new contributors to ramp up with > > > >>>>> confidence. > > > >>>>> > > > >>>>> Thanks so much for your consideration! > > > >>>>> -John > > > >>>>> > > > >> > > > > > > > > > > >