On 11.01.2019 21:29, John Roesler wrote:
> Hi Jan,
>
> Thanks for the reply.
>
> It sounds like your larger point is that if we provide a building block
> instead of the whole operation, then it's not too hard for users to
> implement the whole operation, and maybe the building block is
> independently useful.
Exactly.

>
> This is a very fair point. In fact, it's not exclusive with the current
> plan, in that we can always add the "building block" version in addition to,
> rather than instead of, the full operation. It very well might be a mistake,
> but I still prefer to begin by introducing the fully encapsulated operation
> and subsequently consider adding the "building block" version if it turns
> out that the encapsulated version is insufficient.

Raising my hand here: I won't be using the new API unless the scattered table
is there. I am going to stick with my PAPI solution.

>
> IMHO, one of Streams's strengths over other processing frameworks
> is a simple API, so simplicity as a design goal seems to suggest that:
>> a.tomanyJoin(B)
> is preferable to
>> a.map(retain(key and FK)).tomanyJoin(B).groupBy(a.key()).join(A)
> at least to start with.
>
> To answer your question about my latter potential optimization,
> no I don't have any code to look at. But, yes, the implementation
> would bring B into A's tasks and keep them in a state store for joining.
> Thanks for that reference, it does indeed sound similar to what
> MapJoin does in Hive.

Always a pleasure with you, John.

>
> Thanks again,
> -John
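
For anyone following the thread, here is a rough plain-Java sketch of the two shapes being compared above. It uses in-memory maps, not Kafka Streams, and every name in it (`ARow`, `encapsulatedJoin`, `retainKeyAndFk`, `scatteredTable`, `regroupAndJoin`) is hypothetical and not part of any proposed API. It also assumes inner-join semantics and that the "scattered table" is keyed by a combined "FK|A-key" string, which is only one possible layout.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ForeignKeyJoinSketch {

    /** Hypothetical row of table A: a payload plus a foreign key into table B. */
    static final class ARow {
        final String fk;
        final String payload;
        ARow(String fk, String payload) { this.fk = fk; this.payload = payload; }
    }

    /** Shape 1, the encapsulated operation: one call, result keyed by A's key. */
    static Map<String, String> encapsulatedJoin(Map<String, ARow> a, Map<String, String> b) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, ARow> e : a.entrySet()) {
            String bValue = b.get(e.getValue().fk);
            if (bValue != null) { // inner-join semantics (an assumption)
                out.put(e.getKey(), e.getValue().payload + "+" + bValue);
            }
        }
        return out;
    }

    /** Shape 2, step 1: keep only (A key -> FK), as in map(retain(key and FK)). */
    static Map<String, String> retainKeyAndFk(Map<String, ARow> a) {
        Map<String, String> out = new LinkedHashMap<>();
        a.forEach((aKey, row) -> out.put(aKey, row.fk));
        return out;
    }

    /**
     * Shape 2, step 2: the building block. The result is keyed by "fk|aKey"
     * (assumed layout; assumes the FK itself contains no '|').
     */
    static Map<String, String> scatteredTable(Map<String, String> keyToFk, Map<String, String> b) {
        Map<String, String> out = new LinkedHashMap<>();
        keyToFk.forEach((aKey, fk) -> {
            String bValue = b.get(fk);
            if (bValue != null) {
                out.put(fk + "|" + aKey, bValue);
            }
        });
        return out;
    }

    /** Shape 2, steps 3 and 4: regroup by A's key, then join back against A. */
    static Map<String, String> regroupAndJoin(Map<String, String> scattered, Map<String, ARow> a) {
        Map<String, String> out = new LinkedHashMap<>();
        scattered.forEach((combinedKey, bValue) -> {
            String aKey = combinedKey.substring(combinedKey.indexOf('|') + 1);
            ARow row = a.get(aKey);
            if (row != null) {
                out.put(aKey, row.payload + "+" + bValue);
            }
        });
        return out;
    }

    public static void main(String[] args) {
        Map<String, ARow> a = new LinkedHashMap<>();
        a.put("a1", new ARow("b1", "x"));
        a.put("a2", new ARow("b1", "y"));
        a.put("a3", new ARow("b9", "z")); // no matching row in B

        Map<String, String> b = new LinkedHashMap<>();
        b.put("b1", "B1");

        Map<String, String> full = encapsulatedJoin(a, b);
        Map<String, String> scattered = scatteredTable(retainKeyAndFk(a), b);
        Map<String, String> composed = regroupAndJoin(scattered, a);

        System.out.println("encapsulated: " + full);
        System.out.println("scattered:    " + scattered);
        System.out.println("composed:     " + composed);
    }
}
```

In this toy version both shapes give the same final table; the difference is that the second one exposes the intermediate scattered result, which a caller could consume directly instead of regrouping.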