On 11.01.2019 21:29, John Roesler wrote:
> Hi Jan,
>
> Thanks for the reply.
>
> It sounds like your larger point is that if we provide a building block
> instead of the whole operation, then it's not too hard for users to
> implement the whole operation, and maybe the building block is
> independently useful.

Exactly.
>
> This is a very fair point. In fact, it's not exclusive with the current
> plan, in that we can always add the "building block" version in addition to,
> rather than instead of, the full operation. It very well might be a mistake,
> but I still prefer to begin by introducing the fully encapsulated operation
> and subsequently consider adding the "building block" version if it turns
> out that the encapsulated version is insufficient.

Raising my hand here: I won't be using the new API unless the scattered
table is there. I am going to stick with my PAPI solution.

>
> IMHO, one of Streams's strengths over other processing frameworks
> is a simple API, so simplicity as a design goal seems to suggest that:
>> a.tomanyJoin(B)
> is preferable to
>> a.map(retain(key and FK)).tomanyJoin(B).groupBy(a.key()).join(A)
> at least to start with.
>
> To answer your question about my latter potential optimization,
> no, I don't have any code to look at. But, yes, the implementation
> would bring B into A's tasks and keep them in a state store for joining.
> Thanks for that reference, it does indeed sound similar to what
> MapJoin does in Hive.
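
To make the idea above concrete, here is a minimal sketch of a map-side
join in which each task that owns a slice of A also keeps a local copy of B.
It is an illustration under assumptions, not code from the proposal: plain
dicts stand in for state stores, and the names ATask, on_a and on_b are
hypothetical.

```python
# Illustrative only: plain dicts stand in for state stores.
# ATask, on_a and on_b are hypothetical names, not a Kafka Streams API.
class ATask:
    """One task owning a partition of A, plus a local copy of B."""

    def __init__(self):
        self.a_store = {}  # A key -> (foreign key, A value)
        self.b_store = {}  # B key -> B value, replicated into this task

    def on_b(self, b_key, b_value):
        """Apply an update to B and re-join every A row that references it."""
        self.b_store[b_key] = b_value
        return {
            a_key: (a_value, b_value)
            for a_key, (fk, a_value) in self.a_store.items()
            if fk == b_key
        }

    def on_a(self, a_key, fk, a_value):
        """Apply an update to A and join it against the local copy of B."""
        self.a_store[a_key] = (fk, a_value)
        if fk in self.b_store:
            return {a_key: (a_value, self.b_store[fk])}
        return {}


task = ATask()
task.on_b("b1", "B-one")
first = task.on_a("a1", "b1", "A-one")
task.on_a("a2", "b1", "A-two")
rejoined = task.on_b("b1", "B-uno")
```

Because B is held next to A, an update on either side can be joined without
repartitioning A by its foreign key. The cost is that every task has to
store the rows of B it may need.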

Always a pleasure with you, John.

>
> Thanks again,
> -John
