LGTM.

It looks like the Go SDK already adheres to these semantics as well for the 
reference impl (well, for reshuffle/redistribute_randomly; _by_key isn't 
implemented in the Go SDK), and it only uses the existing unqualified 
reshuffle URN [0].

The original windowing strategy and, for every element, the original window, 
timestamp, and pane are all serialized, shuffled, and then deserialized 
downstream.

https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/exec/reshuffle.go#L65

https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/exec/reshuffle.go#L145

Prism at the moment vacuously implements reshuffle by omitting the node and 
rewriting the inputs and outputs [1], as it's a local runner with 
single-transform-per-bundle execution, but I was intending to make it a fusion 
break regardless. Ultimately prism's "test" variant will default to executing 
the SDK's dictated reference implementation for the composite(s), and any 
"fast" or "prod" variant would simply do the current implementation.
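
For anyone unfamiliar with what "omitting the node and rewriting the inputs and outputs" amounts to, a rough sketch follows. This is not Prism's handler code: the transform struct, the PCollection IDs, and elideReshuffles are hypothetical, and the URN constant is written from memory as an assumption. The idea is just that consumers of the reshuffle's output get pointed at the reshuffle's input instead.

```go
package main

import "fmt"

// reshuffleURN is assumed here for illustration; check it against the SDK.
const reshuffleURN = "beam:transform:reshuffle:v1"

// transform is a hypothetical, simplified pipeline node that reads and
// writes PCollections identified by string IDs.
type transform struct {
	Name    string
	URN     string
	Inputs  []string
	Outputs []string
}

// isSimpleReshuffle reports whether t is a one-in, one-out reshuffle.
func isSimpleReshuffle(t transform) bool {
	return t.URN == reshuffleURN && len(t.Inputs) == 1 && len(t.Outputs) == 1
}

// elideReshuffles drops reshuffle nodes and rewires every remaining
// transform so it reads the reshuffle's input PCollection directly.
func elideReshuffles(ts []transform) []transform {
	// alias maps a reshuffle's output PCollection to its input PCollection.
	alias := map[string]string{}
	for _, t := range ts {
		if isSimpleReshuffle(t) {
			alias[t.Outputs[0]] = t.Inputs[0]
		}
	}
	// resolve follows aliases so chained reshuffles collapse as well.
	resolve := func(id string) string {
		for {
			up, ok := alias[id]
			if !ok {
				return id
			}
			id = up
		}
	}

	var out []transform
	for _, t := range ts {
		if isSimpleReshuffle(t) {
			continue // the node itself is omitted
		}
		inputs := make([]string, len(t.Inputs))
		for i, in := range t.Inputs {
			inputs[i] = resolve(in)
		}
		out = append(out, transform{Name: t.Name, URN: t.URN, Inputs: inputs, Outputs: t.Outputs})
	}
	return out
}

func main() {
	pipeline := []transform{
		{Name: "read", URN: "example:read", Outputs: []string{"pc0"}},
		{Name: "reshuffle", URN: reshuffleURN, Inputs: []string{"pc0"}, Outputs: []string{"pc1"}},
		{Name: "process", URN: "example:pardo", Inputs: []string{"pc1"}, Outputs: []string{"pc2"}},
	}
	for _, t := range elideReshuffles(pipeline) {
		fmt.Println(t.Name, t.Inputs, t.Outputs)
	}
}
```

Treating the reshuffle as a fusion break instead would mean keeping the boundary between "read" and "process" rather than wiring them together like this.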

Robert Burke
Beam Go Busybody

[0]: https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L46C3-L46C50
[1]: https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/handlerunner.go#L82



On 2023/09/26 15:43:53 Kenneth Knowles wrote:
> Hi everyone,
> 
> Recently there was a bug [1] caused by discrepancies between two of
> Dataflow's reshuffle implementations. I think the reference implementation
> in the Java SDK [2] also does not match. This all led to discussion on the
> bug and the pull request [3] about what the actual semantics should be. I
> got it wrong, maybe multiple times. So I wrote up a very short document to
> finish the discussion:
> 
>     https://s.apache.org/beam-reshuffle
> 
> This is also probably among the simplest imaginable use of
> http://s.apache.org/ptransform-design-doc in case you want to see kind of
> how I intended it to be used.
> 
> Kenn
> 
> [1] https://github.com/apache/beam/issues/28219
> [2]
> https://github.com/apache/beam/blob/d52b077ad505c8b50f10ec6a4eb83d385cdaf96a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Reshuffle.java#L84
> [3] https://github.com/apache/beam/pull/28272
> 
