On Wed, Oct 4, 2023 at 7:45 PM Robert Burke <lostl...@apache.org> wrote:
> LGTM. > > It looks the Go SDK already adheres to these semantics as well for the > reference impl(well, reshuffle/redistribute_randomly, _by_key isn't > implemented in the Go SDK, and only uses the existing unqualified reshuffle > URN [0]. > > The original strategy, and then for every element, the original Window, > TS, and Pane are all serialized, shuffled, and then deserialized downstream. > > > https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/exec/reshuffle.go#L65 > > > https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/exec/reshuffle.go#L145 > > Prism at the moment vaccuously implements reshuffle by omitting the node, > and rewriting the inputs and outputs [1], as it's a local runner with > single transform per bundle execution, but I was intending to make it a > fusion break regardless. Ultimately prism's "test" variant will default to > executing the SDKs dictated reference implementation for the composite(s), > and any "fast" or "prod" variant would simply do the current implementation. > Very nice! And of course I should have linked out to the existing reshuffle URN in the proto. Kenn > > Robert Burke > Beam Go Busybody > > [0]: > https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/core/runtime/graphx/translate.go#L46C3-L46C50 > [1]: > https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/internal/handlerunner.go#L82 > > > > On 2023/09/26 15:43:53 Kenneth Knowles wrote: > > Hi everyone, > > > > Recently there was a bug [1] caused by discrepancies between two of > > Dataflow's reshuffle implementations. I think the reference > implementation > > in the Java SDK [2] also does not match. This all led to discussion on > the > > bug and the pull request [3] about what the actual semantics should be. I > > got it wrong, maybe multiple times. So I wrote up a very short document > to > > finish the discussion: > > > > https://s.apache.org/beam-reshuffle > > > > This is also probably among the simplest imaginable use of > > http://s.apache.org/ptransform-design-doc in case you want to see kind > of > > how I intended it to be used. > > > > Kenn > > > > [1] https://github.com/apache/beam/issues/28219 > > [2] > > > https://github.com/apache/beam/blob/d52b077ad505c8b50f10ec6a4eb83d385cdaf96a/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Reshuffle.java#L84 > > [3] https://github.com/apache/beam/pull/28272 > > >