Let me add two new urns representing reshuffle via random key and reshuffle using key. I will share the PR later here, would need some help on Python/Go SDK changes too since I am not very familiar with them.
Best, Ke > On Oct 4, 2021, at 3:11 PM, Robert Bradshaw <[email protected]> wrote: > > On Mon, Oct 4, 2021 at 3:08 PM Jan Lukavský <[email protected] > <mailto:[email protected]>> wrote: >> >> On 10/4/21 11:43 PM, Robert Bradshaw wrote: >>> Oh, yes. >>> >>> Java Reshuffle.of() = Python ReshufflePerKey() >>> Java Reshuffle.viaRandomKey() == Python Reshuffle() >>> >>> We generally try to avoid this kind of discrepancy. It could make >>> sense to rename Reshuffle.of() to Reshuffle.viaKey(). >> >> I'd suggest Reshuffle.usingKey(), but I'm not native speaker, so that >> might be opinionated. > > usingKey does sound better. (And, FWIW, usingRandomKey() sounds better > to me than vaiRandomKey(), but probably not worth changing so the > question becomes whether to be stilted or consistent.) > >> More importantly - could we undeprecate Reshuffle >> (in Java SDK)? I believe it was deprecated for wrong reasons - yes, it >> has undocumented and non-portable side-effects, but is still makes sense >> for various use-cases (e.g. fan-out, or SDF). > > +1 > >>> >>> On Mon, Oct 4, 2021 at 2:33 PM Ke Wu <[email protected]> wrote: >>>> I should have said that the descrepency lives in SDK not Class vs Portable. >>>> >>>> Correct me if I am wrong, Reshuffle transform in Java SDK requires the >>>> input to be KV [1] while Reshuffle in Python [2] and Go [3] does not. >>>> >>>> >>>> [1] >>>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Reshuffle.java#L53 >>>> [2] >>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/util.py#L730 >>>> [3] https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/gbk.go#L122 >>>> >>>> On Oct 4, 2021, at 12:09 PM, Robert Bradshaw <[email protected]> wrote: >>>> >>>> Reshuffle is not keyed, there is a separate reshuffle-per-key for >>>> that. This is true for both Java and Python. This shouldn't depend on >>>> classic vs. portable mode. It sounds like there's an issue in >>>> translation. >>>> >>>> On Mon, Oct 4, 2021 at 11:18 AM Ke Wu <[email protected]> wrote: >>>> >>>> >>>> Hello All, >>>> >>>> Recent Samza Runner tests failure in python/xlang [1][2] reveals an >>>> interesting fact that Reshuffle Transform in classic pipeline requires the >>>> input to be KV while portable pipeline does not, where Reshuffle in >>>> portable mode it has an extra step to append a random key [3]. >>>> >>>> This suggests that Reshuffle in classic mode is, sort of, equivalent to >>>> ReshufflePerKey in potable mode instead of Reshuffle itself. Couple of >>>> questions on this: >>>> >>>> 1. Is such SDK/API discrepancy expected? >>>> 2. If Yes, then, what are the advised approach for runners to implement >>>> translators for such transforms? >>>> 3. If No, is this something we can improve? >>>> >>>> Best, >>>> Ke >>>> >>>> >>>> [1] https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Samza/288/ >>>> [2] https://ci-beam.apache.org/job/beam_PostCommit_XVR_Samza/285/ >>>> [3] >>>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/util.py#L730
