Thanks for the confirmation. For translation purposes, I think the issue is that
"beam:transform:reshuffle:v1" corresponds to both Java Reshuffle.of() and Python
Reshuffle(), where the former expects KV input but the latter does not. Ideally,
the URN should map to either [Java Reshuffle.of() and Python ReshufflePerKey()]
or [Java Reshuffle.viaRandomKey() and Python Reshuffle()].

In addition, there could be another URN to represent the other pair, e.g.
"beam:transform:reshuffle_per_key:v1" or
"beam:transform:reshuffle_via_random_key:v1".

Any thoughts on this?

Best,
Ke

> On Oct 4, 2021, at 2:43 PM, Robert Bradshaw <[email protected]> wrote:
>
> Oh, yes.
>
> Java Reshuffle.of() = Python ReshufflePerKey()
> Java Reshuffle.viaRandomKey() == Python Reshuffle()
>
> We generally try to avoid this kind of discrepancy. It could make
> sense to rename Reshuffle.of() to Reshuffle.viaKey().
>
> On Mon, Oct 4, 2021 at 2:33 PM Ke Wu <[email protected]> wrote:
>>
>> I should have said that the discrepancy lives in the SDKs, not in classic
>> vs. portable.
>>
>> Correct me if I am wrong, the Reshuffle transform in the Java SDK requires the
>> input to be KV [1] while Reshuffle in Python [2] and Go [3] does not.
>>
>>
>> [1]
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Reshuffle.java#L53
>> [2]
>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/util.py#L730
>> [3] https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/gbk.go#L122
>>
>> On Oct 4, 2021, at 12:09 PM, Robert Bradshaw <[email protected]> wrote:
>>
>> Reshuffle is not keyed, there is a separate reshuffle-per-key for
>> that. This is true for both Java and Python. This shouldn't depend on
>> classic vs. portable mode. It sounds like there's an issue in
>> translation.
>>
>> On Mon, Oct 4, 2021 at 11:18 AM Ke Wu <[email protected]> wrote:
>>
>>
>> Hello All,
>>
>> Recent Samza Runner test failures in python/xlang [1][2] reveal an
>> interesting fact: the Reshuffle transform in a classic pipeline requires the
>> input to be KV while in a portable pipeline it does not, because Reshuffle in
>> portable mode has an extra step that appends a random key [3].
>>
>> This suggests that Reshuffle in classic mode is, sort of, equivalent to
>> ReshufflePerKey in portable mode instead of Reshuffle itself. A couple of
>> questions on this:
>>
>> 1. Is such an SDK/API discrepancy expected?
>> 2. If yes, what is the advised approach for runners to implement
>> translators for such transforms?
>> 3. If no, is this something we can improve?
>>
>> Best,
>> Ke
>>
>>
>> [1] https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Samza/288/
>> [2] https://ci-beam.apache.org/job/beam_PostCommit_XVR_Samza/285/
>> [3]
>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/util.py#L730
>>
>>
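
P.S. To make the keyed vs. unkeyed distinction in this thread concrete, here is a
rough pure-Python sketch. It is not Beam code and ignores windowing, timestamps
and triggers; it only models why one variant needs KV input and the other does
not. The two URNs are the ones proposed in this thread (neither exists in Beam
today), and the helper names plus the num_keys and seed parameters are
hypothetical, chosen only for illustration.

```python
import random
from collections import defaultdict

# URNs proposed in this thread; assumed names, not existing Beam URNs.
RESHUFFLE_PER_KEY_URN = "beam:transform:reshuffle_per_key:v1"
RESHUFFLE_VIA_RANDOM_KEY_URN = "beam:transform:reshuffle_via_random_key:v1"

# SDK transforms each proposed URN would stand for, per the pairing in the thread.
URN_TO_SDK_TRANSFORMS = {
    RESHUFFLE_PER_KEY_URN: ("Java Reshuffle.of()", "Python ReshufflePerKey()"),
    RESHUFFLE_VIA_RANDOM_KEY_URN: (
        "Java Reshuffle.viaRandomKey()",
        "Python Reshuffle()",
    ),
}


def reshuffle_per_key(pairs):
    """Keyed variant: the input must already be (key, value) pairs."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Emit the elements again, now grouped by key.
    return [(key, value) for key, values in groups.items() for value in values]


def reshuffle_via_random_key(elements, num_keys=4, seed=None):
    """Unkeyed variant: attach a random key, reuse the keyed step, drop the key."""
    rng = random.Random(seed)
    keyed = [(rng.randrange(num_keys), element) for element in elements]
    return [value for _, value in reshuffle_per_key(keyed)]
```

In this model a translator that receives plain elements under the keyed URN
would fail when unpacking (key, value), which mirrors the mismatch described
above.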
