On 10/4/21 11:43 PM, Robert Bradshaw wrote:
Oh, yes.
Java Reshuffle.of() = Python ReshufflePerKey()
Java Reshuffle.viaRandomKey() == Python Reshuffle()
We generally try to avoid this kind of discrepancy. It could make
sense to rename Reshuffle.of() to Reshuffle.viaKey().
I'd suggest Reshuffle.usingKey(), but I'm not native speaker, so that
might be opinionated. More importantly - could we undeprecate Reshuffle
(in Java SDK)? I believe it was deprecated for wrong reasons - yes, it
has undocumented and non-portable side-effects, but is still makes sense
for various use-cases (e.g. fan-out, or SDF).
Jan
On Mon, Oct 4, 2021 at 2:33 PM Ke Wu <[email protected]> wrote:
I should have said that the descrepency lives in SDK not Class vs Portable.
Correct me if I am wrong, Reshuffle transform in Java SDK requires the input to
be KV [1] while Reshuffle in Python [2] and Go [3] does not.
[1]
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Reshuffle.java#L53
[2]
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/util.py#L730
[3] https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/gbk.go#L122
On Oct 4, 2021, at 12:09 PM, Robert Bradshaw <[email protected]> wrote:
Reshuffle is not keyed, there is a separate reshuffle-per-key for
that. This is true for both Java and Python. This shouldn't depend on
classic vs. portable mode. It sounds like there's an issue in
translation.
On Mon, Oct 4, 2021 at 11:18 AM Ke Wu <[email protected]> wrote:
Hello All,
Recent Samza Runner tests failure in python/xlang [1][2] reveals an interesting
fact that Reshuffle Transform in classic pipeline requires the input to be KV
while portable pipeline does not, where Reshuffle in portable mode it has an
extra step to append a random key [3].
This suggests that Reshuffle in classic mode is, sort of, equivalent to
ReshufflePerKey in potable mode instead of Reshuffle itself. Couple of
questions on this:
1. Is such SDK/API discrepancy expected?
2. If Yes, then, what are the advised approach for runners to implement
translators for such transforms?
3. If No, is this something we can improve?
Best,
Ke
[1] https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Samza/288/
[2] https://ci-beam.apache.org/job/beam_PostCommit_XVR_Samza/285/
[3]
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/util.py#L730