On 10/4/21 11:43 PM, Robert Bradshaw wrote:
Oh, yes.

Java Reshuffle.of() == Python ReshufflePerKey()
Java Reshuffle.viaRandomKey() == Python Reshuffle()

We generally try to avoid this kind of discrepancy. It could make
sense to rename Reshuffle.of() to Reshuffle.viaKey().

I'd suggest Reshuffle.usingKey(), but I'm not a native speaker, so that might just be my opinion. More importantly, could we undeprecate Reshuffle (in the Java SDK)? I believe it was deprecated for the wrong reasons: yes, it has undocumented and non-portable side effects, but it still makes sense for various use cases (e.g. fan-out, or SDF).

 Jan


On Mon, Oct 4, 2021 at 2:33 PM Ke Wu <[email protected]> wrote:
I should have said that the discrepancy lives in the SDKs, not in Classic vs. Portable.

Correct me if I am wrong, but the Reshuffle transform in the Java SDK requires the input to 
be KV [1], while Reshuffle in Python [2] and Go [3] does not.


[1] 
https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Reshuffle.java#L53
[2] 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/util.py#L730
[3] https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/gbk.go#L122

On Oct 4, 2021, at 12:09 PM, Robert Bradshaw <[email protected]> wrote:

Reshuffle is not keyed, there is a separate reshuffle-per-key for
that. This is true for both Java and Python. This shouldn't depend on
classic vs. portable mode. It sounds like there's an issue in
translation.

On Mon, Oct 4, 2021 at 11:18 AM Ke Wu <[email protected]> wrote:


Hello All,

Recent Samza Runner test failures in python/xlang [1][2] reveal an interesting 
fact: the Reshuffle transform in a classic pipeline requires the input to be KV, 
while in a portable pipeline it does not, because Reshuffle in portable mode has an 
extra step that appends a random key [3].
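
To make the relationship concrete, here is a toy sketch in plain Python (not Beam code) of how a keyless reshuffle can be built on top of a keyed one by attaching a random key and dropping it afterwards. The function names, the `num_buckets` and `seed` parameters, and the list-based grouping are all assumptions made for illustration; the real transforms also deal with windowing, triggers, and timestamps, which are ignored here.

```python
import random
from collections import defaultdict


def reshuffle_per_key(pairs):
    """Toy keyed reshuffle: group (key, value) pairs by key, then re-emit them.

    Input MUST be (key, value) pairs, mirroring the KV requirement.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Re-emit every value under its original key, one group at a time.
    return [(key, value) for key, values in groups.items() for value in values]


def reshuffle(elements, num_buckets=4, seed=None):
    """Toy keyless reshuffle: add a random key, reshuffle per key, drop the key.

    num_buckets and seed are hypothetical knobs for this sketch only.
    """
    rng = random.Random(seed)
    keyed = [(rng.randrange(num_buckets), element) for element in elements]
    return [value for _key, value in reshuffle_per_key(keyed)]


if __name__ == "__main__":
    # Arbitrary (non-KV) elements are fine for the keyless variant.
    print(reshuffle(["a", "b", "c", "d"], seed=0))
    # The keyed variant needs (key, value) pairs.
    print(reshuffle_per_key([("k", 1), ("j", 2), ("k", 3)]))
```

Under this model, a runner that only knows how to translate the keyed variant would fail on non-KV input unless the random-key step is applied first.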

This suggests that Reshuffle in classic mode is, sort of, equivalent to 
ReshufflePerKey in portable mode instead of Reshuffle itself. A couple of 
questions on this:

1. Is such SDK/API discrepancy expected?
2. If Yes, then what is the advised approach for runners to implement 
translators for such transforms?
3. If No, is this something we can improve?

Best,
Ke


[1] https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Samza/288/
[2] https://ci-beam.apache.org/job/beam_PostCommit_XVR_Samza/285/
[3] 
https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/util.py#L730

