Thanks for the confirmation.

For translation purposes, I think the issue is that
"beam:transform:reshuffle:v1" corresponds to both Java Reshuffle.of() and
Python Reshuffle(), and the former expects KV input while the latter does not.

Ideally, the URN should correspond to one matching pair: either
[Java Reshuffle.of() and Python ReshufflePerKey()] or
[Java Reshuffle.viaRandomKey() and Python Reshuffle()]. In addition, there
could be another URN to represent the other pair, e.g.
"beam:transform:reshuffle_per_key:v1" or
"beam:transform:reshuffle_via_random_key:v1".
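
To make the idea concrete, here is a rough sketch in plain Python (not Beam
code) of how a runner translator could dispatch on two separate URNs. The two
URN strings are the hypothetical ones suggested above and do not exist today;
the function names, num_buckets, and the list-based model of a PCollection are
assumptions for illustration only.

```python
import random

def reshuffle_per_key(pairs):
    """Model of the keyed reshuffle: every element must be a (key, value) pair."""
    out = []
    for element in pairs:
        if not (isinstance(element, tuple) and len(element) == 2):
            raise TypeError(
                "keyed reshuffle expects (key, value) pairs, got %r" % (element,))
        out.append(element)
    return out

def reshuffle_via_random_key(elements, num_buckets=8, rng=random):
    """Model of the unkeyed reshuffle: add a random key, reshuffle, drop the key."""
    keyed = [(rng.randrange(num_buckets), e) for e in elements]
    return [value for _, value in reshuffle_per_key(keyed)]

# Hypothetical URNs: one per semantic, so a translator never has to guess
# whether the input is keyed.
TRANSLATORS = {
    "beam:transform:reshuffle_per_key:v1": reshuffle_per_key,
    "beam:transform:reshuffle_via_random_key:v1": reshuffle_via_random_key,
}

def translate(urn, elements):
    """Look up the translator registered for this URN and apply it."""
    return TRANSLATORS[urn](elements)
```

In this model, sending unkeyed elements to the per-key URN raises a TypeError,
which is roughly the kind of mismatch a runner hits when a single URN covers
both semantics.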

Any thoughts on this?

Best,
Ke


> On Oct 4, 2021, at 2:43 PM, Robert Bradshaw <[email protected]> wrote:
> 
> Oh, yes.
> 
> Java Reshuffle.of() == Python ReshufflePerKey()
> Java Reshuffle.viaRandomKey() == Python Reshuffle()
> 
> We generally try to avoid this kind of discrepancy. It could make
> sense to rename Reshuffle.of() to Reshuffle.viaKey().
> 
> On Mon, Oct 4, 2021 at 2:33 PM Ke Wu <[email protected]> wrote:
>> 
>> I should have said that the discrepancy lies between the SDKs, not between
>> classic and portable mode.
>> 
>> Correct me if I am wrong, but the Reshuffle transform in the Java SDK
>> requires the input to be KV [1], while Reshuffle in Python [2] and Go [3]
>> does not.
>> 
>> 
>> [1] 
>> https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/Reshuffle.java#L53
>> [2] 
>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/util.py#L730
>> [3] https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/gbk.go#L122
>> 
>> On Oct 4, 2021, at 12:09 PM, Robert Bradshaw <[email protected]> wrote:
>> 
>> Reshuffle is not keyed, there is a separate reshuffle-per-key for
>> that. This is true for both Java and Python. This shouldn't depend on
>> classic vs. portable mode. It sounds like there's an issue in
>> translation.
>> 
>> On Mon, Oct 4, 2021 at 11:18 AM Ke Wu <[email protected]> wrote:
>> 
>> 
>> Hello All,
>> 
>> Recent Samza Runner test failures in python/xlang [1][2] reveal an
>> interesting fact: the Reshuffle transform in a classic pipeline requires the
>> input to be KV, while in a portable pipeline it does not, because Reshuffle
>> in portable mode has an extra step that appends a random key [3].
>> 
>> This suggests that Reshuffle in classic mode is, sort of, equivalent to
>> ReshufflePerKey in portable mode instead of Reshuffle itself. A couple of
>> questions on this:
>> 
>> 1. Is such SDK/API discrepancy expected?
>> 2. If yes, what is the advised approach for runners to implement
>> translators for such transforms?
>> 3. If no, is this something we can improve?
>> 
>> Best,
>> Ke
>> 
>> 
>> [1] https://ci-beam.apache.org/job/beam_PostCommit_Python_VR_Samza/288/
>> [2] https://ci-beam.apache.org/job/beam_PostCommit_XVR_Samza/285/
>> [3] 
>> https://github.com/apache/beam/blob/master/sdks/python/apache_beam/transforms/util.py#L730
>> 
>> 
