Thank you, Chesnay. My hope is to keep things computationally inexpensive, and if I understand you correctly, that is satisfied even with this rekeying.
Ryan On Sat, Apr 15, 2017 at 4:22 AM, Chesnay Schepler <ches...@apache.org> wrote: > Hello, > > I think if you have multiple keyBy() transformations with identical > parallelism the partitioning should > be "preserved". The second keyBy() will still go through the partitioning > process, but since both the key > and parallelism are identical the resulting partition should be identical > as well. resulting in no data being > shuffled around. We aren't really preserving the partitioning, but > re-creating the original one. > > Regards, > Chesnay > > > On 12.04.2017 21:37, Ryan Conway wrote: > >> Greetings, >> >> Is there a means of maintaining a stream's partitioning after running it >> through an operation such as map or filter? >> >> I have a pipeline stage S that operates on a stream partitioned by an ID >> field. S flat maps objects of type A to type B, which both have an "ID" >> field, and where each instance of B that S outputs has the same ID as its >> input instance of A. I hope to add a pipeline stage T immediately after S >> that operates using the same partitioning as S, so that I can avoid the >> expense of re-keying the instances of type B. >> >> If I am understanding the DataStream API correctly this is not feasible >> with Flink, as map(), filter() etc. all output SingleOutputStreamOperator. >> But I am hoping that I am missing something. >> >> Thank you, >> Ryan >> > > >