Hello,
I think if you have multiple keyBy() transformations with identical
parallelism, the partitioning should be "preserved". The second keyBy()
will still go through the partitioning process, but since both the key
and the parallelism are identical, the resulting partitioning should be
identical as well, so no data is shuffled around. We aren't really
preserving the partitioning, but re-creating the original one.
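
A minimal sketch of what I mean (the job, types, and parallelism below
are illustrative, not from an actual pipeline):

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class DoubleKeyByExample {

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4); // identical parallelism for both keyed stages

        env.fromElements(Tuple2.of("a", 1), Tuple2.of("b", 2), Tuple2.of("a", 3))
                // first keyBy(): hash-partitions records by the key in f0
                .keyBy(t -> t.f0)
                // this map leaves the key field untouched
                .map(t -> Tuple2.of(t.f0, t.f1 * 10))
                .returns(Types.TUPLE(Types.STRING, Types.INT))
                // second keyBy(): same key extractor, same parallelism, so
                // every record hashes to the subtask it is already on; the
                // partitioning runs again, but no record changes subtask
                .keyBy(t -> t.f0)
                .map(t -> Tuple2.of(t.f0, t.f1 + 1))
                .returns(Types.TUPLE(Types.STRING, Types.INT))
                .print();

        env.execute("double-keyBy sketch");
    }
}

Note that the second keyBy() still introduces a partitioning step (and
breaks operator chaining); the point is only that every record hashes
back to the subtask it already lives on.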
Regards,
Chesnay
On 12.04.2017 21:37, Ryan Conway wrote:
Greetings,
Is there a means of maintaining a stream's partitioning after running
it through an operation such as map or filter?
I have a pipeline stage S that operates on a stream partitioned by an
ID field. S flat-maps objects of type A to type B; both types have an
"ID" field, and each instance of B that S outputs has the same ID as
the instance of A it was produced from. I hope to add a pipeline stage
T immediately after S that operates with the same partitioning as S, so
that I can avoid the expense of re-keying the instances of type B.
If I am understanding the DataStream API correctly, this is not
feasible with Flink, as map(), filter(), etc. all return a
SingleOutputStreamOperator rather than a KeyedStream. But I am hoping
that I am missing something.
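
For concreteness, a hedged sketch of the shape of the pipeline; the
types A and B and the stage functions below are stand-ins, not my
actual code:

import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class RekeySketch {

    // Stand-in POJOs; both carry the ID field the stream is keyed by.
    public static class A { public String id; public A() {} public A(String id) { this.id = id; } }
    public static class B { public String id; public B() {} public B(String id) { this.id = id; } }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.getExecutionEnvironment();

        DataStream<A> input = env.fromElements(new A("x"), new A("y"));

        // Stage S: a keyed flatMap from A to B; every B it emits carries
        // the same ID as the A it was produced from.
        SingleOutputStreamOperator<B> afterS = input
                .keyBy(a -> a.id)
                .flatMap((FlatMapFunction<A, B>) (a, out) -> out.collect(new B(a.id)))
                .returns(B.class);

        // Stage T: afterS is a plain DataStream again, so the only way I
        // see to run T with the same partitioning is a second keyBy(),
        // which is the re-keying expense I would like to avoid.
        afterS.keyBy(b -> b.id)
                .map(b -> "T(" + b.id + ")")
                .returns(Types.STRING)
                .print();

        env.execute("re-key sketch");
    }
}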
Thank you,
Ryan