Hi all,

I ever launched the discussion of "Proposal of external shuffle service" before 
and received very helpful feedbacks, especially with @Andrey Zagrebin's 
in-depth communication offline. 
Based on @Till Rohrmann's suggestion, I launch this separate thread again to 
summarize the current progress and welcome further reviews.

You might encounter such problems before:
1. When the task finishes, the TM is released by ResourceManager soon. But the 
internal partition has not been transfered completed or not consumed by 
downstream side yet, which would cause
unnecessary failover.
2. When the partition is transfered completed via network, it would be removed 
from TM immediately. If there are any exceptions during transport or 
downstream's consumption, the upstream task
has to be restarted again to re-produce the data.
3. You may not satisfy with the performance of one-partition-one-file mode for 
batch jobs in some scenarios, or you want to realize some external shuffle 
service deployed on YARN/K8S,etc. Or the
transport layer is not limited by curernt netty, such as via DFS or RDMA etc. 
But you might find it is difficult to make changes within current architecture.

All the above concerns would be covered by proposed pluggable shuffle manager 
architecture. If you are interested or wish more details, please refer to the 
google doc [1] or FLIP [2]. There are also
some sub-tasks under going within the umbrella jira [3] .

[1] 
https://docs.google.com/document/d/1l7yIVNH3HATP4BnjEOZFkO2CaHf1sVn_DSxS2llmkd8/edit?usp=sharing
[2] 
https://cwiki.apache.org/confluence/display/FLINK/FLIP-31%3A+Pluggable+Shuffle+Manager
[3] https://issues.apache.org/jira/browse/FLINK-10653

Best,
Zhijiang

Reply via email to