+1
In fact, Shuffle is not the only component that can benefit from RDMA. Broadcast
modules can benefit as well, given suitable modification, gaining lower CPU
usage and better network performance.
Tencent is evaluating this in the lab, and we observe a roughly 50% improvement
in TeraSort at 100G.
There’s no need to compare to Flink’s Streaming Model. Spark should focus more
on how to go beyond itself.
From the beginning, Spark’s success has come from its unified model, which can
serve SQL, streaming, machine learning, and graph jobs, all in one. But from
1.6 to 2.0, the abstraction