+1 Good luck!
Jerry Shao <js...@apache.org> wrote on Mon, May 16, 2022 at 21:44:

> Hi all,
>
> We would like to propose Firestorm[1] as a new Apache incubator project,
> you can find the proposal here [2] for more details.
>
> Firestorm is a high performance, general purpose Remote Shuffle Service for
> distributed compute engines like Apache Spark <https://spark.apache.org/>,
> Apache Hadoop MapReduce <https://hadoop.apache.org/>, Apache Flink
> <https://flink.apache.org/> and so on. We are aiming to make Firestorm a
> universal shuffle service for distributed compute engines.
>
> Shuffle is the key part for a distributed compute engine to exchange the
> data between distributed tasks, the performance and stability of shuffle
> will directly affect the whole job. Current “local file pull-like shuffle
> style” has several limitations:
>
>    1. Current shuffle is hard to support super large workloads, especially
>    in a high load environment, the major problem is IO problem (random
>    disk IO issue, network congestion and timeout).
>    2. Current shuffle is hard to deploy on the disaggregated compute
>    storage environment, as disk capacity is quite limited on compute nodes.
>    3. The constraint of storing shuffle data locally makes it hard to scale
>    elastically.
>
> Remote Shuffle Service is the key technology for enterprises to build big
> data platforms, to expand big data applications to disaggregated,
> online-offline hybrid environments, and to solve above problems.
>
> The implementation of Remote Shuffle Service - “Firestorm” - is heavily
> adopted in Tencent, and shows its advantages in production. Other
> enterprises also adopted or prepared to adopt Firestorm in their
> environments.
>
> Firestorm’s key idea is brought from Sailfish shuffle
> <https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing>,
> it has several key design goals:
>
>    1. High performance. Firestorm’s performance is close enough to local
>    file based shuffle style for small workloads. For large workloads, it is
>    far better than the current shuffle style.
>    2. Fault tolerance. Firestorm provides high availability for Coordinated
>    nodes, and failover for Shuffle nodes.
>    3. Pluggable. Firestorm is highly pluggable, which could be suited to
>    different compute engines, different backend storages, and different
>    wire-protocols.
>
> We believe that Firestorm project will provide the great value for the
> community if it is accepted by the Apache incubator.
>
> I will help this project as champion and many thanks to the 3 mentors:
>
>    - Junping du (junping...@apache.org)
>    - Xun liu (liu...@apache.org)
>    - Zhankun Tang (zt...@apache.org)
>
> [1] https://github.com/Tencent/Firestorm
> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/FirestormProposal
>
> Best regards,
> Jerry