+1, good luck!
------------------ Original ------------------
From: Jerry Shao <js...@apache.org>
Date: Mon, May 16, 2022 9:44 PM
To: general <general@incubator.apache.org>
Subject: Re: [DISCUSSION] Incubating Proposal of Firestorm

Hi all,

We would like to propose Firestorm [1] as a new Apache Incubator project. You can find the proposal here [2] for more details.

Firestorm is a high-performance, general-purpose Remote Shuffle Service for distributed compute engines such as Apache Spark <https://spark.apache.org/>, Apache Hadoop MapReduce <https://hadoop.apache.org/>, Apache Flink <https://flink.apache.org/>, and so on. We aim to make Firestorm a universal shuffle service for distributed compute engines.

Shuffle is the key mechanism by which a distributed compute engine exchanges data between distributed tasks; its performance and stability directly affect the whole job. The current “local file pull-like shuffle style” has several limitations:

1. It struggles to support very large workloads, especially in a high-load environment. The major problem is IO (random disk IO, network congestion, and timeouts).
2. It is hard to deploy in a disaggregated compute and storage environment, because disk capacity on compute nodes is quite limited.
3. The constraint of storing shuffle data locally makes it hard to scale elastically.

Remote Shuffle Service is the key technology for enterprises to build big data platforms, to expand big data applications to disaggregated, online-offline hybrid environments, and to solve the above problems.

The implementation of Remote Shuffle Service, “Firestorm”, is heavily adopted at Tencent and has shown its advantages in production. Other enterprises have also adopted or are preparing to adopt Firestorm in their environments.

Firestorm’s key idea is borrowed from Sailfish shuffle <https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing>. It has several key design goals:

1. High performance. Firestorm’s performance is close to that of local file based shuffle for small workloads. For large workloads, it is far better than the current shuffle style.
2. Fault tolerance. Firestorm provides high availability for Coordinated nodes, and failover for Shuffle nodes.
3. Pluggable. Firestorm is highly pluggable and can be adapted to different compute engines, different backend storages, and different wire protocols.

We believe that the Firestorm project will provide great value for the community if it is accepted by the Apache Incubator.

I will help this project as champion, and many thanks to the 3 mentors:

- Junping du (junping...@apache.org)
- Xun liu (liu...@apache.org)
- Zhankun Tang (zt...@apache.org)

[1] https://github.com/Tencent/Firestorm
[2] https://cwiki.apache.org/confluence/display/INCUBATOR/FirestormProposal

Best regards,
Jerry