Re: [DISCUSSION] Incubating Proposal of Firestorm

41108453 Mon, 16 May 2022 07:20:11 -0700

+1, good luck!

------------------ Original ------------------
From: Jerry Shao <js...@apache.org>
Date: Mon, May 16, 2022 9:44 PM
To: general <general@incubator.apache.org>
Subject: Re: [DISCUSSION] Incubating Proposal of Firestorm

Hi all,

We would like to propose Firestorm [1] as a new Apache Incubator project.
You can find more details in the proposal [2].

Firestorm is a high-performance, general-purpose Remote Shuffle Service for
distributed compute engines such as Apache Spark
<https://spark.apache.org/>, Apache
Hadoop MapReduce <https://hadoop.apache.org/>, Apache Flink
<https://flink.apache.org/>, and so on. We aim to make Firestorm a
universal shuffle service for distributed compute engines.

Shuffle is the key mechanism a distributed compute engine uses to exchange
data between distributed tasks, so the performance and stability of shuffle
directly affect the whole job. The current “local file pull-like shuffle
style” has several limitations:

   1. The current shuffle struggles to support very large workloads,
   especially in high-load environments. The major problem is IO (random
   disk IO, network congestion, and timeouts).
   2. The current shuffle is hard to deploy in disaggregated compute and
   storage environments, because disk capacity on compute nodes is quite
   limited.
   3. The constraint of storing shuffle data locally makes it hard to scale
   elastically.

A Remote Shuffle Service is a key technology for enterprises building big
data platforms: it lets them extend big data applications to disaggregated
and online-offline hybrid environments, and it solves the problems above.

Firestorm, our implementation of a Remote Shuffle Service, has been heavily
adopted at Tencent and has shown its advantages in production. Other
enterprises have also adopted, or are preparing to adopt, Firestorm in their
environments.

Firestorm’s key idea comes from Sailfish shuffle
<https://www.researchgate.net/publication/262241541_Sailfish_a_framework_for_large_scale_data_processing>.
It has several key design goals:

   1. High performance. For small workloads, Firestorm’s performance is
   close to that of local file based shuffle. For large workloads, it is
   far better than the current shuffle style.
   2. Fault tolerance. Firestorm provides high availability for Coordinator
   nodes and failover for Shuffle nodes.
   3. Pluggability. Firestorm is highly pluggable and can be adapted to
   different compute engines, different backend storages, and different
   wire protocols.

We believe that the Firestorm project will provide great value to the
community if it is accepted by the Apache Incubator.

I will help this project as champion, and many thanks to the three mentors:

   - Junping Du (junping...@apache.org)
   - Xun Liu (liu...@apache.org)
   - Zhankun Tang (zt...@apache.org)

[1] https://github.com/Tencent/Firestorm
[2] https://cwiki.apache.org/confluence/display/INCUBATOR/FirestormProposal

Best regards,
Jerry
