Thanks for the interesting proposal!
Another related Apache project is Nemo: https://nemo.apache.org/.

-Gon


On Mon, Sep 26, 2022 at 1:01 AM li gang <lgcar...@apache.org> wrote:

> Glad to see this proposal. It's an interesting project; good luck.
>
> On Thu, Sep 22, 2022 at 11:45, Yu Li <car...@gmail.com> wrote:
>
> > Hi All,
> >
> > I would like to propose Datark [1] as a new Apache Incubator project;
> > you can find more details in the Datark proposal [2].
> >
> > Datark is an intermediate (shuffle and spilled) data service for big data
> > compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) to boost
> > performance, stability, and flexibility. It aims to enable compute
> > engines to fully embrace the disaggregated architecture. In many cases,
> > intermediate data depends on large local disks and is often a major
> > cause of inefficiency, instability, and inflexibility in the lifecycle
> > of a distributed job. Datark addresses these problems through the
> > following core designs:
> >
> > 1. Push-based shuffle plus partition data aggregation to turn random IO
> > access into sequential access.
> > 2. FileSystem-like API to support writing spilled data.
> > 3. Hierarchical storage from memory to DFS/object store to enable fast
> > cache and massive storage space.
> > 4. Engine-agnostic APIs for easy integration with various engines.
> > 5. Extended fault tolerance and data replication to increase reliability.
> >
> > Datark is currently used in production at Alibaba and many other
> > companies, serving petabytes of data per day. Beyond that, it has more
> > open source users including Shopee, NetEase, Bilibili, BOSS, and Synnex.
> > Most of these users have made contributions to the project, forming an
> > active community with dozens of developers.
> >
> > The proposed initial committers are interested in joining the ASF to
> > reinforce extensive collaboration and build a more vibrant community.
> > We believe the Datark project will provide tremendous value for the
> > community if it is introduced into the Apache Incubator.
> >
> > I will help this project as the champion, and many thanks to our four
> > other mentors:
> >
> > * Becket Qin (j...@apache.org)
> > * Duo Zhang (zhang...@apache.org)
> > * Lidong Dai (lidong...@apache.org)
> > * Willem Jiang (ningji...@apache.org)
> >
> > FWIW, although the solutions differ, the issues Datark aims to resolve
> > have some overlap with Apache Uniffle (incubating) [3]. We noticed this
> > during the discussion phase of the Uniffle incubation (when we were also
> > preparing for incubation) and had some open and friendly discussion to
> > see whether we could join forces [4], and finally decided to develop
> > independently for the time being [5].
> >
> > Looking forward to your feedback. Thanks.
> >
> > Best Regards,
> > Yu
> >
> > [1] https://github.com/alibaba/RemoteShuffleService
> > [2] https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal
> > [3] https://uniffle.apache.org/
> > [4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> > [5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw
> >
>
>
> --
>
>
> ------------------------------
> Best Regards
>
> DolphinScheduler PMC
> Gang Li 李岗
>
> lgcar...@apache.org
>


-- 
Byung-Gon Chun
