Thanks for the interesting proposal! Another related Apache project is Nemo: https://nemo.apache.org/.
-Gon

On Mon, Sep 26, 2022 at 1:01 AM li gang <lgcar...@apache.org> wrote:

> Glad to see this proposal, it's an interesting project, good luck.
>
> Yu Li <car...@gmail.com> wrote on Thu, Sep 22, 2022 at 11:45:
>
> > Hi All,
> >
> > I would like to propose Datark [1] as a new apache incubator project,
> > and you can find the proposal [2] of Datark for more details.
> >
> > Datark is an intermediate (shuffle and spilled) data service for big
> > data compute engines (Apache Spark, Apache Flink, Apache Hive, etc.)
> > to boost performance, stability, and flexibility. It aims at enabling
> > computing engines to fully embrace the disaggregated architecture. In
> > a lot of cases, intermediate data depends on large local disks, and is
> > often a major cause of inefficiency, instability, and inflexibility in
> > the lifecycle of a distributed job. Datark solves the problems through
> > the following core designs:
> >
> > 1. Push-based shuffle plus partition data aggregation to turn random
> >    IO access into sequential access.
> > 2. FileSystem-like API to support writing spilled data.
> > 3. Hierarchical storage from memory to DFS/object store to enable fast
> >    cache and massive storage space.
> > 4. Engine-irrelevant APIs for easy integrating to various engines.
> > 5. Extended fault tolerance and data replication to increase
> >    reliability
> >
> > Datark is currently adopted in the production environment at both
> > Alibaba and many other companies, serving petabytes of data per day.
> > Beyond that, it has more open source users including Shopee, NetEase,
> > Bilibily, BOSS, and Synnex. Most of these users have made
> > contributions to the project, forming an active community with dozens
> > of developers.
> >
> > The proposed initial committers are interested in joining ASF to
> > reinforce extensive collaboration and build a more vibrant community.
> > We believe the Datark project will provide tremendous value for the
> > community if it is introduced into the Apache incubator.
> >
> > I will help this project as the champion and many thanks to our four
> > other mentors:
> >
> > * Becket Qin (j...@apache.org)
> > * Duo Zhang (zhang...@apache.org)
> > * Lidong Dai (lidong...@apache.org)
> > * Willem Jiang (ningji...@apache.org)
> >
> > FWIW, although with different solutions, the issues Datark aims to
> > resolve have some overlap with Apache Uniffle (incubating) [3].
> > Actually we noticed this during the discussion phase of Uniffle
> > incubation (when we were also preparing for the incubation) and had
> > some open and friendly discussion to see whether there could be a
> > joint force [4], and finally decided to develop independently for the
> > time being [5].
> >
> > Look forward to your feedback. Thanks.
> >
> > Best Regards,
> > Yu
> >
> > [1] https://github.com/alibaba/RemoteShuffleService
> > [2] https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal
> > [3] https://uniffle.apache.org/
> > [4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
> > [5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw
> >
>
> --
>
> ------------------------------
> Best Regards
>
> DolphinScheduler PMC
> Gang Li 李岗
>
> lgcar...@apache.org
>

--
Byung-Gon Chun