+1

> On Sep 25, 2022, at 08:44, MINX Feng <fmx...@outlook.com> wrote:
>
> It is an interesting project. Good luck to Datark; may this project live
> long and prosper.
>
> Best wishes!
> Ethan
>
>> On Sep 22, 2022, at 11:45, Yu Li <car...@gmail.com> wrote:
>>
>> Hi All,
>>
>> I would like to propose Datark [1] as a new Apache Incubator project. You
>> can find more details in the Datark proposal [2].
>>
>> Datark is an intermediate (shuffle and spilled) data service for big data
>> compute engines (Apache Spark, Apache Flink, Apache Hive, etc.) that boosts
>> performance, stability, and flexibility. It aims to enable compute engines
>> to fully embrace a disaggregated architecture. In many cases, intermediate
>> data depends on large local disks and is often a major cause of
>> inefficiency, instability, and inflexibility in the lifecycle of a
>> distributed job. Datark addresses these problems through the following
>> core designs:
>>
>> 1. Push-based shuffle plus partition data aggregation, which turns random
>>    IO access into sequential access.
>> 2. A FileSystem-like API for writing spilled data.
>> 3. Hierarchical storage, from memory to DFS/object store, enabling both a
>>    fast cache and massive storage space.
>> 4. Engine-agnostic APIs for easy integration with various engines.
>> 5. Extended fault tolerance and data replication for increased reliability.
>>
>> Datark is currently used in production at Alibaba and many other
>> companies, serving petabytes of data per day. Beyond that, it has further
>> open source users, including Shopee, NetEase, Bilibili, BOSS, and Synnex.
>> Most of these users have contributed to the project, forming an active
>> community with dozens of developers.
>>
>> The proposed initial committers are interested in joining the ASF to
>> strengthen collaboration and build a more vibrant community. We believe
>> the Datark project will provide tremendous value to the community if it is
>> accepted into the Apache Incubator.
>>
>> I will serve as the champion for this project, and many thanks to our
>> four other mentors:
>>
>> * Becket Qin (j...@apache.org)
>> * Duo Zhang (zhang...@apache.org)
>> * Lidong Dai (lidong...@apache.org)
>> * Willem Jiang (ningji...@apache.org)
>>
>> FWIW, although the solutions differ, the issues Datark aims to resolve
>> overlap somewhat with those of Apache Uniffle (incubating) [3]. We noticed
>> this during the discussion phase of the Uniffle incubation (when we were
>> also preparing for incubation), held an open and friendly discussion to
>> see whether we could join forces [4], and finally decided to develop
>> independently for the time being [5].
>>
>> Looking forward to your feedback. Thanks.
>>
>> Best Regards,
>> Yu
>>
>> [1] https://github.com/alibaba/RemoteShuffleService
>> [2] https://cwiki.apache.org/confluence/display/INCUBATOR/DatarkProposal
>> [3] https://uniffle.apache.org/
>> [4] https://lists.apache.org/thread/1w74z5f0pb7bhslhzcl5x7rdj9s9objz
>> [5] https://lists.apache.org/thread/pg8lzhzc1794x3yloqp169j0mdzqs3yw
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> For additional commands, e-mail: general-h...@incubator.apache.org
>