Dear Apache Incubator Community,

We propose to contribute Chunjun as an Apache Incubator project. We are still looking for a Champion and Mentors, if anyone would like to volunteer.

Thanks a lot.

Best Regards,
Real-time computing engine team of DTStack

#Chunjun Proposal

##Abstract

Chunjun is a distributed ETL and data integration tool. It is currently based on Apache Flink. It was initially known as FlinkX and was renamed Chunjun on February 22, 2022.

- Chunjun codebase: https://github.com/DTStack/chunjun

##Proposal

We propose to contribute the Chunjun codebase to the Apache Software Foundation with the intent of forming a productive, meritocratic and open community around Chunjun’s continued development, according to the 'Apache Way'. Chunjun's source code is already under the Apache License Version 2.0.

##Background

We developed Chunjun at DTStack in 2017, when we needed a low-code, high-performance data integration tool. It has been an open-source project on GitHub since April 2018. Chunjun has been running in DTStack's production environment ever since. Chunjun is also widely used by companies in China, including, among others:

- DTStack (https://www.dtstack.com/)
- Qihu360 (https://www.360.cn/)
- Iflytek (https://www.iflytek.com/)
- XPeng Motors (https://en.xiaopeng.com/)
- WeBank (https://www.webank.com/)
- Asiainfo (https://asiainfo.com/)
- Guazi (https://www.guazi.com/)
- Hello Inc (https://www.hello-inc.com/)

Chunjun now has a strong community in China.

##Rationale

Chunjun's high performance comes from Apache Flink, and Chunjun can integrate data from different data sources. Users only need to configure a JSON file to complete data reading, transformation, and writing. Users can implement new reader/writer plugins to meet their requirements. Chunjun has implemented plugins that can capture data changes from MySQL and restore the data in Apache Doris.

Chunjun has the following features:

- Real-time and offline integration of data from different data sources.
- Change data capture (CDC) to merge and restore data.
- Resuming from the point of interruption.
- Capturing and collecting dirty data.
- Limiting the data transfer rate.
- Throughput metrics.
- Capturing and restoring schema evolution.
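
As a rough illustration of the JSON job file mentioned in the Rationale, the Python sketch below assembles a job with one reader and one writer and serializes it to JSON. The nesting (`job`, `content`, `reader`, `writer`, `setting`), the plugin names `mysqlreader` and `streamwriter`, and every parameter key are assumptions made for illustration, not a specification of Chunjun's configuration schema. The helper `build_job` is hypothetical.

```python
import json


def build_job(reader_name, reader_params, writer_name, writer_params, channels=1):
    """Assemble a job description with one reader and one writer.

    The layout is an assumed, illustrative structure, not Chunjun's
    documented schema.
    """
    return {
        "job": {
            "content": [
                {
                    "reader": {"name": reader_name, "parameter": reader_params},
                    "writer": {"name": writer_name, "parameter": writer_params},
                }
            ],
            # Hypothetical setting: number of parallel channels for the job.
            "setting": {"speed": {"channel": channels}},
        }
    }


# Plugin names and parameter keys below are placeholders chosen for the example.
job = build_job(
    reader_name="mysqlreader",
    reader_params={
        "jdbcUrl": "jdbc:mysql://localhost:3306/demo",
        "table": "orders",
        "column": ["id", "amount"],
    },
    writer_name="streamwriter",
    writer_params={"print": True},
)

# Serialize the job so it can be saved as a JSON file.
text = json.dumps(job, indent=2)
print(text)
```

The point of the sketch is that a job is described declaratively: swapping the source or the sink means changing the reader or writer entry, not writing new pipeline code.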

(TODO)

##Current Status

###Meritocracy

Since Chunjun was open-sourced, many enterprises have adopted it to build their data integration systems. In return, we have received many issue reports and enhancements from them. The codebase is now mainly managed by the development team inside DTStack, which is also responsible for building DTStack's internal data integration system.

###Community

Chunjun has been building a community of contributors and users around this framework for the last five years. We organized one meetup in 2020. Currently, we communicate through GitHub issues and a Chinese DingTalk group with about 3000 members. We believe that we can get a lot of help from the Apache Flink community too. We will organize another meetup in 2022.

###Core Developers

(In alphabetical order)

- Chao Xu (https://github.com/zoudaokoulife)
- Gongjiang Tang (https://github.com/kyo-tom)
- Huai Yang (https://github.com/yanghuaiGit)
- Jiangbo Li (https://github.com/lijiangbo)
- Luning Wong (https://github.com/deadwind4)
- Luo Li (https://github.com/kanata163)
- Sishu Yang (https://github.com/yangsishu)
- Tianzhu Wen (https://github.com/WTZ468071157)
- Weiliang Hao (https://github.com/xiuzhu9527)
- Wenqiang Liu (https://github.com/meng1222)
- Xing Liu (https://github.com/simenliuxing)
- Yang Lan (https://github.com/HiLany)
- Yanquan Lv (https://github.com/lvyanquan)
- Yifan Hu (https://github.com/demotto)
- Zaiyue Yu (https://github.com/tonybobam)
- Zhangwan Zhao (https://github.com/jiemotongxue)
- Zhiqiang Li (https://github.com/ChestnutQiang)

Almost all of them work in the real-time computing engine team of DTStack; only Yifan Hu works for CaoCao Tech. Most of them are Apache Flink contributors.

##Known Risks

###Project Name

The name of the project is Chunjun.
Chunjun comes from the Mandarin Chinese Pinyin "Chun Jun", the name of one of the top ten famous swords in China.

###Orphaned products

More than 20 contributors and thousands of forks and stars show that Chunjun is actively supported, and we seek to grow the community further with the aid of Apache. As a consequence, Chunjun is unlikely to become an orphaned project.

###Inexperience with Open Source

Many of the Chunjun committers have experience working on open source projects. They are also active contributors to other Apache projects.

###Homogenous Developers

Most of the core developers are from DTStack, though Chunjun has received some bug fixes and enhancements from developers who do not work at DTStack.

###Reliance on Salaried Developers

Currently, most of the core developers are paid by DTStack to work on the Chunjun project. We look forward to attracting more people outside DTStack to join this project.

###Relationships with Other Apache Products

We have integrated with Apache Flink, Apache Hadoop, Apache Commons, Apache HttpComponents, Log4j and Maven.

Apache projects used by Chunjun plugins:

- Apache Hive
- Apache Solr
- Apache Doris
- Apache HBase
- Apache Kudu
- Apache Kafka
- Apache Pulsar

(TODO)

###An Excessive Fascination with the Apache Brand

We acknowledge the value and reputation that the Apache brand would bring to Chunjun. However, our primary interest is in the excellent community provided by the Apache Software Foundation, in which all projects can gain stability for long-term development.

##Documentation

A complete set of documents is provided on GitHub, in English and Simplified Chinese.

- English: https://github.com/DTStack/chunjun/blob/master/README.md
- Chinese: https://github.com/DTStack/chunjun/blob/master/README_CH.md

##Initial Code

https://github.com/DTStack/chunjun

##Initial Source and Intellectual Property Submission Plan

The codebase is already licensed under the Apache License 2.0 and the copyright is assigned to DTStack.
If the project enters the Incubator, DTStack will transfer the source code and trademark ownership to the ASF via a Software Grant Agreement. Our initial committers will submit iCLA(s), SGA, and CCLA(s).

##External Dependencies

- Apache-2.0 licenses: Apache Avro, Apache Commons, Apache Curator, Apache Flink, Apache Hadoop, Apache HttpComponents, Apache Log4j, Gson, Guava, Jackson, Powermock, Prometheus
- Eclipse Distribution License: JUnit
- EPL licenses: Logback
- MIT licenses: Mockito, SLF4J

##Required Resources

###Git Repositories

https://github.com/apache/incubator-chunjun

###Issue Tracking

The community would like to continue using GitHub Issues.

###Mailing List

- priv...@chunjun.incubator.apache.org
- d...@chunjun.incubator.apache.org
- comm...@chunjun.incubator.apache.org

###Continuous Integration tool

GitHub Actions

##Initial Committers

(In alphabetical order)

- Chao Xu (https://github.com/zoudaokoulife, xuchao at dtstack dot com)
- Luning Wong (https://github.com/deadwind4, gfeng48 at gmail dot com)
- Sishu Yang (https://github.com/yangsishu, sishu at dtstack dot com)
- Yang Huai (https://github.com/yanghuaiGit, dujie at dtstack dot com)
- Zhiqiang Li (https://github.com/ChestnutQiang, wujuan at dtstack dot com)

##Affiliations

The initial committers are employees of DTStack. The nominated mentors and champion are employees of TODO.

##Sponsors

###Champion

TODO

###Nominated Mentors

TODO