Upon reviewing the board report template, I am planning on the following schedule:

1. I'll leave this proposal for another few weeks to gather any additional input
2. In early February 2024 I'll start a formal vote thread on the dev@ mailing list for this proposal
3. If the vote passes, I'll submit a proposed resolution to the ASF board for their meeting in April 2024 using the pre-existing template[1]

[1] https://svn.apache.org/repos/private/committers/board/templates/subproject-tlp-resolution.txt

On Wed, Dec 27, 2023 at 6:32 PM L. C. Hsieh <vii...@gmail.com> wrote:

> Thanks for writing the proposal. It looks great to me too.
> I added a few comments on it.
>
> On Wed, Dec 27, 2023 at 3:05 PM Andy Grove <andygrov...@gmail.com> wrote:
> >
> > Thank you for creating the draft proposal, Andrew. I have reviewed this and
> > I think it looks great.
> >
> > Andy.
> >
> > On Wed, Dec 27, 2023 at 3:19 PM Andrew Lamb <al...@influxdata.com> wrote:
> >
> > > I have created a draft proposal [1] to break DataFusion out to its own top
> > > level project. Please provide your feedback and suggestions.
> > >
> > > The proposal is included at the end of this email and in this Google Doc:
> > > https://docs.google.com/document/d/11WTNYS8KWScOt3ySTX39WVS6krPhUvHsuJRY9PZQx4g
> > >
> > > Feel free to respond to this email or comment / make suggestions directly
> > > on the document.
> > >
> > > I would be especially grateful if people could review and comment on the
> > > proposed list of committers and PMC members.
> > >
> > > I hope everyone is not getting sick of hearing about this, but I think in
> > > this case it is better to over communicate than risk surprises.
> > >
> > > Andrew
> > >
> > > [1] https://github.com/apache/arrow-datafusion/issues/8491
> > >
> > > ----------
> > >
> > > DataFusion Top Level Project Proposal
> > > Dec 27, 2023
> > >
> > > [Editor’s note: This document is based on the proposal to the ASF board to
> > > create the Arrow project. Once it has been reviewed, we plan to send it to
> > > the ASF board sometime in January or February 2024 for their consideration]
> > >
> > > To: The ASF (bo...@apache.org)
> > >
> > > Summary:
> > >
> > > We propose creating a new top level project, Apache DataFusion, from an
> > > existing sub project of Apache Arrow to facilitate additional community and
> > > project growth.
> > >
> > > ----
> > > Apache DataFusion for Apache Top Level Project
> > >
> > > Abstract
> > >
> > > Apache Arrow DataFusion[1] is a very fast, extensible query engine for
> > > building high-quality data-centric systems in Rust, using the Apache Arrow
> > > in-memory format. DataFusion offers SQL and Dataframe APIs, excellent
> > > performance, built-in support for CSV, Parquet, JSON, and Avro, extensive
> > > customization, and a great community.
> > >
> > > [1] https://arrow.apache.org/datafusion/
> > >
> > > Proposal
> > >
> > > We propose creating a new top level ASF project, Apache DataFusion,
> > > governed initially by a subset of the Arrow project’s PMC and committers.
> > > The project’s code is in four existing git repositories, currently governed
> > > by Apache Arrow which would transfer to the new top level project.
> > >
> > > Background
> > >
> > > When DataFusion was initially donated to the Arrow project, it did not have
> > > a strong enough community to stand on its own. It has since grown
> > > significantly, and benefited immensely from being part of Arrow and
> > > nurturing of the Apache Way, and now has a community strong enough to stand
> > > on its own and that would benefit from focused governance attention.
> > >
> > > The community has discussed this idea publicly for more than 6 months
> > > https://github.com/apache/arrow-datafusion/discussions/6475 and briefly on
> > > the Arrow PMC mailing list
> > > https://lists.apache.org/thread/thv2jdm6640l6gm88hy8jhk5prjww0cs. As of the
> > > time of this writing both had exclusively positive reactions.
> > >
> > > Several current members of the Arrow PMC are both active contributors to
> > > DataFusion and understand and believe deeply in the Apache Way, and play
> > > active governance roles in the Arrow project as PMC members and PMC chairs,
> > > guiding the community, and releasing software versions.
> > > With this existing governance experience and structure, the new top level
> > > project will be able to function well immediately and independently.
> > >
> > > Overview of DataFusion
> > >
> > > Current Status
> > >
> > > Meritocracy
> > >
> > > DataFusion has been developed as part of Apache Arrow and thus has been
> > > operating as a meritocracy. Many of the developers of DataFusion are Arrow
> > > PMC members or committers. The DataFusion project plans to continue adding
> > > new PMC members and committers as the project matures and grows.
> > >
> > > Community
> > >
> > > The DataFusion development team seeks to foster the development and user
> > > communities. We hope that becoming a separate project will help both Arrow
> > > and DataFusion communities by being more focused. Focused governance will
> > > make it easier to grow the community of committers and PMC members and make
> > > the organization more clear to others.
> > >
> > > Alignment
> > >
> > > The ASF is a natural host for DataFusion given that it is already the home
> > > of Arrow, Parquet, and other related distributed system, storage and query
> > > execution systems.
> > >
> > > Project Leadership
> > >
> > > Proposed Initial PMC
> > >
> > > We propose the following people as the initial DataFusion PMC members. This
> > > is a subset of the existing Arrow PMC members who contribute to DataFusion
> > > https://people.apache.org/phonebook.html?unix=arrow
> > >
> > > Andy Grove (agrove): Arrow PMC Chair
> > > Andrew Lamb (alamb): Arrow PMC, past Arrow PMC Chair
> > > Daniël Heres (dheres): Arrow PMC
> > > Jie Wen (jakevin): Arrow PMC, Doris Committer
> > > Kun Liu (liukun): Arrow PMC, IoTDB PMC, TSFile PMC
> > > Liang-Chi Hsieh (viirya): Arrow PMC, Spark PMC
> > > Qingping Hou (houqp): Arrow PMC, Doris Committer
> > > Will Jones (wjones127): Arrow PMC
> > >
> > > We’d like to propose Andrew Lamb as the initial Chair (and thus ASF VP)
> > > for the DataFusion project.
> > >
> > > Affiliations
> > >
> > > Andy Grove (agrove): NVidia
> > > Andrew Lamb (alamb): InfluxData
> > > Daniël Heres (dheres): Coralogix
> > > Jie Wen (jakevin): SelectDB
> > > Kun Liu (liukun): Ebay
> > > Liang-Chi Hsieh (viirya): Apple
> > > Qingping Hou (houqp): Scribd
> > > Will Jones (wjones127): VoltronData
> > >
> > > Proposed Initial Committers
> > >
> > > In addition to the PMC, we propose the following people as the initial
> > > DataFusion committers. This is a subset of the existing Arrow committers
> > > who contribute to DataFusion
> > > https://people.apache.org/phonebook.html?unix=arrow
> > >
> > > akurmustafa Mustafa Akur (Synnada)
> > > avantgardner Brent Gardner (Coralogix)
> > > comphead Oleks V. (Unaffiliated)
> > > jiayuliu Liu Jiayu (Airbnb)
> > > mete Metehan Yildirim (Synnada)
> > > mingmwang Wang Mingming (Ebay)
> > > mneumann Marco Neumann (InfluxData)
> > > nju_yaho Zhong Yanghong (Ebay)
> > > ozankabak Mehmet Ozan Kabak (Synnada)
> > > paddyhoran Paddy Horan (Assured Allies)
> > > rdettai Rémi Dettai (Cloudfuse)
> > > sunchao Sun Chao (Apple)
> > > thinkharderdev Daniel Harris (Coralogix)
> > > tustvold Raphael Taylor-Davies (InfluxData)
> > > viirya L. C. Hsieh (Apple)
> > > wayne Ruihang Xia (Greptime)
> > > xudong963 Xudong Wang (ByteDance)
> > > yjshen Yijie Shen (Space and Time)
> > >
> > > Risk Assessments
> > >
> > > Naming / Trademarks
> > >
> > > As a sub-project of Arrow, the DataFusion name has been used for over 4
> > > years without any known issues. A podling name search has thus far not
> > > turned up any concerns:
> > > https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
> > >
> > > Legal / IP Clearance
> > >
> > > All DataFusion code has either been donated to the Arrow project with
> > > appropriate IP clearance or has been developed directly under ASF
> > > processes and procedures. Thus creating a new top level project poses no
> > > new Legal or IP risks.
> > >
> > > Code Extraction
> > >
> > > The relevant code is already in 4 separate repositories:
> > > https://github.com/apache/arrow-datafusion/
> > > https://github.com/apache/arrow-datafusion-python
> > > https://github.com/apache/arrow-ballista
> > > https://github.com/apache/arrow-ballista-python
> > >
> > > We foresee no issues with code extraction and propose these repositories be
> > > respectively renamed to reflect top level projects:
> > > https://github.com/apache/datafusion/
> > > https://github.com/apache/datafusion-python
> > > https://github.com/apache/datafusion-ballista
> > > https://github.com/apache/datafusion-ballista-python
> > >
> > > Note: https://github.com/apache/arrow-rs, the Rust implementation of
> > > Arrow, would remain part of the Arrow project.
> > >
> > > Orphaned Products
> > >
> > > DataFusion is known to be used in many open source and commercial projects
> > > https://arrow.apache.org/datafusion/user-guide/introduction.html#known-users
> > > , has had multiple commits daily for several years, and its adoption and
> > > number of contributors appear to be growing.
> > >
> > > Inexperience with Open Source
> > >
> > > The proposed PMC has extensive experience with Apache Arrow and other
> > > Apache projects, and includes PMC members and PMC chairs. The DataFusion
> > > PMC and more experienced committers will continue to coach new community
> > > members who may be less familiar with the Apache Way.
> > >
> > > Homogeneous Developers
> > >
> > > The 8 proposed PMC members are from 8 different employers and the proposed
> > > committers are similarly distributed across affiliations. No specific
> > > entity employs more than 3 total proposed developers.
> > >
> > > Reliance on Salaried Developers
> > >
> > > A substantial amount of work on DataFusion has been by salaried developers,
> > > but it also has a long tradition of attracting contributions from students
> > > and hobbyists and we plan no changes in contribution structure.
> > >
> > > Relationships with Other Apache Products
> > >
> > > DataFusion will obviously have a strong relationship with the Arrow project
> > > given the overlap in people. We don’t foresee close collaboration with
> > > other projects at this time.
> > >
> > > Cryptography
> > >
> > > DataFusion does not directly support encryption and there are no near-term
> > > plans to add support for encryption. Users who need this functionality can
> > > use the extension APIs.
> > >
> > > Required Resources
> > >
> > > Mailing Lists
> > >
> > > - private@datafusion for private PMC discussions (with moderated
> > >   subscriptions)
> > > - dev@datafusion
> > > - commits@datafusion
> > >
> > > Version Control
> > >
> > > We propose to continue to use git for source control and GitHub for hosting
> > > and testing resources.
> > >
> > > Issue Tracking
> > >
> > > DataFusion would continue to use GitHub for its issue tracking and
> > > communications.
> > >
> > > Other Resources
> > >
> > > The existing repositories already make use of existing Apache
> > > infrastructure, and we expect no change in the initial resource usage. As
> > > the project continues to grow, we expect continued infrastructure demand
> > > growth.
> > >
> > > FAQ: Has a sub project been promoted to a top level project before?
> > >
> > > Yes, and it appears to happen commonly.
> > > The Arrow project itself was created as a top level project from work that
> > > started in Apache Drill, and there are many sub projects of Hadoop that
> > > spun out as their own top level projects such as Mahout, Avro and HBase:
> > > https://news.apache.org/foundation/entry/the_apache_software_foundation_announces4
> > >
> > > Related material:
> > > Name search request / research for DataFusion:
> > > https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
> > > Discussion about which repositories on the arrow mailing list:
> > > https://lists.apache.org/thread/ob3n0d9ky0bgrryl3xn39w9k566bq00q
> > > Discussion about initial PMC on the arrow mailing list:
> > > https://lists.apache.org/thread/pymrzcdw4qdptvby85f69rg3pcckl15b
> > > Discussion about creating a new DataFusion top level project:
> > > https://github.com/apache/arrow-datafusion/discussions/6475
> > > Discussion about graduating on incubator list:
> > > https://lists.apache.org/thread/r4n73pmms1lv0jbohyx1o1z13d615t99
> > > Original Proposal for the Arrow project:
> > > https://lists.apache.org/thread/x2qzdwglm8pkqp9gv03bbgw17khl7pq3