+1 (non-binding)

Regards
JB

On Fri, Mar 1, 2024 at 12:33 PM Andrew Lamb <al...@influxdata.com> wrote:
>
> Hello,
>
> As we have discussed[1][2] I would like to vote on the proposal to
> create a new Apache Top Level Project for DataFusion. The text of the
> proposed resolution and background document is copy/pasted below
>
> If the community is in favor of this, we plan to submit the resolution
> to the ASF board for approval with the next Arrow report (for the
> April 2024 board meeting).
>
> The vote will be open for at least 7 days.
>
> [ ] +1 Accept this Proposal
> [ ] +0
> [ ] -1 Do not accept this proposal because...
>
> Andrew
>
> [1] https://lists.apache.org/thread/c150t1s1x0kcb3r03cjyx31kqs5oc341
> [2] https://github.com/apache/arrow-datafusion/discussions/6475
>
> ---------- Proposed Resolution ---------
>
> Resolution to Create the Apache DataFusion Project from the Apache
> Arrow DataFusion Sub Project
>
> =============================================================
>
> X. Establish the Apache DataFusion Project
>
> WHEREAS, the Board of Directors deems it to be in the best
> interests of the Foundation and consistent with the
> Foundation's purpose to establish a Project Management
> Committee charged with the creation and maintenance of
> open-source software related to an extensible query engine
> for distribution at no charge to the public.
>
> NOW, THEREFORE, BE IT RESOLVED, that a Project Management
> Committee (PMC), to be known as the "Apache DataFusion Project",
> be and hereby is established pursuant to Bylaws of the
> Foundation; and be it further
>
> RESOLVED, that the Apache DataFusion Project be and hereby is
> responsible for the creation and maintenance of software
> related to an extensible query engine; and be it further
>
> RESOLVED, that the office of "Vice President, Apache DataFusion" be
> and hereby is created, the person holding such office to
> serve at the direction of the Board of Directors as the chair
> of the Apache DataFusion Project, and to have primary responsibility
> for management of the projects within the scope of
> responsibility of the Apache DataFusion Project; and be it further
>
> RESOLVED, that the persons listed immediately below be and
> hereby are appointed to serve as the initial members of the
> Apache DataFusion Project:
>
> * Andy Grove (agr...@apache.org)
> * Andrew Lamb (al...@apache.org)
> * Daniël Heres (dhe...@apache.org)
> * Jie Wen (jake...@apache.org)
> * Kun Liu (liu...@apache.org)
> * Liang-Chi Hsieh (vii...@apache.org)
> * Qingping Hou (ho...@apache.org)
> * Wes McKinney (w...@apache.org)
> * Will Jones (wjones...@apache.org)
>
> RESOLVED, that the Apache DataFusion Project be and hereby
> is tasked with the migration and rationalization of the Apache
> Arrow DataFusion sub-project; and be it further
>
> RESOLVED, that all responsibilities pertaining to the Apache
> Arrow DataFusion sub-project encumbered upon the
> Apache Arrow Project are hereafter discharged.
>
> NOW, THEREFORE, BE IT FURTHER RESOLVED, that Andrew Lamb
> be appointed to the office of Vice President, Apache DataFusion, to
> serve in accordance with and subject to the direction of the
> Board of Directors and the Bylaws of the Foundation until
> death, resignation, retirement, removal or disqualification,
> or until a successor is appointed.
> =============================================================
>
>
> -------
>
>
> Summary:
>
> We propose creating a new top level project, Apache DataFusion, from
> an existing sub project of Apache Arrow to facilitate additional
> community and project growth.
>
> Abstract
>
> Apache Arrow DataFusion[1] is a very fast, extensible query engine
> for building high-quality data-centric systems in Rust, using the
> Apache Arrow in-memory format. DataFusion offers SQL and Dataframe
> APIs, excellent performance, built-in support for CSV, Parquet, JSON,
> and Avro, extensive customization, and a great community.
>
> [1] https://arrow.apache.org/datafusion/
>
>
> Proposal
>
> We propose creating a new top level ASF project, Apache DataFusion,
> governed initially by a subset of the Apache Arrow project’s PMC and
> committers. The project’s code is in five existing git repositories,
> currently governed by Apache Arrow which would transfer to the new top
> level project.
>
> Background
>
> When DataFusion was initially donated to the Arrow project, it did not
> have a strong enough community to stand on its own. It has since grown
> significantly, and benefited immensely from being part of Arrow and
> nurturing of the Apache Way, and now has a community strong enough to
> stand on its own and that would benefit from focused governance
> attention.
>
> The community has discussed this idea publicly for more than 6 months
> https://github.com/apache/arrow-datafusion/discussions/6475 and
> briefly on the Arrow PMC mailing list
> https://lists.apache.org/thread/thv2jdm6640l6gm88hy8jhk5prjww0cs. As
> of the time of this writing both had exclusively positive reactions.
>
> Several current members of the Arrow PMC are both active contributors
> to DataFusion and understand and believe deeply in the Apache Way, and
> play active governance roles in the Arrow project as PMC members and
> PMC chairs, guiding the community, and releasing software versions.
> With this existing governance experience and structure, the new top
> level project will be able to function well immediately and
> independently.
>
> Overview of DataFusion
>
> Current Status
>
> Meritocracy
>
> DataFusion has been developed as part of Apache Arrow and thus has
> been operating as a meritocracy. Many of the developers of DataFusion
> are Arrow PMC members or committers. The DataFusion project plans to
> continue adding new PMC members and committers as the project matures
> and grows.
>
> Community
>
> The DataFusion development team seeks to foster the development and
> user communities. We hope that becoming a separate project will help
> both Arrow and DataFusion communities by being more focused. Focused
> governance will make it easier to grow the community of committers and
> PMC members and make the organization more clear to others.
>
> Alignment
>
> The ASF is a natural host for DataFusion given that it is already the
> home of Arrow, Parquet, and other related distributed, storage, and
> query execution systems.
>
> Project Leadership
>
> Proposed Initial PMC
>
> We propose the following people as the initial DataFusion PMC members.
> This is a subset of the existing Arrow PMC members who contribute to
> DataFusion https://people.apache.org/phonebook.html?unix=arrow
>
> Andy Grove (agrove): Arrow PMC Chair
> Andrew Lamb (alamb): Arrow PMC, past Arrow PMC Chair
> Daniël Heres (dheres): Arrow PMC
> Jie Wen (jakevin): Arrow PMC, Doris Committer
> Kun Liu (liukun): Arrow PMC, IoTDB PMC, TSFile PMC
> Liang-Chi Hsieh (viirya): Arrow PMC, Spark PMC
> Qingping Hou (houqp): Arrow PMC
> Wes McKinney (wesm): Arrow PMC, ASF Member
> Will Jones (wjones127): Arrow PMC
>
> We’d like to propose Andrew Lamb as the initial Chair (and thus ASF
> VP) for the DataFusion project.
>
> Affiliations
>
> Andy Grove (agrove): NVidia
> Andrew Lamb (alamb): InfluxData
> Daniël Heres (dheres): Coralogix
> Jie Wen (jakevin): SelectDB
> Kun Liu (liukun): Ebay
> Liang-Chi Hsieh (viirya): Apple
> Qingping Hou (houqp): Scribd
> Wes McKinney (wesm): Posit
> Will Jones (wjones127): LanceDB
>
> Proposed Initial Committers
>
> In addition to the PMC, we propose the following people as the initial
> DataFusion committers. This is a subset of the existing Arrow
> committers who contribute to DataFusion
> https://people.apache.org/phonebook.html?unix=arrow
>
> akurmustafa     Mustafa Akur (Synnada)
> avantgardner    Brent Gardner (Coralogix)
> comphead        Oleks V. (Unaffiliated)
> jayzhan         Jay Zhan (Unaffiliated)
> jeffreyvo       Jeffry Vo (Unaffiliated)
> jiayuliu        Liu Jiayu (Airbnb)
> mete            Metehan Yildirim (Synnada)
> mingmwang       Wang Mingming (Ebay)
> mneumann        Marco Neumann (InfluxData)
> nju_yaho        Zhong Yanghong (Ebay)
> ozankabak       Mehmet Ozan Kabak (Synnada)
> paddyhoran      Paddy Horan (Assured Allies)
> rdettai         Rémi Dettai (Cloudfuse)
> sunchao         Chao Sun (Apple)
> thinkharderdev  Daniel Harris (Coralogix)
> tustvold        Raphael Taylor-Davies (InfluxData)
> wayne           Ruihang Xia (Greptime)
> xudong963       Xudong Wang (ByteDance)
> yjshen          Yijie Shen (Space and Time)
> yangjiang       Yang Jiang (Ebay)
>
>
> Risk Assessments
>
> Naming / Trademarks
>
> As a sub-project of Arrow, the DataFusion name has been used for over
> 4 years without any known issues. A podling name search did not turn
> up any concerns and was approved:
> https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
>
> Legal / IP Clearance
>
> All DataFusion code has either been donated to the Arrow project with
> appropriate IP clearance or has been developed directly under ASF
> processes and procedures. Thus creating a new top level project poses
> no new Legal or IP risks.
>
> Code Extraction
>
> The relevant code is already in 5 separate repositories:
> https://github.com/apache/arrow-datafusion/
> https://github.com/apache/arrow-datafusion-python
> https://github.com/apache/arrow-ballista
> https://github.com/apache/arrow-ballista-python
> https://github.com/apache/arrow-datafusion-comet
>
> We foresee no issues with code extraction and propose these
> repositories be renamed to reflect the new top level project
>
> Note: https://github.com/apache/arrow-rs, the Rust implementation of
> Arrow, would remain part of the Arrow project.
>
> Orphaned Products
>
> DataFusion is known to be used in many open source and commercial
> projects
> https://arrow.apache.org/datafusion/user-guide/introduction.html#known-users,
> has had multiple commits daily for several years, and its adoption and
> number of contributors appear to be growing. We do not foresee the
> project being orphaned in the next several years.
>
> Inexperience with Open Source
>
> The proposed PMC has extensive experience with Apache Arrow and other
> Apache projects, and includes PMC members, PMC chairs and an ASF
> Member. The DataFusion PMC and more experienced committers will
> continue to coach new community members who may be less familiar with
> the Apache Way.
>
> Homogeneous Developers
>
> The 9 proposed PMC members are from 9 different employers and the
> proposed committers are similarly distributed across affiliations. No
> specific entity employs more than 3 total proposed developers.
>
> Reliance on Salaried Developers
>
> A substantial amount of work on DataFusion has been by salaried
> developers, but it also has a long tradition of attracting
> contributions from students and hobbyists and we plan no changes in
> contribution structure.
>
> Relationships with Other Apache Products
>
> DataFusion will obviously have a strong relationship with the Arrow
> project given the overlap in people. We don’t foresee close
> collaboration with other projects at this time.
>
> Cryptography
>
> DataFusion does not directly support encryption and there are no
> near-term plans to add support for encryption. Users who need this
> functionality can use the extension APIs.
>
> Required Resources
>
> Mailing Lists
>
> - priv...@datafusion.apache.org for private PMC discussions (with
> moderated subscriptions)
> - d...@datafusion.apache.org
> - comm...@datafusion.apache.org
> - u...@datafusion.apache.org
>
> Version Control
>
> We propose to continue to use git for source control and github for
> hosting and testing resources.
>
> We also need to rename the github repositories to reflect the new top
> level names:
>
> https://github.com/apache/arrow-datafusion/ → apache/datafusion
> https://github.com/apache/arrow-datafusion-python → apache/datafusion-python
> https://github.com/apache/arrow-ballista → apache/datafusion-ballista
> https://github.com/apache/arrow-ballista-python → apache/datafusion-ballista-python
> https://github.com/apache/arrow-datafusion-comet → apache/datafusion-comet
>
>
>
> Issue Tracking
>
> DataFusion would continue to use github for its issue tracking and
> communications
>
> Other Resources
>
> The existing repositories already make use of existing Apache
> infrastructure, and we expect no change in the initial resource usage.
> As the project continues to grow, we expect continued infrastructure
> demand growth.
>
>
> FAQ: Has a sub project been promoted to a top level project before?
>
> Yes, and it appears to happen commonly.
> The Arrow project itself was
> created as a top level project from work that started in Apache Drill,
> and there are many sub projects of Hadoop that spun out as their own
> top level projects such as Mahout, Avro and HBase:
> https://news.apache.org/foundation/entry/the_apache_software_foundation_announces4
>
>
>
> Related material:
> Name search request / research for DataFusion:
> https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
> Discussion about this proposal on the arrow mailing list:
> https://lists.apache.org/thread/c150t1s1x0kcb3r03cjyx31kqs5oc341
> Discussion about which repositories on the arrow mailing list:
> https://lists.apache.org/thread/ob3n0d9ky0bgrryl3xn39w9k566bq00q
> Discussion about initial PMC on the arrow mailing list:
> https://lists.apache.org/thread/pymrzcdw4qdptvby85f69rg3pcckl15b
> Discussion in github about creating a new DataFusion top level
> project: https://github.com/apache/arrow-datafusion/discussions/6475
> Discussion about graduating on incubator list:
> https://lists.apache.org/thread/r4n73pmms1lv0jbohyx1o1z13d615t99
> Original Proposal for the Arrow project:
> https://lists.apache.org/thread/x2qzdwglm8pkqp9gv03bbgw17khl7pq3