+1 (binding)
On Sat, Mar 2, 2024 at 8:08 AM vin jake <jakevin...@gmail.com> wrote:
>
> +1 (binding)
>
> On Fri, Mar 1, 2024 at 7:33 PM Andrew Lamb <al...@influxdata.com> wrote:
>
> > Hello,
> >
> > As we have discussed[1][2], I would like to call a vote on the proposal to
> > create a new Apache Top Level Project for DataFusion. The text of the
> > proposed resolution and background document is copy/pasted below.
> >
> > If the community is in favor of this, we plan to submit the resolution
> > to the ASF board for approval with the next Arrow report (for the
> > April 2024 board meeting).
> >
> > The vote will be open for at least 7 days.
> >
> > [ ] +1 Accept this Proposal
> > [ ] +0
> > [ ] -1 Do not accept this proposal because...
> >
> > Andrew
> >
> > [1] https://lists.apache.org/thread/c150t1s1x0kcb3r03cjyx31kqs5oc341
> > [2] https://github.com/apache/arrow-datafusion/discussions/6475
> >
> > ---------- Proposed Resolution ---------
> >
> > Resolution to Create the Apache DataFusion Project from the Apache
> > Arrow DataFusion Sub Project
> >
> > =============================================================
> >
> > X. Establish the Apache DataFusion Project
> >
> > WHEREAS, the Board of Directors deems it to be in the best
> > interests of the Foundation and consistent with the
> > Foundation's purpose to establish a Project Management
> > Committee charged with the creation and maintenance of
> > open-source software related to an extensible query engine
> > for distribution at no charge to the public.
> >
> > NOW, THEREFORE, BE IT RESOLVED, that a Project Management
> > Committee (PMC), to be known as the "Apache DataFusion Project",
> > be and hereby is established pursuant to Bylaws of the
> > Foundation; and be it further
> >
> > RESOLVED, that the Apache DataFusion Project be and hereby is
> > responsible for the creation and maintenance of software
> > related to an extensible query engine; and be it further
> >
> > RESOLVED, that the office of "Vice President, Apache DataFusion" be
> > and hereby is created, the person holding such office to
> > serve at the direction of the Board of Directors as the chair
> > of the Apache DataFusion Project, and to have primary responsibility
> > for management of the projects within the scope of
> > responsibility of the Apache DataFusion Project; and be it further
> >
> > RESOLVED, that the persons listed immediately below be and
> > hereby are appointed to serve as the initial members of the
> > Apache DataFusion Project:
> >
> > * Andy Grove (agr...@apache.org)
> > * Andrew Lamb (al...@apache.org)
> > * Daniël Heres (dhe...@apache.org)
> > * Jie Wen (jake...@apache.org)
> > * Kun Liu (liu...@apache.org)
> > * Liang-Chi Hsieh (vii...@apache.org)
> > * Qingping Hou (ho...@apache.org)
> > * Wes McKinney (w...@apache.org)
> > * Will Jones (wjones...@apache.org)
> >
> > RESOLVED, that the Apache DataFusion Project be and hereby
> > is tasked with the migration and rationalization of the Apache
> > Arrow DataFusion sub-project; and be it further
> >
> > RESOLVED, that all responsibilities pertaining to the Apache
> > Arrow DataFusion sub-project encumbered upon the
> > Apache Arrow Project are hereafter discharged.
> >
> > NOW, THEREFORE, BE IT FURTHER RESOLVED, that Andrew Lamb
> > be appointed to the office of Vice President, Apache DataFusion, to
> > serve in accordance with and subject to the direction of the
> > Board of Directors and the Bylaws of the Foundation until
> > death, resignation, retirement, removal or disqualification,
> > or until a successor is appointed.
> > =============================================================
> >
> >
> > -------
> >
> >
> > Summary:
> >
> > We propose creating a new top level project, Apache DataFusion, from
> > an existing sub project of Apache Arrow to facilitate additional
> > community and project growth.
> >
> > Abstract
> >
> > Apache Arrow DataFusion[1] is a very fast, extensible query engine
> > for building high-quality data-centric systems in Rust, using the
> > Apache Arrow in-memory format. DataFusion offers SQL and Dataframe
> > APIs, excellent performance, built-in support for CSV, Parquet, JSON,
> > and Avro, extensive customization, and a great community.
> >
> > [1] https://arrow.apache.org/datafusion/
> >
> >
> > Proposal
> >
> > We propose creating a new top level ASF project, Apache DataFusion,
> > governed initially by a subset of the Apache Arrow project’s PMC and
> > committers. The project’s code is in five existing git repositories,
> > currently governed by Apache Arrow, which would transfer to the new
> > top level project.
> >
> > Background
> >
> > When DataFusion was initially donated to the Arrow project, it did not
> > have a strong enough community to stand on its own. It has since grown
> > significantly, benefiting immensely from being part of Arrow and from
> > the nurturing of the Apache Way. It now has a community strong enough
> > to stand on its own, one that would benefit from focused governance
> > attention.
> >
> > The community has discussed this idea publicly for more than 6 months
> > (https://github.com/apache/arrow-datafusion/discussions/6475) and
> > briefly on the Arrow PMC mailing list
> > (https://lists.apache.org/thread/thv2jdm6640l6gm88hy8jhk5prjww0cs). As
> > of the time of this writing, both discussions had received exclusively
> > positive reactions.
> >
> > Several current members of the Arrow PMC are active contributors to
> > DataFusion, understand and believe deeply in the Apache Way, and play
> > active governance roles in the Arrow project as PMC members and PMC
> > chairs: guiding the community and releasing software versions.
> > With this existing governance experience and structure, the new top
> > level project will be able to function well immediately and
> > independently.
> >
> > Overview of DataFusion
> >
> > Current Status
> >
> > Meritocracy
> >
> > DataFusion has been developed as part of Apache Arrow and thus has
> > been operating as a meritocracy. Many of the developers of DataFusion
> > are Arrow PMC members or committers. The DataFusion project plans to
> > continue adding new PMC members and committers as the project matures
> > and grows.
> >
> > Community
> >
> > The DataFusion development team seeks to foster the development and
> > user communities. We hope that becoming a separate project will help
> > both the Arrow and DataFusion communities become more focused. Focused
> > governance will make it easier to grow the community of committers and
> > PMC members and make the organization clearer to others.
> >
> > Alignment
> >
> > The ASF is a natural host for DataFusion given that it is already the
> > home of Arrow, Parquet, and other related distributed, storage, and
> > query execution systems.
> >
> > Project Leadership
> >
> > Proposed Initial PMC
> >
> > We propose the following people as the initial DataFusion PMC members.
> > This is a subset of the existing Arrow PMC members who contribute to
> > DataFusion (https://people.apache.org/phonebook.html?unix=arrow).
> >
> > Andy Grove (agrove): Arrow PMC Chair
> > Andrew Lamb (alamb): Arrow PMC, past Arrow PMC Chair
> > Daniël Heres (dheres): Arrow PMC
> > Jie Wen (jakevin): Arrow PMC, Doris Committer
> > Kun Liu (liukun): Arrow PMC, IoTDB PMC, TSFile PMC
> > Liang-Chi Hsieh (viirya): Arrow PMC, Spark PMC
> > Qingping Hou (houqp): Arrow PMC
> > Wes McKinney (wesm): Arrow PMC, ASF Member
> > Will Jones (wjones127): Arrow PMC
> >
> > We’d like to propose Andrew Lamb as the initial Chair (and thus ASF
> > VP) for the DataFusion project.
> >
> > Affiliations
> >
> > Andy Grove (agrove): NVIDIA
> > Andrew Lamb (alamb): InfluxData
> > Daniël Heres (dheres): Coralogix
> > Jie Wen (jakevin): SelectDB
> > Kun Liu (liukun): eBay
> > Liang-Chi Hsieh (viirya): Apple
> > Qingping Hou (houqp): Scribd
> > Wes McKinney (wesm): Posit
> > Will Jones (wjones127): LanceDB
> >
> > Proposed Initial Committers
> >
> > In addition to the PMC, we propose the following people as the initial
> > DataFusion committers. This is a subset of the existing Arrow
> > committers who contribute to DataFusion
> > (https://people.apache.org/phonebook.html?unix=arrow).
> >
> > akurmustafa Mustafa Akur (Synnada)
> > avantgardner Brent Gardner (Coralogix)
> > comphead Oleks V. (Unaffiliated)
> > jayzhan Jay Zhan (Unaffiliated)
> > jeffreyvo Jeffrey Vo (Unaffiliated)
> > jiayuliu Liu Jiayu (Airbnb)
> > mete Metehan Yildirim (Synnada)
> > mingmwang Wang Mingming (eBay)
> > mneumann Marco Neumann (InfluxData)
> > nju_yaho Zhong Yanghong (eBay)
> > ozankabak Mehmet Ozan Kabak (Synnada)
> > paddyhoran Paddy Horan (Assured Allies)
> > rdettai Rémi Dettai (Cloudfuse)
> > sunchao Chao Sun (Apple)
> > thinkharderdev Daniel Harris (Coralogix)
> > tustvold Raphael Taylor-Davies (InfluxData)
> > wayne Ruihang Xia (Greptime)
> > xudong963 Xudong Wang (ByteDance)
> > yjshen Yijie Shen (Space and Time)
> > yangjiang Yang Jiang (eBay)
> >
> >
> > Risk Assessments
> >
> > Naming / Trademarks
> >
> > As a sub-project of Arrow, the DataFusion name has been used for over
> > 4 years without any known issues. A podling name search did not turn
> > up any concerns and was approved:
> > https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
> >
> > Legal / IP Clearance
> >
> > All DataFusion code has either been donated to the Arrow project with
> > appropriate IP clearance or has been developed directly under ASF
> > processes and procedures. Thus, creating a new top level project poses
> > no new legal or IP risks.
> >
> > Code Extraction
> >
> > The relevant code is already in 5 separate repositories:
> > https://github.com/apache/arrow-datafusion/
> > https://github.com/apache/arrow-datafusion-python
> > https://github.com/apache/arrow-ballista
> > https://github.com/apache/arrow-ballista-python
> > https://github.com/apache/arrow-datafusion-comet
> >
> > We foresee no issues with code extraction and propose that these
> > repositories be renamed to reflect the new top level project.
> >
> > Note: https://github.com/apache/arrow-rs, the Rust implementation of
> > Arrow, would remain part of the Arrow project.
> >
> > Orphaned Products
> >
> > DataFusion is known to be used in many open source and commercial
> > projects
> > (https://arrow.apache.org/datafusion/user-guide/introduction.html#known-users),
> > has had multiple commits daily for several years, and its adoption and
> > number of contributors appear to be growing. We do not foresee the
> > project being orphaned in the next several years.
> >
> > Inexperience with Open Source
> >
> > The proposed PMC has extensive experience with Apache Arrow and other
> > Apache projects, and includes PMC members, PMC chairs and an ASF
> > Member. The DataFusion PMC and more experienced committers will
> > continue to coach new community members who may be less familiar with
> > the Apache Way.
> >
> > Homogeneous Developers
> >
> > The 9 proposed PMC members are from 9 different employers, and the
> > proposed committers are similarly distributed across affiliations. No
> > specific entity employs more than 3 total proposed developers.
> >
> > Reliance on Salaried Developers
> >
> > A substantial amount of work on DataFusion has been done by salaried
> > developers, but the project also has a long tradition of attracting
> > contributions from students and hobbyists, and we plan no changes in
> > contribution structure.
> >
> > Relationships with Other Apache Products
> >
> > DataFusion will obviously have a strong relationship with the Arrow
> > project given the overlap in people. We don’t foresee close
> > collaboration with other projects at this time.
> >
> > Cryptography
> >
> > DataFusion does not directly support encryption and there are no
> > near-term plans to add support for encryption. Users who need this
> > functionality can use the extension APIs.
> >
> > Required Resources
> >
> > Mailing Lists
> >
> > - priv...@datafusion.apache.org for private PMC discussions (with
> >   moderated subscriptions)
> > - d...@datafusion.apache.org
> > - comm...@datafusion.apache.org
> > - u...@datafusion.apache.org
> >
> > Version Control
> >
> > We propose to continue to use git for source control and GitHub for
> > hosting and testing resources.
> >
> > We also need to rename the GitHub repositories to reflect the new top
> > level names:
> >
> > https://github.com/apache/arrow-datafusion/ → apache/datafusion
> > https://github.com/apache/arrow-datafusion-python → apache/datafusion-python
> > https://github.com/apache/arrow-ballista → apache/datafusion-ballista
> > https://github.com/apache/arrow-ballista-python → apache/datafusion-ballista-python
> > https://github.com/apache/arrow-datafusion-comet → apache/datafusion-comet
> >
> > Issue Tracking
> >
> > DataFusion would continue to use GitHub for its issue tracking and
> > communications.
> >
> > Other Resources
> >
> > The existing repositories already make use of existing Apache
> > infrastructure, and we expect no change in the initial resource usage.
> > As the project continues to grow, we expect its infrastructure demand
> > to grow as well.
> >
> >
> > FAQ: Has a sub project been promoted to a top level project before?
> >
> > Yes, and this appears to be common. The Arrow project itself was
> > created as a top level project from work that started in Apache Drill,
> > and many sub projects of Hadoop, such as Mahout, Avro and HBase, spun
> > out as their own top level projects:
> >
> > https://news.apache.org/foundation/entry/the_apache_software_foundation_announces4
> >
> >
> > Related material:
> > Name search request / research for DataFusion:
> > https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
> > Discussion about this proposal on the arrow mailing list:
> > https://lists.apache.org/thread/c150t1s1x0kcb3r03cjyx31kqs5oc341
> > Discussion about which repositories on the arrow mailing list:
> > https://lists.apache.org/thread/ob3n0d9ky0bgrryl3xn39w9k566bq00q
> > Discussion about initial PMC on the arrow mailing list:
> > https://lists.apache.org/thread/pymrzcdw4qdptvby85f69rg3pcckl15b
> > Discussion in GitHub about creating a new DataFusion top level
> > project: https://github.com/apache/arrow-datafusion/discussions/6475
> > Discussion about graduating on incubator list:
> > https://lists.apache.org/thread/r4n73pmms1lv0jbohyx1o1z13d615t99
> > Original Proposal for the Arrow project:
> > https://lists.apache.org/thread/x2qzdwglm8pkqp9gv03bbgw17khl7pq3