Thanks JB, I did do a name search and posted the results here [1].

However, I am not sure what the next steps for that particular process are (for example, does someone have to approve it?).

Any insight you could provide would be greatly appreciated.

Andrew

[1] https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219

On Fri, Jan 5, 2024 at 7:55 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:

> Hi Andrew,
>
> I did a quick review on the doc and it looks good to me. I just added a question about name search (DataFusion will probably work as TLP, but we have to check as we have a new Apache name moving from Arrow DataFusion to DataFusion).
>
> Please let me know if I can help on that.
>
> Thanks !
> Regards
> JB
>
> On Fri, Jan 5, 2024 at 12:26 PM Andrew Lamb <al...@influxdata.com> wrote:
> >
> > Upon reviewing the board report template, I am planning on the following schedule:
> > 1. I'll leave this proposal for another few weeks to gather any additional input
> > 2. In early February 2024 I'll start a formal vote thread on the dev@ mailing list for this proposal
> > 3. If the vote passes, I'll submit a proposed resolution to the ASF board for their meeting in April 2024 using the pre-existing template[1]
> >
> > [1] https://svn.apache.org/repos/private/committers/board/templates/subproject-tlp-resolution.txt
> >
> > On Wed, Dec 27, 2023 at 6:32 PM L. C. Hsieh <vii...@gmail.com> wrote:
> > >
> > > Thanks for writing the proposal. It looks great to me too.
> > > I added a few comments on it.
> > >
> > > On Wed, Dec 27, 2023 at 3:05 PM Andy Grove <andygrov...@gmail.com> wrote:
> > > >
> > > > Thank you for creating the draft proposal, Andrew. I have reviewed this and I think it looks great.
> > > >
> > > > Andy.
> > > >
> > > > On Wed, Dec 27, 2023 at 3:19 PM Andrew Lamb <al...@influxdata.com> wrote:
> > > >
> > > > > I have created a draft proposal [1] to break DataFusion out to its own top level project. Please provide your feedback and suggestions.
> > > > >
> > > > > The proposal is included at the end of this email and in this Google Doc:
> > > > > https://docs.google.com/document/d/11WTNYS8KWScOt3ySTX39WVS6krPhUvHsuJRY9PZQx4g
> > > > >
> > > > > Feel free to respond to this email or comment / make suggestions directly on the document.
> > > > >
> > > > > I would be especially grateful if people could review and comment on the proposed list of committers and PMC members.
> > > > >
> > > > > I hope everyone is not getting sick of hearing about this, but I think in this case it is better to over communicate than risk surprises.
> > > > >
> > > > > Andrew
> > > > >
> > > > > [1] https://github.com/apache/arrow-datafusion/issues/8491
> > > > >
> > > > > ----------
> > > > >
> > > > > DataFusion Top Level Project Proposal
> > > > > Dec 27, 2023
> > > > >
> > > > > [Editor’s note: This document is based on the proposal to the ASF board to create the Arrow project. Once it has been reviewed, we plan to send it to the ASF board sometime in January or February 2024 for their consideration]
> > > > >
> > > > > To: The ASF (bo...@apache.org)
> > > > >
> > > > > Summary:
> > > > >
> > > > > We propose creating a new top level project, Apache DataFusion, from an existing sub project of Apache Arrow to facilitate additional community and project growth.
> > > > >
> > > > > ----
> > > > > Apache DataFusion for Apache Top Level Project
> > > > >
> > > > > Abstract
> > > > >
> > > > > Apache Arrow DataFusion[1] is a very fast, extensible query engine for building high-quality data-centric systems in Rust, using the Apache Arrow in-memory format.
> > > > > DataFusion offers SQL and Dataframe APIs, excellent performance, built-in support for CSV, Parquet, JSON, and Avro, extensive customization, and a great community.
> > > > >
> > > > > [1] https://arrow.apache.org/datafusion/
> > > > >
> > > > > Proposal
> > > > >
> > > > > We propose creating a new top level ASF project, Apache DataFusion, governed initially by a subset of the Arrow project’s PMC and committers. The project’s code is in four existing git repositories, currently governed by Apache Arrow, which would transfer to the new top level project.
> > > > >
> > > > > Background
> > > > >
> > > > > When DataFusion was initially donated to the Arrow project, it did not have a strong enough community to stand on its own. It has since grown significantly, and benefited immensely from being part of Arrow and nurturing of the Apache Way, and now has a community strong enough to stand on its own and that would benefit from focused governance attention.
> > > > >
> > > > > The community has discussed this idea publicly for more than 6 months https://github.com/apache/arrow-datafusion/discussions/6475 and briefly on the Arrow PMC mailing list https://lists.apache.org/thread/thv2jdm6640l6gm88hy8jhk5prjww0cs. As of the time of this writing both had exclusively positive reactions.
> > > > >
> > > > > Several current members of the Arrow PMC are both active contributors to DataFusion and understand and believe deeply in the Apache Way, and play active governance roles in the Arrow project as PMC members and PMC chairs, guiding the community, and releasing software versions.
> > > > > With this existing governance experience and structure, the new top level project will be able to function well immediately and independently.
> > > > >
> > > > > Overview of DataFusion
> > > > >
> > > > > Current Status
> > > > >
> > > > > Meritocracy
> > > > >
> > > > > DataFusion has been developed as part of Apache Arrow and thus has been operating as a meritocracy. Many of the developers of DataFusion are Arrow PMC members or committers. The DataFusion project plans to continue adding new PMC members and committers as the project matures and grows.
> > > > >
> > > > > Community
> > > > >
> > > > > The DataFusion development team seeks to foster the development and user communities. We hope that becoming a separate project will help both Arrow and DataFusion communities by being more focused. Focused governance will make it easier to grow the community of committers and PMC members and make the organization more clear to others.
> > > > >
> > > > > Alignment
> > > > >
> > > > > The ASF is a natural host for DataFusion given that it is already the home of Arrow, Parquet, and other related distributed, storage, and query execution systems.
> > > > >
> > > > > Project Leadership
> > > > >
> > > > > Proposed Initial PMC
> > > > >
> > > > > We propose the following people as the initial DataFusion PMC members.
> > > > > This is a subset of the existing Arrow PMC members who contribute to DataFusion:
> > > > > https://people.apache.org/phonebook.html?unix=arrow
> > > > >
> > > > > Andy Grove (agrove): Arrow PMC Chair
> > > > > Andrew Lamb (alamb): Arrow PMC, past Arrow PMC Chair
> > > > > Daniël Heres (dheres): Arrow PMC
> > > > > Jie Wen (jakevin): Arrow PMC, Doris Committer
> > > > > Kun Liu (liukun): Arrow PMC, IoTDB PMC, TSFile PMC
> > > > > Liang-Chi Hsieh (viirya): Arrow PMC, Spark PMC
> > > > > Qingping Hou (houqp): Arrow PMC, Doris Committer
> > > > > Will Jones (wjones127): Arrow PMC
> > > > >
> > > > > We’d like to propose Andrew Lamb as the initial Chair (and thus ASF VP) for the DataFusion project.
> > > > >
> > > > > Affiliations
> > > > >
> > > > > Andy Grove (agrove): NVidia
> > > > > Andrew Lamb (alamb): InfluxData
> > > > > Daniël Heres (dheres): Coralogix
> > > > > Jie Wen (jakevin): SelectDB
> > > > > Kun Liu (liukun): Ebay
> > > > > Liang-Chi Hsieh (viirya): Apple
> > > > > Qingping Hou (houqp): Scribd
> > > > > Will Jones (wjones127): VoltronData
> > > > >
> > > > > Proposed Initial Committers
> > > > >
> > > > > In addition to the PMC, we propose the following people as the initial DataFusion committers. This is a subset of the existing Arrow committers who contribute to DataFusion:
> > > > > https://people.apache.org/phonebook.html?unix=arrow
> > > > >
> > > > > akurmustafa Mustafa Akur (Synnada)
> > > > > avantgardner Brent Gardner (Coralogix)
> > > > > comphead Oleks V. (Unaffiliated)
> > > > > jiayuliu Liu Jiayu (Airbnb)
> > > > > mete Metehan Yildirim (Synnada)
> > > > > mingmwang Wang Mingming (Ebay)
> > > > > mneumann Marco Neumann (InfluxData)
> > > > > nju_yaho Zhong Yanghong (Ebay)
> > > > > ozankabak Mehmet Ozan Kabak (Synnada)
> > > > > paddyhoran Paddy Horan (Assured Allies)
> > > > > rdettai Rémi Dettai (Cloudfuse)
> > > > > sunchao Sun Chao (Apple)
> > > > > thinkharderdev Daniel Harris (Coralogix)
> > > > > tustvold Raphael Taylor-Davies (InfluxData)
> > > > > viirya L. C. Hsieh (Apple)
> > > > > wayne Ruihang Xia (Greptime)
> > > > > xudong963 Xudong Wang (ByteDance)
> > > > > yjshen Yijie Shen (Space and Time)
> > > > >
> > > > > Risk Assessments
> > > > >
> > > > > Naming / Trademarks
> > > > >
> > > > > As a sub-project of Arrow, the DataFusion name has been used for over 4 years without any known issues. A podling name search has thus far not turned up any concerns:
> > > > > https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
> > > > >
> > > > > Legal / IP Clearance
> > > > >
> > > > > All DataFusion code has either been donated to the Arrow project with appropriate IP clearance or has been developed directly under ASF processes and procedures. Thus creating a new top level project poses no new Legal or IP risks.
> > > > >
> > > > > Code Extraction
> > > > >
> > > > > The relevant code is already in 4 separate repositories:
> > > > > https://github.com/apache/arrow-datafusion/
> > > > > https://github.com/apache/arrow-datafusion-python
> > > > > https://github.com/apache/arrow-ballista
> > > > > https://github.com/apache/arrow-ballista-python
> > > > >
> > > > > We foresee no issues with code extraction and propose these repositories be respectively renamed to reflect top level projects:
> > > > > https://github.com/apache/datafusion/
> > > > > https://github.com/apache/datafusion-python
> > > > > https://github.com/apache/datafusion-ballista
> > > > > https://github.com/apache/datafusion-ballista-python
> > > > >
> > > > > Note: https://github.com/apache/arrow-rs, the Rust implementation of Arrow, would remain part of the Arrow project.
> > > > >
> > > > > Orphaned Products
> > > > >
> > > > > DataFusion is known to be used in many open source and commercial projects https://arrow.apache.org/datafusion/user-guide/introduction.html#known-users, has had multiple commits daily for several years, and its adoption and number of contributors appears to be growing.
> > > > >
> > > > > Inexperience with Open Source
> > > > >
> > > > > The proposed PMC has extensive experience with Apache Arrow and other Apache projects, and includes PMC members and PMC chairs. The DataFusion PMC and more experienced committers will continue to coach new community members who may be less familiar with the Apache Way.
> > > > >
> > > > > Homogeneous Developers
> > > > >
> > > > > The 8 proposed PMC members are from 8 different employers and the proposed committers are similarly distributed across affiliations. No specific entity employs more than 3 total proposed developers.
> > > > >
> > > > > Reliance on Salaried Developers
> > > > >
> > > > > A substantial amount of work on DataFusion has been by salaried developers, but it also has a long tradition of attracting contributions from students and hobbyists and we plan no changes in contribution structure.
> > > > >
> > > > > Relationships with Other Apache Products
> > > > >
> > > > > DataFusion will obviously have a strong relationship with the Arrow project given the overlap in people. We don’t foresee close collaboration with other projects at this time.
> > > > >
> > > > > Cryptography
> > > > >
> > > > > DataFusion does not directly support encryption and there are no near-term plans to add support for encryption. Users who need this functionality can use the extension APIs.
> > > > >
> > > > > Required Resources
> > > > >
> > > > > Mailing Lists
> > > > >
> > > > > - private@datafusion for private PMC discussions (with moderated subscriptions)
> > > > > - dev@datafusion
> > > > > - commits@datafusion
> > > > >
> > > > > Version Control
> > > > >
> > > > > We propose to continue to use git for source control and github for hosting and testing resources.
> > > > >
> > > > > Issue Tracking
> > > > >
> > > > > DataFusion would continue to use github for its issue tracking and communications.
> > > > >
> > > > > Other Resources
> > > > >
> > > > > The existing repositories already make use of existing Apache infrastructure, and we expect no change in the initial resource usage. As the project continues to grow, we expect continued infrastructure demand growth.
> > > > >
> > > > > FAQ: Has a sub project been promoted to a top level project before?
> > > > >
> > > > > Yes, and it appears to happen commonly.
> > > > > The Arrow project itself was created as a top level project from work that started in Apache Drill, and there are many sub projects of Hadoop that spun out as their own top level projects such as Mahout, Avro and HBase:
> > > > > https://news.apache.org/foundation/entry/the_apache_software_foundation_announces4
> > > > >
> > > > > Related material:
> > > > > Name search request / research for DataFusion:
> > > > > https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
> > > > > Discussion about which repositories on the arrow mailing list:
> > > > > https://lists.apache.org/thread/ob3n0d9ky0bgrryl3xn39w9k566bq00q
> > > > > Discussion about initial PMC on the arrow mailing list:
> > > > > https://lists.apache.org/thread/pymrzcdw4qdptvby85f69rg3pcckl15b
> > > > > Discussion about creating a new DataFusion top level project:
> > > > > https://github.com/apache/arrow-datafusion/discussions/6475
> > > > > Discussion about graduating on incubator list:
> > > > > https://lists.apache.org/thread/r4n73pmms1lv0jbohyx1o1z13d615t99
> > > > > Original Proposal for the Arrow project:
> > > > > https://lists.apache.org/thread/x2qzdwglm8pkqp9gv03bbgw17khl7pq3