I'd be happy to help. I think we will have to participate in PMC matters infrequently (should there be a difficult issue in the future, we could offer some perspective from cases in the past).
On Wed, Feb 28, 2024 at 2:13 PM Andrew Lamb <al...@influxdata.com> wrote:

> Wes brought up a great point on the document[1] that I wanted to discuss
> here more broadly:
>
> > Others may point out that (I think) you don't have any ASF Members on
> > your initial PMC. When we started Arrow, we had several veteran ASF members
> > on our initial PMC who haven't been very active in the project otherwise.
> > If you wanted Jacques or I (both Members), for example, to serve on the PMC
> > in that capacity we would likely be happy to do that.
>
> I personally think having ASF Member(s) [2] on the PMC would be most
> helpful to connect us to the larger organization and would like to add Wes
> and/or Jacques if they are willing to do so (are you, Wes / Jacques?).
>
> If there are no concerns and Wes / Jacques are willing, I will add their
> names to the proposed initial PMC.
>
> Andrew
>
> [1]
> https://docs.google.com/document/d/11WTNYS8KWScOt3ySTX39WVS6krPhUvHsuJRY9PZQx4g/edit?disco=AAABH2b6I88
> [2] https://www.apache.org/foundation/members
>
> On Mon, Feb 26, 2024 at 5:10 PM Andrew Lamb <al...@influxdata.com> wrote:
>
>> An update:
>>
>> I have updated the proposal [1] with additional information (new
>> committers Jeffrey Vo and Jay Zhan, and the new datafusion-comet repository)
>>
>> I plan to:
>> 1. Call for a formal vote on this (dev@arrow.apache.org) mailing list
>> this Friday March 2
>> 2. If the vote passes, submit the proposal to the ASF board as part of
>> the April 2024 Arrow report.
>>
>> This extended timeline is designed to balance the needs of some
>> contributors to prepare for the changed structure with their employers.
>>
>> Full Details can be found on [2].
>>
>> Thank you,
>> Andrew
>>
>> [1]
>> https://docs.google.com/document/d/11WTNYS8KWScOt3ySTX39WVS6krPhUvHsuJRY9PZQx4g/edit
>> [2] https://github.com/apache/arrow-datafusion/discussions/6475
>>
>> On Fri, Jan 5, 2024 at 11:19 AM Andrew Lamb <al...@influxdata.com> wrote:
>>
>>> Thank you very much
>>>
>>> On Fri, Jan 5, 2024 at 11:17 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>
>>>> Hi Andrew,
>>>>
>>>> The PODLINGNAMESEARCH is not yet completed: the VP Brand Management
>>>> (Mark Thomas) should comment in the Jira to approve or not the name.
>>>>
>>>> I added a comment in the Jira to ping Mark. He should get back to us
>>>> soon.
>>>>
>>>> Regards
>>>> JB
>>>>
>>>> On Fri, Jan 5, 2024 at 3:38 PM Andrew Lamb <al...@influxdata.com> wrote:
>>>> >
>>>> > Thanks JB,
>>>> >
>>>> > I did do a name search and posted the results here [1]
>>>> >
>>>> > However, I am not sure what the next steps for that particular process are
>>>> > (like does someone have to approve it, for example?)
>>>> >
>>>> > Any insight you could provide would be greatly appreciated
>>>> >
>>>> > Andrew
>>>> >
>>>> > [1] https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
>>>> >
>>>> >
>>>> > On Fri, Jan 5, 2024 at 7:55 AM Jean-Baptiste Onofré <j...@nanthrax.net> wrote:
>>>> > >
>>>> > > Hi Andrew,
>>>> > >
>>>> > > I did a quick review on the doc and it looks good to me. I just added
>>>> > > a question about name search (DataFusion will probably work as TLP,
>>>> > > but we have to check as we have a new Apache name moving from Arrow
>>>> > > DataFusion to DataFusion).
>>>> > >
>>>> > > Please let me know if I can help on that.
>>>> > >
>>>> > > Thanks !
>>>> > > Regards
>>>> > > JB
>>>> > >
>>>> > > On Fri, Jan 5, 2024 at 12:26 PM Andrew Lamb <al...@influxdata.com> wrote:
>>>> > > >
>>>> > > > Upon reviewing the board report template, I am planning on the following
>>>> > > > schedule:
>>>> > > > 1. I'll leave this proposal for another few weeks to gather any additional
>>>> > > > input
>>>> > > > 2. In early February 2024 I'll start a formal vote thread on the dev@
>>>> > > > mailing list for this proposal
>>>> > > > 3. If the vote passes, I'll submit a proposed resolution to the ASF board
>>>> > > > for their meeting in April 2024 using the pre-existing template[1]
>>>> > > >
>>>> > > >
>>>> > > > [1]
>>>> > > > https://svn.apache.org/repos/private/committers/board/templates/subproject-tlp-resolution.txt
>>>> > > >
>>>> > > > On Wed, Dec 27, 2023 at 6:32 PM L. C. Hsieh <vii...@gmail.com> wrote:
>>>> > > >
>>>> > > > > Thanks for writing the proposal. It looks great to me too.
>>>> > > > > I added a few comments on it.
>>>> > > > >
>>>> > > > > On Wed, Dec 27, 2023 at 3:05 PM Andy Grove <andygrov...@gmail.com> wrote:
>>>> > > > > >
>>>> > > > > > Thank you for creating the draft proposal, Andrew. I have reviewed this
>>>> > > > > > and I think it looks great.
>>>> > > > > >
>>>> > > > > > Andy.
>>>> > > > > >
>>>> > > > > > On Wed, Dec 27, 2023 at 3:19 PM Andrew Lamb <al...@influxdata.com> wrote:
>>>> > > > > >
>>>> > > > > > > I have created a draft proposal [1] to break DataFusion out to its own
>>>> > > > > > > top level project. Please provide your feedback and suggestions.
>>>> > > > > > >
>>>> > > > > > > The proposal is included at the end of this email and in this Google Doc:
>>>> > > > > > > https://docs.google.com/document/d/11WTNYS8KWScOt3ySTX39WVS6krPhUvHsuJRY9PZQx4g
>>>> > > > > > > .
>>>> > > > > > >
>>>> > > > > > > Feel free to respond to this email or comment / make suggestions directly
>>>> > > > > > > on the document.
>>>> > > > > > >
>>>> > > > > > > I would be especially grateful if people could review and comment on the
>>>> > > > > > > proposed list of committers and PMC members.
>>>> > > > > > >
>>>> > > > > > > I hope everyone is not getting sick of hearing about this, but I think in
>>>> > > > > > > this case it is better to over communicate than risk surprises.
>>>> > > > > > >
>>>> > > > > > > Andrew
>>>> > > > > > >
>>>> > > > > > > [1] https://github.com/apache/arrow-datafusion/issues/8491
>>>> > > > > > >
>>>> > > > > > >
>>>> > > > > > > ----------
>>>> > > > > > >
>>>> > > > > > > DataFusion Top Level Project Proposal
>>>> > > > > > > Dec 27, 2023
>>>> > > > > > >
>>>> > > > > > > [Editor’s note: This document is based on the proposal to the ASF board to
>>>> > > > > > > create the Arrow project. Once it has been reviewed, we plan to send it to
>>>> > > > > > > the ASF board sometime in January or February 2024 for their consideration]
>>>> > > > > > >
>>>> > > > > > > To: The ASF (bo...@apache.org)
>>>> > > > > > >
>>>> > > > > > > Summary:
>>>> > > > > > >
>>>> > > > > > > We propose creating a new top level project, Apache DataFusion, from an
>>>> > > > > > > existing sub project of Apache Arrow to facilitate additional community and
>>>> > > > > > > project growth.
>>>> > > > > > >
>>>> > > > > > > ----
>>>> > > > > > > Apache DataFusion for Apache Top Level Project
>>>> > > > > > >
>>>> > > > > > > Abstract
>>>> > > > > > >
>>>> > > > > > > Apache Arrow DataFusion[1] is a very fast, extensible query engine for
>>>> > > > > > > building high-quality data-centric systems in Rust, using the Apache Arrow
>>>> > > > > > > in-memory format.
>>>> > > > > > > DataFusion offers SQL and Dataframe APIs, excellent
>>>> > > > > > > performance, built-in support for CSV, Parquet, JSON, and Avro, extensive
>>>> > > > > > > customization, and a great community.
>>>> > > > > > >
>>>> > > > > > > [1] https://arrow.apache.org/datafusion/
>>>> > > > > > >
>>>> > > > > > >
>>>> > > > > > > Proposal
>>>> > > > > > >
>>>> > > > > > > We propose creating a new top level ASF project, Apache DataFusion,
>>>> > > > > > > governed initially by a subset of the Arrow project’s PMC and committers.
>>>> > > > > > > The project’s code is in four existing git repositories, currently governed
>>>> > > > > > > by Apache Arrow which would transfer to the new top level project.
>>>> > > > > > >
>>>> > > > > > > Background
>>>> > > > > > >
>>>> > > > > > > When DataFusion was initially donated to the Arrow project, it did not have
>>>> > > > > > > a strong enough community to stand on its own. It has since grown
>>>> > > > > > > significantly, and benefited immensely from being part of Arrow and
>>>> > > > > > > nurturing of the Apache Way, and now has a community strong enough to stand
>>>> > > > > > > on its own and that would benefit from focused governance attention.
>>>> > > > > > >
>>>> > > > > > > The community has discussed this idea publicly for more than 6 months
>>>> > > > > > > https://github.com/apache/arrow-datafusion/discussions/6475 and briefly on
>>>> > > > > > > the Arrow PMC mailing list
>>>> > > > > > > https://lists.apache.org/thread/thv2jdm6640l6gm88hy8jhk5prjww0cs. As of
>>>> > > > > > > the time of this writing both had exclusively positive reactions.
>>>> > > > > > >
>>>> > > > > > > Several current members of the Arrow PMC are both active contributors to
>>>> > > > > > > DataFusion and understand and believe deeply in the Apache Way, and play
>>>> > > > > > > active governance roles in the Arrow project as PMC members and PMC chairs,
>>>> > > > > > > guiding the community, and releasing software versions. With this existing
>>>> > > > > > > governance experience and structure, the new top level project will be able
>>>> > > > > > > to function well immediately and independently.
>>>> > > > > > >
>>>> > > > > > > Overview of DataFusion
>>>> > > > > > >
>>>> > > > > > > Current Status
>>>> > > > > > >
>>>> > > > > > > Meritocracy
>>>> > > > > > >
>>>> > > > > > > DataFusion has been developed as part of Apache Arrow and thus has been
>>>> > > > > > > operating as a meritocracy. Many of the developers of DataFusion are Arrow
>>>> > > > > > > PMC members or committers. The DataFusion project plans to continue adding
>>>> > > > > > > new PMC members and committers as the project matures and grows.
>>>> > > > > > >
>>>> > > > > > > Community
>>>> > > > > > >
>>>> > > > > > > The DataFusion development team seeks to foster the development and user
>>>> > > > > > > communities. We hope that becoming a separate project will help both Arrow
>>>> > > > > > > and DataFusion communities by being more focused. Focused governance will
>>>> > > > > > > make it easier to grow the community of committers and PMC members and make
>>>> > > > > > > the organization more clear to others.
>>>> > > > > > >
>>>> > > > > > > Alignment
>>>> > > > > > >
>>>> > > > > > > The ASF is a natural host for DataFusion given that it is already the home
>>>> > > > > > > of Arrow, Parquet, and other related distributed systems, storage, and
>>>> > > > > > > query execution systems.
>>>> > > > > > >
>>>> > > > > > > Project Leadership
>>>> > > > > > >
>>>> > > > > > > Proposed Initial PMC
>>>> > > > > > >
>>>> > > > > > > We propose the following people as the initial DataFusion PMC members. This
>>>> > > > > > > is a subset of the existing Arrow PMC members who contribute to DataFusion
>>>> > > > > > > https://people.apache.org/phonebook.html?unix=arrow
>>>> > > > > > >
>>>> > > > > > > Andy Grove (agrove): Arrow PMC Chair
>>>> > > > > > > Andrew Lamb (alamb): Arrow PMC, past Arrow PMC Chair
>>>> > > > > > > Daniël Heres (dheres): Arrow PMC
>>>> > > > > > > Jie Wen (jakevin): Arrow PMC, Doris Committer
>>>> > > > > > > Kun Liu (liukun): Arrow PMC, IoTDB PMC, TSFile PMC
>>>> > > > > > > Liang-Chi Hsieh (viirya): Arrow PMC, Spark PMC
>>>> > > > > > > Qingping Hou (houqp): Arrow PMC, Doris Committer
>>>> > > > > > > Will Jones (wjones127): Arrow PMC
>>>> > > > > > >
>>>> > > > > > > We’d like to propose Andrew Lamb as the initial Chair (and thus ASF VP)
>>>> > > > > > > for the DataFusion project.
>>>> > > > > > >
>>>> > > > > > > Affiliations
>>>> > > > > > >
>>>> > > > > > > Andy Grove (agrove): NVidia
>>>> > > > > > > Andrew Lamb (alamb): InfluxData
>>>> > > > > > > Daniël Heres (dheres): Coralogix
>>>> > > > > > > Jie Wen (jakevin): SelectDB
>>>> > > > > > > Kun Liu (liukun): Ebay
>>>> > > > > > > Liang-Chi Hsieh (viirya): Apple
>>>> > > > > > > Qingping Hou (houqp): Scribd
>>>> > > > > > > Will Jones (wjones127): VoltronData
>>>> > > > > > >
>>>> > > > > > > Proposed Initial Committers
>>>> > > > > > >
>>>> > > > > > > In addition to the PMC, we propose the following people as the initial
>>>> > > > > > > DataFusion committers. This is a subset of the existing Arrow committers
>>>> > > > > > > who contribute to DataFusion
>>>> > > > > > > https://people.apache.org/phonebook.html?unix=arrow
>>>> > > > > > >
>>>> > > > > > > akurmustafa Mustafa Akur (Synnada)
>>>> > > > > > > avantgardner Brent Gardner (Coralogix)
>>>> > > > > > > comphead Oleks V. (Unaffiliated)
>>>> > > > > > > jiayuliu Liu Jiayu (Airbnb)
>>>> > > > > > > mete Metehan Yildirim (Synnada)
>>>> > > > > > > mingmwang Wang Mingming (Ebay)
>>>> > > > > > > mneumann Marco Neumann (InfluxData)
>>>> > > > > > > nju_yaho Zhong Yanghong (Ebay)
>>>> > > > > > > ozankabak Mehmet Ozan Kabak (Synnada)
>>>> > > > > > > paddyhoran Paddy Horan (Assured Allies)
>>>> > > > > > > rdettai Rémi Dettai (Cloudfuse)
>>>> > > > > > > sunchao Sun Chao (Apple)
>>>> > > > > > > thinkharderdev Daniel Harris (Coralogix)
>>>> > > > > > > tustvold Raphael Taylor-Davies (InfluxData)
>>>> > > > > > > viirya L. C. Hsieh (Apple)
>>>> > > > > > > wayne Ruihang Xia (Greptime)
>>>> > > > > > > xudong963 Xudong Wang (ByteDance)
>>>> > > > > > > yjshen Yijie Shen (Space and Time)
>>>> > > > > > >
>>>> > > > > > >
>>>> > > > > > > Risk Assessments
>>>> > > > > > >
>>>> > > > > > > Naming / Trademarks
>>>> > > > > > >
>>>> > > > > > > As a sub-project of Arrow, the DataFusion name has been used for over 4
>>>> > > > > > > years without any known issues. A podling name search has thus far not
>>>> > > > > > > turned up any concerns:
>>>> > > > > > > https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
>>>> > > > > > >
>>>> > > > > > > Legal / IP Clearance
>>>> > > > > > >
>>>> > > > > > > All DataFusion code has either been donated to the Arrow project with
>>>> > > > > > > appropriate IP clearance or has been developed directly under ASF
>>>> > > > > > > processes and procedures. Thus creating a new top level project poses no
>>>> > > > > > > new Legal or IP risks.
>>>> > > > > > >
>>>> > > > > > > Code Extraction
>>>> > > > > > >
>>>> > > > > > > The relevant code is already in 4 separate repositories:
>>>> > > > > > > https://github.com/apache/arrow-datafusion/
>>>> > > > > > > https://github.com/apache/arrow-datafusion-python
>>>> > > > > > > https://github.com/apache/arrow-ballista
>>>> > > > > > > https://github.com/apache/arrow-ballista-python
>>>> > > > > > >
>>>> > > > > > > We foresee no issues with code extraction and propose these repositories be
>>>> > > > > > > respectively renamed to reflect top level projects:
>>>> > > > > > > https://github.com/apache/datafusion/
>>>> > > > > > > https://github.com/apache/datafusion-python
>>>> > > > > > > https://github.com/apache/datafusion-ballista
>>>> > > > > > > https://github.com/apache/datafusion-ballista-python
>>>> > > > > > >
>>>> > > > > > > Note: https://github.com/apache/arrow-rs, the Rust implementation of
>>>> > > > > > > Arrow, would remain part of the Arrow project.
>>>> > > > > > >
>>>> > > > > > > Orphaned Products
>>>> > > > > > >
>>>> > > > > > > DataFusion is known to be used in many open source and commercial projects
>>>> > > > > > > (https://arrow.apache.org/datafusion/user-guide/introduction.html#known-users),
>>>> > > > > > > has had multiple commits daily for several years, and its adoption and
>>>> > > > > > > number of contributors appear to be growing.
>>>> > > > > > >
>>>> > > > > > > Inexperience with Open Source
>>>> > > > > > >
>>>> > > > > > > The proposed PMC has extensive experience with Apache Arrow and other
>>>> > > > > > > Apache projects, and includes PMC members and PMC chairs. The DataFusion
>>>> > > > > > > PMC and more experienced committers will continue to coach new community
>>>> > > > > > > members who may be less familiar with the Apache Way.
>>>> > > > > > >
>>>> > > > > > > Homogeneous Developers
>>>> > > > > > >
>>>> > > > > > > The 8 proposed PMC members are from 8 different employers and the proposed
>>>> > > > > > > committers are similarly distributed across affiliations. No specific
>>>> > > > > > > entity employs more than 3 total proposed developers.
>>>> > > > > > >
>>>> > > > > > > Reliance on Salaried Developers
>>>> > > > > > >
>>>> > > > > > > A substantial amount of work on DataFusion has been by salaried developers,
>>>> > > > > > > but it also has a long tradition of attracting contributions from students
>>>> > > > > > > and hobbyists and we plan no changes in contribution structure.
>>>> > > > > > >
>>>> > > > > > > Relationships with Other Apache Products
>>>> > > > > > >
>>>> > > > > > > DataFusion will obviously have a strong relationship with the Arrow project
>>>> > > > > > > given the overlap in people. We don’t foresee close collaboration with
>>>> > > > > > > other projects at this time.
>>>> > > > > > >
>>>> > > > > > > Cryptography
>>>> > > > > > >
>>>> > > > > > > DataFusion does not directly support encryption and there are no near-term
>>>> > > > > > > plans to add support for encryption. Users who need this functionality can
>>>> > > > > > > use the extension APIs.
>>>> > > > > > >
>>>> > > > > > > Required Resources
>>>> > > > > > >
>>>> > > > > > > Mailing Lists
>>>> > > > > > >
>>>> > > > > > > - private@datafusion for private PMC discussions (with moderated
>>>> > > > > > >   subscriptions)
>>>> > > > > > > - dev@datafusion
>>>> > > > > > > - commits@datafusion
>>>> > > > > > >
>>>> > > > > > > Version Control
>>>> > > > > > >
>>>> > > > > > > We propose to continue to use git for source control and GitHub for hosting
>>>> > > > > > > and testing resources.
>>>> > > > > > >
>>>> > > > > > > Issue Tracking
>>>> > > > > > >
>>>> > > > > > > DataFusion would continue to use GitHub for its issue tracking and
>>>> > > > > > > communications.
>>>> > > > > > >
>>>> > > > > > > Other Resources
>>>> > > > > > >
>>>> > > > > > > The existing repositories already make use of existing Apache
>>>> > > > > > > infrastructure, and we expect no change in the initial resource usage. As
>>>> > > > > > > the project continues to grow, we expect continued infrastructure demand
>>>> > > > > > > growth.
>>>> > > > > > >
>>>> > > > > > >
>>>> > > > > > > FAQ: Has a sub project been promoted to a top level project before?
>>>> > > > > > >
>>>> > > > > > > Yes, and it appears to happen commonly. The Arrow project itself was
>>>> > > > > > > created as a top level project from work that started in Apache Drill, and
>>>> > > > > > > there are many sub projects of Hadoop that spun out as their own top level
>>>> > > > > > > projects such as Mahout, Avro and HBase:
>>>> > > > > > > https://news.apache.org/foundation/entry/the_apache_software_foundation_announces4
>>>> > > > > > >
>>>> > > > > > >
>>>> > > > > > > Related material:
>>>> > > > > > > Name search request / research for DataFusion:
>>>> > > > > > > https://issues.apache.org/jira/browse/PODLINGNAMESEARCH-219
>>>> > > > > > > Discussion about which repositories on the arrow mailing list:
>>>> > > > > > > https://lists.apache.org/thread/ob3n0d9ky0bgrryl3xn39w9k566bq00q
>>>> > > > > > > Discussion about initial PMC on the arrow mailing list:
>>>> > > > > > > https://lists.apache.org/thread/pymrzcdw4qdptvby85f69rg3pcckl15b
>>>> > > > > > > Discussion about creating a new DataFusion top level project:
>>>> > > > > > > https://github.com/apache/arrow-datafusion/discussions/6475
>>>> > > > > > > Discussion about graduating on incubator list:
>>>> > > > > > > https://lists.apache.org/thread/r4n73pmms1lv0jbohyx1o1z13d615t99
>>>> > > > > > > Original Proposal for the Arrow project:
>>>> > > > > > > https://lists.apache.org/thread/x2qzdwglm8pkqp9gv03bbgw17khl7pq3