Thanks Jiayu, Benson, Micah, and Andrew for your input on this. I have created an unofficial GitHub org [1] as a quick-and-dirty experiment with something like spark-packages.org. We should make it clear that code developed in this org will still need to go through the donation process in order to get into the ASF org.
[1]: https://github.com/datafusion-contrib

On Mon, Nov 8, 2021 at 3:12 AM Andrew Lamb <al...@influxdata.com> wrote:
> I think a separate non-ASF organization, with a central list of extensions like spark-packages.org, sounds like a good idea to me.
>
> On Sun, Nov 7, 2021 at 1:34 PM Micah Kornfield <emkornfi...@gmail.com> wrote:
> > I'll preface this by saying I'm not an expert on these matters, but this is my impression.
> >
> > > Therefore, I am proposing that we create an unofficial shared GitHub organization to host these DataFusion contrib-type projects that are only maintained by non-PMC community members.
> >
> > I think as long as this is hosted outside of the Apache GitHub organization, this seems fine. Being careful around trademark issues and making it clear that it isn't officially part of the Apache DataFusion project are the things to watch. FWIW, I seem to recall this type of model was proposed in Spark, and there was some tension at the time around branding of the project. It looks like Spark has settled on having a central site <https://spark-packages.org/> [1][2] for linking additional modules, and they don't have a common namespace.
> >
> > > I am curious whether this is something that could be done under the Apache governance model. My main goal is to create an unofficial incubator-type space for community members to develop and collaborate on extensions that may or may not be adopted as official extensions in the future.
> >
> > My limited understanding is that either something is governed by the ASF rules (i.e., PMC/committers officially recognized by the Apache foundation, along with release requirements) or it isn't; there really isn't a halfway option here from the ASF perspective. Independent projects can choose ASF-like policies and manage themselves in this manner. The incubator program at the ASF is for projects that might or might not have sustained interest to continue (but my understanding is that incubation follows all the process of a normal top-level Apache project). Any code developed outside of ASF governance needs to go through the donation process (IP clearance, etc.) to be moved into ASF repos, even if it is developed by PMC members/committers (see prior discussions on Arrow2 in Rust and the Julia libraries).
> >
> > Cheers,
> > Micah
> >
> > [1] https://spark.apache.org/contributing.html
> > [2] https://spark-packages.org/
> >
> > On Sun, Nov 7, 2021 at 2:31 AM Benson Muite <benson_mu...@emailplus.org> wrote:
> > > A community-owned GitHub organization would be helpful, maybe for all Arrow-related projects, not just DataFusion. This would make them easier to find and easier for community members to contribute to. It could also include a listing of relevant projects hosted elsewhere.
> > >
> > > On 11/7/21 9:40 AM, Jiayu Liu wrote:
> > > > FWIW, if there's a way to contribute code pertaining to DataFusion, I can contribute my version of the Java bindings to it.
> > > >
> > > > IMO, having a central place (instead of linking) for all bindings, third-party libraries, etc. for DataFusion would mean more synergy across different languages, but I won't go as far as a monorepo, because the CI/CD process and release process are unlikely to benefit from it. Maybe a community-owned GitHub org?
> > > >
> > > > On 2021/11/07 00:52:49 QP Hou wrote:
> > > >> Hi all,
> > > >>
> > > >> I would like to propose a new and more community-friendly governance model for community-contributed and community-maintained extensions for the DataFusion project.
> > > >>
> > > >> Over the last year, many DataFusion extensions have been proposed and created by the community, including the Java binding, S3 and HDFS [1] object storage implementations, etc. Right now this code is, or will be, hosted in individual GitHub namespaces for the following reasons:
> > > >>
> > > >> * Most of these extensions are not considered part of the DataFusion core, so the current maintainers prefer not to have them managed in the main repository. The current Python binding and Ballista code bases already add a decent amount of overhead to our development process. Adding more dependent crates would slow us down further without much upside.
> > > >>
> > > >> * Considering the overhead of the official Apache release process, the current DataFusion PMC members don't have the bandwidth to manage individual releases for these extensions. None of the authors of these extensions are Arrow PMC members, so they don't have the access to drive Apache releases by themselves.
> > > >>
> > > >> Therefore, I am proposing that we create an unofficial shared GitHub organization to host these DataFusion contrib-type projects that are only maintained by non-PMC community members. I think this is strictly better than hosting these extension projects in personal GitHub namespaces. If any of these extensions end up getting significant involvement or interest from DataFusion committers, then we can promote them into official projects and provide official Apache-style release support.
> > > >>
> > > >> Other alternatives I have considered are:
> > > >>
> > > >> * Keep these projects under personal namespaces and only link them in DataFusion's documentation.
> > > >>
> > > >> * Manage these extensions using experimental repos. But as far as I know, the code owners would still need to be PMC members in order to perform crates.io releases, and experimental repos are not intended for long-running projects with no goal of eventual archival.
> > > >>
> > > >> * Create a dedicated monorepo named apache/datafusion-contrib to host these extensions. However, if I understand correctly, this approach also requires PMC members to get involved for crates.io releases.
> > > >>
> > > >> I am curious whether this is something that could be done under the Apache governance model. My main goal is to create an unofficial incubator-type space for community members to develop and collaborate on extensions that may or may not be adopted as official extensions in the future.
> > > >>
> > > >> [1]: https://github.com/apache/arrow-datafusion/pull/1223
> > > >>
> > > >> Thanks,
> > > >> QP