I think a separate non-ASF organization, with a central list of extensions like spark-packages.org, sounds like a good idea to me.

On Sun, Nov 7, 2021 at 1:34 PM Micah Kornfield <emkornfi...@gmail.com> wrote:

> I'll preface this by saying I'm not an expert on these matters, but this is my
> impression.
>
> > Therefore, I am proposing that we create an unofficial shared GitHub
> > organization to host these DataFusion contrib-type projects that are
> > only maintained by non-PMC community members.
>
> I think as long as this is hosted outside of the Apache GitHub
> organization, this seems fine. Being careful around trademark
> issues and making it clear that it isn't officially part of the Apache
> DataFusion project are the main things to watch out for. FWIW, I seem to
> recall this type of model was proposed in Spark, and there was
> some tension at the time over the branding of the project. It looks like Spark
> has settled on having a central site <https://spark-packages.org/> [1][2]
> for linking additional modules, and they don't have a common namespace.
>
> > I am curious whether this is something that could be done under the Apache
> > governance model. My main goal is to create an unofficial incubator-type
> > space for community members to develop and collaborate on
> > extensions that may or may not be adopted as official extensions in
> > the future.
>
> My limited understanding is that either something is governed by the ASF rules
> (i.e. PMC members/committers officially recognized by the Apache Software Foundation,
> along with release requirements) or it isn't; there really isn't a halfway option
> here from the ASF perspective. Independent projects can choose ASF-like
> policies and manage themselves in this manner. The incubator program at the
> ASF is for projects that might or might not have sustained interest to
> continue (but my understanding is that incubation follows all the processes of a
> normal top-level Apache project). Any code developed outside of ASF
> governance needs to go through the donation process (IP Clearance, etc.) to
> be moved into ASF repos, even if it is developed by PMC members/committers
> (see prior discussions on Arrow2 in Rust and the Julia libraries).
>
> Cheers,
> Micah
>
> [1] https://spark.apache.org/contributing.html
> [2] https://spark-packages.org/
>
> On Sun, Nov 7, 2021 at 2:31 AM Benson Muite <benson_mu...@emailplus.org> wrote:
>
> > A community-owned GitHub organization would be helpful, maybe for other
> > Arrow-related projects too, not just DataFusion. This would make them
> > easier to find and easier for community members to contribute to. It could
> > also include a listing of relevant projects elsewhere.
> >
> > On 11/7/21 9:40 AM, Jiayu Liu wrote:
> > > FWIW, if there's a way to contribute code pertaining to DataFusion, I can
> > > contribute my version of Java bindings to it.
> > >
> > > IMO having a central place (instead of linking) for all bindings, third-party
> > > libraries, etc. for DataFusion would mean more synergy across different
> > > languages, but I wouldn't go as far as a monorepo because the CI/CD process
> > > and release process are unlikely to benefit from it. Maybe a community-owned
> > > GitHub org?
> > >
> > > On 2021/11/07 00:52:49 QP Hou wrote:
> > >> Hi all,
> > >>
> > >> I would like to propose a new, more community-friendly governance
> > >> model for community-contributed and community-maintained extensions
> > >> for the DataFusion project.
> > >>
> > >> Over the last year, many DataFusion extensions have been proposed and
> > >> created by the community, including the Java binding, S3 and HDFS[1]
> > >> object storage implementations, etc. Right now this code is or will
> > >> be hosted in individual GitHub namespaces for the following
> > >> reasons:
> > >>
> > >> * Most of these extensions are not considered part of the DataFusion
> > >> core, so the current maintainers prefer not to have them managed in
> > >> the main repository. The current Python binding and Ballista code base
> > >> already add a decent amount of overhead to our development
> > >> process. Adding more dependent crates would slow us down further
> > >> without much upside.
> > >>
> > >> * Considering the overhead of the official Apache release process,
> > >> current DataFusion PMC members don't have the bandwidth to manage
> > >> individual releases for these extensions. None of the authors of these
> > >> extensions are Arrow PMC members, so they don't have the access to drive
> > >> Apache releases by themselves.
> > >>
> > >> Therefore, I am proposing that we create an unofficial shared GitHub
> > >> organization to host these DataFusion contrib-type projects that are
> > >> only maintained by non-PMC community members. I think this is strictly
> > >> better than hosting these extension projects in personal GitHub
> > >> namespaces. If any of these extensions end up getting significant
> > >> involvement or interest from DataFusion committers, then we can
> > >> promote them into official projects and provide official Apache-style
> > >> release support.
> > >>
> > >> Other alternatives I have considered are:
> > >>
> > >> * Keep these projects under personal namespaces and only link them in
> > >> DataFusion's documentation.
> > >>
> > >> * Manage these extensions using experimental repos. But as far as I
> > >> know, the code owners still need to be PMC members in order to
> > >> perform crates.io releases, and experimental repos are not intended for
> > >> long-running projects with no plan for eventual archival.
> > >>
> > >> * Create a dedicated monorepo named apache/datafusion-contrib to host
> > >> these extensions. However, if I understand it correctly, this approach
> > >> also requires PMC members to get involved for crates.io releases.
> > >>
> > >> I am curious whether this is something that could be done under the Apache
> > >> governance model. My main goal is to create an unofficial incubator-type
> > >> space for community members to develop and collaborate on
> > >> extensions that may or may not be adopted as official extensions in
> > >> the future.
> > >>
> > >> [1]: https://github.com/apache/arrow-datafusion/pull/1223
> > >>
> > >> Thanks,
> > >> QP