I'll preface this by saying I'm not an expert on these matters, but this is
my impression.


> Therefore, I am proposing that we create an unofficial shared Github
> organization to host these Datafusion contrib type projects that are
> only maintained by non-PMC community members.


As long as this is hosted outside of the Apache GitHub organization, it
seems fine to me.  The main things to be careful about are trademark
issues and making it clear that it isn't officially part of the Apache
DataFusion project.  FWIW, I seem to recall this type of model was
proposed in Spark and there was some tension at the time over branding of
the project.  It looks like Spark has settled on having a central site
<https://spark-packages.org/> [1][2] for linking additional modules, and
they don't have a common namespace.


> I am curious if this is something that could be done under the Apache
> governance model? My main goal is to create an unofficial incubator
> type space for community members to develop and collaborate on
> extensions that may or may not be adopted as official extensions in
> the future.


My limited understanding is that either something is governed by the ASF
rules (i.e. PMC/committers officially recognized by the Apache Software
Foundation, along with release requirements) or it isn't; there really
isn't a halfway option from the ASF perspective.  Independent projects can
choose ASF-like policies and manage themselves in this manner.  The
incubator program at the ASF is for projects that might or might not have
sustained interest to continue (but my understanding is that incubation
follows all the processes of a normal top-level Apache project).  Any code
developed outside of ASF governance needs to go through the donation
process (IP Clearance, etc.) to be moved into ASF repos, even if it is
developed by PMC members/committers (see prior discussions on Arrow2 in
Rust and the Julia libraries).

Cheers,
Micah

[1] https://spark.apache.org/contributing.html
[2] https://spark-packages.org/


On Sun, Nov 7, 2021 at 2:31 AM Benson Muite <benson_mu...@emailplus.org>
wrote:

> A community owned GitHub organization would be helpful. Maybe for all
> other Arrow related projects not just Datafusion. This would make them
> easier to find, and for community members to contribute. It could also
> include a listing of relevant projects elsewhere.
>
> On 11/7/21 9:40 AM, Jiayu Liu wrote:
> > FWIW if there's a way to contribute code pertaining to datafusion I can
> > contribute my version of Java bindings to it.
> >
> > IMO having a central place (instead of linking) for all bindings,
> > 3rd-party libraries, etc. for datafusion would mean more synergy across different
> > languages but I won't go as far as a monorepo because the CI/CD process
> > and release process are unlikely to benefit from it. Maybe a community
> > owned GitHub org?
> >
> > On 2021/11/07 00:52:49 QP Hou wrote:
> >> Hi all,
> >>
> >> I would like to propose a new and more community friendly governance
> >> model for community contributed and maintained extensions for the
> >> datafusion project.
> >>
> >> Over the last year, many datafusion extensions have been proposed and
> >> created by the community including the java binding, s3 and hdfs[1]
> >> object storage implementations, etc. Right now this code is or will
> >> be hosted in individual github namespaces due to the following
> >> reasons:
> >>
> >> * Most of these extensions are not considered part of the Datafusion
> >> core, so the current maintainers prefer to not have them managed in
> >> the main repository. The current python binding and ballista code base
> >> is already adding a decent amount of overhead to our development
> >> process. Adding more dependent crates will slow us down further
> >> without much upside.
> >>
> >> * Considering the overhead of the official Apache release process,
> >> current Datafusion PMCs don't have the bandwidth to manage individual
> >> releases for these extensions. None of the authors of these extensions
> >> are Arrow PMC members, so they won't have the access to drive the
> >> Apache releases by themselves.
> >>
> >> Therefore, I am proposing that we create an unofficial shared Github
> >> organization to host these Datafusion contrib type projects that are
> >> only maintained by non-PMC community members. I think this is strictly
> >> better than hosting these extensions projects in personal github
> >> namespaces. If any of these extensions end up getting significant
> >> involvement or interest from Datafusion committers, then we can
> >> promote them into official projects and provide official Apache style
> >> release support.
> >>
> >> Other alternatives I have considered are:
> >>
> >> * Keep these projects under personal namespaces and only link them in
> >> Datafusion's documentation.
> >>
> >> * Manage these extensions using experimental repos. But as far as I
> >> know, the code owners still need to be a PMC member in order to
> >> perform crates.io releases and it's not intended for long running
> >> projects with no goal for eventual archival.
> >>
> >> * Create a dedicated mono repo named apache/datafusion-contrib to host
> >> these extensions. However, this approach also requires PMC members to
> >> get involved for crates.io releases if I understand it correctly.
> >>
> >> I am curious if this is something that could be done under the Apache
> >> governance model? My main goal is to create an unofficial incubator
> >> type space for community members to develop and collaborate on
> >> extensions that may or may not be adopted as official extensions in
> >> the future.
> >>
> >> [1]: https://github.com/apache/arrow-datafusion/pull/1223
> >>
> >> Thanks,
> >> QP
> >>
> >
>
>
