Thanks Jiayu, Benson, Micah and Andrew for your input on this. I have
created an unofficial GitHub org [1] as a quick and dirty experiment
for something like spark-packages.org. We should make it clear that
code developed in this org will still need to go through the donation
process in order to get into the ASF org.

[1]: https://github.com/datafusion-contrib

On Mon, Nov 8, 2021 at 3:12 AM Andrew Lamb <al...@influxdata.com> wrote:
>
> I think a separate non-ASF organization, with a central list of extensions
> like spark-packages.org sounds like a good idea to me.
>
> On Sun, Nov 7, 2021 at 1:34 PM Micah Kornfield <emkornfi...@gmail.com>
> wrote:
>
> > I'll preface this with not being an expert on these matters but this is my
> > impression.
> >
> >
> > > Therefore, I am proposing that we create an unofficial shared Github
> > > organization to host these Datafusion contrib type projects that are
> > > only maintained by non-PMC community members.
> >
> >
> > I think as long as this is hosted outside of the Apache GitHub
> > organization, this seems fine.  The things to be careful about are
> > trademark issues and making it clear that it isn't officially part of the
> > Apache DataFusion project.  FWIW, I seem to
> > recall this type of model was something proposed in Spark and there was
> > some tension at the time with branding of the project.  It looks like Spark
> > has settled on having a central site <https://spark-packages.org/> [1][2]
> > for linking additional modules and they don't have a common namespace.
> >
> >
> > > I am curious if this is something that could be done under the Apache
> > > governance model? My main goal is to create an unofficial incubator
> > > type space for community members to develop and collaborate on
> > > extensions that may or may not be adopted as official extensions in
> > > the future.
> >
> >
> > My limited understanding is either something is governed by the ASF rules
> > (i.e. PMC/Committers officially recognized by the apache foundation, along
> > with release requirements) or it isn't, there really isn't a half-way thing
> > here from the ASF perspective.  Independent projects can choose ASF-like
> > policies and manage themselves in this manner. The incubator program at the
> > ASF is for projects that might or might not have sustained interest to
> > continue (but my understanding is incubation follows all the process of a
> > normal top-level Apache project).  Any code developed outside of ASF
> > governance needs to go through the donation process (IP Clearance, etc) to
> > be moved into ASF repos, even if it is developed by PMC members/committers
> > (see prior discussions on Arrow2 in Rust and the Julia libraries).
> >
> > Cheers,
> > Micah
> >
> > [1] https://spark.apache.org/contributing.html
> > [2] https://spark-packages.org/
> >
> >
> > On Sun, Nov 7, 2021 at 2:31 AM Benson Muite <benson_mu...@emailplus.org>
> > wrote:
> >
> > > A community owned GitHub organization would be helpful. Maybe for all
> > > other Arrow related projects not just Datafusion. This would make them
> > > easier to find, and for community members to contribute. It could also
> > > include a listing of relevant projects elsewhere.
> > >
> > > On 11/7/21 9:40 AM, Jiayu Liu wrote:
> > > > FWIW if there's a way to contribute code pertaining to datafusion I can
> > > > contribute my version of Java bindings to it.
> > > >
> > > > IMO having a central place (instead of linking) for all bindings,
> > > > third-party libraries, etc. for datafusion would mean more synergy across different
> > > > languages but I won't go as far as a monorepo because the CI/CD process
> > > > and release process are unlikely to benefit from it. Maybe a community
> > > > owned GitHub org?
> > > >
> > > > On 2021/11/07 00:52:49 QP Hou wrote:
> > > >> Hi all,
> > > >>
> > > >> I would like to propose a new and more community friendly governance
> > > >> model for community contributed and maintained extensions for the
> > > >> datafusion project.
> > > >>
> > > >> Over the last year, many datafusion extensions have been proposed and
> > > >> created by the community including the java binding, s3 and hdfs[1]
> > > >> object storage implementations, etc. Right now this code is or will
> > > >> be hosted in individual github namespaces due to the following
> > > >> reasons:
> > > >>
> > > >> * Most of these extensions are not considered part of the Datafusion
> > > >> core, so the current maintainers prefer to not have them managed in
> > > >> the main repository. The current python binding and ballista code base
> > > >> is already adding a decent amount of overhead to our development
> > > >> process. Adding more dependent crates will slow us down further
> > > >> without much upside.
> > > >>
> > > >> * Considering the overhead of the official Apache release process,
> > > >> current Datafusion PMCs don't have the bandwidth to manage individual
> > > >> releases for these extensions. None of the authors of these extensions
> > > >> are Arrow PMC members, so they won't have the access to drive the
> > > >> Apache releases by themselves.
> > > >>
> > > >> Therefore, I am proposing that we create an unofficial shared Github
> > > >> organization to host these Datafusion contrib type projects that are
> > > >> only maintained by non-PMC community members. I think this is strictly
> > > >> better than hosting these extensions projects in personal github
> > > >> namespaces. If any of these extensions end up getting significant
> > > >> involvement or interest from Datafusion committers, then we can
> > > >> promote them into official projects and provide official Apache style
> > > >> release support.
> > > >>
> > > >> Other alternatives I have considered are:
> > > >>
> > > >> * Keep these projects under personal namespaces and only link them in
> > > >> Datafusion's documentation.
> > > >>
> > > >> * Manage these extensions using experimental repos. But as far as I
> > > >> know, the code owners still need to be PMC members in order to
> > > >> perform crates.io releases, and such repos are not intended for
> > > >> long-running projects with no goal of eventual archival.
> > > >>
> > > >> * Create a dedicated mono repo named apache/datafusion-contrib to host
> > > >> these extensions. However, this approach also requires PMC members to
> > > >> get involved for crates.io releases if I understand it correctly.
> > > >>
> > > >> I am curious if this is something that could be done under the Apache
> > > >> governance model? My main goal is to create an unofficial incubator
> > > >> type space for community members to develop and collaborate on
> > > >> extensions that may or may not be adopted as official extensions in
> > > >> the future.
> > > >>
> > > >> [1]: https://github.com/apache/arrow-datafusion/pull/1223
> > > >>
> > > >> Thanks,
> > > >> QP
> > > >>
> > > >
> > >
> > >
> >
