Thank you, QP.

Andrew

On Sun, Nov 14, 2021 at 5:02 PM QP Hou <houqp....@gmail.com> wrote:

> Thanks Jiayu, Benson, Micah and Andrew for your input on this. I have
> created an unofficial GitHub org [1] as a quick and dirty experiment
> for something like spark-packages.org. We should make it clear that
> code developed in this org will still need to go through the donation
> process in order to get into the ASF org.
>
> [1]: https://github.com/datafusion-contrib
>
> On Mon, Nov 8, 2021 at 3:12 AM Andrew Lamb <al...@influxdata.com> wrote:
> >
> > I think a separate non-ASF organization with a central list of
> > extensions, like spark-packages.org, sounds like a good idea to me.
> >
> > On Sun, Nov 7, 2021 at 1:34 PM Micah Kornfield <emkornfi...@gmail.com>
> > wrote:
> >
> > > I'll preface this by saying I'm not an expert on these matters, but
> > > this is my impression.
> > >
> > >
> > > > Therefore, I am proposing that we create an unofficial shared GitHub
> > > > organization to host these Datafusion contrib-type projects that are
> > > > only maintained by non-PMC community members.
> > >
> > >
> > > I think as long as this is hosted outside of the Apache GitHub
> > > organization, this seems fine.  The things to be careful about are
> > > trademark issues and making it clear that it isn't officially part of
> > > the Apache DataFusion project.  FWIW, I seem to recall this type of
> > > model was proposed in Spark and there was some tension at the time
> > > over branding of the project.  It looks like Spark has settled on
> > > having a central site <https://spark-packages.org/> [1][2] for linking
> > > additional modules, and they don't have a common namespace.
> > >
> > >
> > > > I am curious if this is something that could be done under the Apache
> > > > governance model? My main goal is to create an unofficial incubator
> > > > type space for community members to develop and collaborate on
> > > > extensions that may or may not be adopted as official extensions in
> > > > the future.
> > >
> > >
> > > My limited understanding is that either something is governed by the
> > > ASF rules (i.e. PMC/committers officially recognized by the Apache
> > > Software Foundation, along with release requirements) or it isn't;
> > > there really isn't a halfway option from the ASF perspective.
> > > Independent projects can choose ASF-like policies and manage
> > > themselves in this manner.  The incubator program at the ASF is for
> > > projects that might or might not have sustained interest to continue
> > > (but my understanding is that incubation follows all the processes of
> > > a normal top-level Apache project).  Any code developed outside of ASF
> > > governance needs to go through the donation process (IP clearance,
> > > etc.) to be moved into ASF repos, even if it is developed by PMC
> > > members/committers (see prior discussions on Arrow2 in Rust and the
> > > Julia libraries).
> > >
> > > Cheers,
> > > Micah
> > >
> > > [1] https://spark.apache.org/contributing.html
> > > [2] https://spark-packages.org/
> > >
> > >
> > > On Sun, Nov 7, 2021 at 2:31 AM Benson Muite <benson_mu...@emailplus.org>
> > > wrote:
> > >
> > > > A community-owned GitHub organization would be helpful, maybe for all
> > > > other Arrow-related projects, not just Datafusion. This would make
> > > > them easier to find and easier for community members to contribute
> > > > to. It could also include a listing of relevant projects elsewhere.
> > > >
> > > > On 11/7/21 9:40 AM, Jiayu Liu wrote:
> > > > > > FWIW if there's a way to contribute code pertaining to datafusion, I
> > > > > > can contribute my version of the Java bindings to it.
> > > > > >
> > > > > > IMO having a central place (instead of linking) for all bindings,
> > > > > > third-party libraries, etc. for datafusion would mean more synergy
> > > > > > across different languages, but I wouldn't go as far as a monorepo
> > > > > > because the CI/CD and release processes are unlikely to benefit from
> > > > > > it. Maybe a community-owned GitHub org?
> > > > >
> > > > > On 2021/11/07 00:52:49 QP Hou wrote:
> > > > >> Hi all,
> > > > >>
> > > > > >> I would like to propose a new and more community-friendly governance
> > > > > >> model for community-contributed and maintained extensions for the
> > > > > >> datafusion project.
> > > > >>
> > > > > >> Over the last year, many datafusion extensions have been proposed and
> > > > > >> created by the community, including the Java binding, S3 and HDFS[1]
> > > > > >> object storage implementations, etc. Right now this code is or will
> > > > > >> be hosted in individual GitHub namespaces for the following
> > > > > >> reasons:
> > > > >>
> > > > > >> * Most of these extensions are not considered part of the Datafusion
> > > > > >> core, so the current maintainers prefer not to have them managed in
> > > > > >> the main repository. The current Python binding and Ballista code
> > > > > >> bases already add a decent amount of overhead to our development
> > > > > >> process. Adding more dependent crates will slow us down further
> > > > > >> without much upside.
> > > > >>
> > > > > >> * Considering the overhead of the official Apache release process,
> > > > > >> current Datafusion PMC members don't have the bandwidth to manage
> > > > > >> individual releases for these extensions. None of the authors of
> > > > > >> these extensions are Arrow PMC members, so they won't have the
> > > > > >> access to drive the Apache releases by themselves.
> > > > >>
> > > > > >> Therefore, I am proposing that we create an unofficial shared GitHub
> > > > > >> organization to host these Datafusion contrib-type projects that are
> > > > > >> only maintained by non-PMC community members. I think this is
> > > > > >> strictly better than hosting these extension projects in personal
> > > > > >> GitHub namespaces. If any of these extensions end up getting
> > > > > >> significant involvement or interest from Datafusion committers, then
> > > > > >> we can promote them into official projects and provide official
> > > > > >> Apache-style release support.
> > > > >>
> > > > >> Other alternatives I have considered are:
> > > > >>
> > > > > >> * Keep these projects under personal namespaces and only link to them
> > > > > >> in Datafusion's documentation.
> > > > >>
> > > > > >> * Manage these extensions using experimental repos. But as far as I
> > > > > >> know, the code owners still need to be PMC members in order to
> > > > > >> perform crates.io releases, and experimental repos are not intended
> > > > > >> for long-running projects with no goal of eventual archival.
> > > > >>
> > > > > >> * Create a dedicated monorepo named apache/datafusion-contrib to host
> > > > > >> these extensions. However, this approach also requires PMC members to
> > > > > >> get involved for crates.io releases, if I understand it correctly.
> > > > >>
> > > > > >> I am curious if this is something that could be done under the Apache
> > > > > >> governance model? My main goal is to create an unofficial
> > > > > >> incubator-type space for community members to develop and collaborate
> > > > > >> on extensions that may or may not be adopted as official extensions in
> > > > > >> the future.
> > > > >>
> > > > >> [1]: https://github.com/apache/arrow-datafusion/pull/1223
> > > > >>
> > > > >> Thanks,
> > > > >> QP
> > > > >>
> > > > >
> > > >
> > > >
> > >
>
