On Sat, Mar 26, 2016 at 10:20 AM, Jean-Baptiste Onofré <j...@nanthrax.net>
wrote:

> Hi Luciano,
>
> If we take the "pure" technical vision, there are pros and cons to keeping
> spark-extra (or whatever name we give it) as an Apache project:
>
> Pro:
>  - Governance & Quality Assurance: we follow the Apache rules, meaning
> that a release has to be staged and voted on by the PMC. It's a form of
> governance of the project and of quality (as the releases are reviewed).
>  - Software origin: users know where the connector comes from, and they
> have guarantees in terms of licensing, etc.
>  - IP/ICLA: we know the committers of this project, and we know they have
> signed the ICLA.
>
> Cons:
>  - Third-party license support. As an Apache project, the "connectors" will
> only be allowed to use Apache or Category B licensed dependencies. For
> instance, if I wanted to create a Spark connector for Couchbase, I
> couldn't do it at Apache.
>

Yes, this does not solve the incompatible-license problem.


>  - Release cycle. As an Apache project, we have to follow the rules,
> meaning that the release cycle can appear strict and long due to the
> staging and vote process. To me it's a huge benefit, but some may see it
> as too strict ;)
>

IMHO, this is a small price to pay for all the good stuff you mentioned
under the pros.


>
> Maybe, we can imagine both, as we have in ServiceMix or Camel:
> - all modules/connectors matching the Apache rules (especially in terms of
> licensing) should be in the Apache Spark-Modules (or Spark-Extensions, or
> whatever we call it). It's like the ServiceMix Bundles.
>

If you are talking here about Spark proper, then we are currently seeing
that this is going to be hard. If there were a way to host these directly
in Spark proper, I would never have started this thread, as we would have
all the pros you mentioned.


> - all modules/connectors that can't fit the Apache rules (due to
> licensing issues) can go into a GitHub Spark-Extra (or Spark-Package).
> It's like ServiceMix Extra or Camel Extra on GitHub.
>
>
We could look into this, but it might be a "Spark Extra" discussion on how
we can help foster a community around the connectors with incompatible
licenses.


> My $0.01.
>
> Regards
> JB
>
>
> On 03/26/2016 06:07 PM, Luciano Resende wrote:
>
>> I believe some of this has been resolved for the parties interested in
>> one particular connector, but we still have a few removed, and, as you
>> mentioned, we still don't have a simple way (or the willingness) to
>> manage and stay current on new packages like Kafka. And based on the
>> fact that this thread is still alive, I believe other community members
>> might have concerns as well.
>>
>> After some thought, I believe having a separate project (what was
>> mentioned here as Spark Extras) to handle Spark connectors and Spark
>> add-ons in general could be very beneficial to Spark and the overall
>> Spark community, providing a central place in Apache to collaborate
>> around related Spark components.
>>
>> Some of the benefits of this approach:
>>
>> - Enables maintaining the connectors inside Apache, following Apache
>> governance and release rules, while allowing Spark proper to focus on
>> the core runtime.
>> - Provides more flexibility in controlling the direction (currency) of
>> the existing connectors (e.g. a willingness to find a solution for
>> maintaining multiple versions of the same connector, like Kafka 0.8.x
>> and 0.9.x).
>> - Becomes a home for other types of Spark-related connectors, helping
>> expand the community around Spark (e.g. Zeppelin sees most of its
>> current contributions around new/enhanced connectors).
>>
>> Some requirements for Spark Extras to be successful:
>>
>> - Be up to date with Spark trunk APIs (based on daily CI runs against
>> SNAPSHOT builds).
>> - Adhere to Spark release cycles (releasing within a very small window
>> after each Spark release).
>> - Be more open and flexible about the set of connectors it will accept
>> and maintain (e.g. also handle multiple versions, like the Kafka 0.9
>> issue we have today).
>>
>> Where to start Spark Extras
>>
>> Depending on the interest here, we could follow the steps of Apache
>> Arrow and start this directly as a TLP, or start as an incubator
>> project. I would consider the first option first.
>>
>> Who would participate
>>
>> I have thought about this for a bit, and if we go in the direction of a
>> TLP, I would say Spark committers and Apache members can request to
>> participate as PMC members, while other committers can request to become
>> committers. Non-committers would be added based on meritocracy after the
>> project starts.
>>
>> Project Name
>>
>> It would be ideal if we could have a project name that shows close ties
>> to Spark (e.g. Spark Extras or Spark Connectors), but we will need
>> permission and support from whoever evaluates the project proposal
>> (e.g. the Apache Board).
>>
>>
>> Thoughts?
>>
>> Does anyone have any big disagreement or objection to moving in this
>> direction?
>>
>> Otherwise, who would be interested in joining the project, so I can
>> start working on a concrete proposal?
>>
>>
>>
>> On Sat, Mar 26, 2016 at 6:58 AM, Sean Owen <so...@cloudera.com
>> <mailto:so...@cloudera.com>> wrote:
>>
>>     This has been resolved; see the JIRA and related PRs but also
>>
>> http://apache-spark-developers-list.1001551.n3.nabble.com/SPARK-13843-Next-steps-td16783.html
>>
>>     This is not a scenario where a [VOTE] needs to take place, and code
>>     changes don't proceed through PMC votes. From the project perspective,
>>     code was deleted/retired for lack of interest, and this was governed
>>     by the normal lazy-consensus process, which wasn't vetoed.
>>
>>     The subsequent discussion was partly about whether other modules
>>     should go, or whether one should come back, which it did. The latter
>>     suggests the change could have been left open for discussion a while
>>     longer. Ideally, you would have commented before the initial change
>>     happened, but it sounds like several people would have liked more
>>     time. I don't think I'd call that "improper conduct", though, no. It
>>     was reversed via the same normal code-management process.
>>
>>     The rest of the question concerned what becomes of the code that was
>>     removed. It was revived outside the project for anyone who cares to
>>     continue collaborating. There seemed to be no disagreement about that,
>>     mostly because the code in question was of minimal interest. The PMC
>>     doesn't need to rule on anything. There may still be some loose ends
>>     there, like namespace changes. I'll add to the other thread about this.
>>
>>
>>
>>     On Sat, Mar 26, 2016 at 1:17 PM, Jacek Laskowski <ja...@japila.pl
>>     <mailto:ja...@japila.pl>> wrote:
>>      > Hi,
>>      >
>>      > Although I'm not that experienced a member of the ASF, I share your
>>      > concerns. I hadn't looked at the issue from this point of view, but
>>      > after having read the thread I think the PMC should have signed off
>>      > on the migration of ASF-owned code to a non-ASF repo. At least a
>>      > vote is required (and this discussion is a sign that the process has
>>      > not been conducted properly, as people have concerns, me included).
>>      >
>>      > Thanks Mridul!
>>      >
>>      > Pozdrawiam,
>>      > Jacek Laskowski
>>      > ----
>>      > https://medium.com/@jaceklaskowski/
>>      > Mastering Apache Spark http://bit.ly/mastering-apache-spark
>>      > Follow me at https://twitter.com/jaceklaskowski
>>      >
>>      >
>>      > On Thu, Mar 17, 2016 at 9:13 PM, Mridul Muralidharan
>>     <mri...@gmail.com <mailto:mri...@gmail.com>> wrote:
>>      >> I am not referring to code edits, but to migrating submodules and
>>      >> code currently in Apache Spark to 'outside' of it.
>>      >> If I understand correctly, assets from Apache Spark are being moved
>>      >> out of it into third-party external repositories not owned by
>>     Apache.
>>      >>
>>      >> At a minimum, a dev@ discussion (like this one) should be
>>     initiated.
>>      >> As the PMC is responsible for the project assets (including code),
>>     sign-off
>>      >> is required for it, IMO.
>>      >>
>>      >> More experienced Apache members might opine better in case I got
>>     it wrong!
>>      >>
>>      >>
>>      >> Regards,
>>      >> Mridul
>>      >>
>>      >>
>>      >> On Thu, Mar 17, 2016 at 12:55 PM, Cody Koeninger
>>     <c...@koeninger.org <mailto:c...@koeninger.org>> wrote:
>>      >>> Why would a PMC vote be necessary on every code deletion?
>>      >>>
>>      >>> There was a Jira and pull request discussion about the
>>     submodules that
>>      >>> have been removed so far.
>>      >>>
>>      >>> https://issues.apache.org/jira/browse/SPARK-13843
>>      >>>
>>      >>> There's another ongoing one about Kafka specifically
>>      >>>
>>      >>> https://issues.apache.org/jira/browse/SPARK-13877
>>      >>>
>>      >>>
>>      >>> On Thu, Mar 17, 2016 at 2:49 PM, Mridul Muralidharan
>>     <mri...@gmail.com <mailto:mri...@gmail.com>> wrote:
>>      >>>>
>>      >>>> I was not aware of a discussion on the dev list about this - I
>>     agree with most of
>>      >>>> the observations.
>>      >>>> In addition, I did not see PMC sign-off on moving (sub-)modules
>>     out.
>>      >>>>
>>      >>>> Regards
>>      >>>> Mridul
>>      >>>>
>>      >>>>
>>      >>>>
>>      >>>> On Thursday, March 17, 2016, Marcelo Vanzin
>>     <van...@cloudera.com <mailto:van...@cloudera.com>> wrote:
>>      >>>>>
>>      >>>>> Hello all,
>>      >>>>>
>>      >>>>> Recently a lot of the streaming backends were moved to a
>> separate
>>      >>>>> project on github and removed from the main Spark repo.
>>      >>>>>
>>      >>>>> While I think the idea is great, I'm a little worried about the
>>      >>>>> execution. Some concerns were already raised on the bug
>> mentioned
>>      >>>>> above, but I'd like to have a more explicit discussion about
>>     this so
>>      >>>>> things don't fall through the cracks.
>>      >>>>>
>>      >>>>> Mainly I have three concerns.
>>      >>>>>
>>      >>>>> i. Ownership
>>      >>>>>
>>      >>>>> That code used to be run by the ASF, but now it's hosted in a
>>     github
>>      >>>>> repo not owned by the ASF. That sounds a little sub-optimal,
>>     if not
>>      >>>>> problematic.
>>      >>>>>
>>      >>>>> ii. Governance
>>      >>>>>
>>      >>>>> Similar to the above; who has commit access to the above
>>     repos? Will
>>      >>>>> all the Spark committers, present and future, have commit
>>     access to
>>      >>>>> all of those repos? Are they still going to be considered part
>> of
>>      >>>>> Spark and have release management done through the Spark
>>     community?
>>      >>>>>
>>      >>>>>
>>      >>>>> For both of the questions above, why are they not turned into
>>      >>>>> sub-projects of Spark and hosted on the ASF repos? I believe
>>     there is
>>      >>>>> a mechanism to do that, without the need to keep the code in
>>     the main
>>      >>>>> Spark repo, right?
>>      >>>>>
>>      >>>>> iii. Usability
>>      >>>>>
>>      >>>>> This is another thing I don't see discussed. For Scala-based
>>      >>>>> code things don't change much, I guess, if the artifact names
>>      >>>>> don't change (another reason to keep things in the ASF?), but
>>      >>>>> what about Python? How are pyspark users expected to get that
>>      >>>>> code going forward, since it's not in Spark's pyspark.zip
>>      >>>>> anymore?
>>      >>>>>
>>      >>>>>
>>      >>>>> Is there an easy way of keeping these things within the ASF
>> Spark
>>      >>>>> project? I think that would be better for everybody.
>>      >>>>>
>>      >>>>> --
>>      >>>>> Marcelo
>>      >>>>>
>>      >>>>>
>>     ---------------------------------------------------------------------
>>      >>>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>     <mailto:dev-unsubscr...@spark.apache.org>
>>      >>>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>     <mailto:dev-h...@spark.apache.org>
>>      >>>>>
>>      >>>>
>>      >>
>>      >>
>>      >>
>>      >
>>      >
>>      >
>>
>>
>>
>>
>>
>> --
>> Luciano Resende
>> http://twitter.com/lresende1975
>> http://lresende.blogspot.com/
>>
>
> --
> Jean-Baptiste Onofré
> jbono...@apache.org
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
>
>
>


-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/
