+1

Looking forward to this...


Phil

This message optimized for indexing by NSA PRISM


On Wed, May 31, 2017 at 12:54 PM, Neelesh Salian
<neeleshssal...@gmail.com> wrote:
> +1 (non-binding)
> Thanks for putting this together.
>
> On May 31, 2017 9:46 AM, "Marcelo Vanzin" <van...@cloudera.com> wrote:
>
>> +1 (non-binding)
>>
>> On Wed, May 31, 2017 at 6:03 AM, Sean Busbey <bus...@apache.org> wrote:
>> > Hi folks!
>> >
>> > I'm calling a vote to accept "Livy" into the Apache Incubator.
>> >
>> > The full proposal is available below, and is also available in the wiki:
>> >
>> > https://wiki.apache.org/incubator/LivyProposal
>> >
>> > For additional context, please see the discussion thread:
>> >
>> > https://s.apache.org/incubator-livy-proposal-thread
>> >
>> > Please cast your vote:
>> >
>> > [ ] +1, bring Livy into Incubator
>> > [ ] -1, do not bring Livy into Incubator, because...
>> >
>> > The vote will be open for at least 72 hours, and only votes from the
>> > Incubator PMC are binding.
>> >
>> > I start with my vote:
>> > +1
>> >
>> > ----
>> >
>> > = Abstract =
>> >
>> > Livy is a web service that exposes a REST interface for managing
>> > long-running Apache Spark contexts in your cluster. With Livy, new
>> > applications that require fine-grained interaction with many Spark
>> > contexts can be built on top of Apache Spark.
>> >
>> > = Proposal =
>> >
>> > Livy is an open-source REST service for Apache Spark. Livy enables
>> > applications to submit Spark applications and retrieve results without a
>> > co-location requirement on the Spark cluster.
>> >
>> > We propose to contribute the Livy codebase and associated artifacts (e.g.,
>> > documentation, website content, etc.) to the Apache Software Foundation.
>> >
>> > = Background =
>> >
>> > Apache Spark is a fast, general-purpose distributed compute engine with
>> > a versatile API. It enables processing of large quantities of static data
>> > distributed over a cluster of machines, as well as processing of
>> > continuous streams of data. It is the preferred distributed data
>> > processing engine for data engineering, stream processing and data
>> > science workloads. Each Spark application has its own SparkContext,
>> > which is the application’s connection or entry point to the Spark
>> > engine.
>> >
>> > Livy enables clients to interact with one or more Spark sessions through
>> > the Livy Server, which acts as a proxy layer. Livy clients have
>> > fine-grained control over the lifecycle of the Spark sessions, as well
>> > as the ability to submit jobs and retrieve results, all over HTTP.
>> > Clients have two modes of interaction:
>> >
>> >  * RPC Client API, available in Java and Python, which allows results to
>> >    be retrieved as Java or Python objects. The serialization and
>> >    deserialization of the results is handled by the Livy framework.
>> >  * HTTP-based API, which allows submission of code snippets and
>> >    retrieval of the results in different formats.
>> >
>> > Multi-tenant resource allocation and security: Livy enables multiple
>> > independent Spark sessions to be managed simultaneously. Multiple clients
>> > can also interact simultaneously with the same Spark session and share
>> > the resources of that Spark session. Livy can also enforce secure,
>> > authenticated communication between the clients and their respective
>> > Spark sessions.
>> >
>> > More information on Livy can be found at the existing open source
>> > website: http://livy.io/
>> >
>> > = Rationale =
>> >
>> > Users want to use Spark’s powerful processing engine and API as the data
>> > processing backend for interactive applications. However, the job
>> > submission and application interaction mechanisms built into Apache
>> > Spark are insufficient and cumbersome for multi-user interactive
>> > applications.
>> >
>> > The primary mechanism for applications to submit Spark jobs is
>> > spark-submit
>> > (http://spark.apache.org/docs/latest/submitting-applications.html), which
>> > is available as a command-line tool as well as a programmatic API.
>> > However, spark-submit has the following limitations that make it
>> > difficult to build interactive applications:
>> >
>> >  * It is slow: each invocation of spark-submit involves a setup phase in
>> >    which cluster resources are acquired, new processes are forked, etc.
>> >    This setup phase takes many seconds, or even minutes, and hence is
>> >    too slow for interactive applications.
>> >  * It is cumbersome and lacks flexibility: application code and
>> >    dependencies have to be pre-compiled and submitted as jars, and
>> >    cannot be submitted interactively.
>> >
>> > Apache Spark comes with an ODBC/JDBC server, which can be used to submit
>> > SQL queries to Spark. However, this solution is limited to SQL and does
>> > not allow the client to leverage the rest of the Spark API, such as RDDs,
>> > MLlib and Streaming.
>> >
>> > A third way of using Spark is via its command-line shell, which allows
>> > the interactive submission of snippets of Spark code. However, the shell
>> > entails running Spark code on the client machine and hence is not a
>> > viable mechanism for remote clients to submit Spark jobs.
>> >
>> > Livy addresses the limitations of the above three mechanisms and
>> > provides the full Spark API as a multi-tenant service to remote clients.
>> >
>> > Since the open source release of Livy in late 2015, we have seen
>> > tremendous interest among a diverse set of application developers and
>> > ISVs that want to build applications with Apache Spark. To make Livy a
>> > robust and flexible solution that will enable a broad and growing set of
>> > applications, it is important to grow a large and varied community of
>> > contributors.
>> >
>> > = Initial Goals =
>> >
>> >   * Move existing codebase, website, documentation and mailing lists to
>> >     Apache-hosted infrastructure
>> >   * Work with the infrastructure team to implement and approve our code
>> >     review, build, and testing workflows in the context of the ASF
>> >   * Incremental development and releases per Apache guidelines
>> >
>> > = Current Status =
>> >
>> > The Livy project began at Cloudera as part of the Hue project. Cloudera
>> > soon realized the broad applicability of Livy and separated it out into
>> > an independent project in November 2015.
>> >
>> > == Releases ==
>> >
>> > Livy has had two public releases, tagged here:
>> >
>> >  * https://github.com/cloudera/livy/releases/tag/v0.2.0
>> >  * https://github.com/cloudera/livy/releases/tag/v0.3.0
>> >
>> > Tarballs and zip files were created for each release and hosted on
>> > GitHub. Upon joining the Incubator, we will adopt a more typical ASF
>> > release process.
>> >
>> > == Source ==
>> >
>> > Livy’s source is currently hosted on GitHub at:
>> > https://github.com/cloudera/livy
>> >
>> > This repository will be transitioned to Apache’s git hosting during
>> > incubation.
>> >
>> > == Code review ==
>> >
>> > Livy’s code reviews are currently public and hosted on GitHub as pull
>> > request reviews at: https://github.com/cloudera/livy/pulls
>> > The Livy developer community so far is happy with GitHub pull request
>> > reviews and hopes to continue this after being admitted to the ASF.
>> >
>> > == Issue Tracking ==
>> >
>> > Livy’s bug and feature tracking is hosted on JIRA at:
>> > https://issues.cloudera.org/projects/LIVY/summary
>> > This JIRA instance contains bugs and development discussion dating back
>> > one year and will provide an initial seed for the ASF JIRA.
>> >
>> > == Community Discussion ==
>> >
>> > Livy has several public discussion forums:
>> >
>> >  * https://groups.google.com/a/cloudera.org/forum/#!forum/livy-dev
>> >  * https://groups.google.com/a/cloudera.org/forum/#!forum/livy-user
>> >
>> > == Development Practices ==
>> >
>> > The Livy project follows a review-before-commit philosophy. Every commit
>> > automatically runs through the unit tests and generates coverage reports
>> > presented as a pull request comment. Our experience with this process
>> > leads us to believe that it helps ease new contributors into the
>> > project: they get feedback quickly on common mistakes, lowering the
>> > burden on reviewers. Those same reviewers get to lead by example,
>> > showing new contributors that we value feedback within our community
>> > even when changes are made by more experienced folks.
>> >
>> > == Meritocracy ==
>> >
>> > We believe strongly in meritocracy when electing committers and PMC
>> > members. In the past few months, the project has added two new
>> > committers from two different organisations, in recognition of their
>> > significant contributions to the project. We will encourage
>> > contributions and participation of all types, and ensure that
>> > contributors are appropriately recognized.
>> >
>> > == Community ==
>> >
>> > Though Livy is relatively new as a standalone open source project, it
>> > has already seen promising growth in its community across several
>> > organizations:
>> >
>> >  * Cloudera is the original development sponsor for Livy.
>> >  * Microsoft drove development of the interpreter, fixing
>> >    high-availability issues and adding features.
>> >  * Hortonworks has contributed the security features that allow Kerberos
>> >    and impersonation to work with Spark.
>> >  * IBM is starting to make contributions to the Livy project.
>> >  * A number of other patches have been contributed by community members.
>> >
>> > Livy currently relies on Google Groups for mailing lists. These lists
>> > have been active since the end of 2015/start of 2016. Currently, Livy’s
>> > user mailing list has 173 subscribers and has hosted a total of 227
>> > topic threads. Livy’s developer list has 49 subscribers and has hosted
>> > 79 topic threads.
>> >
>> > == Core Developers ==
>> >
>> > The early contributions to Livy were made by Cloudera engineers. In 2016,
>> > engineers from Microsoft and Hortonworks joined the core developer
>> > community.
>> >
>> > == Alignment ==
>> >
>> > Livy is built upon Apache Spark and other Apache projects such as Apache
>> > Hadoop YARN, and it is used as a building block by Apache Zeppelin. These
>> > community connections, combined with our focus on development practices
>> > that emphasize community engagement with a path to meritocratic
>> > recognition, naturally align us with the ASF.
>> >
>> > = Known Risks =
>> >
>> > == Orphaned Products ==
>> >
>> > The risk of Livy being abandoned is low because it is supported by three
>> > major big-data software vendors. Moreover, Livy is already used to power
>> > multiple releases of services and products used in production.
>> >
>> > == Inexperience with Open Source ==
>> >
>> > Several of the initial committers are experienced open source
>> > developers, some of whom are committers and/or PMC members on other ASF
>> > projects (Spark, YARN).
>> >
>> > == Homogeneous Developers ==
>> >
>> > The project already has a diverse developer base. It has contributions
>> > from three major organisations (Cloudera, Microsoft and Hortonworks) and
>> > is used in diverse applications and settings (on-premises and cloud).
>> >
>> > == Reliance on salaried Developers ==
>> >
>> > The contributions to the Livy project to date have been made by salaried
>> > engineers from Cloudera, Microsoft and Hortonworks. One of the
>> > individuals on the initial committer list has since left Microsoft and
>> > is currently unaffiliated. The remaining individuals on that list are
>> > from Cloudera and Hortonworks. Since at least two major organizations are
>> > involved, the risk of reliance on a single group of salaried developers
>> > is mitigated. The Livy user base is diverse, with users from across the
>> > globe, including users from academic settings. We aim to further
>> > diversify the Livy user and contributor base.
>> >
>> > == Relationships with other Apache projects ==
>> >
>> > Livy is closely tied to the Apache Spark project and currently addresses
>> > the scenario of a REST-based batch and interactive gateway for Spark jobs
>> > on YARN. Given the growing number of integrations with Livy, keeping it
>> > outside of Apache Spark aligns with the desire of the Apache Spark
>> > community to reduce the number of external dependencies in the Spark
>> > project. Specifically, the Apache Spark community has previously
>> > expressed a desire to keep job servers independent from the project
>> > (see, for example, the discussion of the Ooyala Spark Job Server in
>> > SPARK-818). Furthermore, while Livy’s common usage is closely tied to
>> > Spark deployments right now, its core building blocks can be reused
>> > elsewhere. Livy’s Remote REPL could be used as a library for interactive
>> > scenarios in non-Spark projects. In the future, integrations with
>> > cluster managers like Apache Mesos and others could also be added.
>> >
>> > The features provided by Livy have already been integrated with existing
>> > projects like Jupyter and Apache Zeppelin for their interactive Spark use
>> > cases. This validates the need for a project like Livy and provides an
>> > active downstream user base that the Livy community can interact with to
>> > seed future interest in the project.
>> >
>> > Livy serves a similar purpose to Apache Toree (incubating) but differs in
>> > making session management, security and impersonation focal design
>> > points.
>> >
>> > == An Excessive Fascination with the Apache Brand ==
>> >
>> > The primary motivation for submitting Livy to the ASF is to grow a
>> > diverse and strong community. We wish to encourage diverse organisations,
>> > including ISVs, to adopt Livy and contribute to Livy without any concerns
>> > about ownership or licensing.
>> >
>> > = Documentation =
>> >
>> > Documentation can be found on the Livy website: http://livy.io/
>> >
>> > The Livy website is version controlled on the ‘gh-pages’ branch of the
>> > Livy GitHub repository (https://github.com/cloudera/livy).
>> > Additional documentation is provided on the GitHub wiki:
>> > https://github.com/cloudera/livy/wiki
>> > APIs are documented within the source code as Javadoc-style
>> > documentation comments.
>> >
>> > = Initial Source =
>> >
>> > The initial source code for Livy is hosted at
>> > https://github.com/cloudera/livy
>> >
>> > = Source and Intellectual Property submission plan =
>> >
>> > The Livy codebase and website are currently hosted on GitHub and will be
>> > transitioned to the ASF repositories during incubation. Livy is already
>> > licensed under the Apache 2.0 license. Cloudera has collected ICLAs and
>> > CCLAs from all committers. There are, however, some recent contributions
>> > from authors who have not signed the CCLA and ICLA. If necessary for a
>> > successful SGA, we’ll seek the necessary documentation or replace the
>> > contributions.
>> >
>> > The “Livy” name is not a registered trademark. We will need to do a
>> > trademark search and make sure it is available for the Apache Software
>> > Foundation prior to graduation.
>> >
>> > Cloudera currently owns the domain name http://livy.io/. Once all the
>> > documentation has moved over to ASF infrastructure, the main landing page
>> > will become livy.incubator.apache.org and the old domain will act as a
>> > redirect.
>> >
>> > = External Dependencies =
>> >
>> > The list below covers the non-Apache dependencies of the project and
>> > their licenses.
>> >
>> >  * Jetty: Apache 2.0
>> >  * Dropwizard Metrics: Apache 2.0
>> >  * FasterXML Jackson: Apache 2.0
>> >  * Netty: Apache 2.0
>> >  * Scala: BSD
>> >  * Py4J: BSD
>> >  * Scalatra: BSD
>> >
>> > Build/test-only dependencies:
>> >
>> >  * Mockito: MIT
>> >  * JUnit: Eclipse Public License
>> >
>> > = Required Resources =
>> >
>> > == Mailing Lists ==
>> >
>> >  * priv...@livy.incubator.apache.org (PPMC)
>> >  * d...@livy.incubator.apache.org (dev mailing list)
>> >  * u...@livy.incubator.apache.org (User questions)
>> >  * comm...@livy.incubator.apache.org (subscribers shouldn’t be able to post)
>> >  * iss...@livy.incubator.apache.org (subscribers shouldn’t be able to post)
>> >
>> > == Git Repository ==
>> >
>> > git://git.apache.org/incubator-livy
>> >
>> > == Issue Tracking ==
>> >
>> > We would like to import our current JIRA project into the ASF JIRA, so
>> > that our historical commit messages and code comments continue to
>> > reference the appropriate bug numbers.
>> >
>> > = Initial Committers =
>> >
>> >  * Marcelo Vanzin (van...@cloudera.com)
>> >  * Alex Man (alex@alexman.space)
>> >  * Jeff Zhang (zjf...@gmail.com)
>> >  * Saisai Shao (ss...@hortonworks.com)
>> >  * Kostas Sakellis (kos...@cloudera.com)
>> >
>> > = Affiliations =
>> >
>> > The initial set of committers includes people employed by Cloudera and
>> > Hortonworks as well as one currently independent contributor.
>> >
>> > = Additional Interested Contributors =
>> >
>> > Those interested in getting involved with the project as we enter
>> > incubation are encouraged to list themselves here.
>> >
>> >   * Ismaël Mejía (ieme...@apache.org)
>> >
>> > = Sponsors =
>> >
>> > == Champion ==
>> >
>> > Sean Busbey (bus...@apache.org)
>> >
>> > == Nominated Mentors ==
>> >
>> >  * Bikas Saha (bi...@apache.org)
>> >  * Brock Noland (br...@phdata.io)
>> >  * Luciano Resende (lrese...@apache.org)
>> >
>> > == Sponsoring Entity ==
>> >
>> > We ask that the Incubator PMC sponsor this proposal.
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>> > For additional commands, e-mail: general-h...@incubator.apache.org
>> >
>>
>>
>>
>> --
>> Marcelo
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>> For additional commands, e-mail: general-h...@incubator.apache.org
>>
>>
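
For illustration, the HTTP-based interaction mode described in the proposal's Background section (create a session, submit a code snippet, retrieve its result) can be sketched as below. This is a minimal sketch, assuming a Livy server on its default port 8998 and Livy's `/sessions` and `/sessions/{id}/statements` REST endpoints; the helper function names are hypothetical. The request builders only construct requests, so they can be inspected without a running server; `send` needs a live Livy instance.

```python
import json
import urllib.request

# Assumption: a Livy server listening on its default port, 8998.
LIVY_URL = "http://localhost:8998"
JSON_HEADERS = {"Content-Type": "application/json"}


def create_session_request(base_url, kind="spark"):
    """POST /sessions: ask Livy to start a new interactive Spark session."""
    body = json.dumps({"kind": kind}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/sessions", data=body, headers=JSON_HEADERS, method="POST"
    )


def submit_statement_request(base_url, session_id, code):
    """POST /sessions/{id}/statements: run a code snippet in that session."""
    body = json.dumps({"code": code}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/sessions/{session_id}/statements",
        data=body,
        headers=JSON_HEADERS,
        method="POST",
    )


def statement_request(base_url, session_id, statement_id):
    """GET /sessions/{id}/statements/{stmt}: poll a statement's state and output."""
    return urllib.request.Request(
        f"{base_url}/sessions/{session_id}/statements/{statement_id}",
        method="GET",
    )


def send(request):
    """Send a request and decode Livy's JSON reply (requires a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Against a live server, a client would call `send(create_session_request(LIVY_URL))`, read the session `id` from the reply, wait until the session's `state` is `idle`, then submit snippets with `submit_statement_request` and poll `statement_request` until the statement's `state` is `available`, at which point the result is in its `output` field.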
