+1

On Thu, Jun 1, 2017 at 11:14 AM Hitesh Shah <hit...@apache.org> wrote:
> +1
>
> -- Hitesh
>
> On Wed, May 31, 2017 at 6:03 AM, Sean Busbey <bus...@apache.org> wrote:
>
> > Hi folks!
> >
> > I'm calling a vote to accept "Livy" into the Apache Incubator.
> >
> > The full proposal is available below, and is also available in the wiki:
> >
> > https://wiki.apache.org/incubator/LivyProposal
> >
> > For additional context, please see the discussion thread:
> >
> > https://s.apache.org/incubator-livy-proposal-thread
> >
> > Please cast your vote:
> >
> > [ ] +1, bring Livy into Incubator
> > [ ] -1, do not bring Livy into Incubator, because...
> >
> > The vote will be open for at least 72 hours and only votes from the
> > Incubator PMC are binding.
> >
> > I start with my vote:
> > +1
> >
> > ----
> >
> > = Abstract =
> >
> > Livy is a web service that exposes a REST interface for managing
> > long-running Apache Spark contexts in your cluster. With Livy, new
> > applications can be built on top of Apache Spark that require
> > fine-grained interaction with many Spark contexts.
> >
> > = Proposal =
> >
> > Livy is an open-source REST service for Apache Spark. Livy enables
> > applications to submit Spark applications and retrieve results without a
> > co-location requirement on the Spark cluster.
> >
> > We propose to contribute the Livy codebase and associated artifacts (e.g.
> > documentation, web-site content, etc.) to the Apache Software Foundation.
> >
> > = Background =
> >
> > Apache Spark is a fast and general-purpose distributed compute engine with
> > a versatile API. It enables processing of large quantities of static data
> > distributed over a cluster of machines, as well as processing of
> > continuous streams of data. It is the preferred distributed data
> > processing engine for data engineering, stream processing and data
> > science workloads. Each Spark application uses a construct called the
> > SparkContext, which is the application’s connection or entry point to the
> > Spark engine.
> > Each Spark application will have its own SparkContext.
> >
> > Livy enables clients to interact with one or more Spark sessions through
> > the Livy Server, which acts as a proxy layer. Livy clients have
> > fine-grained control over the lifecycle of the Spark sessions, as well as
> > the ability to submit jobs and retrieve results, all over HTTP. Clients
> > have two modes of interaction:
> >
> > * RPC Client API, available in Java and Python, which allows results to be
> >   retrieved as Java or Python objects. The serialization and
> >   deserialization of the results are handled by the Livy framework.
> > * HTTP-based API that allows submission of code snippets and retrieval of
> >   the results in different formats.
> >
> > Multi-tenant resource allocation and security: Livy enables multiple
> > independent Spark sessions to be managed simultaneously. Multiple clients
> > can also interact simultaneously with the same Spark session and share the
> > resources of that Spark session. Livy can also enforce secure,
> > authenticated communication between the clients and their respective
> > Spark sessions.
> >
> > More information on Livy can be found at the existing open source website:
> > http://livy.io/
> >
> > = Rationale =
> >
> > Users want to use Spark’s powerful processing engine and API as the data
> > processing backend for interactive applications. However, the job
> > submission and application interaction mechanisms built into Apache Spark
> > are insufficient and cumbersome for multi-user interactive applications.
> >
> > The primary mechanism for applications to submit Spark jobs is via
> > spark-submit
> > (http://spark.apache.org/docs/latest/submitting-applications.html), which
> > is available as a command line tool as well as a programmatic API.
> > However, spark-submit has the following limitations that make it
> > difficult to build interactive applications:
> >
> > * It is slow: each invocation of spark-submit involves a setup phase where
> >   cluster resources are acquired, new processes are forked, etc. This
> >   setup phase runs for many seconds, or even minutes, and hence is too
> >   slow for interactive applications.
> > * It is cumbersome and lacks flexibility: application code and
> >   dependencies have to be pre-compiled and submitted as jars, and cannot
> >   be submitted interactively.
> >
> > Apache Spark comes with an ODBC/JDBC server, which can be used to submit
> > SQL queries to Spark. However, this solution is limited to SQL and does
> > not allow the client to leverage the rest of the Spark API, such as RDDs,
> > MLlib and Streaming.
> >
> > A third way of using Spark is via its command-line shell, which allows the
> > interactive submission of snippets of Spark code. However, the shell
> > entails running Spark code on the client machine and hence is not a viable
> > mechanism for remote clients to submit Spark jobs.
> >
> > Livy solves the limitations of the above three mechanisms, and provides
> > the full Spark API as a multi-tenant service to remote clients.
> >
> > Since the open source release of Livy in late 2015, we have seen
> > tremendous interest among a diverse set of application developers and
> > ISVs that want to build applications with Apache Spark. To make Livy a
> > robust and flexible solution that will enable a broad and growing set of
> > applications, it is important to grow a large and varied community of
> > contributors.
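For illustration, the HTTP-based mode of interaction described in the Background section might look like the following from a remote client. This is a minimal sketch and not code from the proposal: the server address and the helper names are hypothetical, and the `/sessions` and `/sessions/{id}/statements` paths with their `kind` and `code` fields reflect the Livy REST API as it is commonly documented, so they should be checked against the deployed version. The helpers only build requests; `send` would perform the call against a running server.

```python
import json
import urllib.request

# Hypothetical address of a Livy server; adjust for a real deployment.
LIVY_URL = "http://localhost:8998"


def build_request(path, payload=None):
    """Build a JSON request: POST when a payload is given, GET otherwise."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    method = "GET" if payload is None else "POST"
    return urllib.request.Request(
        LIVY_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def create_session_request(kind="spark"):
    """Request that asks the server to start a new interactive session."""
    return build_request("/sessions", {"kind": kind})


def submit_statement_request(session_id, code):
    """Request that submits a code snippet to an existing session."""
    return build_request(f"/sessions/{session_id}/statements", {"code": code})


def statement_result_request(session_id, statement_id):
    """Request that fetches the state and output of a submitted snippet."""
    return build_request(f"/sessions/{session_id}/statements/{statement_id}")


def send(request):
    """Send a request to the server and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A client would typically send the session request once, then submit snippets and poll the statement URL until a result is available. The session stays alive between snippets, which is how the per-invocation startup cost of spark-submit is avoided.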
> >
> > = Initial Goals =
> >
> > * Move existing codebase, website, documentation and mailing lists to
> >   Apache-hosted infrastructure
> > * Work with the infrastructure team to implement and approve our code
> >   review, build, and testing workflows in the context of the ASF
> > * Incremental development and releases per Apache guidelines
> >
> > = Current Status =
> >
> > The Livy project began at Cloudera, as a part of the Hue project. Cloudera
> > soon realized the broad applicability of Livy, and separated it out into
> > an independent project in Nov 2015.
> >
> > == Releases ==
> >
> > Livy has undergone two public releases, tagged here:
> >
> > * https://github.com/cloudera/livy/releases/tag/v0.2.0
> > * https://github.com/cloudera/livy/releases/tag/v0.3.0
> >
> > Tarballs and zip files were created for each release and hosted on GitHub.
> > Upon joining the incubator, we will adopt a more typical ASF release
> > process.
> >
> > == Source ==
> >
> > Livy’s source is currently hosted on GitHub at:
> > https://github.com/cloudera/livy
> >
> > This repository will be transitioned to Apache’s git hosting during
> > incubation.
> >
> > == Code review ==
> >
> > Livy’s code reviews are currently public and hosted on GitHub as pull
> > request reviews at: https://github.com/cloudera/livy/pulls
> > The Livy developer community so far is happy with GitHub pull request
> > reviews and hopes to continue this after being admitted to the ASF.
> >
> > == Issue Tracking ==
> >
> > Livy’s bug and feature tracking is hosted on JIRA at:
> > https://issues.cloudera.org/projects/LIVY/summary
> > This JIRA instance contains bugs and development discussion dating back
> > one year and will provide an initial seed for the ASF JIRA.
> >
> > == Community Discussion ==
> >
> > Livy has several public discussion forums:
> >
> > * https://groups.google.com/a/cloudera.org/forum/#!forum/livy-dev
> > * https://groups.google.com/a/cloudera.org/forum/#!forum/livy-user
> >
> > == Development Practices ==
> >
> > The Livy project follows a review-before-commit philosophy. Every commit
> > automatically runs through the unit tests and generates coverage reports
> > presented as a pull request comment. Our experience with this process
> > leads us to believe that it helps ease new contributors into the project.
> > They get feedback quickly on common mistakes, lowering the burden on
> > reviewers. Those same reviewers get to lead by example, showing the new
> > contributors that we value feedback within our community even when
> > changes are done by more experienced folks.
> >
> > == Meritocracy ==
> >
> > We believe strongly in meritocracy when electing committers and PMC
> > members. In the past few months, the project has added two new committers
> > from two different organisations, in recognition of their significant
> > contributions to the project. We will encourage contributions and
> > participation of all types, and ensure that contributors are
> > appropriately recognized.
> >
> > == Community ==
> >
> > Though Livy is relatively new as a standalone open source project, it has
> > already seen promising growth in its community across several
> > organizations:
> >
> > * Cloudera is the original development sponsor for Livy.
> > * Microsoft pushed the development of the interpreter, fixing high
> >   availability issues and adding additional features.
> > * Hortonworks has contributed the security features to Livy, allowing
> >   Kerberos and impersonation to work with Spark.
> > * IBM is starting to make contributions to the Livy project.
> > * A number of other patches have been contributed by community members.
> >
> > Livy currently relies on Google Groups for mailing lists. These lists have
> > been active since the end of 2015/start of 2016. Currently, Livy’s user
> > mailing list has 173 subscribers and has hosted a total of 227 topic
> > threads. Livy’s developer list has 49 subscribers and has hosted 79 topic
> > threads.
> >
> > == Core Developers ==
> >
> > The early contributions to Livy were made by Cloudera engineers. In 2016,
> > engineers from Microsoft and Hortonworks joined the core developer
> > community.
> >
> > == Alignment ==
> >
> > Livy is built upon Apache Spark, and other Apache projects like Apache
> > Hadoop YARN. It’s used as a building block by Apache Zeppelin. These
> > community connections combined with our focus on development practices
> > that emphasize community engagement with a path to meritocratic
> > recognition naturally align us with the ASF.
> >
> > = Known Risks =
> >
> > == Orphaned Products ==
> >
> > The risk of Livy being abandoned is low because it is supported by three
> > major big-data software vendors. Moreover, Livy is already used to power
> > multiple releases of services and products used in production.
> >
> > == Inexperience with Open Source ==
> >
> > Several of the initial committers are experienced open source developers,
> > several being committers and/or PMC members on other ASF projects (Spark,
> > YARN).
> >
> > == Homogenous Developers ==
> >
> > The project already has a diverse developer base. It has contributions
> > from 3 major organisations (Cloudera, Microsoft and Hortonworks), and is
> > used in diverse applications, in diverse settings (On-Prem and Cloud).
> >
> > == Reliance on salaried Developers ==
> >
> > The contributions to the Livy project to date have been made by salaried
> > engineers from Cloudera, Microsoft and Hortonworks. One of the individuals
> > on the initial committer list has since left Microsoft and is currently
> > unaffiliated. The remaining contributors are from Cloudera and
> > Hortonworks. Since there are at least two major organizations involved,
> > the risk of reliance on a single group of salaried developers is
> > mitigated. The Livy user base is diverse, with users from across the
> > globe, including users from academic settings. We aim to further
> > diversify the Livy user and contributor base.
> >
> > == Relationships with other Apache projects ==
> >
> > Livy is closely tied to the Apache Spark project and currently addresses
> > the scenarios for a REST-based batch and interactive gateway for Spark
> > jobs on YARN. Given the growing number of integrations with Livy, keeping
> > it outside of Apache Spark aligns with the desire of the Apache Spark
> > community to reduce the number of external dependencies in the Spark
> > project. Specifically, the Apache Spark community has previously
> > expressed a desire to keep job servers independent from the
> > project.<<FootNote(See, for example, discussion of the Ooyala Spark Job
> > Server in SPARK-818)>>
> > Furthermore, while Livy’s common usage is closely tied to Spark
> > deployments right now, its core building blocks can be reused elsewhere.
> > Livy’s Remote REPL could be used as a library for interactive scenarios in
> > non-Spark projects. In the future, integrations with cluster managers like
> > Apache Mesos and others could also be added.
> >
> > The features provided by Livy have already been integrated with existing
> > projects like Jupyter and Apache Zeppelin for their interactive Spark use
> > cases.
> > This validates the need for a project like Livy and provides an
> > active downstream user base that the Livy community can interact with to
> > seed future interest in the project.
> >
> > Livy serves a similar purpose to Apache Toree (incubating) but differs in
> > making session management, security and impersonation a focal design
> > point.
> >
> > == An Excessive Fascination with the Apache Brand ==
> >
> > The primary motivation for submitting Livy to the ASF is to grow a diverse
> > and strong community. We wish to encourage diverse organisations,
> > including ISVs, to adopt Livy and contribute to Livy without any concerns
> > about ownership or licensing.
> >
> > = Documentation =
> >
> > Documentation can be found on the Livy website http://livy.io/
> >
> > The Livy web site is version controlled on the ‘gh-pages’ branch of the
> > above repository.
> > Additional documentation is provided on the GitHub wiki:
> > https://github.com/cloudera/livy/wiki
> > APIs are documented within the source code as Javadoc-style documentation
> > comments.
> >
> > = Initial Source =
> >
> > The initial source code for Livy is hosted at
> > https://github.com/cloudera/livy
> >
> > = Source and Intellectual Property submission plan =
> >
> > The Livy codebase and web site are currently hosted on GitHub and will be
> > transitioned to the ASF repositories during incubation. Livy is already
> > licensed under the Apache 2.0 license. Cloudera has collected ICLAs and
> > CCLAs from all committers. There are, however, some recent contributions
> > from authors who have not signed the CCLA and ICLA. If necessary for a
> > successful SGA, we’ll seek the necessary documentation or replace the
> > contributions.
> >
> > The “Livy” name is not a registered trademark. We will need to do a
> > trademark search and make sure it is available for the Apache Foundation
> > prior to graduation.
> >
> > Cloudera currently owns the domain name: http://livy.io/.
> > Once all the documentation has moved over to ASF infrastructure, the main
> > landing page will become livy.incubator.apache.org and the old domain
> > will just act as a redirect.
> >
> > = External Dependencies =
> >
> > The list below covers the non-Apache dependencies of the project and their
> > licenses.
> >
> > * Jetty: Apache 2.0
> > * Dropwizard Metrics: Apache 2.0
> > * FasterXML Jackson: Apache 2.0
> > * Netty: Apache 2.0
> > * Scala: BSD
> > * Py4J: BSD
> > * Scalatra: BSD
> >
> > Build/test-only dependencies:
> >
> > * Mockito: MIT
> > * JUnit: Eclipse
> >
> > = Required Resources =
> >
> > == Mailing Lists ==
> >
> > * priv...@livy.incubator.apache.org (PPMC)
> > * d...@livy.incubator.apache.org (dev mailing list)
> > * u...@livy.incubator.apache.org (User questions)
> > * comm...@livy.incubator.apache.org (subscribers shouldn’t be able to
> >   post)
> > * iss...@livy.incubator.apache.org (subscribers shouldn’t be able to
> >   post)
> >
> > == Git Repository ==
> >
> > git://git.apache.org/incubator-livy
> >
> > == Issue Tracking ==
> >
> > We would like to import our current JIRA project into the ASF JIRA, such
> > that our historical commit messages and code comments continue to
> > reference the appropriate bug numbers.
> >
> > = Initial Committers =
> >
> > * Marcelo Vanzin (van...@cloudera.com)
> > * Alex Man (alex@alexman.space)
> > * Jeff Zhang (zjf...@gmail.com)
> > * Saisai Shao (ss...@hortonworks.com)
> > * Kostas Sakellis (kos...@cloudera.com)
> >
> > = Affiliations =
> >
> > The initial set of committers includes people employed by Cloudera and
> > Hortonworks as well as one currently independent contributor.
> >
> > = Additional Interested Contributors =
> >
> > Those interested in getting involved with the project as we enter
> > incubation are encouraged to list themselves here.
> >
> > * Ismaël Mejía (ieme...@apache.org)
> >
> > = Sponsors =
> >
> > == Champion ==
> >
> > Sean Busbey (bus...@apache.org)
> >
> > == Nominated Mentors ==
> >
> > * Bikas Saha (bi...@apache.org)
> > * Brock Noland (br...@phdata.io)
> > * Luciano Resende (lrese...@apache.org)
> >
> > == Sponsoring Entity ==
> >
> > We ask that the Incubator PMC sponsor this proposal.
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> > For additional commands, e-mail: general-h...@incubator.apache.org