+1 Looking forward to this...
Phil

This message optimized for indexing by NSA PRISM

On Wed, May 31, 2017 at 12:54 PM, Neelesh Salian <neeleshssal...@gmail.com> wrote:

> +1 (non-binding)
> Thanks for putting this together.
>
> On May 31, 2017 9:46 AM, "Marcelo Vanzin" <van...@cloudera.com> wrote:
>
>> +1 (non-binding)
>>
>> On Wed, May 31, 2017 at 6:03 AM, Sean Busbey <bus...@apache.org> wrote:
>> > Hi folks!
>> >
>> > I'm calling a vote to accept "Livy" into the Apache Incubator.
>> >
>> > The full proposal is available below, and is also available in the wiki:
>> >
>> > https://wiki.apache.org/incubator/LivyProposal
>> >
>> > For additional context, please see the discussion thread:
>> >
>> > https://s.apache.org/incubator-livy-proposal-thread
>> >
>> > Please cast your vote:
>> >
>> > [ ] +1, bring Livy into Incubator
>> > [ ] -1, do not bring Livy into Incubator, because...
>> >
>> > The vote will be open for at least 72 hours and only votes from the Incubator
>> > PMC are binding.
>> >
>> > I start with my vote:
>> > +1
>> >
>> > ----
>> >
>> > = Abstract =
>> >
>> > Livy is a web service that exposes a REST interface for managing long running
>> > Apache Spark contexts in your cluster. With Livy, new applications can be
>> > built on top of Apache Spark that require fine grained interaction with many
>> > Spark contexts.
>> >
>> > = Proposal =
>> >
>> > Livy is an open-source REST service for Apache Spark. Livy enables
>> > applications to submit Spark applications and retrieve results without a
>> > co-location requirement on the Spark cluster.
>> >
>> > We propose to contribute the Livy codebase and associated artifacts (e.g.
>> > documentation, web-site context etc) to the Apache Software Foundation.
>> >
>> > = Background =
>> >
>> > Apache Spark is a fast and general purpose distributed compute engine, with
>> > a versatile API.
>> > It enables processing of large quantities of static data
>> > distributed over a cluster of machines, as well as processing of continuous
>> > streams of data. It is the preferred distributed data processing engine for
>> > data engineering, stream processing and data science workloads. Each Spark
>> > application uses a construct called the SparkContext, which is the
>> > application’s connection or entry point to the Spark engine. Each Spark
>> > application will have its own SparkContext.
>> >
>> > Livy enables clients to interact with one or more Spark sessions through the
>> > Livy Server, which acts as a proxy layer. Livy Clients have fine grained
>> > control over the lifecycle of the Spark sessions, as well as the ability to
>> > submit jobs and retrieve results, all over HTTP. Clients have two modes of
>> > interaction:
>> >
>> > * RPC Client API, available in Java and Python, which allows results to be
>> > retrieved as Java or Python objects. The serialization and deserialization
>> > of the results is handled by the Livy framework.
>> > * HTTP based API that allows submission of code snippets, and retrieval of
>> > the results in different formats.
>> >
>> > Multi-tenant resource allocation and security: Livy enables multiple
>> > independent Spark sessions to be managed simultaneously. Multiple clients
>> > can also interact simultaneously with the same Spark session and share the
>> > resources of that Spark session. Livy can also enforce secure, authenticated
>> > communication between the clients and their respective Spark sessions.
>> >
>> > More information on Livy can be found at the existing open source website:
>> > http://livy.io/
>> >
>> > = Rationale =
>> >
>> > Users want to use Spark’s powerful processing engine and API as the data
>> > processing backend for interactive applications.
>> > However, the job submission
>> > and application interaction mechanisms built into Apache Spark are
>> > insufficient and cumbersome for multi-user interactive applications.
>> >
>> > The primary mechanism for applications to submit Spark jobs is via
>> > spark-submit
>> > (http://spark.apache.org/docs/latest/submitting-applications.html), which is
>> > available as a command line tool as well as a programmatic API. However,
>> > spark-submit has the following limitations that make it difficult to build
>> > interactive applications:
>> >
>> > * It is slow: each invocation of spark-submit involves a setup phase where
>> > cluster resources are acquired, new processes are forked, etc. This setup
>> > phase runs for many seconds, or even minutes, and hence is too slow for
>> > interactive applications.
>> > * It is cumbersome and lacks flexibility: application code and dependencies
>> > have to be pre-compiled and submitted as jars, and cannot be submitted
>> > interactively.
>> >
>> > Apache Spark comes with an ODBC/JDBC server, which can be used to submit SQL
>> > queries to Spark. However, this solution is limited to SQL and does not
>> > allow the client to leverage the rest of the Spark API, such as RDDs, MLlib
>> > and Streaming.
>> >
>> > A third way of using Spark is via its command-line shell, which allows the
>> > interactive submission of snippets of Spark code. However, the shell entails
>> > running Spark code on the client machine and hence is not a viable mechanism
>> > for remote clients to submit Spark jobs.
>> >
>> > Livy solves the limitations of the above three mechanisms, and provides the
>> > full Spark API as a multi-tenant service to remote clients.
>> >
>> > Since the open source release of Livy in late 2015, we have seen tremendous
>> > interest among a diverse set of application developers and ISVs that want to
>> > build applications with Apache Spark.
>> > To make Livy a robust and flexible
>> > solution that will enable a broad and growing set of applications, it is
>> > important to grow a large and varied community of contributors.
>> >
>> > = Initial Goals =
>> >
>> > * Move existing codebase, website, documentation and mailing lists to
>> > Apache-hosted infrastructure
>> > * Work with the infrastructure team to implement and approve our code
>> > review, build, and testing workflows in the context of the ASF
>> > * Incremental development and releases per Apache guidelines
>> >
>> > = Current Status =
>> >
>> > The Livy project began at Cloudera, as a part of the Hue project. Cloudera
>> > soon realized the broad applicability of Livy, and separated it out into an
>> > independent project in Nov 2015.
>> >
>> > == Releases ==
>> >
>> > Livy has undergone two public releases, tagged here:
>> >
>> > * https://github.com/cloudera/livy/releases/tag/v0.2.0
>> > * https://github.com/cloudera/livy/releases/tag/v0.3.0
>> >
>> > Tarballs and zip files were created for each release and hosted on GitHub.
>> > Upon joining the incubator, we will adopt a more typical ASF release
>> > process.
>> >
>> > == Source ==
>> >
>> > Livy’s source is currently hosted on GitHub at:
>> > https://github.com/cloudera/livy
>> >
>> > This repository will be transitioned to Apache’s git hosting during
>> > incubation.
>> >
>> > == Code review ==
>> >
>> > Livy’s code reviews are currently public and hosted on GitHub as pull
>> > request reviews at: https://github.com/cloudera/livy/pulls
>> > The Livy developer community so far is happy with GitHub pull request
>> > reviews and hopes to continue this after being admitted to the ASF.
>> >
>> > == Issue Tracking ==
>> >
>> > Livy’s bug and feature tracking is hosted on JIRA at:
>> > https://issues.cloudera.org/projects/LIVY/summary
>> > This JIRA instance contains bugs and development discussion dating back 1
>> > year and will provide an initial seed for the ASF JIRA
>> >
>> > == Community Discussion ==
>> >
>> > Livy has several public discussion forums:
>> >
>> > * https://groups.google.com/a/cloudera.org/forum/#!forum/livy-dev
>> > * https://groups.google.com/a/cloudera.org/forum/#!forum/livy-user
>> >
>> > == Development Practices ==
>> >
>> > The Livy project follows a review before commit philosophy. Every commit
>> > automatically runs through the unit tests and generates coverage reports
>> > presented as a pull request comment. Our experience with this process leads
>> > us to believe that it helps ease new contributors into the project. They get
>> > feedback quickly on common mistakes, lowering the burden on reviewers. Those
>> > same reviewers get to lead by example, showing the new contributors that we
>> > value feedback within our community even when changes are done by more
>> > experienced folks.
>> >
>> > == Meritocracy ==
>> >
>> > We believe strongly in meritocracy when electing committers and PMC members.
>> > In the past few months, the project has added two new committers from two
>> > different organisations, in recognition of their significant contributions
>> > to the project. We will encourage contributions and participation of all
>> > types, and ensure that contributors are appropriately recognized.
>> >
>> > == Community ==
>> >
>> > Though Livy is relatively new as a standalone open source project, it has
>> > already seen promising growth in its community across several organizations:
>> > Cloudera is the original development sponsor for Livy
>> > Microsoft pushed the development of the interpreter fixing high availability
>> > issues and adding additional features.
>> > Hortonworks has contributed the security features to Livy allowing kerberos
>> > and impersonation to work with Spark
>> > IBM is starting to make contributions to the Livy project
>> > A number of other patches contributed by community members
>> >
>> > Livy currently relies on Google Groups for mailing lists. These lists have
>> > been active since the end of 2015/start of 2016. Currently, Livy’s user
>> > mailing list has 173 subscribers and has hosted a total of 227 topic
>> > threads. Livy’s developer list has 49 subscribers and has hosted 79 topic
>> > threads.
>> >
>> > == Core Developers ==
>> >
>> > The early contributions to Livy were made by Cloudera engineers. In 2016,
>> > engineers from Microsoft and Hortonworks joined the core developer
>> > community.
>> >
>> > == Alignment ==
>> >
>> > Livy is built upon Apache Spark, and other Apache projects like Apache
>> > Hadoop YARN. It’s used as a building block by Apache Zeppelin. These
>> > community connections combined with our focus on development practices that
>> > emphasize community engagement with a path to meritocratic recognition
>> > naturally align us with the ASF.
>> >
>> > = Known Risks =
>> >
>> > == Orphaned Products ==
>> >
>> > The risk of Livy being abandoned is low because it is supported by three
>> > major big-data software vendors. Moreover, Livy is already used to power
>> > multiple releases of services and products used in production.
>> >
>> > == Inexperience with Open Source ==
>> >
>> > Several of the initial committers are experienced open source developers,
>> > several being committers and/or PMC members on other ASF projects (Spark,
>> > YARN).
>> >
>> > == Homogenous Developers ==
>> >
>> > The project already has a diverse developer base. It has contributions from
>> > 3 major organisations (Cloudera, Microsoft and Hortonworks), and is used in
>> > diverse applications, in diverse settings (On-Prem and Cloud).
>> >
>> > == Reliance on salaried Developers ==
>> >
>> > The contributions to the Livy project to date have been made by salaried
>> > engineers from Cloudera, Microsoft and Hortonworks. One of the individuals
>> > on the initial committer list has since left Microsoft and is currently
>> > unaffiliated. The remaining contributors are from Cloudera and Hortonworks.
>> > Since there are at least two major organizations involved, the risk of
>> > reliance on a single group of salaried developers is mitigated. The Livy
>> > user base is diverse, with users from across the globe, including users from
>> > academic settings. We aim to further diversify the Livy user and contributor
>> > base.
>> >
>> > == Relationships with other Apache projects ==
>> >
>> > Livy is closely tied to the Apache Spark project and currently addresses the
>> > scenarios for a REST based batch and interactive gateway for Spark jobs on
>> > YARN. Given the growing number of integrations with Livy, keeping it outside
>> > of Apache Spark aligns with the desire of the Apache Spark community to
>> > reduce the number of external dependencies in the Spark project.
>> > Specifically, the Apache Spark community has previously expressed a desire
>> > to keep job servers independent from the project (see, for example,
>> > discussion of the Ooyala Spark Job Server in SPARK-818).
>> > Furthermore, while Livy’s common usage is closely tied to Spark deployments
>> > right now, its core building blocks can be reused elsewhere. Livy’s Remote
>> > REPL could be used as a library for interactive scenarios in non-Spark
>> > projects. In the future, integrations with cluster managers like Apache
>> > Mesos and others could also be added.
>> >
>> > The features provided by Livy have already been integrated with existing
>> > projects like Jupyter and Apache Zeppelin for their interactive Spark use
>> > cases.
>> > This validates the need for a project like Livy and provides an
>> > active downstream user base that the Livy community can interact with to
>> > seed future interest in the project.
>> >
>> > Livy serves a similar purpose to Apache Toree (incubating) but differs in
>> > making session management, security and impersonation a focal design point.
>> >
>> > == An Excessive Fascination with the Apache Brand ==
>> >
>> > The primary motivation for submitting Livy to the ASF is to grow a diverse
>> > and strong community. We wish to encourage diverse organisations, including
>> > ISVs, to adopt Livy and contribute to Livy without any concerns about
>> > ownership or licensing.
>> >
>> > = Documentation =
>> >
>> > Documentation can be found on the Livy website http://livy.io/
>> >
>> > The Livy web site is version controlled on the ‘gh-pages’ branch of the
>> > above repository.
>> > Additional documentation is provided on the GitHub wiki:
>> > https://github.com/cloudera/livy/wiki
>> > APIs are documented within the source code as JavaDoc style documentation
>> > comments.
>> >
>> > = Initial Source =
>> >
>> > The initial source code for Livy is hosted at
>> > https://github.com/cloudera/livy
>> >
>> > = Source and Intellectual Property submission plan =
>> >
>> > The Livy codebase and web site are currently hosted on GitHub and will be
>> > transitioned to the ASF repositories during incubation. Livy is already
>> > licensed under the Apache 2.0 license. Cloudera has collected ICLAs and
>> > CCLAs from all committers. There are, however, some recent contributions
>> > from authors that have not signed the CCLA and ICLA. If necessary for a
>> > successful SGA, we’ll seek the necessary documentation or replace the
>> > contributions.
>> >
>> > The “Livy” name is not a registered trademark. We will need to do a
>> > trademark search and make sure it is available for the Apache Foundation
>> > prior to graduation.
>> >
>> > Cloudera currently owns the domain name: http://livy.io/. Once all the
>> > documentation has moved over to ASF infrastructure, the main landing page
>> > will become livy.incubator.apache.org and the old domain will just act as a
>> > redirect.
>> >
>> > = External Dependencies =
>> >
>> > The list below covers the non-Apache dependencies of the project and their
>> > licenses.
>> >
>> > * Jetty: Apache 2.0
>> > * Dropwizard Metrics: Apache 2.0
>> > * FasterXML Jackson: Apache 2.0
>> > * Netty: Apache 2.0
>> > * Scala: BSD
>> > * Py4J: BSD
>> > * Scalatra: BSD
>> >
>> > Build/test-only dependencies:
>> >
>> > * Mockito: MIT
>> > * JUnit: Eclipse
>> >
>> > = Required Resources =
>> >
>> > == Mailing Lists ==
>> >
>> > * priv...@livy.incubator.apache.org (PPMC)
>> > * d...@livy.incubator.apache.org (dev mailing list)
>> > * u...@livy.incubator.apache.org (User questions)
>> > * comm...@livy.incubator.apache.org (subscribers shouldn’t be able to post)
>> > * iss...@livy.incubator.apache.org (subscribers shouldn’t be able to post)
>> >
>> > == Git Repository ==
>> >
>> > git://git.apache.org/incubator-livy
>> >
>> > == Issue Tracking ==
>> >
>> > We would like to import our current JIRA project into the ASF JIRA, such
>> > that our historical commit message and code comments continue to reference
>> > the appropriate bug numbers.
>> >
>> > = Initial Committers =
>> >
>> > * Marcelo Vanzin (van...@cloudera.com)
>> > * Alex Man (alex@alexman.space)
>> > * Jeff Zhang (zjf...@gmail.com)
>> > * Saisai Shao (ss...@hortonworks.com)
>> > * Kostas Sakellis (kos...@cloudera.com)
>> >
>> > = Affiliations =
>> >
>> > The initial set of committers includes people employed by Cloudera and
>> > Hortonworks as well as one currently independent contributor.
>> >
>> > = Additional Interested Contributors =
>> >
>> > Those interested in getting involved with the project as we enter incubation
>> > are encouraged to list themselves here.
>> >
>> > * Ismaël Mejía (ieme...@apache.org)
>> >
>> > = Sponsors =
>> >
>> > == Champion ==
>> >
>> > Sean Busbey (bus...@apache.org)
>> >
>> > == Nominated Mentors ==
>> >
>> > * Bikas Saha (bi...@apache.org)
>> > * Brock Noland (br...@phdata.io)
>> > * Luciano Resende (lrese...@apache.org)
>> >
>> > == Sponsoring Entity ==
>> >
>> > We ask that the Incubator PMC sponsor this proposal.
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>> > For additional commands, e-mail: general-h...@incubator.apache.org
>> >
>>
>>
>>
>> --
>> Marcelo