A missing backend element with a community around it, definitely a great project to have at Apache.
+1 (non-binding)
Ismaël

On Wed, May 31, 2017 at 3:29 PM, larry mccay <larry.mc...@gmail.com> wrote:
> This will be a great addition.
>
> +1
>
> On Wed, May 31, 2017 at 9:03 AM, Sean Busbey <bus...@apache.org> wrote:
>
>> Hi folks!
>>
>> I'm calling a vote to accept "Livy" into the Apache Incubator.
>>
>> The full proposal is available below, and is also available in the wiki:
>>
>> https://wiki.apache.org/incubator/LivyProposal
>>
>> For additional context, please see the discussion thread:
>>
>> https://s.apache.org/incubator-livy-proposal-thread
>>
>> Please cast your vote:
>>
>> [ ] +1, bring Livy into Incubator
>> [ ] -1, do not bring Livy into Incubator, because...
>>
>> The vote will be open for at least 72 hours, and only votes from the
>> Incubator PMC are binding.
>>
>> I start with my vote:
>> +1
>>
>> ----
>>
>> = Abstract =
>>
>> Livy is a web service that exposes a REST interface for managing long running
>> Apache Spark contexts in your cluster. With Livy, new applications can be
>> built on top of Apache Spark that require fine grained interaction with many
>> Spark contexts.
>>
>> = Proposal =
>>
>> Livy is an open-source REST service for Apache Spark. Livy enables
>> applications to submit Spark applications and retrieve results without a
>> co-location requirement on the Spark cluster.
>>
>> We propose to contribute the Livy codebase and associated artifacts (e.g.
>> documentation, web-site context etc) to the Apache Software Foundation.
>>
>> = Background =
>>
>> Apache Spark is a fast and general purpose distributed compute engine, with
>> a versatile API. It enables processing of large quantities of static data
>> distributed over a cluster of machines, as well as processing of continuous
>> streams of data. It is the preferred distributed data processing engine for
>> data engineering, stream processing and data science workloads.
>> Each Spark application uses a construct called the SparkContext, which is the
>> application’s connection or entry point to the Spark engine. Each Spark
>> application will have its own SparkContext.
>>
>> Livy enables clients to interact with one or more Spark sessions through the
>> Livy Server, which acts as a proxy layer. Livy Clients have fine grained
>> control over the lifecycle of the Spark sessions, as well as the ability to
>> submit jobs and retrieve results, all over HTTP. Clients have two modes of
>> interaction:
>>
>> * RPC Client API, available in Java and Python, which allows
>> results to be retrieved as Java or Python objects. The serialization and
>> deserialization of the results is handled by the Livy framework.
>> * HTTP based API that allows submission of code snippets, and retrieval of
>> the results in different formats.
>>
>> Multi-tenant resource allocation and security: Livy enables multiple
>> independent Spark sessions to be managed simultaneously. Multiple clients
>> can also interact simultaneously with the same Spark session and share the
>> resources of that Spark session. Livy can also enforce secure, authenticated
>> communication between the clients and their respective Spark sessions.
>>
>> More information on Livy can be found at the existing open source website:
>> http://livy.io/
>>
>> = Rationale =
>>
>> Users want to use Spark’s powerful processing engine and API as the data
>> processing backend for interactive applications. However, the job submission
>> and application interaction mechanisms built into Apache Spark are
>> insufficient and cumbersome for multi-user interactive applications.
>>
>> The primary mechanism for applications to submit Spark jobs is via
>> spark-submit
>> (http://spark.apache.org/docs/latest/submitting-applications.html), which is
>> available as a command line tool as well as a programmatic API.
>> However, spark-submit has the following limitations that make it difficult
>> to build interactive applications:
>>
>> * It is slow: each invocation of spark-submit
>> involves a setup phase where cluster resources are acquired, new processes
>> are forked, etc. This setup phase runs for many seconds, or even minutes,
>> and hence is too slow for interactive applications.
>> * It is cumbersome and lacks flexibility: application code and dependencies
>> have to be pre-compiled and submitted as jars, and cannot be submitted
>> interactively.
>>
>> Apache Spark comes with an ODBC/JDBC server, which can be used to submit SQL
>> queries to Spark. However, this solution is limited to SQL and does not
>> allow the client to leverage the rest of the Spark API, such as RDDs, MLlib
>> and Streaming.
>>
>> A third way of using Spark is via its command-line shell, which allows the
>> interactive submission of snippets of Spark code. However, the shell entails
>> running Spark code on the client machine and hence is not a viable mechanism
>> for remote clients to submit Spark jobs.
>>
>> Livy solves the limitations of the above three mechanisms, and provides the
>> full Spark API as a multi-tenant service to remote clients.
>>
>> Since the open source release of Livy in late 2015, we have seen tremendous
>> interest among a diverse set of application developers and ISVs that want to
>> build applications with Apache Spark. To make Livy a robust and flexible
>> solution that will enable a broad and growing set of applications, it is
>> important to grow a large and varied community of contributors.
>>
>> = Initial Goals =
>>
>> * Move existing codebase, website, documentation and mailing lists to
>> Apache-hosted infrastructure
>> * Work with the infrastructure team to implement and approve our code
>> review, build, and testing workflows in the context of the ASF
>> * Incremental development and releases per Apache guidelines
>>
>> = Current Status =
>>
>> The Livy project began at Cloudera, as a part of the Hue project. Cloudera
>> soon realized the broad applicability of Livy, and separated it out into an
>> independent project in Nov 2015.
>>
>> == Releases ==
>>
>> Livy has undergone two public releases, tagged here:
>>
>> * https://github.com/cloudera/livy/releases/tag/v0.2.0
>> * https://github.com/cloudera/livy/releases/tag/v0.3.0
>>
>> Tarballs and zip files were created for each release and hosted on github.
>> Upon joining the incubator, we will adopt a more typical ASF release process.
>>
>> == Source ==
>>
>> Livy’s source is currently hosted on Github at:
>> https://github.com/cloudera/livy
>>
>> This repository will be transitioned to Apache’s git hosting during
>> incubation.
>>
>> == Code review ==
>>
>> Livy’s code reviews are currently public and hosted on github as pull
>> request reviews at: https://github.com/cloudera/livy/pulls
>> The Livy developer community so far is happy with github pull request
>> reviews and hopes to continue this after being admitted to the ASF.
>>
>> == Issue Tracking ==
>>
>> Livy’s bug and feature tracking is hosted on JIRA at:
>> https://issues.cloudera.org/projects/LIVY/summary
>> This JIRA instance contains bugs and development discussion dating back 1
>> year and will provide an initial seed for the ASF JIRA.
>>
>> == Community Discussion ==
>>
>> Livy has several public discussion forums:
>>
>> * https://groups.google.com/a/cloudera.org/forum/#!forum/livy-dev
>> * https://groups.google.com/a/cloudera.org/forum/#!forum/livy-user
>>
>> == Development Practices ==
>>
>> The Livy project follows a review before commit philosophy. Every commit
>> automatically runs through the unit tests and generates coverage reports
>> presented as a pull request comment. Our experience with this process leads
>> us to believe that it helps ease new contributors into the project. They get
>> feedback quickly on common mistakes, lowering the burden on reviewers. Those
>> same reviewers get to lead by example, showing the new contributors that we
>> value feedback within our community even when changes are done by more
>> experienced folks.
>>
>> == Meritocracy ==
>>
>> We believe strongly in meritocracy when electing committers and PMC members.
>> In the past few months, the project has added two new committers from two
>> different organisations, in recognition of their significant contributions
>> to the project. We will encourage contributions and participation of all
>> types, and ensure that contributors are appropriately recognized.
>>
>> == Community ==
>>
>> Though Livy is relatively new as a standalone open source project, it has
>> already seen promising growth in its community across several organizations:
>>
>> * Cloudera is the original development sponsor for Livy
>> * Microsoft pushed the development of the interpreter fixing high
>> availability issues and adding additional features.
>> * Hortonworks has contributed the security features to Livy allowing kerberos
>> and impersonation to work with Spark
>> * IBM is starting to make contributions to the Livy project
>> * A number of other patches contributed by community members
>>
>> Livy currently relies on Google Groups for mailing lists. These lists have
>> been active since the end of 2015/start of 2016. Currently, Livy’s user
>> mailing list has 173 subscribers and has hosted a total of 227 topic
>> threads. Livy’s developer list has 49 subscribers and has hosted 79 topic
>> threads.
>>
>> == Core Developers ==
>>
>> The early contributions to Livy were made by Cloudera engineers. In 2016,
>> engineers from Microsoft and Hortonworks joined the core developer community.
>>
>> == Alignment ==
>>
>> Livy is built upon Apache Spark, and other Apache projects like Apache
>> Hadoop YARN. It’s used as a building block by Apache Zeppelin. These
>> community connections combined with our focus on development practices that
>> emphasize community engagement with a path to meritocratic recognition
>> naturally align us with the ASF.
>>
>> = Known Risks =
>>
>> == Orphaned Products ==
>>
>> The risk of Livy being abandoned is low because it is supported by three
>> major big-data software vendors. Moreover, Livy is already used to power
>> multiple releases of services and products used in production.
>>
>> == Inexperience with Open Source ==
>>
>> Several of the initial committers are experienced open source developers,
>> several being committers and/or PMC members on other ASF projects (Spark,
>> YARN).
>>
>> == Homogenous Developers ==
>>
>> The project already has a diverse developer base. It has contributions from
>> 3 major organisations (Cloudera, Microsoft and Hortonworks), and is used in
>> diverse applications, in diverse settings (On-Prem and Cloud).
>>
>> == Reliance on salaried Developers ==
>>
>> The contributions to the Livy project to date have been made by salaried
>> engineers from Cloudera, Microsoft and Hortonworks. One of the individuals
>> on the initial committer list has since left Microsoft and is currently
>> unaffiliated. The remaining contributors are from Cloudera and Hortonworks.
>> Since there are at least two major organizations involved, the risk of
>> reliance on a single group of salaried developers is mitigated. The Livy
>> user base is diverse, with users from across the globe, including users from
>> academic settings. We aim to further diversify the Livy user and contributor
>> base.
>>
>> == Relationships with other Apache projects ==
>>
>> Livy is closely tied to the Apache Spark project and currently addresses the
>> scenarios for a REST based batch and interactive gateway for Spark jobs on
>> YARN. Given the growing number of integrations with Livy, keeping it outside
>> of Apache Spark aligns with the desire of the Apache Spark community to
>> reduce the number of external dependencies in the Spark project.
>> Specifically, the Apache Spark community has previously expressed a desire
>> to keep job servers independent from the project.<<FootNote(See, for
>> example, discussion of the Ooyala Spark Job Server in SPARK-818)>>
>> Furthermore, while Livy’s common usage is closely tied to Spark deployments
>> right now, its core building blocks can be reused elsewhere. Livy’s Remote
>> REPL could be used as a library for interactive scenarios in non-Spark
>> projects. In the future, integrations with cluster managers like Apache
>> Mesos and others could also be added.
>>
>> The features provided by Livy have already been integrated with existing
>> projects like Jupyter and Apache Zeppelin for their interactive Spark use
>> cases.
>> This validates the need for a project like Livy and provides an
>> active downstream user base that the Livy community can interact with to
>> seed future interest in the project.
>>
>> Livy serves a similar purpose to Apache Toree (incubating) but differs in
>> making session management, security and impersonation a focal design point.
>>
>> == An Excessive Fascination with the Apache Brand ==
>>
>> The primary motivation for submitting Livy to the ASF is to grow a diverse
>> and strong community. We wish to encourage diverse organisations, including
>> ISVs, to adopt Livy and contribute to Livy without any concerns about
>> ownership or licensing.
>>
>> = Documentation =
>>
>> Documentation can be found on the Livy website http://livy.io/
>>
>> The Livy web site is version controlled on the ‘gh-pages’ branch of the
>> above repository.
>> Additional documentation is provided on the github wiki:
>> https://github.com/cloudera/livy/wiki
>> APIs are documented within the source code as JavaDoc style documentation
>> comments.
>>
>> = Initial Source =
>>
>> The initial source code for Livy is hosted at
>> https://github.com/cloudera/livy
>>
>> = Source and Intellectual Property submission plan =
>>
>> The Livy codebase and web site are currently hosted on GitHub and will be
>> transitioned to the ASF repositories during incubation. Livy is already
>> licensed under the Apache 2.0 license. Cloudera has collected ICLAs and
>> CCLAs from all committers. There are, however, some recent contributions
>> from authors that have not signed the CCLA and ICLA. If necessary for a
>> successful SGA, we’ll seek the necessary documentation or replace the
>> contributions.
>>
>> The “Livy” name is not a registered trademark. We will need to do a
>> trademark search and make sure it is available for the Apache Foundation
>> prior to graduation.
>>
>> Cloudera currently owns the domain name: http://livy.io/.
>> Once all the documentation has moved over to ASF infrastructure, the main
>> landing page will become livy.incubator.apache.org and the old domain will
>> just act as a redirect.
>>
>> = External Dependencies =
>>
>> The list below covers the non-Apache dependencies of the project and their
>> licenses.
>>
>> * Jetty: Apache 2.0
>> * Dropwizard Metrics: Apache 2.0
>> * FasterXML Jackson: Apache 2.0
>> * Netty: Apache 2.0
>> * Scala: BSD
>> * Py4J: BSD
>> * Scalatra: BSD
>>
>> Build/test-only dependencies:
>>
>> * Mockito: MIT
>> * JUnit: Eclipse
>>
>> = Required Resources =
>>
>> == Mailing Lists ==
>>
>> * priv...@livy.incubator.apache.org (PPMC)
>> * d...@livy.incubator.apache.org (dev mailing list)
>> * u...@livy.incubator.apache.org (User questions)
>> * comm...@livy.incubator.apache.org (subscribers shouldn’t be able to post)
>> * iss...@livy.incubator.apache.org (subscribers shouldn’t be able to post)
>>
>> == Git Repository ==
>>
>> git://git.apache.org/incubator-livy
>>
>> == Issue Tracking ==
>>
>> We would like to import our current JIRA project into the ASF JIRA, such
>> that our historical commit message and code comments continue to reference
>> the appropriate bug numbers.
>>
>> = Initial Committers =
>>
>> * Marcelo Vanzin (van...@cloudera.com)
>> * Alex Man (alex@alexman.space)
>> * Jeff Zhang (zjf...@gmail.com)
>> * Saisai Shao (ss...@hortonworks.com)
>> * Kostas Sakellis (kos...@cloudera.com)
>>
>> = Affiliations =
>>
>> The initial set of committers includes people employed by Cloudera and
>> Hortonworks as well as one currently independent contributor.
>>
>> = Additional Interested Contributors =
>>
>> Those interested in getting involved with the project as we enter incubation
>> are encouraged to list themselves here.
>>
>> * Ismaël Mejía (ieme...@apache.org)
>>
>> = Sponsors =
>>
>> == Champion ==
>>
>> Sean Busbey (bus...@apache.org)
>>
>> == Nominated Mentors ==
>>
>> * Bikas Saha (bi...@apache.org)
>> * Brock Noland (br...@phdata.io)
>> * Luciano Resende (lrese...@apache.org)
>>
>> == Sponsoring Entity ==
>>
>> We ask that the Incubator PMC sponsor this proposal.
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>> For additional commands, e-mail: general-h...@incubator.apache.org