Abstract
Livy is a web service that exposes a REST interface for managing
long-running Apache Spark contexts in your cluster. With Livy, new
applications that require fine-grained interaction with many Spark
contexts can be built on top of Apache Spark [1].
While this project has been well regarded and widely used as the de facto
standard API to Spark environments, it has been incubating for over 5 years
without graduating to a TLP, and it has become difficult or impossible to
contribute fixes and improvements because the current community seems to
have moved on.
The retirement of this podling has been under discussion, and that
discussion has surfaced increasing interest in joining and reviving the
community [2].
The intent of this proposal is to avoid retiring a well-regarded, actively
used, and rather mature project by reviving the PPMC and community with new
people who have a vested interest in the project and the health of its
community.
Proposal
We propose to revive the PPMC with a set of contributors and maintainers
serving as mentors, PPMC members, and committers.
The retirement DISCUSS thread [2] has shown a growing interest in providing
new committers and in bringing improvements and fixes from organizations'
internally maintained forks back to a revived community.
General Approach to Revival:
- Add new Mentors
  - Larry McCay, [email protected], Cloudera
  - Sunil Govindan, [email protected], Cloudera
  - Imran Rashid, [email protected], Cloudera
- Add new Committers/PPMC
  - Larry McCay, [email protected], Cloudera
  - Vinod Kumar Vavilapalli, [email protected], Cloudera
  - Gyorgy Gal, ggal, [email protected], Cloudera
  - Wing Yew Poon, [email protected], Cloudera
  - Xilang Yan, [email protected], Shopee
  - Jianzhen Wu, [email protected], Shopee
  - Nagella Jagadeewara Rao, [email protected], Visa
  - Pralab Kumar, [email protected], Visa
  - Prasad Shrikant, [email protected], Visa
  - Brahma Reddy Battula, [email protected], Visa
- Invite existing PPMC members to opt-in or otherwise go emeritus
  - Jean-Baptiste Onofré, [email protected], Talend (opted in via the
    retirement DISCUSS thread [2])
- Invite existing Committers/PPMC members to opt-in or otherwise go
  emeritus
- Establish Roadmap via follow-up DISCUSS thread
  - Known improvements from forks, which will need proposals and
    discussion:
    - Adding HA for Livy
    - Updating security capabilities (e.g. Kerberos for JDBC, fixing bugs
      in encryption)
    - Expanding the support for Kubernetes
    - Responding to CVEs in dependencies (e.g. Log4j, Thrift)
    - Livy REST cluster (open question: is this the same as HA for Livy
      above?)
    - Support for multiple Spark versions
    - A metrics system for Livy
    - Support for custom batch/interactive session lifecycle event
      handlers; the default handler logs events with Log4j, which is very
      helpful for troubleshooting
    - Improved logging that tracks which session ID each log message came
      from, also very helpful for troubleshooting
    - Support for custom Spark config optimization rules, which can be
      used to optimize the config for users' jobs
    - A set of command-line tools that can replace Spark's spark-submit,
      pyspark, and spark-sql but actually submit applications through
      Livy
    - Planned for the next few months: a JDBC state store, and allowing
      multiple Livy Thrift sessions to share one backend Spark
      application
  - These items, and others that are brought to the community, may need
    consolidation or multiple configurable options and will need to be
    part of the discussion
  - One-page Livy Improvement Proposals (LIPs) may make sense to drive
    these discussions and convergence
- Feature Branch Strategy for large changes
  - Large features are hard to review, so we will need to define a
    strategy
- Determine the improvements to be delivered across the first 3 releases,
  with target release dates
- Ensure CVE and dependency management hygiene is in place
The above approach will return the community to active status, with a
roadmap covering 3 or more releases and security hygiene in place.
Development Practices
The Livy project follows a review-before-commit philosophy. Every commit
automatically runs through the unit tests and generates coverage reports
presented as a pull request comment. Our experience with this process leads
us to believe that it helps ease new contributors into the project. They
get feedback quickly on common mistakes, lowering the burden on reviewers.
Those same reviewers get to lead by example, showing the new contributors
that we value feedback within our community even when changes are made by
more experienced folks. This description is taken from the original Apache
Livy Proposal [1] and should continue to hold. As mentioned, Livy is a
mature project, and as such review-then-commit (RTC) is the most
appropriate model for maintaining quality and awareness.
1. Original Apache Livy Proposal
   https://cwiki.apache.org/confluence/display/incubator/LivyProposal
2. Retirement DISCUSS thread
   https://lists.apache.org/thread/gcstsrhbp91c5mm55htqn1l3djv8m7o0