[VOTE] Accept Coral into the Apache Incubator

Byung-Gon Chun Thu, 01 Feb 2018 06:08:12 -0800

Hi all,

I would like to start a VOTE to propose the Coral project as a podling into
the Apache Incubator.


The ASF voting rules are described at
https://www.apache.org/foundation/voting.html

A vote for accepting a new Apache Incubator podling is a majority vote in
which only the votes of Incubator PMC members are binding.

This vote will run for at least 72 hours. Please VOTE as follows.
[] +1 Accept Coral into the Apache Incubator
[] +0 Abstain
[] -1 Do not accept Coral into the Apache Incubator because ...

The proposal is listed below, but you can also access it on the wiki:
https://wiki.apache.org/incubator/CoralProposal

= CoralProposal =

== Abstract ==
Coral is a data processing system that can be flexibly adapted to
different execution scenarios and deployment characteristics on
clusters.

== Proposal ==
Today, there is a wide variety of data processing systems with
different designs for better performance and datacenter efficiency.
These designs include processing data in specific resource
environments and running jobs with specific attributes. Although each
system successfully solves the problems it targets, most systems are
designed so that runtime behaviors are built tightly into the system
core to hide the complexity of distributed computing. This makes it
hard for a single system to support different deployment
characteristics with different runtime behaviors without substantial
effort.

Coral is a data processing system that aims to flexibly control the
runtime behaviors of a job to adapt to varying deployment
characteristics. Moreover, it provides a means of extending the
system’s capabilities and incorporating those extensions into flexible
job execution.

To make runtime behaviors easy to modify for varying deployment
characteristics, Coral exposes them through a set of high-level graph
pass interfaces, so that they can be flexibly configured and modified
at both compile time and runtime.
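As an illustration of the runtime side, here is a minimal Java sketch of what a runtime pass might look like, using data skew as the example scenario. The names `RuntimePass` and `SkewRepartitionPass` are hypothetical and are not Coral's API. The pass consumes per-key record counts observed during execution and returns a new key-to-partition assignment, computed with a greedy largest-first heuristic (the heaviest keys are placed first, each on the currently least-loaded partition). This heuristic is chosen for illustration only; it is not necessarily what Coral does.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical sketch: RuntimePass and SkewRepartitionPass are illustrative names, not Coral's API.
public class SkewRuntimePassSketch {

  /** A runtime pass turns statistics observed during execution into a plan update. */
  interface RuntimePass<S, U> {
    U apply(S observedStatistics);
  }

  /**
   * Reassigns keys to partitions with a greedy largest-first heuristic: the heaviest keys are
   * placed first, each on the partition with the smallest load so far.
   */
  static final class SkewRepartitionPass
      implements RuntimePass<Map<String, Long>, Map<String, Integer>> {
    private final int numPartitions;

    SkewRepartitionPass(int numPartitions) {
      if (numPartitions < 1) {
        throw new IllegalArgumentException("numPartitions must be at least 1");
      }
      this.numPartitions = numPartitions;
    }

    @Override
    public Map<String, Integer> apply(Map<String, Long> keyCounts) {
      List<Map.Entry<String, Long>> keys = new ArrayList<>(keyCounts.entrySet());
      // Heaviest keys first; ties broken by key name so the result is deterministic.
      keys.sort(Map.Entry.<String, Long>comparingByValue().reversed()
          .thenComparing(Map.Entry.<String, Long>comparingByKey()));
      // Min-heap of {load, partitionIndex}: least-loaded partition (lowest index on ties) on top.
      PriorityQueue<long[]> loads = new PriorityQueue<>(
          Comparator.<long[]>comparingLong(p -> p[0]).thenComparingLong(p -> p[1]));
      for (int i = 0; i < numPartitions; i++) {
        loads.add(new long[] {0L, i});
      }
      Map<String, Integer> assignment = new HashMap<>();
      for (Map.Entry<String, Long> key : keys) {
        long[] least = loads.poll();
        assignment.put(key.getKey(), (int) least[1]);
        least[0] += key.getValue();
        loads.add(least);
      }
      return assignment;
    }
  }

  public static void main(String[] args) {
    // Illustrative per-key record counts, as a runtime might observe them on a skewed shuffle.
    Map<String, Long> observed = new HashMap<>();
    observed.put("a", 100L);
    observed.put("b", 60L);
    observed.put("c", 50L);
    observed.put("d", 10L);
    Map<String, Integer> assignment = new SkewRepartitionPass(2).apply(observed);
    System.out.println(assignment);
  }
}
```

With two partitions, the heavy key `a` ends up sharing a partition only with the light key `d`, so both partitions carry the same total load. Keeping the pass a pure function from statistics to a plan update is one way to make such runtime behavior swappable without touching the runtime core.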

We hope to contribute to the big data processing community by enabling
more flexibility and extensibility in job executions. Furthermore, we
can all benefit more when we work together as a community to mature
the system with more use cases and a better understanding of diverse
deployment characteristics. The Apache Software Foundation is the
perfect place to achieve these aspirations.

== Background ==
Many data processing systems have distinctive runtime behaviors
optimized and configured for specific deployment characteristics, such
as different resource environments, or for handling special job
attributes.

For example, much research has been conducted to overcome the
challenge of running data processing jobs on cheap, unreliable
transient resources. Likewise, techniques for disaggregating different
types of resources, like memory, CPU and GPU, are being actively
developed to use datacenter resources more efficiently. Many
researchers are also working to run data processing jobs in even more
diverse environments, such as across distant datacenters. Similarly,
for special job attributes, many works take different approaches, such
as runtime optimization, to solve problems like data skew or to
optimize systems for data processing jobs with small-scale input data.

Although each of these systems performs well for the jobs and
environments it targets, they perform poorly in cases they were not
designed for, and their designs do not consider supporting multiple
deployment characteristics in a single system.

Optimizing an application to perform well on a system whose
underlying behaviors are hard-wired requires the application writer to
understand that system deeply, an overhead that often takes a lot of
time and effort. Moreover, modifying such system behaviors requires a
developer to change the system core, which demands an even deeper
understanding of the system itself.

With this background, Coral is designed to represent all of its jobs
as an Intermediate Representation (IR) DAG. In the Coral compiler,
user applications from various programming models (e.g., Apache Beam)
are submitted, transformed into an IR DAG, and optimized/customized for
the deployment characteristics. In the IR DAG optimization phase, the
DAG is modified through a series of compiler “passes” which reshape or
annotate the DAG with an expression of the underlying runtime
behaviors. The IR DAG is then submitted as an execution plan to the
Coral runtime. The runtime keeps the unmodified parts of data
processing in its backbone, which is transparently integrated with
configurable components exposed for further extension.
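To make the compile-time flow concrete, here is a minimal Java sketch, assuming hypothetical names (`IrDag`, `IrVertex`, `CompileTimePass`, `REMOVE_IDENTITY`, `ANNOTATE_PLACEMENT`) that are not Coral's actual API. It models an IR DAG whose vertices carry execution-property annotations, and applies two passes in sequence: one that reshapes the DAG by removing identity vertices, and one that only annotates vertices with an illustrative `placement` property (reserved vs. transient resources). The placement policy is an assumption chosen for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical sketch: IrDag, IrVertex, and CompileTimePass are illustrative names, not Coral's API.
public class IrDagPassSketch {

  /** A vertex: an operator name plus execution-property annotations added by passes. */
  static final class IrVertex {
    final String id;
    final String op;
    final Map<String, String> properties = new HashMap<>();

    IrVertex(String id, String op) {
      this.id = id;
      this.op = op;
    }
  }

  /** A minimal DAG: vertices in insertion order and directed edges {src, dst}. */
  static final class IrDag {
    final Map<String, IrVertex> vertices = new LinkedHashMap<>();
    final List<String[]> edges = new ArrayList<>();

    IrDag addVertex(String id, String op) {
      vertices.put(id, new IrVertex(id, op));
      return this;
    }

    IrDag addEdge(String src, String dst) {
      edges.add(new String[] {src, dst});
      return this;
    }
  }

  /** A compile-time pass maps an IR DAG to a reshaped and/or annotated IR DAG. */
  interface CompileTimePass extends UnaryOperator<IrDag> {}

  /** Reshaping pass: drops "identity" vertices and reconnects their parents to their children. */
  static final CompileTimePass REMOVE_IDENTITY = dag -> {
    List<String> identities = new ArrayList<>();
    for (IrVertex v : dag.vertices.values()) {
      if (v.op.equals("identity")) identities.add(v.id);
    }
    for (String id : identities) {
      List<String> parents = new ArrayList<>();
      List<String> children = new ArrayList<>();
      for (String[] e : dag.edges) {
        if (e[1].equals(id)) parents.add(e[0]);
        if (e[0].equals(id)) children.add(e[1]);
      }
      dag.edges.removeIf(e -> e[0].equals(id) || e[1].equals(id));
      for (String p : parents) {
        for (String c : children) dag.addEdge(p, c);
      }
      dag.vertices.remove(id);
    }
    return dag;
  };

  /** Annotating pass (illustrative policy): "reduce" on reserved resources, the rest on transient ones. */
  static final CompileTimePass ANNOTATE_PLACEMENT = dag -> {
    for (IrVertex v : dag.vertices.values()) {
      v.properties.put("placement", v.op.equals("reduce") ? "RESERVED" : "TRANSIENT");
    }
    return dag;
  };

  /** Applies the passes in order; the result stands in for the execution plan given to the runtime. */
  static IrDag compile(IrDag dag, List<CompileTimePass> passes) {
    IrDag current = dag;
    for (CompileTimePass pass : passes) current = pass.apply(current);
    return current;
  }

  public static void main(String[] args) {
    IrDag dag = new IrDag()
        .addVertex("read", "read").addVertex("id", "identity")
        .addVertex("map", "map").addVertex("reduce", "reduce")
        .addEdge("read", "id").addEdge("id", "map").addEdge("map", "reduce");
    IrDag plan = compile(dag, Arrays.asList(REMOVE_IDENTITY, ANNOTATE_PLACEMENT));
    for (IrVertex v : plan.vertices.values()) System.out.println(v.id + " " + v.properties);
    for (String[] e : plan.edges) System.out.println(e[0] + " -> " + e[1]);
  }
}
```

Because each pass has the same shape (DAG in, DAG out), passes can be reordered, added, or replaced per deployment scenario without changing the compiler driver.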

== Rationale ==
Coral’s vision lies in providing a means of flexibly supporting a wide
variety of job execution scenarios for users while, at the same time,
making it easy for system developers to extend the execution framework
with various functionalities. The capabilities of the system can be
extended as it grows to meet a wider variety of execution scenarios.
We need input from users and developers in diverse domains in order to
make it a more thriving and useful project. The Apache Software
Foundation provides the best tools and community to support this
vision.

== Initial Goals ==
Initial goals will be to move the existing codebase to Apache and
integrate with the Apache development process. We further plan to
develop our system to meet the needs of more execution scenarios for
a wider variety of deployment characteristics.

== Current Status ==
The Coral codebase is currently hosted in a repository at github.com.
The current version has been developed by system developers at Seoul
National University, Viva Republica, Samsung, and LG.

== Meritocracy ==
We plan to strongly support meritocracy. We will discuss the
requirements in an open forum, and those who continuously contribute
to Coral with the passion to strengthen the system will be invited as
committers. Contributors who enrich Coral by providing various use
cases, or various implementations of the configurable components,
including ideas for optimization techniques, will be especially
welcome. Committers with a deep understanding of the system’s
technical aspects as a whole and of its philosophy will be voted into
the PMC. We will monitor community participation so that privileges
can be extended to those who contribute.

== Community ==
We hope to expand our contributor community by becoming an Apache
Incubator project. The contributions will come from both users and
system developers interested in the flexibility and extensibility of
job executions that Coral can support. We expect users mainly to
contribute by diversifying the use cases and deployment
characteristics, and developers to contribute by implementing them.

== Alignment ==
Apache Spark is one of many popular data processing frameworks. It is
designed to optimize jobs using in-memory RDDs, along with many other
optimizations built tightly into the framework. In contrast to Spark,
Coral aims to provide more flexibility for job execution in an easy
manner.

Apache Tez enables developers to build complex task DAGs with control
over the control plane of job execution. In Coral, a program written
with a high-level programming layer (e.g., Apache Beam) is
automatically converted to a basic IR DAG, which can then be converted
to any IR DAG through a series of easy, user-writable passes that can
both reshape the DAG and modify its annotations (of execution
properties). Moreover, Coral leaves more parts of the job execution
configurable, such as the scheduler and the data plane. Rather than
providing a fixed set of properties for optimization, Coral lets its
configurable parts be easily extended and explored by implementing the
pre-defined interfaces. For example, an arbitrary intermediate data
store can be added.
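A minimal Java sketch of what adding an intermediate data store by implementing a pre-defined interface could look like. The interface `IntermediateDataStore`, the `MemoryStore` and `LocalFileStore` implementations, and the registry keyed by a `DataStore` property value are all hypothetical; Coral's actual interfaces may differ. Only standard JDK classes (`java.nio.file.Files`, `Optional`, `Supplier`) are used.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical sketch: these names illustrate a pluggable store, they are not Coral's API.
public class DataStoreSketch {

  /** The pre-defined interface that every intermediate data store implements. */
  interface IntermediateDataStore {
    void put(String partitionId, byte[] data);
    Optional<byte[]> get(String partitionId);
  }

  /** Keeps partitions on the heap: fast, but lost if the executor goes away. */
  static final class MemoryStore implements IntermediateDataStore {
    private final Map<String, byte[]> partitions = new HashMap<>();

    public void put(String partitionId, byte[] data) {
      partitions.put(partitionId, data.clone());
    }

    public Optional<byte[]> get(String partitionId) {
      return Optional.ofNullable(partitions.get(partitionId)).map(byte[]::clone);
    }
  }

  /** Writes each partition to a file in a local directory. */
  static final class LocalFileStore implements IntermediateDataStore {
    private final Path dir;

    LocalFileStore(Path dir) {
      this.dir = dir;
    }

    public void put(String partitionId, byte[] data) {
      try {
        Files.write(dir.resolve(partitionId), data);
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }

    public Optional<byte[]> get(String partitionId) {
      Path file = dir.resolve(partitionId);
      if (!Files.exists(file)) return Optional.empty();
      try {
        return Optional.of(Files.readAllBytes(file));
      } catch (IOException e) {
        throw new UncheckedIOException(e);
      }
    }
  }

  /** Maps a hypothetical "DataStore" execution-property value to a store factory. */
  static final Map<String, Supplier<IntermediateDataStore>> REGISTRY = new HashMap<>();

  static void register(String name, Supplier<IntermediateDataStore> factory) {
    REGISTRY.put(name, factory);
  }

  static IntermediateDataStore storeFor(String dataStoreProperty) {
    Supplier<IntermediateDataStore> factory = REGISTRY.get(dataStoreProperty);
    if (factory == null) {
      throw new IllegalArgumentException("No data store registered for " + dataStoreProperty);
    }
    return factory.get();
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempDirectory("coral-sketch");
    register("MemoryStore", MemoryStore::new);
    register("LocalFileStore", () -> new LocalFileStore(tmp));
    for (String name : new String[] {"MemoryStore", "LocalFileStore"}) {
      IntermediateDataStore store = storeFor(name);
      store.put("p0", "hello".getBytes(StandardCharsets.UTF_8));
      System.out.println(name + ": " + new String(store.get("p0").get(), StandardCharsets.UTF_8));
    }
  }
}
```

The rest of the system only depends on the interface and the property value, so a new store (for example, a remote or disaggregated one) would be added by writing one more implementation and registering it.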

Coral currently supports Apache Beam programs, and we are working on
supporting Apache Spark programs as well. Coral also utilizes Apache
REEF for container management, which allows Coral to run in Apache
YARN and Apache Mesos clusters. If necessary, we plan to contribute to
and collaborate with these other Apache projects for the benefit of
all. We plan to extend such integrations to more Apache software.
The Apache Software Foundation already hosts many major big data
systems, and we expect to help further the growth of the big data
community by having Coral within the ASF.

== Known Risks ==
=== Orphaned Products ===
The risk of the Coral project being orphaned is minimal. There is
already plenty of work that painstakingly supports different deployment
characteristics, and we propose a general way to implement them with
flexible and extensible configuration knobs. The domain of data
processing is already of high interest, and this domain is expected to
keep evolving toward various other goals, such as resource
disaggregation and using transient resources for better datacenter
resource utilization.

=== Inexperience with Open Source ===
The initial committers include PMC members and committers of other
Apache projects. They have experience with open source projects, from
incubation through graduation to top-level status. They have been
involved in the open source development process, and are familiar with
releasing code under an open source license.

=== Homogeneous Developers ===
The initial set of committers is from a limited set of organizations,
but we expect to attract new contributors from diverse organizations
and thus to grow organically once approved for incubation. Our prior
experience with other open source projects will help a variety of
contributors participate actively in our project.

=== Reliance on Salaried Developers ===
Many of the developers are from Seoul National University, so this risk
does not apply.

=== Relationships with Other Apache Products ===
Coral positions itself among multiple Apache products. It runs on
Apache REEF for container management. It also utilizes many useful
development tools including Apache Maven, Apache Log4J, and multiple
Apache Commons components. Coral supports the Apache Beam programming
model for user applications. We are currently working on supporting
the Apache Spark programming APIs as well.

=== An Excessive Fascination with the Apache Brand ===
We hope to make Coral a powerful system for data processing, meeting
various needs for different deployment characteristics in a wider
variety of environments. We see the limitations of simply putting code
on GitHub, and we believe the Apache community will help Coral grow
into positively impactful and innovative open source software. We
believe Coral is a great fit for the Apache Software Foundation
because of the collaboration it aims to foster within the big data
processing community.

== Documentation ==
The current documentation for Coral is at https://snuspl.github.io/coral/.

== Initial Source ==
The Coral codebase is currently hosted at https://github.com/snuspl/coral.

== External Dependencies ==
To the best of our knowledge, all Coral dependencies are distributed
under Apache compatible licenses. Upon acceptance to the incubator, we
would begin a thorough analysis of all transitive dependencies to
verify this fact and further introduce license checking into the build
and release process.

== Cryptography ==
Not applicable.

== Required Resources ==
=== Mailing Lists ===
We will operate two mailing lists as follows:
   * Coral PMC discussions: priv...@coral.incubator.apache.org
   * Coral developers: d...@coral.incubator.apache.org

=== Git Repositories ===
Upon incubation: https://github.com/apache/incubator-coral.
After the incubation, we would like to move the existing repo
https://github.com/snuspl/coral to the Apache infrastructure.

=== Issue Tracking ===
Coral currently tracks its issues using the GitHub issue tracker:
https://github.com/snuspl/coral/issues. We plan to migrate to Apache
JIRA.

== Initial Committers ==
  * Byung-Gon Chun
  * Jeongyoon Eo
  * Geon-Woo Kim
  * Joo Yeon Kim
  * Gyewon Lee
  * Jung-Gil Lee
  * Sanha Lee
  * Wooyeon Lee
  * Yunseong Lee
  * JangHo Seo
  * Won Wook Song
  * Taegeon Um
  * Youngseok Yang

== Affiliations ==
  * SNU (Seoul National University)
    * Byung-Gon Chun
    * Jeongyoon Eo
    * Geon-Woo Kim
    * Gyewon Lee
    * Sanha Lee
    * Wooyeon Lee
    * Yunseong Lee
    * JangHo Seo
    * Won Wook Song
    * Taegeon Um
    * Youngseok Yang

  * LG
    * Jung-Gil Lee

  * Samsung
    * Joo Yeon Kim

  * Viva Republica
    * Geon-Woo Kim

== Sponsors ==
=== Champions ===
Byung-Gon Chun

=== Mentors ===
  * Hyunsik Choi
  * Byung-Gon Chun
  * Jean-Baptiste Onofré
  * Markus Weimer
  * Reynold Xin

=== Sponsoring Entity ===
The Apache Incubator


Thanks!
Byung-Gon Chun
