Hi, 72 hours have passed, and the vote to accept Coral into the Apache Incubator has passed with:
9 binding "+1" votes, 1 non-binding "+1" vote, and no "-1" votes.

Binding votes:
Kevin A. McGrail
Davor Bonaci
Dave Fisher
Hyunsik Choi
Leif Hedstrom
Jean-Baptiste Onofré
Romain Manni-Bucau
Mark Struberg
Byung-Gon Chun

Non-binding votes:
Clebert Suconic

Thanks to everyone who voted.

On Thu, Feb 1, 2018 at 11:07 PM, Byung-Gon Chun <bgc...@gmail.com> wrote:
> Hi all,
>
> I would like to start a VOTE to propose the Coral project as a podling
> into the Apache Incubator.
>
> The ASF voting rules are described at
> https://www.apache.org/foundation/voting.html
>
> A vote for accepting a new Apache Incubator podling is a majority vote for
> which only Incubator PMC member votes are binding.
>
> This vote will run for at least 72 hours. Please VOTE as follows.
> [] +1 Accept Coral into the Apache Incubator
> [] +0 Abstain
> [] -1 Do not accept Coral into the Apache Incubator because ...
>
> The proposal is listed below, but you can also access it on the wiki:
> https://wiki.apache.org/incubator/CoralProposal
>
> = CoralProposal =
>
> == Abstract ==
> Coral is a data processing system for flexible employment with different
> execution scenarios for various deployment characteristics on clusters.
>
> == Proposal ==
> Today, there is a wide variety of data processing systems with different
> designs for better performance and datacenter efficiency. These designs
> target processing data on specific resource environments and running jobs
> with specific attributes. Although each system successfully solves the
> problems it targets, most systems are designed so that runtime behaviors are
> built tightly inside the system core to hide the complexity of distributed
> computing. This makes it hard for a single system to support different
> deployment characteristics with different runtime behaviors without
> substantial effort.
>
> Coral is a data processing system that aims to flexibly control the runtime
> behaviors of a job to adapt to varying deployment characteristics.
> Moreover, it provides a means of extending the system’s capabilities and
> incorporating the extensions into flexible job execution.
>
> To make runtime behaviors easy to modify for varying deployment
> characteristics, Coral exposes them so that they can be flexibly
> configured and modified at both compile-time and runtime through a set of
> high-level graph pass interfaces.
>
> We hope to contribute to the big data processing community by enabling more
> flexibility and extensibility in job executions. Furthermore, we can all
> benefit by working together as a community to mature the system with more
> use cases and a better understanding of diverse deployment characteristics.
> The Apache Software Foundation is the perfect place to achieve these
> aspirations.
>
> == Background ==
> Many data processing systems have distinctive runtime behaviors optimized and
> configured for specific deployment characteristics, such as different
> resource environments, and for handling special job attributes.
>
> For example, much research has been conducted to overcome the challenge of
> running data processing jobs on cheap, unreliable transient resources.
> Likewise, techniques for disaggregating different types of resources, like
> memory, CPU and GPU, are being actively developed to use datacenter resources
> more efficiently. Many researchers are also working to run data processing
> jobs in even more diverse environments, such as across distant datacenters.
> Similarly, for special job attributes, many works take different approaches,
> such as runtime optimization, to solve problems like data skew, and to
> optimize systems for data processing jobs with small-scale input data.
>
> Although each of these systems performs well with the jobs and in the
> environments it targets, it performs poorly in cases it was not designed
> for, and none is designed to support multiple deployment characteristics
> on a single system.
>
> Optimizing an application to perform well on a system with such built-in
> behaviors requires the application writer to understand the system deeply,
> which often takes a lot of time and effort. Moreover, modifying such system
> behaviors requires a developer to change the system core, which demands an
> even deeper understanding of the system.
>
> With this background, Coral is designed to represent all of its jobs as an
> Intermediate Representation (IR) DAG. In the Coral compiler, user
> applications from various programming models (e.g., Apache Beam) are
> submitted, transformed to an IR DAG, and optimized/customized for the
> deployment characteristics. In the IR DAG optimization phase, the DAG is
> modified through a series of compiler “passes” which reshape or annotate the
> DAG with an expression of the underlying runtime behaviors. The IR DAG is
> then submitted as an execution plan for the Coral runtime. The runtime
> includes the unmodified parts of data processing in the backbone, which is
> transparently integrated with configurable components exposed for further
> extension.
>
> == Rationale ==
> Coral’s vision lies in providing means for flexibly supporting a wide variety
> of job execution scenarios for users while helping system developers extend
> the execution framework with various functionalities at the same time.
> The capabilities of the system can be extended as it grows to meet a wider
> variety of execution scenarios. We need input from users and developers
> from diverse domains in order to make it a more thriving and useful project.
> The Apache Software Foundation provides the best tools and community to
> support this vision.
>
> == Initial Goals ==
> Initial goals will be to move the existing codebase to Apache and integrate
> with the Apache development process.
> We further plan to develop our system to meet the needs of more execution
> scenarios across a wider variety of deployment characteristics.
>
> == Current Status ==
> The Coral codebase is currently hosted in a repository at github.com. The
> current version has been developed by system developers at Seoul National
> University, Viva Republica, Samsung, and LG.
>
> == Meritocracy ==
> We plan to strongly support meritocracy. We will discuss the requirements in
> an open forum, and those who continuously contribute to Coral with the
> passion to strengthen the system will be invited as committers. Contributors
> who enrich Coral by providing various use cases and various implementations
> of the configurable components, including ideas for optimization techniques,
> will be especially welcome. Committers with a deep understanding of the
> system’s technical aspects as a whole and its philosophy will definitely be
> voted into the PMC. We will monitor community participation so that
> privileges can be extended to those who contribute.
>
> == Community ==
> We hope to expand our contribution community by becoming an Apache incubator
> project. The contributions will come from both users and system developers
> interested in flexibility and extensibility of job executions that Coral can
> support. We expect users to contribute mainly by diversifying the use cases
> and deployment characteristics, and developers by implementing them.
>
> == Alignment ==
> Apache Spark is one of many popular data processing frameworks. The system is
> designed towards optimizing jobs using RDDs in memory and many other
> optimizations built tightly within the framework. In contrast to Spark, Coral
> aims to provide more flexibility for job execution in an easy manner.
>
> Apache Tez enables developers to build complex task DAGs with control over
> the control plane of job execution. In Coral, a high-level programming
> layer (e.g.,
> Apache Beam) is automatically converted to a basic IR DAG and can be
> converted to any IR DAG through a series of passes that are easy for users
> to write and that can both reshape the DAG and modify its annotations (of
> execution properties). Moreover, Coral leaves more parts of the job
> execution configurable, such as the scheduler and the data plane. As opposed
> to providing a fixed set of properties for optimization, Coral’s
> configurable parts can be easily extended and explored by implementing the
> pre-defined interfaces. For example, an arbitrary intermediate data store
> can be added.
>
> Coral currently supports Apache Beam programs and we are working on
> supporting Apache Spark programs as well. Coral also utilizes Apache REEF for
> container management, which allows Coral to run in Apache YARN and Apache
> Mesos clusters. If necessary, we plan to contribute to and collaborate with
> these other Apache projects for the benefit of all. We plan to extend such
> integrations to more Apache software. The Apache Software Foundation already
> hosts many major big-data systems, and we expect to help further growth of
> the big-data community by having Coral within the Apache foundation.
>
> == Known Risks ==
> === Orphaned Products ===
> The risk of the Coral project being orphaned is minimal. There is already
> plenty of work that laboriously supports different deployment
> characteristics, and we propose a general way to implement them with
> flexible and extensible configuration knobs. The domain of data processing
> is already of high interest, and this domain is expected to evolve
> continuously with various other purposes, such as resource disaggregation
> and using transient resources for better datacenter resource utilization.
>
> === Inexperience with Open Source ===
> The initial committers include PMC members and committers of other Apache
> projects. They have experience with open source projects, from incubation
> through to top-level status.
> They have been involved in the open source development process, and are
> familiar with releasing code under an open source license.
>
> === Homogeneous Developers ===
> The initial set of committers is from a limited set of organizations, but we
> expect to attract new contributors from diverse organizations and will thus
> grow organically once approved for incubation. Our prior experience with
> other open source projects will help various contributors to actively
> participate in our project.
>
> === Reliance on Salaried Developers ===
> Many developers are from Seoul National University. This is not applicable.
>
> === Relationships with Other Apache Products ===
> Coral positions itself among multiple Apache products. It runs on Apache REEF
> for container management. It also utilizes many useful development tools
> including Apache Maven, Apache Log4J, and multiple Apache Commons components.
> Coral supports the Apache Beam programming model for user applications. We
> are currently working on supporting the Apache Spark programming APIs as well.
>
> === An Excessive Fascination with the Apache Brand ===
> We hope to make Coral a powerful system for data processing, meeting various
> needs for different deployment characteristics in a wider variety of
> environments. We see the limitations of simply putting code on GitHub, and we
> believe the Apache community will help Coral grow into impactful and
> innovative open source software. We believe Coral is a great fit for the
> Apache Software Foundation due to the collaboration it aims to achieve from
> the big data processing community.
>
> == Documentation ==
> The current documentation for Coral is at https://snuspl.github.io/coral/.
>
> == Initial Source ==
> The Coral codebase is currently hosted at https://github.com/snuspl/coral.
>
> == External Dependencies ==
> To the best of our knowledge, all Coral dependencies are distributed under
> Apache-compatible licenses. Upon acceptance to the incubator, we would begin
> a thorough analysis of all transitive dependencies to verify this fact and
> further introduce license checking into the build and release process.
>
> == Cryptography ==
> Not applicable.
>
> == Required Resources ==
> === Mailing Lists ===
> We will operate two mailing lists as follows:
> * Coral PMC discussions: priv...@coral.incubator.apache.org
> * Coral developers: d...@coral.incubator.apache.org
>
> === Git Repositories ===
> Upon incubation: https://github.com/apache/incubator-coral.
> After the incubation, we would like to move the existing repo
> https://github.com/snuspl/coral to the Apache infrastructure.
>
> === Issue Tracking ===
> Coral currently tracks its issues using the GitHub issue tracker:
> https://github.com/snuspl/coral/issues. We plan to migrate to Apache JIRA.
>
> == Initial Committers ==
> * Byung-Gon Chun
> * Jeongyoon Eo
> * Geon-Woo Kim
> * Joo Yeon Kim
> * Gyewon Lee
> * Jung-Gil Lee
> * Sanha Lee
> * Wooyeon Lee
> * Yunseong Lee
> * JangHo Seo
> * Won Wook Song
> * Taegeon Um
> * Youngseok Yang
>
> == Affiliations ==
> * SNU (Seoul National University)
>   * Byung-Gon Chun
>   * Jeongyoon Eo
>   * Geon-Woo Kim
>   * Gyewon Lee
>   * Sanha Lee
>   * Wooyeon Lee
>   * Yunseong Lee
>   * JangHo Seo
>   * Won Wook Song
>   * Taegeon Um
>   * Youngseok Yang
>
> * LG
>   * Jung-Gil Lee
>
> * Samsung
>   * Joo Yeon Kim
>
> * Viva Republica
>   * Geon-Woo Kim
>
> == Sponsors ==
> === Champions ===
> Byung-Gon Chun
>
> === Mentors ===
> * Hyunsik Choi
> * Byung-Gon Chun
> * Jean-Baptiste Onofré
> * Markus Weimer
> * Reynold Xin
>
> === Sponsoring Entity ===
> The Apache Incubator
>
>
> Thanks!
> Byung-Gon Chun
>

--
Byung-Gon Chun