Thanks, Davor! I will add you to the mentor list of Coral.

On Fri, Feb 2, 2018 at 4:23 AM, Davor Bonaci <da...@apache.org> wrote:
> +1 (binding)
>
> Also, happy to help, mentor, or be a connection with the Beam PMC, as
> appropriate.
>
> On Thu, Feb 1, 2018 at 9:54 AM, Kevin A. McGrail <kmcgr...@apache.org>
> wrote:
>
>> +1 Binding
>>
>>
>> On 2/1/2018 9:07 AM, Byung-Gon Chun wrote:
>>
>>> Hi all,
>>>
>>> I would like to start a VOTE to propose the Coral project as a podling
>>> into the Apache Incubator.
>>>
>>> The ASF voting rules are described at
>>> https://www.apache.org/foundation/voting.html
>>>
>>> A vote for accepting a new Apache Incubator podling is a majority vote
>>> for which only Incubator PMC member votes are binding.
>>>
>>> This vote will run for at least 72 hours. Please VOTE as follows.
>>> [] +1 Accept Coral into the Apache Incubator
>>> [] +0 Abstain
>>> [] -1 Do not accept Coral into the Apache Incubator because ...
>>>
>>> The proposal is listed below, but you can also access it on the wiki:
>>> https://wiki.apache.org/incubator/CoralProposal
>>>
>>> = CoralProposal =
>>>
>>> == Abstract ==
>>> Coral is a data processing system for flexible employment with
>>> different execution scenarios for various deployment characteristics
>>> on clusters.
>>>
>>> == Proposal ==
>>> Today, there is a wide variety of data processing systems with
>>> different designs for better performance and datacenter efficiency.
>>> They include processing data on specific resource environments and
>>> running jobs with specific attributes. Although each system
>>> successfully solves the problems it targets, most systems are designed
>>> in such a way that runtime behaviors are built tightly inside the
>>> system core to hide the complexity of distributed computing. This
>>> makes it hard for a single system to support different deployment
>>> characteristics with different runtime behaviors without substantial
>>> effort.
>>>
>>> Coral is a data processing system that aims to flexibly control the
>>> runtime behaviors of a job to adapt to varying deployment
>>> characteristics.
>>> Moreover, it provides a means of extending the
>>> system’s capabilities and incorporating the extensions into the
>>> flexible job execution.
>>>
>>> In order to be able to easily modify runtime behaviors to adapt to
>>> varying deployment characteristics, Coral exposes runtime behaviors to
>>> be flexibly configured and modified at both compile-time and runtime
>>> through a set of high-level graph pass interfaces.
>>>
>>> We hope to contribute to the big data processing community by enabling
>>> more flexibility and extensibility in job executions. Furthermore, we
>>> can benefit more as a community when we work together to mature the
>>> system with more use cases and understanding of diverse deployment
>>> characteristics. The Apache Software Foundation is the perfect place
>>> to achieve these aspirations.
>>>
>>> == Background ==
>>> Many data processing systems have distinctive runtime behaviors
>>> optimized and configured for specific deployment characteristics like
>>> different resource environments and for handling special job
>>> attributes.
>>>
>>> For example, much research has been conducted to overcome the
>>> challenge of running data processing jobs on cheap, unreliable
>>> transient resources. Likewise, techniques for disaggregating different
>>> types of resources, like memory, CPU and GPU, are being actively
>>> developed to use datacenter resources more efficiently. Many
>>> researchers are also working to run data processing jobs in even more
>>> diverse environments, such as across distant datacenters. Similarly,
>>> for special job attributes, many works take different approaches, such
>>> as runtime optimization, to solve problems like data skew, and to
>>> optimize systems for data processing jobs with small-scale input data.
>>>
>>> Although each of the systems performs well with the jobs and in the
>>> environments it targets, they perform poorly with unconsidered cases,
>>> and do not consider supporting multiple deployment characteristics on
>>> a single system in their designs.
>>>
>>> For an application writer to optimize an application to perform well
>>> on a certain system engraved with its underlying behaviors, it
>>> requires a deep understanding of the system itself, which is an
>>> overhead that often requires a lot of time and effort. Moreover, for a
>>> developer to modify such system behaviors, it requires modifications
>>> of the system core, which requires an even deeper understanding of the
>>> system itself.
>>>
>>> With this background, Coral is designed to represent all of its jobs
>>> as an Intermediate Representation (IR) DAG. In the Coral compiler,
>>> user applications from various programming models (e.g., Apache Beam)
>>> are submitted, transformed to an IR DAG, and optimized/customized for
>>> the deployment characteristics. In the IR DAG optimization phase, the
>>> DAG is modified through a series of compiler “passes” which reshape or
>>> annotate the DAG with an expression of the underlying runtime
>>> behaviors. The IR DAG is then submitted as an execution plan for the
>>> Coral runtime. The runtime includes the unmodified parts of data
>>> processing in the backbone which is transparently integrated with
>>> configurable components exposed for further extension.
>>>
>>> == Rationale ==
>>> Coral’s vision lies in providing means for flexibly supporting a wide
>>> variety of job execution scenarios for users while enabling system
>>> developers to extend the execution framework with various
>>> functionalities at the same time. The capabilities of the system can
>>> be extended as it grows to meet a wider variety of execution scenarios.
>>> We require input from users and developers from diverse domains in
>>> order to make it a more thriving and useful project. The Apache
>>> Software Foundation provides the best tools and community to support
>>> this vision.
>>>
>>> == Initial Goals ==
>>> Initial goals will be to move the existing codebase to Apache and
>>> integrate with the Apache development process. We further plan to
>>> develop our system to meet the needs of more execution scenarios for
>>> a wider variety of deployment characteristics.
>>>
>>> == Current Status ==
>>> The Coral codebase is currently hosted in a repository at github.com.
>>> The current version has been developed by system developers at Seoul
>>> National University, Viva Republica, Samsung, and LG.
>>>
>>> == Meritocracy ==
>>> We plan to strongly support meritocracy. We will discuss the
>>> requirements in an open forum, and those who continuously contribute
>>> to Coral with the passion to strengthen the system will be invited as
>>> committers. Contributors who enrich Coral by providing various use
>>> cases, various implementations of the configurable components
>>> including ideas for optimization techniques will be especially
>>> welcome. Committers with a deep understanding of the system’s
>>> technical aspects as a whole and its philosophy will definitely be
>>> voted into the PMC. We will monitor community participation so that
>>> privileges can be extended to those who contribute.
>>>
>>> == Community ==
>>> We hope to expand our contribution community by becoming an Apache
>>> incubator project. The contributions will come from both users and
>>> system developers interested in flexibility and extensibility of job
>>> executions that Coral can support. We expect users to mainly
>>> contribute to diversify the use cases and deployment characteristics,
>>> and developers to contribute to implement them.
>>>
>>> == Alignment ==
>>> Apache Spark is one of many popular data processing frameworks.
>>> The system is designed towards optimizing jobs using RDDs in memory
>>> and many other optimizations built tightly within the framework. In
>>> contrast to Spark, Coral aims to provide more flexibility for job
>>> execution in an easy manner.
>>>
>>> Apache Tez enables developers to build complex task DAGs with control
>>> over the control plane of job execution. In Coral, a high-level
>>> programming layer (e.g., Apache Beam) is automatically converted to a
>>> basic IR DAG and can be converted to any IR DAG through a series of
>>> easy, user-writable passes that can both reshape and modify the
>>> annotation (of execution properties) of the DAG. Moreover, Coral
>>> leaves more parts of the job execution configurable, such as the
>>> scheduler and the data plane. As opposed to providing a set of
>>> properties for solid optimization, Coral’s configurable parts can be
>>> easily extended and explored by implementing the pre-defined
>>> interfaces. For example, an arbitrary intermediate data store can be
>>> added.
>>>
>>> Coral currently supports Apache Beam programs and we are working on
>>> supporting Apache Spark programs as well. Coral also utilizes Apache
>>> REEF for container management, which allows Coral to run in Apache
>>> YARN and Apache Mesos clusters. If necessary, we plan to contribute to
>>> and collaborate with these other Apache projects for the benefit of
>>> all. We plan to extend such integrations with more Apache software.
>>> The Apache Software Foundation already hosts many major big-data
>>> systems, and we expect to help further growth of the big-data
>>> community by having Coral within the Apache Software Foundation.
>>>
>>> == Known Risks ==
>>> === Orphaned Products ===
>>> The risk of the Coral project being orphaned is minimal. There is
>>> already plenty of work that arduously supports different deployment
>>> characteristics, and we propose a general way to implement them with
>>> flexible and extensible configuration knobs.
>>> The domain of data processing is already of high interest, and this
>>> domain is expected to evolve continuously with various other purposes,
>>> such as resource disaggregation and using transient resources for
>>> better datacenter resource utilization.
>>>
>>> === Inexperience with Open Source ===
>>> The initial committers include PMC members and committers of other
>>> Apache projects. They have experience with open source projects,
>>> from incubation to top-level status. They have been
>>> involved in the open source development process, and are familiar with
>>> releasing code under an open source license.
>>>
>>> === Homogeneous Developers ===
>>> The initial set of committers is from a limited set of organizations,
>>> but we expect to attract new contributors from diverse organizations
>>> and will thus grow organically once approved for incubation. Our prior
>>> experience with other open source projects will help various
>>> contributors to actively participate in our project.
>>>
>>> === Reliance on Salaried Developers ===
>>> Many developers are from Seoul National University. This is not
>>> applicable.
>>>
>>> === Relationships with Other Apache Products ===
>>> Coral positions itself among multiple Apache products. It runs on
>>> Apache REEF for container management. It also utilizes many useful
>>> development tools including Apache Maven, Apache Log4J, and multiple
>>> Apache Commons components. Coral supports the Apache Beam programming
>>> model for user applications. We are currently working on supporting
>>> the Apache Spark programming APIs as well.
>>>
>>> === An Excessive Fascination with the Apache Brand ===
>>> We hope to make Coral a powerful system for data processing, meeting
>>> various needs for different deployment characteristics, in a wider
>>> variety of environments.
>>> We see the limitations of simply putting code
>>> on GitHub, and we believe the Apache community will help the growth of
>>> Coral for the project to become a positively impactful and innovative
>>> piece of open source software. We believe Coral is a great fit for the
>>> Apache Software Foundation due to the collaboration it aims to achieve
>>> from the big data processing community.
>>>
>>> == Documentation ==
>>> The current documentation for Coral is at
>>> https://snuspl.github.io/coral/.
>>>
>>> == Initial Source ==
>>> The Coral codebase is currently hosted at
>>> https://github.com/snuspl/coral.
>>>
>>> == External Dependencies ==
>>> To the best of our knowledge, all Coral dependencies are distributed
>>> under Apache compatible licenses. Upon acceptance to the incubator, we
>>> would begin a thorough analysis of all transitive dependencies to
>>> verify this fact and further introduce license checking into the build
>>> and release process.
>>>
>>> == Cryptography ==
>>> Not applicable.
>>>
>>> == Required Resources ==
>>> === Mailing Lists ===
>>> We will operate two mailing lists as follows:
>>> * Coral PMC discussions: priv...@coral.incubator.apache.org
>>> * Coral developers: d...@coral.incubator.apache.org
>>>
>>> === Git Repositories ===
>>> Upon incubation: https://github.com/apache/incubator-coral.
>>> After the incubation, we would like to move the existing repo
>>> https://github.com/snuspl/coral to the Apache infrastructure
>>>
>>> === Issue Tracking ===
>>> Coral currently tracks its issues using the GitHub issue tracker:
>>> https://github.com/snuspl/coral/issues. We plan to migrate to Apache
>>> JIRA.
>>>
>>> == Initial Committers ==
>>> * Byung-Gon Chun
>>> * Jeongyoon Eo
>>> * Geon-Woo Kim
>>> * Joo Yeon Kim
>>> * Gyewon Lee
>>> * Jung-Gil Lee
>>> * Sanha Lee
>>> * Wooyeon Lee
>>> * Yunseong Lee
>>> * JangHo Seo
>>> * Won Wook Song
>>> * Taegeon Um
>>> * Youngseok Yang
>>>
>>> == Affiliations ==
>>> * SNU (Seoul National University)
>>>   * Byung-Gon Chun
>>>   * Jeongyoon Eo
>>>   * Geon-Woo Kim
>>>   * Gyewon Lee
>>>   * Sanha Lee
>>>   * Wooyeon Lee
>>>   * Yunseong Lee
>>>   * JangHo Seo
>>>   * Won Wook Song
>>>   * Taegeon Um
>>>   * Youngseok Yang
>>>
>>> * LG
>>>   * Jung-Gil Lee
>>>
>>> * Samsung
>>>   * Joo Yeon Kim
>>>
>>> * Viva Republica
>>>   * Geon-Woo Kim
>>>
>>> == Sponsors ==
>>> === Champions ===
>>> Byung-Gon Chun
>>>
>>> === Mentors ===
>>> * Hyunsik Choi
>>> * Byung-Gon Chun
>>> * Jean-Baptiste Onofré
>>> * Markus Weimer
>>> * Reynold Xin
>>>
>>> === Sponsoring Entity ===
>>> The Apache Incubator
>>>
>>>
>>> Thanks!
>>> Byung-Gon Chun
>>>
>>>
>> --
>> Kevin A. McGrail
>> Asst. Treasurer & VP Fundraising, Apache Software Foundation
>> Chair Emeritus Apache SpamAssassin Project
>>
>

--
Byung-Gon Chun