Re: [PROPOSAL] Onyx - proposal for Apache Incubation

Romain Manni-Bucau Fri, 26 Jan 2018 13:02:08 -0800

On 26 Jan 2018 21:53, "Byung-Gon Chun" <bgc...@gmail.com> wrote:


On Sat, Jan 27, 2018 at 5:41 AM, Romain Manni-Bucau <rmannibu...@gmail.com>
wrote:

> Why not make it a Beam subproject? Any blockers?
>
>
Thanks for the question, Romain.

We have a flexible, efficient runtime that supports various user programs
(e.g., Beam and Spark programs).
We are taking advantage of Beam as a programming layer, but our focus is
more on optimizing execution on various deployment scenarios.
We also plan to support other programming layers.



I tend to think the two can converge, since Beam is about portability and,
IMHO, complementary. It could be worth a PoC.



> Otherwise +1 to have it @asf, makes a lot of sense.
>
>
Thanks for the support!

-Gon


> On 26 Jan 2018 20:58, "Byung-Gon Chun" <bgc...@gmail.com> wrote:
>
> > On Sat, Jan 27, 2018 at 4:09 AM, Davor Bonaci <da...@apache.org> wrote:
> >
> > > Great work -- I think this technology has a lot of promise, and I'd love
> > > to see its evolution inside the Foundation.
> > >
> > >
> > Thanks, Davor!
> >
> >
> > > Parts of it, like the Onyx Intermediate Representation [1], overlap with
> > > the work-in-progress inside the Apache Beam project ("portability"). We'd
> > > love to work together on this -- would you be open to such collaboration?
> > > If so, it may not be necessary to start from scratch; we could leverage
> > > the work already done.
> > >
> > >
> > Sure. We're open to collaboration.
> >
> >
> > > Regarding the name, Onyx would likely have to be renamed, due to a
> > > conflict with a related technology [2].
> > >
> > >
> > Thanks for pointing it out. It's difficult to come up with a good short
> > name. :)
> > Do you have any suggestions?
> >
> > Thanks!
> > -Gon
> >
> > ---
> > Byung-Gon Chun
> >
> >
> >
> > > Davor
> > >
> > > [1] https://snuspl.github.io/onyx/docs/ir/
> > > [2] http://www.onyxplatform.org/
> > >
> > > On Thu, Jan 25, 2018 at 3:28 PM, Byung-Gon Chun <bgc...@gmail.com> wrote:
> > >
> > > > Dear Apache Incubator Community,
> > > >
> > > > > Please accept the following proposal for presentation and discussion:
> > > > > https://wiki.apache.org/incubator/OnyxProposal
> > > >
> > > > > Onyx is a data processing system that aims to flexibly control the
> > > > > runtime behaviors of a job to adapt to varying deployment
> > > > > characteristics (e.g., harnessing transient resources in datacenters,
> > > > > cross-datacenter deployment, and changing runtime based on job
> > > > > characteristics). Onyx provides ways to extend the system’s
> > > > > capabilities and incorporate the extensions into the flexible job
> > > > > execution.
> > > > > Onyx translates a user program (e.g., Apache Beam, Apache Spark) into
> > > > > an Intermediate Representation (IR) DAG, which Onyx optimizes and
> > > > > deploys based on a deployment policy.
> > > >
> > > > I've attached the proposal below.
> > > >
> > > > Best regards,
> > > > Byung-Gon Chun
> > > >
> > > > = OnyxProposal =
> > > >
> > > > == Abstract ==
> > > > > Onyx is a data processing system that flexibly supports different
> > > > > execution scenarios for various deployment characteristics on
> > > > > clusters.
> > > >
> > > > == Proposal ==
> > > > > Today, there is a wide variety of data processing systems with
> > > > > different designs for better performance and datacenter efficiency.
> > > > > Some target specific resource environments, and others target jobs
> > > > > with specific attributes. Although each system successfully solves the
> > > > > problems it targets, most systems are designed in such a way that
> > > > > runtime behaviors are built tightly inside the system core to hide the
> > > > > complexity of distributed computing. This makes it hard for a single
> > > > > system to support different deployment characteristics with different
> > > > > runtime behaviors without substantial effort.
> > > >
> > > > > Onyx is a data processing system that aims to flexibly control the
> > > > > runtime behaviors of a job to adapt to varying deployment
> > > > > characteristics. Moreover, it provides a means of extending the
> > > > > system’s capabilities and incorporating the extensions into the
> > > > > flexible job execution.
> > > >
> > > > > To make it easy to modify runtime behaviors to adapt to varying
> > > > > deployment characteristics, Onyx exposes runtime behaviors to be
> > > > > flexibly configured and modified at both compile time and runtime
> > > > > through a set of high-level graph pass interfaces.
> > > >
> > > > > We hope to contribute to the big data processing community by enabling
> > > > > more flexibility and extensibility in job executions. Furthermore, we
> > > > > can all benefit more when we work together as a community to mature
> > > > > the system with more use cases and a better understanding of diverse
> > > > > deployment characteristics. The Apache Software Foundation is the
> > > > > perfect place to achieve these aspirations.
> > > >
> > > > == Background ==
> > > > > Many data processing systems have distinctive runtime behaviors
> > > > > optimized and configured for specific deployment characteristics, such
> > > > > as different resource environments, and for handling special job
> > > > > attributes.
> > > >
> > > > > For example, much research has been conducted to overcome the
> > > > > challenge of running data processing jobs on cheap, unreliable
> > > > > transient resources. Likewise, techniques for disaggregating different
> > > > > types of resources, like memory, CPU and GPU, are being actively
> > > > > developed to use datacenter resources more efficiently. Many
> > > > > researchers are also working to run data processing jobs in even more
> > > > > diverse environments, such as across distant datacenters. Similarly,
> > > > > for special job attributes, many works take different approaches, such
> > > > > as runtime optimization, to solve problems like data skew, and to
> > > > > optimize systems for data processing jobs with small-scale input data.
> > > >
> > > > > Although each of these systems performs well with the jobs and in the
> > > > > environments it targets, it performs poorly in cases it was not
> > > > > designed for, and its design does not consider supporting multiple
> > > > > deployment characteristics on a single system.
> > > >
> > > > > For an application writer, optimizing an application to perform well
> > > > > on a system with built-in runtime behaviors requires a deep
> > > > > understanding of the system itself, which often takes a lot of time
> > > > > and effort. Moreover, for a developer, modifying such system behaviors
> > > > > requires changes to the system core, which demands an even deeper
> > > > > understanding of the system.
> > > >
> > > > > With this background, Onyx is designed to represent all of its jobs as
> > > > > an Intermediate Representation (IR) DAG. In the Onyx compiler, user
> > > > > applications from various programming models (e.g., Apache Beam) are
> > > > > submitted, transformed to an IR DAG, and optimized/customized for the
> > > > > deployment characteristics. In the IR DAG optimization phase, the DAG
> > > > > is modified through a series of compiler “passes” which reshape or
> > > > > annotate the DAG with an expression of the underlying runtime
> > > > > behaviors. The IR DAG is then submitted as an execution plan for the
> > > > > Onyx runtime. The runtime keeps the unmodified parts of data
> > > > > processing in its backbone, which is transparently integrated with
> > > > > configurable components exposed for further extension.
> > > >
> > > > == Rationale ==
> > > > > Onyx’s vision lies in providing means for flexibly supporting a wide
> > > > > variety of job execution scenarios for users while enabling system
> > > > > developers to extend the execution framework with various
> > > > > functionalities at the same time. The capabilities of the system can
> > > > > be extended as it grows to meet a wider variety of execution
> > > > > scenarios. We need input from users and developers from diverse
> > > > > domains in order to make it a more thriving and useful project. The
> > > > > Apache Software Foundation provides the best tools and community to
> > > > > support this vision.
> > > >
> > > > == Initial Goals ==
> > > > > Initial goals will be to move the existing codebase to Apache and
> > > > > integrate with the Apache development process. We further plan to
> > > > > develop our system to meet the needs of more execution scenarios for
> > > > > a wider variety of deployment characteristics.
> > > >
> > > > == Current Status ==
> > > > > The Onyx codebase is currently hosted in a repository on github.com. The
> > > > current version has been developed by system developers at Seoul
> > > > National University, Viva Republica, Samsung, and LG.
> > > >
> > > > == Meritocracy ==
> > > > > We plan to strongly support meritocracy. We will discuss the
> > > > > requirements in an open forum, and those who continuously contribute
> > > > > to Onyx with the passion to strengthen the system will be invited as
> > > > > committers. Contributors who enrich Onyx by providing various use
> > > > > cases, implementations of the configurable components, and ideas for
> > > > > optimization techniques will be especially welcome. Committers with a
> > > > > deep understanding of the system’s technical aspects as a whole and
> > > > > its philosophy will definitely be voted into the PMC. We will monitor
> > > > > community participation so that privileges can be extended to those
> > > > > who contribute.
> > > >
> > > > == Community ==
> > > > > We hope to expand our contributor community by becoming an Apache
> > > > > Incubator project. The contributions will come from both users and
> > > > > system developers interested in the flexibility and extensibility of
> > > > > job executions that Onyx can support. We expect users to contribute
> > > > > mainly by diversifying the use cases and deployment characteristics,
> > > > > and developers to contribute by implementing them.
> > > >
> > > > == Alignment ==
> > > > Apache Spark is one of many popular data processing frameworks. The
> > > > system is designed towards optimizing jobs using RDDs in memory and
> > > > many other optimizations built tightly within the framework. In
> > > > contrast to Spark, Onyx aims to provide more flexibility for job
> > > > execution in an easy manner.
> > > >
> > > > > Apache Tez enables developers to build complex task DAGs with control
> > > > > over the control plane of job execution. In Onyx, a high-level
> > > > > programming layer (e.g., Apache Beam) is automatically converted to a
> > > > > basic IR DAG and can be converted to any IR DAG through a series of
> > > > > easy, user-writable passes that can both reshape the DAG and modify
> > > > > its annotations (execution properties). Moreover, Onyx leaves more
> > > > > parts of the job execution configurable, such as the scheduler and
> > > > > the data plane. As opposed to providing a set of properties for solid
> > > > > optimization, Onyx’s configurable parts can be easily extended and
> > > > > explored by implementing the pre-defined interfaces. For example, an
> > > > > arbitrary intermediate data store can be added.
> > > >
> > > > > Onyx currently supports Apache Beam programs and we are working on
> > > > > supporting Apache Spark programs as well. Onyx also utilizes Apache
> > > > > REEF for container management, which allows Onyx to run in Apache
> > > > > YARN and Apache Mesos clusters. If necessary, we plan to contribute to
> > > > > and collaborate with these other Apache projects for the benefit of
> > > > > all. We plan to extend such integrations to more Apache software. The
> > > > > Apache Software Foundation already hosts many major big-data systems,
> > > > > and we expect to help further growth of the big-data community by
> > > > > having Onyx within the Apache Software Foundation.
> > > >
> > > > == Known Risks ==
> > > > === Orphaned Products ===
> > > > > The risk of the Onyx project being orphaned is minimal. There is
> > > > > already plenty of work that arduously supports different deployment
> > > > > characteristics, and we propose a general way to implement them with
> > > > > flexible and extensible configuration knobs. The domain of data
> > > > > processing is already of high interest, and this domain is expected to
> > > > > evolve continuously with various other purposes, such as resource
> > > > > disaggregation and using transient resources for better datacenter
> > > > > resource utilization.
> > > >
> > > > === Inexperience with Open Source ===
> > > > > The initial committers include PMC members and committers of other
> > > > > Apache projects. They have experience with open source projects,
> > > > > starting from their incubation to the top-level. They have been
> > > > > involved in the open source development process, and are familiar with
> > > > > releasing code under an open source license.
> > > >
> > > > === Homogeneous Developers ===
> > > > > The initial set of committers is from a limited set of organizations,
> > > > > but we expect to attract new contributors from diverse organizations
> > > > > and will thus grow organically once approved for incubation. Our prior
> > > > > experience with other open source projects will help various
> > > > > contributors to actively participate in our project.
> > > >
> > > > === Reliance on Salaried Developers ===
> > > > > Many developers are from Seoul National University. This is not
> > > > > applicable.
> > > >
> > > > === Relationships with Other Apache Products ===
> > > > Onyx positions itself among multiple Apache products. It runs on
> > > > Apache REEF for container management. It also utilizes many useful
> > > > development tools including Apache Maven, Apache Log4J, and multiple
> > > > Apache Commons components. Onyx supports the Apache Beam programming
> > > > model for user applications. We are currently working on supporting
> > > > the Apache Spark programming APIs as well.
> > > >
> > > > === An Excessive Fascination with the Apache Brand ===
> > > > > We hope to make Onyx a powerful system for data processing, meeting
> > > > > various needs for different deployment characteristics, under a wider
> > > > > variety of environments. We see the limitations of simply putting code
> > > > > on GitHub, and we believe the Apache community will help Onyx grow
> > > > > into a positively impactful and innovative open source project. We
> > > > > believe Onyx is a great fit for the Apache Software Foundation due to
> > > > > the collaboration it aims to achieve from the big data processing
> > > > > community.
> > > >
> > > > == Documentation ==
> > > > > The current documentation for Onyx is at
> > > > > https://snuspl.github.io/onyx/.
> > > >
> > > > == Initial Source ==
> > > > > The Onyx codebase is currently hosted at
> > > > > https://github.com/snuspl/onyx.
> > > >
> > > > == External Dependencies ==
> > > > > To the best of our knowledge, all Onyx dependencies are distributed
> > > > > under Apache compatible licenses. Upon acceptance to the incubator, we
> > > > > would begin a thorough analysis of all transitive dependencies to
> > > > > verify this fact and further introduce license checking into the build
> > > > > and release process.
> > > >
> > > > == Cryptography ==
> > > > Not applicable.
> > > >
> > > > == Required Resources ==
> > > > === Mailing Lists ===
> > > > We will operate two mailing lists as follows:
> > > >    * Onyx PMC discussions: priv...@onyx.incubator.apache.org
> > > >    * Onyx developers: d...@onyx.incubator.apache.org
> > > >
> > > > === Git Repositories ===
> > > > Upon incubation: https://github.com/apache/incubator-onyx.
> > > > After the incubation, we would like to move the existing repo
> > > > > https://github.com/snuspl/onyx to the Apache infrastructure.
> > > >
> > > > === Issue Tracking ===
> > > > > Onyx currently tracks its issues using the GitHub issue tracker:
> > > > https://github.com/snuspl/onyx/issues. We plan to migrate to Apache
> > > > JIRA.
> > > >
> > > > == Initial Committers ==
> > > >   * Byung-Gon Chun
> > > >   * Jeongyoon Eo
> > > >   * Geon-Woo Kim
> > > >   * Joo Yeon Kim
> > > >   * Gyewon Lee
> > > >   * Jung-Gil Lee
> > > >   * Sanha Lee
> > > >   * Wooyeon Lee
> > > >   * Yunseong Lee
> > > >   * JangHo Seo
> > > >   * Won Wook Song
> > > >   * Taegeon Um
> > > >   * Youngseok Yang
> > > >
> > > > == Affiliations ==
> > > >   * SNU (Seoul National University)
> > > >     * Byung-Gon Chun
> > > >     * Jeongyoon Eo
> > > >     * Geon-Woo Kim
> > > >     * Gyewon Lee
> > > >     * Sanha Lee
> > > >     * Wooyeon Lee
> > > >     * Yunseong Lee
> > > >     * JangHo Seo
> > > >     * Won Wook Song
> > > >     * Taegeon Um
> > > >     * Youngseok Yang
> > > >
> > > >   * LG
> > > >     * Jung-Gil Lee
> > > >
> > > >   * Samsung
> > > >     * Joo Yeon Kim
> > > >
> > > >   * Viva Republica
> > > >     * Geon-Woo Kim
> > > >
> > > > == Sponsors ==
> > > > === Champions ===
> > > > Byung-Gon Chun
> > > >
> > > > === Mentors ===
> > > >   * Hyunsik Choi
> > > >   * Byung-Gon Chun
> > > >   * Markus Weimer
> > > >   * Reynold Xin
> > > >
> > > > === Sponsoring Entity ===
> > > > The Apache Incubator
> > > >
> > > >
> > > >
> > > > --
> > > > Byung-Gon Chun
> > > >
> > >
> >
> >
> >
> > --
> > Byung-Gon Chun
> >
>



--
Byung-Gon Chun
