Re: [PROPOSAL] Onyx - proposal for Apache Incubation

Byung-Gon Chun Sun, 28 Jan 2018 01:51:01 -0800

Thank you for all the information! It looks like Surf doesn't work.

If possible, we'd like to keep Onyx.
Another name we came up with is Coral.


Thanks!
-Gon


On Sun, Jan 28, 2018 at 4:21 AM, Leif Hedstrom <zw...@apache.org> wrote:

> Did we rule out Onyx for sure? Just because some other project might use
> it on, say, GitHub doesn’t necessarily exclude us from having an Apache Onyx?
>
> FWIW, I agree that surf is too similar in pronunciation to Apache serf. :)
>
> Cheers,
>
> — Leif
>
> > On Jan 27, 2018, at 07:31, Dave Fisher <dave2w...@comcast.net> wrote:
> >
> > Checking “Serf Software” which sounds the same.
> >
> > (1) there is already Apache Serf
> > (2) Serf is a product from HashiCorp at https://www.serf.io/. This
> > would definitely cause confusion, as it is apparently comparable to ZooKeeper.
> >
> > Regards,
> > Dave
> >
> > Sent from my iPhone
> >
> >> On Jan 27, 2018, at 3:12 AM, sebb <seb...@gmail.com> wrote:
> >>
> >> A brief search for 'Surf Software' shows quite a few hits.
> >> I have not looked to see if they would be likely to be confused with
> >> this project or cause problems for others.
> >>
> >> But it looks as though there might be a problem:
> >> Surfer -  Golden Software
> >> surf @ sourceforge
> >> Surf Software company
> >>
> >>
> >>> On 27 January 2018 at 08:03, Byung-Gon Chun <bgc...@gmail.com> wrote:
> >>> Since we cannot use the name Onyx, we would like to change the project
> >>> name to Surf.
> >>> I hope that this name works.
> >>>
> >>> -Gon
> >>>
> >>> ---
> >>> Byung-Gon Chun
> >>>
> >>>
> >>>> On Sat, Jan 27, 2018 at 4:57 AM, Byung-Gon Chun <bgc...@gmail.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>>> On Sat, Jan 27, 2018 at 4:09 AM, Davor Bonaci <da...@apache.org> wrote:
> >>>>>
> >>>>> Great work -- I think this technology has a lot of promise, and I'd love
> >>>>> to see its evolution inside the Foundation.
> >>>>>
> >>>>>
> >>>> Thanks, Davor!
> >>>>
> >>>>
> >>>>> Parts of it, like the Onyx Intermediate Representation [1], overlap with
> >>>>> the work-in-progress inside the Apache Beam project ("portability"). We'd
> >>>>> love to work together on this -- would you be open to such collaboration?
> >>>>> If so, it may not be necessary to start from scratch; we could leverage
> >>>>> the work already done.
> >>>>>
> >>>>>
> >>>> Sure. We're open to collaboration.
> >>>>
> >>>>
> >>>>> Regarding the name, Onyx would likely have to be renamed, due to a
> >>>>> conflict with a related technology [2].
> >>>>>
> >>>>>
> >>>> Thanks for pointing it out. It's difficult to come up with a good short
> >>>> name. :)
> >>>> Do you have any suggestions?
> >>>>
> >>>> Thanks!
> >>>> -Gon
> >>>>
> >>>> ---
> >>>> Byung-Gon Chun
> >>>>
> >>>>
> >>>>
> >>>>> Davor
> >>>>>
> >>>>> [1] https://snuspl.github.io/onyx/docs/ir/
> >>>>> [2] http://www.onyxplatform.org/
> >>>>>
> >>>>>> On Thu, Jan 25, 2018 at 3:28 PM, Byung-Gon Chun <bgc...@gmail.com> wrote:
> >>>>>>
> >>>>>> Dear Apache Incubator Community,
> >>>>>>
> >>>>>> Please accept the following proposal for presentation and discussion:
> >>>>>> https://wiki.apache.org/incubator/OnyxProposal
> >>>>>>
> >>>>>> Onyx is a data processing system that aims to flexibly control the
> >>>>>> runtime behaviors of a job to adapt to varying deployment characteristics
> >>>>>> (e.g., harnessing transient resources in datacenters, cross-datacenter
> >>>>>> deployment, changing runtime based on job characteristics, etc.). Onyx
> >>>>>> provides ways to extend the system’s capabilities and incorporate the
> >>>>>> extensions into the flexible job execution.
> >>>>>> Onyx translates a user program (e.g., Apache Beam, Apache Spark) into an
> >>>>>> Intermediate Representation (IR) DAG, which Onyx optimizes and deploys
> >>>>>> based on a deployment policy.
> >>>>>>
> >>>>>> I've attached the proposal below.
> >>>>>>
> >>>>>> Best regards,
> >>>>>> Byung-Gon Chun
> >>>>>>
> >>>>>> = OnyxProposal =
> >>>>>>
> >>>>>> == Abstract ==
> >>>>>> Onyx is a data processing system that can be flexibly employed under
> >>>>>> different execution scenarios to suit various deployment characteristics
> >>>>>> on clusters.
> >>>>>>
> >>>>>> == Proposal ==
> >>>>>> Today, there is a wide variety of data processing systems with
> >>>>>> different designs for better performance and datacenter efficiency.
> >>>>>> These designs target, for example, processing data in specific resource
> >>>>>> environments and running jobs with specific attributes. Although each
> >>>>>> system successfully solves the problems it targets, most systems are
> >>>>>> designed so that runtime behaviors are built tightly into the system
> >>>>>> core to hide the complexity of distributed computing. This makes it
> >>>>>> hard for a single system to support different deployment
> >>>>>> characteristics with different runtime behaviors without substantial
> >>>>>> effort.
> >>>>>>
> >>>>>> Onyx is a data processing system that aims to flexibly control the
> >>>>>> runtime behaviors of a job to adapt to varying deployment
> >>>>>> characteristics. Moreover, it provides a means of extending the
> >>>>>> system’s capabilities and incorporating the extensions into the
> >>>>>> flexible job execution.
> >>>>>>
> >>>>>> To make runtime behaviors easy to modify for varying deployment
> >>>>>> characteristics, Onyx exposes them so that they can be flexibly
> >>>>>> configured and modified at both compile time and runtime through a set
> >>>>>> of high-level graph pass interfaces.
> >>>>>>
> >>>>>> We hope to contribute to the big data processing community by enabling
> >>>>>> more flexibility and extensibility in job executions. Furthermore, we
> >>>>>> can all benefit more when we work together as a community to mature the
> >>>>>> system with more use cases and a better understanding of diverse
> >>>>>> deployment characteristics. The Apache Software Foundation is the
> >>>>>> perfect place to achieve these aspirations.
> >>>>>>
> >>>>>> == Background ==
> >>>>>> Many data processing systems have distinctive runtime behaviors
> >>>>>> optimized and configured for specific deployment characteristics, such
> >>>>>> as different resource environments, and for handling special job
> >>>>>> attributes.
> >>>>>>
> >>>>>> For example, much research has been conducted to overcome the
> >>>>>> challenge of running data processing jobs on cheap, unreliable
> >>>>>> transient resources. Likewise, techniques for disaggregating different
> >>>>>> types of resources, like memory, CPU and GPU, are being actively
> >>>>>> developed to use datacenter resources more efficiently. Many
> >>>>>> researchers are also working to run data processing jobs in even more
> >>>>>> diverse environments, such as across distant datacenters. Similarly,
> >>>>>> for special job attributes, many works take different approaches, such
> >>>>>> as runtime optimization, to solve problems like data skew, and to
> >>>>>> optimize systems for data processing jobs with small-scale input data.
> >>>>>> Although each of these systems performs well with the jobs and in the
> >>>>>> environments it targets, it performs poorly in cases it was not designed
> >>>>>> for, and its design does not consider supporting multiple deployment
> >>>>>> characteristics on a single system.
> >>>>>>
> >>>>>> For an application writer, optimizing an application to perform well
> >>>>>> on a system whose underlying behaviors are built in requires a deep
> >>>>>> understanding of the system itself, which is an overhead that often
> >>>>>> takes a lot of time and effort. Moreover, for a developer, modifying
> >>>>>> such system behaviors requires modifications of the system core, which
> >>>>>> requires an even deeper understanding of the system itself.
> >>>>>>
> >>>>>> With this background, Onyx is designed to represent all of its jobs as
> >>>>>> an Intermediate Representation (IR) DAG. In the Onyx compiler, user
> >>>>>> applications from various programming models (e.g., Apache Beam) are
> >>>>>> submitted, transformed to an IR DAG, and optimized/customized for the
> >>>>>> deployment characteristics. In the IR DAG optimization phase, the DAG
> >>>>>> is modified through a series of compiler “passes” which reshape or
> >>>>>> annotate the DAG with an expression of the underlying runtime
> >>>>>> behaviors. The IR DAG is then submitted as an execution plan for the
> >>>>>> Onyx runtime. The runtime includes the unmodified parts of data
> >>>>>> processing in the backbone, which is transparently integrated with
> >>>>>> configurable components exposed for further extension.
> >>>>>>
> >>>>>> == Rationale ==
> >>>>>> Onyx’s vision lies in providing means for flexibly supporting a wide
> >>>>>> variety of job execution scenarios for users while enabling system
> >>>>>> developers to extend the execution framework with various
> >>>>>> functionalities at the same time. The capabilities of the system can
> >>>>>> be extended as it grows to meet a wider variety of execution scenarios.
> >>>>>> We require input from users and developers from diverse domains in
> >>>>>> order to make it a more thriving and useful project. The Apache
> >>>>>> Software Foundation provides the best tools and community to support
> >>>>>> this vision.
> >>>>>>
> >>>>>> == Initial Goals ==
> >>>>>> Initial goals will be to move the existing codebase to Apache and
> >>>>>> integrate with the Apache development process. We further plan to
> >>>>>> develop our system to meet the needs of more execution scenarios for
> >>>>>> a wider variety of deployment characteristics.
> >>>>>>
> >>>>>> == Current Status ==
> >>>>>> The Onyx codebase is currently hosted in a repository at github.com. The
> >>>>>> current version has been developed by system developers at Seoul
> >>>>>> National University, Viva Republica, Samsung, and LG.
> >>>>>>
> >>>>>> == Meritocracy ==
> >>>>>> We plan to strongly support meritocracy. We will discuss the
> >>>>>> requirements in an open forum, and those who continuously contribute
> >>>>>> to Onyx with the passion to strengthen the system will be invited as
> >>>>>> committers. Contributors who enrich Onyx by providing various use
> >>>>>> cases and various implementations of the configurable components,
> >>>>>> including ideas for optimization techniques, will be especially
> >>>>>> welcome. Committers with a deep understanding of the system’s
> >>>>>> technical aspects as a whole and its philosophy will definitely be
> >>>>>> voted into the PMC. We will monitor community participation so that
> >>>>>> privileges can be extended to those who contribute.
> >>>>>>
> >>>>>> == Community ==
> >>>>>> We hope to expand our contributor community by becoming an Apache
> >>>>>> incubator project. The contributions will come from both users and
> >>>>>> system developers interested in the flexibility and extensibility of job
> >>>>>> executions that Onyx can support. We expect users to mainly contribute
> >>>>>> by diversifying the use cases and deployment characteristics, and
> >>>>>> developers to contribute by implementing them.
> >>>>>>
> >>>>>> == Alignment ==
> >>>>>> Apache Spark is one of many popular data processing frameworks. The
> >>>>>> system is designed towards optimizing jobs using RDDs in memory and
> >>>>>> many other optimizations built tightly within the framework. In
> >>>>>> contrast to Spark, Onyx aims to provide more flexibility for job
> >>>>>> execution in an easy manner.
> >>>>>>
> >>>>>> Apache Tez enables developers to build complex task DAGs with control
> >>>>>> over the control plane of job execution. In Onyx, a program written in a
> >>>>>> high-level programming layer (e.g., Apache Beam) is automatically
> >>>>>> converted to a basic IR DAG, which can then be converted to any IR DAG
> >>>>>> through a series of easy-to-write user passes that can both reshape the
> >>>>>> DAG and modify its annotations (of execution properties). Moreover, Onyx
> >>>>>> leaves more parts of the job execution configurable, such as the
> >>>>>> scheduler and the data plane. As opposed to providing a set of properties
> >>>>>> for solid optimization, Onyx’s configurable parts can be easily extended
> >>>>>> and explored by implementing the pre-defined interfaces. For example,
> >>>>>> an arbitrary intermediate data store can be added.
> >>>>>>
> >>>>>> Onyx currently supports Apache Beam programs and we are working on
> >>>>>> supporting Apache Spark programs as well. Onyx also utilizes Apache
> >>>>>> REEF for container management, which allows Onyx to run in Apache YARN
> >>>>>> and Apache Mesos clusters. If necessary, we plan to contribute to and
> >>>>>> collaborate with these other Apache projects for the benefit of all.
> >>>>>> We plan to extend such integrations to more Apache software. The Apache
> >>>>>> Software Foundation already hosts many major big-data systems, and we
> >>>>>> expect to help further growth of the big-data community by having Onyx
> >>>>>> within the Apache Software Foundation.
> >>>>>>
> >>>>>> == Known Risks ==
> >>>>>> === Orphaned Products ===
> >>>>>> The risk of the Onyx project being orphaned is minimal. There is
> >>>>>> already plenty of work that arduously supports different deployment
> >>>>>> characteristics, and we propose a general way to implement them with
> >>>>>> flexible and extensible configuration knobs. The domain of data
> >>>>>> processing is already of high interest, and this domain is expected to
> >>>>>> evolve continuously with various other purposes, such as resource
> >>>>>> disaggregation and using transient resources for better datacenter
> >>>>>> resource utilization.
> >>>>>>
> >>>>>> === Inexperience with Open Source ===
> >>>>>> The initial committers include PMC members and committers of other
> >>>>>> Apache projects. They have experience with open source projects,
> >>>>>> from incubation through to top-level status. They have been
> >>>>>> involved in the open source development process, and are familiar with
> >>>>>> releasing code under an open source license.
> >>>>>>
> >>>>>> === Homogeneous Developers ===
> >>>>>> The initial set of committers is from a limited set of organizations,
> >>>>>> but we expect to attract new contributors from diverse organizations
> >>>>>> and will thus grow organically once approved for incubation. Our prior
> >>>>>> experience with other open source projects will help various
> >>>>>> contributors to actively participate in our project.
> >>>>>>
> >>>>>> === Reliance on Salaried Developers ===
> >>>>>> Many developers are from Seoul National University. This is not
> >>>>>> applicable.
> >>>>>>
> >>>>>> === Relationships with Other Apache Products ===
> >>>>>> Onyx positions itself among multiple Apache products. It runs on
> >>>>>> Apache REEF for container management. It also utilizes many useful
> >>>>>> development tools including Apache Maven, Apache Log4J, and multiple
> >>>>>> Apache Commons components. Onyx supports the Apache Beam programming
> >>>>>> model for user applications. We are currently working on supporting
> >>>>>> the Apache Spark programming APIs as well.
> >>>>>>
> >>>>>> === An Excessive Fascination with the Apache Brand ===
> >>>>>> We hope to make Onyx a powerful system for data processing, meeting
> >>>>>> various needs for different deployment characteristics, under a wider
> >>>>>> variety of environments. We see the limitations of simply putting code
> >>>>>> on GitHub, and we believe the Apache community will help Onyx grow
> >>>>>> into a positively impactful and innovative open source project. We
> >>>>>> believe Onyx is a great fit for the Apache Software Foundation due to
> >>>>>> the collaboration it aims to achieve from the big data processing
> >>>>>> community.
> >>>>>>
> >>>>>> == Documentation ==
> >>>>>> The current documentation for Onyx is at https://snuspl.github.io/onyx/ .
> >>>>>>
> >>>>>> == Initial Source ==
> >>>>>> The Onyx codebase is currently hosted at https://github.com/snuspl/onyx .
> >>>>>>
> >>>>>> == External Dependencies ==
> >>>>>> To the best of our knowledge, all Onyx dependencies are distributed
> >>>>>> under Apache compatible licenses. Upon acceptance to the incubator, we
> >>>>>> would begin a thorough analysis of all transitive dependencies to
> >>>>>> verify this fact and further introduce license checking into the build
> >>>>>> and release process.
> >>>>>>
> >>>>>> == Cryptography ==
> >>>>>> Not applicable.
> >>>>>>
> >>>>>> == Required Resources ==
> >>>>>> === Mailing Lists ===
> >>>>>> We will operate two mailing lists as follows:
> >>>>>>  * Onyx PMC discussions: priv...@onyx.incubator.apache.org
> >>>>>>  * Onyx developers: d...@onyx.incubator.apache.org
> >>>>>>
> >>>>>> === Git Repositories ===
> >>>>>> Upon incubation: https://github.com/apache/incubator-onyx.
> >>>>>> After the incubation, we would like to move the existing repo
> >>>>>> https://github.com/snuspl/onyx to the Apache infrastructure.
> >>>>>> === Issue Tracking ===
> >>>>>> Onyx currently tracks its issues using the GitHub issue tracker:
> >>>>>> https://github.com/snuspl/onyx/issues. We plan to migrate to Apache
> >>>>>> JIRA.
> >>>>>>
> >>>>>> == Initial Committers ==
> >>>>>> * Byung-Gon Chun
> >>>>>> * Jeongyoon Eo
> >>>>>> * Geon-Woo Kim
> >>>>>> * Joo Yeon Kim
> >>>>>> * Gyewon Lee
> >>>>>> * Jung-Gil Lee
> >>>>>> * Sanha Lee
> >>>>>> * Wooyeon Lee
> >>>>>> * Yunseong Lee
> >>>>>> * JangHo Seo
> >>>>>> * Won Wook Song
> >>>>>> * Taegeon Um
> >>>>>> * Youngseok Yang
> >>>>>>
> >>>>>> == Affiliations ==
> >>>>>> * SNU (Seoul National University)
> >>>>>>   * Byung-Gon Chun
> >>>>>>   * Jeongyoon Eo
> >>>>>>   * Geon-Woo Kim
> >>>>>>   * Gyewon Lee
> >>>>>>   * Sanha Lee
> >>>>>>   * Wooyeon Lee
> >>>>>>   * Yunseong Lee
> >>>>>>   * JangHo Seo
> >>>>>>   * Won Wook Song
> >>>>>>   * Taegeon Um
> >>>>>>   * Youngseok Yang
> >>>>>>
> >>>>>> * LG
> >>>>>>   * Jung-Gil Lee
> >>>>>>
> >>>>>> * Samsung
> >>>>>>   * Joo Yeon Kim
> >>>>>>
> >>>>>> * Viva Republica
> >>>>>>   * Geon-Woo Kim
> >>>>>>
> >>>>>> == Sponsors ==
> >>>>>> === Champions ===
> >>>>>> Byung-Gon Chun
> >>>>>>
> >>>>>> === Mentors ===
> >>>>>> * Hyunsik Choi
> >>>>>> * Byung-Gon Chun
> >>>>>> * Markus Weimer
> >>>>>> * Reynold Xin
> >>>>>>
> >>>>>> === Sponsoring Entity ===
> >>>>>> The Apache Incubator
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> Byung-Gon Chun
> >>>>>>
> >>>>>
> >>>>
> >>>>
> >>>>
> >>>> --
> >>>> Byung-Gon Chun
> >>>>
> >>>
> >>>
> >>>
> >>> --
> >>> Byung-Gon Chun
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
> >> For additional commands, e-mail: general-h...@incubator.apache.org
> >>
> >
> >
> >
>
>
>
>


-- 
Byung-Gon Chun
