Re: [PROPOSAL] Onyx - proposal for Apache Incubation

Leif Hedstrom Sat, 27 Jan 2018 11:21:52 -0800

Did we rule out Onyx for sure? Just because some other project might use it on, 
say, GitHub doesn’t necessarily exclude us from having an Apache Onyx, does it?


FWIW, I agree that Surf is too similar in pronunciation to Apache Serf. :)

Cheers,

— Leif 

> On Jan 27, 2018, at 07:31, Dave Fisher <dave2w...@comcast.net> wrote:
> 
> Checking “Serf Software”, which sounds the same:
> 
> (1) There is already Apache Serf.
> (2) Serf is a product from HashiCorp at https://www.serf.io/. This would 
> definitely cause confusion, as it is apparently comparable to ZooKeeper.
> 
> Regards,
> Dave
> 
>> On Jan 27, 2018, at 3:12 AM, sebb <seb...@gmail.com> wrote:
>> 
>> A brief search for 'Surf Software' shows quite a few hits.
>> I have not looked to see if they would be likely to be confused with
>> this project or cause problems for others.
>> 
>> But it looks as though there might be a problem:
>> Surfer - Golden Software
>> Surf @ SourceForge
>> Surf Software company
>> 
>> 
>>> On 27 January 2018 at 08:03, Byung-Gon Chun <bgc...@gmail.com> wrote:
>>> Since we cannot use the name Onyx, we would like to change the project name
>>> to Surf.
>>> I hope that this name works.
>>> 
>>> -Gon
>>> 
>>> ---
>>> Byung-Gon Chun
>>> 
>>> 
>>>> On Sat, Jan 27, 2018 at 4:57 AM, Byung-Gon Chun <bgc...@gmail.com> wrote:
>>>> 
>>>> 
>>>> 
>>>>> On Sat, Jan 27, 2018 at 4:09 AM, Davor Bonaci <da...@apache.org> wrote:
>>>>> 
>>>>> Great work -- I think this technology has a lot of promise, and I'd love
>>>>> to see its evolution inside the Foundation.
>>>>> 
>>>>> 
>>>> Thanks, Davor!
>>>> 
>>>> 
>>>>> Parts of it, like the Onyx Intermediate Representation [1], overlap with
>>>>> the work-in-progress inside the Apache Beam project ("portability"). We'd
>>>>> love to work together on this -- would you be open to such collaboration?
>>>>> If so, it may not be necessary to start from scratch; you could leverage
>>>>> the work already done.
>>>>> 
>>>>> 
>>>> Sure. We're open to collaboration.
>>>> 
>>>> 
>>>>> Regarding the name, Onyx would likely have to be renamed, due to a
>>>>> conflict with a related technology [2].
>>>>> 
>>>>> 
>>>> Thanks for pointing it out. It's difficult to come up with a good short
>>>> name. :)
>>>> Do you have any suggestions?
>>>> 
>>>> Thanks!
>>>> -Gon
>>>> 
>>>> ---
>>>> Byung-Gon Chun
>>>> 
>>>> 
>>>> 
>>>>> Davor
>>>>> 
>>>>> [1] https://snuspl.github.io/onyx/docs/ir/
>>>>> [2] http://www.onyxplatform.org/
>>>>> 
>>>>>> On Thu, Jan 25, 2018 at 3:28 PM, Byung-Gon Chun <bgc...@gmail.com> wrote:
>>>>>> 
>>>>>> Dear Apache Incubator Community,
>>>>>> 
>>>>>> Please accept the following proposal for presentation and discussion:
>>>>>> https://wiki.apache.org/incubator/OnyxProposal
>>>>>> 
>>>>>> Onyx is a data processing system that aims to flexibly control the
>>>>>> runtime behaviors of a job to adapt to varying deployment
>>>>>> characteristics (e.g., harnessing transient resources in datacenters,
>>>>>> cross-datacenter deployment, changing runtime based on job
>>>>>> characteristics, etc.). Onyx provides ways to extend the system’s
>>>>>> capabilities and incorporate the extensions into flexible job
>>>>>> execution.
>>>>>> Onyx translates a user program (e.g., Apache Beam, Apache Spark) into an
>>>>>> Intermediate Representation (IR) DAG, which Onyx optimizes and deploys
>>>>>> based on a deployment policy.
>>>>>> 
>>>>>> I've attached the proposal below.
>>>>>> 
>>>>>> Best regards,
>>>>>> Byung-Gon Chun
>>>>>> 
>>>>>> = OnyxProposal =
>>>>>> 
>>>>>> == Abstract ==
>>>>>> Onyx is a data processing system that can be flexibly employed in
>>>>>> different execution scenarios for various deployment characteristics
>>>>>> on clusters.
>>>>>> 
>>>>>> == Proposal ==
>>>>>> Today, there is a wide variety of data processing systems with
>>>>>> different designs for better performance and datacenter efficiency.
>>>>>> These designs include processing data in specific resource
>>>>>> environments and running jobs with specific attributes. Although
>>>>>> each system successfully solves the problems it targets, most systems
>>>>>> are designed so that runtime behaviors are built tightly into the
>>>>>> system core to hide the complexity of distributed computing. This
>>>>>> makes it hard for a single system to support different deployment
>>>>>> characteristics with different runtime behaviors without substantial
>>>>>> effort.
>>>>>> 
>>>>>> Onyx is a data processing system that aims to flexibly control the
>>>>>> runtime behaviors of a job to adapt to varying deployment
>>>>>> characteristics. Moreover, it provides a means of extending the
>>>>>> system’s capabilities and incorporating the extensions into flexible
>>>>>> job execution.
>>>>>> 
>>>>>> To make runtime behaviors easy to modify for varying deployment
>>>>>> characteristics, Onyx exposes them through a set of high-level graph
>>>>>> pass interfaces, so that they can be flexibly configured and modified
>>>>>> at both compile time and runtime.
>>>>>> 
>>>>>> We hope to contribute to the big data processing community by enabling
>>>>>> more flexibility and extensibility in job executions. Furthermore, we
>>>>>> can all benefit when we work together as a community to mature the
>>>>>> system with more use cases and a better understanding of diverse
>>>>>> deployment characteristics. The Apache Software Foundation is the
>>>>>> perfect place to achieve these aspirations.
>>>>>> 
>>>>>> == Background ==
>>>>>> Many data processing systems have distinctive runtime behaviors
>>>>>> optimized and configured for specific deployment characteristics, such
>>>>>> as particular resource environments, or for handling special job
>>>>>> attributes.
>>>>>> 
>>>>>> For example, much research has been conducted to overcome the
>>>>>> challenge of running data processing jobs on cheap, unreliable
>>>>>> transient resources. Likewise, techniques for disaggregating different
>>>>>> types of resources, like memory, CPU and GPU, are being actively
>>>>>> developed to use datacenter resources more efficiently. Many
>>>>>> researchers are also working to run data processing jobs in even more
>>>>>> diverse environments, such as across distant datacenters. Similarly,
>>>>>> for special job attributes, many works take different approaches, such
>>>>>> as runtime optimization, to solve problems like data skew, and to
>>>>>> optimize systems for data processing jobs with small-scale input data.
>>>>>> 
>>>>>> Although each of these systems performs well with the jobs and in the
>>>>>> environments it targets, it performs poorly in cases it was not
>>>>>> designed for, and its design does not consider supporting multiple
>>>>>> deployment characteristics on a single system.
>>>>>> 
>>>>>> For an application writer, optimizing an application to perform well
>>>>>> on a system with hard-wired underlying behaviors requires a deep
>>>>>> understanding of the system itself, an overhead that often costs a
>>>>>> lot of time and effort. Moreover, for a developer to modify such
>>>>>> system behaviors, the system core must be modified, which requires an
>>>>>> even deeper understanding of the system.
>>>>>> 
>>>>>> With this background, Onyx is designed to represent all of its jobs as
>>>>>> an Intermediate Representation (IR) DAG. In the Onyx compiler, user
>>>>>> applications from various programming models (e.g., Apache Beam) are
>>>>>> submitted, transformed into an IR DAG, and optimized/customized for the
>>>>>> deployment characteristics. In the IR DAG optimization phase, the DAG
>>>>>> is modified through a series of compiler “passes” which reshape or
>>>>>> annotate the DAG with an expression of the underlying runtime
>>>>>> behaviors. The IR DAG is then submitted as an execution plan to the
>>>>>> Onyx runtime. The runtime keeps the unmodified parts of data
>>>>>> processing in its backbone, which is transparently integrated with
>>>>>> configurable components exposed for further extension.
>>>>>> 
>>>>>> == Rationale ==
>>>>>> Onyx’s vision lies in providing means for flexibly supporting a wide
>>>>>> variety of job execution scenarios for users while enabling system
>>>>>> developers to extend the execution framework with various
>>>>>> functionalities at the same time. The capabilities of the system can
>>>>>> be extended as it grows to meet a wider variety of execution scenarios.
>>>>>> We require input from users and developers from diverse domains in
>>>>>> order to make it a more thriving and useful project. The Apache
>>>>>> Software Foundation provides the best tools and community to support
>>>>>> this vision.
>>>>>> 
>>>>>> == Initial Goals ==
>>>>>> Initial goals will be to move the existing codebase to Apache and
>>>>>> integrate with the Apache development process. We further plan to
>>>>>> develop our system to meet the needs of more execution scenarios for
>>>>>> a wider variety of deployment characteristics.
>>>>>> 
>>>>>> == Current Status ==
>>>>>> The Onyx codebase is currently hosted in a repository at github.com. The
>>>>>> current version has been developed by system developers at Seoul
>>>>>> National University, Viva Republica, Samsung, and LG.
>>>>>> 
>>>>>> == Meritocracy ==
>>>>>> We plan to strongly support meritocracy. We will discuss the
>>>>>> requirements in an open forum, and those that continuously contribute
>>>>>> to Onyx with the passion to strengthen the system will be invited as
>>>>>> committers. Contributors who enrich Onyx by providing various use
>>>>>> cases or implementations of the configurable components, including
>>>>>> ideas for optimization techniques, will be especially welcome.
>>>>>> Committers with a deep understanding of the system’s technical
>>>>>> aspects as a whole and its philosophy will be voted onto the PMC. We
>>>>>> will monitor community participation so that privileges can be
>>>>>> extended to those who contribute.
>>>>>> 
>>>>>> == Community ==
>>>>>> We hope to expand our contributor community by becoming an Apache
>>>>>> Incubator project. The contributions will come from both users and
>>>>>> system developers interested in the flexibility and extensibility of
>>>>>> job executions that Onyx can support. We expect users to mainly
>>>>>> contribute by diversifying the use cases and deployment
>>>>>> characteristics, and developers to contribute by implementing them.
>>>>>> 
>>>>>> == Alignment ==
>>>>>> Apache Spark is one of many popular data processing frameworks. It is
>>>>>> designed to optimize jobs using in-memory RDDs, along with many other
>>>>>> optimizations built tightly into the framework. In contrast to Spark,
>>>>>> Onyx aims to provide more flexibility in job execution in an
>>>>>> easy-to-use manner.
>>>>>> 
>>>>>> Apache Tez enables developers to build complex task DAGs with control
>>>>>> over the control plane of job execution. In Onyx, a program written
>>>>>> in a high-level programming layer (e.g., Apache Beam) is automatically
>>>>>> converted to a basic IR DAG, which can then be converted into any IR
>>>>>> DAG through a series of easy-to-write user passes that can both
>>>>>> reshape the DAG and modify its annotations (of execution properties).
>>>>>> Moreover, Onyx leaves more parts of the job execution configurable,
>>>>>> such as the scheduler and the data plane. Rather than offering only a
>>>>>> fixed set of properties for optimization, Onyx lets its configurable
>>>>>> parts be easily extended and explored by implementing the predefined
>>>>>> interfaces. For example, an arbitrary intermediate data store can be
>>>>>> added.
>>>>>> 
>>>>>> Onyx currently supports Apache Beam programs, and we are working on
>>>>>> supporting Apache Spark programs as well. Onyx also utilizes Apache
>>>>>> REEF for container management, which allows Onyx to run in Apache
>>>>>> Hadoop YARN and Apache Mesos clusters. If necessary, we plan to
>>>>>> contribute to and collaborate with these other Apache projects for the
>>>>>> benefit of all. We plan to extend such integrations to more Apache
>>>>>> software. The Apache Software Foundation already hosts many major
>>>>>> big-data systems, and we expect to help further growth of the big-data
>>>>>> community by having Onyx within the Foundation.
>>>>>> 
>>>>>> == Known Risks ==
>>>>>> === Orphaned Products ===
>>>>>> The risk of the Onyx project being orphaned is minimal. There is
>>>>>> already plenty of work that painstakingly supports different
>>>>>> deployment characteristics, and we propose a general way to implement
>>>>>> them with flexible and extensible configuration knobs. The domain of
>>>>>> data processing is already of high interest, and this domain is
>>>>>> expected to continue evolving to serve various other purposes, such as
>>>>>> resource disaggregation and using transient resources for better
>>>>>> datacenter resource utilization.
>>>>>> 
>>>>>> === Inexperience with Open Source ===
>>>>>> The initial committers include PMC members and committers of other
>>>>>> Apache projects. They have experience with open source projects,
>>>>>> from incubation through to top-level status. They have been
>>>>>> involved in the open source development process, and are familiar with
>>>>>> releasing code under an open source license.
>>>>>> 
>>>>>> === Homogeneous Developers ===
>>>>>> The initial set of committers is from a limited set of organizations,
>>>>>> but we expect to attract new contributors from diverse organizations
>>>>>> and thus expect to grow organically once approved for incubation. Our
>>>>>> prior experience with other open source projects will help various
>>>>>> contributors actively participate in our project.
>>>>>> 
>>>>>> === Reliance on Salaried Developers ===
>>>>>> Many developers are from Seoul National University. This is not
>>>>>> applicable.
>>>>>> 
>>>>>> === Relationships with Other Apache Products ===
>>>>>> Onyx positions itself among multiple Apache products. It runs on
>>>>>> Apache REEF for container management. It also utilizes many useful
>>>>>> development tools including Apache Maven, Apache Log4j, and multiple
>>>>>> Apache Commons components. Onyx supports the Apache Beam programming
>>>>>> model for user applications. We are currently working on supporting
>>>>>> the Apache Spark programming APIs as well.
>>>>>> 
>>>>>> === An Excessive Fascination with the Apache Brand ===
>>>>>> We hope to make Onyx a powerful system for data processing, meeting
>>>>>> various needs for different deployment characteristics in a wider
>>>>>> variety of environments. We see the limitations of simply putting code
>>>>>> on GitHub, and we believe the Apache community will help Onyx grow
>>>>>> into impactful and innovative open source software. We believe Onyx
>>>>>> is a great fit for the Apache Software Foundation due to the
>>>>>> collaboration it aims to foster within the big data processing
>>>>>> community.
>>>>>> 
>>>>>> == Documentation ==
>>>>>> The current documentation for Onyx is at https://snuspl.github.io/onyx/.
>>>>>> 
>>>>>> == Initial Source ==
>>>>>> The Onyx codebase is currently hosted at https://github.com/snuspl/onyx.
>>>>>> 
>>>>>> == External Dependencies ==
>>>>>> To the best of our knowledge, all Onyx dependencies are distributed
>>>>>> under Apache compatible licenses. Upon acceptance to the incubator, we
>>>>>> would begin a thorough analysis of all transitive dependencies to
>>>>>> verify this fact and further introduce license checking into the build
>>>>>> and release process.
>>>>>> 
>>>>>> == Cryptography ==
>>>>>> Not applicable.
>>>>>> 
>>>>>> == Required Resources ==
>>>>>> === Mailing Lists ===
>>>>>> We will operate two mailing lists as follows:
>>>>>>  * Onyx PMC discussions: priv...@onyx.incubator.apache.org
>>>>>>  * Onyx developers: d...@onyx.incubator.apache.org
>>>>>> 
>>>>>> === Git Repositories ===
>>>>>> Upon incubation: https://github.com/apache/incubator-onyx.
>>>>>> Upon entering incubation, we would like to move the existing repo
>>>>>> https://github.com/snuspl/onyx to the Apache infrastructure.
>>>>>> 
>>>>>> === Issue Tracking ===
>>>>>> Onyx currently tracks its issues using the GitHub issue tracker:
>>>>>> https://github.com/snuspl/onyx/issues. We plan to migrate to Apache
>>>>>> JIRA.
>>>>>> 
>>>>>> == Initial Committers ==
>>>>>> * Byung-Gon Chun
>>>>>> * Jeongyoon Eo
>>>>>> * Geon-Woo Kim
>>>>>> * Joo Yeon Kim
>>>>>> * Gyewon Lee
>>>>>> * Jung-Gil Lee
>>>>>> * Sanha Lee
>>>>>> * Wooyeon Lee
>>>>>> * Yunseong Lee
>>>>>> * JangHo Seo
>>>>>> * Won Wook Song
>>>>>> * Taegeon Um
>>>>>> * Youngseok Yang
>>>>>> 
>>>>>> == Affiliations ==
>>>>>> * SNU (Seoul National University)
>>>>>>   * Byung-Gon Chun
>>>>>>   * Jeongyoon Eo
>>>>>>   * Geon-Woo Kim
>>>>>>   * Gyewon Lee
>>>>>>   * Sanha Lee
>>>>>>   * Wooyeon Lee
>>>>>>   * Yunseong Lee
>>>>>>   * JangHo Seo
>>>>>>   * Won Wook Song
>>>>>>   * Taegeon Um
>>>>>>   * Youngseok Yang
>>>>>> 
>>>>>> * LG
>>>>>>   * Jung-Gil Lee
>>>>>> 
>>>>>> * Samsung
>>>>>>   * Joo Yeon Kim
>>>>>> 
>>>>>> * Viva Republica
>>>>>>   * Geon-Woo Kim
>>>>>> 
>>>>>> == Sponsors ==
>>>>>> === Champions ===
>>>>>> Byung-Gon Chun
>>>>>> 
>>>>>> === Mentors ===
>>>>>> * Hyunsik Choi
>>>>>> * Byung-Gon Chun
>>>>>> * Markus Weimer
>>>>>> * Reynold Xin
>>>>>> 
>>>>>> === Sponsoring Entity ===
>>>>>> The Apache Incubator
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>> For additional commands, e-mail: general-h...@incubator.apache.org
>> 
> 