Re: [VOTE] Accept S2Graph into Apache Incubation

Edward J. Yoon Tue, 24 Nov 2015 13:58:17 -0800

+1 (binding)

Good luck.


On Wed, Nov 25, 2015 at 4:33 AM, Jakob Homan <jgho...@gmail.com> wrote:
> +1 (binding)
>
> On 24 November 2015 at 09:55, Julien Le Dem <jul...@dremio.com> wrote:
>> +1 (binding)
>>
>> On Tue, Nov 24, 2015 at 9:48 AM, Stack <st...@duboce.net> wrote:
>>
>>> +1 (binding)
>>>
>>> On Mon, Nov 23, 2015 at 4:53 PM, Hyunsik Choi <hyun...@apache.org> wrote:
>>>
>>> > Hello folks,
>>> >
>>> > Thanks for all the feedback on the S2Graph Proposal.
>>> >
>>> > I would like to call for a [VOTE] on S2Graph joining the ASF as an
>>> > incubation project.
>>> >
>>> > The vote is open for at least 72 hours:
>>> >
>>> > [ ] +1 accept S2Graph in the Incubator
>>> > [ ] ±0
>>> > [ ] -1 (please give reason)
>>> >
>>> > S2Graph provides a scalable distributed graph database engine over a
>>> > key/value store such as HBase. S2Graph provides a fully asynchronous
>>> > API to manipulate data as a property graph model and fast
>>> > breadth-first-search queries over the graph. S2Graph is designed for
>>> > OLTP-like workloads on graph data sets instead of batch processing,
>>> > and it also provides INSERT/UPDATE operations on them.
>>> >
>>> > The proposal is available on the wiki here:
>>> > https://wiki.apache.org/incubator/S2GraphProposal
>>> >
>>> > Best regards,
>>> > Hyunsik
>>> >
>>> >
>>> > <COPY of the proposal wiki>
>>> >
>>> > ---------------------------------------------------------------------
>>> > = S2Graph Proposal =
>>> >
>>> > == Abstract ==
>>> > S2Graph is a distributed and scalable OLTP graph database built on
>>> > Apache HBase to support fast traversal of extremely large graphs.
>>> >
>>> > == Proposal ==
>>> > S2Graph provides a scalable distributed graph database engine over a
>>> > key/value store such as HBase. S2Graph provides a fully asynchronous
>>> > API to manipulate data as a property graph model and fast
>>> > breadth-first-search queries over the graph. S2Graph is designed for
>>> > OLTP-like workloads on graph data sets instead of batch processing.
>>> > Also, S2Graph provides INSERT/UPDATE operations. The name 'S2Graph' is
>>> > an abbreviation of '''S'''uper '''S'''imple '''Graph''' Database.
>>> >
>>> > Here are additional materials to introduce S2Graph.
>>> >  * HBaseCon 2015 -
>>> > http://www.slideshare.net/HBaseCon/use-cases-session-5
>>> >  * Apache: Big Data 2015 -
>>> > http://schd.ws/hosted_files/apachebigdata2015/06/s2graph_apache_con.pdf
>>> >
>>> > == Background ==
>>> > S2Graph initially started as an internal project at Kakao.com to
>>> > efficiently store user relations and user activities as one large
>>> > graph and to provide a unified query interface to traverse the graph.
>>> > It was open sourced on Github about 3 months ago, in June 2015.
>>> >
>>> > Over time, S2Graph, using HBase as the storage tier, has begun to be
>>> > adopted in various applications, such as messaging, social feeds,
>>> > and realtime recommendations at Kakao.
>>> >
>>> > Users can benefit by using S2Graph's generalized high-level graph
>>> > abstraction API instead of querying via low-level key/value APIs, just
>>> > as Apache Phoenix provides a SQL layer over HBase.
>>> >
>>> > == Rationale ==
>>> > Graph data (highly interconnected data) is very abundant and important
>>> > these days. When users have a multitude of relationships, each with
>>> > complex properties associated with them, a graph model is more
>>> > intuitive and efficient than tabular formats (RDBMS).
>>> >
>>> > There are many ASF projects that provide SQL tiers, but there are no
>>> > ASF projects that provide a scalable graph layer on top of the
>>> > existing Hadoop ecosystem. When graph data grows to the trillion-edge
>>> > scale, the process of traversing takes a long time and can be costly.
>>> > However, with the benefit of HBase's scalable architecture, S2Graph
>>> > can traverse large graphs in a breadth-first-search manner
>>> > efficiently.
>>> >
>>> > S2Graph also interoperates with several existing Apache projects
>>> > (HBase, Apache Spark) to provide means of merging real time events and
>>> > batch processed data using the property graph data model.
>>> >
>>> > Many developers run their own domain specific API servers to serve
>>> > their data products, but a graph model is general and the S2Graph API
>>> > fully supports traversal of the graph, so it can be used as a scalable
>>> > general purpose API serving layer for various domains. As long as data
>>> > can be modeled as a graph, users can avoid the tedious work of
>>> > developing customized API servers by using S2Graph.
>>> >
>>> > == Initial Goals ==
>>> > The initial goals will be to move the existing codebase to Apache and
>>> > integrate with the Apache development process. Once this is
>>> > accomplished, we plan for incremental development and releases that
>>> > follow the Apache guidelines.
>>> >
>>> > == Current Status ==
>>> >
>>> > === Meritocracy ===
>>> > S2Graph operated on meritocratic principles from the get go.
>>> > Currently, all the discussions pertaining to S2Graph development are
>>> > public on Github. The current incubation proposal includes the major
>>> > code contributors to S2Graph. Several additional people have worked on
>>> > the S2Graph codebase for industry use cases and would be interested in
>>> > becoming committers. We are starting with a small committer group and
>>> > we plan to add additional committers following an open merit-based
>>> > decision process during the incubation phase.
>>> >
>>> > === Community ===
>>> > We have already begun building a community but at this time the
>>> > community consists only of S2Graph developers – all Kakao employees –
>>> > and prospective users. S2Graph seeks to develop developer and user
>>> > communities during incubation.
>>> >
>>> > === Core Developers ===
>>> > S2Graph is currently being designed and developed by two engineers from
>>> > Kakao: Doyung Yoon and Daewon Jeong.
>>> >
>>> > === Alignment ===
>>> > Our proposed S2Graph effort aligns closely with Apache HBase. The
>>> > HBase project perimeter is denoted by a simple byte-array based
>>> > Create, Read, Update, Delete and Scan API with no current plans to
>>> > extend beyond these bounds.
>>> >
>>> > S2Graph complements this with a higher level API for a property graph
>>> > model.
>>> >
>>> > S2Graph was designed to offer a scalable distributed graph database
>>> > skin over HBase from the beginning in order to provide a property
>>> > graph model and breadth first search, and will continue to focus on
>>> > providing the graph model.
>>> >
>>> > == Known Risks ==
>>> > === Orphaned Products ===
>>> > The core developers of the S2Graph team plan to work full time on this
>>> > project. There is very little risk of S2Graph getting orphaned since
>>> > at least one large company (Kakao) is extensively using it in their
>>> > production HBase clusters. For example, currently there are 20+ use
>>> > cases with more than 1 trillion edges and 140 million breadth first
>>> > search query requests per minute using S2Graph in production. We plan
>>> > to extend and diversify this community further through Apache.
>>> >
>>> > === Inexperience with Open Source ===
>>> > The core developers are all active users and followers of open source.
>>> > They are already committers and contributors to the S2Graph Github
>>> > project. All have been involved with the source code that has been
>>> > released under an open source license. Though the core set of
>>> > developers do not have Apache open source experience, there are plans
>>> > to onboard individuals with Apache open source experience to the
>>> > project.
>>> >
>>> > === Homogenous Developers ===
>>> > Most committers in this proposal belong to the same institution
>>> > (Kakao). The engagement of these committers goes well beyond the
>>> > necessary development to support research, and all committers work on
>>> > S2Graph full time. Several people from other institutions are working
>>> > on and are familiar with the S2Graph codebase. We will work to attract
>>> > them as future committers during the incubation phase, following a
>>> > merit-based approach.
>>> >
>>> > === Reliance on Salaried Developers ===
>>> > Kakao invested in S2Graph as the distributed graph database solution
>>> > on top of HBase and some of its key engineers are working full time on
>>> > the project. We look forward to other Apache developers and
>>> > researchers contributing to the project. Also key to addressing the
>>> > risk associated with relying on salaried developers from a single
>>> > entity is to increase the diversity of the contributors and actively
>>> > lobby for domain experts in the graph database space to contribute.
>>> > Apache S2Graph intends to do this.
>>> >
>>> > === Relationships with Other Apache Products ===
>>> > S2Graph has a strong relationship with, and dependency on, Apache HBase
>>> > and Apache Spark. Being part of Apache's Incubation community could help
>>> > foster closer collaboration with these two projects as well as
>>> > others.
>>> >
>>> > In terms of graph processing frameworks, S2Graph and Apache Giraph
>>> > look similar. However, their goals are clearly different from each
>>> > other. Giraph aims at analytical batch processing on immutable graph
>>> > data sets. In contrast, S2Graph is designed for OLTP-like workloads on
>>> > graph data sets, and S2Graph provides INSERT/UPDATE operations too.
>>> >
>>> >
>>> > === An Excessive Fascination with the Apache Brand ===
>>> > S2Graph is proposing to enter incubation at Apache in order to help
>>> > efforts to diversify the committer-base, not so much to capitalize on
>>> > the Apache brand. The S2Graph project is in production use already
>>> > inside Kakao, but is not expected to be a Kakao product for external
>>> > customers. As such, the S2Graph project is not seeking to use the
>>> > Apache brand as a marketing tool.
>>> >
>>> > == Documentation ==
>>> > Information about S2Graph can be found at
>>> > https://github.com/kakao/s2graph. The following links provide more
>>> > information about S2Graph in open source:
>>> >  * S2Graph web site: https://steamshon.gitbooks.io/s2graph-book/content/
>>> >  * Codebase at Github: https://github.com/kakao/s2graph
>>> >  * Issue Tracking: https://github.com/kakao/s2graph/issues
>>> >  * User community: https://groups.google.com/forum/#!forum/s2graph
>>> >
>>> > == Initial Source ==
>>> >
>>> > The S2Graph codebase is currently hosted on Github:
>>> > https://github.com/kakao/s2graph.
>>> >
>>> > === Source and Intellectual Property Submission Plan ===
>>> >
>>> > Currently, the S2Graph codebase is distributed under the Apache 2.0
>>> > License.
>>> >
>>> > == External Dependencies ==
>>> >
>>> > Beyond relying on Apache HBase, S2Graph has the following external
>>> > dependencies:
>>> >  * Asynchbase (BSD)
>>> >  * Play Framework (Apache 2.0 license)
>>> >  * Scala (http://www.scala-lang.org/license.html)
>>> >  * Spark (Apache 2.0 license)
>>> >  * Kafka (Apache 2.0 license)
>>> >
>>> > == Required Resources ==
>>> >
>>> > === Mailing list ===
>>> >
>>> > We will migrate our mailing lists to the following:
>>> >  * us...@s2graph.incubator.apache.org
>>> >  * d...@s2graph.incubator.apache.org
>>> >  * priv...@s2graph.incubator.apache.org
>>> >  * comm...@s2graph.incubator.apache.org
>>> >
>>> > === Source control ===
>>> >
>>> > The S2Graph team would like to use Git for source code control, due to
>>> > our current use of Git. We request a writeable Git repo for S2Graph,
>>> > and mirroring to be set up to Github through INFRA.
>>> >
>>> > === Issue Tracking ===
>>> >
>>> > S2Graph currently uses the Github issue tracking system associated
>>> > with its Github repo (https://github.com/kakao/s2graph/issues). We
>>> > will migrate to the Apache JIRA
>>> > (http://issues.apache.org/jira/browse/S2Graph).
>>> >
>>> > === Other Resources ===
>>> >
>>> >  * Jenkins/Hudson for builds and test running.
>>> >  * Wiki for documentation purposes.
>>> >  * Blog to improve project dissemination.
>>> >
>>> > == Initial Committers ==
>>> >
>>> >  * Doyung Yoon <shom83 at gmail dot com>
>>> >  * Daewon Jeong <blueiur at gmail dot com>
>>> >  * Jaesang Kim <honeysleep at gmail dot com>
>>> >  * Hwansung Yu <deejayfwan at gmail dot com>
>>> >  * Min-Seok Kim <mskim.org at gmail dot com>
>>> >  * Chul Kang <miralchul at gmail dot com>
>>> >  * Luke Han <lukehan at apache dot org>
>>> >  * Alexander Bezzubov <bzz at apache dot org>
>>> >
>>> > == Affiliations ==
>>> >
>>> >  * Doyung Yoon, Kakao
>>> >  * Daewon Jeong, Kakao
>>> >  * Jaesang Kim, Kakao
>>> >  * Hwansung Yu, Kakao
>>> >  * Min-Seok Kim, Kakao
>>> >  * Chul Kang, Kakao
>>> >  * Luke Han, Ebay Inc.
>>> >  * Alexander Bezzubov, NFLabs
>>> >
>>> > == Sponsors ==
>>> >
>>> > === Champion ===
>>> > Hyunsik Choi
>>> >
>>> > === Nominated Mentors ===
>>> >  * Andrew Purtell - Apache Member, Salesforce
>>> >  * Sergio Fernández - Apache Member, Redlink
>>> >  * Hyunsik Choi - Apache Member, Gruter Inc.
>>> >  * Seetharam Venkatesh - IPMC, Hortonworks Inc.
>>> >
>>> > === Sponsoring Entity ===
>>> >
>>> >  * The Apache Incubator
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
>>> > For additional commands, e-mail: general-h...@incubator.apache.org
>>> >
>>> >
>>>
>>
>>
>>
>> --
>> Julien
>
>



-- 
Best Regards, Edward J. Yoon

