I read the proposal of Giraph. That's very interesting! I have some experiences in graph processing with MapReduce. I think that the approach of Giraph is very promising since MapReduce already is regarded as the de-facto standards in processing large data and is appropriate to many graph algorithms. If there is a general graph package in MapReduce, it will be widely used in many areas. I would like to participate in Giraph project.
Best regards, Hyunsik Choi Avery Ching wrote: > > Hi, > > I would like to propose Giraph as an Apache Incubator project. Giraph is > a large-scale graph processing infrastructure (inspired by Pregel) that > runs entirely on Hadoop. Giraph applications and MapReduce jobs coexist > on shared Hadoop instances and Giraph applications can be part of Oozie > workflows as a normal MapReduce job. > > Here is a link to the proposal in our GitHub wiki: > > https://github.com/aching/Giraph/wiki/Apache-Incubator-Proposal > > The proposal is also inlined below: > > Thanks! > > Avery > > > > = Giraph : Large-scale graph processing on Hadoop = > > == Abstract == > > Giraph is a large-scale, fault-tolerant, Bulk Synchronous Parallel > (BSP)-based graph processing framework. > > == Proposal == > > Graph processing platforms to run large-scale algorithms (such as page > rank, shared connections, personalization-based popularity, etc.) have > become quite popular. Some recent examples include Pregel and HaLoop. > For general-purpose big data computation, the MapReduce computation model > is widely adopted and the most deployed MapReduce infrastructure is Apache > Hadoop. We have implemented a graph-processing framework that is launched > as a typical Hadoop MapReduce job to leverage existing Hadoop > infrastructure, such as Amazon’s EC2. Giraph builds upon the > graph-oriented nature of Pregel but additionally adds fault-tolerance to > the coordinator process with the use of ZooKeeper as its centralized > coordination service. Additionally, Giraph will include a library of > generic graph algorithms. > > == Background == > > Giraph was initially began development as a side project at Yahoo! at the > end of 2010. It was made functional in a month and then started adding > various features. Development has been focused on internal customers > needs until this point. > > == Rationale == > > Web and online social graphs have been rapidly growing in size and scale > during the past decade. In 2008, Google estimated that the number of web > pages reached over a trillion. Online social networking and email sites, > including Yahoo!, Google, Microsoft, Facebook, LinkedIn, and Twitter, have > hundreds of millions of users and are expected to grow much more in the > future. Processing these graphs plays a big role in relevant and > personalized information for users, such as results from a search engine > or news in an online social networking site. > > == Initial Goals == > > At this point, most of the functionality has been implemented and we are > looking to get more adoption and contributions from users outside Yahoo!. > We want to ensure that performance scales and that the code is robust and > fault tolerant. > > == Current Status == > > === Meritocracy === > > Giraph was initially developed by Avery Ching and Christian Kunz beginning > in December 2010 at Yahoo!. There are other developers using Giraph at > Yahoo! that are making suggestions and adding code. We are reaching out > to other folks at social networking companies for additional usage and > development. > > === Community === > > Several groups who are interested in either joining our project or using > our code have contacted us. We certainly believe that there is a lot of > interest and are actively looking to improve and expand the community. > > === Core Developers === > > Avery Ching: Wrote a majority of the code > Christian Kunz: Wrote most of the communication code and security > integration with Hadoop > > === Alignment === > > Giraph uses several Apache projects as its underlying infrastructure > (Hadoop and ZooKeeper). It also builds on Apache Maven. > > == Known Risks == > > === Orphaned products === > > There are many social networking companies that would be interested in > using this graph-processing framework and we have already received > interest from some of them. Yahoo! is already using this code in > production and will certainly continue to use it in the future as well. > > === Inexperience with Open Source === > > While the initial developers have limited experience on contributing to > open-source projects, Yahoo! as a company has a strong commitment to > open-source and we have several advisors that we can ask for help. > > === Homogenous Developers === > > At this time, the project is relatively young and the developers work at > only two companies (Yahoo! and Jybe). However, given the interest we have > seen in the project, we expect the diversity to improve in the near > future. > > === Reliance on Salaried Developers === > > Currently Giraph is being developed by a combination of salaried and > volunteer time. We expect that other corporations will take an interest > in this project and likely contribute with salaried developers. Some > individuals will likely spend volunteer time on it as well. It is still > early in their project and we are hoping for a lot of growth. > > === Relationships with Other Apache Products === > > Giraph depends on many Apache projects: Hadoop, ZooKeeper, Log4j, Commons, > etc. It is built using Apache Maven. > > Giraph has some overlapping functionality with Apache Hama. However, > there are some significant differences. Giraph focuses on graph-based > bulk synchronous parallel (BSP) computing, while Apache Hama is more for > general purposed BSP computing. Giraph runs on the Hadoop infrastructure, > while Apache Hama uses its own computing framework. > > === An Excessive Fascination with the Apache Brand === > > The Apache brand is likely to help us find contributors, however, our > interests in Apache are primarily because the other projects that we > depend on are also Apache projects and it makes sense that all this > software be available from the same place. > > === Documentation === > > Currently we have little documentation, but several examples. We are > working on improving this situation. > > === Initial Source === > > The initial source of the code is from Yahoo! and began development in > December 2010. It is already available on GitHub at > https://github.com/aching/Giraph. > > === Source and Intellectual Property Submission Plan === > > We intend the entire code base to be licensed under the Apache License, > Version 2.0. > > === External Dependencies === > > The required dependencies are all Apache compatible licenses. The > following components with non-Apache licenses are enumerated: > * JSON – Public Domain > > === Cryptography === > > Giraph depends on secure Hadoop that can optionally use Kerberos. > > == Required Resources == > > === Mailing lists === > > * giraph-private (with moderated subscriptions) > * giraph-dev > * giraph-commits > * giraph-users > > === Subversion Directory === > > https://svn.apache.org/repos/asf/incubator/giraph > > === Issue Tracking === > > JIRA Giraph (GIRAPH) > > === Other Resources === > > Giraph has integration tests that can be run with the LocalJobRunner. > These same tests also designed to be run on a small (even single node) > Hadoop cluster. While not required at this time, it would be nice if such > a resource were available. > > === Initial Committers === > > Avery Ching, aching at yahoo-inc dot com > Christian Kunz, christian at jybe-inc dot com > Owen O’Malley, owen at hortonworks dot com > > === Affiliations === > > Avery Ching, Yahoo! > Christian Kunz, Jybe > > == Sponsors == > > === Champion === > > Owen O’ Malley > > === Nominated Mentors === > > Owen O’Malley > > === Sponsoring Entity === > > Apache Incubator PMC > > -- View this message in context: http://old.nabble.com/-PROPOSAL--Proposing-Giraph-for-the-Apache-Incubator-tp32070326p32078057.html Sent from the Apache Incubator - General mailing list archive at Nabble.com. --------------------------------------------------------------------- To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org