Sounds good to me. Thanks for your reply Avery. - Henry
On Thu, Jul 21, 2011 at 4:39 PM, Avery Ching <ach...@yahoo-inc.com> wrote: > Henry, > > While we haven't begun too much work on a generic library, the intent is to > provide generic vertex input/output formats, aggregators, combiners, and > graph computations that make it very easy for a user to get started right > away. None of these need to be explicitly integrated with Hadoop or Hadoop > objects. That being said, we provide users the ability to use existing > Hadoop Writable implementations, such as IntWritable, FloatWritable, etc. to > make their lives easier rather than reimplementing those basic types. > Similarly, the methods of VertexInputFormat/VertexOutputFormat need not be > implemented using an underlying Hadoop InputFormat/OutputFormat, but they are > similar to make it easy to do so if desired. > > Hope that answers your question, > > Avery > > On Jul 21, 2011, at 4:09 PM, Henry Saputra wrote: > >> Will the library generic graph algorithm be tightly coupled with the >> Hadoop integration piece? >> >> - Henry >> >> On Fri, Jul 15, 2011 at 11:14 AM, Avery Ching <ach...@yahoo-inc.com> wrote: >>> Hi, >>> >>> I would like to propose Giraph as an Apache Incubator project. Giraph is a >>> large-scale graph processing infrastructure (inspired by Pregel) that runs >>> entirely on Hadoop. Giraph applications and MapReduce jobs coexist on >>> shared Hadoop instances and Giraph applications can be part of Oozie >>> workflows as a normal MapReduce job. >>> >>> Here is a link to the proposal in our GitHub wiki: >>> >>> https://github.com/aching/Giraph/wiki/Apache-Incubator-Proposal >>> >>> The proposal is also inlined below: >>> >>> Thanks! >>> >>> Avery >>> >>> >>> >>> = Giraph : Large-scale graph processing on Hadoop = >>> >>> == Abstract == >>> >>> Giraph is a large-scale, fault-tolerant, Bulk Synchronous Parallel >>> (BSP)-based graph processing framework. >>> >>> == Proposal == >>> >>> Graph processing platforms to run large-scale algorithms (such as page >>> rank, shared connections, personalization-based popularity, etc.) have >>> become quite popular. Some recent examples include Pregel and HaLoop. For >>> general-purpose big data computation, the MapReduce computation model is >>> widely adopted and the most deployed MapReduce infrastructure is Apache >>> Hadoop. We have implemented a graph-processing framework that is launched >>> as a typical Hadoop MapReduce job to leverage existing Hadoop >>> infrastructure, such as Amazon’s EC2. Giraph builds upon the >>> graph-oriented nature of Pregel but additionally adds fault-tolerance to >>> the coordinator process with the use of ZooKeeper as its centralized >>> coordination service. Additionally, Giraph will include a library of >>> generic graph algorithms. >>> >>> == Background == >>> >>> Giraph was initially began development as a side project at Yahoo! at the >>> end of 2010. It was made functional in a month and then started adding >>> various features. Development has been focused on internal customers needs >>> until this point. >>> >>> == Rationale == >>> >>> Web and online social graphs have been rapidly growing in size and scale >>> during the past decade. In 2008, Google estimated that the number of web >>> pages reached over a trillion. Online social networking and email sites, >>> including Yahoo!, Google, Microsoft, Facebook, LinkedIn, and Twitter, have >>> hundreds of millions of users and are expected to grow much more in the >>> future. Processing these graphs plays a big role in relevant and >>> personalized information for users, such as results from a search engine or >>> news in an online social networking site. >>> >>> == Initial Goals == >>> >>> At this point, most of the functionality has been implemented and we are >>> looking to get more adoption and contributions from users outside Yahoo!. >>> We want to ensure that performance scales and that the code is robust and >>> fault tolerant. >>> >>> == Current Status == >>> >>> === Meritocracy === >>> >>> Giraph was initially developed by Avery Ching and Christian Kunz beginning >>> in December 2010 at Yahoo!. There are other developers using Giraph at >>> Yahoo! that are making suggestions and adding code. We are reaching out to >>> other folks at social networking companies for additional usage and >>> development. >>> >>> === Community === >>> >>> Several groups who are interested in either joining our project or using >>> our code have contacted us. We certainly believe that there is a lot of >>> interest and are actively looking to improve and expand the community. >>> >>> === Core Developers === >>> >>> Avery Ching: Wrote a majority of the code >>> Christian Kunz: Wrote most of the communication code and security >>> integration with Hadoop >>> >>> === Alignment === >>> >>> Giraph uses several Apache projects as its underlying infrastructure >>> (Hadoop and ZooKeeper). It also builds on Apache Maven. >>> >>> == Known Risks == >>> >>> === Orphaned products === >>> >>> There are many social networking companies that would be interested in >>> using this graph-processing framework and we have already received interest >>> from some of them. Yahoo! is already using this code in production and >>> will certainly continue to use it in the future as well. >>> >>> === Inexperience with Open Source === >>> >>> While the initial developers have limited experience on contributing to >>> open-source projects, Yahoo! as a company has a strong commitment to >>> open-source and we have several advisors that we can ask for help. >>> >>> === Homogenous Developers === >>> >>> At this time, the project is relatively young and the developers work at >>> only two companies (Yahoo! and Jybe). However, given the interest we have >>> seen in the project, we expect the diversity to improve in the near future. >>> >>> === Reliance on Salaried Developers === >>> >>> Currently Giraph is being developed by a combination of salaried and >>> volunteer time. We expect that other corporations will take an interest in >>> this project and likely contribute with salaried developers. Some >>> individuals will likely spend volunteer time on it as well. It is still >>> early in their project and we are hoping for a lot of growth. >>> >>> === Relationships with Other Apache Products === >>> >>> Giraph depends on many Apache projects: Hadoop, ZooKeeper, Log4j, Commons, >>> etc. It is built using Apache Maven. >>> >>> Giraph has some overlapping functionality with Apache Hama. However, there >>> are some significant differences. Giraph focuses on graph-based bulk >>> synchronous parallel (BSP) computing, while Apache Hama is more for general >>> purposed BSP computing. Giraph runs on the Hadoop infrastructure, while >>> Apache Hama uses its own computing framework. >>> >>> === An Excessive Fascination with the Apache Brand === >>> >>> The Apache brand is likely to help us find contributors, however, our >>> interests in Apache are primarily because the other projects that we depend >>> on are also Apache projects and it makes sense that all this software be >>> available from the same place. >>> >>> === Documentation === >>> >>> Currently we have little documentation, but several examples. We are >>> working on improving this situation. >>> >>> === Initial Source === >>> >>> The initial source of the code is from Yahoo! and began development in >>> December 2010. It is already available on GitHub at >>> https://github.com/aching/Giraph. >>> >>> === Source and Intellectual Property Submission Plan === >>> >>> We intend the entire code base to be licensed under the Apache License, >>> Version 2.0. >>> >>> === External Dependencies === >>> >>> The required dependencies are all Apache compatible licenses. The >>> following components with non-Apache licenses are enumerated: >>> * JSON – Public Domain >>> >>> === Cryptography === >>> >>> Giraph depends on secure Hadoop that can optionally use Kerberos. >>> >>> == Required Resources == >>> >>> === Mailing lists === >>> >>> * giraph-private (with moderated subscriptions) >>> * giraph-dev >>> * giraph-commits >>> * giraph-users >>> >>> === Subversion Directory === >>> >>> https://svn.apache.org/repos/asf/incubator/giraph >>> >>> === Issue Tracking === >>> >>> JIRA Giraph (GIRAPH) >>> >>> === Other Resources === >>> >>> Giraph has integration tests that can be run with the LocalJobRunner. >>> These same tests also designed to be run on a small (even single node) >>> Hadoop cluster. While not required at this time, it would be nice if such >>> a resource were available. >>> >>> === Initial Committers === >>> >>> Avery Ching, aching at yahoo-inc dot com >>> Christian Kunz, christian at jybe-inc dot com >>> Owen O’Malley, owen at hortonworks dot com >>> >>> === Affiliations === >>> >>> Avery Ching, Yahoo! >>> Christian Kunz, Jybe >>> >>> == Sponsors == >>> >>> === Champion === >>> >>> Owen O’ Malley >>> >>> === Nominated Mentors === >>> >>> Owen O’Malley >>> >>> === Sponsoring Entity === >>> >>> Apache Incubator PMC >>> >> >> --------------------------------------------------------------------- >> To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org >> For additional commands, e-mail: general-h...@incubator.apache.org >> > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org > For additional commands, e-mail: general-h...@incubator.apache.org > > --------------------------------------------------------------------- To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org For additional commands, e-mail: general-h...@incubator.apache.org