+1
On Jul 24, 2011, at 10:38 PM, Bill Graham <billgra...@gmail.com> wrote:
+1
On Sun, Jul 24, 2011 at 8:27 AM, Mohammad Nour El-Din <nour.moham...@gmail.com> wrote:
+1 (Binding)
On Sat, Jul 23, 2011 at 4:57 AM, Ashish <paliwalash...@gmail.com> wrote:
+1
On Sat, Jul 23, 2011 at 2:00 AM, Avery Ching <ach...@yahoo-inc.com> wrote:
Hi and good Friday to you all,
It's been a week since we submitted our proposal for Giraph's inclusion
into the Apache Incubator and the discussion around the proposal seems
to have settled. Thank you for all the comments/questions/general
interest, and thanks to those who volunteered to be committers. At this
time, I'd like to ask for a vote.
The latest proposal can be found at the end of this email and in the
following wiki:
http://wiki.apache.org/incubator/GiraphProposal
The discussion regarding the proposal can be found below:
http://www.mail-archive.com/general@incubator.apache.org/msg29957.html
Please cast your votes:
[ ] +1 Accept Giraph for incubation
[ ] +0 Indifferent to Giraph incubation
[ ] -1 Reject Giraph for incubation
This vote will close 72 hours from now.
Thanks!
Avery
= Giraph : Large-scale graph processing on Hadoop =
== Abstract ==
Giraph is a large-scale, fault-tolerant, Bulk Synchronous Parallel
(BSP)-based graph processing framework.
== Proposal ==
Graph processing platforms that run large-scale algorithms (such as
PageRank, shared connections, personalization-based popularity, etc.)
have become quite popular. Some recent examples include Pregel and
HaLoop. For general-purpose big data computation, the MapReduce
computation model is widely adopted and the most deployed MapReduce
infrastructure is Apache Hadoop. We have implemented a graph-processing
framework that is launched as a typical Hadoop MapReduce job to leverage
existing Hadoop infrastructure, such as Amazon’s EC2. Giraph builds upon
the graph-oriented nature of Pregel but adds fault-tolerance to the
coordinator process with the use of ZooKeeper as its centralized
coordination service. Additionally, Giraph will include a library of
generic graph algorithms.
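For readers unfamiliar with the Pregel-style BSP model mentioned above,
here is a minimal sketch in plain Python. It is not Giraph's API: the
function name `run_bsp_max` and the dict-based graph layout are
assumptions made purely for illustration. It propagates the largest
vertex value through a graph, with messages sent in one superstep only
becoming visible in the next.

```python
from collections import defaultdict


def run_bsp_max(graph, values, max_supersteps=50):
    """Propagate the maximum vertex value using BSP-style supersteps.

    graph:  dict of vertex id -> list of out-neighbor ids (hypothetical layout)
    values: dict of vertex id -> initial numeric value
    Returns the final values and the number of supersteps executed.
    """
    values = dict(values)  # work on a copy

    # Superstep 0: every vertex sends its value to its out-neighbors.
    inbox = defaultdict(list)
    for vertex, neighbors in graph.items():
        for neighbor in neighbors:
            inbox[neighbor].append(values[vertex])
    supersteps = 1

    # Later supersteps: only vertices that received messages are active.
    while inbox and supersteps < max_supersteps:
        outbox = defaultdict(list)
        for vertex, messages in inbox.items():
            best = max(messages)
            if best > values[vertex]:
                values[vertex] = best
                for neighbor in graph.get(vertex, []):
                    outbox[neighbor].append(best)
            # Otherwise the vertex sends nothing, i.e. it votes to halt.
        # Barrier: messages sent now are delivered in the next superstep.
        inbox = outbox
        supersteps += 1

    return values, supersteps


if __name__ == "__main__":
    graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
    final, steps = run_bsp_max(graph, {"a": 3, "b": 6, "c": 2})
    print(final, steps)
```

The computation stops once no vertex sends a message, which is the
usual termination condition in this model. A real framework would also
distribute vertices across workers and checkpoint state for fault
tolerance, which this sketch leaves out.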
== Background ==
Giraph began development as a side project at Yahoo! at the end of
2010. It was made functional in a month, and various features have been
added since then. Development has been focused on internal customers'
needs until this point.
== Rationale ==
Web and online social graphs have been rapidly growing in size and scale
during the past decade. In 2008, Google estimated that the number of web
pages had reached over a trillion. Online social networking and email
sites, including Yahoo!, Google, Microsoft, Facebook, LinkedIn, and
Twitter, have hundreds of millions of users and are expected to grow
much more in the future. Processing these graphs plays a big role in
providing relevant and personalized information to users, such as
results from a search engine or news in an online social networking
site.
== Initial Goals ==
At this point, most of the functionality has been implemented and we are
looking to get more adoption and contributions from users outside
Yahoo!. We want to ensure that performance scales and that the code is
robust and fault tolerant.
== Current Status ==
=== Meritocracy ===
Giraph was initially developed by Avery Ching and Christian Kunz
beginning in December 2010 at Yahoo!. There are other developers using
Giraph at Yahoo! who are making suggestions and adding code. We are
reaching out to other folks at social networking companies for
additional usage and development.
=== Community ===
Several groups who are interested in either joining our project or using
our code have contacted us. We certainly believe that there is a lot of
interest and are actively looking to improve and expand the community.
=== Core Developers ===
* Avery Ching: Wrote a majority of the code
* Christian Kunz: Wrote most of the communication code and security integration with Hadoop
=== Alignment ===
Giraph uses several Apache projects as its underlying infrastructure
(Hadoop and ZooKeeper). It is also built with Apache Maven.
== Known Risks ==
=== Orphaned products ===
There are many social networking companies that would be interested in
using this graph-processing framework and we have already received
interest from some of them. Yahoo! is already using this code in
production and will certainly continue to use it in the future as well.
=== Inexperience with Open Source ===
While the initial developers have limited experience contributing to
open-source projects, Yahoo! as a company has a strong commitment to
open source and we have several advisors whom we can ask for help.
=== Homogenous Developers ===
At this time, the project is relatively young and the developers work at
only two companies (Yahoo! and Jybe). However, given the interest we
have seen in the project, we expect the diversity to improve in the near
future.
=== Reliance on Salaried Developers ===
Currently Giraph is being developed using a combination of salaried and
volunteer time. We expect that other corporations will take an interest
in this project and likely contribute with salaried developers. Some
individuals will likely spend volunteer time on it as well. It is still
early in the project and we are hoping for a lot of growth.
=== Relationships with Other Apache Products ===
Giraph depends on many Apache projects: Hadoop, ZooKeeper, Log4j,
Commons, etc. It is built using Apache Maven.
Giraph has some overlapping functionality with Apache Hama. However,
there are some significant differences. Giraph focuses on graph-based
bulk synchronous parallel (BSP) computing, while Apache Hama is aimed
more at general-purpose BSP computing. Giraph runs on the Hadoop
infrastructure, while Apache Hama uses its own computing framework.
=== An Excessive Fascination with the Apache Brand ===
The Apache brand is likely to help us find contributors; however, our
interest in Apache is primarily because the other projects that we
depend on are also Apache projects, and it makes sense for all of this
software to be available from the same place.
=== Documentation ===
Currently we have little documentation, but several examples. We are
working on improving this situation.
=== Initial Source ===
The initial code comes from Yahoo!, where development began in December
2010. It is already available on GitHub at
https://github.com/aching/Giraph.
=== Source and Intellectual Property Submission Plan ===
We intend the entire code base to be licensed under the Apache License,
Version 2.0.
=== External Dependencies ===
All required dependencies have Apache-compatible licenses. The
components with non-Apache licenses are:
* JSON – Public Domain
=== Cryptography ===
Giraph depends on secure Hadoop that can optionally use Kerberos.
== Required Resources ==
=== Mailing lists ===
* giraph-private (with moderated subscriptions)
* giraph-dev
* giraph-commits
* giraph-users
=== Subversion Directory ===
https://svn.apache.org/repos/asf/incubator/giraph
=== Issue Tracking ===
JIRA Giraph (GIRAPH)
=== Other Resources ===
Giraph has integration tests that can be run with the LocalJobRunner.
These same tests are also designed to be run on a small (even single
node) Hadoop cluster. While not required at this time, it would be nice
if such a resource were available.
=== Initial Committers ===
* Avery Ching, aching at yahoo-inc dot com
* Christian Kunz, christian at jybe-inc dot com
* Owen O’Malley, owen at hortonworks dot com
* Phillip Rhodes, prhodes at apache dot org
* Hyunsik Choi, hyunsik at apache dot org
* Jakob Homan, jghoman at apache dot org
* Arun Suresh, asuresh at yahoo-inc dot com
=== Affiliations ===
* Avery Ching, Yahoo!
* Christian Kunz, Jybe
* Owen O'Malley, Hortonworks
* Phillip Rhodes, Fogbeam Labs
* Hyunsik Choi, Database Lab, Korea University
* Jakob Homan, LinkedIn
* Arun Suresh, Yahoo!
== Sponsors ==
=== Champion ===
Owen O’Malley
=== Nominated Mentors ===
Owen O’Malley
=== Sponsoring Entity ===
Apache Incubator PMC
--
thanks
ashish
Blog: http://www.ashishpaliwal.com/blog
My Photo Galleries: http://www.pbase.com/ashishpaliwal
--
Thanks
- Mohammad Nour
Author of (WebSphere Application Server Community Edition 2.0 User
Guide)
http://www.redbooks.ibm.com/abstracts/sg247585.html
- LinkedIn: http://www.linkedin.com/in/mnour
- Blog: http://tadabborat.blogspot.com
---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org