Jon, I'm available as a mentor if you're still looking.

Patrick

On Mon, Jun 6, 2011 at 9:59 AM, Jonathan Hsieh <j...@cloudera.com> wrote:
> It looks like we've gotten many positive responses, and thus far have had no
> issues brought up. We have 3 mentors signed up now, but if anyone else is
> willing, we'd be interested in adding at most one or two more mentors.
>
> Discussion seems to have tapered off, so unless I hear otherwise, I plan on
> opening the vote late on Tuesday, 6/7!
>
> Thanks,
> Jon.
>
> On Tue, May 31, 2011 at 12:28 PM, Upayavira <u...@odoko.co.uk> wrote:
>
>> On Tue, 31 May 2011 11:53 -0700, "Jonathan Hsieh" <j...@cloudera.com>
>> wrote:
>> > Hi,
>> >
>> > I have a few questions.
>> >
>> > My understanding is that a podling requires 3 +1s for progress (releases,
>> > new committers). Does this mean we need at least 3 mentors? Would it be
>> > helpful to have "extra"?
>>
>> Strictly you do not need three mentors, but having three means you (in
>> theory) have three people with binding votes watching your progress,
>> which makes the necessary votes much easier, so having 'extra' can help.
>> Too many mentors though can lead to them all thinking that someone else
>> is doing it. Three does seem an optimal number.
>>
>> > Is having the Champion being a Mentor ok?
>>
>> Yes, it is fine.
>>
>> > Are there any concerns/discussion with the proposal? (or are +1's
>> > basically saying lgtm.)
>>
>> I think the +1s are saying that (although I haven't read the proposal).
>>
>> Upayavira
>>
>> > On Tue, May 31, 2011 at 4:13 AM, Mohammad Nour El-Din <
>> > nour.moham...@gmail.com> wrote:
>> >
>> > > +1 (binding)
>> > >
>> > > On Tue, May 31, 2011 at 11:59 AM, Mark Struberg <strub...@yahoo.de> wrote:
>> > > > +1
>> > > >
>> > > > LieGrue,
>> > > > strub
>> > > >
>> > > > --- On Mon, 5/30/11, Yoav Shapira <yo...@apache.org> wrote:
>> > > >
>> > > >> From: Yoav Shapira <yo...@apache.org>
>> > > >> Subject: Re: [PROPOSAL] Flume for the Apache Incubator
>> > > >> To: general@incubator.apache.org
>> > > >> Date: Monday, May 30, 2011, 11:18 PM
>> > > >> On Fri, May 27, 2011 at 10:18 AM, Jonathan Hsieh <j...@cloudera.com> wrote:
>> > > >> > I would like to propose Flume to be an Apache Incubator project.
>> > > >> > Flume is a distributed, reliable, and available system for
>> > > >> > efficiently collecting, aggregating, and moving large amounts of
>> > > >> > log data to scalable data storage systems such as Apache Hadoop's
>> > > >> > HDFS.
>> > > >> >
>> > > >> > Here's a link to the proposal in the Incubator wiki
>> > > >> > http://wiki.apache.org/incubator/FlumeProposal
>> > > >>
>> > > >> +1, cool stuff.
>> > > >>
>> > > >> Yoav
>> > > >>
>> > > >> >
>> > > >> > I've also pasted the initial contents below.
>> > > >> >
>> > > >> > Thanks!
>> > > >> > Jon.
>> > > >> >
>> > > >> > = Flume - A Distributed Log Collection System =
>> > > >> >
>> > > >> > == Abstract ==
>> > > >> >
>> > > >> > Flume is a distributed, reliable, and available system for
>> > > >> > efficiently collecting, aggregating, and moving large amounts of
>> > > >> > log data to scalable data storage systems such as Apache Hadoop's
>> > > >> > HDFS.
>> > > >> >
>> > > >> > == Proposal ==
>> > > >> >
>> > > >> > Flume is a distributed, reliable, and available system for
>> > > >> > efficiently collecting, aggregating, and moving large amounts of
>> > > >> > log data from many different sources to a centralized data store.
>> > > >> > Its main goal is to deliver data from applications to Hadoop’s
>> > > >> > HDFS. It has a simple and flexible architecture for transporting
>> > > >> > streaming event data via flume nodes to the data store. It is
>> > > >> > robust and fault-tolerant with tunable reliability mechanisms that
>> > > >> > rely upon many failover and recovery mechanisms. The system is
>> > > >> > centrally configured and allows for intelligent dynamic
>> > > >> > management. It uses a simple extensible data model that allows for
>> > > >> > lightweight online analytic applications. It provides a pluggable
>> > > >> > mechanism by which new sources, destinations, and analytic
>> > > >> > functions can be integrated within a Flume pipeline.
>> > > >> >
>> > > >> > == Background ==
>> > > >> >
>> > > >> > Flume was initially developed by Cloudera to enable reliable and
>> > > >> > simplified collection of log information from many distributed
>> > > >> > sources. It was later open-sourced by Cloudera on GitHub as an
>> > > >> > Apache 2.0 licensed project in June 2010. During this time Flume
>> > > >> > has been formally released five times as versions 0.9.0 (June
>> > > >> > 2010), 0.9.1 (Aug 2010), 0.9.1u1 (Oct 2010), 0.9.2 (Nov 2010), and
>> > > >> > 0.9.3 (Feb 2011). These releases are also distributed by Cloudera
>> > > >> > as source and binaries along with enhancements as part of Cloudera
>> > > >> > Distribution including Apache Hadoop (CDH).
>> > > >> >
>> > > >> > == Rationale ==
>> > > >> >
>> > > >> > Collecting log information in a data center in a timely, reliable,
>> > > >> > and efficient manner is a difficult challenge, but an important
>> > > >> > one, because when aggregated and analyzed, log information can
>> > > >> > yield valuable business insights. We believe that users and
>> > > >> > operators need a manageable, systematic approach for log
>> > > >> > collection that simplifies the creation, the monitoring, and the
>> > > >> > administration of reliable log data pipelines. Oftentimes today,
>> > > >> > this collection is attempted by periodically shipping data in
>> > > >> > batches and by using potentially unreliable and inefficient ad-hoc
>> > > >> > methods.
>> > > >> >
>> > > >> > Log data is typically generated in various systems running within
>> > > >> > a data center that can range from a few machines to hundreds of
>> > > >> > machines. In aggregate, the data acts like a large-volume
>> > > >> > continuous stream with contents that can have highly-varied format
>> > > >> > and highly-varied content. The volume and variety of raw log data
>> > > >> > make Apache Hadoop's HDFS file system an ideal storage location
>> > > >> > before the eventual analysis. Unfortunately, HDFS has limitations
>> > > >> > with regard to durability as well as scaling limitations when
>> > > >> > handling a large number of low-bandwidth connections or small
>> > > >> > files. Similar technical challenges also arise when attempting to
>> > > >> > write data to other data storage services.
>> > > >> >
>> > > >> > Flume addresses these challenges by providing a reliable,
>> > > >> > scalable, manageable, and extensible solution.
>> > > >> > It uses a streaming design for capturing and aggregating log
>> > > >> > information from varied sources in a distributed environment and
>> > > >> > has centralized management features for minimal configuration and
>> > > >> > management overhead.
>> > > >> >
>> > > >> > == Initial Goals ==
>> > > >> >
>> > > >> > Flume is currently in its first major release with a considerable
>> > > >> > number of enhancement requests, tasks, and issues recorded towards
>> > > >> > its future development. The initial goal of this project will be
>> > > >> > to continue to build community in the spirit of the "Apache Way",
>> > > >> > and to address the highly requested features and bug-fixes towards
>> > > >> > the next dot release.
>> > > >> >
>> > > >> > Some goals include:
>> > > >> > * To stand up a sustaining Apache-based community around the Flume
>> > > >> > codebase.
>> > > >> > * Implementing core functionality of a usable highly-available
>> > > >> > Flume master.
>> > > >> > * Performance, usability, and robustness improvements.
>> > > >> > * Improving the ability to monitor and diagnose problems as data
>> > > >> > is transported.
>> > > >> > * Providing a centralized place for contributed connectors and
>> > > >> > related projects.
>> > > >> >
>> > > >> > = Current Status =
>> > > >> >
>> > > >> > == Meritocracy ==
>> > > >> >
>> > > >> > Flume was initially developed by Jonathan Hsieh in July 2009 along
>> > > >> > with the development team at Cloudera. Developers external to
>> > > >> > Cloudera provided feedback, suggested features and fixes, and
>> > > >> > implemented extensions of Flume. The Cloudera engineering team has
>> > > >> > since maintained the project, with Jonathan Hsieh, Henry Robinson,
>> > > >> > and Patrick Hunt dedicated to its improvement.
>> > > >> > Contributors to Flume and its connectors include developers from
>> > > >> > different companies and different parts of the world.
>> > > >> >
>> > > >> > == Community ==
>> > > >> >
>> > > >> > Flume is currently used by a number of organizations all over the
>> > > >> > world. Flume has an active and growing user and developer
>> > > >> > community with active participation in [user|
>> > > >> > https://groups.google.com/a/cloudera.org/group/flume-user/topics]
>> > > >> > and [developer|
>> > > >> > https://groups.google.com/a/cloudera.org/group/flume-dev/topics]
>> > > >> > mailing lists. The users and developers also communicate via IRC
>> > > >> > on #flume at irc.freenode.net.
>> > > >> >
>> > > >> > Since open sourcing the project, there have been over 15 different
>> > > >> > people from diverse organizations who have contributed code.
>> > > >> > During this period, the project team has hosted open, in-person,
>> > > >> > quarterly meetups to discuss new features, new designs, and new
>> > > >> > use-case stories.
>> > > >> >
>> > > >> > == Core Developers ==
>> > > >> >
>> > > >> > The core developers for the Flume project are:
>> > > >> > * Andrew Bayer: Andrew has a lot of expertise with build tools,
>> > > >> > specifically Jenkins continuous integration and Maven.
>> > > >> > * Jonathan Hsieh: Jonathan designed and implemented much of the
>> > > >> > original code.
>> > > >> > * Patrick Hunt: Patrick has improved the web interfaces of Flume
>> > > >> > components and contributed several build quality improvements.
>> > > >> > * Bruce Mitchener: Bruce has improved the internal logging
>> > > >> > infrastructure as well as edited significant portions of the Flume
>> > > >> > manual.
>> > > >> > * Henry Robinson: Henry has implemented much of the ZooKeeper
>> > > >> > integration, plugin mechanisms, as well as several Flume features
>> > > >> > and bug fixes.
>> > > >> > * Eric Sammer: Eric has implemented the Maven build, as well as
>> > > >> > several Flume features and bug fixes.
>> > > >> >
>> > > >> > All core developers of the Flume project have contributed towards
>> > > >> > Hadoop or related Apache projects and are very familiar with
>> > > >> > Apache principles and philosophy for community-driven software
>> > > >> > development.
>> > > >> >
>> > > >> > == Alignment ==
>> > > >> >
>> > > >> > Flume complements Hadoop Map-Reduce, Pig, Hive, and HBase by
>> > > >> > providing a robust mechanism to allow log data integration from
>> > > >> > external systems for effective analysis. Its design enables
>> > > >> > efficient integration of newly ingested data to Hive's data
>> > > >> > warehouse.
>> > > >> >
>> > > >> > Flume's architecture is open and easily extensible. This has
>> > > >> > encouraged many users to contribute integration plugins to other
>> > > >> > projects. For example, several users have contributed connectors
>> > > >> > to message queuing and bus services, to several open source data
>> > > >> > stores, to incremental search indexes, and to stream analysis
>> > > >> > engines.
>> > > >> >
>> > > >> > = Known Risks =
>> > > >> >
>> > > >> > == Orphaned Products ==
>> > > >> >
>> > > >> > Flume is already deployed in production at multiple companies and
>> > > >> > they are actively participating in feature requests and user-led
>> > > >> > discussions. Flume is getting traction with developers and thus
>> > > >> > the risks of it being orphaned are minimal.
>> > > >> >
>> > > >> > == Inexperience with Open Source ==
>> > > >> >
>> > > >> > All code developed for Flume has been open sourced by Cloudera
>> > > >> > under the Apache 2.0 license. All committers of the Flume project
>> > > >> > are intimately familiar with the Apache model for open-source
>> > > >> > development and are experienced with working with new
>> > > >> > contributors.
>> > > >> >
>> > > >> > == Homogeneous Developers ==
>> > > >> >
>> > > >> > The initial set of committers is from a reduced set of
>> > > >> > organizations. However, we expect that once approved for
>> > > >> > incubation, the project will attract new contributors from diverse
>> > > >> > organizations and will thus grow organically. The participation of
>> > > >> > developers from several different organizations in the mailing
>> > > >> > list is a strong indication for this assertion.
>> > > >> >
>> > > >> > == Reliance on Salaried Developers ==
>> > > >> >
>> > > >> > It is expected that Flume will be developed on salaried and
>> > > >> > volunteer time, although all of the initial developers will work
>> > > >> > on it mainly on salaried time.
>> > > >> >
>> > > >> > == Relationships with Other Apache Products ==
>> > > >> >
>> > > >> > Flume depends upon other Apache Projects: Apache Hadoop, Apache
>> > > >> > Log4J, Apache ZooKeeper, Apache Thrift, Apache Avro, and multiple
>> > > >> > Apache Commons components. Its build depends upon Apache Ant and
>> > > >> > Apache Maven.
>> > > >> >
>> > > >> > Flume users have created connectors that interact with several
>> > > >> > other Apache projects including Apache HBase and Apache Cassandra.
>> > > >> >
>> > > >> > Flume's functionality has some indirect or direct overlap with the
>> > > >> > functionality of Apache Chukwa but has several significant
>> > > >> > architectural differences.
>> > > >> > Both systems can be used to collect log data to write to HDFS.
>> > > >> > However, Chukwa's primary goals are the analytic and monitoring
>> > > >> > aspects of a Hadoop cluster. Instead of focusing on analytics,
>> > > >> > Flume focuses primarily upon data transport and integration with a
>> > > >> > wide set of data sources and data destinations. Architecturally,
>> > > >> > Chukwa components are individually and statically configured. It
>> > > >> > also depends upon Hadoop MapReduce for its core functionality. In
>> > > >> > contrast, Flume's components are dynamically and centrally
>> > > >> > configured and do not depend directly upon Hadoop MapReduce.
>> > > >> > Furthermore, Flume provides a more general model for handling data
>> > > >> > and enables integration with projects such as Apache Hive, data
>> > > >> > stores such as Apache HBase, Apache Cassandra and Voldemort, and
>> > > >> > several Apache Lucene-related projects.
>> > > >> >
>> > > >> > == An Excessive Fascination with the Apache Brand ==
>> > > >> >
>> > > >> > We would like Flume to become an Apache project to further foster
>> > > >> > a healthy community of contributors and consumers around the
>> > > >> > project. Since Flume directly interacts with many Apache
>> > > >> > Hadoop-related projects and solves an important problem of many
>> > > >> > Hadoop users, residing in the Apache Software Foundation will
>> > > >> > increase interaction with the larger community.
>> > > >> >
>> > > >> > = Documentation =
>> > > >> >
>> > > >> > * All Flume documentation (User Guide, Developer Guide, Cookbook,
>> > > >> > and Windows Guide) is maintained within Flume sources and can be
>> > > >> > built directly.
>> > > >> > * Cloudera provides documentation specific to its distribution of
>> > > >> > Flume at: http://archive.cloudera.com/cdh/3/flume/
>> > > >> > * Flume wiki at GitHub: https://github.com/cloudera/flume/wiki
>> > > >> > * Flume JIRA at Cloudera: https://issues.cloudera.org/browse/flume
>> > > >> >
>> > > >> > = Initial Source =
>> > > >> >
>> > > >> > * https://github.com/cloudera/flume/tree/
>> > > >> >
>> > > >> > == Source and Intellectual Property Submission Plan ==
>> > > >> >
>> > > >> > * The initial source is already licensed under the Apache License,
>> > > >> > Version 2.0. https://github.com/cloudera/flume/blob/master/LICENSE
>> > > >> >
>> > > >> > == External Dependencies ==
>> > > >> >
>> > > >> > The required external dependencies are all under the Apache
>> > > >> > License or compatible licenses. Components with non-Apache
>> > > >> > licenses are enumerated below:
>> > > >> >
>> > > >> > * org.arabidopsis.ahocorasick : BSD-style
>> > > >> >
>> > > >> > Non-Apache build tools that are used by Flume are as follows:
>> > > >> >
>> > > >> > * AsciiDoc: GNU GPLv2
>> > > >> > * FindBugs: GNU LGPL
>> > > >> > * Cobertura: GNU GPLv2
>> > > >> > * PMD : BSD-style
>> > > >> >
>> > > >> > == Cryptography ==
>> > > >> >
>> > > >> > Flume uses standard APIs and tools for SSH and SSL communication
>> > > >> > where necessary.
>> > > >> >
>> > > >> > = Required Resources =
>> > > >> >
>> > > >> > == Mailing lists ==
>> > > >> >
>> > > >> > * flume-private (with moderated subscriptions)
>> > > >> > * flume-dev
>> > > >> > * flume-commits
>> > > >> > * flume-user
>> > > >> >
>> > > >> > == Subversion Directory ==
>> > > >> >
>> > > >> > https://svn.apache.org/repos/asf/incubator/flume
>> > > >> >
>> > > >> > == Issue Tracking ==
>> > > >> >
>> > > >> > JIRA Flume (FLUME)
>> > > >> >
>> > > >> > == Other Resources ==
>> > > >> >
>> > > >> > The existing code already has unit and integration tests so we
>> > > >> > would like a Hudson instance to run them whenever a new patch is
>> > > >> > submitted. This can be added after project creation.
>> > > >> >
>> > > >> > = Initial Committers =
>> > > >> >
>> > > >> > * Andrew Bayer (abayer at cloudera dot com)
>> > > >> > * Jonathan Hsieh (jon at cloudera dot com)
>> > > >> > * Aaron Kimball (akimball83 at gmail dot com)
>> > > >> > * Bruce Mitchener (bruce.mitchener at gmail dot com)
>> > > >> > * Arvind Prabhakar (arvind at cloudera dot com)
>> > > >> > * Ahmed Radwan (ahmed at cloudera dot com)
>> > > >> > * Henry Robinson (henry at cloudera dot com)
>> > > >> > * Eric Sammer (esammer at cloudera dot com)
>> > > >> >
>> > > >> > = Affiliations =
>> > > >> >
>> > > >> > * Andrew Bayer, Cloudera
>> > > >> > * Jonathan Hsieh, Cloudera
>> > > >> > * Aaron Kimball, Odiago
>> > > >> > * Bruce Mitchener, Independent
>> > > >> > * Arvind Prabhakar, Cloudera
>> > > >> > * Ahmed Radwan, Cloudera
>> > > >> > * Henry Robinson, Cloudera
>> > > >> > * Eric Sammer, Cloudera
>> > > >> >
>> > > >> > = Sponsors =
>> > > >> >
>> > > >> > == Champion ==
>> > > >> >
>> > > >> > * Nigel Daley
>> > > >> >
>> > > >> > == Nominated Mentors ==
>> > > >> >
>> > > >> > * Tom White
>> > > >> > * Nigel Daley
>> > > >> >
>> > > >> > == Sponsoring Entity ==
>> > > >> >
>> > > >> > * Apache Incubator PMC
---------------------------------------------------------------------
To unsubscribe, e-mail: general-unsubscr...@incubator.apache.org
For additional commands, e-mail: general-h...@incubator.apache.org