Thanks to all who voted. Obviously, I'm +1 (binding) on the proposal.
With 14 +1s (10 binding) the vote passes.
I'll start the work to get the podling started.
thanks,
Arun
On Feb 19, 2013, at 8:26 PM, Arun C Murthy wrote:
> Hi Folks,
>
> Thanks for participating in the discussion. I'd like to call a VOTE for
> acceptance of Apache Tez into the Incubator. I'll let the vote run through
> this weekend (Sun 2/24 6pm PST).
>
> [ ] +1 Accept Apache Tez into the Incubator
> [ ] +0 Don't care.
> [ ] -1 Don't accept Apache Tez into the Incubator because...
>
> Full proposal is pasted at the bottom of this email, and the corresponding
> wiki is http://wiki.apache.org/incubator/TezProposal.
>
> Only VOTEs from Incubator PMC members are binding, but all are welcome to
> express their thoughts.
>
> Here's my +1 (binding).
>
> thanks,
> Arun
>
> PS: Since the initial discussion, the only changes are that I've added one new
> mentor and 2 new committers. All the new additions come from outside the major
> employer, and we continue to strive to diversify further during incubation.
> Thanks.
>
>
>
> = Tez =
>
> == Abstract ==
> Tez is an effort to develop a generic application framework which can be used
> to process arbitrarily complex data-processing tasks and also a re-usable set
> of data-processing primitives which can be used by other projects.
>
> == Proposal ==
> Tez is a proposal to develop a generic application which can be used to
> process complex data-processing task DAGs and which runs natively on Apache
> Hadoop YARN. YARN is a generic resource-management system on which
> applications like MapReduce already run. MapReduce is a specific, and
> constrained, DAG, which is not optimal for several frameworks like Apache
> Hive and Apache Pig. Furthermore, we propose to develop a re-usable set of
> libraries of data-processing primitives such as sorting, merging,
> data-shuffling, intermediate data management etc., which are necessary for
> Tez and which we envision can be used directly by other projects.
>
> == Background ==
> Apache Hadoop MapReduce has emerged as the assembly-language on which other
> frameworks like Apache Pig and Apache Hive have been built. However, it has
> been well accepted that MapReduce produces very constrained task DAGs for each
> job which results in Apache Pig and Apache Hive requiring multiple MapReduce
> jobs for several queries. By providing a more expressive DAG of tasks for a
> job, Tez attempts to provide significantly enhanced data-processing
> capabilities for projects like Apache Pig, Apache Hive, Cascading etc.
>
> == Rationale ==
> Tez fills an important gap in the Apache Hadoop ecosystem by allowing more
> expressive task DAGs for data-processing applications such as Apache Pig,
> Apache Hive, Cascading etc.
>
> With the emergence of Apache Hadoop YARN, there is a strong need for a
> common DAG application which can then be shared by Apache Pig, Apache Hive,
> Cascading etc.
>
> == Initial Goals ==
> The initial goals for this project are to specify the detailed requirements
> and architecture, and then develop the initial implementation including the
> DAG ApplicationMaster to run natively inside Apache Hadoop YARN.
>
> == Current Status ==
> Significant work has been completed to identify the initial requirements and
> define the overall system architecture. There is a patch available in the
> internal Hortonworks git repository which can act as the initial seed.
>
> === Meritocracy ===
> We plan to invest in supporting a meritocracy. We will discuss the
> requirements in an open forum. Several companies have already expressed
> interest in this project, and we intend to invite additional developers to
> participate. We will encourage and monitor community participation so that
> privileges can be extended to those that contribute.
>
> === Community ===
> The need for a generic DAG application for data processing in open source is
> tremendous, so there is potential for a very large community. We believe
> that Tez's extensible architecture will further encourage community
> participation. Also, related Apache projects (e.g., Pig, Hive) have very
> large and active communities, and we expect that over time Tez will also
> attract a large community.
>
> === Core Developers ===
> The developers on the initial committers list include people very experienced
> in the Apache Hadoop ecosystem:
>
> * Alan Gates
> * Arun C Murthy
> * Ashutosh Chauhan
> * Bikas Saha
> * Chris Douglas
> * Daryn Sharp
> * Devaraj Das
> * Gopal Vijayaraghavan
> * Gunther Hagleitner
> * Hitesh Shah
> * Jason Lowe
> * Jean Xu
> * Jitendra Pandey
> * Julien Le Dem
> * Kevin Wilfong
> * Mike Liddell
> * Namit Jain
> * Nathan Roberts
> * Owen O'Malley
> * Robert Evans
> * Siddharth Seth
> * Tom White
> * Thomas Graves
> * Vikram Dixit
> * Vinod Kumar Vavilap