Re: [VOTE] Retire ALOIS podling
+1

On Wed, Jun 22, 2011 at 7:06 PM, Henri Yandell wrote:
> +1.
>
> Source code should be removed from SVN as the podling has not signed
> off on its copyright items.
>
> Hen
>
> On Tue, Jun 21, 2011 at 8:52 AM, Christian Grobmeier wrote:
>> Hello,
>>
>> as already mentioned last week, the ALOIS project is dead and it seems
>> there is no way to recover in the near future (or even later). The
>> developers told me in a private message in March that they could not
>> continue due to personal reasons. It seems this has now become reality.
>>
>> I have set up a vote on the dev mailing list:
>> * http://s.apache.org/eBx
>> (Note: one of the voters responded on the private list - I counted the vote)
>>
>> So far, no releases have been made.
>>
>> That vote passed a few hours ago, after being open for 5 days.
>>
>> Please vote for retirement of the ALOIS podling. If this vote passes,
>> I will move on to the discussions on retirement and finally retire it.
>>
>> Thanks,
>> Christian
>>
>> [] +1 - please retire
>> [] +/-0
>> [] -1 - please don't retire, because...

--
Thanks
- Mohammad Nour
Author of (WebSphere Application Server Community Edition 2.0 User Guide)
http://www.redbooks.ibm.com/abstracts/sg247585.html
- LinkedIn: http://www.linkedin.com/in/mnour
- Blog: http://tadabborat.blogspot.com
Re: [PROPOSAL] Oozie for the Apache Incubator
+1

On Sat, Jun 25, 2011 at 2:15 AM, Phillip Rhodes wrote:
> On Fri, Jun 24, 2011 at 3:46 PM, Mohammad Islam wrote:
>
>> Hi,
>>
>> I would like to propose Oozie to be an Apache Incubator project.
>> Oozie is a server-based workflow scheduling and coordination system to manage
>> data processing jobs for Apache Hadoop.
>
> +1

--
Thanks
- Mohammad Nour
Re: [PROPOSAL] Kafka for the Apache Incubator
+1 on the proposal; looking forward to the [VOTE] thread starting.

On Sat, Jun 25, 2011 at 3:01 AM, Joe Key wrote:
> +1
> We will be using it heavily here at HomeHealthCareSOS.com to relay app
> server logs to our DW and Hadoop cluster.
>
> --
> Joe Andrew Key (Andy)

--
Thanks
- Mohammad Nour
Re: [PROPOSAL] Oozie for the Apache Incubator
+1. Thanks to the team; I look forward to this project.

Thanks,
Angelo

> On Fri, Jun 24, 2011 at 3:46 PM, Mohammad Islam wrote:
>
>> Hi,
>>
>> I would like to propose Oozie to be an Apache Incubator project.
>> Oozie is a server-based workflow scheduling and coordination system to manage
>> data processing jobs for Apache Hadoop.
>
> +1
Re: [PROPOSAL] Oozie for the Apache Incubator
Interesting project. Time permitting, I would like to contribute to the workflow effort.

--Suresh

On Jun 24, 2011, at 3:46 PM, Mohammad Islam wrote:

> Hi,
>
> I would like to propose Oozie to be an Apache Incubator project. Oozie is a server-based workflow scheduling and coordination system to manage data processing jobs for Apache Hadoop.
>
> Here's a link to the proposal in the Incubator wiki:
> http://wiki.apache.org/incubator/OozieProposal
>
> I've also pasted the initial contents below.
>
> Regards,
> Mohammad Islam
>
> Start of Oozie Proposal
>
> Abstract
> Oozie is a server-based workflow scheduling and coordination system to manage data processing jobs for Apache Hadoop(TM).
>
> Proposal
> Oozie is an extensible, scalable and reliable system to define, manage, schedule, and execute complex Hadoop workloads via web services. More specifically, this includes:
>
> * An XML-based declarative framework to specify a job or a complex workflow of dependent jobs.
> * Support for different job types such as Hadoop Map-Reduce, Pipes, Streaming, Pig, Hive and custom Java applications.
> * Workflow scheduling based on frequency and/or data availability.
> * Monitoring capability, automatic retry and failure handling of jobs.
> * An extensible and pluggable architecture to allow arbitrary grid programming paradigms.
> * Authentication, authorization, and capacity-aware load throttling to allow multi-tenant software as a service.
>
> Background
> Most data processing applications require multiple jobs to achieve their goals, with inherent dependencies among the jobs. A dependency could be sequential, where one job can only start after another job has finished. Or it could be conditional, where the execution of a job depends on the return value or status of another job. In other cases, parallel execution of multiple jobs may be permitted – or desired – to exploit the massive pool of compute nodes provided by Hadoop.
>
> These job dependencies are often expressed as a Directed Acyclic Graph, also called a workflow. A node in the workflow is typically a job (a computation on the grid) or another type of action, such as an e-mail notification. Computations can be expressed in map/reduce, Pig, Hive or any other programming paradigm available on the grid. Edges of the graph represent transitions from one node to the next as the execution of a workflow proceeds.
>
> Describing a workflow in a declarative way has the advantage of decoupling job dependencies and execution control from application logic. Furthermore, the workflow is modularized into jobs that can be reused within the same workflow or across different workflows. Execution of the workflow is then driven by a runtime system without understanding the application logic of the jobs. This runtime system specializes in reliable and predictable execution: it can retry actions that have failed or invoke a cleanup action after termination of the workflow; it can monitor progress, success, or failure of a workflow, and send appropriate alerts to an administrator. The application developer is relieved from implementing these generic procedures.
>
> Furthermore, some applications or workflows need to run at periodic intervals or when dependent data is available. For example, a workflow could be executed every day as soon as output data from the previous 24 instances of another, hourly workflow is available. The workflow coordinator provides such scheduling features, along with prioritization, load balancing and throttling, to optimize utilization of resources in the cluster. This makes it easier to maintain, control, and coordinate complex data applications.
>
> Nearly three years ago, a team of Yahoo! developers addressed these critical requirements for Hadoop-based data processing systems by developing a new workflow management and scheduling system called Oozie. While it was initially developed as a Yahoo!-internal project, it was designed and implemented with the intention of open-sourcing it. Oozie was released as a GitHub project in early 2010. Oozie is used in production within Yahoo!, and since it has been open-sourced it has been gaining adoption among external developers.
>
> Rationale
> Commonly, applications that run on Hadoop require multiple Hadoop jobs in order to obtain the desired results. Furthermore, these Hadoop jobs are commonly a combination of Java map-reduce jobs, Streaming map-reduce jobs, Pipes map-reduce jobs, Pig jobs, Hive jobs, HDFS operations, Java programs and shell scripts.
>
> Because of this, developers find themselves writing ad-hoc glue programs to combine these Hadoop jobs. These ad-hoc programs are di
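To make the Background section above more concrete, here is a minimal sketch in Python of the kind of dependency-driven, retry-aware execution a workflow runtime like the one described performs: a workflow is a DAG of actions, and the runtime runs each action once its dependencies have finished, retrying failures a bounded number of times. All names here are hypothetical and purely illustrative; this is not Oozie's actual API, which (per the proposal) is driven by XML workflow definitions submitted to a server via web services.

# Illustrative sketch only -- hypothetical names, not Oozie's API.
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Action:
    name: str
    run: Callable[[], None]                 # e.g. submit a Map-Reduce, Pig, or Hive job
    depends_on: List[str] = field(default_factory=list)
    max_retries: int = 3

def run_workflow(actions: Dict[str, Action]) -> None:
    """Execute each action once all of its dependencies have completed."""
    done: set = set()
    while len(done) < len(actions):
        progressed = False
        for action in actions.values():
            if action.name in done:
                continue
            if all(dep in done for dep in action.depends_on):
                for attempt in range(action.max_retries):
                    try:
                        action.run()
                        break
                    except Exception:
                        # A real runtime would alert an administrator and
                        # run a cleanup action instead of giving up here.
                        if attempt == action.max_retries - 1:
                            raise
                done.add(action.name)
                progressed = True
        if not progressed:
            raise RuntimeError("cycle or unsatisfiable dependency in workflow")

# Example DAG: ingest -> (clean, stats in parallel) -> report
wf = {
    "ingest": Action("ingest", lambda: print("ingest")),
    "clean":  Action("clean",  lambda: print("clean"),  depends_on=["ingest"]),
    "stats":  Action("stats",  lambda: print("stats"),  depends_on=["ingest"]),
    "report": Action("report", lambda: print("report"), depends_on=["clean", "stats"]),
}
run_workflow(wf)

In Oozie the equivalent graph is declared in XML rather than built in code, which is what lets the runtime handle retries, monitoring, alerts and cleanup without any knowledge of the application logic of the individual jobs.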
Re: [PROPOSAL] Oozie for the Apache Incubator
+1. Very interesting stuff.

thanks,
Thilina

On Sun, Jun 26, 2011 at 7:12 PM, Suresh Marru wrote:
> Interesting project. Time permitting, I would like to contribute to the workflow effort.
>
> --Suresh
>
> On Jun 24, 2011, at 3:46 PM, Mohammad Islam wrote:
>
>> Hi,
>>
>> I would like to propose Oozie to be an Apache Incubator project. Oozie is a server-based workflow scheduling and coordination system to manage data processing jobs for Apache Hadoop.
>>
>> Here's a link to the proposal in the Incubator wiki:
>> http://wiki.apache.org/incubator/OozieProposal
>>
>> [...]
Re: [PROPOSAL] Oozie for the Apache Incubator
+1. Thanks a lot to the team; I look forward to contributing more to the project.

Thanks,
Mayank

On Sun, Jun 26, 2011 at 4:24 PM, Thilina Gunarathne wrote:
> +1. Very interesting stuff.
>
> thanks,
> Thilina
>
> On Sun, Jun 26, 2011 at 7:12 PM, Suresh Marru wrote:
>
>> Interesting project. Time permitting, I would like to contribute to the workflow effort.
>>
>> --Suresh
>>
>> On Jun 24, 2011, at 3:46 PM, Mohammad Islam wrote:
>>
>>> Hi,
>>>
>>> I would like to propose Oozie to be an Apache Incubator project. Oozie is a server-based workflow scheduling and coordination system to manage data processing jobs for Apache Hadoop.
>>>
>>> Here's a link to the proposal in the Incubator wiki:
>>> http://wiki.apache.org/incubator/OozieProposal
>>>
>>> [...]
KEYS and releases
Hi,

I've recently noticed a podling (ok, it was Gora ;-) including KEYS as part of the release package to be voted on.

I've seen some projects that include KEYS as part of a release. I've seen other projects add KEYS to the top level of the incubator/ directory, where the same file is updated and used for each subsequent release.

My preference would be to manage KEYS separately from releases. Any time a new prospective release manager is added to the project, the KEYS file would be updated without a vote, since it's not being released. Then no KEYS file would need to be included in the release artifacts, and no vote need occur.

Thoughts?

Craig

Craig L Russell
Secretary, Apache Software Foundation
Chair, OpenJPA PMC
c...@apache.org http://db.apache.org/jdo
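For context on what the KEYS file is used for wherever it is published: a downstream user imports the release managers' public keys from KEYS and then checks the detached signature distributed alongside each release artifact. Below is a minimal sketch, assuming the gpg command-line tool is installed; the file names are hypothetical.

# Illustrative sketch: verify a release artifact against a project KEYS file
# that was fetched separately from the release itself.
import subprocess

def verify_release(keys_file: str, signature: str, artifact: str) -> bool:
    # Import the release managers' public keys (normally obtained from the
    # project's distribution area, not from inside the release archive).
    subprocess.run(["gpg", "--import", keys_file], check=True)
    # Verify the detached signature over the release artifact.
    result = subprocess.run(["gpg", "--verify", signature, artifact])
    return result.returncode == 0

if __name__ == "__main__":
    ok = verify_release("KEYS", "apache-foo-1.0.tar.gz.asc", "apache-foo-1.0.tar.gz")
    print("signature OK" if ok else "signature check FAILED")

One argument for managing KEYS outside the release, as suggested above, is that the verification only adds assurance when the keys are obtained over a channel independent of the artifact being checked.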
RE: [VOTE] Retire ALOIS podling
+1

--- Noel
RE: [VOTE] Retire Stonehenge
+1

--- Noel
RE: KEYS and releases
It seems to me to be a bad idea to distribute keys with releases. And don't we already have some ASF-wide policy for managing keys?

--- Noel
Re: KEYS and releases
> It seems to me to be a bad idea to distribute keys with releases.

+1

> And don't we already have some ASF-wide policy for managing keys?

I don't know if there is a policy, but I recently found this:
http://people.apache.org/foaf/index.html

"PGP keys may additionally be added to your profile on https://id.apache.org/. This will cause them to be added to https://people.apache.org/keys/, and make them available to other infrastructure tools in the future."

Then it would be available here: https://people.apache.org/keys/group/

Why shouldn't all podlings use this as a central KEYS file?

Cheers
Christian