[jira] [Created] (HADOOP-7259) contrib modules should include build.properties from parent.

2011-05-03 Thread Owen O'Malley (JIRA)
contrib modules should include build.properties from parent. Key: HADOOP-7259 URL: https://issues.apache.org/jira/browse/HADOOP-7259 Project: Hadoop Common Issue Type: Bug

[jira] [Resolved] (HADOOP-4858) to add appropriate reference to the dependent library files in the chukwa/build.xml file

2011-05-03 Thread Owen O'Malley (JIRA)
[ https://issues.apache.org/jira/browse/HADOOP-4858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Owen O'Malley resolved HADOOP-4858. --- Resolution: Not A Problem Chukwa moved out long ago. > to add appropriate reference to the

RE: Build failed in Jenkins: Hadoop-0.20.203-Build #5

2011-05-03 Thread Rottinghuis, Joep
This is exactly what my e-mail (Subject: RE: [VOTE] Release candidate 0.20.203.0-rc0 Sent: Tuesday, May 03, 2011 7:14 PM) was about. To fix, apply "alex-HADOOP-3744.patch" as attached in MAPREDUCE-1280. Cheers, Joep From: Apache Jenkins Server [hud...@hud

Build failed in Jenkins: Hadoop-0.20.203-Build #5

2011-05-03 Thread Apache Jenkins Server
See Changes: [omalley] HADOOP-7258. The Gzip codec should not return null decompressors. (omalley) [acmurthy] HADOOP-5759. Fix for IllegalArgumentException when CombineFileInputFormat is used as job InputFormat. Contribute

RE: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-03 Thread Rottinghuis, Joep
Yes, the Eclipse contrib is skipped unless eclipse.home is set. See: src/contrib/eclipse-plugin/build.xml lines 47-50 When this happens you should be able to see the string "skipping eclipse plugin" in the console output. However, turning on Eclipse build without any changes w

[jira] [Resolved] (HADOOP-7179) Improve HDFS startup scripts

2011-05-03 Thread Suresh Srinivas (JIRA)
[ https://issues.apache.org/jira/browse/HADOOP-7179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Suresh Srinivas resolved HADOOP-7179. - Resolution: Fixed Hadoop Flags: [Reviewed] I committed the patch. > Improve HDFS

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-03 Thread Nigel Daley
On May 3, 2011, at 1:48 PM, Owen O'Malley wrote: > > On May 3, 2011, at 1:33 PM, Nigel Daley wrote: > >> Owen, any reason you're not building the eclipse plugin for this release? >> Instructions are here: http://wiki.apache.org/hadoop/HowToRelease > > Of course, I know (and have updated) the

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-03 Thread Owen O'Malley
On May 3, 2011, at 1:33 PM, Nigel Daley wrote: > Owen, any reason you're not building the eclipse plugin for this release? > Instructions are here: http://wiki.apache.org/hadoop/HowToRelease Of course, I know (and have updated) the HowToRelease page. It looks like the eclipse-plugin was drop

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-03 Thread Nigel Daley
Owen, any reason you're not building the eclipse plugin for this release? Instructions are here: http://wiki.apache.org/hadoop/HowToRelease n. On Apr 29, 2011, at 4:09 PM, Owen O'Malley wrote: > I think everything is ready to go on the 0.20.203.0 release. It includes > security and a lot of

Re: Questions wrt security branches

2011-05-03 Thread Owen O'Malley
On May 3, 2011, at 9:35 AM, Eli Collins wrote: > Do all changes for 0.20.2xx release go through branch-0.20-security, > then get merged to a particular -2xx branch? I've discussed this before on the lists, but here goes: branch-0.20-security is the major branch and all changes need to be commi

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-03 Thread Konstantin Boudnik
I have also built an instrumented cluster and at least no bindings required for system testing are broken. -- Take care, Konstantin (Cos) Boudnik On Tue, May 3, 2011 at 12:51, Jakob Homan wrote: > Tested the RC on a single node cluster, kicked the tires. Looks good. > +1 on its release. > > Rega

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-03 Thread Jakob Homan
Tested the RC on a single node cluster, kicked the tires. Looks good. +1 on its release. Regardless of how the RC got here, we only get benefit from releasing it. It represents a huge chunk of work from our contributors, provides needed features for our users and moves us one step closer to mak

[jira] [Created] (HADOOP-7258) Gzip codec should not return null decompressors

2011-05-03 Thread Owen O'Malley (JIRA)
Gzip codec should not return null decompressors --- Key: HADOOP-7258 URL: https://issues.apache.org/jira/browse/HADOOP-7258 Project: Hadoop Common Issue Type: Bug Reporter: Owen O'Malle
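
A minimal sketch of the failure mode the title describes, assuming the standard org.apache.hadoop.io.compress API: if GzipCodec.createDecompressor() returns null (e.g. when native zlib is unavailable), a caller that passes the result straight to createInputStream() hits a NullPointerException. The null check below is a caller-side guard for illustration, not the committed fix.

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.compress.CompressionCodec;
    import org.apache.hadoop.io.compress.Decompressor;
    import org.apache.hadoop.io.compress.GzipCodec;
    import org.apache.hadoop.util.ReflectionUtils;

    public class GzipReadSketch {
      public static InputStream openGzip(String path, Configuration conf)
          throws IOException {
        CompressionCodec codec =
            ReflectionUtils.newInstance(GzipCodec.class, conf);
        // HADOOP-7258: createDecompressor() may return null, so guard
        // before handing it to createInputStream().
        Decompressor decompressor = codec.createDecompressor();
        InputStream raw = new FileInputStream(path);
        return decompressor != null
            ? codec.createInputStream(raw, decompressor)
            : codec.createInputStream(raw);
      }
    }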

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-03 Thread Konstantin Boudnik
Yup, exactly right - it has been reverted in the trunk as well. Thanks for digging this up, Koji! On Tue, May 3, 2011 at 11:22, Koji Noguchi wrote: >>> except >>> HADOOP-6386 and HADOOP-6428. >> causes a rolling port side effect in TT >> > I remember bugging Cos and Rob to revert HADOOP-6386. > h

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-03 Thread Koji Noguchi
>> except >> HADOOP-6386 and HADOOP-6428. > causes a rolling port side effect in TT > I remember bugging Cos and Rob to revert HADOOP-6386. https://issues.apache.org/jira/browse/HADOOP-6760?focusedCommentId=12867342&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12

Re: Questions wrt security branches

2011-05-03 Thread Eli Collins
People requested I move the query to general; will do that. On Tue, May 3, 2011 at 9:35 AM, Eli Collins wrote: > Hey guys, > > Do all changes for 0.20.2xx release go through branch-0.20-security, > then get merged to a particular -2xx branch? > > Why create a new branch for every new dot release?

Questions wrt security branches

2011-05-03 Thread Eli Collins
Hey guys, Do all changes for 0.20.2xx release go through branch-0.20-security, then get merged to a particular -2xx branch? Why create a new branch for every new dot release? I.e., if the intent is that the branch will be dead after release, why not release from a single branch? It's hard to see that e

Re: Why mergeParts() is not parallel with collect() on map?

2011-05-03 Thread Owen O'Malley
On Tue, May 3, 2011 at 1:48 AM, elton sky wrote: > Please correct me if I am wrong. One of the important assumptions of hadoop map reduce is: map's output should be smaller than input. No, that isn't a valid assumption. MapReduce workloads can roughly be divided into three categories: 1. scans
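
A hypothetical illustration of why the assumption fails (this class is not from the thread): under the 0.20-era org.apache.hadoop.mapred API, a mapper may emit many records per input record, so map output can easily be larger than map input.

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;

    // One input line fans out to one record per token, so the map's
    // output is typically larger than its input.
    public class TokenFanOutMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();

      public void map(LongWritable key, Text value,
                      OutputCollector<Text, IntWritable> out,
                      Reporter reporter) throws IOException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
          word.set(tokens.nextToken());
          out.collect(word, ONE); // collect() runs many times per input record
        }
      }
    }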

[jira] [Created] (HADOOP-7256) Resource leak during failure scenario of closing of resources.

2011-05-03 Thread ramkrishna.s.vasudevan (JIRA)
Resource leak during failure scenario of closing of resources. --- Key: HADOOP-7256 URL: https://issues.apache.org/jira/browse/HADOOP-7256 Project: Hadoop Common Issue Type: Bug
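
The title suggests the common pattern where an exception thrown by the first close() skips the remaining ones. A minimal sketch of the safe shape using Hadoop's IOUtils.cleanup() helper; the file names and copy logic are illustrative only.

    import java.io.FileInputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.hadoop.io.IOUtils;

    public class SafeCloseSketch {
      private static final Log LOG = LogFactory.getLog(SafeCloseSketch.class);

      public static void copy(String src, String dst) throws IOException {
        FileInputStream in = null;
        FileOutputStream out = null;
        try {
          in = new FileInputStream(src);
          out = new FileOutputStream(dst);
          IOUtils.copyBytes(in, out, 4096, false);
        } finally {
          // Closes each stream in turn, logging rather than throwing, so a
          // failure closing 'in' cannot leak 'out' -- the leak this issue targets.
          IOUtils.cleanup(LOG, in, out);
        }
      }
    }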

Re: Why mergeParts() is not parallel with collect() on map?

2011-05-03 Thread 博亮
Elton, I think the spill, sort, and partition steps of each map task can be processed in memory, so the benefit of running them in parallel is not obvious. On the other hand, reduce must receive map output from several map tasks across the cluster. Boliang On Tue, May 3, 2011 at 8:46 PM, elt

Re: Why mergeParts() is not parallel with collect() on map?

2011-05-03 Thread elton sky
Dave, you are right, collect() is called whenever a [K,V] is inserted into kvbuffer. Here, I mean when all [K,V] are created and the last collect() finishes :). But I think if the map phase creates a bigger amount of output than input, we need a different procedure. On Tue, May 3, 2011 at

RE: Why mergeParts() is not parallel with collect() on map?

2011-05-03 Thread Dave Shine
I'm a relative newbie to Hadoop, but your assumption below is not correct in my organization. It is common for us to call output.collect() more than once in a map() function. Dave Shine -Original Message- From: elton sky [mailto:eltonsky9...@gmail.com] Sent: Tuesday, May 03, 2011 4:49

Re: Why mergeParts() is not parallel with collect() on map?

2011-05-03 Thread elton sky
Please correct me if I am wrong. One of the important assumptions of hadoop map reduce is: map's output should be smaller than input. So the workload on the reduce side should be smaller than on the map side. That's why we put sort, spill, and merge all on the map side. Reduce just merges sorted output. > However, typic

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-03 Thread Konstantin Shvachko
I think it's a good idea to release hadoop-0.20.203. It moves Apache Hadoop a step forward. Looks like the technical difficulties are resolved now with Arun's latest commits. Being a superset of hadoop-0.20.2, it can be considered based on one of the official Apache releases. I don't think there was

Re: Mapreduce program reports child error

2011-05-03 Thread Arun C Murthy
Moving to mapreduce-dev@; please use the right list for questions. On May 3, 2011, at 12:06 AM, Sudharsan Sampath wrote: Hi, Could anyone point me to a summary on why this error would occur? java.lang.Throwable: Child Error at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:471) Caus

Re: Why mergeParts() is not parallel with collect() on map?

2011-05-03 Thread Arun C Murthy
Elton, On May 2, 2011, at 11:30 PM, elton sky wrote: In the shuffle phase, reduce copies output from the maps. In parallel, InMemoryMerger and OnDiskMerger merge the copied files if there are too many. But on the map side, mergeParts() happens only after collect() has finished. Why don't we parallel spills mer
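
For context on the question being quoted: in the 0.20-era MapTask, spilling already overlaps with collect() (a background spill thread drains the buffer); only the final mergeParts() pass is serial. A sketch of the 0.20-era properties that govern that behavior, with illustrative values:

    import org.apache.hadoop.mapred.JobConf;

    public class SortSpillTuning {
      public static void tune(JobConf conf) {
        // Size of the in-memory collect() buffer; bigger means fewer spills.
        conf.setInt("io.sort.mb", 200);
        // Fraction of the buffer that may fill before a background spill
        // starts, i.e. spilling runs concurrently with collect().
        conf.setFloat("io.sort.spill.percent", 0.80f);
        // How many spill files mergeParts() merges in one pass at the end.
        conf.setInt("io.sort.factor", 100);
      }
    }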

Mapreduce program reports child error

2011-05-03 Thread Sudharsan Sampath
Hi, Could anyone point me to a summary on why this error would occur? java.lang.Throwable: Child Error at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:471) Caused by: java.io.IOException: Task process exit with nonzero status of 1. at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner