Why mergeParts() is not parallel with collect() on map?

2011-05-02 Thread elton sky
In the shuffle phase, the reduce side copies output from the maps. In parallel, InMemoryMerger and OnDiskMerger merge the copied files if there are too many. But on the map side, mergeParts() happens only after collect() has finished. Why don't we run spill merging in parallel with collect()/sort&spill on the map side? -Elton
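To make the question concrete, here is a minimal sketch of what merging spills in parallel with collection could look like. It is not Hadoop's MapTask code and does not explain why Hadoop merges only at the end. The names run_map, background_merger, and MERGE_FACTOR are hypothetical, and in-memory lists stand in for spill files on disk.

```python
import heapq
import queue
import threading

MERGE_FACTOR = 3  # hypothetical threshold: merge once this many spills are pending


def background_merger(spills, merged_runs):
    """Merge sorted spills in batches while the collector keeps producing."""
    pending = []
    while True:
        spill = spills.get()
        if spill is None:  # sentinel: the collector has finished
            break
        pending.append(spill)
        if len(pending) >= MERGE_FACTOR:
            merged_runs.append(list(heapq.merge(*pending)))
            pending = []
    if pending:  # merge whatever is left after the last spill
        merged_runs.append(list(heapq.merge(*pending)))


def run_map(records, buffer_size):
    """Collect records, spill sorted buffers, and merge spills concurrently."""
    spills = queue.Queue()
    merged_runs = []
    merger = threading.Thread(target=background_merger, args=(spills, merged_runs))
    merger.start()

    buffer = []
    for record in records:  # stands in for collect()
        buffer.append(record)
        if len(buffer) == buffer_size:
            spills.put(sorted(buffer))  # stands in for sort & spill
            buffer = []
    if buffer:
        spills.put(sorted(buffer))

    spills.put(None)  # tell the merger that no more spills will arrive
    merger.join()
    # Final pass over fewer, larger runs (the role mergeParts() plays today).
    return list(heapq.merge(*merged_runs))
```

In this sketch the final pass only has to combine the already-merged runs, so part of the merge cost overlaps with collection. Whether that pays off in practice depends on disk and CPU contention with the running map function, which the sketch does not model.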

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Konstantin Boudnik
On Mon, May 2, 2011 at 22:18, Arun C Murthy wrote: > > On May 2, 2011, at 9:43 PM, Konstantin Boudnik wrote: >> >> I have looked somewhat more into these two JIRAs and if I remember >> correctly >> this fix causes a rolling port side effect in TT and it has been reverted >> in >> 0.20.200 (Y! Fred

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Arun C Murthy
On May 2, 2011, at 9:43 PM, Konstantin Boudnik wrote: I have looked somewhat more into these two JIRAs and if I remember correctly this fix causes a rolling port side effect in TT and it has been reverted in 0.20.200 (Y! Fred? release) because Ops weren't happy about this (I am sure you ca

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Konstantin Boudnik
On Mon, May 2, 2011 at 16:56, Arun C Murthy wrote: > > On May 2, 2011, at 3:01 PM, Arun C Murthy wrote: > > >> On May 2, 2011, at 12:31 PM, Tom White wrote: >> >>> I just did a quick search, and these are the JIRAs that are in 0.20.2 >>> but appear not to be in 0.20.203.0. >>> >> >> Thanks Tom. >

Build failed in Jenkins: Hadoop-0.20.203-Build #4

2011-05-02 Thread Apache Jenkins Server
See -- [...truncated 4511 lines...] [exec] If you ever happen to want to link against installed libraries [exec] in a given directory, LIBDIR, you must either use libtool, and [ex

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Nigel Daley
On May 2, 2011, at 9:07 AM, Owen O'Malley wrote: > > On May 1, 2011, at 8:52 PM, Nigel Daley wrote: > >> I would like to see CI setup on this branch before we release anything from >> it. I've copied the 0.20 build config and tried running it on this branch, >> but getting a native compile f

[jira] [Created] (HADOOP-7255) Performance regression bug caused by locking code

2011-05-02 Thread T Jake Luciani (JIRA)
Performance regression bug caused by locking code - Key: HADOOP-7255 URL: https://issues.apache.org/jira/browse/HADOOP-7255 Project: Hadoop Common Issue Type: Bug Affects Versions: 0.20.203

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Ian Holsman
On May 3, 2011, at 9:58 AM, Arun C Murthy wrote: >> >> Owen, Suresh and I have committed everything on this list except >> HADOOP-6386 and HADOOP-6428. Not sure which of the two are relevant/ >> necessary, I'll check with Cos. Other than that hadoop-0.20.203 now a >> superset of hadoop-0.20.2.

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Arun C Murthy
On May 2, 2011, at 4:56 PM, Arun C Murthy wrote: On May 2, 2011, at 3:01 PM, Arun C Murthy wrote: On May 2, 2011, at 12:31 PM, Tom White wrote: I just did a quick search, and these are the JIRAs that are in 0.20.2 but appear not to be in 0.20.203.0. Thanks Tom. I did a quick analysis:

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Arun C Murthy
On May 2, 2011, at 3:01 PM, Arun C Murthy wrote: On May 2, 2011, at 12:31 PM, Tom White wrote: I just did a quick search, and these are the JIRAs that are in 0.20.2 but appear not to be in 0.20.203.0. Thanks Tom. I did a quick analysis: # Remaining for 0.20.203 * HADOOP-5611 * HADOOP-56

[jira] [Reopened] (HADOOP-7227) Remove protocol version check at proxy creation in Hadoop RPC.

2011-05-02 Thread Todd Lipcon (JIRA)
[ https://issues.apache.org/jira/browse/HADOOP-7227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Todd Lipcon reopened HADOOP-7227: - > Remove protocol version check at proxy creation in Hadoop RPC. > -

[jira] [Resolved] (HADOOP-7170) Support UGI in FileContext API

2011-05-02 Thread Jitendra Nath Pandey (JIRA)
[ https://issues.apache.org/jira/browse/HADOOP-7170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jitendra Nath Pandey resolved HADOOP-7170. -- Resolution: Duplicate This is duplicate of HADOOP-7171. > Support UGI in File

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Arun C Murthy
On May 2, 2011, at 12:31 PM, Tom White wrote: I just did a quick search, and these are the JIRAs that are in 0.20.2 but appear not to be in 0.20.203.0. Thanks Tom. I did a quick analysis: # Remaining for 0.20.203 * HADOOP-5611 * HADOOP-5612 * HADOOP-5623 * HDFS-596 * HDFS-723 * HDFS-73

Re: Discussions - Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Eli Collins
Hey Eric, I don't have any objections to a release from branch-0.20-security-203. However when I examined the specific patch set I noticed there are important implications with respect to compatibility (of for 0.20.2 and 0.22), a question about project model (eg not reviewing patches on jira before

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Stack
How hard would it be to get the patches Tom lists below into branch-0.20-security-203? I'd think it'd be an easier sell if it were a superset of all in 0.20, especially since it bears its name. Otherwise, glad to see the release candidate. St.Ack

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Arun C Murthy
Doug, On May 2, 2011, at 10:58 AM, Doug Cutting wrote: The patch selection process for this branch did not appear to be a community process. A massive patch set was committed en masse with no public discussion before or after about its specific composition. Let's review: # You proposed to rel

Re: Discussions - Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Eric Baldeschwieler
Hi folks, This strikes me as a bit odd. I think we have already discussed this at length and agreed that a release could proceed. Since then, Arun and Owen have worked actively to incorporate community feedback into this release. All parties making Hadoop releases other than Apache have al

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Tom White
On Mon, May 2, 2011 at 12:16 PM, Eli Collins wrote: > On Fri, Apr 29, 2011 at 4:09 PM, Owen O'Malley wrote: >> I think everything is ready to go on the 0.20.203.0 release. It includes >> security and a lot of improvements in the capacity scheduler and JobTracker. >> >> Should we release http://p

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Eli Collins
On Fri, Apr 29, 2011 at 4:09 PM, Owen O'Malley wrote: > I think everything is ready to go on the 0.20.203.0 release. It includes > security and a lot of improvements in the capacity scheduler and JobTracker. > > Should we release http://people.apache.org/~omalley/hadoop-0.20.203.0-rc0/? > Based

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Doug Cutting
On 05/02/2011 11:37 AM, Alan Gates wrote: > From the viewpoint of a downstream user, I'd like to see this released. > Right now Hive 0.7 and soon HCatalog 0.1 have to depend on a Cloudera > distribution because they need security. Having Apache products depend > on 3rd party distributions of Apac

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Alan Gates
From the viewpoint of a downstream user, I'd like to see this released. Right now Hive 0.7 and soon HCatalog 0.1 have to depend on a Cloudera distribution because they need security. Having Apache products depend on 3rd party distributions of Apache products is bogus. The sooner this is

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Doug Cutting
On 04/29/2011 04:09 PM, Owen O'Malley wrote: > I think everything is ready to go on the 0.20.203.0 release. It > includes security and a lot of improvements in the capacity scheduler > and JobTracker. This does not appear to include the 0.20-append work? So it's not advisable to use HBase with th

Re: [VOTE] Release candidate 0.20.203.0-rc0

2011-05-02 Thread Owen O'Malley
On May 1, 2011, at 8:52 PM, Nigel Daley wrote: > I would like to see CI setup on this branch before we release anything from > it. I've copied the 0.20 build config and tried running it on this branch, > but getting a native compile failure: > https://builds.apache.org/hudson/view/G-L/view/Ha