Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-05-02 Thread suresh srinivas
We have been testing federation regularly with MapReduce with yahoo-merge branches. With trunk we missed the contrib (raid). The dependency with project splits has been crazy. Not sure how large changes can keep on top of all these things. I am working on fixing the raid contrib. On Mon, May 2, 2

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-05-02 Thread Todd Lipcon
Apparently this merge wasn't tested against MapReduce trunk at all -- MR trunk has been failing to compile for several days. Please see MAPREDUCE-2465. I attempted to fix it myself but don't have enough background in the new federation code or in RAID. -Todd On Thu, Apr 28, 2011 at 11:30 PM, Kons

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-28 Thread Konstantin Shvachko
Thanks for clarifying, Owen. Should we have the bylaws somewhere on wiki? --Konstantin On Thu, Apr 28, 2011 at 1:33 PM, Owen O'Malley wrote: > On Apr 27, 2011, at 10:12 PM, Konstantin Shvachko wrote: > > > The question is whether this is a > > * Code Change, > > which requires Lazy consensus of

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-28 Thread suresh srinivas
Owen, thanks for clarification. I have attached the patch to the jira HDFS-1052. Please use the jira to cast your vote or post objections. If you have objections please be specific on how I can address it and move forward with this issue. Regards, Suresh On Thu, Apr 28, 2011 at 1:33 PM, Owen O'M

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-28 Thread Owen O'Malley
On Apr 27, 2011, at 10:12 PM, Konstantin Shvachko wrote: > The question is whether this is a > * Code Change, > which requires Lazy consensus of active committers or a > * Adoption of New Codebase, > which needs Lazy 2/3 majority of PMC members This is a code change, just like all of our jiras. T

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-28 Thread suresh srinivas
As Eli suggested, I have uploaded a new patch to the jira. Merging new trunk changes and testing them took several hours! It passes all the tests except for two unit test failures. These failures do not happen on my machine - if they are real failures we will address them after merging the patch to the

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-28 Thread Konstantin Boudnik
+1. Having an open QE process would be a tremendous value-add to the overall quality of the feature. Append was an exemplary development in this sense. Would it be possible to have the Federation test plan (if one exists) published along with the specs on the JIRA (similar to HDFS-265) at least for t

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-27 Thread Konstantin Shvachko
Suresh, Showing no degradation in performance on one-node cluster is a good start for benchmarking. You still have a dev cluster to run benchmarks, don't you? --Konstantin On Wed, Apr 27, 2011 at 2:36 PM, suresh srinivas wrote: > I ran these tests on my laptop. I would like to use this data to em

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-27 Thread Konstantin Shvachko
Owen, The question is whether this is a * Code Change, which requires Lazy consensus of active committers or a * Adoption of New Codebase, which needs Lazy 2/3 majority of PMC members Lazy consensus requires 3 binding +1 votes and no binding vetoes. If I am looking at the current bylaws, then it

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-27 Thread Konstantin Shvachko
Yes, I can talk about append as an example. Some differences with the federation project are: - append had a comprehensive test plan document, which was designed and executed; - append was independently evaluated by HBase guys; - it introduced a new benchmark for append; - We ran both DFSIO and NNThroughp

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-27 Thread suresh srinivas
Thanks Eli. The merge of the latest changes in trunk is not straightforward. I will get it done tonight and post a new patch. That means the earliest the merge can happen is tomorrow. On Wed, Apr 27, 2011 at 2:36 PM, Eli Collins wrote: > Hey Suresh, > > Do you plan to update the patch on HDFS-1052

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-27 Thread suresh srinivas
If there are no further issues by tonight, I will merge the branch into trunk. Regards, Suresh On Wed, Apr 27, 2011 at 1:53 PM, Owen O'Malley wrote: > On Apr 26, 2011, at 11:34 PM, suresh srinivas wrote: > > >> 2. I assume that merging requires a vote. I am sure people who know > bylaws > >> be

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-27 Thread Eli Collins
Hey Suresh, Do you plan to update the patch on HDFS-1052 soon? Trunk has moved on a little bit since the last patch. I assume we vote on the patch there. I think additional review feedback (beyond what's already been done) can be handled after the code is merged, I know what a pain it is to keep

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-27 Thread suresh srinivas
I ran these tests on my laptop. I would like to use this data to emphasize that there is no regression in performance. I am not sure with just the tests that I ran we could conclude there is a huge gain in performance with federation. When our performance test team runs tests at scale we will get m

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-27 Thread Owen O'Malley
On Apr 26, 2011, at 11:34 PM, suresh srinivas wrote: >> 2. I assume that merging requires a vote. I am sure people who know bylaws >> better than I do will correct me if it is not true. >> Did I miss the vote? >> > > > As regards to voting, since I was not sure about the procedure, I had > cons

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-27 Thread Hairong
Nice performance data! The federation branch definitely adds code complexity to HDFS, but this is a long-awaited feature to improve HDFS scalability and is a step forward to separating the namespace management from the storage management. I am for merging this to trunk. Hairong On 4/27/11 10:02 AM

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-27 Thread Konstantin Boudnik
Interesting... while the read performance has only marginally improved <4% (still a good thing) the write performance shows significantly better improvements >10%. Very interesting asymmetry, indeed. Suresh, what was the size of the cluster in the testing? Cos On Wed, Apr 27, 2011 at 10:02, sur

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-27 Thread Devaraj Das
Good to see the performance improvements with federation. Curious to know whether it is because of the associated refactoring? On 4/27/11 10:02 AM, "suresh srinivas" wrote: I posted the TestDFSIO comparison with and without federation to HDFS-1052. Please let me know if it addresses your conce

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-27 Thread Tsz Wo (Nicholas), Sze
@hadoop.apache.org Sent: Wed, April 27, 2011 10:02:32 AM Subject: Re: [Discuss] Merge federation branch HDFS-1052 into trunk I posted the TestDFSIO comparison with and without federation to HDFS-1052. Please let me know if it addresses your concern. I am also adding it here: TestDFSIO read tests *Without

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-27 Thread suresh srinivas
I posted the TestDFSIO comparison with and without federation to HDFS-1052. Please let me know if it addresses your concern. I am also adding it here: TestDFSIO read tests *Without federation:* - TestDFSIO - : read Date & time: Wed Apr 27 02:04:24 PDT 2011 Number of files

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-27 Thread Sanjay Radia
On Apr 26, 2011, at 10:40 PM, Konstantin Boudnik wrote: Oops, the message came out garbled. I meant to say I assume the outlined changes won't prevent an earlier version of HDFS from upgrades to the federation version, right? Yes absolutely. We have tested upgrades. Besides our ops will

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-26 Thread suresh srinivas
Konstantin, Could you provide me a link to how this was done on a big feature, like say append and how benchmark info was captured? I am planning to run dfsio tests, btw. Regards, Suresh On Tue, Apr 26, 2011 at 11:34 PM, suresh srinivas wrote: > Konstantin, > > On Tue, Apr 26, 2011 at 10:26 PM, K

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-26 Thread suresh srinivas
Konstantin, On Tue, Apr 26, 2011 at 10:26 PM, Konstantin Shvachko wrote: > Suresh, Sanjay. > > 1. I asked for benchmarks many times over the course of different > discussions on the topic. > I don't see any numbers attached to jira, and I was getting the same > response, > Doug just got from you,

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-26 Thread suresh srinivas
Upgrades from earlier versions are supported. The existing configuration should run without any change. On Tue, Apr 26, 2011 at 10:40 PM, Konstantin Boudnik wrote: > Oops, the message came out garbled. I meant to say > > I assume the outlined changes won't prevent an earlier version of HDFS from >

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-26 Thread Konstantin Boudnik
Oops, the message came out garbled. I meant to say I assume the outlined changes won't prevent an earlier version of HDFS from upgrades to the federation version, right? Thanks in advance, Cos On Tue, Apr 26, 2011 at 17:59, Konstantin Boudnik wrote: > Sanjay, > > I assume the outlined changes

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-26 Thread Konstantin Shvachko
Dhruba, It would be very valuable for the community to share your experience if you performed any independent testing of the federation branch. Thanks, --Konstantin On Tue, Apr 26, 2011 at 9:27 PM, Dhruba Borthakur wrote: > I feel that making the datanode talk to multiple namenodes is very > v

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-26 Thread Konstantin Shvachko
Suresh, Sanjay. 1. I asked for benchmarks many times over the course of different discussions on the topic. I don't see any numbers attached to jira, and I was getting the same response, Doug just got from you, guys: which is "why would the performance be worse". And this is not an argument for me

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-26 Thread Tsz Wo (Nicholas), Sze
Agree. It is a step forward to distributed namespace. Regards, Nicholas From: Dhruba Borthakur To: hdfs-dev@hadoop.apache.org Cc: sra...@yahoo-inc.com; Doug Cutting Sent: Wed, April 27, 2011 12:27:30 AM Subject: Re: [Discuss] Merge federation branch HDFS

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-26 Thread Doug Cutting
Suresh, Sanjay, Thank you very much for addressing my questions. Cheers, Doug On 04/26/2011 10:29 AM, suresh srinivas wrote: > Doug, > > >> 1. Can you please describe the significant advantages this approach has >> over a symlink-based approach? > > Federation is complementary with symlink a

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-26 Thread Dhruba Borthakur
I feel that making the datanode talk to multiple namenodes is very valuable, especially when there is plenty of storage available on a single datanode machine (think 24 TB to 36 TB) and a single namenode does not have enough memory to hold all file metadata for such a large cluster in memory. This

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-26 Thread Konstantin Boudnik
Sanjay, I assume the outlined changes won't an earlier version of HDFS from upgrads to the federation version, right? Cos On Tue, Apr 26, 2011 at 17:26, Sanjay Radia wrote: > > Changes to the code base >  - The fundamental code change is to extend the notion of block id to now > include a block

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-26 Thread Sanjay Radia
On Apr 25, 2011, at 2:36 PM, Doug Cutting wrote: A couple of questions: 1. Can you please describe the significant advantages this approach has over a symlink-based approach? It seems to me that one could run multiple namenodes on separate boxes and run multiple datanode processes per storage

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-26 Thread suresh srinivas
Doug, please reply back. I am planning to commit this by tonight, as I would like to avoid unnecessary merge work and also avoid having to redo the merge if SVN is re-organized. On Tue, Apr 26, 2011 at 10:29 AM, suresh srinivas wrote: > Doug, > > >> 1. Can you please describe the significant adva

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-26 Thread suresh srinivas
Doug, > 1. Can you please describe the significant advantages this approach has > over a symlink-based approach? Federation is complementary to the symlink approach. You could choose to provide an integrated namespace using symlinks. However, client-side mount tables seem a better approach for many

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-25 Thread Doug Cutting
On 04/22/2011 09:48 AM, Suresh Srinivas wrote: > A few weeks ago, I had sent an email about the progress of HDFS > federation development in HDFS-1052 branch. I am happy to announce > that all the tasks related to this feature development are complete > and it is ready to be integrated into trunk.

Re: [Discuss] Merge federation branch HDFS-1052 into trunk

2011-04-23 Thread Dhruba Borthakur
Given that we will be re-organizing the svn tree very soon and the fact that the design and most of the implementation is complete, let's merge it into trunk! -dhruba On Fri, Apr 22, 2011 at 9:48 AM, Suresh Srinivas wrote: > A few weeks ago, I had sent an email about the progress of HDFS federat