Nice performance data! The federation branch definitely adds code complexity to HDFS, but this is a long-awaited feature that improves HDFS scalability and is a step toward separating namespace management from storage management. I am for merging this to trunk.

Hairong

On 4/27/11 10:02 AM, "suresh srinivas" <srini30...@gmail.com> wrote:

>I posted the TestDFSIO comparison with and without federation to HDFS-1052.
>Please let me know if it addresses your concern. I am also adding it here:
>
>TestDFSIO read tests
>*Without federation:*
>----- TestDFSIO ----- : read
>           Date & time: Wed Apr 27 02:04:24 PDT 2011
>       Number of files: 1000
>Total MBytes processed: 30000.0
>     Throughput mb/sec: 43.62329251162561
>Average IO rate mb/sec: 44.619869232177734
> IO rate std deviation: 5.060306158158443
>    Test exec time sec: 959.943
>
>*With federation:*
>----- TestDFSIO ----- : read
>           Date & time: Wed Apr 27 02:43:10 PDT 2011
>       Number of files: 1000
>Total MBytes processed: 30000.0
>     Throughput mb/sec: 45.657513857055456
>Average IO rate mb/sec: 46.72107696533203
> IO rate std deviation: 5.455125923399539
>    Test exec time sec: 924.922
>
>TestDFSIO write tests
>*Without federation:*
>----- TestDFSIO ----- : write
>           Date & time: Wed Apr 27 01:47:50 PDT 2011
>       Number of files: 1000
>Total MBytes processed: 30000.0
>     Throughput mb/sec: 35.940755259031015
>Average IO rate mb/sec: 38.236236572265625
> IO rate std deviation: 5.929484960036511
>    Test exec time sec: 1266.624
>
>*With federation:*
>----- TestDFSIO ----- : write
>           Date & time: Wed Apr 27 02:27:12 PDT 2011
>       Number of files: 1000
>Total MBytes processed: 30000.0
>     Throughput mb/sec: 42.17884674597227
>Average IO rate mb/sec: 43.11423873901367
> IO rate std deviation: 5.357057259968647
>    Test exec time sec: 1135.298
>
>
>On Tue, Apr 26, 2011 at 11:55 PM, suresh srinivas <srini30...@gmail.com> wrote:
>
>> Konstantin,
>>
>> Could you provide me link to how this was done on a big feature, like say
>> append and how benchmark info was captured? I am planning to run dfsio
>> tests, btw.
>>
>> Regards,
>> Suresh
>>
>>
>> On Tue, Apr 26, 2011 at 11:34 PM, suresh srinivas <srini30...@gmail.com> wrote:
>>
>>> Konstantin,
>>>
>>> On Tue, Apr 26, 2011 at 10:26 PM, Konstantin Shvachko <shv.had...@gmail.com> wrote:
>>>
>>>> Suresh, Sanjay.
>>>>
>>>> 1. I asked for benchmarks many times over the course of different
>>>> discussions on the topic.
>>>> I don't see any numbers attached to jira, and I was getting the same
>>>> response Doug just got from you guys, which is "why would the
>>>> performance be worse".
>>>> And this is not an argument for me.
>>>>
>>>
>>> We had done testing earlier and had found that performance had not
>>> degraded. We are waiting for our performance team to publish the official
>>> numbers so we can post them to the jira. Unfortunately they are busy
>>> qualifying 2xx releases currently. I will get the perf numbers and post
>>> them.
>>>
>>>
>>>>
>>>> 2. I assume that merging requires a vote. I am sure people who know
>>>> bylaws better than I do will correct me if it is not true.
>>>> Did I miss the vote?
>>>>
>>>
>>>
>>> As regards voting, since I was not sure about the procedure, I had
>>> consulted Owen about it. He had indicated that voting is not necessary. If
>>> the right procedure is to call for voting, I will do so. Owen, any comments?
>>>
>>>
>>>>
>>>> It feels like you are rushing this and are not doing what you would
>>>> expect others to do in the same position, and what has been done in the
>>>> past for such large projects.
>>>>
>>>
>>> I am not trying to rush here and skip the required procedure. I am
>>> not sure about what the procedure is. Any pointers to it are appreciated.
>>>
>>>
>>>>
>>>> Thanks,
>>>> --Konstantin
>>>>
>>>>
>>>> On Tue, Apr 26, 2011 at 9:43 PM, Doug Cutting <cutt...@apache.org> wrote:
>>>>
>>>> > Suresh, Sanjay,
>>>> >
>>>> > Thank you very much for addressing my questions.
>>>> >
>>>> > Cheers,
>>>> >
>>>> > Doug
>>>> >
>>>> > On 04/26/2011 10:29 AM, suresh srinivas wrote:
>>>> > > Doug,
>>>> > >
>>>> > >
>>>> > >> 1. Can you please describe the significant advantages this approach
>>>> > >> has over a symlink-based approach?
>>>> > >
>>>> > > Federation is complementary with symlink approach. You could choose to
>>>> > > provide integrated namespace using symlinks. However, client side mount
>>>> > > tables seem a better approach for many reasons:
>>>> > > # Unlike symbolic links, client side mount tables can choose to go to
>>>> > > the right namenode based on configuration. This avoids unnecessary RPCs
>>>> > > to the namenodes to discover the target of symlink.
>>>> > > # The unavailability of a namenode where a symbolic link is configured
>>>> > > does not affect reaching the symlink target.
>>>> > > # Symbolic links need not be configured on every namenode in the
>>>> > > cluster and future changes to symlinks need not be propagated to
>>>> > > multiple namenodes. In client side mount tables, this information is in
>>>> > > a central configuration.
>>>> > >
>>>> > > If a deployment still wants to use symbolic link, federation does not
>>>> > > preclude it.
>>>> > >
>>>> > >> It seems to me that one could run multiple namenodes on separate
>>>> > >> boxes and run multiple datanode processes per storage box
>>>> > >
>>>> > > There are several advantages to using a single datanode:
>>>> > > # When you have large number of namenodes (say 20), the cost of running
>>>> > > separate datanodes in terms of process resources such as memory is huge.
>>>> > > # The disk i/o management and storage utilization using a single
>>>> > > datanode is much better, as it has complete view of the storage.
>>>> > > # In the approach you are proposing, you have several clusters to manage.
>>>> > > However with federation, all datanodes are in a single cluster, with a
>>>> > > single configuration, and are operationally easier to manage.
>>>> > >
>>>> > >> The patch modifies much of the logic of Hadoop's central component,
>>>> > >> upon which the performance and reliability of most other components of
>>>> > >> the ecosystem depend.
>>>> > > That is not true.
>>>> > >
>>>> > > # Namenode is mostly unchanged in this feature.
>>>> > > # Read/write pipelines are unchanged.
>>>> > > # The changes are mainly in datanode:
>>>> > > #* the storage, FSDataset, Directory and Disk scanners now have another
>>>> > > level to incorporate block pool ID into the hierarchy. This is not a
>>>> > > significant change that should cause performance or stability concerns.
>>>> > > #* datanodes use a separate thread per NN, just like the existing thread
>>>> > > that communicates with NN.
>>>> > >
>>>> > >> Can you please tell me how this has been tested beyond unit tests?
>>>> > > As regards testing, we have passed 600+ tests. In hadoop, these tests
>>>> > > are mostly integration tests and not pure unit tests.
>>>> > >
>>>> > > While these tests have been extensive, we have also been testing this
>>>> > > branch for the last 4 months, with QA validation that reflects our
>>>> > > production environment. We have found the system to be stable and
>>>> > > performing well, and have not found any blockers with the branch so far.
>>>> > >
>>>> > > HDFS-1052 has been open more than a year now. I had also sent an email
>>>> > > about this merge around 2 months ago. There are 90 subtasks that have
>>>> > > been worked on in the last couple of months under HDFS-1052. Given that
>>>> > > there was enough time to ask these questions, your email a day before I
>>>> > > am planning to merge the branch into trunk seems late!
>>>> > >
>>>> >
>>>>
>>>
>>>
>>>
>>> --
>>> Regards,
>>> Suresh
>>>
>>>
>>
>>
>> --
>> Regards,
>> Suresh
>>
>>
>
>
>--
>Regards,
>Suresh
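
For anyone who wants to compare the TestDFSIO throughput figures quoted in this thread, here is a minimal sketch in Python that expresses the with-federation numbers as a percent change over the without-federation baseline. The throughput values are copied from the runs quoted above; the names `RESULTS`, `percent_change` and `summarize` are made up for this illustration and are not part of TestDFSIO or Hadoop.

```python
# Throughput (MB/s) copied from the TestDFSIO output quoted in this thread.
# The dictionary layout and function names are illustrative assumptions.
RESULTS = {
    "read": {"without": 43.62329251162561, "with": 45.657513857055456},
    "write": {"without": 35.940755259031015, "with": 42.17884674597227},
}


def percent_change(baseline, candidate):
    """Return the relative change of candidate over baseline, in percent."""
    return (candidate - baseline) / baseline * 100.0


def summarize(results):
    """Map each operation to its throughput change with federation enabled."""
    return {
        op: percent_change(values["without"], values["with"])
        for op, values in results.items()
    }


if __name__ == "__main__":
    for op, change in summarize(RESULTS).items():
        print(f"{op}: {change:+.1f}% throughput with federation")
```

Each configuration was run once here, so a positive change is probably best read as "no regression observed" and not as a measured speedup.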