Ack! The mailing list seems to strip attachments. The throughput/latency 
diagram is available on HDFS-234:

https://issues.apache.org/jira/secure/attachment/12503763/hdfs_tpt_lat.pdf

-Ivan

On 15 Nov 2011, at 17:29, Ivan Kelly wrote:

> Hi guys,
> 
> I've just uploaded a patch to HDFS-234, which contains an implementation of 
> JournalManager for BookKeeper. The code is ready for review, though I plan to 
> add some more tests. The code relies on HDFS-1580, which isn't in trunk yet. 
> The code is also on GitHub if you want to avoid faffing about with multiple 
> patches (https://github.com/ivankelly/hadoop-common/tree/HDFS-234).
> 
> To configure the namenode to use BookKeeper (BK) with this code, put the 
> following in hdfs-site.xml:
> 
> <property>
>   <name>dfs.namenode.edits.dir</name>
>   <value>bookkeeper://[zkEnsemble]/[zkPath]</value>
> </property>
>  
> <property>
>   <name>dfs.namenode.edits.journalPlugin.bookkeeper</name>
>   <value>org.apache.hadoop.hdfs.server.namenode.bkjournal.BookKeeperJournalManager</value>
> </property>
> 
> Here, zkEnsemble is a semicolon-separated[1] list of ZooKeeper servers, and 
> zkPath is the znode path under which the editlog metadata should be stored. 
> For example, if you have three servers, zk1 to zk3, with ZooKeeper listening 
> on port 2181, and you want to store the metadata under /hdfsnn, the URI would 
> be bookkeeper://zk1:2181;zk2:2181;zk3:2181/hdfsnn.
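A minimal sketch of how such a URI could be taken apart, assuming the journal manager splits it at the first slash after the scheme. The class and method names (BkUriSketch, parse, connectString) are hypothetical and not taken from the HDFS-234 patch. The servers are rejoined with commas because a ZooKeeper client connect string has the form host1:port1,host2:port2.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper, not taken from the HDFS-234 patch: splits a
 * bookkeeper://[zkEnsemble]/[zkPath] URI into its parts.
 */
public class BkUriSketch {
    static final String SCHEME = "bookkeeper://";

    /** Parsed form of a bookkeeper:// edits-dir URI. */
    static final class BkUri {
        final List<String> servers; // host:port entries from zkEnsemble
        final String zkPath;        // znode holding the editlog metadata

        BkUri(List<String> servers, String zkPath) {
            this.servers = servers;
            this.zkPath = zkPath;
        }

        /** ZooKeeper clients take a comma-separated connect string. */
        String connectString() {
            return String.join(",", servers);
        }
    }

    static BkUri parse(String uri) {
        if (!uri.startsWith(SCHEME)) {
            throw new IllegalArgumentException("not a bookkeeper URI: " + uri);
        }
        String rest = uri.substring(SCHEME.length());
        int slash = rest.indexOf('/');
        if (slash <= 0 || slash == rest.length() - 1) {
            throw new IllegalArgumentException("need ensemble and path: " + uri);
        }
        // Semicolons, not commas, separate servers so that Hadoop's
        // Configuration does not split the value into several strings.
        List<String> servers = Arrays.asList(rest.substring(0, slash).split(";"));
        return new BkUri(servers, rest.substring(slash));
    }

    public static void main(String[] args) {
        BkUri u = parse("bookkeeper://zk1:2181;zk2:2181;zk3:2181/hdfsnn");
        System.out.println(u.connectString()); // zk1:2181,zk2:2181,zk3:2181
        System.out.println(u.zkPath);          // /hdfsnn
    }
}
```

Splitting only at the first slash after the scheme means a nested znode path (for instance a hypothetical /hdfsnn/nn1) stays whole.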
> 
> I benchmarked this code against an NFS filer, local storage, and a NoPersist 
> implementation of JournalManager that simply discards edits, to get a 
> theoretical maximum. I ran the benchmark using NNThroughputBenchmark to 
> create 100000 ops, and I've attached the resulting graph. It shows that 
> BookKeeper achieves throughput similar to NFS and local storage (very 
> slightly lower). BK's latency is a little higher at first, but once the 
> local disk's cache saturates, BK's latency is lower than the local disk's. 
> The NFS filer has a big chunk of NVRAM, so it maintains low latency until 
> the client saturates.
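To make the NoPersist baseline concrete, here is a minimal single-threaded sketch. EditSink, NoPersistSink and run are hypothetical stand-ins invented for illustration: they are not Hadoop's JournalManager API, and the harness is not NNThroughputBenchmark. Because the sink discards every edit, whatever the harness measures is the cost of the calling path alone, which is what makes such a sink a ceiling for real journals.

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of a "NoPersist" baseline. EditSink is an illustrative
 * stand-in, NOT Hadoop's JournalManager interface, and run() is a toy
 * single-threaded harness, NOT NNThroughputBenchmark.
 */
public class NoPersistSketch {
    /** Illustrative journal contract: append an edit, then make it durable. */
    interface EditSink {
        void write(byte[] edit);
        void sync();
    }

    /** Discards every edit, so timings reflect only the caller's overhead. */
    static final class NoPersistSink implements EditSink {
        long written; // counted only so the discarded work is observable

        @Override public void write(byte[] edit) { written++; }
        @Override public void sync() { /* nothing to flush */ }
    }

    /** Summary of one run. */
    static final class Result {
        final double opsPerSec;
        final double meanLatencyMicros;
        final double p99LatencyMicros;

        Result(double opsPerSec, double meanLatencyMicros, double p99LatencyMicros) {
            this.opsPerSec = opsPerSec;
            this.meanLatencyMicros = meanLatencyMicros;
            this.p99LatencyMicros = p99LatencyMicros;
        }
    }

    static Result run(EditSink sink, int ops) {
        if (ops <= 0) {
            throw new IllegalArgumentException("ops must be positive");
        }
        long[] latencies = new long[ops];
        byte[] edit = new byte[64]; // arbitrary fixed-size payload
        long start = System.nanoTime();
        for (int i = 0; i < ops; i++) {
            long t0 = System.nanoTime();
            sink.write(edit);
            sink.sync(); // an op only counts once its edit is synced
            latencies[i] = System.nanoTime() - t0;
        }
        long elapsedNanos = Math.max(1, System.nanoTime() - start);
        Arrays.sort(latencies);
        double mean = Arrays.stream(latencies).average().orElse(0) / 1_000.0;
        int p99Index = (int) ((ops * 99L + 99) / 100) - 1; // nearest-rank percentile
        double p99 = latencies[p99Index] / 1_000.0;
        return new Result(ops / (elapsedNanos / 1e9), mean, p99);
    }

    public static void main(String[] args) {
        NoPersistSink sink = new NoPersistSink();
        Result r = run(sink, 100_000);
        System.out.printf("%.0f ops/s, mean %.3f us, p99 %.3f us%n",
                r.opsPerSec, r.meanLatencyMicros, r.p99LatencyMicros);
    }
}
```

With a single synchronous caller like this, throughput is roughly the inverse of mean latency; a multi-threaded benchmark can push throughput up while per-op latency rises, which is why plotting the two side by side is informative.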
> 
> -Ivan
> 
> [1] I couldn't use commas, as is usual for ZooKeeper, because Hadoop's 
> Configuration would interpret the value as multiple strings.
