BookKeeper Journal Manager for Namenode

Ivan Kelly Tue, 15 Nov 2011 08:30:39 -0800

Hi guys,

I've just uploaded a patch to HDFS-234 which contains an implementation of 
JournalManager for BookKeeper. The code is ready for review, though I plan to 
add some more tests. The code relies on HDFS-1580 which isn't in trunk yet. The 
code is on github if you want to avoid faffing about with multiple patches. 
(https://github.com/ivankelly/hadoop-common/tree/HDFS-234)


To configure the namenode to use BK with this code, put the following in 
hdfs-site.xml:

<property>
  <name>dfs.namenode.edits.dir</name>
  <value>bookkeeper://[zkEnsemble]/[zkPath]</value>
</property>
 
<property>
  <name>dfs.namenode.edits.journalPlugin.bookkeeper</name>
  <value>org.apache.hadoop.hdfs.server.namenode.bkjournal.BookKeeperJournalManager</value>
</property>

Here zkEnsemble is a semicolon-separated[1] list of ZooKeeper servers, and 
zkPath is the znode path under which the editlog metadata should be stored. For 
example, if you have 3 servers, zk1-3, with ZooKeeper listening on port 2181, 
and you want to store the metadata under /hdfsnn, the URI would be 
bookkeeper://zk1:2181;zk2:2181;zk3:2181/hdfsnn.
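
To make the URI layout concrete, here is a minimal sketch of how such a URI 
could be split into a ZooKeeper connect string and a znode path. The class and 
method names (BkJournalUri, zkConnectString, zkPath) are made up for 
illustration and are not taken from the patch; the sketch only assumes the 
format described above and uses nothing beyond java.net.URI.

```java
import java.net.URI;

// Hypothetical helper, not part of the patch: splits a
// bookkeeper://[zkEnsemble]/[zkPath] URI into its two parts.
final class BkJournalUri {

    // Returns the ensemble in the comma-separated form ZooKeeper clients
    // usually take, by swapping the ';' separators back to ','.
    static String zkConnectString(URI uri) {
        String authority = uri.getAuthority();
        if (!"bookkeeper".equals(uri.getScheme()) || authority == null) {
            throw new IllegalArgumentException("Not a bookkeeper:// URI: " + uri);
        }
        return authority.replace(';', ',');
    }

    // Returns the znode path under which the editlog metadata would live.
    static String zkPath(URI uri) {
        return uri.getPath();
    }

    public static void main(String[] args) {
        URI uri = URI.create("bookkeeper://zk1:2181;zk2:2181;zk3:2181/hdfsnn");
        System.out.println(zkConnectString(uri));
        System.out.println(zkPath(uri));
    }
}
```

For the example URI above, this should give zk1:2181,zk2:2181,zk3:2181 as the 
connect string and /hdfsnn as the path.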

I benchmarked this code against an NFS filer, local storage, and a NoPersist 
implementation of JournalManager, which simply discards edits, to get a 
theoretical max. I ran the benchmark using NNThroughputBenchmark to create 
100000 ops. I've attached the graph generated. The graph shows that BookKeeper 
sees similar throughput to NFS and local file (very slightly lower). BK's 
latency is a little higher, but once the disk cache for the local disk 
saturates, BK's latency is lower. The NFS filer has a big chunk of NVRAM, so it 
maintains low latency until the client saturates.


-Ivan

[1] I couldn't use a comma, as is usually done for ZooKeeper, because Hadoop's 
Configuration would interpret this as multiple strings.
