New DataNode Implementation

David Mollitor Fri, 23 Aug 2019 07:42:58 -0700

Hello Gang,

Over the past few months, I've been dedicating my spare time to reviewing
the DataNode project.  I've had several patches accepted over the past year but
found working with the code base very challenging.  It motivated me to
create a new DataNode, backed by Spring Boot.


I also wrote the DataTransfer protocol into a Netty-backed service.  I
learned a lot about the protocol and took a bunch of notes that, if
implemented, could streamline the protocol quite a bit and make it much
more Netty (async+protobuf) friendly.

The other motivation is that I wanted a way to better utilize all of the
drives in the node.  In a standard cluster installation, the OS gets two
disks dedicated to it (RAID-1).  In a cluster with 500 nodes, that is 1,000
drives dedicated to the OS, with much of their capacity wasted.  I have
designed this DataNode
to better utilize this space by storing the block metadata on these primary
drives as well, freeing up more space on the data drives for block data.

I have elected to use LevelDB to store the block metadata.  This turns out
to be quite handy because it can also be used to store all the other
metadata a DataNode generates... DataNode UUID, Volume Metadata, Namespace
info, and the rest.  Having a single metadata repository greatly simplifies
the design and since it is on a RAID drive, it can be assumed that it is
always available (if the OS drives are both dead, the entire node will be
dead anyway).  It also could remove a lot of work from the NameNode.  For
example, the volume location of each block is tracked in LevelDB.
Therefore, it is not necessary for the NameNode to track volumes.  The only
information that the NameNode needs to track is the DataNode URI, block pool
ID, block ID, and generation stamp.  The client can request the block from
the DataNode without concern for the specific volume it is on.
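
To make the idea concrete, here is a minimal sketch of a block-to-volume
index.  It is not the actual springdn schema: the class name, the key layout
(block pool ID, a zero byte, block ID, generation stamp), and the volume
paths are all assumptions made for illustration, and a sorted in-memory map
stands in for LevelDB so the example runs without extra dependencies.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeMap;

// Hypothetical sketch: maps (block pool ID, block ID, generation stamp)
// to the volume holding the block. A TreeMap stands in for LevelDB.
final class BlockIndexSketch {

    // LevelDB's default comparator orders keys as unsigned bytes;
    // Arrays.compareUnsigned (Java 9+) gives the same ordering here.
    private static final Comparator<byte[]> UNSIGNED = Arrays::compareUnsigned;

    private final TreeMap<byte[], String> store = new TreeMap<>(UNSIGNED);

    // Assumed key layout: UTF-8 pool ID, 0x00 separator, then two
    // big-endian longs (block ID, generation stamp).
    static byte[] key(String blockPoolId, long blockId, long genStamp) {
        byte[] pool = blockPoolId.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(pool.length + 1 + 2 * Long.BYTES)
                .put(pool)
                .put((byte) 0)
                .putLong(blockId)
                .putLong(genStamp)
                .array();
    }

    // Record which volume a block replica was written to.
    void put(String blockPoolId, long blockId, long genStamp, String volume) {
        store.put(key(blockPoolId, blockId, genStamp), volume);
    }

    // Resolve the volume locally; returns null if the block is unknown.
    String volumeOf(String blockPoolId, long blockId, long genStamp) {
        return store.get(key(blockPoolId, blockId, genStamp));
    }
}
```

With something like this, a read request carrying only the block pool ID,
block ID, and generation stamp can be resolved to a volume entirely on the
DataNode, e.g. index.volumeOf("BP-1", 1001L, 1L), which is why the NameNode
would not need to know about volumes at all.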

What I've put together is currently at version 0.0.0.5.  It will work
(tested on a three-node cluster with terasort/teragen/teravalidate) but it
is pretty rough and does not implement even 20% of the functionality of the
reference Apache DataNode.  Also note that without a detailed reference
guide of the protocol involved, I've had to do a bunch of reverse
engineering.  So, this DataNode may be communicating in such a way that the
cluster doesn't reject it, but using bogus values for some of the fields.

Please check it out.

https://github.com/belugabehr/springdn

Thanks!
