Hey Yoonmin,

Unfortunately I agree it's a bit complex, especially because "Block" is sometimes used where "Replica" might be more accurate. If you find any ambiguities like this, I think we'd happily take patches with clarifying comments / javadoc.

The best way to learn is to read the code, but maybe this will help a bit:

- The NameNode uses the BlocksMap to store the block -> datanode locations mapping. This is done by the BlockInfo class, which actually holds the locations of the block's replicas in the triplets array. The map is appropriately managed by the BlockManager.
- BlockInfo is also a GSet.Element, which is used to get the set of BlockInfo on a particular datanode. This is primarily useful when processing block reports.
- LocatedBlock and LocatedBlocks are used in ClientProtocol#getBlockLocations, which clients use to query the block -> datanode mapping. It makes sense to have separate client and server Block representations here, though they aren't the purest.
- INodes are pretty separate from Blocks. BlockInfo has a pointer back to the containing BlockCollection, which can be some type of INode, but that's about all the BlockManager worries about.

Best,
Andrew

On Tue, Oct 15, 2013 at 11:18 PM, Yoonmin Nam <rony...@dgist.ac.kr> wrote:

> When we see the source code of hdfs especially FSNamesystem, there is so
> many block related types are used such as Block, LocatedBLocks,
> BlocksWithLocations. And this makes me very unclear about the system.
>
> In addition, BlocksMap just maps Block and BlockInfo, but Block becomes
> LocatedBlock with DatanodeInfo. With several locateBlock, these become
> LocatedBlocks.
>
> Also, Combining INode related classes with Block related classes makes me
> unhappy.
>
> Is there anyone who let me know about the motto of this kind of complex
> structure of HDFS block management and give more specific and detail
> information?
>
> Thanks!
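
For what it's worth, here is a toy Java sketch of the relationships described in the reply above: a server-side map from block to replica locations, a per-datanode view of the kind a block report would be checked against, a separate client-facing "located block", and a back-pointer from a block to its owning file. This is not HDFS code. ToyBlockInfo, ToyLocatedBlock, ToyBlocksMap, the method names, and the datanode names are all made up for illustration, and the real BlockInfo keeps its locations in the triplets array rather than in a Set.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model only; none of these classes are the real HDFS ones. */
final class BlockMapSketch {

    /** Hypothetical stand-in for BlockInfo: identity, replica locations, owner. */
    static final class ToyBlockInfo {
        final long blockId;
        final String owner; // stand-in for the BlockCollection back-pointer
        final Set<String> replicaNodes = new LinkedHashSet<>();

        ToyBlockInfo(long blockId, String owner) {
            this.blockId = blockId;
            this.owner = owner;
        }
    }

    /** Hypothetical stand-in for LocatedBlock: the client-side view. */
    static final class ToyLocatedBlock {
        final long blockId;
        final List<String> locations;

        ToyLocatedBlock(long blockId, List<String> locations) {
            this.blockId = blockId;
            this.locations = Collections.unmodifiableList(locations);
        }
    }

    /** Hypothetical stand-in for BlocksMap plus the bookkeeping around it. */
    static final class ToyBlocksMap {
        private final Map<Long, ToyBlockInfo> blocks = new HashMap<>();
        private final Map<String, Set<Long>> blocksByNode = new HashMap<>();

        /** Register a block that belongs to the given file. */
        void addBlock(long blockId, String owner) {
            blocks.putIfAbsent(blockId, new ToyBlockInfo(blockId, owner));
        }

        /** Record one replica; returns false if the block is unknown. */
        boolean addReplica(long blockId, String datanode) {
            ToyBlockInfo info = blocks.get(blockId);
            if (info == null) {
                return false;
            }
            info.replicaNodes.add(datanode);
            blocksByNode.computeIfAbsent(datanode, k -> new LinkedHashSet<>())
                        .add(blockId);
            return true;
        }

        /** Per-datanode view: the block ids with a replica on that node. */
        Set<Long> blocksOn(String datanode) {
            return Collections.unmodifiableSet(
                blocksByNode.getOrDefault(datanode, Collections.emptySet()));
        }

        /** Client-facing lookup: copies locations into a separate type. */
        ToyLocatedBlock locate(long blockId) {
            ToyBlockInfo info = blocks.get(blockId);
            if (info == null) {
                return null;
            }
            return new ToyLocatedBlock(blockId, new ArrayList<>(info.replicaNodes));
        }

        /** Follow the back-pointer from a block to its owning file. */
        String ownerOf(long blockId) {
            ToyBlockInfo info = blocks.get(blockId);
            return info == null ? null : info.owner;
        }
    }

    public static void main(String[] args) {
        ToyBlocksMap map = new ToyBlocksMap();
        map.addBlock(1L, "/user/a/file1");
        map.addBlock(2L, "/user/a/file1");
        map.addReplica(1L, "dn1");
        map.addReplica(1L, "dn2");
        map.addReplica(2L, "dn1");
        System.out.println("block 1 replicas: " + map.locate(1L).locations);
        System.out.println("blocks on dn1:    " + map.blocksOn("dn1"));
        System.out.println("owner of block 2: " + map.ownerOf(2L));
    }
}
```

The point of the sketch is only the separation of roles: one block id can have several replicas, the server-side record and the client-side answer are different types, and the file namespace shows up only as an opaque owner reference.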