Sorry "But I found the namenode is fair to process the invalidating for each datanode." should be:
"But I found the namenode is unfair to process the invalidating for each datanode." On Fri, Mar 27, 2009 at 3:49 PM, schubert zhang <zson...@gmail.com> wrote: > Thanks Samuel, > Your information is very correct. > I have also read code about garbage collection of invalidating blocks. > > But I found the namenode is fair to process the invalidating for each > datanode. > In my cluster, there are 5 datanode. The storage IDs are: > > node1: DS- 978762906-10.24.1.12-50010-1237686434530 > node2: DS- 489086185-10.24.1.14-50010-1237686416330 > node3: DS-1170985665-10.24.1.16-50010-1237686426395 > node4: DS-1024388083-10.24.1.18-50010-1237686404482 > node5: DS-2136798339-10.24.1.20-50010-1237686444430 > I know the storage ID is generated > by org.apache.hadoop.hdfs.server.datanode.DataNode.setNewStorageID(...). > > In org.apache.hadoop.hdfs.server.namenode.FSNamesystem > > // Keeps a Collection for every named machine containing > // blocks that have recently been invalidated and are thought to live > // on the machine in question. > // Mapping: StorageID -> ArrayList<Block> > // > private Map<String, Collection<Block>> recentInvalidateSets = > new TreeMap<String, Collection<Block>>(); > > In org.apache.hadoop.hdfs.server.namenode.FSNamesystem.ReplicationMonitor > This thread run in interval: replicationRecheckInterval=3000 milliseconds. > > Into computeDatanodeWork() > nodesToProcess = 2. > > then into computeInvalidateWork(nodesToProcess) > the for cycle will only exec 2 cycles. > > for each cycle, go into invalidateWorkForOneNode() > it will always get the first node to invalidate blocks on this node. > String firstNodeId = recentInvalidateSets.keySet().iterator().next(); > > TreeMap is a stored map, so, the ketSet is: > [1024388083-10.24.1.18-50010-1237686404482, > 1170985665-10.24.1.16-50010-1237686426395, > 2136798339-10.24.1.20-50010-1237686444430, > 489086185-10.24.1.14-50010-1237686416330, > 978762906-10.24.1.12-50010-1237686434530] > > So, the sequence of node list in recentInvalidateSets is: > [node4, node3, node5, node2, node1] > > So, every time in invalidateWorkForOneNode(), it will always process node4 > then node3, then node2 and then node1. > > My application is a HBase write-heavy application. > So, there is many blocks need invalidate in each datanode. So when each > 3000 milliseconds, at most, there is only two datanode is processed. Since > the node1 is the last one in the TreeMap, it have no change to be garbage > collected. > > I think HDFS namenode should fix this issue. > > Schubert > > On Thu, Mar 26, 2009 at 2:57 PM, Samuel Guo <guosi...@gmail.com> wrote: > >> After a file is deleted, HDFS does not immediately reclaim the available >> physical storage. It does so only lazily during garbage collection. When a >> file is deleted by the application, the master remove the file's metadata >> from *FSNamesystem* and logs the deletion immediately. And the file's >> deleted blocks information will be collected in each DataNodeDescriptor's >> *invalidateBlocks* set in Namenode. During the heartbeats between NN and >> DN, >> NN will scan the specified DN's DataNodeDescriptor's invalidateBlocks set, >> find the blocks to be deleted in DN and send a *DNA_INVALIDATE* >> BlockCommand >> to DN. And the *BlockScanner* thread running on DN will scan, find and >> delete these blocks after DN receives the *DNA_INVALIDATE* BlockCommand. >> >> You can search *DNA_INVALIDATE* in DataNode.java and NameNode.java files, >> and find the logic of the garbage collection. Hope it will be helpful. 
>>
>> On Thu, Mar 26, 2009 at 11:07 AM, schubert zhang <zson...@gmail.com> wrote:
>>
>> > Thanks Andrew and Billy.
>> > I think the subject of this mail thread is not appropriate; it may not be
>> > a balance issue.
>> > The problem seems to be the block-deleting scheduler in HDFS.
>> >
>> > Last night (timezone: +8), I slowed down my application, and this morning
>> > I found that almost all garbage blocks had been deleted.
>> > Here are the current block counts for each datanode:
>> > node1: 10651
>> > node2: 10477
>> > node3: 12185
>> > node4: 11607
>> > node5: 14000
>> >
>> > It seems fine.
>> > But I want to study the HDFS code and understand the policy for deleting
>> > blocks on datanodes. Can anyone in the hadoop community give me some
>> > advice?
>> >
>> > Schubert
>> >
>> > On Thu, Mar 26, 2009 at 7:55 AM, Andrew Purtell <apurt...@apache.org>
>> > wrote:
>> >
>> > >
>> > > > From: schubert zhang <zson...@gmail.com>
>> > > > From another point of view, I think HBase cannot control
>> > > > which node blocks are deleted on; it just deletes files, and
>> > > > HDFS deletes the blocks wherever they are located.
>> > >
>> > > Yes, that is exactly correct.
>> > >
>> > > Best regards,
>> > >
>> > >   - Andy
>> > >
>> >
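
For reference, below is a small, self-contained toy model (not the actual HDFS code) of the scheduling behavior described in this thread: pending invalidations live in a TreeMap keyed by storage ID, only nodesToProcess nodes are serviced per ReplicationMonitor cycle, and every serviced node is taken from the front of the sorted key set. The storage IDs are the ones posted above; the per-cycle load and per-command cap (newBlocksPerCycle, blocksPerCommand) are made-up numbers chosen only to make the starvation visible.

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy model (NOT the real HDFS code) of the behavior described in this
 * thread: recentInvalidateSets is a TreeMap keyed by storage ID,
 * invalidateWorkForOneNode() always takes the first key, and only
 * nodesToProcess nodes are serviced per ReplicationMonitor cycle, so nodes
 * whose storage IDs sort last are starved under a steady delete load.
 */
public class InvalidateStarvationDemo {

    // Storage IDs from the thread, mapped to friendly node names.
    private static final Map<String, String> NODES = new LinkedHashMap<>();
    static {
        NODES.put("DS-978762906-10.24.1.12-50010-1237686434530", "node1");
        NODES.put("DS-489086185-10.24.1.14-50010-1237686416330", "node2");
        NODES.put("DS-1170985665-10.24.1.16-50010-1237686426395", "node3");
        NODES.put("DS-1024388083-10.24.1.18-50010-1237686404482", "node4");
        NODES.put("DS-2136798339-10.24.1.20-50010-1237686444430", "node5");
    }

    public static void main(String[] args) {
        // Mapping: StorageID -> number of blocks pending invalidation.
        TreeMap<String, Integer> recentInvalidateSets = new TreeMap<>();

        int nodesToProcess = 2;      // as observed in computeDatanodeWork()
        int blocksPerCommand = 100;  // assumed cap on blocks invalidated per node per cycle
        int newBlocksPerCycle = 80;  // assumed steady delete load on every datanode

        for (int cycle = 0; cycle < 1000; cycle++) {
            // A write/delete-heavy workload adds garbage for every datanode.
            for (String id : NODES.keySet()) {
                recentInvalidateSets.merge(id, newBlocksPerCycle, Integer::sum);
            }
            // computeInvalidateWork(nodesToProcess): only the first
            // nodesToProcess keys of the sorted map are ever serviced.
            for (int i = 0; i < nodesToProcess && !recentInvalidateSets.isEmpty(); i++) {
                String firstNodeId = recentInvalidateSets.firstKey();
                int remaining = Math.max(0,
                        recentInvalidateSets.get(firstNodeId) - blocksPerCommand);
                if (remaining == 0) {
                    recentInvalidateSets.remove(firstNodeId);
                } else {
                    recentInvalidateSets.put(firstNodeId, remaining);
                }
            }
        }

        // Nodes whose storage IDs sort last are never drained.
        for (Map.Entry<String, String> e : NODES.entrySet()) {
            System.out.printf("%s pending blocks: %d%n",
                    e.getValue(), recentInvalidateSets.getOrDefault(e.getKey(), 0));
        }
    }
}

Running it for 1000 simulated cycles prints zero pending blocks for node4 and node3, while node5, node2, and node1 accumulate an ever-growing backlog, matching the observation that the nodes whose storage IDs sort last never get their garbage collected.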
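
As for "I think the HDFS namenode should fix this issue": one possible way to remove the bias, sketched here purely for illustration (this is not necessarily how HDFS itself addressed it), is to keep a cursor over the storage IDs and pick nodes round-robin instead of always taking the first key.

import java.util.ArrayList;
import java.util.Collection;
import java.util.TreeMap;

/**
 * Hypothetical round-robin selector, sketched for discussion only.
 * A cursor remembers the last storage ID serviced, and the next call
 * picks the following key, wrapping around, so every datanode with
 * pending invalidations is eventually serviced.
 */
public class RoundRobinInvalidateSelector {

    private final TreeMap<String, Collection<String>> recentInvalidateSets =
            new TreeMap<String, Collection<String>>();
    private String cursor = null; // last storage ID that was serviced

    /** Queue a block for deletion on the given datanode. */
    public synchronized void addBlock(String storageId, String blockName) {
        Collection<String> blocks = recentInvalidateSets.get(storageId);
        if (blocks == null) {
            blocks = new ArrayList<String>();
            recentInvalidateSets.put(storageId, blocks);
        }
        blocks.add(blockName);
    }

    /** Pick the next storage ID after the cursor, wrapping around. */
    public synchronized String nextNodeToInvalidate() {
        if (recentInvalidateSets.isEmpty()) {
            return null;
        }
        String next = (cursor == null) ? null : recentInvalidateSets.higherKey(cursor);
        if (next == null) {
            next = recentInvalidateSets.firstKey(); // wrap around to the smallest key
        }
        cursor = next;
        return next;
    }
}

The essential property is simply that no single storage ID can monopolize the limited nodesToProcess slots in each 3-second cycle.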