Hadoop-Hdfs-trunk - Build # 614 - Still Failing
See https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk/614/

### LAST 60 LINES OF THE CONSOLE ###
[...truncated 729780 lines...]
[junit] 	at java.lang.Thread.run(Thread.java:662)
[junit]
[junit] 2011-03-22 12:24:42,983 INFO datanode.DataNode (DataNode.java:shutdown(788)) - Waiting for threadgroup to exit, active threads is 0
[junit] 2011-03-22 12:24:42,983 INFO datanode.DataBlockScanner (DataBlockScanner.java:run(624)) - Exiting DataBlockScanner thread.
[junit] 2011-03-22 12:24:42,984 INFO datanode.DataNode (DataNode.java:run(1464)) - DatanodeRegistration(127.0.0.1:57260, storageID=DS-91605065-127.0.1.1-57260-1300796672283, infoPort=35711, ipcPort=34865):Finishing DataNode in: FSDataset{dirpath='/grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/build-fi/test/data/dfs/data/data3/current/finalized,/grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/build-fi/test/data/dfs/data/data4/current/finalized'}
[junit] 2011-03-22 12:24:42,984 INFO ipc.Server (Server.java:stop(1626)) - Stopping server on 34865
[junit] 2011-03-22 12:24:42,984 INFO datanode.DataNode (DataNode.java:shutdown(788)) - Waiting for threadgroup to exit, active threads is 0
[junit] 2011-03-22 12:24:42,984 INFO datanode.FSDatasetAsyncDiskService (FSDatasetAsyncDiskService.java:shutdown(133)) - Shutting down all async disk service threads...
[junit] 2011-03-22 12:24:42,985 INFO datanode.FSDatasetAsyncDiskService (FSDatasetAsyncDiskService.java:shutdown(142)) - All async disk service threads have been shut down.
[junit] 2011-03-22 12:24:42,985 WARN datanode.FSDatasetAsyncDiskService (FSDatasetAsyncDiskService.java:shutdown(130)) - AsyncDiskService has already shut down.
[junit] 2011-03-22 12:24:42,985 INFO hdfs.MiniDFSCluster (MiniDFSCluster.java:shutdownDataNodes(835)) - Shutting down DataNode 0
[junit] 2011-03-22 12:24:42,990 INFO ipc.Server (Server.java:stop(1626)) - Stopping server on 38522
[junit] 2011-03-22 12:24:42,990 INFO ipc.Server (Server.java:run(691)) - Stopping IPC Server Responder
[junit] 2011-03-22 12:24:42,990 INFO datanode.DataNode (DataNode.java:shutdown(788)) - Waiting for threadgroup to exit, active threads is 1
[junit] 2011-03-22 12:24:42,991 INFO ipc.Server (Server.java:run(487)) - Stopping IPC Server listener on 38522
[junit] 2011-03-22 12:24:42,991 WARN datanode.DataNode (DataXceiverServer.java:run(142)) - DatanodeRegistration(127.0.0.1:49868, storageID=DS-1067466969-127.0.1.1-49868-1300796672121, infoPort=48806, ipcPort=38522):DataXceiveServer: java.nio.channels.AsynchronousCloseException
[junit] 	at java.nio.channels.spi.AbstractInterruptibleChannel.end(AbstractInterruptibleChannel.java:185)
[junit] 	at sun.nio.ch.ServerSocketChannelImpl.accept(ServerSocketChannelImpl.java:159)
[junit] 	at sun.nio.ch.ServerSocketAdaptor.accept(ServerSocketAdaptor.java:84)
[junit] 	at org.apache.hadoop.hdfs.server.datanode.DataXceiverServer.run(DataXceiverServer.java:135)
[junit] 	at java.lang.Thread.run(Thread.java:662)
[junit]
[junit] 2011-03-22 12:24:42,993 INFO datanode.DataNode (DataNode.java:shutdown(788)) - Waiting for threadgroup to exit, active threads is 0
[junit] 2011-03-22 12:24:42,995 INFO ipc.Server (Server.java:run(1459)) - IPC Server handler 0 on 38522: exiting
[junit] 2011-03-22 12:24:43,093 INFO datanode.DataBlockScanner (DataBlockScanner.java:run(624)) - Exiting DataBlockScanner thread.
[junit] 2011-03-22 12:24:43,093 INFO datanode.DataNode (DataNode.java:run(1464)) - DatanodeRegistration(127.0.0.1:49868, storageID=DS-1067466969-127.0.1.1-49868-1300796672121, infoPort=48806, ipcPort=38522):Finishing DataNode in: FSDataset{dirpath='/grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/build-fi/test/data/dfs/data/data1/current/finalized,/grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/build-fi/test/data/dfs/data/data2/current/finalized'}
[junit] 2011-03-22 12:24:43,094 INFO ipc.Server (Server.java:stop(1626)) - Stopping server on 38522
[junit] 2011-03-22 12:24:43,094 INFO datanode.DataNode (DataNode.java:shutdown(788)) - Waiting for threadgroup to exit, active threads is 0
[junit] 2011-03-22 12:24:43,094 INFO datanode.FSDatasetAsyncDiskService (FSDatasetAsyncDiskService.java:shutdown(133)) - Shutting down all async disk service threads...
[junit] 2011-03-22 12:24:43,094 INFO datanode.FSDatasetAsyncDiskService (FSDatasetAsyncDiskService.java:shutdown(142)) - All async disk service threads have been shut down.
[junit] 2011-03-22 12:24:43,095 WARN datanode.FSDatasetAsyncDiskService (FSDatasetAsyncDiskService.java:shutdown(130)) - AsyncDiskService has already shut down.
[junit] 2011-03-22 12:24:43,196 WARN namenode.DecommissionManager (
[jira] [Created] (HDFS-1774) Optimization in org.apache.hadoop.hdfs.server.datanode.FSDataset class.
Optimization in org.apache.hadoop.hdfs.server.datanode.FSDataset class.
------------------------------------------------------------------------

                 Key: HDFS-1774
                 URL: https://issues.apache.org/jira/browse/HDFS-1774
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: data-node
            Reporter: Uma Maheswara Rao G
            Assignee: Uma Maheswara Rao G


The inner class FSDir's constructor iterates twice over the files listed in the passed directory. We can optimize this to a single loop, and we can also avoid the isDirectory check, which performs native invocations.

Consider a case where one directory has one child directory and 10000 files:
1) The first loop iterates over all children to count the child directories.
2) Since numChildren > 0, it then iterates over all 10001 children a second time, checking isDirectory on each.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
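A minimal sketch of the single-pass construction this issue proposes, assuming illustrative names rather than the actual FSDataset internals:

{code}
// Illustrative sketch only, not the real FSDataset.FSDir code. The original
// two-pass version first loops to count child directories, then (if the
// count is positive) loops over all children again; this version collects
// the subdirectories in one pass.
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class FSDir {
  final File dir;
  final FSDir[] children;

  FSDir(File dir) {
    this.dir = dir;
    File[] files = dir.listFiles();
    List<FSDir> subdirs = new ArrayList<FSDir>();
    if (files != null) {
      for (File f : files) {
        // Still one isDirectory() (native) call per entry here; the issue
        // also suggests avoiding it, e.g. by filtering on the block
        // subdirectory naming convention (an assumption in this sketch).
        if (f.isDirectory()) {
          subdirs.add(new FSDir(f));
        }
      }
    }
    this.children =
        subdirs.isEmpty() ? null : subdirs.toArray(new FSDir[subdirs.size()]);
  }
}
{code}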
[jira] [Resolved] (HDFS-1758) Web UI JSP pages thread safety issue
[ https://issues.apache.org/jira/browse/HDFS-1758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Suresh Srinivas resolved HDFS-1758.
-----------------------------------
    Resolution: Fixed

I committed the patch. Thank you Tanping.

> Web UI JSP pages thread safety issue
> ------------------------------------
>
>                 Key: HDFS-1758
>                 URL: https://issues.apache.org/jira/browse/HDFS-1758
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 0.20.203.1
>         Environment: branch-20-security
>            Reporter: Tanping Wang
>            Assignee: Tanping Wang
>            Priority: Minor
>             Fix For: 0.20.4
>
>         Attachments: HDFS-1758.patch
>
>
> The set of JSP pages that the web UI uses is not thread safe. We have
> observed problems when requesting the Live/Dead/Decommissioning pages from
> the web UI: an incorrect page is displayed. To be more specific, when
> requesting the dead node list page, sometimes the live node page is
> returned; when requesting the decommissioning page, sometimes the dead page
> is returned.
> The root cause of this problem is that a JSP page is not thread safe by
> default. When multiple requests come in, each request is assigned to a
> different thread, and multiple threads access the same instance of the
> servlet class generated from the JSP page, so a class variable is shared by
> multiple threads. The JSP code in the 20 branch, for example
> dfsnodelist.jsp, has
> {code}
> <%!
>   int rowNum = 0;
>   int colNum = 0;
>   String sorterField = null;
>   String sorterOrder = null;
>   String whatNodes = "LIVE";
>   ...
> %>
> {code}
> declared as class variables. (These variables are declared within
> <%! code %> directives, which makes them class members.) Multiple threads
> share the same set of class member variables, so one request steps on
> another's toes.
> However, due to the JSP code refactor in HADOOP-5857, all of these class
> member variables became function-local variables, so this bug does not
> appear in Apache trunk. Hence, we propose a simple fix for this bug on the
> 20 branch alone, specifically branch-0.20-security: add the JSP
> isThreadSafe="false" directive to the related JSP pages, dfshealth.jsp and
> dfsnodelist.jsp, to make them thread safe, i.e. only one request is
> processed at a time.
> We did evaluate the thread safety of the other JSP pages on trunk and
> noticed a potential problem when retrieving statistics from the namenode.
> For example, we make the call to
> {code}
> NamenodeJspHelper.getInodeLimitText(fsn);
> {code}
> in dfshealth.jsp, which eventually reaches
> {code}
> static String getInodeLimitText(FSNamesystem fsn) {
>   long inodes = fsn.dir.totalInodes();
>   long blocks = fsn.getBlocksTotal();
>   long maxobjects = fsn.getMaxObjects();
> {code}
> Some of these calls are already guarded by the read/write lock, e.g.
> dir.totalInodes, but others are not. As a result, the web UI results are
> not 100% thread safe. But after evaluating the pros and cons of adding a
> giant lock to the JSP pages, we decided not to introduce FSNamesystem
> read/write locks into the JSPs.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
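To make the hazard concrete: a JSP compiles into a servlet, and one servlet instance serves all requests, so <%! ... %> members behave exactly like the shared field in this minimal Java sketch (class and field names are illustrative, not from the HDFS code):

{code}
// Illustrative servlet, not HDFS code: one instance of this class serves
// every request, so the instance field below is shared mutable state,
// the same situation as <%! ... %> class members in a JSP.
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class NodeListServlet extends HttpServlet {
  // BAD: shared across all request threads.
  private String whatNodes = "LIVE";

  @Override
  protected void doGet(HttpServletRequest req, HttpServletResponse resp)
      throws IOException {
    // Thread A sets this to "DEAD"...
    whatNodes = req.getParameter("whatNodes");
    // ...but thread B may overwrite it before the next line runs, which is
    // how a request for the dead-node list can render the live-node page.
    resp.getWriter().println("Rendering node list: " + whatNodes);
  }
}
{code}

The isThreadSafe="false" page directive tells the container to serialize requests to the generated servlet (per the JSP spec, historically via javax.servlet.SingleThreadModel), which is the simple fix taken for the 20 branch.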
[jira] [Resolved] (HDFS-1386) unregister namenode datanode info MXBean
[ https://issues.apache.org/jira/browse/HDFS-1386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tanping Wang resolved HDFS-1386.
--------------------------------
      Resolution: Duplicate
    Release Note:   (was: Making it a blocker for 0.22 as this causes tests to fail.)

This bug fix is included in HADOOP-6728.

> unregister namenode datanode info MXBean
> ----------------------------------------
>
>                 Key: HDFS-1386
>                 URL: https://issues.apache.org/jira/browse/HDFS-1386
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node, name-node, test
>    Affects Versions: 0.22.0
>            Reporter: Tanping Wang
>            Assignee: Tanping Wang
>            Priority: Blocker
>             Fix For: 0.22.0
>

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Resolved] (HDFS-1695) HDFS federation: Fix testOIV and TestDatanodeUtils
[ https://issues.apache.org/jira/browse/HDFS-1695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tanping Wang resolved HDFS-1695.
--------------------------------
    Resolution: Fixed

> HDFS federation: Fix testOIV and TestDatanodeUtils
> --------------------------------------------------
>
>                 Key: HDFS-1695
>                 URL: https://issues.apache.org/jira/browse/HDFS-1695
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: test
>    Affects Versions: Federation Branch
>            Reporter: Tanping Wang
>            Assignee: Tanping Wang
>            Priority: Minor
>             Fix For: Federation Branch
>
>         Attachments: HDFS-1695.patch
>
>
> # TestOfflineImageViewer fails because the newest fsimage versions are not
> listed as supported. We're missing HDFS-1500, which fixed this same problem
> for a previous commit from Hairong.
> # Fix TestOIV. The supported version got bumped from -25 to -27 during a
> merge (19bbb922bc8e08a3a546b578b476be3ea79d45ea), which didn't update
> TestOIV.
> # Fix TestDatanodeUtils. TestDataNodeUtils was named as if it were a test,
> which it wasn't, and so it was being reported as a failure.
>
> Contributed by Jakob Homan.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
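For illustration only, a sketch of the kind of explicit supported-version list that breaks when the layout version is bumped without extending it; the names and values here are hypothetical, not the actual OfflineImageViewer code:

{code}
// Hypothetical sketch: an image loader that advertises an explicit list of
// supported fsimage layout versions. Bumping the current version from -25
// to -27 in a merge without extending this list makes canLoadVersion()
// reject newly written images, which is how the test starts failing.
class ImageLoaderSketch {
  // Hypothetical list; after the merge it must include -26 and -27.
  private static final int[] SUPPORTED_VERSIONS = { -24, -25, -26, -27 };

  boolean canLoadVersion(int version) {
    for (int v : SUPPORTED_VERSIONS) {
      if (v == version) {
        return true;
      }
    }
    return false;
  }
}
{code}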
[jira] [Created] (HDFS-1775) getContentSummary should hold the FSNamesystem readlock
getContentSummary should hold the FSNamesystem readlock
--------------------------------------------------------

                 Key: HDFS-1775
                 URL: https://issues.apache.org/jira/browse/HDFS-1775
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Dmytro Molkov
            Assignee: Dmytro Molkov
            Priority: Minor


Right now the getContentSummary call on the namenode holds only the FSDirectory lock, not the FSNamesystem lock. What we are seeing because of that is:
1) getContentSummary takes the read lock on FSDirectory.
2) A write operation arrives, takes the write lock on FSNamesystem, and waits for getContentSummary to finish so it can take the write lock on FSDirectory.
As a result, other read operations cannot be executed while the writer waits. Since getContentSummary can take a while to execute on large directories, performance would improve if it held the FSNamesystem readlock while doing its work.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
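A minimal sketch of the proposed lock ordering, using java.util.concurrent locks to stand in for the FSNamesystem and FSDirectory locks (names are illustrative, not the actual HDFS code):

{code}
// Illustrative sketch only. The point is the ordering: take the
// FSNamesystem read lock before the FSDirectory lock, so a writer queued
// on the namesystem lock cannot stall other readers while the long
// getContentSummary scan holds the directory lock.
import java.util.concurrent.locks.ReentrantReadWriteLock;

class NamesystemSketch {
  private final ReentrantReadWriteLock fsNamesystemLock =
      new ReentrantReadWriteLock();
  private final ReentrantReadWriteLock fsDirectoryLock =
      new ReentrantReadWriteLock();

  long getContentSummary(String path) {
    fsNamesystemLock.readLock().lock(); // proposed: namesystem lock first
    try {
      fsDirectoryLock.readLock().lock();
      try {
        return scanSubtree(path); // potentially slow on large trees
      } finally {
        fsDirectoryLock.readLock().unlock();
      }
    } finally {
      fsNamesystemLock.readLock().unlock();
    }
  }

  private long scanSubtree(String path) {
    return 0L; // placeholder for the recursive directory walk
  }
}
{code}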
Re: Merging Namenode Federation feature (HDFS-1052) to trunk
On Mar 21, 2011, at 4:08 PM, Sanjay Radia wrote:
>
> Allen, not sure if I explained the difference above.
> Based on the discussion we had at the HUG, I want to clarify a few things.

Thanks for taking the time at the HUG. (I've since figured out that I lost your messages as part of my email list transition.)

> A DN stores blocks for only ONE cluster.

But this does make things easier. I'm still fairly confident that it adds too much complexity for little gain, though. So put this in the 'agree to disagree' column.

It would still be nice if you guys could lay off the camelCase options, though. Admins hate the shift key.

BTW, Robert C. asked what I thought you guys should have been working on instead of Federation. I told him (and you) high availability of the namenode (which I still believe is necessary for HDFS in more and more cases), but I've had more time to think about it. So expect my list (which I'll post here) soon. :p
[jira] [Created] (HDFS-1776) Bug in Concat code
Bug in Concat code
------------------

                 Key: HDFS-1776
                 URL: https://issues.apache.org/jira/browse/HDFS-1776
             Project: Hadoop HDFS
          Issue Type: Bug
            Reporter: Dmytro Molkov


There is a bug in the concat code. Specifically, in INodeFile.appendBlocks() we need to first reassign the blocks list and then go through it and update the INode pointer on each block. Otherwise we do not update the inode pointer on all of the new blocks in the file.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
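A minimal sketch of the ordering the report calls for; the types and names are illustrative, not the actual INodeFile/BlockInfo code:

{code}
// Illustrative types only; the real INodeFile code differs in detail.
class BlockSketch {
  INodeFileSketch inode; // back-pointer from block to owning file
}

class INodeFileSketch {
  BlockSketch[] blocks = new BlockSketch[0];

  void appendBlocks(INodeFileSketch[] others) {
    int total = this.blocks.length;
    for (INodeFileSketch f : others) {
      total += f.blocks.length;
    }
    BlockSketch[] merged = new BlockSketch[total];
    int idx = 0;
    for (BlockSketch b : this.blocks) {
      merged[idx++] = b;
    }
    for (INodeFileSketch f : others) {
      for (BlockSketch b : f.blocks) {
        merged[idx++] = b;
      }
    }
    // 1) Reassign the block list first...
    this.blocks = merged;
    // 2) ...then walk the NEW list and repoint every block at this inode.
    // Updating pointers before the reassignment (the reported bug) walks
    // the old list and misses every appended block.
    for (BlockSketch b : this.blocks) {
      b.inode = this;
    }
  }
}
{code}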