Hadoop-Hdfs-trunk - Build # 609 - Still Failing
See https://hudson.apache.org/hudson/job/Hadoop-Hdfs-trunk/609/

### LAST 60 LINES OF THE CONSOLE ###

[...truncated 704012 lines...]
    [mkdir] Created dir: /grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/build/contrib/hdfsproxy/target
     [echo] Including clover.jar in the war file ...
[cactifywar] Analyzing war: /grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/build/contrib/hdfsproxy/hdfsproxy-2.0-test.war
[cactifywar] Building war: /grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/build/contrib/hdfsproxy/target/test.war

cactifywar:

test-cactus:
     [echo] Free Ports: startup-11592 / http-11593 / https-11594
     [echo] Please take a deep breath while Cargo gets the Tomcat for running the servlet tests...
    [mkdir] Created dir: /grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/build/contrib/hdfsproxy/target/tomcat-config
    [mkdir] Created dir: /grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/build/contrib/hdfsproxy/target/tomcat-config/conf
    [mkdir] Created dir: /grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/build/contrib/hdfsproxy/target/tomcat-config/webapps
    [mkdir] Created dir: /grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/build/contrib/hdfsproxy/target/tomcat-config/temp
    [mkdir] Created dir: /grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/build/contrib/hdfsproxy/target/logs
    [mkdir] Created dir: /grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/build/contrib/hdfsproxy/target/reports
     [copy] Copying 1 file to /grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/build/contrib/hdfsproxy/target/tomcat-config/conf
     [copy] Copying 1 file to /grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/build/contrib/hdfsproxy/target/tomcat-config/conf
     [copy] Copying 1 file to /grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/build/contrib/hdfsproxy/target/tomcat-config/conf
   [cactus] -
   [cactus] Running tests against Tomcat 5.x @ http://localhost:11593
   [cactus] -
   [cactus] Deploying [/grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/build/contrib/hdfsproxy/target/test.war] to [/grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/build/contrib/hdfsproxy/target/tomcat-config/webapps]...
   [cactus] Tomcat 5.x starting... Server [Apache-Coyote/1.1] started
   [cactus] WARNING: multiple versions of ant detected in path for junit
   [cactus]  jar:file:/homes/hudson/tools/ant/latest/lib/ant.jar!/org/apache/tools/ant/Project.class
   [cactus]  and jar:file:/homes/hudson/.ivy2/cache/ant/ant/jars/ant-1.6.5.jar!/org/apache/tools/ant/Project.class
   [cactus] Running org.apache.hadoop.hdfsproxy.TestAuthorizationFilter
   [cactus] Tests run: 4, Failures: 2, Errors: 0, Time elapsed: 0.486 sec
   [cactus] Test org.apache.hadoop.hdfsproxy.TestAuthorizationFilter FAILED
   [cactus] Running org.apache.hadoop.hdfsproxy.TestLdapIpDirFilter
   [cactus] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.555 sec
   [cactus] Tomcat 5.x started on port [11593]
   [cactus] Running org.apache.hadoop.hdfsproxy.TestProxyFilter
   [cactus] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.311 sec
   [cactus] Running org.apache.hadoop.hdfsproxy.TestProxyForwardServlet
   [cactus] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.343 sec
   [cactus] Running org.apache.hadoop.hdfsproxy.TestProxyUtil
   [cactus] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.875 sec
   [cactus] Tomcat 5.x is stopping...
   [cactus] Tomcat 5.x is stopped

BUILD FAILED
/grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/build.xml:753: The following error occurred while executing this line:
/grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/build.xml:734: The following error occurred while executing this line:
/grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/src/contrib/build.xml:49: The following error occurred while executing this line:
/grid/0/hudson/hudson-slave/workspace/Hadoop-Hdfs-trunk/trunk/src/contrib/hdfsproxy/build.xml:343: Tests failed!

Total time: 51 minutes 16 seconds
[FINDBUGS] Skipping publisher since build result is FAILURE
Publishing Javadoc
Archiving artifacts
Recording test results
Recording fingerprints
Publishing Clover coverage report...
No Clover report will be published due to a Build Failure
Email was triggered for: Failure
Sending email for trigger: Failure

### FAILED TESTS (if any) ###

2 tests failed.
FAILED: org.apache.hadoop.hdfsproxy.TestAu
[jira] Created: (HDFS-1764) add 'Time Since Declared Dead' to namenode dead data nodes web page
add 'Time Since Declared Dead' to namenode dead data nodes web page
--------------------------------------------------------------------

                 Key: HDFS-1764
                 URL: https://issues.apache.org/jira/browse/HDFS-1764
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Hairong Kuang
            Assignee: Hairong Kuang


I am filing this jira for Andrew. :)

Currently on the dead nodes page of a namenode, we list only the dead datanodes' hostnames. In addition, I would like to list the duration since each node was declared dead, in the same format the "Decommissioning Nodes" page uses for "Time Since Decommissioning Started".

In our Hadoop clusters, if a node has been dead for only a few minutes, our monitoring is likely to bring it back without us needing to do anything; if it has been dead for many hours, it merits a closer look. This proposed functionality will help administrators identify which nodes need manual attention and which are likely to be fixed by monitoring. It seems like useful functionality for open source as well.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
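To make the request concrete, here is a small, hypothetical Java sketch (not the actual HDFS-1764 patch) of how such a duration might be computed and rendered, assuming the dead-nodes page can obtain each node's last-contact timestamp (for example via DatanodeInfo.getLastUpdate()); the class and method names below are illustrative only.

public class DeadNodeAge {

  // Renders an elapsed duration in milliseconds as "Xd HH:MM:SS".
  static String formatElapsed(long elapsedMs) {
    long seconds = elapsedMs / 1000;
    long days = seconds / 86400;
    long hours = (seconds % 86400) / 3600;
    long minutes = (seconds % 3600) / 60;
    long secs = seconds % 60;
    return String.format("%dd %02d:%02d:%02d", days, hours, minutes, secs);
  }

  // "Time Since Declared Dead" for one node, assuming lastContactMs is the
  // node's last-contact time (hypothetically DatanodeInfo.getLastUpdate()).
  // Strictly, a node is declared dead one heartbeat-expiry interval after
  // last contact; that refinement is omitted from this sketch.
  static String timeSinceDeclaredDead(long nowMs, long lastContactMs) {
    return formatElapsed(Math.max(0L, nowMs - lastContactMs));
  }

  public static void main(String[] args) {
    long now = System.currentTimeMillis();
    long lastContact = now - (3L * 3600 * 1000 + 25L * 60 * 1000); // ~3h25m ago
    System.out.println("Time Since Declared Dead: "
        + timeSinceDeclaredDead(now, lastContact));
  }
}

Run standalone, this prints "Time Since Declared Dead: 0d 03:25:00", which is the kind of value an administrator could scan to separate freshly dead nodes from long-dead ones.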
[jira] Created: (HDFS-1765) Block Replication should respect under-replication block priority
Block Replication should respect under-replication block priority
-------------------------------------------------------------------

                 Key: HDFS-1765
                 URL: https://issues.apache.org/jira/browse/HDFS-1765
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: name-node
    Affects Versions: 0.23.0
            Reporter: Hairong Kuang
            Assignee: Hairong Kuang
             Fix For: 0.23.0


Currently under-replicated blocks are assigned different priorities depending on how many replicas a block has, but the replication monitor works through the blocks in a round-robin fashion. As a result, newly added high-priority blocks are not replicated until all lower-priority blocks ahead of them have been handled. For example, on the decommissioning datanode WebUI we often observe that "blocks with only decommissioning replicas" are not scheduled for replication before other blocks, risking data availability if the node is shut down for repair before decommissioning completes.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
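To illustrate the gap the issue describes, here is a hedged, self-contained Java sketch, not the actual BlockManager/UnderReplicatedBlocks code; the class name, queue layout, and block names are made up. It contrasts priority-first selection with the round-robin behaviour described above.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Illustrative model only: under-replicated blocks are bucketed by priority
// level, with level 0 the most urgent (e.g. blocks whose only replicas are
// on decommissioning nodes).
public class PriorityAwareScheduler {

  // Pick up to 'limit' blocks to replicate, always draining higher-priority
  // queues first. A round-robin cursor over the whole set, by contrast,
  // would not reach a newly added high-priority block until it had wrapped
  // past the existing low-priority backlog.
  static <B> List<B> chooseReplicationWork(List<Queue<B>> byPriority, int limit) {
    List<B> chosen = new ArrayList<B>();
    for (Queue<B> level : byPriority) {          // priority 0 first
      while (chosen.size() < limit && !level.isEmpty()) {
        chosen.add(level.poll());
      }
      if (chosen.size() >= limit) {
        break;
      }
    }
    return chosen;
  }

  public static void main(String[] args) {
    List<Queue<String>> queues = new ArrayList<Queue<String>>();
    queues.add(new ArrayDeque<String>());        // priority 0 (highest)
    queues.add(new ArrayDeque<String>());        // priority 1
    queues.get(1).add("blk_low_priority_1");
    queues.get(1).add("blk_low_priority_2");
    queues.get(0).add("blk_only_decommissioning_replicas");
    // The high-priority block is scheduled first even though it arrived last.
    System.out.println(chooseReplicationWork(queues, 2));
  }
}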
libhdfs not getting compiled
Hello,

I am working on a project involving HDFS and the fuse-dfs API on top of it. I wanted to trace through the libhdfs functions called by fuse-dfs, so I added print statements inside hdfs.c in appropriate places to see how the functions progress. I run ant compile-c++-libhdfs -Dlibhdfs=1 and then ant compile-contrib -Dlibhdfs=1 -Dfusedfs=1 -Djava5.home=/usr/lib/jvm/java-1.5.0-sun. However, when I use fuse-dfs I cannot see any of the print statements from libhdfs/hdfs.c being executed.

I am using hadoop-0.20.2, and the libhdfs source is in hadoop-0.20.2/src/c++/libhdfs. Could someone tell me whether this is the libhdfs that gets compiled and used, or whether some other libhdfs is accessed? If this is the one, why are the changes made in its files not reflected when running the code?

Thanks,
Aastha.

--
Aastha Mehta
Intern, NetApp, Bangalore
4th year undergraduate, BITS Pilani
E-mail: aasth...@gmail.com
[jira] Created: (HDFS-1766) Datanode is marked dead, but datanode process is alive and verifying blocks
Datanode is marked dead, but datanode process is alive and verifying blocks
-----------------------------------------------------------------------------

                 Key: HDFS-1766
                 URL: https://issues.apache.org/jira/browse/HDFS-1766
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: data-node
    Affects Versions: 0.23.0
            Reporter: Hairong Kuang
            Assignee: Hairong Kuang
             Fix For: 0.23.0


We have a datanode that is marked dead in the namenode and is not taking any traffic, but it is still verifying blocks continuously, so the DataNode process is definitely not dead. Jstack shows that the main thread and the offerService thread are gone, but the JVM is stuck waiting for other threads to die. It seems that the offerService thread died abnormally, for example from a runtime exception, and did not shut down the other threads before exiting.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
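As a sketch of the failure mode (and the kind of guard that would avoid it), here is a hypothetical, simplified Java example. It is not the actual DataNode code; offerService() and shutDown() here only stand in for their real counterparts.

// Illustrative only: if the service loop exits via an unchecked exception and
// nothing calls shutDown(), other non-daemon worker threads (e.g. a block
// scanner) keep the process alive even though it no longer heartbeats.
public class ServiceThreadGuard implements Runnable {

  private volatile boolean shouldRun = true;

  // Stands in for DataNode.offerService(): heartbeats, block reports, ...
  private void offerService() throws InterruptedException {
    while (shouldRun) {
      // sendHeartbeat(); processCommands(); ...
      Thread.sleep(3000);
    }
  }

  // Stands in for DataNode.shutDown(): stop workers so the JVM can exit.
  private void shutDown() {
    shouldRun = false;
    // blockScanner.interrupt(); dataXceiverServer.kill(); ...
  }

  public void run() {
    try {
      offerService();
    } catch (Throwable t) {
      // Without this catch-all plus the finally below, a RuntimeException
      // here would silently kill only this thread, producing exactly the
      // "marked dead but still verifying blocks" state described above.
      System.err.println("Service thread failed: " + t);
    } finally {
      shutDown();
    }
  }
}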
Re: libhdfs not getting compiled
Hi Aastha,

Try using "ldd" against the fuse_dfs executable, and see where you are pulling libhdfs.so from. It may be that it is linking against the "wrong one".

Brian

On Mar 17, 2011, at 3:24 PM, Aastha Mehta wrote:

> Hello,
>
> I am working on a project involving hdfs and fuse-dfs API on top of it. I
> wanted to trace through the functions called from libhdfs API by fuse-dfs
> functions. I added print statements inside the hdfs.c file in appropriate
> places to see how the functions progress. I execute ant compile-c++-libhdfs
> -Dlibhdfs=1 and then ant compile-contrib -Dlibhdfs=1 -Dfusedfs=1
> -Djava5.home=/usr/lib/jvm/java-1.5.0-sun. However, when I use fuse-dfs I
> cannot see any of the print statements executed from libhdfs/hdfs.c.
>
> I am using hadoop-0.20.2 version and the libhdfs is present in
> hadoop-0.20.2/src/c++/libhdfs. Could someone tell me if this libhdfs is the
> one compiled and used or if there will be some other libhdfs that is
> accessed. If this is the one, then why are the changes made in its files
> not reflected on running the code?
>
> Thanks,
> Aastha.
>
> --
> Aastha Mehta
> Intern, NetApp, Bangalore
> 4th year undergraduate, BITS Pilani
> E-mail: aasth...@gmail.com
[jira] Created: (HDFS-1767) Delay second Block Reports until after cluster finishes startup, to improve startup times
Delay second Block Reports until after cluster finishes startup, to improve startup times
-------------------------------------------------------------------------------------------

                 Key: HDFS-1767
                 URL: https://issues.apache.org/jira/browse/HDFS-1767
             Project: Hadoop HDFS
          Issue Type: Improvement
          Components: data-node
    Affects Versions: 0.22.0
            Reporter: Matt Foley
            Assignee: Matt Foley
             Fix For: 0.23.0


Consider a large cluster that takes 40 minutes to start up. The datanodes compete to register and send their Initial Block Reports (IBRs) as fast as they can after startup (subject to a small sub-two-minute random delay, which isn't relevant to this discussion).

As each datanode succeeds in sending its IBR, it schedules the starting time for its regular cycle of reports, every hour (or other configured value of dfs.blockreport.intervalMsec). In order to spread the reports evenly across the block report interval, each datanode picks a random fraction of that interval as the starting point of its regular report cycle. For example, if a particular datanode randomly selects 18 minutes after the hour, then that datanode will send a Block Report at 18 minutes after the hour, every hour, as long as it remains up. Other datanodes will start their cycles at other randomly selected times. This code is in DataNode.blockReport() and DataNode.scheduleBlockReport(); a sketch of this scheduling appears at the end of this message.

The "second Block Report" (2BR) is the start of these hourly reports. The problem is that some of these 2BRs get scheduled sooner rather than later, and actually occur within the startup period. For example, if the cluster takes 40 minutes (2/3 of an hour) to start up, then of the datanodes that succeed in sending their IBRs during the first 10 minutes, between 1/2 and 2/3 of them will send their 2BR before the 40-minute startup has completed!

2BRs sent within the startup time compete with the remaining IBRs, and thereby slow down the overall startup process. This can be seen in the following data, which shows the startup process for a 3700-node cluster that took about 17 minutes to finish startup:

min  time        starts   sum   regs   sum    IBR   sum   2nd_BR   sum   total_BRs/min
  0  1299799498   3042   3042   1969   1969   151    151            0    151
  1  1299799558    665   3707   1470   3439   248    399            0    248
  2  1299799618          3707    224   3663   270    669            0    270
  3  1299799678          3707     14   3677   261    930      3     3    264
  4  1299799738          3707     23   3700   288   1218      1     4    289
  5  1299799798          3707      7   3707   258   1476      3     7    261
  6  1299799858          3707          3707   317   1793      4    11    321
  7  1299799918          3707          3707   292   2085      6    17    298
  8  1299799978          3707          3707   292   2377      8    25    300
  9  1299800038          3707          3707   272   2649           25    272
 10  1299800098          3707          3707   280   2929     15    40    295
 11  1299800158          3707          3707   223   3152     14    54    237
 12  1299800218          3707          3707   143   3295           54    143
 13  1299800278          3707          3707   141   3436     20    74    161
 14  1299800338          3707          3707   195   3631     78   152    273
 15  1299800398          3707          3707    51   3682    209   361    260
 16  1299800458          3707          3707    25   3707    369   730    394
 17  1299800518          3707          3707         3707    166   896    166
 18  1299800578          3707          3707         3707     72   968     72
 19  1299800638          3707          3707         3707     67  1035     67
 20  1299800698          3707          3707         3707     75  1110     75
 21  1299800758          3707          3707         3707     71  1181     71
 22  1299800818          3707          3707         3707     67  1248     67
 23  1299800878          3707          3707         3707     62  1310     62
 24  1299800938          3707          3707         3707     56  1366     56
 25  1299800998          3707          3707         3707     60  1426     60

This data was harvested from the startup logs of all the datanodes and correlated into one-minute buckets. Each row of the table represents the progress during one elapsed minute of clock time. Every cluster startup is different, but this one showed the effect fairly well. The "starts" column shows that all the nodes started up within the first 2 minutes, and the "regs" column shows that all succeeded in registering by minute 6.
The IBR column shows a sustained rate of Initial Block Report processing of 250-300/minute for the first 10 minutes. The question is why, during minutes 11 through 16, the rate of IBR processing slowed down. Why didn't the startup just finish? In the "2nd_BR" column, we see the rate of 2BRs ramping up as more datanodes complete their IBRs. As the rate increases, they become more effective at competing with the IBRs, and slow down the IBR processing even more. After the IBRs finally fini
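As referenced at the end of the description above, here is a hedged Java sketch of the scheduling in question. It only approximates the behaviour of DataNode.scheduleBlockReport(), and the "delayed" variant is a hypothetical illustration of the improvement this issue proposes, not the committed change.

import java.util.Random;

// Rough model of the scheduling described above, not the actual DataNode
// code. After its Initial Block Report, a datanode places its next report
// at a random point within dfs.blockreport.intervalMsec, which is how
// "second" reports can land inside the startup window on a large cluster.
public class BlockReportScheduler {

  private final long intervalMs;   // dfs.blockreport.intervalMsec (e.g. 1 hour)
  private final Random rand = new Random();
  private long lastReportMs;       // time the previous report is deemed sent

  BlockReportScheduler(long intervalMs) {
    this.intervalMs = intervalMs;
  }

  // Mirrors the idea in DataNode.scheduleBlockReport(): pretend the last
  // report happened a random fraction of an interval ago, so the next one
  // comes due at a uniformly random point within the coming interval.
  void scheduleAfterInitialReport(long nowMs) {
    lastReportMs = nowMs - (long) (rand.nextDouble() * intervalMs);
  }

  // One possible mitigation in the spirit of this issue (hypothetical, not
  // the committed fix): never let the next report come due before an
  // expected startup grace period has elapsed (assumes graceMs < intervalMs).
  void scheduleAfterInitialReportDelayed(long nowMs, long graceMs) {
    long offset = graceMs + (long) (rand.nextDouble() * (intervalMs - graceMs));
    lastReportMs = nowMs - (intervalMs - offset);  // next report due at now + offset
  }

  boolean reportDue(long nowMs) {
    return nowMs - lastReportMs >= intervalMs;
  }

  public static void main(String[] args) {
    BlockReportScheduler s = new BlockReportScheduler(60L * 60 * 1000); // 1-hour interval
    long now = System.currentTimeMillis();
    s.scheduleAfterInitialReportDelayed(now, 40L * 60 * 1000);          // 40-minute grace
    System.out.println("report due immediately? " + s.reportDue(now));  // false
  }
}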