Jenkins build is still unstable: Hadoop-Hdfs-trunk #916

2012-01-05 Thread Apache Jenkins Server
See 




Hadoop-Hdfs-trunk - Build # 916 - Still Unstable

2012-01-05 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Hdfs-trunk/916/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 12090 lines...]
[INFO] Building Apache Hadoop HDFS Project 0.24.0-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- maven-clean-plugin:2.4.1:clean (default-clean) @ hadoop-hdfs-project ---
[INFO] Deleting /home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk/trunk/hadoop-hdfs-project/target
[INFO] 
[INFO] --- maven-antrun-plugin:1.6:run (create-testdirs) @ hadoop-hdfs-project ---
[INFO] Executing tasks

main:
[mkdir] Created dir: /home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-trunk/trunk/hadoop-hdfs-project/target/test-dir
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-javadoc-plugin:2.7:jar (module-javadocs) @ hadoop-hdfs-project ---
[INFO] Not executing Javadoc as the project is not a Java classpath-capable package
[INFO] 
[INFO] --- maven-source-plugin:2.1.2:jar-no-fork (hadoop-java-sources) @ hadoop-hdfs-project ---
[INFO] 
[INFO] --- maven-site-plugin:3.0:attach-descriptor (attach-descriptor) @ hadoop-hdfs-project ---
[INFO] 
[INFO] --- maven-checkstyle-plugin:2.6:checkstyle (default-cli) @ hadoop-hdfs-project ---
[INFO] 
[INFO] --- findbugs-maven-plugin:2.3.2:findbugs (default-cli) @ hadoop-hdfs-project ---
[INFO] ** FindBugsMojo execute ***
[INFO] canGenerate is false
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Hadoop HDFS  SUCCESS [5:02.396s]
[INFO] Apache Hadoop HttpFS .. SUCCESS [33.238s]
[INFO] Apache Hadoop HDFS BookKeeper Journal . SUCCESS [10.936s]
[INFO] Apache Hadoop HDFS Project  SUCCESS [0.034s]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 5:47.030s
[INFO] Finished at: Thu Jan 05 11:40:48 UTC 2012
[INFO] Final Memory: 84M/753M
[INFO] 
+ /home/jenkins/tools/maven/latest/bin/mvn test -Dmaven.test.failure.ignore=true -Pclover -DcloverLicenseLocation=/home/jenkins/tools/clover/latest/lib/clover.license
Archiving artifacts
Recording test results
Build step 'Publish JUnit test result report' changed build result to UNSTABLE
Publishing Javadoc
Recording fingerprints
Updating MAPREDUCE-3566
Updating MAPREDUCE-3529
Updating MAPREDUCE-3490
Updating MAPREDUCE-1744
Updating MAPREDUCE-3478
Updating HADOOP-7948
Updating HADOOP-7949
Updating HDFS-1314
Updating MAPREDUCE-3595
Updating MAPREDUCE-3572
Updating MAPREDUCE-3569
Sending e-mails to: hdfs-dev@hadoop.apache.org
Email was triggered for: Unstable
Sending email for trigger: Unstable



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  org.apache.hadoop.fs.viewfs.TestViewFileSystemHdfs.testGetDelegationTokensWithCredentials

Error Message:
expected:<0> but was:<1>

Stack Trace:
java.lang.AssertionError: expected:<0> but was:<1>
at org.junit.Assert.fail(Assert.java:91)
at org.junit.Assert.failNotEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:126)
at org.junit.Assert.assertEquals(Assert.java:470)
at org.junit.Assert.assertEquals(Assert.java:454)
at org.apache.hadoop.fs.viewfs.ViewFileSystemBaseTest.testGetDelegationTokensWithCredentials(ViewFileSystemBaseTest.java:151)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at org.junit.runners.ParentRu

Hadoop-Hdfs-0.23-Build - Build # 129 - Still Unstable

2012-01-05 Thread Apache Jenkins Server
See https://builds.apache.org/job/Hadoop-Hdfs-0.23-Build/129/

###
## LAST 60 LINES OF THE CONSOLE 
###
[...truncated 14026 lines...]
[INFO] 
[INFO] --- maven-site-plugin:3.0:attach-descriptor (attach-descriptor) @ hadoop-hdfs-project ---
[INFO] 
[INFO] --- maven-install-plugin:2.3.1:install (default-install) @ hadoop-hdfs-project ---
[INFO] Installing /home/jenkins/jenkins-slave/workspace/Hadoop-Hdfs-0.23-Build/trunk/hadoop-hdfs-project/pom.xml to /home/jenkins/.m2/repository/org/apache/hadoop/hadoop-hdfs-project/0.23.1-SNAPSHOT/hadoop-hdfs-project-0.23.1-SNAPSHOT.pom
[INFO] 
[INFO] --- maven-antrun-plugin:1.6:run (create-testdirs) @ hadoop-hdfs-project ---
[INFO] Executing tasks

main:
[INFO] Executed tasks
[INFO] 
[INFO] --- maven-javadoc-plugin:2.7:jar (module-javadocs) @ hadoop-hdfs-project ---
[INFO] Not executing Javadoc as the project is not a Java classpath-capable package
[INFO] 
[INFO] --- maven-source-plugin:2.1.2:jar-no-fork (hadoop-java-sources) @ hadoop-hdfs-project ---
[INFO] 
[INFO] --- maven-site-plugin:3.0:attach-descriptor (attach-descriptor) @ hadoop-hdfs-project ---
[INFO] 
[INFO] --- maven-checkstyle-plugin:2.6:checkstyle (default-cli) @ hadoop-hdfs-project ---
[INFO] 
[INFO] --- findbugs-maven-plugin:2.3.2:findbugs (default-cli) @ hadoop-hdfs-project ---
[INFO] ** FindBugsMojo execute ***
[INFO] canGenerate is false
[INFO] 
[INFO] Reactor Summary:
[INFO] 
[INFO] Apache Hadoop HDFS  SUCCESS [5:33.410s]
[INFO] Apache Hadoop HttpFS .. SUCCESS [40.398s]
[INFO] Apache Hadoop HDFS Project  SUCCESS [0.058s]
[INFO] 
[INFO] BUILD SUCCESS
[INFO] 
[INFO] Total time: 6:14.302s
[INFO] Finished at: Thu Jan 05 11:40:45 UTC 2012
[INFO] Final Memory: 75M/747M
[INFO] 
+ /home/jenkins/tools/maven/latest/bin/mvn test -Dmaven.test.failure.ignore=true -Pclover -DcloverLicenseLocation=/home/jenkins/tools/clover/latest/lib/clover.license
Archiving artifacts
Publishing Clover coverage report...
Publishing Clover HTML report...
Publishing Clover XML report...
Publishing Clover coverage results...
Recording test results
Build step 'Publish JUnit test result report' changed build result to UNSTABLE
Publishing Javadoc
Recording fingerprints
Updating MAPREDUCE-3566
Updating MAPREDUCE-3529
Updating MAPREDUCE-3490
Updating MAPREDUCE-3478
Updating HADOOP-7924
Updating HADOOP-7948
Updating MAPREDUCE-3595
Updating MAPREDUCE-3572
Updating MAPREDUCE-3569
Sending e-mails to: hdfs-dev@hadoop.apache.org
Email was triggered for: Unstable
Sending email for trigger: Unstable



###
## FAILED TESTS (if any) 
##
1 tests failed.
FAILED:  org.apache.hadoop.fs.viewfs.TestViewFileSystemHdfs.testGetDelegationTokensWithCredentials

Error Message:
expected:<0> but was:<1>

Stack Trace:
java.lang.AssertionError: expected:<0> but was:<1>
at org.junit.Assert.fail(Assert.java:91)
at org.junit.Assert.failNotEquals(Assert.java:645)
at org.junit.Assert.assertEquals(Assert.java:126)
at org.junit.Assert.assertEquals(Assert.java:470)
at org.junit.Assert.assertEquals(Assert.java:454)
at org.apache.hadoop.fs.viewfs.ViewFileSystemBaseTest.testGetDelegationTokensWithCredentials(ViewFileSystemBaseTest.java:151)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:41)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:20)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:28)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:31)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:76)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:193)
at org.junit.runner

Jenkins build is still unstable: Hadoop-Hdfs-0.23-Build #129

2012-01-05 Thread Apache Jenkins Server
See 




Timeouts in Datanodes while block scanning

2012-01-05 Thread Uma Maheswara Rao G
Hi,

I have a 10-node cluster that has been running for the last 25 days (running with an HBase cluster). Recently I observed that during every continuous block scan, many timeouts occur in the DataNode. After the block scan verifications finish, reads succeed again. This situation keeps recurring with every continuous block scan. HBase is continuously performing many random reads.

Has anyone faced this situation in their clusters?

Below are the logs with the timeouts.
2011-12-28 11:30:42,618 INFO  DataNode.clienttrace (BlockSender.java:sendBlock(529)) - src: /107.252.175.3:10010, dest: /107.252.175.3:52764, bytes: 264192, op: HDFS_READ, cliID: DFSClient_hb_rs_107-252-175-3,20020,1324837769603_1324837770095_1770885334_27, srvID: DS-306564179-107.252.175.3-10010-1322019943818, blockid: blk_1323251633953_187190
2011-12-28 11:30:42,621 INFO  DataNode.clienttrace (BlockSender.java:sendBlock(529)) - src: /107.252.175.3:10010, dest: /107.252.175.3:52772, bytes: 396288, op: HDFS_READ, cliID: DFSClient_hb_rs_107-252-175-3,20020,1324837769603_1324837770095_1770885334_27, srvID: DS-306564179-107.252.175.3-10010-1322019943818, blockid: blk_1323251635735_188342
2011-12-28 11:30:42,641 INFO  DataNode.clienttrace (BlockSender.java:sendBlock(529)) - src: /107.252.175.3:10010, dest: /107.252.175.3:52796, bytes: 396288, op: HDFS_READ, cliID: DFSClient_hb_rs_107-252-175-3,20020,1324837769603_1324837770095_1770885334_27, srvID: DS-306564179-107.252.175.3-10010-1322019943818, blockid: blk_1323251634096_187277
2011-12-28 11:30:42,889 INFO  DataNode.clienttrace (BlockSender.java:sendBlock(529)) - src: /107.252.175.3:10010, dest: /107.252.175.3:52732, bytes: 264192, op: HDFS_READ, cliID: DFSClient_hb_rs_107-252-175-3,20020,1324837769603_1324837770095_1770885334_27, srvID: DS-306564179-107.252.175.3-10010-1322019943818, blockid: blk_1323251635763_188363
2011-12-28 11:30:42,889 INFO  DataNode.clienttrace (BlockSender.java:sendBlock(529)) - src: /107.252.175.3:10010, dest: /107.252.175.3:52637, bytes: 264192, op: HDFS_READ, cliID: DFSClient_hb_rs_107-252-175-3,20020,1324837769603_1324837770095_1770885334_27, srvID: DS-306564179-107.252.175.3-10010-1322019943818, blockid: blk_1323251634921_187798
2011-12-28 11:30:42,976 INFO  DataNode.clienttrace (BlockSender.java:sendBlock(529)) - src: /107.252.175.3:10010, dest: /107.252.175.3:52755, bytes: 396288, op: HDFS_READ, cliID: DFSClient_hb_rs_107-252-175-3,20020,1324837769603_1324837770095_1770885334_27, srvID: DS-306564179-107.252.175.3-10010-1322019943818, blockid: blk_1323251635359_188075
2011-12-28 11:30:57,757 INFO  datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251602823_167208
2011-12-28 11:32:15,757 INFO  datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251599175_166755
2011-12-28 11:32:54,561 INFO  datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251673745_194676
2011-12-28 11:33:33,561 INFO  datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251640709_189383
2011-12-28 11:34:12,557 INFO  datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251649630_190779
2011-12-28 11:34:51,557 INFO  datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251463964_91885
2011-12-28 11:35:23,958 INFO  datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251636310_188845
2011-12-28 11:36:01,155 INFO  datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1322486683238_54999
2011-12-28 11:36:04,157 INFO  datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251678959_195786
2011-12-28 11:36:43,157 INFO  datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251641803_189561
2011-12-28 11:37:20,357 INFO  datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1322486706170_66445
2011-12-28 11:37:44,759 INFO  datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251646924_190359
2011-12-28 11:38:23,759 INFO  datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251673776_194683
2011-12-28 11:38:30,157 INFO  datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251621379_178399
2011-12-28 11:38:37,549 INFO  DataNode.clienttrace (BlockSender.java:sendBlock(529)) - src: /107.252.175.3:10010, dest: /107.252.175.3:51942, bytes: 396288, op: HDFS_READ, cliID: DFSClient_hb_rs_107-252-175-3,20020,1324837769603_1324837770095_1770885334_27, srvID: DS-

Re: Timeouts in Datanodes while block scanning

2012-01-05 Thread Aaron T. Myers
What version of HDFS? This question might be more appropriate for hdfs-user@.

--
Aaron T. Myers
Software Engineer, Cloudera



On Thu, Jan 5, 2012 at 8:59 AM, Uma Maheswara Rao G wrote:


[jira] [Created] (HDFS-2752) HA: exit if multiple shared dirs are configured

2012-01-05 Thread Eli Collins (Created) (JIRA)
HA: exit if multiple shared dirs are configured
---

 Key: HDFS-2752
 URL: https://issues.apache.org/jira/browse/HDFS-2752
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins
Assignee: Eli Collins


We don't support multiple shared edits dirs, so we should fail to start with an error in this case.
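A minimal sketch of the proposed fail-fast check, in Java. The class name SharedEditsDirCheck, the method checkSingleSharedEditsDir and the error text are hypothetical, and the configured shared edits dirs are assumed to arrive as a plain list of URI strings rather than through Hadoop's Configuration.

```java
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical startup check for HDFS-2752: refuse to start when more than
 * one shared edits directory is configured. Not the actual HDFS code.
 */
public class SharedEditsDirCheck {

    /** Throws if more than one shared edits dir is configured. */
    static void checkSingleSharedEditsDir(List<String> sharedDirs) {
        if (sharedDirs.size() > 1) {
            throw new IllegalArgumentException(
                "Multiple shared edits directories are not supported: " + sharedDirs);
        }
    }

    public static void main(String[] args) {
        // A single shared dir passes the check.
        checkSingleSharedEditsDir(Arrays.asList("file:///mnt/shared/edits"));

        // Two shared dirs: startup should abort with an error.
        try {
            checkSingleSharedEditsDir(
                Arrays.asList("file:///mnt/a/edits", "file:///mnt/b/edits"));
            System.out.println("unexpected: started with two shared dirs");
        } catch (IllegalArgumentException e) {
            System.out.println("refusing to start: " + e.getMessage());
        }
    }
}
```

Throwing at configuration time keeps the failure at startup, before the NN could write edits to an unsupported layout.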

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-2753) Standby namenode stuck in safemode during a failover

2012-01-05 Thread Hari Mankude (Created) (JIRA)
Standby namenode stuck in safemode during a failover


 Key: HDFS-2753
 URL: https://issues.apache.org/jira/browse/HDFS-2753
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: name-node
Affects Versions: HA branch (HDFS-1623)
Reporter: Hari Mankude
Assignee: Hari Mankude


A teragen job is run to generate write traffic. A manual failover is initiated by killing the namenode process. The killed namenode is then restarted and comes up as a standby. However, the standby never exits safemode. The standby logs suggest that the standby namenode receives addStoredBlock immediately after datanode registration, which increments numBlocks. An optimization in processReport() then causes the follow-on complete block report to be ignored while the NN is in safemode, so the NN never exits safemode.

2012-01-05 18:57:46,030 INFO  hdfs.StateChange (DatanodeManager.java:registerDatanode(573)) - BLOCK* NameSystem.registerDatanode: node registration from 98.137.233.235:50010 storage DS-526656430-98.137.233.235-50010-1325723536492
2012-01-05 18:57:46,033 INFO  net.NetworkTopology (NetworkTopology.java:add(344)) - Adding a new node: /default-rack/98.137.233.235:50010
2012-01-05 18:57:46,033 INFO  namenode.FSNamesystem (FSNamesystem.java:checkMode(3411)) - DID NOT call initialize at 1
2012-01-05 18:57:46,034 INFO  hdfs.StateChange (DatanodeManager.java:registerDatanode(573)) - BLOCK* NameSystem.registerDatanode: node registration from 98.137.233.237:50010 storage DS-1961520590-98.137.233.237-50010-1325725253057
2012-01-05 18:57:46,034 INFO  net.NetworkTopology (NetworkTopology.java:add(344)) - Adding a new node: /default-rack/98.137.233.237:50010
2012-01-05 18:57:46,042 INFO  namenode.FSNamesystem (FSNamesystem.java:checkMode(3411)) - DID NOT call initialize at 1
2012-01-05 18:57:46,045 INFO  hdfs.StateChange (BlockManager.java:addStoredBlock(1775)) - BLOCK* addStoredBlock: blockMap updated: 98.137.233.235:50010 is added to blk_-3183325095022454724_1172{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[98.137.233.235:50010|FINALIZED]]} size 0
2012-01-05 18:57:46,046 INFO  hdfs.StateChange (BlockManager.java:addStoredBlock(1775)) - BLOCK* addStoredBlock: blockMap updated: 98.137.233.235:50010 is added to blk_5617057825952660916_1173{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[98.137.233.235:50010|FINALIZED]]} size 0
2012-01-05 18:57:46,046 INFO  hdfs.StateChange (BlockManager.java:addStoredBlock(1775)) - BLOCK* addStoredBlock: blockMap updated: 98.137.233.237:50010 is added to blk_-3183325095022454724_1172{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[98.137.233.235:50010|FINALIZED], ReplicaUnderConstruction[98.137.233.237:50010|FINALIZED]]} size 0
2012-01-05 18:57:46,046 INFO  hdfs.StateChange (BlockManager.java:addStoredBlock(1775)) - BLOCK* addStoredBlock: blockMap updated: 98.137.233.237:50010 is added to blk_5617057825952660916_1173{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[98.137.233.235:50010|FINALIZED], ReplicaUnderConstruction[98.137.233.237:50010|FINALIZED]]} size 0
2012-01-05 18:57:46,049 INFO  hdfs.StateChange (BlockManager.java:processReport(1365)) - BLOCK* processReport: discarded non-initial block report from 98.137.233.235:50010 because namenode still in startup phase
2012-01-05 18:57:46,049 INFO  hdfs.StateChange (BlockManager.java:processReport(1365)) - BLOCK* processReport: discarded non-initial block report from 98.137.233.237:50010 because namenode still in startup phase
2012-01-05 18:58:05,167 INFO  namenode.NameNode (NameNodeRpcServer.java:blockReceivedAndDeleted(894)) - Required GS=1175, Queuing blockReceivedAndDeleted message
2012-01-05 18:58:05,168 INFO  namenode.NameNode (NameNodeRpcServer.java:blockReceivedAndDeleted(894)) - Required GS=1175, Queuing blockReceivedAndDeleted message
2012-01-05 18:58:06,634 INFO  namenode.NameNode (NameNodeRpcServer.java:blockReceivedAndDeleted(894)) - Required GS=1176, Queuing blockReceivedAndDeleted message
2012-01-05 18:58:06,636 INFO  namenode.NameNode (NameNodeRpcServer.java:blockReceivedAndDeleted(894)) - Required GS=1176, Queuing blockReceivedAndDeleted message
2012-01-05 18:58:08,097 INFO  namenode.NameNode (NameNodeRpcServer.java:blockReceivedAndDeleted(894)) - Required GS=1177, Queuing blockReceivedAndDeleted message
2012-01-05 18:58:08,097 INFO  namenode.NameNode (NameNodeRpcServer.java:blockReceivedAndDeleted(894)) - Required GS=1177, Queuing blockReceivedAndDeleted message
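The failure sequence described in this issue can be modeled with a toy Java class. This is a sketch under simplifying assumptions, not BlockManager: SafemodeStuckModel and its methods are hypothetical, safemode is assumed to end once every block has one reported replica, and under-construction replicas are assumed not to count toward that total. It shows how an early addStoredBlock makes the later full report look "non-initial", so the report is discarded and the threshold is never reached.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the HDFS-2753 sequence; all names are hypothetical. */
public class SafemodeStuckModel {
    private final Map<String, Integer> numBlocks = new HashMap<>(); // per-datanode count
    private final Set<Long> safeBlocks = new HashSet<>();           // blocks with a reported replica
    private final int totalBlocks;

    SafemodeStuckModel(int totalBlocks) {
        this.totalBlocks = totalBlocks;
    }

    // Simplification: safemode ends once every block has one reported replica.
    boolean inSafeMode() {
        return safeBlocks.size() < totalBlocks;
    }

    void registerDatanode(String dn) {
        numBlocks.put(dn, 0);
    }

    // Incremental report of an under-construction replica: bumps the node's
    // block count but does not count toward the safe-block total.
    void addStoredBlockUnderConstruction(String dn) {
        numBlocks.merge(dn, 1, Integer::sum);
    }

    // Full report of completed blocks. Mirrors the optimization that skips
    // "non-initial" reports while the NN is still in startup safemode.
    // Assumes dn was registered first.
    boolean processReport(String dn, List<Long> completedBlockIds) {
        if (inSafeMode() && numBlocks.get(dn) > 0) {
            return false; // discarded
        }
        numBlocks.merge(dn, completedBlockIds.size(), Integer::sum);
        safeBlocks.addAll(completedBlockIds);
        return true;
    }

    public static void main(String[] args) {
        List<Long> report = Arrays.asList(1L, 2L, 3L);

        // Order seen in the logs: addStoredBlock arrives before the full report.
        SafemodeStuckModel stuck = new SafemodeStuckModel(3);
        stuck.registerDatanode("dn1");
        stuck.addStoredBlockUnderConstruction("dn1");
        boolean accepted = stuck.processReport("dn1", report);
        System.out.println("accepted=" + accepted + " inSafeMode=" + stuck.inSafeMode());

        // Same report with no prior addStoredBlock: safemode is exited.
        SafemodeStuckModel ok = new SafemodeStuckModel(3);
        ok.registerDatanode("dn1");
        accepted = ok.processReport("dn1", report);
        System.out.println("accepted=" + accepted + " inSafeMode=" + ok.inSafeMode());
    }
}
```

Running main prints "accepted=false inSafeMode=true" for the logged ordering and "accepted=true inSafeMode=false" when no addStoredBlock precedes the report.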


[jira] [Created] (HDFS-2754) HA: enable dfs.namenode.name.dir.restore if HA is enabled

2012-01-05 Thread Eli Collins (Created) (JIRA)
HA: enable dfs.namenode.name.dir.restore if HA is enabled
-

 Key: HDFS-2754
 URL: https://issues.apache.org/jira/browse/HDFS-2754
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, name-node
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins


If HA is enabled, it seems like we should always try to restore failed name dirs. Let's auto-enable name dir restoration if HA is enabled, at least for shared edits dirs.
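One way to express the narrower form of this policy (auto-restore shared edits dirs only), sketched in Java. Only the key name dfs.namenode.name.dir.restore comes from this issue; the class NameDirRestorePolicy, the method shouldRestore, the Map standing in for Hadoop's Configuration, and the assumption that the key defaults to false when unset are all illustrative.

```java
import java.util.HashMap;
import java.util.Map;

public class NameDirRestorePolicy {
    // Key named in HDFS-2754; assumed here to default to false when unset.
    static final String RESTORE_KEY = "dfs.namenode.name.dir.restore";

    /**
     * Hypothetical policy: restore a failed dir if the admin enabled
     * restoration, or if HA is enabled and the dir is a shared edits dir.
     */
    static boolean shouldRestore(Map<String, String> conf, boolean haEnabled,
                                 boolean isSharedEditsDir) {
        boolean configured = Boolean.parseBoolean(conf.getOrDefault(RESTORE_KEY, "false"));
        return configured || (haEnabled && isSharedEditsDir);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>(); // restore key left unset
        System.out.println(shouldRestore(conf, true, true));   // true: HA + shared dir
        System.out.println(shouldRestore(conf, true, false));  // false: local name dir
        conf.put(RESTORE_KEY, "true");
        System.out.println(shouldRestore(conf, false, false)); // true: explicitly enabled
    }
}
```

An explicit setting still wins, so an admin who already enabled restoration sees no change; the HA rule only widens the default.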





[jira] [Created] (HDFS-2755) HA: add tests for flaky and failed shared edits directories

2012-01-05 Thread Eli Collins (Created) (JIRA)
HA: add tests for flaky and failed shared edits directories
---

 Key: HDFS-2755
 URL: https://issues.apache.org/jira/browse/HDFS-2755
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, test
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins


We should test the behavior with both flaky and failed shared edits dirs. The tests should cover the cases where name dir restore is enabled and where it is disabled. There should be a warning, and an API we can check, when not all shared directories are online.
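A hypothetical shape for the "warning plus API" idea, sketched in Java. SharedDirHealth, offlineDirs and allSharedDirsOnline are illustrative names, and treating a directory as online when it exists and is writable is an assumption made for the sketch.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical health check for shared edits dirs; not an HDFS API. */
public class SharedDirHealth {
    private static final Logger LOG = Logger.getLogger(SharedDirHealth.class.getName());

    // Returns the directories that are missing or not writable.
    static List<File> offlineDirs(List<File> sharedDirs) {
        List<File> offline = new ArrayList<>();
        for (File dir : sharedDirs) {
            if (!dir.isDirectory() || !dir.canWrite()) {
                offline.add(dir);
            }
        }
        return offline;
    }

    // Logs a warning and returns false if not all shared dirs are online.
    static boolean allSharedDirsOnline(List<File> sharedDirs) {
        List<File> offline = offlineDirs(sharedDirs);
        if (!offline.isEmpty()) {
            LOG.warning("Shared edits directories offline: " + offline);
        }
        return offline.isEmpty();
    }

    public static void main(String[] args) {
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        File missing = new File(tmp, "no-such-shared-dir");
        System.out.println(allSharedDirsOnline(Arrays.asList(tmp, missing)));
    }
}
```

A test for a flaky dir could toggle a directory's permissions between calls and assert that the API result and the warning follow.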





[jira] [Created] (HDFS-2756) Warm standby does not read the in_progress edit log

2012-01-05 Thread Hari Mankude (Created) (JIRA)
Warm standby does not read the in_progress edit log 


 Key: HDFS-2756
 URL: https://issues.apache.org/jira/browse/HDFS-2756
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: ha
Affects Versions: HA branch (HDFS-1623)
Reporter: Hari Mankude


Warm standby does not read the in_progress edit log. This could cause the standby to take a long time to become the primary during a failover.





[jira] [Created] (HDFS-2757) Cannot read a local file that's being written to when using the local read short circuit

2012-01-05 Thread Jean-Daniel Cryans (Created) (JIRA)
Cannot read a local file that's being written to when using the local read short circuit


 Key: HDFS-2757
 URL: https://issues.apache.org/jira/browse/HDFS-2757
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 1.0.0
Reporter: Jean-Daniel Cryans
 Fix For: 1.1.0


When testing tailing of a local file with the read short circuit enabled, I get:

{noformat}
2012-01-06 00:17:31,598 WARN org.apache.hadoop.hdfs.DFSClient: BlockReaderLocal requested with incorrect offset:  Offset 0 and length 8230400 don't match block blk_-2842916025951313698_454072 ( blockLen 124 )
2012-01-06 00:17:31,598 WARN org.apache.hadoop.hdfs.DFSClient: BlockReaderLocal: Removing blk_-2842916025951313698_454072 from cache because local file /export4/jdcryans/dfs/data/blocksBeingWritten/blk_-2842916025951313698 could not be opened.
2012-01-06 00:17:31,599 INFO org.apache.hadoop.hdfs.DFSClient: Failed to read block blk_-2842916025951313698_454072 on local machine java.io.IOException:  Offset 0 and length 8230400 don't match block blk_-2842916025951313698_454072 ( blockLen 124 )
2012-01-06 00:17:31,599 INFO org.apache.hadoop.hdfs.DFSClient: Try reading via the datanode on /10.4.13.38:51010
java.io.EOFException: hdfs://sv4r11s38:9100/hbase-1/.logs/sv4r13s38,62023,1325808100311/sv4r13s38%2C62023%2C1325808100311.1325808100818, entryStart=7190409, pos=8230400, end=8230400, edit=5
{noformat}
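The first warning reads like a range check against the block length known to the local reader. A minimal Java sketch, assuming the check is simply offset + length <= blockLen (LocalReadRangeCheck and rangeFitsBlock are hypothetical names): with the logged numbers, a request for 8230400 bytes of a block still under blocksBeingWritten, whose known length is 124, fails the check, and the client then falls back to reading via the datanode.

```java
public class LocalReadRangeCheck {
    /** Hypothetical check: the requested range must lie within the known block length. */
    static boolean rangeFitsBlock(long offset, long length, long blockLen) {
        return offset >= 0 && length >= 0 && offset + length <= blockLen;
    }

    public static void main(String[] args) {
        // Values from the log: the reader asks for 8230400 bytes from offset 0,
        // but the length known for the block being written is only 124.
        long offset = 0L;
        long length = 8230400L;
        long blockLen = 124L;
        if (!rangeFitsBlock(offset, length, blockLen)) {
            System.out.println("Offset " + offset + " and length " + length
                + " don't match block (blockLen " + blockLen + "); fall back to the datanode");
        }
    }
}
```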





[jira] [Created] (HDFS-2758) HA: multi-process MiniDFS cluster for testing ungraceful shutdown

2012-01-05 Thread Eli Collins (Created) (JIRA)
HA: multi-process MiniDFS cluster for testing ungraceful shutdown
-

 Key: HDFS-2758
 URL: https://issues.apache.org/jira/browse/HDFS-2758
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: ha, test
Affects Versions: HA branch (HDFS-1623)
Reporter: Eli Collins
Assignee: Eli Collins


We should test ungraceful termination of NN processes. This is generally useful for HDFS testing, but it is particularly needed for HA, since we may do this via fencing (send the NN a SIGKILL via ssh kill -9, flip the PDU, etc.). We can't currently do this with the MiniDFSCluster, since everything runs in one process and killing the native thread hosting a Java thread terminates the whole process.





RE: Timeouts in Datanodes while block scanning

2012-01-05 Thread Uma Maheswara Rao G
Hi Aaron,
I am currently on version 0.20.2.
I have debugged the problem for some time but could not find any clue. I wanted to know whether any devs/users have faced this situation in their clusters.

Regards,
Uma

From: Aaron T. Myers [a...@cloudera.com]
Sent: Thursday, January 05, 2012 11:36 PM
To: hdfs-dev@hadoop.apache.org
Subject: Re: Timeouts in Datanodes while block scanning

What version of HDFS? This question might be more appropriate for hdfs-user@
.

--
Aaron T. Myers
Software Engineer, Cloudera



On Thu, Jan 5, 2012 at 8:59 AM, Uma Maheswara Rao G wrote:

> Hi,
>
>  I have 10 Node cluster running from last 25days( running with Hbase
> cluster). Recently observed that for every continuos blocks scans, there
> are many timeouts coming in DataNode.
>  After this block scan verifications, again reads succeeded. This
> situation keep occurring many times now, for every continuous block scans.
>  Here Hbase continuously  performing many random reads.
>
> Whether any one faced this situation in your clusters?
>
> Below is the logs with timeouts.
> 2011-12-28 11:30:42,618 INFO  DataNode.clienttrace (BlockSender.java:sendBlock(529)) - src: /107.252.175.3:10010, dest: /107.252.175.3:52764, bytes: 264192, op: HDFS_READ, cliID: DFSClient_hb_rs_107-252-175-3,20020,1324837769603_1324837770095_1770885334_27, srvID: DS-306564179-107.252.175.3-10010-1322019943818, blockid: blk_1323251633953_187190
> 2011-12-28 11:30:42,621 INFO  DataNode.clienttrace (BlockSender.java:sendBlock(529)) - src: /107.252.175.3:10010, dest: /107.252.175.3:52772, bytes: 396288, op: HDFS_READ, cliID: DFSClient_hb_rs_107-252-175-3,20020,1324837769603_1324837770095_1770885334_27, srvID: DS-306564179-107.252.175.3-10010-1322019943818, blockid: blk_1323251635735_188342
> 2011-12-28 11:30:42,641 INFO  DataNode.clienttrace (BlockSender.java:sendBlock(529)) - src: /107.252.175.3:10010, dest: /107.252.175.3:52796, bytes: 396288, op: HDFS_READ, cliID: DFSClient_hb_rs_107-252-175-3,20020,1324837769603_1324837770095_1770885334_27, srvID: DS-306564179-107.252.175.3-10010-1322019943818, blockid: blk_1323251634096_187277
> 2011-12-28 11:30:42,889 INFO  DataNode.clienttrace (BlockSender.java:sendBlock(529)) - src: /107.252.175.3:10010, dest: /107.252.175.3:52732, bytes: 264192, op: HDFS_READ, cliID: DFSClient_hb_rs_107-252-175-3,20020,1324837769603_1324837770095_1770885334_27, srvID: DS-306564179-107.252.175.3-10010-1322019943818, blockid: blk_1323251635763_188363
> 2011-12-28 11:30:42,889 INFO  DataNode.clienttrace (BlockSender.java:sendBlock(529)) - src: /107.252.175.3:10010, dest: /107.252.175.3:52637, bytes: 264192, op: HDFS_READ, cliID: DFSClient_hb_rs_107-252-175-3,20020,1324837769603_1324837770095_1770885334_27, srvID: DS-306564179-107.252.175.3-10010-1322019943818, blockid: blk_1323251634921_187798
> 2011-12-28 11:30:42,976 INFO  DataNode.clienttrace (BlockSender.java:sendBlock(529)) - src: /107.252.175.3:10010, dest: /107.252.175.3:52755, bytes: 396288, op: HDFS_READ, cliID: DFSClient_hb_rs_107-252-175-3,20020,1324837769603_1324837770095_1770885334_27, srvID: DS-306564179-107.252.175.3-10010-1322019943818, blockid: blk_1323251635359_188075
> 2011-12-28 11:30:57,757 INFO  datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251602823_167208
> 2011-12-28 11:32:15,757 INFO  datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251599175_166755
> 2011-12-28 11:32:54,561 INFO  datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251673745_194676
> 2011-12-28 11:33:33,561 INFO  datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251640709_189383
> 2011-12-28 11:34:12,557 INFO  datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251649630_190779
> 2011-12-28 11:34:51,557 INFO  datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251463964_91885
> 2011-12-28 11:35:23,958 INFO  datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251636310_188845
> 2011-12-28 11:36:01,155 INFO  datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1322486683238_54999
> 2011-12-28 11:36:04,157 INFO  datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251678959_195786
> 2011-12-28 11:36:43,157 INFO  datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1323251641803_189561
> 2011-12-28 11:37:20,357 INFO  datanode.DataBlockScanner (DataBlockScanner.java:verifyBlock(481)) - Verification succeeded for blk_1322486706170_66445
> 2011-12-28 11:37:44
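
To check whether the timeouts line up with the block scanner's activity, one could pull the DataBlockScanner verification timestamps out of the DataNode log and compare them with the timestamps of the timeouts. This is a minimal Python sketch, assuming one log entry per line in the format quoted above (the mail client wrapped the entries) and that only "Verification succeeded" lines are of interest; `scan_events` and `gaps_seconds` are hypothetical helpers, not part of Hadoop.

```python
import re
from datetime import datetime

# Assumed single-line format, e.g.:
# 2011-12-28 11:32:15,757 INFO  datanode.DataBlockScanner (...) - Verification succeeded for blk_...
SCAN_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) .*DataBlockScanner.*"
    r"Verification succeeded for (blk_\S+)"
)


def scan_events(lines):
    """Return (timestamp, block_id) for each successful scanner verification.

    Lines that are not DataBlockScanner verifications (e.g. clienttrace
    reads) are skipped.
    """
    events = []
    for line in lines:
        m = SCAN_RE.search(line)
        if m:
            # %f accepts the 3-digit millisecond field and pads it on the right.
            ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S,%f")
            events.append((ts, m.group(2)))
    return events


def gaps_seconds(events):
    """Seconds elapsed between consecutive verifications."""
    return [(b[0] - a[0]).total_seconds() for a, b in zip(events, events[1:])]
```

For example, `with open(path) as f: events = scan_events(f)`, where `path` points at the DataNode log; the resulting timestamps can then be set against the times at which the client timeouts were reported.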

[jira] [Created] (HDFS-2759) Pre-allocate HDFS edit log files after writing version number

2012-01-05 Thread Aaron T. Myers (Created) (JIRA)
Pre-allocate HDFS edit log files after writing version number

 Key: HDFS-2759
 URL: https://issues.apache.org/jira/browse/HDFS-2759
 Project: Hadoop HDFS
  Issue Type: Sub-task
  Components: name-node
Affects Versions: HA branch (HDFS-1623)
Reporter: Aaron T. Myers
Assignee: Aaron T. Myers


In HDFS-2709 it was discovered that there's a potential race wherein edit log
files are pre-allocated before the version number is written into the header of
the file. This can cause the standby to read an invalid version. We should
write the header, then pre-allocate the rest of the file after this point.
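
The proposed ordering can be sketched as follows. This is a minimal Python illustration, not HDFS's actual edit log code: it assumes the header is a single 4-byte big-endian layout version and that pre-allocation means extending the file with zero filler bytes; the constants and function names are hypothetical. `write_prealloc_only` mimics the problematic order, stopped just before the header write, so a reader sees filler where the version belongs.

```python
import os
import struct
import tempfile

# Hypothetical values for illustration; not HDFS's real constants.
LAYOUT_VERSION = -40          # assumed 4-byte signed header value
PREALLOC_SIZE = 1024 * 1024   # assumed pre-allocation size (1 MiB)
FILL_BYTE = b"\x00"           # assumed filler for the pre-allocated region
HEADER = struct.Struct(">i")  # big-endian 4-byte int


def create_edit_log(path):
    """Proposed order: write the header first, then pre-allocate the rest."""
    with open(path, "wb") as f:
        f.write(HEADER.pack(LAYOUT_VERSION))
        f.flush()
        os.fsync(f.fileno())  # header is on disk before the file grows
        f.write(FILL_BYTE * (PREALLOC_SIZE - HEADER.size))
        f.flush()
        os.fsync(f.fileno())


def write_prealloc_only(path):
    """Racy order, interrupted before the header write: filler sits at offset 0."""
    with open(path, "wb") as f:
        f.write(FILL_BYTE * PREALLOC_SIZE)


def read_version(path):
    """How a tailing reader (e.g. the standby) might read the header."""
    with open(path, "rb") as f:
        data = f.read(HEADER.size)
    if len(data) < HEADER.size:
        raise ValueError("header not written yet")
    return HEADER.unpack(data)[0]


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    good = os.path.join(workdir, "edits_good")
    racy = os.path.join(workdir, "edits_racy")
    create_edit_log(good)
    write_prealloc_only(racy)
    print(read_version(good))  # -40
    print(read_version(racy))  # 0, filler misread as a version
```

Only the ordering matters in this sketch: because the header reaches the file before any filler does, a reader should find either a short file (reported as not ready) or the real version at offset 0, not filler in the header position. The `fsync` calls add crash durability; visibility to other readers comes from `flush()`.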





Merging some trunk changes to 23

2012-01-05 Thread Eli Collins
Hey gang,

I was looking at the differences between HDFS trunk and 23 to work out
what to merge to 23.
Here's a summary of the differences:

1. BR (block report) scalability (HDFS-395, HDFS-2477, HDFS-2495, HDFS-2476)
2. BK (BookKeeper) support (HDFS-234 and related refactoring)
3. Protobuf RPC changes
4. A dozen straightforward bug fixes and trivial cleanups
5. Refactorings (HdfsConstants rename, refactor RPC out of NN, untangle NN deps)
6. FSEditLog should not writes long and short as UTF8 (HDFS-362)
7. Configurable number of low-resource/failed volumes (HDFS-2430)
8. Test fixes (for the above)

Seems like we should merge these:
#1 - significant improvement, also aids merges of important bug fixes like HDFS-1765
#3 - once it's completed
#4 - these are no-brainers
#5 - needed to aid future merges like HDFS-1623
#7 - good feature, not very intrusive
#8 - required for the others

Seems like we can pass on:
#2 - is contrib, should soak
#6 - changes the image format so requires an HDFS upgrade

I'm planning on merging #4, since those changes should be uncontroversial,
and will update the target version on JIRA before doing so. But I wanted
to see what Arun and others think about the other changes.  Reasonable?

Thanks,
Eli


Re: Merging some trunk changes to 23

2012-01-05 Thread Dhruba Borthakur
+1 for adding #1, "BR scalability (HDFS-395, HDFS-2477, HDFS-2495, HDFS-2476)"

These are very significant performance improvements that would be very
valuable in 0.23.

-dhruba





-- 
Subscribe to my posts at http://www.facebook.com/dhruba