Review Request: Add listCorruptFileBlocks to DistributedFileSystem (and ClientProtocol)

2010-10-29 Thread Patrick Kling

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18/
---

Review request for hadoop-hdfs.


Summary
---

Add listCorruptFileBlocks to DistributedFileSystem (and ClientProtocol)

As discussed in HDFS-, it would be beneficial for tools such as the RAID 
block fixer and RAID FSCK to have access to listCorruptFileBlocks via the 
DistributedFileSystem (rather than having to parse Servlet output, which could 
present a performance problem).

For further details, see https://issues.apache.org/jira/browse/HDFS-1482


This addresses bug HDFS-1482.
https://issues.apache.org/jira/browse/HDFS-1482


Diffs
-

  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSClient.java
 1028517 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
 1028517 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
 1028517 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/protocol/CorruptFileBlock.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
 1028517 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
 1028517 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestCorruptFilesJsp.java
 1028517 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestFsck.java
 1028517 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java
 1028517 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/webapps/hdfs/corrupt_files.jsp
 1028517 

Diff: https://reviews.apache.org/r/18/diff


Testing
---

Unit tests (including new test case in TestListCorruptFileBlocks)


Thanks,

Patrick



Re: Review Request: Add listCorruptFileBlocks to DistributedFileSystem (and ClientProtocol)

2010-11-01 Thread Patrick Kling

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18/
---

(Updated 2010-11-01 11:50:37.134842)


Review request for hadoop-hdfs.


Changes
---

ClientProtocol.listCorruptFileBlocks now returns a list of file names and a 
cookie string, which can be used to iteratively retrieve all corrupt files.
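A client might drain the full corrupt-file list with a loop like the one below. This is a hedged sketch: the classes and method shapes here are simplified stand-ins used to illustrate the cookie-based iteration pattern, not the committed Hadoop API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CorruptFileIterator {
    // Simplified stand-in for the (file names, cookie) pair returned by
    // ClientProtocol.listCorruptFileBlocks.
    static class CorruptFileBlocks {
        final String[] files;
        final String cookie;
        CorruptFileBlocks(String[] files, String cookie) {
            this.files = files;
            this.cookie = cookie;
        }
    }

    // Fake paged source standing in for the RPC; returns at most two
    // file names per call, resuming from the position encoded in the cookie.
    static final String[] ALL = {"/a", "/b", "/c", "/d", "/e"};

    static CorruptFileBlocks listCorruptFileBlocks(String path, String cookie) {
        int start = (cookie == null) ? 0 : Integer.parseInt(cookie);
        int end = Math.min(start + 2, ALL.length);
        return new CorruptFileBlocks(Arrays.copyOfRange(ALL, start, end),
                                     Integer.toString(end));
    }

    // Iteratively retrieve all corrupt files: pass the cookie from each
    // batch into the next call until an empty batch comes back.
    public static List<String> collectAll(String path) {
        List<String> out = new ArrayList<>();
        String cookie = null;
        while (true) {
            CorruptFileBlocks batch = listCorruptFileBlocks(path, cookie);
            if (batch.files.length == 0) {
                break;                    // empty batch: no more corrupt files
            }
            out.addAll(Arrays.asList(batch.files));
            cookie = batch.cookie;        // resume from here on the next call
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(collectAll("/"));
    }
}
```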


Summary
---

Add listCorruptFileBlocks to DistributedFileSystem (and ClientProtocol)

As discussed in HDFS-, it would be beneficial for tools such as the RAID 
block fixer and RAID FSCK to have access to listCorruptFileBlocks via the 
DistributedFileSystem (rather than having to parse Servlet output, which could 
present a performance problem).

For further details, see https://issues.apache.org/jira/browse/HDFS-1482


This addresses bug HDFS-1482.
https://issues.apache.org/jira/browse/HDFS-1482


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSClient.java
 1028517 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
 1028517 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
 1028517 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/protocol/CorruptFileBlocks.java
 PRE-CREATION 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
 1028517 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
 1028517 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestCorruptFilesJsp.java
 1028517 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestFsck.java
 1028517 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java
 1028517 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/webapps/hdfs/corrupt_files.jsp
 1028517 

Diff: https://reviews.apache.org/r/18/diff


Testing
---

Unit tests (including new test case in TestListCorruptFileBlocks)


Thanks,

Patrick



Review Request: DFSClient.getBlockLocations returns BlockLocations with no indication that the corresponding blocks are corrupt

2010-11-02 Thread Patrick Kling

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27/
---

Review request for hadoop-hdfs.


Summary
---

DFSClient.getBlockLocations returns BlockLocations with no indication that the 
corresponding blocks are corrupt

When there are no uncorrupted replicas of a block, 
FSNamesystem.getBlockLocations returns LocatedBlocks corresponding to corrupt 
blocks. When DFSClient converts these to BlockLocations, the information that 
the corresponding block is corrupt is lost. We should add a field to 
BlockLocation to indicate whether the corresponding block is corrupt in order 
to warn the client that reading this block will fail. This would be especially 
useful for tools such as RAID FSCK, which could then easily inspect whether 
data or parity blocks are corrupted without having to make direct RPC calls.
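A hedged sketch of what the proposed flag might look like to a caller follows. The class below is a simplified stand-in, not the actual org.apache.hadoop.fs.BlockLocation, and the corrupt field and accessor names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

public class CorruptAwareLocations {
    // Simplified stand-in for BlockLocation with the proposed corrupt flag.
    static class BlockLocation {
        private final String[] hosts;
        private final long offset;
        private final long length;
        private final boolean corrupt;   // proposed new field

        BlockLocation(String[] hosts, long offset, long length, boolean corrupt) {
            this.hosts = hosts;
            this.offset = offset;
            this.length = length;
            this.corrupt = corrupt;
        }

        boolean isCorrupt() { return corrupt; }
        long getOffset() { return offset; }
    }

    // A tool like RAID FSCK could then find unreadable ranges from the
    // locations alone, without extra RPC calls to the NameNode.
    static List<Long> corruptOffsets(BlockLocation[] locations) {
        List<Long> bad = new ArrayList<>();
        for (BlockLocation loc : locations) {
            if (loc.isCorrupt()) {
                bad.add(loc.getOffset());   // reading this range will fail
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        BlockLocation[] locs = {
            new BlockLocation(new String[]{"dn1"}, 0L, 64L, false),
            new BlockLocation(new String[]{}, 64L, 64L, true),  // no good replicas
        };
        System.out.println(corruptOffsets(locs));
    }
}
```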


This addresses bug HDFS-1483.
https://issues.apache.org/jira/browse/HDFS-1483


Diffs
-

  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSUtil.java
 1028386 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/TestDFSUtil.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/27/diff


Testing
---

TestDFSUtil


Thanks,

Patrick



Re: Review Request: DFSClient.getBlockLocations returns BlockLocations with no indication that the corresponding blocks are corrupt

2010-11-03 Thread Patrick Kling

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27/
---

(Updated 2010-11-03 11:33:39.415750)


Review request for hadoop-hdfs.


Changes
---

Incorporated Ram's feedback. Thank you!


Summary
---

DFSClient.getBlockLocations returns BlockLocations with no indication that the 
corresponding blocks are corrupt

When there are no uncorrupted replicas of a block, 
FSNamesystem.getBlockLocations returns LocatedBlocks corresponding to corrupt 
blocks. When DFSClient converts these to BlockLocations, the information that 
the corresponding block is corrupt is lost. We should add a field to 
BlockLocation to indicate whether the corresponding block is corrupt in order 
to warn the client that reading this block will fail. This would be especially 
useful for tools such as RAID FSCK, which could then easily inspect whether 
data or parity blocks are corrupted without having to make direct RPC calls.


This addresses bug HDFS-1483.
https://issues.apache.org/jira/browse/HDFS-1483


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSUtil.java
 1028386 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/TestDFSUtil.java
 PRE-CREATION 

Diff: https://reviews.apache.org/r/27/diff


Testing
---

TestDFSUtil


Thanks,

Patrick



Re: Review Request: DFSClient.getBlockLocations returns BlockLocations with no indication that the corresponding blocks are corrupt

2010-11-03 Thread Patrick Kling


> On 2010-11-03 11:41:39, Ramkumar Vadali wrote:
> > Looks good to me, but this diff depends on a hadoop-common change, right?

It depends on HADOOP-7013, which can be found here: 
https://reviews.apache.org/r/26/


- Patrick


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/27/#review29
---


On 2010-11-03 11:33:39, Patrick Kling wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/27/
> ---
> 
> (Updated 2010-11-03 11:33:39)
> 
> 
> Review request for hadoop-hdfs.
> 
> 
> Summary
> ---
> 
> DFSClient.getBlockLocations returns BlockLocations with no indication that 
> the corresponding blocks are corrupt
> 
> When there are no uncorrupted replicas of a block, 
> FSNamesystem.getBlockLocations returns LocatedBlocks corresponding to corrupt 
> blocks. When DFSClient converts these to BlockLocations, the information that 
> the corresponding block is corrupt is lost. We should add a field to 
> BlockLocation to indicate whether the corresponding block is corrupt in order 
> to warn the client that reading this block will fail. This would be 
> especially useful for tools such as RAID FSCK, which could then easily 
> inspect whether data or parity blocks are corrupted without having to make 
> direct RPC calls
> 
> 
> This addresses bug HDFS-1483.
> https://issues.apache.org/jira/browse/HDFS-1483
> 
> 
> Diffs
> -
> 
>   
> http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSUtil.java
>  1028386 
>   
> http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/TestDFSUtil.java
>  PRE-CREATION 
> 
> Diff: https://reviews.apache.org/r/27/diff
> 
> 
> Testing
> ---
> 
> TestDFSUtil
> 
> 
> Thanks,
> 
> Patrick
> 
>



Re: Review Request: Add listCorruptFileBlocks to DistributedFileSystem (and ClientProtocol)

2010-11-08 Thread Patrick Kling

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18/
---

(Updated 2010-11-08 19:01:36.459005)


Review request for hadoop-hdfs.


Changes
---

Added listCorruptFileBlocks to FileSystem


Summary
---

Add listCorruptFileBlocks to DistributedFileSystem (and ClientProtocol)

As discussed in HDFS-, it would be beneficial for tools such as the RAID 
block fixer and RAID FSCK to have access to listCorruptFileBlocks via the 
DistributedFileSystem (rather than having to parse Servlet output, which could 
present a performance problem).

For further details, see https://issues.apache.org/jira/browse/HDFS-1482


This addresses bug HDFS-1482.
https://issues.apache.org/jira/browse/HDFS-1482


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSClient.java
 1032664 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
 1032664 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/HftpFileSystem.java
 1032664 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
 1032664 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
 1032664 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
 1032664 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestCorruptFilesJsp.java
 1032664 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestFsck.java
 1032664 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java
 1032664 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/webapps/hdfs/corrupt_files.jsp
 1032664 

Diff: https://reviews.apache.org/r/18/diff


Testing
---

Unit tests (including new test case in TestListCorruptFileBlocks)


Thanks,

Patrick



Review Request: Populate needed replication queues before leaving safe mode.

2010-11-16 Thread Patrick Kling

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/105/
---

Review request for hadoop-hdfs.


Summary
---

This patch introduces a new configuration variable 
dfs.namenode.replqueue.threshold-pct that determines the fraction of blocks for 
which block reports have to be received before the NameNode will start 
initializing the needed replication queues. Once a sufficient number of block 
reports have been received, the queues are initialized while the NameNode is 
still in safe mode. After the queues are initialized, subsequent block reports 
are handled by updating the queues incrementally.

The benefit of this is twofold:
- It allows us to compute the replication queues while we are waiting for the 
last few block reports (when the NameNode is mostly idle). Once these block 
reports have been received, we can then immediately leave safe mode without 
having to wait for the computation of the needed replication queues (which 
requires a full traversal of the blocks map).
- With Raid, it may not be necessary to stay in safe mode until all blocks have 
been reported. Using this change, we could monitor if all of the missing blocks 
can be recreated using parity information and if so leave safe mode early. In 
order for this monitoring to work, we need access to the needed replication 
queues while the NameNode is still in safe mode.
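An operator would presumably set the new property in hdfs-site.xml; the value below (95% of blocks) is only an illustration, not a recommended default.

```xml
<property>
  <name>dfs.namenode.replqueue.threshold-pct</name>
  <!-- Initialize the needed replication queues once block reports cover
       this fraction of blocks, possibly before safe mode is exited. -->
  <value>0.95</value>
</property>
```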


This addresses bug HDFS-1476.
https://issues.apache.org/jira/browse/HDFS-1476


Diffs
-

  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
 1035545 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/BlockManager.java
 1035545 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
 1035545 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/MiniDFSCluster.java
 1035545 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java
 1035545 

Diff: https://reviews.apache.org/r/105/diff


Testing
---

new test case in TestListCorruptFileBlocks


Thanks,

Patrick



Re: Review Request: Populate needed replication queues before leaving safe mode.

2010-11-16 Thread Patrick Kling

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/105/
---

(Updated 2010-11-16 18:01:44.268029)


Review request for hadoop-hdfs.


Changes
---

Incorporated Dhruba's feedback. Thank you!


Summary
---

This patch introduces a new configuration variable 
dfs.namenode.replqueue.threshold-pct that determines the fraction of blocks for 
which block reports have to be received before the NameNode will start 
initializing the needed replication queues. Once a sufficient number of block 
reports have been received, the queues are initialized while the NameNode is 
still in safe mode. After the queues are initialized, subsequent block reports 
are handled by updating the queues incrementally.

The benefit of this is twofold:
- It allows us to compute the replication queues while we are waiting for the 
last few block reports (when the NameNode is mostly idle). Once these block 
reports have been received, we can then immediately leave safe mode without 
having to wait for the computation of the needed replication queues (which 
requires a full traversal of the blocks map).
- With Raid, it may not be necessary to stay in safe mode until all blocks have 
been reported. Using this change, we could monitor if all of the missing blocks 
can be recreated using parity information and if so leave safe mode early. In 
order for this monitoring to work, we need access to the needed replication 
queues while the NameNode is still in safe mode.


This addresses bug HDFS-1476.
https://issues.apache.org/jira/browse/HDFS-1476


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
 1035545 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/BlockManager.java
 1035545 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
 1035545 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/MiniDFSCluster.java
 1035545 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java
 1035545 

Diff: https://reviews.apache.org/r/105/diff


Testing
---

new test case in TestListCorruptFileBlocks


Thanks,

Patrick



Re: Review Request: Populate needed replication queues before leaving safe mode.

2010-11-18 Thread Patrick Kling

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/105/
---

(Updated 2010-11-18 10:49:38.102334)


Review request for hadoop-hdfs.


Changes
---

Changed default value of replication queue threshold to safe mode threshold.


Summary
---

This patch introduces a new configuration variable 
dfs.namenode.replqueue.threshold-pct that determines the fraction of blocks for 
which block reports have to be received before the NameNode will start 
initializing the needed replication queues. Once a sufficient number of block 
reports have been received, the queues are initialized while the NameNode is 
still in safe mode. After the queues are initialized, subsequent block reports 
are handled by updating the queues incrementally.

The benefit of this is twofold:
- It allows us to compute the replication queues while we are waiting for the 
last few block reports (when the NameNode is mostly idle). Once these block 
reports have been received, we can then immediately leave safe mode without 
having to wait for the computation of the needed replication queues (which 
requires a full traversal of the blocks map).
- With Raid, it may not be necessary to stay in safe mode until all blocks have 
been reported. Using this change, we could monitor if all of the missing blocks 
can be recreated using parity information and if so leave safe mode early. In 
order for this monitoring to work, we need access to the needed replication 
queues while the NameNode is still in safe mode.


This addresses bug HDFS-1476.
https://issues.apache.org/jira/browse/HDFS-1476


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
 1035545 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/BlockManager.java
 1035545 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
 1035545 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/MiniDFSCluster.java
 1035545 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java
 1035545 

Diff: https://reviews.apache.org/r/105/diff


Testing
---

new test case in TestListCorruptFileBlocks


Thanks,

Patrick



Re: Review Request: Add listCorruptFileBlocks to DistributedFileSystem (and ClientProtocol)

2010-11-18 Thread Patrick Kling

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18/
---

(Updated 2010-11-18 20:10:36.970120)


Review request for hadoop-hdfs.


Changes
---

Added listCorruptFileBlocks to FileContext.


Summary
---

Add listCorruptFileBlocks to DistributedFileSystem (and ClientProtocol)

As discussed in HDFS-, it would be beneficial for tools such as the RAID 
block fixer and RAID FSCK to have access to listCorruptFileBlocks via the 
DistributedFileSystem (rather than having to parse Servlet output, which could 
present a performance problem).

For further details, see https://issues.apache.org/jira/browse/HDFS-1482


This addresses bug HDFS-1482.
https://issues.apache.org/jira/browse/HDFS-1482


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/fs/Hdfs.java
 1036663 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSClient.java
 1036663 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
 1036663 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
 1036663 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
 1036663 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
 1036663 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestCorruptFilesJsp.java
 1036663 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestFsck.java
 1036663 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java
 1036663 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/webapps/hdfs/corrupt_files.jsp
 1036663 

Diff: https://reviews.apache.org/r/18/diff


Testing
---

Unit tests (including new test case in TestListCorruptFileBlocks)


Thanks,

Patrick



Re: Review Request: Populate needed replication queues before leaving safe mode.

2010-11-19 Thread Patrick Kling

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/105/
---

(Updated 2010-11-19 13:07:20.231197)


Review request for hadoop-hdfs.


Changes
---

Updated test case to play nice with HDFS-1482.


Summary
---

This patch introduces a new configuration variable 
dfs.namenode.replqueue.threshold-pct that determines the fraction of blocks for 
which block reports have to be received before the NameNode will start 
initializing the needed replication queues. Once a sufficient number of block 
reports have been received, the queues are initialized while the NameNode is 
still in safe mode. After the queues are initialized, subsequent block reports 
are handled by updating the queues incrementally.

The benefit of this is twofold:
- It allows us to compute the replication queues while we are waiting for the 
last few block reports (when the NameNode is mostly idle). Once these block 
reports have been received, we can then immediately leave safe mode without 
having to wait for the computation of the needed replication queues (which 
requires a full traversal of the blocks map).
- With Raid, it may not be necessary to stay in safe mode until all blocks have 
been reported. Using this change, we could monitor if all of the missing blocks 
can be recreated using parity information and if so leave safe mode early. In 
order for this monitoring to work, we need access to the needed replication 
queues while the NameNode is still in safe mode.


This addresses bug HDFS-1476.
https://issues.apache.org/jira/browse/HDFS-1476


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
 1035545 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/BlockManager.java
 1035545 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
 1035545 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/MiniDFSCluster.java
 1035545 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java
 1035545 

Diff: https://reviews.apache.org/r/105/diff


Testing
---

new test case in TestListCorruptFileBlocks


Thanks,

Patrick



Re: Review Request: Add listCorruptFileBlocks to DistributedFileSystem (and ClientProtocol)

2010-11-22 Thread Patrick Kling

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/18/
---

(Updated 2010-11-22 13:47:46.894388)


Review request for hadoop-hdfs.


Changes
---

Fixed javadoc warnings.


Summary
---

Add listCorruptFileBlocks to DistributedFileSystem (and ClientProtocol)

As discussed in HDFS-, it would be beneficial for tools such as the RAID 
block fixer and RAID FSCK to have access to listCorruptFileBlocks via the 
DistributedFileSystem (rather than having to parse Servlet output, which could 
present a performance problem).

For further details, see https://issues.apache.org/jira/browse/HDFS-1482


This addresses bug HDFS-1482.
https://issues.apache.org/jira/browse/HDFS-1482


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/fs/Hdfs.java
 1036663 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSClient.java
 1036663 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DistributedFileSystem.java
 1036663 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java
 1036663 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java
 1036663 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java
 1036663 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestCorruptFilesJsp.java
 1036663 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestFsck.java
 1036663 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java
 1036663 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/webapps/hdfs/corrupt_files.jsp
 1036663 

Diff: https://reviews.apache.org/r/18/diff


Testing
---

Unit tests (including new test case in TestListCorruptFileBlocks)


Thanks,

Patrick



Re: Review Request: Populate needed replication queues before leaving safe mode.

2010-12-09 Thread Patrick Kling

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/105/
---

(Updated 2010-12-09 19:52:13.188412)


Review request for hadoop-hdfs.


Changes
---

- Updated patch to apply to current trunk.
- In BlockManager.markBlockAsCorrupt only update needed replication queues if 
they have been initialized


Summary
---

This patch introduces a new configuration variable 
dfs.namenode.replqueue.threshold-pct that determines the fraction of blocks for 
which block reports have to be received before the NameNode will start 
initializing the needed replication queues. Once a sufficient number of block 
reports have been received, the queues are initialized while the NameNode is 
still in safe mode. After the queues are initialized, subsequent block reports 
are handled by updating the queues incrementally.

The benefit of this is twofold:
- It allows us to compute the replication queues while we are waiting for the 
last few block reports (when the NameNode is mostly idle). Once these block 
reports have been received, we can then immediately leave safe mode without 
having to wait for the computation of the needed replication queues (which 
requires a full traversal of the blocks map).
- With Raid, it may not be necessary to stay in safe mode until all blocks have 
been reported. Using this change, we could monitor if all of the missing blocks 
can be recreated using parity information and if so leave safe mode early. In 
order for this monitoring to work, we need access to the needed replication 
queues while the NameNode is still in safe mode.


This addresses bug HDFS-1476.
https://issues.apache.org/jira/browse/HDFS-1476


Diffs (updated)
-

  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSConfigKeys.java
 1044182 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/BlockManager.java
 1044182 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java
 1044182 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/MiniDFSCluster.java
 1044182 
  
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java
 1044182 

Diff: https://reviews.apache.org/r/105/diff


Testing
---

new test case in TestListCorruptFileBlocks


Thanks,

Patrick



[jira] Created: (HDFS-1476) listCorruptFileBlocks should be functional while the name node is still in safe mode

2010-10-25 Thread Patrick Kling (JIRA)
listCorruptFileBlocks should be functional while the name node is still in safe 
mode


 Key: HDFS-1476
 URL: https://issues.apache.org/jira/browse/HDFS-1476
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Patrick Kling


This would allow us to detect whether missing blocks can be fixed using Raid 
and if that is the case exit safe mode earlier.

One way to make listCorruptFileBlocks available before the name node has exited 
from safe mode would be to perform a scan of the blocks map on each call to 
listCorruptFileBlocks to determine if there are any blocks with no replicas. 
This scan could be parallelized by dividing the space of block IDs into 
multiple intervals that can be scanned independently.
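The parallel scan described above could be sketched as follows. This is a hedged illustration against a fake in-memory blocks map (all names are illustrative, not NameNode internals): the block ID space is split into intervals, each interval is scanned on its own thread, and the per-interval results are merged.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelMissingBlockScan {
    // Fake blocks map: block id -> replica count (stands in for the
    // NameNode's blocks map).
    static final Map<Long, Integer> BLOCKS = new HashMap<>();
    static {
        BLOCKS.put(10L, 3);
        BLOCKS.put(25L, 0);   // no replicas: corrupt/missing
        BLOCKS.put(70L, 1);
        BLOCKS.put(90L, 0);   // no replicas: corrupt/missing
    }

    // Scan one interval [lo, hi) of the block id space for blocks with
    // zero replicas.
    static List<Long> scanInterval(long lo, long hi) {
        List<Long> missing = new ArrayList<>();
        for (Map.Entry<Long, Integer> e : BLOCKS.entrySet()) {
            long id = e.getKey();
            if (id >= lo && id < hi && e.getValue() == 0) {
                missing.add(id);
            }
        }
        return missing;
    }

    // Divide [0, maxId) into `parts` intervals and scan them in parallel.
    static List<Long> findMissing(long maxId, int parts) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(parts);
        List<Future<List<Long>>> futures = new ArrayList<>();
        long step = (maxId + parts - 1) / parts;
        for (int i = 0; i < parts; i++) {
            final long lo = i * step;
            final long hi = Math.min(lo + step, maxId);
            futures.add(pool.submit(() -> scanInterval(lo, hi)));
        }
        List<Long> all = new ArrayList<>();
        for (Future<List<Long>> f : futures) {
            all.addAll(f.get());
        }
        pool.shutdown();
        Collections.sort(all);
        return all;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(findMissing(100, 4));
    }
}
```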

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1477) Make NameNode Reconfigurable.

2010-10-25 Thread Patrick Kling (JIRA)
Make NameNode Reconfigurable.
-

 Key: HDFS-1477
 URL: https://issues.apache.org/jira/browse/HDFS-1477
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Patrick Kling


Modify NameNode to implement the interface Reconfigurable proposed in 
HADOOP-7001. This would allow us to change certain configuration properties 
without restarting the name node.
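A minimal sketch of what such a contract could look like follows. The interface shape is an assumption (the actual interface is proposed in HADOOP-7001 and may differ), and the property name used here is purely illustrative.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical Reconfigurable contract; see HADOOP-7001 for the real one.
interface Reconfigurable {
    Collection<String> getReconfigurableProperties();
    void reconfigureProperty(String property, String newValue);
}

public class ReconfigurableNode implements Reconfigurable {
    private final Map<String, String> conf = new HashMap<>();

    // Only properties in this set may be changed at runtime (illustrative).
    private static final Set<String> RECONFIGURABLE =
        new HashSet<>(Arrays.asList("dfs.heartbeat.interval"));

    public Collection<String> getReconfigurableProperties() {
        return Collections.unmodifiableSet(RECONFIGURABLE);
    }

    public void reconfigureProperty(String property, String newValue) {
        if (!RECONFIGURABLE.contains(property)) {
            throw new IllegalArgumentException(
                property + " is not reconfigurable at runtime");
        }
        conf.put(property, newValue);   // applied without a restart
    }

    public String get(String property) {
        return conf.get(property);
    }

    public static void main(String[] args) {
        ReconfigurableNode node = new ReconfigurableNode();
        node.reconfigureProperty("dfs.heartbeat.interval", "5");
        System.out.println(node.get("dfs.heartbeat.interval"));
    }
}
```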

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1482) Add listCorruptFileBlocks to DistributedFileSystem (and ClientProtocol)

2010-10-27 Thread Patrick Kling (JIRA)
Add listCorruptFileBlocks to DistributedFileSystem (and ClientProtocol)
---

 Key: HDFS-1482
 URL: https://issues.apache.org/jira/browse/HDFS-1482
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Patrick Kling


As discussed in HDFS-, it would be beneficial for tools such as the RAID 
block fixer and RAID FSCK to have access to listCorruptFileBlocks via the 
DistributedFileSystem (rather than having to parse Servlet output, which could 
present a performance problem).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HDFS-1483) DFSClient.getBlockLocations returns BlockLocations with no indication that the corresponding blocks are corrupt

2010-10-27 Thread Patrick Kling (JIRA)
DFSClient.getBlockLocations returns BlockLocations with no indication that the 
corresponding blocks are corrupt
---

 Key: HDFS-1483
 URL: https://issues.apache.org/jira/browse/HDFS-1483
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Patrick Kling


When there are no uncorrupted replicas of a block, 
FSNamesystem.getBlockLocations returns LocatedBlocks corresponding to corrupt 
blocks. When DFSClient converts these to BlockLocations, the information that 
the corresponding block is corrupt is lost. We should add a field to 
BlockLocation to indicate whether the corresponding block is corrupt in order 
to warn the client that reading this block will fail. This would be especially 
useful for tools such as RAID FSCK, which could then easily inspect whether 
data or parity blocks are corrupted without having to make direct RPC calls.




[jira] Created: (HDFS-1514) project.version in aop.xml is out of sync with build.xml

2010-11-22 Thread Patrick Kling (JIRA)
project.version in aop.xml is out of sync with build.xml


 Key: HDFS-1514
 URL: https://issues.apache.org/jira/browse/HDFS-1514
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Patrick Kling


project.version in aop.xml is set to 0.22.0-SNAPSHOT whereas version in 
build.xml is set to 0.23.0-SNAPSHOT. This causes ant test-patch to fail when 
using a local maven repository.




[jira] Created: (HDFS-1527) SocketOutputStream.transferToFully fails for blocks >= 2GB on 32 bit JVM

2010-12-03 Thread Patrick Kling (JIRA)
SocketOutputStream.transferToFully fails for blocks >= 2GB on 32 bit JVM


 Key: HDFS-1527
 URL: https://issues.apache.org/jira/browse/HDFS-1527
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: data-node
Affects Versions: 0.23.0
 Environment: 32 bit JVM
Reporter: Patrick Kling
 Fix For: 0.23.0


On a 32-bit JVM, SocketOutputStream.transferToFully() fails if the block size 
is >= 2GB. We should fall back to a normal transfer in this case. 


{code}
2010-12-02 19:04:23,490 ERROR datanode.DataNode (BlockSender.java:sendChunks(399)) - BlockSender.sendChunks() exception: java.io.IOException: Value too large for defined data type
        at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
        at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:418)
        at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:519)
        at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:204)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:386)
        at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:475)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.opReadBlock(DataXceiver.java:196)
        at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opReadBlock(DataTransferProtocol.java:356)
        at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:328)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:130)
        at java.lang.Thread.run(Thread.java:619)
{code}
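One way to implement the fallback described above, sketched with plain java.nio: cap each transferTo() call at a safe chunk size and loop until the requested count is moved. The 1 GB per-call cap below is an illustrative choice, not a value taken from any particular Hadoop release.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Sketch: avoid handing sendfile() a count larger than a 32-bit
// implementation can handle by transferring in bounded chunks.
public class ChunkedTransfer {
    static final long MAX_CHUNK = 1L << 30; // 1 GB per transferTo() call

    static long transferFully(FileChannel src, long position, long count,
                              FileChannel dst) throws IOException {
        long transferred = 0;
        while (transferred < count) {
            long chunk = Math.min(MAX_CHUNK, count - transferred);
            long n = src.transferTo(position + transferred, chunk, dst);
            if (n <= 0) break;          // EOF or nothing written
            transferred += n;
        }
        return transferred;
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("blk", ".src");
        Path out = Files.createTempFile("blk", ".dst");
        Files.write(in, new byte[8192]);  // small stand-in for a block
        try (FileChannel src = FileChannel.open(in, StandardOpenOption.READ);
             FileChannel dst = FileChannel.open(out, StandardOpenOption.WRITE)) {
            System.out.println(transferFully(src, 0, 8192, dst)); // prints 8192
        }
    }
}
```

Because the loop already tolerates short transfers, the same code path also covers the case where the kernel moves fewer bytes than requested in one call.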




[jira] Created: (HDFS-1533) A more elegant FileSystem#listCorruptFileBlocks API (HDFS portion)

2010-12-08 Thread Patrick Kling (JIRA)
A more elegant FileSystem#listCorruptFileBlocks API (HDFS portion)
------------------------------------------------------------------

 Key: HDFS-1533
 URL: https://issues.apache.org/jira/browse/HDFS-1533
 Project: Hadoop HDFS
  Issue Type: Bug
  Components: hdfs client
Reporter: Patrick Kling
Assignee: Patrick Kling







[jira] Created: (HDFS-1535) TestBlockRecovery should not use fixed port

2010-12-10 Thread Patrick Kling (JIRA)
TestBlockRecovery should not use fixed port
---

 Key: HDFS-1535
 URL: https://issues.apache.org/jira/browse/HDFS-1535
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Patrick Kling


TestBlockRecovery uses the default data node port 50075. This causes the test 
to fail if this port is not available.

{code}
Testcase: testFinalizedReplicas took 0.567 sec
Caused an ERROR
Port in use: 0.0.0.0:50075
java.net.BindException: Port in use: 0.0.0.0:50075
        at org.apache.hadoop.http.HttpServer.start(HttpServer.java:625)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startInfoServer(DataNode.java:358)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:502)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.&lt;init&gt;(DataNode.java:281)
        at org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery.startUp(TestBlockRecovery.java:104)
Caused by: java.net.BindException: Address already in use
        at sun.nio.ch.Net.bind(Native Method)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
        at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
        at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
        at org.apache.hadoop.http.HttpServer.start(HttpServer.java:582)
{code}
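The usual fix for this kind of test flakiness is to bind to port 0 so the OS picks a free ephemeral port, then read the actual port back. A minimal sketch; in the real test, the chosen port would be fed into the DataNode's HTTP server configuration rather than used on a raw ServerSocket:

```java
import java.io.IOException;
import java.net.ServerSocket;

// Sketch: let the OS assign an unused port instead of hard-coding 50075.
public class EphemeralPort {
    static int pickFreePort() throws IOException {
        try (ServerSocket s = new ServerSocket(0)) { // 0 = OS chooses a port
            return s.getLocalPort();                 // the port it chose
        }
    }

    public static void main(String[] args) throws IOException {
        int port = pickFreePort();
        System.out.println("test can bind to port " + port);
    }
}
```

Note that closing the probe socket before the test binds the port leaves a small race window; binding the server under test directly to port 0 avoids it entirely.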




[jira] Resolved: (HDFS-1535) TestBlockRecovery should not use fixed port

2010-12-11 Thread Patrick Kling (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Kling resolved HDFS-1535.
-

Resolution: Duplicate

> TestBlockRecovery should not use fixed port
> ---
>
> Key: HDFS-1535
> URL: https://issues.apache.org/jira/browse/HDFS-1535
> Project: Hadoop HDFS
>  Issue Type: Bug
>Affects Versions: 0.23.0
>    Reporter: Patrick Kling
>
> TestBlockRecovery uses the default data node port 50075. This causes the test 
> to fail if this port is not available.
> {code}
> Testcase: testFinalizedReplicas took 0.567 sec
> Caused an ERROR
> Port in use: 0.0.0.0:50075
> java.net.BindException: Port in use: 0.0.0.0:50075
>         at org.apache.hadoop.http.HttpServer.start(HttpServer.java:625)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.startInfoServer(DataNode.java:358)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:502)
>         at org.apache.hadoop.hdfs.server.datanode.DataNode.&lt;init&gt;(DataNode.java:281)
>         at org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery.startUp(TestBlockRecovery.java:104)
> Caused by: java.net.BindException: Address already in use
>         at sun.nio.ch.Net.bind(Native Method)
>         at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
>         at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
>         at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
>         at org.apache.hadoop.http.HttpServer.start(HttpServer.java:582)
> {code}
