Review Request: Add listCorruptFileBlocks to DistributedFileSystem (and ClientProtocol)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18/ --- Review request for hadoop-hdfs. Summary --- Add listCorruptFileBlocks to DistributedFileSystem (and ClientProtocol) As discussed in HDFS-, it would be beneficial for tools such as the RAID block fixer and RAID FSCK to have access to listCorruptFileBlocks via the DistributedFileSystem (rather than having to parse Servlet output, which could present a performance problem). For further details, see https://issues.apache.org/jira/browse/HDFS-1482 This addresses bug HDFS-1482. https://issues.apache.org/jira/browse/HDFS-1482 Diffs - http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSClient.java 1028517 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DistributedFileSystem.java 1028517 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java 1028517 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/protocol/CorruptFileBlock.java PRE-CREATION http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java 1028517 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java 1028517 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestCorruptFilesJsp.java 1028517 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestFsck.java 1028517 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java 1028517 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/webapps/hdfs/corrupt_files.jsp 1028517 Diff: https://reviews.apache.org/r/18/diff Testing --- Unit tests (including new test case in TestListCorruptFileBlocks) Thanks, Patrick
Re: Review Request: Add listCorruptFileBlocks to DistributedFileSystem (and ClientProtocol)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18/ --- (Updated 2010-11-01 11:50:37.134842) Review request for hadoop-hdfs. Changes --- ClientProtocol.listCorruptFileBlocks now returns a list of file names and a cookie string, which can be used to iteratively retrieve all corrupt files. Summary --- Add listCorruptFileBlocks to DistributedFileSystem (and ClientProtocol) As discussed in HDFS-, it would be beneficial for tools such as the RAID block fixer and RAID FSCK to have access to listCorruptFileBlocks via the DistributedFileSystem (rather than having to parse Servlet output, which could present a performance problem). For further details, see https://issues.apache.org/jira/browse/HDFS-1482 This addresses bug HDFS-1482. https://issues.apache.org/jira/browse/HDFS-1482 Diffs (updated) - http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSClient.java 1028517 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DistributedFileSystem.java 1028517 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java 1028517 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/protocol/CorruptFileBlocks.java PRE-CREATION http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java 1028517 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java 1028517 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestCorruptFilesJsp.java 1028517 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestFsck.java 1028517 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java 1028517 
http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/webapps/hdfs/corrupt_files.jsp 1028517 Diff: https://reviews.apache.org/r/18/diff Testing --- Unit tests (including new test case in TestListCorruptFileBlocks) Thanks, Patrick
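For readers following the cookie change described above, here is a minimal sketch of how a caller might page through all corrupt files. The types CorruptFilePage and CorruptFileLister are hypothetical stand-ins invented for this illustration; the actual ClientProtocol signature in the patch may differ, and the convention that a null cookie means "start from the beginning" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical reply type: one batch of file names plus an opaque cookie.
final class CorruptFilePage {
    final List<String> files;
    final String cookie;

    CorruptFilePage(List<String> files, String cookie) {
        this.files = files;
        this.cookie = cookie;
    }
}

// Hypothetical stand-in for the RPC; not the real ClientProtocol method.
interface CorruptFileLister {
    CorruptFilePage list(String path, String cookie);
}

final class CorruptFileIteration {
    // Calls the lister repeatedly, handing back the cookie from the previous
    // reply, and stops when a call returns an empty batch.
    static List<String> listAll(CorruptFileLister lister, String path) {
        List<String> all = new ArrayList<>();
        String cookie = null; // assumption: null means "start from the beginning"
        while (true) {
            CorruptFilePage page = lister.list(path, cookie);
            if (page.files.isEmpty()) {
                break;
            }
            all.addAll(page.files);
            cookie = page.cookie;
        }
        return all;
    }
}
```

The point of the cookie is that the server does not have to return every corrupt file in a single reply; the client resumes where the previous call stopped.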
Review Request: DFSClient.getBlockLocations returns BlockLocations with no indication that the corresponding blocks are corrupt
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27/ --- Review request for hadoop-hdfs. Summary --- DFSClient.getBlockLocations returns BlockLocations with no indication that the corresponding blocks are corrupt When there are no uncorrupted replicas of a block, FSNamesystem.getBlockLocations returns LocatedBlocks corresponding to corrupt blocks. When DFSClient converts these to BlockLocations, the information that the corresponding block is corrupt is lost. We should add a field to BlockLocation to indicate whether the corresponding block is corrupt in order to warn the client that reading this block will fail. This would be especially useful for tools such as RAID FSCK, which could then easily inspect whether data or parity blocks are corrupted without having to make direct RPC calls This addresses bug HDFS-1483. https://issues.apache.org/jira/browse/HDFS-1483 Diffs - http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSUtil.java 1028386 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/TestDFSUtil.java PRE-CREATION Diff: https://reviews.apache.org/r/27/diff Testing --- TestDFSUtil Thanks, Patrick
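To illustrate the proposal above, here is a small sketch of a conversion step that carries the corrupt flag through instead of dropping it. SketchLocatedBlock and SketchBlockLocation are simplified, hypothetical classes made up for this example; they are not the real LocatedBlock or BlockLocation classes, and the field and method names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for what the name node returns.
final class SketchLocatedBlock {
    final long offset;
    final long length;
    final String[] hosts;
    final boolean corrupt;

    SketchLocatedBlock(long offset, long length, String[] hosts, boolean corrupt) {
        this.offset = offset;
        this.length = length;
        this.hosts = hosts;
        this.corrupt = corrupt;
    }
}

// Hypothetical client-side location carrying the proposed extra field.
final class SketchBlockLocation {
    final long offset;
    final long length;
    final String[] hosts;
    private final boolean corrupt;

    SketchBlockLocation(long offset, long length, String[] hosts, boolean corrupt) {
        this.offset = offset;
        this.length = length;
        this.hosts = hosts;
        this.corrupt = corrupt;
    }

    // Lets a client check for corruption before attempting a read.
    boolean isCorrupt() {
        return corrupt;
    }
}

final class LocationConverterSketch {
    // Copies every field, including the corrupt flag, into the client-side type.
    static List<SketchBlockLocation> convert(List<SketchLocatedBlock> blocks) {
        List<SketchBlockLocation> out = new ArrayList<>(blocks.size());
        for (SketchLocatedBlock b : blocks) {
            out.add(new SketchBlockLocation(b.offset, b.length, b.hosts, b.corrupt));
        }
        return out;
    }
}
```

A tool such as RAID FSCK could then test isCorrupt() on each location rather than issuing a separate RPC.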
Re: Review Request: DFSClient.getBlockLocations returns BlockLocations with no indication that the corresponding blocks are corrupt
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27/ --- (Updated 2010-11-03 11:33:39.415750) Review request for hadoop-hdfs. Changes --- Incorporated Ram's feedback. Thank you! Summary --- DFSClient.getBlockLocations returns BlockLocations with no indication that the corresponding blocks are corrupt When there are no uncorrupted replicas of a block, FSNamesystem.getBlockLocations returns LocatedBlocks corresponding to corrupt blocks. When DFSClient converts these to BlockLocations, the information that the corresponding block is corrupt is lost. We should add a field to BlockLocation to indicate whether the corresponding block is corrupt in order to warn the client that reading this block will fail. This would be especially useful for tools such as RAID FSCK, which could then easily inspect whether data or parity blocks are corrupted without having to make direct RPC calls This addresses bug HDFS-1483. https://issues.apache.org/jira/browse/HDFS-1483 Diffs (updated) - http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSUtil.java 1028386 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/TestDFSUtil.java PRE-CREATION Diff: https://reviews.apache.org/r/27/diff Testing --- TestDFSUtil Thanks, Patrick
Re: Review Request: DFSClient.getBlockLocations returns BlockLocations with no indication that the corresponding blocks are corrupt
> On 2010-11-03 11:41:39, Ramkumar Vadali wrote: > > Looks good to me, but this diff depends on a hadoop-common change, right? It depends on HADOOP-7013, which can be found here: https://reviews.apache.org/r/26/ - Patrick --- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/27/#review29 ---
Re: Review Request: Add listCorruptFileBlocks to DistributedFileSystem (and ClientProtocol)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18/ --- (Updated 2010-11-08 19:01:36.459005) Review request for hadoop-hdfs. Changes --- Added listCorruptFileBlocks to FileSystem Summary --- Add listCorruptFileBlocks to DistributedFileSystem (and ClientProtocol) As discussed in HDFS-, it would be beneficial for tools such as the RAID block fixer and RAID FSCK to have access to listCorruptFileBlocks via the DistributedFileSystem (rather than having to parse Servlet output, which could present a performance problem). For further details, see https://issues.apache.org/jira/browse/HDFS-1482 This addresses bug HDFS-1482. https://issues.apache.org/jira/browse/HDFS-1482 Diffs (updated) - http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSClient.java 1032664 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DistributedFileSystem.java 1032664 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/HftpFileSystem.java 1032664 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java 1032664 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java 1032664 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java 1032664 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestCorruptFilesJsp.java 1032664 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestFsck.java 1032664 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java 1032664 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/webapps/hdfs/corrupt_files.jsp 1032664 Diff: https://reviews.apache.org/r/18/diff Testing --- Unit tests 
(including new test case in TestListCorruptFileBlocks) Thanks, Patrick
Review Request: Populate needed replication queues before leaving safe mode.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/105/ --- Review request for hadoop-hdfs. Summary --- This patch introduces a new configuration variable dfs.namenode.replqueue.threshold-pct that determines the fraction of blocks for which block reports have to be received before the NameNode will start initializing the needed replication queues. Once a sufficient number of block reports have been received, the queues are initialized while the NameNode is still in safe mode. After the queues are initialized, subsequent block reports are handled by updating the queues incrementally. The benefit of this is twofold: - It allows us to compute the replication queues while we are waiting for the last few block reports (when the NameNode is mostly idle). Once these block reports have been received, we can then immediately leave safe mode without having to wait for the computation of the needed replication queues (which requires a full traversal of the blocks map). - With Raid, it may not be necessary to stay in safe mode until all blocks have been reported. Using this change, we could monitor if all of the missing blocks can be recreated using parity information and if so leave safe mode early. In order for this monitoring to work, we need access to the needed replication queues while the NameNode is still in safe mode. This addresses bug HDFS-1476. 
https://issues.apache.org/jira/browse/HDFS-1476 Diffs - http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 1035545 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/BlockManager.java 1035545 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java 1035545 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/MiniDFSCluster.java 1035545 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java 1035545 Diff: https://reviews.apache.org/r/105/diff Testing --- new test case in TestListCorruptFileBlocks Thanks, Patrick
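The threshold behaviour described in the summary can be sketched roughly as follows. This is an illustration only, not the patch itself: the class ReplQueueThresholdSketch and its method names are hypothetical, block reports are reduced to a per-block counter, and the two thresholds are passed in as plain fractions rather than read from configuration.

```java
// Hypothetical model of the two thresholds; not the FSNamesystem code.
final class ReplQueueThresholdSketch {
    private final long totalBlocks;
    private final double replQueueThreshold; // fraction, as dfs.namenode.replqueue.threshold-pct would supply
    private final double safeModeThreshold;  // fraction required to leave safe mode
    private long reportedBlocks;
    private boolean queuesInitialized;

    ReplQueueThresholdSketch(long totalBlocks, double replQueueThreshold, double safeModeThreshold) {
        this.totalBlocks = totalBlocks;
        this.replQueueThreshold = replQueueThreshold;
        this.safeModeThreshold = safeModeThreshold;
    }

    // Records one newly reported block. The first time the replication queue
    // threshold is reached, the queues are initialized while still in safe mode.
    void blockReported() {
        reportedBlocks++;
        if (!queuesInitialized && reportedBlocks >= replQueueThreshold * totalBlocks) {
            // In a real name node the full traversal of the blocks map would happen here.
            queuesInitialized = true;
        }
        // Once initialized, later reports would update the queues incrementally.
    }

    boolean queuesInitialized() {
        return queuesInitialized;
    }

    boolean canLeaveSafeMode() {
        return reportedBlocks >= safeModeThreshold * totalBlocks;
    }
}
```

With a replication queue threshold below the safe mode threshold, the expensive traversal is already done by the time the last block reports arrive.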
Re: Review Request: Populate needed replication queues before leaving safe mode.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/105/ --- (Updated 2010-11-16 18:01:44.268029) Review request for hadoop-hdfs. Changes --- Incorporated Dhruba's feedback. Thank you! Summary --- This patch introduces a new configuration variable dfs.namenode.replqueue.threshold-pct that determines the fraction of blocks for which block reports have to be received before the NameNode will start initializing the needed replication queues. Once a sufficient number of block reports have been received, the queues are initialized while the NameNode is still in safe mode. After the queues are initialized, subsequent block reports are handled by updating the queues incrementally. The benefit of this is twofold: - It allows us to compute the replication queues while we are waiting for the last few block reports (when the NameNode is mostly idle). Once these block reports have been received, we can then immediately leave safe mode without having to wait for the computation of the needed replication queues (which requires a full traversal of the blocks map). - With Raid, it may not be necessary to stay in safe mode until all blocks have been reported. Using this change, we could monitor if all of the missing blocks can be recreated using parity information and if so leave safe mode early. In order for this monitoring to work, we need access to the needed replication queues while the NameNode is still in safe mode. This addresses bug HDFS-1476. 
https://issues.apache.org/jira/browse/HDFS-1476 Diffs (updated) - http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 1035545 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/BlockManager.java 1035545 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java 1035545 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/MiniDFSCluster.java 1035545 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java 1035545 Diff: https://reviews.apache.org/r/105/diff Testing --- new test case in TestListCorruptFileBlocks Thanks, Patrick
Re: Review Request: Populate needed replication queues before leaving safe mode.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/105/ --- (Updated 2010-11-18 10:49:38.102334) Review request for hadoop-hdfs. Changes --- Changed default value of replication queue threshold to safe mode threshold. Summary --- This patch introduces a new configuration variable dfs.namenode.replqueue.threshold-pct that determines the fraction of blocks for which block reports have to be received before the NameNode will start initializing the needed replication queues. Once a sufficient number of block reports have been received, the queues are initialized while the NameNode is still in safe mode. After the queues are initialized, subsequent block reports are handled by updating the queues incrementally. The benefit of this is twofold: - It allows us to compute the replication queues while we are waiting for the last few block reports (when the NameNode is mostly idle). Once these block reports have been received, we can then immediately leave safe mode without having to wait for the computation of the needed replication queues (which requires a full traversal of the blocks map). - With Raid, it may not be necessary to stay in safe mode until all blocks have been reported. Using this change, we could monitor if all of the missing blocks can be recreated using parity information and if so leave safe mode early. In order for this monitoring to work, we need access to the needed replication queues while the NameNode is still in safe mode. This addresses bug HDFS-1476. 
https://issues.apache.org/jira/browse/HDFS-1476 Diffs (updated) - http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 1035545 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/BlockManager.java 1035545 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java 1035545 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/MiniDFSCluster.java 1035545 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java 1035545 Diff: https://reviews.apache.org/r/105/diff Testing --- new test case in TestListCorruptFileBlocks Thanks, Patrick
Re: Review Request: Add listCorruptFileBlocks to DistributedFileSystem (and ClientProtocol)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18/ --- (Updated 2010-11-18 20:10:36.970120) Review request for hadoop-hdfs. Changes --- Added listCorruptFileBlocks to FileContext. Summary --- Add listCorruptFileBlocks to DistributedFileSystem (and ClientProtocol) As discussed in HDFS-, it would be beneficial for tools such as the RAID block fixer and RAID FSCK to have access to listCorruptFileBlocks via the DistributedFileSystem (rather than having to parse Servlet output, which could present a performance problem). For further details, see https://issues.apache.org/jira/browse/HDFS-1482 This addresses bug HDFS-1482. https://issues.apache.org/jira/browse/HDFS-1482 Diffs (updated) - http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/fs/Hdfs.java 1036663 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSClient.java 1036663 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DistributedFileSystem.java 1036663 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java 1036663 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java 1036663 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java 1036663 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestCorruptFilesJsp.java 1036663 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestFsck.java 1036663 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java 1036663 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/webapps/hdfs/corrupt_files.jsp 1036663 Diff: https://reviews.apache.org/r/18/diff Testing --- Unit tests (including new 
test case in TestListCorruptFileBlocks) Thanks, Patrick
Re: Review Request: Populate needed replication queues before leaving safe mode.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/105/ --- (Updated 2010-11-19 13:07:20.231197) Review request for hadoop-hdfs. Changes --- Updated test case to play nice with HDFS-1482. Summary --- This patch introduces a new configuration variable dfs.namenode.replqueue.threshold-pct that determines the fraction of blocks for which block reports have to be received before the NameNode will start initializing the needed replication queues. Once a sufficient number of block reports have been received, the queues are initialized while the NameNode is still in safe mode. After the queues are initialized, subsequent block reports are handled by updating the queues incrementally. The benefit of this is twofold: - It allows us to compute the replication queues while we are waiting for the last few block reports (when the NameNode is mostly idle). Once these block reports have been received, we can then immediately leave safe mode without having to wait for the computation of the needed replication queues (which requires a full traversal of the blocks map). - With Raid, it may not be necessary to stay in safe mode until all blocks have been reported. Using this change, we could monitor if all of the missing blocks can be recreated using parity information and if so leave safe mode early. In order for this monitoring to work, we need access to the needed replication queues while the NameNode is still in safe mode. This addresses bug HDFS-1476. 
https://issues.apache.org/jira/browse/HDFS-1476 Diffs (updated) - http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 1035545 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/BlockManager.java 1035545 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java 1035545 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/MiniDFSCluster.java 1035545 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java 1035545 Diff: https://reviews.apache.org/r/105/diff Testing --- new test case in TestListCorruptFileBlocks Thanks, Patrick
Re: Review Request: Add listCorruptFileBlocks to DistributedFileSystem (and ClientProtocol)
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/18/ --- (Updated 2010-11-22 13:47:46.894388) Review request for hadoop-hdfs. Changes --- Fixed javadoc warnings. Summary --- Add listCorruptFileBlocks to DistributedFileSystem (and ClientProtocol) As discussed in HDFS-, it would be beneficial for tools such as the RAID block fixer and RAID FSCK to have access to listCorruptFileBlocks via the DistributedFileSystem (rather than having to parse Servlet output, which could present a performance problem). For further details, see https://issues.apache.org/jira/browse/HDFS-1482 This addresses bug HDFS-1482. https://issues.apache.org/jira/browse/HDFS-1482 Diffs (updated) - http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/fs/Hdfs.java 1036663 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSClient.java 1036663 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DistributedFileSystem.java 1036663 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/protocol/ClientProtocol.java 1036663 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/NameNode.java 1036663 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/NamenodeFsck.java 1036663 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestCorruptFilesJsp.java 1036663 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestFsck.java 1036663 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java 1036663 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/webapps/hdfs/corrupt_files.jsp 1036663 Diff: https://reviews.apache.org/r/18/diff Testing --- Unit tests (including new test case in 
TestListCorruptFileBlocks) Thanks, Patrick
Re: Review Request: Populate needed replication queues before leaving safe mode.
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/105/ --- (Updated 2010-12-09 19:52:13.188412) Review request for hadoop-hdfs. Changes --- - Updated patch to apply to current trunk. - In BlockManager.markBlockAsCorrupt only update needed replication queues if they have been initialized Summary --- This patch introduces a new configuration variable dfs.namenode.replqueue.threshold-pct that determines the fraction of blocks for which block reports have to be received before the NameNode will start initializing the needed replication queues. Once a sufficient number of block reports have been received, the queues are initialized while the NameNode is still in safe mode. After the queues are initialized, subsequent block reports are handled by updating the queues incrementally. The benefit of this is twofold: - It allows us to compute the replication queues while we are waiting for the last few block reports (when the NameNode is mostly idle). Once these block reports have been received, we can then immediately leave safe mode without having to wait for the computation of the needed replication queues (which requires a full traversal of the blocks map). - With Raid, it may not be necessary to stay in safe mode until all blocks have been reported. Using this change, we could monitor if all of the missing blocks can be recreated using parity information and if so leave safe mode early. In order for this monitoring to work, we need access to the needed replication queues while the NameNode is still in safe mode. This addresses bug HDFS-1476. 
https://issues.apache.org/jira/browse/HDFS-1476 Diffs (updated) - http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/DFSConfigKeys.java 1044182 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/BlockManager.java 1044182 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/java/org/apache/hadoop/hdfs/server/namenode/FSNamesystem.java 1044182 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/MiniDFSCluster.java 1044182 http://svn.apache.org/repos/asf/hadoop/hdfs/trunk/src/test/hdfs/org/apache/hadoop/hdfs/server/namenode/TestListCorruptFileBlocks.java 1044182 Diff: https://reviews.apache.org/r/105/diff Testing --- new test case in TestListCorruptFileBlocks Thanks, Patrick
[jira] Created: (HDFS-1476) listCorruptFileBlocks should be functional while the name node is still in safe mode
listCorruptFileBlocks should be functional while the name node is still in safe mode Key: HDFS-1476 URL: https://issues.apache.org/jira/browse/HDFS-1476 Project: Hadoop HDFS Issue Type: Improvement Reporter: Patrick Kling This would allow us to detect whether missing blocks can be fixed using Raid and if that is the case exit safe mode earlier. One way to make listCorruptFileBlocks available before the name node has exited from safe mode would be to perform a scan of the blocks map on each call to listCorruptFileBlocks to determine if there are any blocks with no replicas. This scan could be parallelized by dividing the space of block IDs into multiple intervals that can be scanned independently. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
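The interval-based scan suggested above might look roughly like the sketch below. It is an illustration under several assumptions: the blocks map is modelled as a NavigableMap from block ID to replica count, block IDs are non-negative so the range arithmetic cannot overflow, and the map is not modified during the scan. ParallelBlockScanSketch and its method name are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelBlockScanSketch {
    // Splits the block ID range into at most `intervals` contiguous intervals and
    // scans each one in its own task, collecting IDs that have zero replicas.
    // Assumes non-negative IDs and no concurrent modification of the map.
    static List<Long> findBlocksWithoutReplicas(NavigableMap<Long, Integer> replicaCounts,
                                                int intervals) {
        if (intervals < 1) {
            throw new IllegalArgumentException("intervals must be >= 1");
        }
        List<Long> missing = new ArrayList<>();
        if (replicaCounts.isEmpty()) {
            return missing;
        }
        long lo = replicaCounts.firstKey();
        long hi = replicaCounts.lastKey();
        long width = (hi - lo) / intervals + 1;
        ExecutorService pool = Executors.newFixedThreadPool(intervals);
        try {
            List<Future<List<Long>>> futures = new ArrayList<>();
            long start = lo;
            while (true) {
                final long from = start;
                // Clamp the last interval to hi without overflowing.
                final long to = (hi - start < width - 1) ? hi : start + width - 1;
                Callable<List<Long>> task = () -> {
                    List<Long> found = new ArrayList<>();
                    for (Map.Entry<Long, Integer> entry
                            : replicaCounts.subMap(from, true, to, true).entrySet()) {
                        if (entry.getValue() == 0) {
                            found.add(entry.getKey());
                        }
                    }
                    return found;
                };
                futures.add(pool.submit(task));
                if (to == hi) {
                    break;
                }
                start = to + 1;
            }
            // Results are gathered in interval order, so the output stays sorted by ID.
            for (Future<List<Long>> f : futures) {
                missing.addAll(f.get());
            }
        } catch (InterruptedException | ExecutionException ex) {
            throw new IllegalStateException("scan failed", ex);
        } finally {
            pool.shutdown();
        }
        return missing;
    }
}
```

Because the intervals do not overlap, the tasks need no coordination beyond collecting their results at the end.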
[jira] Created: (HDFS-1477) Make NameNode Reconfigurable.
Make NameNode Reconfigurable. - Key: HDFS-1477 URL: https://issues.apache.org/jira/browse/HDFS-1477 Project: Hadoop HDFS Issue Type: Improvement Reporter: Patrick Kling Modify NameNode to implement the interface Reconfigurable proposed in HADOOP-7001. This would allow us to change certain configuration properties without restarting the name node.
[jira] Created: (HDFS-1482) Add listCorruptFileBlocks to DistributedFileSystem (and ClientProtocol)
Add listCorruptFileBlocks to DistributedFileSystem (and ClientProtocol) --- Key: HDFS-1482 URL: https://issues.apache.org/jira/browse/HDFS-1482 Project: Hadoop HDFS Issue Type: Improvement Reporter: Patrick Kling As discussed in HDFS-, it would be beneficial for tools such as the RAID block fixer and RAID FSCK to have access to listCorruptFileBlocks via the DistributedFileSystem (rather than having to parse Servlet output, which could present a performance problem).
[jira] Created: (HDFS-1483) DFSClient.getBlockLocations returns BlockLocations with no indication that the corresponding blocks are corrupt
DFSClient.getBlockLocations returns BlockLocations with no indication that the corresponding blocks are corrupt --- Key: HDFS-1483 URL: https://issues.apache.org/jira/browse/HDFS-1483 Project: Hadoop HDFS Issue Type: Bug Reporter: Patrick Kling When there are no uncorrupted replicas of a block, FSNamesystem.getBlockLocations returns LocatedBlocks corresponding to corrupt blocks. When DFSClient converts these to BlockLocations, the information that the corresponding block is corrupt is lost. We should add a field to BlockLocation to indicate whether the corresponding block is corrupt in order to warn the client that reading this block will fail. This would be especially useful for tools such as RAID FSCK, which could then easily inspect whether data or parity blocks are corrupted without having to make direct RPC calls.
[jira] Created: (HDFS-1514) project.version in aop.xml is out of sync with build.xml
project.version in aop.xml is out of sync with build.xml Key: HDFS-1514 URL: https://issues.apache.org/jira/browse/HDFS-1514 Project: Hadoop HDFS Issue Type: Bug Reporter: Patrick Kling project.version in aop.xml is set to 0.22.0-SNAPSHOT whereas version in build.xml is set to 0.23.0-SNAPSHOT. This causes ant test-patch to fail when using a local maven repository.
[jira] Created: (HDFS-1527) SocketOutputStream.transferToFully fails for blocks >= 2GB on 32 bit JVM
SocketOutputStream.transferToFully fails for blocks >= 2GB on 32 bit JVM Key: HDFS-1527 URL: https://issues.apache.org/jira/browse/HDFS-1527 Project: Hadoop HDFS Issue Type: Bug Components: data-node Affects Versions: 0.23.0 Environment: 32 bit JVM Reporter: Patrick Kling Fix For: 0.23.0

On 32 bit JVM, SocketOutputStream.transferToFully() fails if the block size is >= 2GB. We should fall back to a normal transfer in this case.

{code}
2010-12-02 19:04:23,490 ERROR datanode.DataNode (BlockSender.java:sendChunks(399)) - BlockSender.sendChunks() exception: java.io.IOException: Value too large for defined data type
    at sun.nio.ch.FileChannelImpl.transferTo0(Native Method)
    at sun.nio.ch.FileChannelImpl.transferToDirectly(FileChannelImpl.java:418)
    at sun.nio.ch.FileChannelImpl.transferTo(FileChannelImpl.java:519)
    at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:204)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:386)
    at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:475)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.opReadBlock(DataXceiver.java:196)
    at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.opReadBlock(DataTransferProtocol.java:356)
    at org.apache.hadoop.hdfs.protocol.DataTransferProtocol$Receiver.processOp(DataTransferProtocol.java:328)
    at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:130)
    at java.lang.Thread.run(Thread.java:619)
{code}
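As a rough idea of what a "normal transfer" fallback could look like, the sketch below copies a byte range from a FileChannel to a WritableByteChannel with plain positional reads and writes, avoiding FileChannel.transferTo entirely. This is not the fix that was committed for this issue; TransferFallbackSketch, copyWithBuffer and the 64 KB buffer size are assumptions chosen for illustration.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

final class TransferFallbackSketch {
    // Copies `count` bytes starting at `position` using a heap buffer instead of
    // the zero-copy transferTo path. Positional reads take a long offset, so
    // offsets beyond 2GB are handled by the Java API itself.
    static void copyWithBuffer(FileChannel in, long position, long count,
                               WritableByteChannel out) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(64 * 1024); // buffer size is an arbitrary choice
        long pos = position;
        long remaining = count;
        while (remaining > 0) {
            buf.clear();
            buf.limit((int) Math.min(buf.capacity(), remaining));
            int n = in.read(buf, pos);
            if (n < 0) {
                throw new EOFException("file ended with " + remaining + " bytes still expected");
            }
            buf.flip();
            while (buf.hasRemaining()) {
                out.write(buf);
            }
            pos += n;
            remaining -= n;
        }
    }
}
```

A caller could try the zero-copy path first and switch to a copy like this only when the platform reports the error shown in the log above, at the cost of an extra pass through user space.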
[jira] Created: (HDFS-1533) A more elegant FileSystem#listCorruptFileBlocks API (HDFS portion)
A more elegant FileSystem#listCorruptFileBlocks API (HDFS portion)
Key: HDFS-1533
URL: https://issues.apache.org/jira/browse/HDFS-1533
Project: Hadoop HDFS
Issue Type: Bug
Components: hdfs client
Reporter: Patrick Kling
Assignee: Patrick Kling
[jira] Created: (HDFS-1535) TestBlockRecovery should not use fixed port
TestBlockRecovery should not use fixed port
Key: HDFS-1535
URL: https://issues.apache.org/jira/browse/HDFS-1535
Project: Hadoop HDFS
Issue Type: Bug
Affects Versions: 0.23.0
Reporter: Patrick Kling

TestBlockRecovery uses the default data node port 50075. This causes the test to fail if this port is not available.

{code}
Testcase: testFinalizedReplicas took 0.567 sec
    Caused an ERROR
Port in use: 0.0.0.0:50075
java.net.BindException: Port in use: 0.0.0.0:50075
    at org.apache.hadoop.http.HttpServer.start(HttpServer.java:625)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startInfoServer(DataNode.java:358)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:502)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:281)
    at org.apache.hadoop.hdfs.server.datanode.TestBlockRecovery.startUp(TestBlockRecovery.java:104)
Caused by: java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind(Native Method)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:119)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
    at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
    at org.apache.hadoop.http.HttpServer.start(HttpServer.java:582)
{code}
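A common way to avoid this kind of failure is to bind to port 0, which asks the operating system for any free port. The sketch below shows the idea with plain java.net sockets only; the class EphemeralPortExample and its method are hypothetical names, and this is not necessarily how the problem was fixed in TestBlockRecovery. In a Hadoop test the equivalent step would presumably be to configure the data node HTTP address with port 0 (the key is believed to be dfs.datanode.http.address, which is an assumption not stated in this issue).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Hypothetical illustration, not Hadoop code.
final class EphemeralPortExample {

    /** Binds to port 0 so the OS assigns a currently unused port. */
    static ServerSocket bindToFreePort() throws IOException {
        ServerSocket server = new ServerSocket();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        return server;
    }

    public static void main(String[] args) throws IOException {
        // Two servers can run side by side because neither uses a fixed port.
        try (ServerSocket first = bindToFreePort();
             ServerSocket second = bindToFreePort()) {
            System.out.println("first server on port " + first.getLocalPort());
            System.out.println("second server on port " + second.getLocalPort());
        }
    }
}
```

The test then reads the assigned port back with getLocalPort() instead of assuming a constant such as 50075.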
[jira] Resolved: (HDFS-1535) TestBlockRecovery should not use fixed port
[ https://issues.apache.org/jira/browse/HDFS-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Kling resolved HDFS-1535.
Resolution: Duplicate