Re: Merging auto-HA to branch-2

2012-06-08 Thread lars hofhansl
For what it's worth... +1 on behalf of Salesforce.com.




 From: Todd Lipcon 
To: hdfs-dev@hadoop.apache.org 
Sent: Wednesday, June 6, 2012 11:46 AM
Subject: Merging auto-HA to branch-2
 
Hi folks,

We merged automatic failover (HDFS-3042) into trunk a couple weeks
back, and now I'd like to merge it into branch-2 for the next
2.0.x-alpha release. It's almost all new code so I don't think there
is any risk that it destabilizes things. Meanwhile, it's an important
feature that we should have in the 2.0 line.

If I don't hear any objections in the next day or two, I'll go ahead and merge.

Thanks
-Todd
-- 
Todd Lipcon
Software Engineer, Cloudera

Jenkins build is back to normal : Hadoop-Hdfs-trunk #1071

2012-06-08 Thread Apache Jenkins Server
See 



Re: Merging some improvements to branch-2

2012-06-08 Thread Todd Lipcon
Thanks. I'll merge these this afternoon.

-Todd

On Thu, Jun 7, 2012 at 4:25 PM, Eli Collins  wrote:
> +1
>
> Will be great to have the direct read optimization (thanks Henry) in 2.x.
>
> Thanks,
> Eli
>
> On Thu, Jun 7, 2012 at 2:51 PM, Todd Lipcon  wrote:
>> Hi all,
>>
>> I plan to merge the following JIRAs to branch-2 this week unless there
>> are any objections:
>>
>> "Direct read" optimization:
>>
>> 6e51b33 HADOOP-8135. Add ByteBufferReadable interface to
>> FSDataInputStream. Contributed by Henry Robinson.
>> 4418682 HADOOP-8244. Improve comments on ByteBufferReadable.read.
>> Contributed by Henry Robinson.
>> c66f982 HDFS-2834. Add a ByteBuffer-based read API to DFSInputStream.
>> Contributed by Henry Robinson.
>> 467acd1 HDFS-3110. Use directRead API to reduce the number of buffer
>> copies in libhdfs. Contributed by Henry Robinson.
>>
>> These improvements give a substantial savings in CPU to applications
>> using libhdfs, and have some potential for usage in apps like HBase
>> and MR as well.
>>
>> MiniDFSClusterManager:
>> bfa5c0a HDFS-3167. CLI-based driver for MiniDFSCluster. Contributed by
>> Henry Robinson.
>> 21dfa6a HDFS-3235. MiniDFSClusterManager doesn't correctly support
>> -format option. Contributed by Henry Robinson.
>>
>> These test-only improvements make it easier to construct system tests
>> against realistic pseudo-distributed clusters.
>>
>> All of the above patches have been baking in trunk for quite some time.
>>
>> Thanks
>> -Todd
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
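
The "direct read" idea quoted above can be illustrated with a minimal, self-contained sketch. It does not use Hadoop: BufferReadable and ArraySource below are hypothetical stand-ins for a ByteBuffer-based read interface such as the ByteBufferReadable named in HADOOP-8135, whose exact contract should be checked against the Hadoop Javadoc. The point is only that the caller hands in one reusable (here, direct) buffer and the source fills it in place, rather than copying through an intermediate byte[].

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; none of these types are Hadoop classes.
class DirectReadSketch {

    // Stand-in for a ByteBuffer-based read interface (assumed shape).
    interface BufferReadable {
        // Reads up to buf.remaining() bytes into buf.
        // Returns the number of bytes read, or -1 at end of stream.
        int read(ByteBuffer buf);
    }

    // Toy source backed by an in-memory array, used to exercise the loop.
    static final class ArraySource implements BufferReadable {
        private final byte[] data;
        private int pos;

        ArraySource(byte[] data) {
            this.data = data;
        }

        @Override
        public int read(ByteBuffer buf) {
            if (pos >= data.length) {
                return -1;
            }
            int n = Math.min(buf.remaining(), data.length - pos);
            buf.put(data, pos, n);
            pos += n;
            return n;
        }
    }

    // Drains the source through a single reusable direct buffer and
    // returns the total number of bytes read.
    static long drain(BufferReadable in, int bufferSize) {
        ByteBuffer buf = ByteBuffer.allocateDirect(bufferSize);
        long total = 0;
        int n;
        while ((n = in.read(buf)) > 0) {
            total += n;
            buf.clear(); // reuse the same buffer for the next read
        }
        return total;
    }

    public static void main(String[] args) {
        long total = drain(new ArraySource(new byte[10]), 4);
        System.out.println("bytes read: " + total);
    }
}
```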



-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Created] (HDFS-3517) TestStartup should bind ephemeral ports

2012-06-08 Thread Eli Collins (JIRA)
Eli Collins created HDFS-3517:

Summary: TestStartup should bind ephemeral ports
Key: HDFS-3517
URL: https://issues.apache.org/jira/browse/HDFS-3517
Project: Hadoop HDFS
Issue Type: Bug
Components: test
Affects Versions: 2.0.0-alpha
Reporter: Eli Collins
Assignee: Eli Collins
Priority: Minor
Attachments: hdfs-3517.txt

TestStartup starts a DN but doesn't bind to ephemeral ports.
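
As a general illustration of what binding to an ephemeral port means (this is plain java.net, not the attached patch, and EphemeralPortSketch is a hypothetical name): asking for port 0 lets the OS pick a free port, so concurrent test runs on one host do not collide on a fixed port number.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Hypothetical helper, not part of Hadoop; shows the general technique only.
class EphemeralPortSketch {

    // Binds a server socket to port 0 on the loopback address, which asks
    // the OS to assign a free ephemeral port. Callers can find out which
    // port was chosen with getLocalPort().
    static ServerSocket bindEphemeral() {
        try {
            return new ServerSocket(0, 50, InetAddress.getLoopbackAddress());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket a = bindEphemeral(); ServerSocket b = bindEphemeral()) {
            System.out.println("first socket bound to port " + a.getLocalPort());
            System.out.println("second socket bound to port " + b.getLocalPort());
        }
    }
}
```

In a Hadoop test the same effect is normally reached by configuring the daemon's listen addresses with port 0 rather than by opening sockets directly; the exact configuration keys are not given in this message.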

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HDFS-3518) Provide API to check HDFS operational state

2012-06-08 Thread Bikas Saha (JIRA)
Bikas Saha created HDFS-3518:


Summary: Provide API to check HDFS operational state
Key: HDFS-3518
URL: https://issues.apache.org/jira/browse/HDFS-3518
Project: Hadoop HDFS
Issue Type: Bug
Reporter: Bikas Saha


This will improve the usability of JobTracker safe mode.





Re: Merging some improvements to branch-2

2012-06-08 Thread Todd Lipcon
Thanks all, I merged these as well as HDFS-3243 and HDFS-3514, some
small test fixes I forgot to add to my original list. Hopefully I got
all the CHANGES.txt, etc, right :)

-Todd

On Fri, Jun 8, 2012 at 11:53 AM, Todd Lipcon  wrote:
> Thanks. I'll merge these this afternoon.
>
> -Todd



-- 
Todd Lipcon
Software Engineer, Cloudera


[jira] [Created] (HDFS-3519) Checkpoint upload may interfere with a concurrent saveNamespace

2012-06-08 Thread Todd Lipcon (JIRA)
Todd Lipcon created HDFS-3519:

Summary: Checkpoint upload may interfere with a concurrent saveNamespace
Key: HDFS-3519
URL: https://issues.apache.org/jira/browse/HDFS-3519
Project: Hadoop HDFS
Issue Type: Bug
Components: name-node
Affects Versions: 2.0.0-alpha, 1.0.3
Reporter: Todd Lipcon
Assignee: Todd Lipcon
Priority: Critical


TestStandbyCheckpoints failed in [precommit build 
2620|https://builds.apache.org/job/PreCommit-HDFS-Build/2620//testReport/] due 
to the following issue:
- both nodes were in Standby state, and configured to checkpoint "as fast as 
possible"
- NN1 starts to save its own namespace
- NN2 starts to upload a checkpoint for the same txid. So, both threads are 
writing to the same file fsimage.ckpt_12, but the actual file contents 
correspond to the uploading thread's data.
- NN1 finishes its saveNamespace operation while NN2 is still uploading, so 
it renames the ckpt file. However, the contents of the file are still empty 
since NN2 hasn't sent any bytes yet
- NN2 finishes the upload, and the rename() call fails, which causes the 
directory to be marked failed, etc.

The result is that there is a file fsimage_12 which appears to be a finalized 
image but in fact is incompletely transferred. When the transfer completes, the 
problem "heals itself" so there wouldn't be persistent corruption unless the 
machine crashes at the same time. And even then, we'd still have the earlier 
checkpoint to restore from.

I believe this same race could occur in a non-HA setup if a user puts the NN in 
safe mode and issues saveNamespace operations concurrently with a 2NN 
checkpoint.
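
One common way to avoid this class of race, sketched here with java.nio.file only, is to give each writer its own temporary file and publish it with a rename once it is complete. This is an assumption about a possible approach, not necessarily the fix adopted for this issue, and UniqueTempPublishSketch is a hypothetical name.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical sketch, not the NameNode's actual code.
class UniqueTempPublishSketch {

    // Writes data to a temp file that only this caller owns, then renames
    // it to the final name. Because no two writers share a temp file, a
    // rename never exposes another writer's half-written bytes.
    // Note: with ATOMIC_MOVE, whether an existing target is replaced or the
    // move fails is implementation specific (see the Files.move Javadoc).
    static Path publish(Path dir, String finalName, byte[] data) {
        try {
            Path tmp = Files.createTempFile(dir, finalName + ".", ".ckpt");
            Files.write(tmp, data);
            return Files.move(tmp, dir.resolve(finalName),
                    StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("ckpt-sketch");
        Path image = publish(dir, "fsimage_12", new byte[] {1, 2, 3});
        System.out.println("published " + image + " with "
                + Files.size(image) + " bytes");
    }
}
```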





[jira] [Resolved] (HDFS-3515) Port HDFS-1457 to branch-1

2012-06-08 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins resolved HDFS-3515.

Resolution: Fixed
Fix Version/s: 1.2.0
Target Version/s: (was: 1.1.1)
Hadoop Flags: Reviewed

Thanks Todd. I've confirmed the testing and committed to branch-1.

> Port HDFS-1457 to branch-1
>
> Key: HDFS-3515
> URL: https://issues.apache.org/jira/browse/HDFS-3515
> Project: Hadoop HDFS
> Issue Type: New Feature
> Components: name-node
> Affects Versions: 1.0.0
> Reporter: Eli Collins
> Assignee: Eli Collins
> Fix For: 1.2.0
>
> Attachments: hdfs-3515.txt
>
>
> Let's port HDFS-1457 (configuration option to enable limiting the transfer 
> rate used when sending the image and edits for checkpointing) to branch-1.
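
For readers unfamiliar with transfer-rate limiting, the arithmetic behind a simple throttle can be sketched as follows. TransferThrottleSketch and pauseMillis are hypothetical names, not Hadoop's throttler, and treating a non-positive limit as "disabled" is an assumption made for this sketch.

```java
// Hypothetical helper, not Hadoop's throttler class.
class TransferThrottleSketch {

    // Returns how many milliseconds the sender should pause so that the
    // average rate (bytesSent over elapsed time plus pause) does not
    // exceed bytesPerSec.
    static long pauseMillis(long bytesSent, long elapsedMillis, long bytesPerSec) {
        if (bytesPerSec <= 0) {
            return 0; // assumption: non-positive limit means no throttling
        }
        // Minimum time the transfer should have taken at the capped rate.
        long minMillis = (bytesSent * 1000L) / bytesPerSec;
        return Math.max(0L, minMillis - elapsedMillis);
    }

    public static void main(String[] args) {
        // 2 MB sent in 0.5 s against a 1 MB/s cap.
        System.out.println(pauseMillis(2_000_000L, 500L, 1_000_000L) + " ms");
    }
}
```

A sender would call such a function after each chunk and sleep for the returned duration before sending the next one.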





[jira] [Resolved] (HDFS-3479) backport HDFS-3335 (check for edit log corruption at the end of the log) to branch-1

2012-06-08 Thread Eli Collins (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-3479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eli Collins resolved HDFS-3479.

Resolution: Fixed

> backport HDFS-3335 (check for edit log corruption at the end of the log) to 
> branch-1
>
> Key: HDFS-3479
> URL: https://issues.apache.org/jira/browse/HDFS-3479
> Project: Hadoop HDFS
> Issue Type: Improvement
> Affects Versions: 1.0.0
> Reporter: Colin Patrick McCabe
> Assignee: Colin Patrick McCabe
> Fix For: 1.2.0
>
> Attachments: HDFS-3335-b1.005.patch, HDFS-3479-b1.002.patch, 
> HDFS-3479-b1.003.patch
>
>
> backport HDFS-3335 (check for edit log corruption at the end of the log) to 
> branch-1





[jira] [Created] (HDFS-3520) Add transfer rate logging to TransferFsImage

2012-06-08 Thread Eli Collins (JIRA)
Eli Collins created HDFS-3520:

Summary: Add transfer rate logging to TransferFsImage
Key: HDFS-3520
URL: https://issues.apache.org/jira/browse/HDFS-3520
Project: Hadoop HDFS
Issue Type: Improvement
Components: name-node
Affects Versions: 2.0.0-alpha, 1.2.0
Reporter: Eli Collins
Assignee: Eli Collins


Logging the transfer rate for images and edits in TransferFsImage is useful for 
debugging network issues, especially when using 
dfs.datanode.balance.bandwidthPerSec.
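
A minimal sketch of the kind of computation such a log line needs is shown below. The class name, method names, and message format are all hypothetical; the actual log text used by TransferFsImage is not given in this message.

```java
import java.util.Locale;

// Hypothetical helper; the message format is illustrative only.
class TransferRateSketch {

    // Average rate in KB/s for a transfer of `bytes` bytes that took
    // `elapsedMillis` milliseconds. Elapsed time is clamped to at least
    // 1 ms to avoid dividing by zero on very fast transfers.
    static double rateKBps(long bytes, long elapsedMillis) {
        long millis = Math.max(1L, elapsedMillis);
        return (bytes / 1024.0) / (millis / 1000.0);
    }

    // Builds a one-line summary suitable for passing to a logger.
    static String describe(String what, long bytes, long elapsedMillis) {
        return String.format(Locale.ROOT,
                "Transfer of %s took %.2fs at %.2f KB/s",
                what, elapsedMillis / 1000.0, rateKBps(bytes, elapsedMillis));
    }

    public static void main(String[] args) {
        System.out.println(describe("fsimage_12", 1048576L, 2000L));
    }
}
```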





[jira] [Created] (HDFS-3521) Allow namenode to tolerate edit log corruption

2012-06-08 Thread Tsz Wo (Nicholas), SZE (JIRA)
Tsz Wo (Nicholas), SZE created HDFS-3521:


Summary: Allow namenode to tolerate edit log corruption
Key: HDFS-3521
URL: https://issues.apache.org/jira/browse/HDFS-3521
Project: Hadoop HDFS
Issue Type: Improvement
Components: name-node
Reporter: Tsz Wo (Nicholas), SZE
Assignee: Tsz Wo (Nicholas), SZE


HDFS-3479 adds checking for edit log corruption. It uses a fixed 
UNCHECKED_REGION_LENGTH (=PREALLOCATION_LENGTH), so the bytes within that 
length at the end of the log are not checked. Instead of skipping those bytes, 
we should check everything and allow toleration.
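
The proposal can be pictured with a small sketch: scan every byte after the last valid transaction, count the ones that are not the expected padding, and accept the log only if that count is within a tolerated limit. Everything here is hypothetical (the class, the method names, the padding byte passed in by the caller, and the idea of expressing toleration as a byte count); it is not the edit log loader's code.

```java
// Hypothetical sketch of "check everything, tolerate a bounded amount".
class EditLogTailCheckSketch {

    // Counts bytes in log[from..end) that differ from the expected
    // padding byte. Every trailing byte is inspected; none are skipped.
    static int countNonPadding(byte[] log, int from, byte padding) {
        int unexpected = 0;
        for (int i = Math.max(0, from); i < log.length; i++) {
            if (log[i] != padding) {
                unexpected++;
            }
        }
        return unexpected;
    }

    // Accepts the tail only when the number of unexpected bytes does not
    // exceed the tolerated amount.
    static boolean tailAcceptable(byte[] log, int from, byte padding,
                                  int toleratedBytes) {
        return countNonPadding(log, from, padding) <= toleratedBytes;
    }

    public static void main(String[] args) {
        byte pad = (byte) 0xFF; // assumed padding value for this example
        byte[] log = {1, 2, 3, pad, pad, 7, pad};
        System.out.println("unexpected bytes: " + countNonPadding(log, 3, pad));
        System.out.println("acceptable with toleration 1: "
                + tailAcceptable(log, 3, pad, 1));
    }
}
```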
