[jira] Created: (HADOOP-7086) Number of retries on socket connection failure should be configurable

2011-01-05 Thread Devaraj K (JIRA)
Number of retries on socket connection failure should be configurable Key: HADOOP-7086 URL: https://issues.apache.org/jira/browse/HADOOP-7086 Project: Hadoop Common Issue Ty
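For context, the IPC client already exposes one related knob. A hedged sketch of how such a setting looks in core-site.xml; the key below ships in core-default.xml, the value is just illustrative, and the remaining hard-coded retry counts are what this JIRA targets:

    <property>
      <!-- Existing setting: retries on connection failure. HADOOP-7086
           asks for the remaining hard-coded counts to be configurable. -->
      <name>ipc.client.connect.max.retries</name>
      <value>10</value>
    </property>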

RE: Hadoop use direct I/O in Linux?

2011-01-05 Thread Segel, Mike
You are mixing a few things up. You're testing your I/O using C. What do you see if you test your direct I/O from Java? I'm guessing that you'll keep your I/O piece in place, wrap it within some JNI code, and then rewrite the test in Java? Also, are you testing large streams or random
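A minimal sketch of the Java half of the JNI approach Mike describes; the library name and native signature are made up for illustration, and the C side would do the open(2) with O_DIRECT into an aligned buffer:

    // Hypothetical JNI shim: library name and signature are illustrative.
    public class DirectIoTest {
        static { System.loadLibrary("directio"); } // libdirectio.so (hypothetical)

        // Native side: open(path, O_RDONLY | O_DIRECT), read into an
        // aligned buffer, return bytes read.
        private static native long directRead(String path, long bytes);

        public static void main(String[] args) {
            long t0 = System.nanoTime();
            long n = directRead(args[0], 64L << 20);
            System.out.printf("read %d bytes in %.1f ms%n",
                    n, (System.nanoTime() - t0) / 1e6);
        }
    }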

Re: Hadoop use direct I/O in Linux?

2011-01-05 Thread Jay Booth
On Tue, Jan 4, 2011 at 12:58 PM, Da Zheng wrote: > The most important reason for me to use direct I/O is that the Atom processor is too weak. If I write a simple program to write data to the disk, the CPU is almost at 100% but the disk hasn't reached its maximum bandwidth. When I write data to SSD

Re: Hadoop use direct I/O in Linux?

2011-01-05 Thread Da Zheng
On 1/5/11 12:44 AM, Christopher Smith wrote: > On Tue, Jan 4, 2011 at 9:11 PM, Da Zheng wrote: >> On 1/4/11 5:17 PM, Christopher Smith wrote: >>> If you use direct I/O to reduce CPU time, that means you are saving CPU via DMA. If you are using Java's heap though, you can kiss that goodby
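The heap point is the crux: bytes read into a byte[] on the Java heap get copied (and traced by the GC) even if the kernel side used DMA. NIO direct buffers keep the data off-heap; a minimal, era-appropriate sketch:

    import java.io.FileInputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;

    public class DirectBufferRead {
        public static void main(String[] args) throws IOException {
            // allocateDirect puts the buffer outside the Java heap, so the
            // channel read avoids the extra copy into a heap byte[].
            ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20); // 1 MB
            FileChannel ch = new FileInputStream(args[0]).getChannel();
            try {
                long total = 0;
                int n;
                while ((n = ch.read(buf)) > 0) {
                    total += n;
                    buf.clear();
                }
                System.out.println("read " + total + " bytes off-heap");
            } finally {
                ch.close();
            }
        }
    }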

Re: Hadoop use direct I/O in Linux?

2011-01-05 Thread Da Zheng
On 1/5/11 9:50 AM, Segel, Mike wrote: > You are mixing a few things up. You're testing your I/O using C. What do you see if you test your direct I/O from Java? I'm guessing that you'll keep your I/O piece in place, wrap it within some JNI code, and then rewrite the test in Ja

Re: Hadoop use direct I/O in Linux?

2011-01-05 Thread Greg Roelofs
Da Zheng wrote: > I already did "ant compile-c++-libhdfs -Dlibhdfs=1", but it seems nothing is compiled as it prints the following: check-c++-libhdfs: check-c++-makefile-libhdfs: create-c++-libhdfs-makefile: compile-c++-libhdfs: BUILD SUCCESSFUL Total time: 2 seconds You may ne

Using git grafts to merge history across project split

2011-01-05 Thread Todd Lipcon
I know many people use git, so wanted to share a neat tip I figured out this morning that lets you graft the pre-split history into the post-split repositories. I'm using git 1.7.1, not sure how new these features are. Here are the steps: 1) Check out the git repos from git.apache.org into git/had
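The preview cuts off after step 1; the general grafting recipe, sketched here with placeholder paths and SHAs rather than Todd's exact steps, looks like:

    # Sketch only: repository paths and SHAs are placeholders.
    cd common
    git remote add presplit /path/to/pre-split-repo
    git fetch presplit
    # .git/info/grafts format: <commit> <parent> [<parent>...]
    # Pretend the first post-split commit has the pre-split tip as parent:
    echo "<first-post-split-sha> <pre-split-tip-sha>" >> .git/info/grafts
    git log    # history now continues into the pre-split commits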

Build failed, hudson broken?

2011-01-05 Thread Niels Basjes
Hi, I just submitted a patch for the feature I've been working on: https://issues.apache.org/jira/browse/HADOOP-7076 This patch works fine on my system and passes all the unit tests. Now, some 30 minutes later, it seems the build on Hudson has failed. https://hudson.apache.org/hudson/job/PreCo

Re: Build failed, hudson broken?

2011-01-05 Thread Niels Basjes
I found where to report this ... so I did: https://issues.apache.org/jira/browse/INFRA-3340 2011/1/5 Niels Basjes : > Hi, > > I just submitted a patch for the feature I've been working on. > https://issues.apache.org/jira/browse/HADOOP-7076 > > This patch works fine on my system and passes all the

Re: Using git grafts to merge history across project split

2011-01-05 Thread Chris Douglas
This is great. Thanks, Todd. -C On Wed, Jan 5, 2011 at 12:36 PM, Todd Lipcon wrote: > I know many people use git, so wanted to share a neat tip I figured out this morning that lets you graft the pre-split history into the post-split repositories. I'm using git 1.7.1, not sure how new these fe

Re: Hadoop use direct I/O in Linux?

2011-01-05 Thread Milind Bhandarkar
I agree with Jay B. Checksumming is usually the culprit for high CPU on clients and datanodes. Plus, a checksum of 4 bytes for every 512 bytes means that for a 64MB block the checksums come to 512KB, i.e. 128 ext3 blocks. Changing it to generate 1 ext3 checksum block per DFS block will speed up read/write wi
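The arithmetic checks out, assuming the common 4KB ext3 block size; spelled out:

    public class ChecksumOverhead {
        public static void main(String[] args) {
            long blockSize        = 64L << 20;  // 64 MB DFS block
            int  bytesPerChecksum = 512;        // io.bytes.per.checksum default
            int  crcSize          = 4;          // a CRC32 is 4 bytes
            long checksumBytes = blockSize / bytesPerChecksum * crcSize;
            System.out.println(checksumBytes);        // 524288 = 512 KB
            System.out.println(checksumBytes / 4096); // 128 four-KB ext3 blocks
        }
    }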

Re: Hadoop use direct I/O in Linux?

2011-01-05 Thread Brian Bockelman
On Jan 5, 2011, at 4:03 PM, Milind Bhandarkar wrote: > I agree with Jay B. Checksumming is usually the culprit for high CPU on clients and datanodes. Plus, a checksum of 4 bytes for every 512 bytes means that for a 64MB block the checksums come to 512KB, i.e. 128 ext3 blocks. Changing it to generate

Re: Hadoop use direct I/O in Linux?

2011-01-05 Thread Milind Bhandarkar
> > Know thine usage scenarios. Yup. - milind --- Milind Bhandarkar (mbhandar...@linkedin.com) (650-776-3236)

Re: Hadoop use direct I/O in Linux?

2011-01-05 Thread Da Zheng
I'm not sure of that. I wrote a small checksum program for testing. Once the block size grows beyond 8192 bytes, I don't see much performance improvement. See the code below. I don't think 64MB can bring us any benefit. I did change io.bytes.per.checksum to 131072 in Hadoop, and the
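A sketch of the kind of micro-benchmark described (names and sizes are mine): checksum a fixed buffer at increasing chunk sizes and time each pass.

    import java.util.Random;
    import java.util.zip.CRC32;

    public class ChecksumChunkBench {
        public static void main(String[] args) {
            byte[] data = new byte[64 << 20]; // 64 MB of random input
            new Random(42).nextBytes(data);
            for (int chunk = 512; chunk <= (1 << 20); chunk <<= 1) {
                CRC32 crc = new CRC32();
                long t0 = System.nanoTime();
                for (int off = 0; off < data.length; off += chunk) {
                    crc.reset();
                    crc.update(data, off, Math.min(chunk, data.length - off));
                }
                System.out.printf("chunk %7d: %.1f ms%n",
                        chunk, (System.nanoTime() - t0) / 1e6);
            }
        }
    }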

Re: svn commit: r1055684 - /hadoop/common/branches/branch-0.20/CHANGES.txt

2011-01-05 Thread Ian Holsman
Is 20.3 a 'dead' release? I haven't seen any discussion on the Apache lists about creating a 20.3 release, and it kind of goes against all the discussion that we recently had with StAck about creating an 'append' release on 0.20. I'm not against 20.3, but I would like to see some discussi

Re: setting "mapred.task.cache.levels" to 0 makes Hadoop stall

2011-01-05 Thread Greg Roelofs
Zhenhua Guo wrote: > It seems that mapred.task.cache.levels is used by the JobTracker to create task caches for nodes at various levels. This makes data-locality scheduling possible. If I set mapred.task.cache.levels to 0 and use the default network topology, then the MapReduce job will stall forever
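For reference, the property under discussion as it would appear in mapred-site.xml; 2, i.e. node-level and rack-level caches, is the usual default:

    <property>
      <name>mapred.task.cache.levels</name>
      <!-- default is 2 (node-local and rack-local caches); the report
           above is about what happens when this is set to 0 -->
      <value>2</value>
    </property>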

Re: Hadoop use direct I/O in Linux?

2011-01-05 Thread Milind Bhandarkar
Have you tried with org.apache.hadoop.util.DataChecksum and org.apache.hadoop.util.PureJavaCrc32? - Milind On Jan 5, 2011, at 3:42 PM, Da Zheng wrote: > I'm not sure of that. I wrote a small checksum program for testing. Once the block size grows beyond 8192 bytes, I don't see
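Both classes implement java.util.zip.Checksum, so a benchmark can swap them behind that interface. A sketch (PureJavaCrc32 only exists from 0.21 on, as the next reply notes):

    import java.util.Random;
    import java.util.zip.CRC32;
    import java.util.zip.Checksum;

    public class ChecksumSwap {
        // CRC32 and Hadoop's PureJavaCrc32 both implement
        // java.util.zip.Checksum, so they are drop-in replacements here.
        static long run(Checksum crc, byte[] data, int chunk) {
            for (int off = 0; off < data.length; off += chunk) {
                crc.reset();
                crc.update(data, off, Math.min(chunk, data.length - off));
            }
            return crc.getValue();
        }

        public static void main(String[] args) {
            byte[] data = new byte[8 << 20];
            new Random(0).nextBytes(data);
            System.out.println(run(new CRC32(), data, 512));
            // On 0.21+: System.out.println(run(new PureJavaCrc32(), data, 512));
        }
    }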

[jira] Created: (HADOOP-7087) SequenceFile.createWriter ignores FileSystem parameter

2011-01-05 Thread Todd Lipcon (JIRA)
SequenceFile.createWriter ignores FileSystem parameter -- Key: HADOOP-7087 URL: https://issues.apache.org/jira/browse/HADOOP-7087 Project: Hadoop Common Issue Type: Bug Components
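One of the createWriter overloads takes an explicit FileSystem; a sketch of such a call, where the comment describes the reported behavior rather than confirmed semantics:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class CreateWriterDemo {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem localFs = FileSystem.getLocal(conf); // caller's explicit FS
            // Per the report, the writer may resolve the path against the
            // default filesystem from conf rather than the fs passed in.
            SequenceFile.Writer w = SequenceFile.createWriter(localFs, conf,
                    new Path("/tmp/demo.seq"), IntWritable.class, Text.class);
            w.append(new IntWritable(1), new Text("one"));
            w.close();
        }
    }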

[jira] Created: (HADOOP-7088) JMX Bean that exposes version and build information

2011-01-05 Thread Dmytro Molkov (JIRA)
JMX Bean that exposes version and build information --- Key: HADOOP-7088 URL: https://issues.apache.org/jira/browse/HADOOP-7088 Project: Hadoop Common Issue Type: New Feature Report
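The standard JMX shape for such a bean; all names below are made up for illustration, not what the JIRA eventually added:

    import java.lang.management.ManagementFactory;
    import javax.management.ObjectName;

    public class VersionJmxSketch {
        // Hypothetical MXBean exposing version/build strings.
        public interface VersionInfoMXBean {
            String getVersion();
            String getBuildDate();
        }

        static class VersionInfo implements VersionInfoMXBean {
            public String getVersion()   { return "0.22.0-SNAPSHOT"; }
            public String getBuildDate() { return "2011-01-05"; }
        }

        public static void main(String[] args) throws Exception {
            ManagementFactory.getPlatformMBeanServer().registerMBean(
                    new VersionInfo(),
                    new ObjectName("Hadoop:name=VersionInfo")); // name made up
            Thread.sleep(Long.MAX_VALUE); // keep alive for jconsole to attach
        }
    }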

[jira] Created: (HADOOP-7089) Use readlink to get absolute paths in the scripts

2011-01-05 Thread Eli Collins (JIRA)
Use readlink to get absolute paths in the scripts -- Key: HADOOP-7089 URL: https://issues.apache.org/jira/browse/HADOOP-7089 Project: Hadoop Common Issue Type: Improvement Components
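The usual idiom (a sketch, not the actual patch): resolve symlinks before computing the script's home directory, so a symlinked bin/hadoop still finds its real home.

    # Sketch, not the actual patch (GNU readlink -f):
    bin=$(dirname "$(readlink -f "$0")")
    . "$bin/hadoop-config.sh"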

Re: Hadoop use direct I/O in Linux?

2011-01-05 Thread Da Zheng
Isn't DataChecksum just a wrapper around CRC32? I'm still using Hadoop 0.20.2; there is no PureJavaCrc32. Da On 1/5/11 7:44 PM, Milind Bhandarkar wrote: > Have you tried with org.apache.hadoop.util.DataChecksum and org.apache.hadoop.util.PureJavaCrc32? > - Milind > On Jan 5, 2011, at 3:42 PM

[jira] Created: (HADOOP-7090) Possible resource leaks in hadoop core code

2011-01-05 Thread Gokul (JIRA)
Possible resource leaks in hadoop core code --- Key: HADOOP-7090 URL: https://issues.apache.org/jira/browse/HADOOP-7090 Project: Hadoop Common Issue Type: Bug Affects Versions: 0.21.0 R
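The usual fix pattern for such reports in that era, before try-with-resources: close in a finally block so an exception can't skip the close. A sketch:

    import java.io.IOException;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class NoLeak {
        // Close in finally so an exception on read() can't leak the stream.
        static int firstByte(FileSystem fs, Path p) throws IOException {
            FSDataInputStream in = null;
            try {
                in = fs.open(p);
                return in.read();
            } finally {
                IOUtils.closeStream(in); // null-safe, ignores close() errors
            }
        }
    }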

[jira] Resolved: (HADOOP-6872) ChecksumFs#listStatus should filter out .crc files

2011-01-05 Thread Konstantin Shvachko (JIRA)
[ https://issues.apache.org/jira/browse/HADOOP-6872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HADOOP-6872. - Resolution: Duplicate Fixed as a part of HADOOP-6906. > ChecksumFs#listStatus s

[jira] Resolved: (HADOOP-6718) Client does not close connection when an exception happens during SASL negotiation

2011-01-05 Thread Konstantin Shvachko (JIRA)
[ https://issues.apache.org/jira/browse/HADOOP-6718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Konstantin Shvachko resolved HADOOP-6718. - Resolution: Duplicate Incorporated in HADOOP-6706 for 0.22. > Client does not c