Re: Latency and speed of HDFS

2011-01-28 Thread Da Zheng
Have you tried TestDFSIO? I think it's quite a good benchmark to measure the performance of HDFS. If you want to know how to write data to HDFS directly, you can read its code. Da On 1/28/11 6:36 PM, Pei HE wrote: > Hi all, > I want to know the detailed performance of Hadoop. > > I am writing a

Re: Hadoop use direct I/O in Linux?

2011-01-06 Thread Da Zheng
On 01/05/2011 12:44 AM, Christopher Smith wrote: Yes, my C program can reach 100MB/s or even 110MB/s when writing data to the disk sequentially, but with direct I/O enabled, the maximal throughput is about 140MB/s. But the biggest difference is CPU usage. Without direct I/O, operating system uses
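
For readers who want to try the buffered versus direct I/O comparison described in this message, here is a minimal sketch. It is not the C program mentioned in the thread, which is not shown. It assumes Linux, where `os.O_DIRECT` is available, and it falls back to ordinary buffered writes when the filesystem rejects direct I/O. The names `BLOCK`, `_write_all` and `sequential_write` are hypothetical and chosen only for illustration.

```python
import mmap
import os

# Hypothetical write size: 1 MiB, a multiple of common 512 B / 4 KiB sector sizes,
# which O_DIRECT generally requires for both the buffer length and file offset.
BLOCK = 1 << 20


def _write_all(fd, buf, total_bytes):
    """Write buf repeatedly to fd until at least total_bytes have been written."""
    written = 0
    while written < total_bytes:
        written += os.write(fd, buf)
    return written


def sequential_write(path, total_bytes, direct=True):
    """Sequentially write zeros to path; return (bytes_written, used_direct).

    total_bytes is assumed to be a multiple of BLOCK. If direct I/O is
    requested but unsupported (no O_DIRECT, or the filesystem rejects it),
    the function silently retries with normal buffered writes.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    # An anonymous mmap is page-aligned, which satisfies O_DIRECT's
    # buffer-alignment requirement on typical Linux filesystems.
    buf = mmap.mmap(-1, BLOCK)
    try:
        if direct and hasattr(os, "O_DIRECT"):
            try:
                fd = os.open(path, flags | os.O_DIRECT, 0o644)
                try:
                    return _write_all(fd, buf, total_bytes), True
                finally:
                    os.close(fd)
            except OSError:
                pass  # direct I/O not supported here; fall back below
        fd = os.open(path, flags, 0o644)
        try:
            return _write_all(fd, buf, total_bytes), False
        finally:
            os.close(fd)
    finally:
        buf.close()
```

Timing `sequential_write(path, n, direct=True)` against `direct=False` while watching `top` gives a rough view of the throughput and system-time difference discussed here. The absolute numbers depend on the disk, filesystem and CPU, and will not match the figures quoted in the thread.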

Re: Hadoop use direct I/O in Linux?

2011-01-05 Thread Da Zheng
> On Jan 5, 2011, at 3:42 PM, Da Zheng wrote: > >> I'm not sure of that. I wrote a small checksum program for testing. After >> the size of a block gets to larger than 8192 bytes, I don't see much >> performance improvement. See the code below. I don

Re: Hadoop use direct I/O in Linux?

2011-01-05 Thread Da Zheng
I'm not sure of that. I wrote a small checksum program for testing. After the size of a block gets to larger than 8192 bytes, I don't see much performance improvement. See the code below. I don't think 64MB can bring us any benefit. I did change io.bytes.per.checksum to 131072 in hadoop, and the
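
The checksum program referred to in this message is cut off in the archive preview, so the following is only a rough stand-in for that kind of experiment, not the author's code. It assumes CRC32 (via Python's `zlib.crc32`) as the checksum and measures throughput as the bytes-per-checksum chunk size grows. The function names `checksum_blocks` and `throughput_mb_s` are hypothetical.

```python
import time
import zlib


def checksum_blocks(data, bytes_per_checksum):
    """Return one CRC32 value per bytes_per_checksum-sized chunk of data."""
    view = memoryview(data)  # slicing a memoryview avoids copying each chunk
    return [
        zlib.crc32(view[i:i + bytes_per_checksum])
        for i in range(0, len(view), bytes_per_checksum)
    ]


def throughput_mb_s(data, bytes_per_checksum, repeats=3):
    """Best-of-repeats checksum throughput in MB/s for the given chunk size."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        checksum_blocks(data, bytes_per_checksum)
        best = min(best, time.perf_counter() - start)
    return len(data) / best / 1e6


if __name__ == "__main__":
    payload = bytes(16 * 1024 * 1024)  # 16 MiB of zeros as test input
    # 8192 and 131072 are the sizes mentioned in the message; the others
    # are added here only to show the trend.
    for size in (512, 4096, 8192, 131072, 1 << 20):
        print(f"{size:>8} bytes/checksum: {throughput_mb_s(payload, size):8.1f} MB/s")
```

In a sketch like this the per-call overhead dominates at small chunk sizes and flattens out as chunks grow, which is the general shape of the observation in the message. The point at which it flattens in Python will differ from a C or Java implementation.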

Re: Hadoop use direct I/O in Linux?

2011-01-05 Thread Da Zheng
won't see > much, if any performance improvement. > > > > -Original Message- > From: Da Zheng [mailto:zhen...@cs.jhu.edu] > Sent: Tuesday, January 04, 2011 11:11 PM > To: common-dev@hadoop.apache.org > Subject: Re: Hadoop use direct I/O in Linux? > >

Re: Hadoop use direct I/O in Linux?

2011-01-05 Thread Da Zheng
On 1/5/11 12:44 AM, Christopher Smith wrote: > On Tue, Jan 4, 2011 at 9:11 PM, Da Zheng wrote: > >> On 1/4/11 5:17 PM, Christopher Smith wrote: >>> If you use direct I/O to reduce CPU time, that means you are saving CPU >> via >>> DMA. If you are using J

Re: Hadoop use direct I/O in Linux?

2011-01-04 Thread Da Zheng
ore processor with hyperthread enabled). Cpu(s): 3.4%us, 32.8%sy, 0.0%ni, 50.0%id, 12.1%wa, 0.0%hi, 1.6%si, 0.0%st But with direct I/O, the system time can be as little as 3%. Best, Da > > On Tue, Jan 4, 2011 at 9:58 AM, Da Zheng wrote: > >> The most important reason for

Re: Hadoop use direct I/O in Linux?

2011-01-04 Thread Da Zheng
The most important reason for me to use direct I/O is that the Atom processor is too weak. If I wrote a simple program to write data to the disk, CPU is almost 100% but the disk hasn't reached its maximal bandwidth. When I write data to SSD, the difference is even larger. Even if the program ha

Fwd: Hadoop use direct I/O in Linux?

2011-01-03 Thread Da Zheng
ronment of compiling Hadoop. I can use jposix, but I don't know how to integrate it into Hadoop (jposix uses JNI). Any instructions to do it? Thank you, Da Original Message Subject: Hadoop use direct I/O in Linux? Date: Sun, 02 Jan 2011 15:01:18 -0500 From: Da Zheng T
