Re: Beware sun's jvm version 1.6.0_05-b13 on linux

2009-05-18 Thread Steve Loughran
Allen Wittenauer wrote: On 5/15/09 11:38 AM, "Owen O'Malley" wrote: "We have observed that the default jvm on RedHat 5..." I'm sure some people are scratching their heads at this. The default JVM on at least RHEL5u0/1 is a GCJ-based 1.4, clearly incapable of running Hadoop. We [and, r

Re: Is there any performance issue with Jrockit JVM for Hadoop

2009-05-18 Thread Steve Loughran
Grace wrote: To follow up this question, I have also asked for help on the Jrockit forum. They kindly offered some useful and detailed suggestions based on the JRA results. After updating the option list, the performance did become better to some extent. But it is still not comparable with the Sun JV

JobInProgress and TaskInProgress

2009-05-18 Thread Rakhi Khatwani
Hi, how do I get the job progress and task progress information programmatically at any point in time using the APIs? There are JobInProgress and TaskInProgress classes, but both of them are private. Any suggestions? Thanks, Raakhi

Re: Is there any performance issue with Jrockit JVM for Hadoop

2009-05-18 Thread Tom White
On Mon, May 18, 2009 at 11:44 AM, Steve Loughran wrote: > Grace wrote: >> >> To follow up this question, I have also asked for help on the Jrockit forum. They >> kindly offered some useful and detailed suggestions based on the JRA >> results. After updating the option list, the performance did become

RE: proper method for writing files to hdfs

2009-05-18 Thread Bill Habermaas
Sasha, Connecting to the namenode is the proper way to establish the hdfs connection. Afterwards the Hadoop client handler that is called by your code will go directly to the datanodes. There is no reason for you to communicate directly with a datanode, nor is there a way for you to even know wher
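A minimal sketch of the pattern Bill describes, using the FileSystem API of that era; the namenode URI and file path are illustrative, not from the thread:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsWriteSketch {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            // The client only needs the namenode address; block transfers
            // to datanodes are handled by the DFS client internally.
            conf.set("fs.default.name", "hdfs://namenode-host:9000"); // illustrative host/port
            FileSystem fs = FileSystem.get(conf);
            FSDataOutputStream out = fs.create(new Path("/user/sasha/events.log"));
            out.writeBytes("one record\n");
            out.close(); // data becomes visible on close; see the FSDataOutputStream thread below
            fs.close();
        }
    }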

Re: proper method for writing files to hdfs

2009-05-18 Thread Sasha Dolgy
Hi Bill, Thanks for that. If the NameNode is unavailable, how do we find the secondary name node? Is there a way to deal with this in the code, or should a load balancer of some type sit above each and only direct traffic to the name node if it's listening? -sd On Mon, May 18, 2009 at 2:09 PM, B

Re: Is there any performance issue with Jrockit JVM for Hadoop

2009-05-18 Thread Steve Loughran
Tom White wrote: On Mon, May 18, 2009 at 11:44 AM, Steve Loughran wrote: Grace wrote: To follow up this question, I have also asked for help on the Jrockit forum. They kindly offered some useful and detailed suggestions based on the JRA results. After updating the option list, the performance did

RE: proper method for writing files to hdfs

2009-05-18 Thread Bill Habermaas
Sasha, If the namenode is unavailable then you cannot communicate with Hadoop. It is the single point of failure, and once it is down the system is unusable. The secondary name node is not a failover substitute for the name node; the name is misleading. Its purpose is simply to checkpoint

Re: proper method for writing files to hdfs

2009-05-18 Thread Sasha Dolgy
OK, on the same page with that. Going back to the original question: in our scenario we are trying to stream data into HDFS, and despite the posts and hints I've been reading, it's still tough to crack this nut, which is why I thought (and thankfully I wasn't right) that we were going about this

RE: proper method for writing files to hdfs

2009-05-18 Thread Habermaas, William
-Original Message- From: Sasha Dolgy [mailto:sdo...@gmail.com] Sent: Monday, May 18, 2009 9:50 AM To: core-user@hadoop.apache.org Subject: Re: proper method for writing files to hdfs Ok, on the same page with that. Going back to the original question. In our scenario we are trying to

Re: FSDataOutputStream flush() not working?

2009-05-18 Thread Sasha Dolgy
Hi Jason, If the bufferSize is set when the stream is created, will the stream automatically write itself out to HDFS once that size is reached? What happens when the buffer size is exceeded? -sasha On Mon, May 18, 2009 at 3:04 AM, jason hadoop wrote: > When you open a file you have the option, blockS

RE: proper method for writing files to hdfs

2009-05-18 Thread Bill Habermaas
Hadoop writes data to the local filesystem; when the block size is reached, it is written into HDFS. Think of HDFS as a block management system rather than a file system, even though the end result is a series of blocks that constitute a file. You will not see the data in HDFS until the file is closed
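A small sketch of the visibility behaviour Bill describes (the path is hypothetical): a reader sees the file's contents only after the writer closes it.

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;

    public class VisibilitySketch {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path p = new Path("/tmp/visibility-test"); // hypothetical path

            FSDataOutputStream out = fs.create(p);
            out.writeBytes("buffered by the client, not yet visible\n");
            // A reader opening p at this point would see an empty file.
            out.close(); // close flushes the remaining data and publishes the blocks

            FSDataInputStream in = fs.open(p);
            IOUtils.copyBytes(in, System.out, 4096, true); // now readable
        }
    }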

Optimal Filesystem (and Settings) for HDFS

2009-05-18 Thread Bob Schulze
We are currently rebuilding our cluster - does anybody have recommendations on the underlying file system? Just standard ext3? I could imagine that the block size could be larger than its default... Thx for any tips, Bob

Re: Beware sun's jvm version 1.6.0_05-b13 on linux

2009-05-18 Thread Owen O'Malley
On May 18, 2009, at 3:42 AM, Steve Loughran wrote: Presumably it's one of those hard-to-reproduce race conditions that only surfaces under load on a big cluster, so it is hard to replicate in a unit test, right? Yes. It reliably happens on a 100TB or larger sort, but almost never happens on a

Re: Optimal Filesystem (and Settings) for HDFS

2009-05-18 Thread Alex Loddengaard
I believe Yahoo! uses ext3, though I know other people have said that XFS has performed better in various benchmarks. We use ext3, though we haven't done any benchmarks to prove its worth. This question has come up a lot, so I think it'd be worth doing a benchmark and writing up the results. I h

Re: Optimal Filesystem (and Settings) for HDFS

2009-05-18 Thread Edward Capriolo
Do not forget 'tune2fs -m 2'. By default this value gets set at 5%. With 1 TB disks we got 33 GB more usable space. Talk about instant savings! On Mon, May 18, 2009 at 1:31 PM, Alex Loddengaard wrote: > I believe Yahoo! uses ext3, though I know other people have said that XFS > has performed bett

Re: Optimal Filesystem (and Settings) for HDFS

2009-05-18 Thread Allen Wittenauer
On 5/18/09 11:33 AM, "Edward Capriolo" wrote: > Do not forget 'tune2fs -m 2'. By default this value gets set at 5%. > With 1 TB disks we got 33 GB more usable space. Talk about instant > savings! Yup. Although I think we're using -m 1. > On Mon, May 18, 2009 at 1:31 PM, Alex Loddengaard w

unexpected overloaded mapper number

2009-05-18 Thread He Chen
Hi everyone, I've updated my Hadoop to 0.20. However, when I run my old-version MapReduce program, it reports: "Task Id : attempt_200905181657_0002_m_000102_0, Status : FAILED Error initializing attempt_200905181818_0001_m_000102_0: java.io.FileNotFoundException: File file:/hadoop/hadoop-r

Re: A brief report of Second Hadoop in China Salon

2009-05-18 Thread Min Zhou
Cheers! Thank you to all the organizers and speakers! On Sun, May 17, 2009 at 6:49 PM, Qingyan(Evan) Liu wrote: > yes, that's a great conference. thanks a lot to the organizers and > reporters. > > 2009/5/16 He Yongqiang > > > Hi, all > > On May 9, we held the second Hadoop In China salon. About 150

problems running filebench

2009-05-18 Thread 葉筱楓
Hi, everyone, I'm trying to do some performance testing on HDFS with filebench, which comes with hadoop-0.19.0-test.jar. I successfully ran the command with the -r option (read testing) and got the output. (The command: hadoop jar hadoop-0.19.0-test.jar filebench -r -txt -blk -pln -dir "hdfs://maste

Re: JobInProgress and TaskInProgress

2009-05-18 Thread Jothi Padmanabhan
Could you let us know what information you are looking to extract from these classes? You could possibly get it from other classes. Jothi On 5/18/09 6:23 PM, "Rakhi Khatwani" wrote: > Hi, > how do I get the job progress and task progress information > programmatically at any point in time

Re: sort example

2009-05-18 Thread David Rio
On Sun, May 17, 2009 at 3:33 PM, Chuck Lam wrote: > The mapred.text.key.comparator.options property is active only if you use > the KeyFieldBasedComparator. > > -D mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator Thanks Chuck. That was it. > There's a
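The same settings can also be made programmatically on the 0.19-era JobConf; a sketch, where the "-k2,2n" field spec is only an example (sort numerically on the second key field):

    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.lib.KeyFieldBasedComparator;

    public class SortComparatorSketch {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // Activate the key-field comparator, then give it its options;
            // without the first line the options property is ignored.
            conf.setOutputKeyComparatorClass(KeyFieldBasedComparator.class);
            conf.set("mapred.text.key.comparator.options", "-k2,2n"); // example spec
        }
    }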

Re: JobInProgress and TaskInProgress

2009-05-18 Thread Rakhi Khatwani
Hi, I am looking for the following: for each task, % completed for both map and reduce, plus exceptions (if encountered); for each job, % completed and status (RUNNING, FAILED, PAUSED, etc.). I would like to write a program so that I can programmatically access the above information at any point in time. Tha

Re: JobInProgress and TaskInProgress

2009-05-18 Thread Jothi Padmanabhan
Look at JobClient -- There are some useful methods there. For example, displayTasks and monitorAndPrintJob might provide most of the information that you are looking for. Jothi On 5/19/09 10:14 AM, "Rakhi Khatwani" wrote: > Hi, > I am looking for the following: > for each task: % complet

Re: JobInProgress and TaskInProgress

2009-05-18 Thread Amareshwari Sriramadasu
You can use the RunningJob handle to query map/reduce progress. See the API at http://hadoop.apache.org/core/docs/r0.20.0/api/org/apache/hadoop/mapred/RunningJob.html Thanks Amareshwari Jothi Padmanabhan wrote: Look at JobClient -- There are some useful methods there. For example, displayTasks and moni
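A minimal sketch of the polling loop this thread converges on, using JobClient and RunningJob from the 0.20 mapred API; the job id is a placeholder:

    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.JobID;
    import org.apache.hadoop.mapred.RunningJob;

    public class ProgressPollerSketch {
        public static void main(String[] args) throws Exception {
            JobClient client = new JobClient(new JobConf());
            // Placeholder id; in practice keep the RunningJob handle that
            // JobClient.submitJob() returns, or look the job up like this.
            RunningJob job = client.getJob(JobID.forName("job_200905180000_0001"));
            while (!job.isComplete()) {
                System.out.printf("map %.0f%% reduce %.0f%%%n",
                        job.mapProgress() * 100, job.reduceProgress() * 100);
                Thread.sleep(5000);
            }
            System.out.println("successful = " + job.isSuccessful());
        }
    }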