Allen Wittenauer wrote:
On 5/15/09 11:38 AM, "Owen O'Malley" wrote:
We have observed that the default JVM on Red Hat 5
I'm sure some people are scratching their heads at this.
The default JVM on at least RHEL5u0/1 is a GCJ-based 1.4, clearly
incapable of running Hadoop. We [and, r
Grace wrote:
To follow up on this question, I have also asked for help on the JRockit forum. They
kindly offered some useful and detailed suggestions according to the JRA
results. After updating the option list, the performance did become better
to some extent. But it is still not comparable with the Sun JVM
Hi,
how do I get the job progress and task progress information
programmatically at any point of time using the APIs?
There are JobInProgress and TaskInProgress classes, but both of them are
private.
any suggestions?
Thanks,
Raakhi
On Mon, May 18, 2009 at 11:44 AM, Steve Loughran wrote:
> Grace wrote:
>>
>> To follow up this question, I have also asked help on Jrockit forum. They
>> kindly offered some useful and detailed suggestions according to the JRA
>> results. After updating the option list, the performance did become
Sasha,
Connecting to the namenode is the proper way to establish the HDFS
connection. Afterwards, the Hadoop client handler that is called by your
code will go directly to the datanodes. There is no reason for you to
communicate directly with a datanode, nor is there a way for you to even know
where
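A minimal sketch of the client-side view described above, assuming a namenode reachable at hdfs://namenode:9000 (hypothetical host and port). The client only ever addresses the namenode URI; the routing to datanodes happens inside the Hadoop client:

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsConnectSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Only the namenode address is given; datanodes are never addressed directly.
    FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:9000"), conf);
    for (FileStatus status : fs.listStatus(new Path("/"))) {
      System.out.println(status.getPath());
    }
    fs.close();
  }
}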
Hi Bill,
Thanks for that. If the NameNode is unavailable, how do we find the
secondary name node? Is there a way to deal with this in the code, or
should a load balancer of some type sit above each and only direct
traffic to the name node if it's listening?
-sd
On Mon, May 18, 2009 at 2:09 PM, B
Tom White wrote:
On Mon, May 18, 2009 at 11:44 AM, Steve Loughran wrote:
Grace wrote:
To follow up this question, I have also asked help on Jrockit forum. They
kindly offered some useful and detailed suggestions according to the JRA
results. After updating the option list, the performance did
Sasha,
If the namenode is unavailable then you cannot communicate with Hadoop. It
is the single point of failure, and once it is down the system is
unusable. The secondary name node is not a failover substitute for the name
node; the name is misleading. Its purpose is simply to checkpoint
Ok, on the same page with that.
Going back to the original question. In our scenario we are trying to
stream data into HDFS and despite the posts and hints I've been
reading, it's still tough to crack this nut and this is why I thought
(and thankfully I wasn't right) that we were going about this
-Original Message-
From: Sasha Dolgy [mailto:sdo...@gmail.com]
Sent: Monday, May 18, 2009 9:50 AM
To: core-user@hadoop.apache.org
Subject: Re: proper method for writing files to hdfs
Ok, on the same page with that.
Going back to the original question. In our scenario we are trying to
Hi Jason,
If the bufferSize is set when the stream is created, when the size is
reached, will it automatically write itself out to HDFS? What happens
when the buffer size is exceeded?
-sasha
On Mon, May 18, 2009 at 3:04 AM, jason hadoop wrote:
> When you open a file you have the option, blockSize
Hadoop writes data to the local filesystem; when the block size is reached, it
is written into HDFS. Think of HDFS as a block management system rather than
a file system, even though the end result is a series of blocks that
constitute a file. You will not see the data in HDFS until the file is
closed.
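A minimal sketch of that write path, using the FileSystem.create overload that takes an explicit buffer size, replication factor, and block size; the values and the path below are illustrative only, and the data only becomes visible as a file in HDFS once close() returns:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    int bufferSize = 4096;               // client-side buffer size, illustrative
    short replication = 3;               // illustrative
    long blockSize = 64L * 1024 * 1024;  // 64 MB blocks, illustrative

    FSDataOutputStream out =
        fs.create(new Path("/tmp/stream-example.dat"), true, bufferSize, replication, blockSize);
    for (int i = 0; i < 1000; i++) {
      out.writeBytes("record " + i + "\n");
    }
    // Readers will not see the complete file until close() has returned.
    out.close();
  }
}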
We are currently rebuilding our cluster - does anybody have recommendations on
the underlying file system? Just standard ext3?
I could imagine that the block size could be larger than its default...
Thx for any tips,
Bob
On May 18, 2009, at 3:42 AM, Steve Loughran wrote:
Presumably it's one of those hard-to-reproduce race conditions that
only surfaces under load on a big cluster, so it is hard to replicate in
a unit test, right?
Yes. It reliably happens on a 100TB or larger sort, but almost never
happens on a
I believe Yahoo! uses ext3, though I know other people have said that XFS
has performed better in various benchmarks. We use ext3, though we haven't
done any benchmarks to prove its worth.
This question has come up a lot, so I think it'd be worth doing a benchmark
and writing up the results. I h
Do not forget 'tune2fs -m 2'. By default this value gets set at 5%.
With 1 TB disks we got 33 GB more usable space. Talk about instant
savings!
On Mon, May 18, 2009 at 1:31 PM, Alex Loddengaard wrote:
> I believe Yahoo! uses ext3, though I know other people have said that XFS
> has performed bett
On 5/18/09 11:33 AM, "Edward Capriolo" wrote:
> Do not forget 'tune2fs -m 2'. By default this value gets set at 5%.
> With 1 TB disks we got 33 GB more usable space. Talk about instant
> savings!
Yup. Although, I think we're using -m 1.
> On Mon, May 18, 2009 at 1:31 PM, Alex Loddengaard w
Hi everyone,
I've updated my Hadoop to 0.20. However, when I run my old-version MapReduce
program, it reports:
" Task Id : attempt_200905181657_0002_m_000102_0, Status : FAILED
Error initializing attempt_200905181818_0001_m_000102_0:
java.io.FileNotFoundException: File
file:/hadoop/hadoop-r
Cheers! Thank you to all the organizers and speakers!
On Sun, May 17, 2009 at 6:49 PM, Qingyan(Evan) Liu wrote:
> yes, that's a great conference. thanks a lot to the organizers and
> reporters.
>
> 2009/5/16 He Yongqiang
>
> > Hi, all
> > On May 9, we held the second Hadoop In China salon. About 150
Hi, everyone,
I'm trying to do some performance testing on HDFS with filebench, which comes with
hadoop-0.19.0-test.jar.
I successfully ran the command with the -r option (read testing) and got the
output.
(the command: hadoop jar hadoop-0.19.0-test.jar filebench -r -txt -blk -pln
-dir "hdfs://maste
Could you let us know what information you are looking to extract from these
classes? You could possibly get it from other classes.
Jothi
On 5/18/09 6:23 PM, "Rakhi Khatwani" wrote:
> Hi,
> how do I get the job progress and task progress information
> programmatically at any point of time
On Sun, May 17, 2009 at 3:33 PM, Chuck Lam wrote:
> The mapred.text.key.comparator.options property is active only if you use
> the KeyFieldBasedComparator.
>
> -D
> mapred.output.key.comparator.class=org.apache.hadoop.mapred.lib.KeyFieldBasedComparator
>
Thanks Chuck. That was it.
> There's a
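For reference, the same setup expressed through the JobConf API rather than -D flags is sketched below; the driver class and the key spec -k2,2nr are hypothetical examples, not taken from the thread:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.lib.KeyFieldBasedComparator;

public class KeyFieldSortSketch {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(KeyFieldSortSketch.class);
    // The comparator options are only honored when KeyFieldBasedComparator is
    // actually the output key comparator, so both settings go together.
    conf.setOutputKeyComparatorClass(KeyFieldBasedComparator.class);
    // Example spec: sort on the second key field, numerically, in reverse order.
    conf.set("mapred.text.key.comparator.options", "-k2,2nr");
    // ... set input/output paths, mapper, reducer, etc., then submit:
    JobClient.runJob(conf);
  }
}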
Hi,
I am looking for the following:
for each task: % completed for both map and reduce, and exceptions (if
encountered).
for each job: % completed and status (RUNNING, FAILED, PAUSED, etc.).
I would like to write a program so that I can programmatically access the above
information at any point of time.
Tha
Look at JobClient -- There are some useful methods there.
For example, displayTasks and monitorAndPrintJob might provide most of the
information that you are looking for.
Jothi
On 5/19/09 10:14 AM, "Rakhi Khatwani" wrote:
> Hi,
> I am looking for the following:
> for each task: % complet
You can use the RunningJob handle to query map/reduce progress.
See the API at
http://hadoop.apache.org/core/docs/r0.20.0/api/org/apache/hadoop/mapred/RunningJob.html
Thanks
Amareshwari
Jothi Padmanabhan wrote:
Look at JobClient -- There are some useful methods there.
For example, displayTasks and monitorAndPrintJob
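Putting the two suggestions together, a minimal sketch that polls a job's progress by id; the job id string below is a placeholder and would normally come from job submission or the web UI:

import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.JobID;
import org.apache.hadoop.mapred.RunningJob;

public class JobProgressProbe {
  public static void main(String[] args) throws Exception {
    String jobIdStr = args.length > 0 ? args[0] : "job_200905181657_0002";  // placeholder id
    JobClient client = new JobClient(new JobConf());
    RunningJob job = client.getJob(JobID.forName(jobIdStr));
    if (job != null) {
      System.out.println("map progress:    " + job.mapProgress());
      System.out.println("reduce progress: " + job.reduceProgress());
      System.out.println("complete:        " + job.isComplete());
    }
  }
}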