Re: OutOfMemory errors on joining 2 large tables.

2011-02-23 Thread Igor Tatarinov
The one setting that's still unknown is mapred.tasktracker.reduce.tasks.maximum. Have you tried setting that to 1 as an experiment and increasing the Java heap? I assume the mappers are not eating all the memory. On Wed, Feb 23, 2011 at 9:57 PM, hadoop n00b wrote: > I have suddenly began to get
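A minimal sketch of the experiment suggested above, using the Hadoop 0.20-era property names current at the time of this thread. Note that mapred.tasktracker.reduce.tasks.maximum is a tasktracker-level setting (it normally goes in mapred-site.xml and needs a tasktracker restart), while the heap can be raised per job; the 2 GB value is an arbitrary example, not from the thread:

```sql
-- Cluster-side (mapred-site.xml on each tasktracker, restart required):
--   mapred.tasktracker.reduce.tasks.maximum = 1
-- Per-job, from the Hive session:
SET mapred.child.java.opts=-Xmx2048m;  -- example heap size, adjust to your nodes
```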

Re: OutOfMemory errors on joining 2 large tables.

2011-02-23 Thread hadoop n00b
I have suddenly begun to get this error (Hadoop error code 2) even for not-so-big queries. I am running a 6-node cluster. I tried to run the queries with 6 and 10 reducers but got the same result. On Wed, Feb 23, 2011 at 8:25 PM, Bennie Schut wrote: > We filter nulls already before the tables ar

Re: OutOfMemory errors on joining 2 large tables.

2011-02-23 Thread Bennie Schut
We already filter nulls before the tables are filled, but then this will probably cause a skew in the keys, like Paul was saying. I'm running some queries on the keys to see if that's the case. I do expect there will be large differences in the distribution of some of the keys. I'm looking at "set hiv
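A sketch of the kind of key-distribution query mentioned above. The table and column names (big_table, join_key) are placeholders for illustration, not from the thread; a few keys with counts far above the rest would confirm the suspected skew:

```sql
-- Count rows per join key; heavily skewed keys show up at the top.
SELECT join_key, COUNT(*) AS cnt
FROM big_table
GROUP BY join_key
ORDER BY cnt DESC
LIMIT 20;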

Re: OutOfMemory errors on joining 2 large tables.

2011-02-22 Thread Mapred Learn
Oops, I meant nulls. Sent from my iPhone On Feb 22, 2011, at 8:22 PM, Mapred Learn wrote: > Check if you can filter non-nulls. That might help. > > Sent from my iPhone > > On Feb 22, 2011, at 12:46 AM, Bennie Schut wrote: > >> I've just set the "hive.exec.reducers.bytes.per.reducer" to as lo

Re: OutOfMemory errors on joining 2 large tables.

2011-02-22 Thread Mapred Learn
Check if you can filter non-nulls. That might help. Sent from my iPhone On Feb 22, 2011, at 12:46 AM, Bennie Schut wrote: > I've just set the "hive.exec.reducers.bytes.per.reducer" to as low as 100k > which caused this job to run with 999 reducers. I still have 5 tasks failing > with an outof
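A minimal sketch of the null-filtering advice above (as corrected in the follow-up: filter nulls, not non-nulls). Table and column names are placeholders. In a reduce-side join, NULL keys would not match anyway, but they all hash to the same reducer, so dropping them up front can remove one large source of skew:

```sql
-- Exclude NULL join keys on both sides before the shuffle.
SELECT a.*, b.*
FROM big_a a
JOIN big_b b ON a.k = b.k
WHERE a.k IS NOT NULL
  AND b.k IS NOT NULL;
```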

RE: OutOfMemory errors on joining 2 large tables.

2011-02-22 Thread Paul Yang
Sent: Tuesday, February 22, 2011 12:46 AM To: user@hive.apache.org Subject: Re: OutOfMemory errors on joining 2 large tables. I've just set the "hive.exec.reducers.bytes.per.reducer" to as low as 100k which caused this job to run with 999 reducers. I still have 5 tasks failing with

Re: OutOfMemory errors on joining 2 large tables.

2011-02-22 Thread Bennie Schut
I've just set "hive.exec.reducers.bytes.per.reducer" as low as 100k, which caused this job to run with 999 reducers. I still have 5 tasks failing with an OutOfMemory. We have JVM reuse set to 8, but dropping it to 1 seems to greatly reduce this problem: set mapred.job.reuse.jvm.num.tasks
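The two settings discussed above can be sketched as a pair of session-level SET statements, using the values from the thread (100k per reducer, JVM reuse dropped from 8 to 1):

```sql
-- Force many small reducers (~100 KB of input each) and disable JVM reuse,
-- so each task starts with a fresh heap.
SET hive.exec.reducers.bytes.per.reducer=100000;
SET mapred.job.reuse.jvm.num.tasks=1;
```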

OutOfMemory errors on joining 2 large tables.

2011-02-18 Thread Bennie Schut
When we try to join two large tables, some of the reducers stop with an OutOfMemory exception.
Error: java.lang.OutOfMemoryError: Java heap space
at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
at org.apache.hadoop.mapred.ReduceTask$Re
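Not mentioned in the thread, but worth noting for this particular stack trace: the failure is in the reduce-side shuffle (shuffleInMemory), which buffers copied map output in reducer memory. On Hadoop 0.20.x the fraction of reducer heap reserved for this buffer is a tunable property, so one hedged mitigation sketch is to lower it:

```sql
-- Illustrative only: shrink the in-memory shuffle buffer so less of the
-- reducer heap is spent holding map output (default is 0.70).
SET mapred.job.shuffle.input.buffer.percent=0.5;
```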