Memory leaks are one likely cause; heap fragmentation is another possibility. The OOM happened while the mapper was allocating the buffer whose size is controlled by io.sort.mb, and my setup has io.sort.mb=500 with -Xmx1000m.
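For reference, this is roughly how those two settings relate in the job configuration (a minimal sketch of mapred-site.xml using the values from my setup; the comment is my reading of why they conflict):

  <!-- The io.sort.mb buffer is allocated inside the child JVM heap set by
       mapred.child.java.opts, so a 500 MB sort buffer in a 1000 MB heap
       leaves little room for the rest of the map task. -->
  <property>
    <name>io.sort.mb</name>
    <value>500</value>
  </property>
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx1000m</value>
  </property>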
I didn't find a configuration property specifically for controlling the merged input size for a mapper, only ones for map or reduce output size. For example, "hive.merge.size.smallfiles.avgsize" looks like what I want, but it actually applies to output. In Pig, "pig.maxCombinedSplitSize" does something similar.
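If CombineHiveInputFormat is used (see Kumar's reply further down), my understanding is that the combined input split size is capped by the generic split-size settings rather than a Hive-specific property. A sketch of what I would try from the Hive CLI (the 128 MB figure is only an example meant to match a typical block size):

  -- sizes are in bytes; sketch only, not tested
  SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
  SET mapred.max.split.size=134217728;            -- cap each combined split at ~128 MB
  SET mapred.min.split.size.per.node=134217728;   -- avoid many tiny per-node splits
  SET mapred.min.split.size.per.rack=134217728;   -- same at rack level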
Thanks Kumar.
Are there other settings to fine-tune how small files are merged into the bigger file that a mapper reads? Basically I want to match the size of a merged file to the block size.
On Fri, Apr 8, 2011 at 11:43 AM, V.Senthil Kumar wrote:
You can add these lines in hive-site.xml. It creates only one file at the end.
Hope it helps.
  <property>
    <name>hive.merge.mapredfiles</name>
    <value>true</value>
    <description>Merge small files at the end of a map-reduce job</description>
  </property>
  <property>
    <name>hive.input.format</name>
    <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
    <description>The default input format, if it is not specified</description>
  </property>
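To the earlier question about matching the merged file size to the block size: the same properties can also be set per session, and I believe hive.merge.size.per.task and hive.merge.smallfiles.avgsize are the size knobs for the merge step, though the exact defaults depend on your Hive version, so treat this as a sketch:

  SET hive.merge.mapredfiles=true;
  SET hive.input.format=org.apache.hadoop.hive.ql.io.CombineHiveInputFormat;
  SET hive.merge.size.per.task=134217728;      -- target size of each merged file (~128 MB block)
  SET hive.merge.smallfiles.avgsize=16000000;  -- merge kicks in when avg output file size is below this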
I could not find instructions on how to avoid the performance issues that arise when too many mappers have to be created, one for every small file. Thanks!
I had a similar problem until I set this parameter to 1 (although 3 seems to work fine too).
There is an explanation somewhere on the web. Basically, if you run 20 tasks and the garbage collector cannot keep up with the accumulated garbage, the Java process grows too big and you eventually hit the OOM.
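I believe the parameter being referred to here is the JVM reuse setting, mapred.job.reuse.jvm.num.tasks, although the message does not name it, so this is my guess rather than a confirmed value:

  <!-- mapred-site.xml (or a per-job override); my assumption about "this parameter".
       1 = a fresh JVM per task, -1 = reuse one JVM for an unlimited number of tasks. -->
  <property>
    <name>mapred.job.reuse.jvm.num.tasks</name>
    <value>1</value>
  </property>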
Hi,
I agree with Geoff.
Before proceeding further, first decide whether Hive is suitable for your problem and your project environment. To do that, you should understand the basics of Hive; kindly look at the Apache wiki suggested by Geoff.
Once you have decided to proceed with Hive, there are several ...