RE: Mapper OOMs disappear after disabling JVM reuse

2011-04-08 Thread Steven Wong
One possible cause is a memory leak; another is heap fragmentation. The OOM happened when the mapper was trying to allocate the buffer whose size is controlled by io.sort.mb; my setup has io.sort.mb=500 and -Xmx1000m. From: Igor Tatarinov [mailto:i...@decide.com] Sen
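For reference, the two settings mentioned in this message could be written as Hadoop configuration properties roughly as follows. This is only a sketch: io.sort.mb and mapred.child.java.opts are standard Hadoop property names of that era, but placing them in a site file such as mapred-site.xml (rather than setting them per job) is an assumption, not something the message states. With these values the sort buffer alone asks for about half of the 1000 MB maximum heap.

```xml
<!-- Sketch only: file placement (e.g. mapred-site.xml) is an assumption -->
<property>
  <name>io.sort.mb</name>
  <!-- size of the map-side sort buffer, in MB (value from the message) -->
  <value>500</value>
</property>
<property>
  <name>mapred.child.java.opts</name>
  <!-- JVM options for task processes (heap limit from the message) -->
  <value>-Xmx1000m</value>
</property>
```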

Re: How to configure Hive to use CombineFileInputFormat in case of too many small files

2011-04-08 Thread Michael Jiang
I didn't find a configuration property specifically for controlling the merged input size for a mapper, only those for map or reduce output size. For example, "hive.merge.size.smallfiles.avgsize" looks like what I want, but it actually applies to output. In Pig, "pig.maxCombinedSplitSize" does the simila

Re: How to configure Hive to use CombineFileInputFormat in case of too many small files

2011-04-08 Thread Michael Jiang
Thanks Kumar. Are there other settings to fine-tune how small files are merged into the bigger input that a mapper takes? Basically I want to match the size of a merged file to the block size. On Fri, Apr 8, 2011 at 11:43 AM, V.Senthil Kumar wrote: > You can add these lines in hive-site.xml. It cr

Re: How to configure Hive to use CombineFileInputFormat in case of too many small files

2011-04-08 Thread V.Senthil Kumar
You can add these lines in hive-site.xml. It creates only one file at the end. Hope it helps.
  name: hive.merge.mapredfiles
  value: true
  description: Merge small files at the end of a map-reduce job
  name: hive.input.format
  value: org.apache.hadoop.hive.ql.io.CombineHiveInputFormat
  description: The default input format, if it is not s
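The archive stripped the XML markup from this message. The two properties it names would look roughly like this as hive-site.xml entries. This is a reconstruction under the usual Hadoop/Hive property layout, not the poster's exact text; the second description is cut off in the archive, so it is left out here rather than guessed.

```xml
<!-- Reconstructed sketch of the hive-site.xml entries named in the message -->
<property>
  <name>hive.merge.mapredfiles</name>
  <value>true</value>
  <description>Merge small files at the end of a map-reduce job</description>
</property>
<property>
  <name>hive.input.format</name>
  <value>org.apache.hadoop.hive.ql.io.CombineHiveInputFormat</value>
  <!-- description truncated in the archive; omitted on purpose -->
</property>
```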

How to configure Hive to use CombineFileInputFormat in case of too many small files

2011-04-08 Thread Michael Jiang
I could not find instructions for this. The goal is to avoid the performance issues that arise when a separate mapper has to be created for every small file. Thanks!

Re: Mapper OOMs disappear after disabling JVM reuse

2011-04-08 Thread Igor Tatarinov
I had a similar problem until I set this parameter to 1 (although 3 seems to work fine too). There is an explanation somewhere on the web. Basically, if you run 20 tasks and the garbage collector cannot keep up with the accumulated garbage, the Java process grows too big so when it finally decides to
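The snippet does not name the parameter. Given the thread subject, it is presumably Hadoop's JVM reuse setting, mapred.job.reuse.jvm.num.tasks; that identification is an assumption. In Hadoop, a value of 1 means each task JVM runs a single task (no reuse), and -1 means a JVM may be reused for an unlimited number of tasks of the same job. A sketch of the setting:

```xml
<!-- Assumed to be the parameter the message refers to -->
<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <!-- 1 = one task per JVM (reuse disabled); the message reports 3 also worked -->
  <value>1</value>
</property>
```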

Re: How To Use Hive

2011-04-08 Thread sangeetha s
Hi, I agree with Geoff. Before proceeding further, first decide whether Hive is suitable for your problem or your project environment. To do that, you should understand the basics of Hive. Kindly look at the Apache wiki suggested by Geoff. Once you have decided to proceed with Hive, there are sever