Hi Hive users,

I have already asked this on the Parquet list; hoping to get a quick reply from the Hive and Hadoop users!

Has anyone come across an OOME (Java heap space) while using the Parquet + Snappy file format? I need help resolving this issue. Below is the detailed information regarding the error and the configuration. Any help is appreciated!
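For context, the failing statement has roughly the shape below. This is only a sketch: the table and column names (events_staging, events_parquet, col_a/col_b/col_c, event_date) are placeholders because the real names are anonymized, and the dynamic-partition settings are my assumption based on the FileSinkOperator.getDynOutPaths frames in the trace further down.

SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- Destination is a Parquet table compressed with Snappy.
INSERT OVERWRITE TABLE events_parquet PARTITION (event_date)
SELECT col_a, col_b, col_c, event_date
FROM events_staging;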
On Mon, Oct 27, 2014 at 11:40 PM, Suraj Nayak <snay...@gmail.com> wrote:

> Hi Parquet developers,
>
> I am using Parquet for analytics on AWS EC2. An INSERT OVERWRITE from one
> small Hive table to another works fine with Parquet + Snappy, but when the
> table has 1.2+ billion records the INSERT OVERWRITE fails with a Java heap
> space error.
>
> A few other stats about the data:
>
> Parquet version: 1.2.5 (CDH 5.1.3)
> Total files in the HDFS directory: 555
> Attached PrintFooter Parquet.txt with the PrintFooter output. (Note: the
> column names were changed to random characters.)
> HDFS block size: 256 MB
>
> *YARN configurations:*
> yarn.nodemanager.resource.memory-mb = 24GB
> yarn.scheduler.minimum-allocation-mb = 2GB
> mapreduce.map.memory.mb = 6GB
> mapreduce.reduce.memory.mb = 12GB
> mapreduce.map.java.opts = 4.5GB
> mapreduce.reduce.java.opts = 9GB
> yarn.nodemanager.vmem-pmem-ratio = 2.1
>
> Can anyone help me debug or fix this issue? Is it fixed in any newer
> Parquet version (>v1.2.5)?
>
> Let me know if anyone needs more information.
>
> Pasting the task attempt log below:
>
> 2014-10-27 13:10:41,556 FATAL [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: Java heap space
>     at parquet.column.values.dictionary.IntList.initSlab(IntList.java:87)
>     at parquet.column.values.dictionary.IntList.<init>(IntList.java:83)
>     at parquet.column.values.dictionary.DictionaryValuesWriter.<init>(DictionaryValuesWriter.java:85)
>     at parquet.column.values.dictionary.DictionaryValuesWriter$PlainDoubleDictionaryValuesWriter.<init>(DictionaryValuesWriter.java:471)
>     at parquet.column.impl.ColumnWriterImpl.<init>(ColumnWriterImpl.java:91)
>     at parquet.column.impl.ColumnWriteStoreImpl.newMemColumn(ColumnWriteStoreImpl.java:60)
>     at parquet.column.impl.ColumnWriteStoreImpl.getColumnWriter(ColumnWriteStoreImpl.java:52)
>     at parquet.io.MessageColumnIO$MessageColumnIORecordConsumer.<init>(MessageColumnIO.java:98)
>     at parquet.io.MessageColumnIO.getRecordWriter(MessageColumnIO.java:283)
>     at parquet.hadoop.InternalParquetRecordWriter.initStore(InternalParquetRecordWriter.java:102)
>     at parquet.hadoop.InternalParquetRecordWriter.<init>(InternalParquetRecordWriter.java:88)
>     at parquet.hadoop.ParquetRecordWriter.<init>(ParquetRecordWriter.java:61)
>     at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:267)
>     at parquet.hadoop.ParquetOutputFormat.getRecordWriter(ParquetOutputFormat.java:245)
>     at org.apache.hadoop.hive.ql.io.parquet.write.ParquetRecordWriterWrapper.<init>(ParquetRecordWriterWrapper.java:54)
>     at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getParquerRecordWriterWrapper(MapredParquetOutputFormat.java:122)
>     at org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat.getHiveRecordWriter(MapredParquetOutputFormat.java:113)
>     at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getRecordWriter(HiveFileFormatUtils.java:296)
>     at org.apache.hadoop.hive.ql.io.HiveFileFormatUtils.getHiveRecordWriter(HiveFileFormatUtils.java:283)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createBucketFiles(FileSinkOperator.java:516)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.createNewPaths(FileSinkOperator.java:692)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.getDynOutPaths(FileSinkOperator.java:764)
>     at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:609)
>     at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:501)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:847)
>     at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
>     at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:501)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:847)
>     at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:91)
>     at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:501)
>     at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:847)
>     at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:519)
>
> --
> Thanks
> Suraj Nayak M
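Re-reading the trace since sending that mail: the OutOfMemoryError is thrown while allocating the dictionary writer's initial slab (IntList.initSlab), and the allocation happens under FileSinkOperator.getDynOutPaths/createNewPaths, i.e. while opening a writer for yet another dynamic partition. As far as I understand, Parquet buffers a whole row group in memory for every open writer, so a dynamic-partition insert that keeps many partition files open at once needs roughly (number of open writers x parquet.block.size) of heap, which could exhaust even the 4.5GB map heap above. As an experiment I plan to rerun with the settings below. Treat this as a sketch, not a confirmed fix; in particular, hive.optimize.sort.dynamic.partition may not exist in the Hive version that ships with CDH 5.1.3:

SET parquet.block.size=67108864;               -- smaller row groups, to shrink the per-writer buffer
SET parquet.enable.dictionary=false;           -- the OOM is inside DictionaryValuesWriter, so skip dictionary encoding
SET hive.optimize.sort.dynamic.partition=true; -- if supported: rows are sorted by partition key so only one writer is open at a time

If anyone has tried these knobs (or simply bumped mapreduce.map.java.opts further) against Parquet 1.2.5, please let me know what worked.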
--
Thanks
Suraj Nayak M