When I created my table, I had to reduce orc.compress.size quite a bit to make a table with that many columns work. This was on Hive 0.12 (I thought it was supposed to be fixed in Hive 0.13, but 3k+ columns is huge). The default orc.compress.size is quite a bit larger (I think in the 268k range), and as far as I understand the ORC writer allocates a buffer of roughly that size per output stream, with several streams per column, so 3000+ columns add up fast. If the level in the snippet below doesn't work, keep moving it smaller. Good luck.
STORED AS orc tblproperties ("orc.compress.size"="8192");

On Thu, May 15, 2014 at 8:11 PM, Premal Shah <premal.j.s...@gmail.com> wrote:

> I have a table in hive stored as text file with 3283 columns. All columns
> are of string data type.
>
> I'm trying to convert that table into an orc file table using this command
> *create table orc_table stored as orc as select * from text_table;*
>
> This is the setting under mapred-site.xml
>
> <property>
>   <name>mapred.child.java.opts</name>
>   <value>-Xmx4G -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -verbose:gc -Xloggc:/mnt/hadoop/@taskid@.gc</value>
>   <final>true</final>
> </property>
>
> The tasks die with this error
>
> 2014-05-16 00:53:42,424 FATAL org.apache.hadoop.mapred.Child: Error running
> child : java.lang.OutOfMemoryError: Java heap space
>   at java.nio.HeapByteBuffer.<init>(HeapByteBuffer.java:39)
>   at java.nio.ByteBuffer.allocate(ByteBuffer.java:312)
>   at org.apache.hadoop.hive.ql.io.orc.OutStream.getNewOutputBuffer(OutStream.java:117)
>   at org.apache.hadoop.hive.ql.io.orc.OutStream.spill(OutStream.java:168)
>   at org.apache.hadoop.hive.ql.io.orc.OutStream.flush(OutStream.java:239)
>   at org.apache.hadoop.hive.ql.io.orc.RunLengthByteWriter.flush(RunLengthByteWriter.java:58)
>   at org.apache.hadoop.hive.ql.io.orc.BitFieldWriter.flush(BitFieldWriter.java:44)
>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:553)
>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StringTreeWriter.writeStripe(WriterImpl.java:1012)
>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl$ListTreeWriter.writeStripe(WriterImpl.java:1455)
>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1400)
>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:1780)
>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl.checkMemory(WriterImpl.java:221)
>   at org.apache.hadoop.hive.ql.io.orc.MemoryManager.notifyWriters(MemoryManager.java:168)
>   at org.apache.hadoop.hive.ql.io.orc.MemoryManager.addedRow(MemoryManager.java:157)
>   at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:2028)
>   at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:86)
>   at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:622)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>   at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:87)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>   at org.apache.hadoop.hive.ql.exec.TableScanOperator.processOp(TableScanOperator.java:92)
>   at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:793)
>   at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:540)
>   at org.apache.hadoop.hive.ql.exec.mr.ExecMapper.map(ExecMapper.java:177)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:366)
>   at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:396)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1190)
>
> This is the GC output for a task that ran out of memory
>
> 0.690: [GC 17024K->768K(83008K), 0.0019170 secs]
> 0.842: [GC 8488K(83008K), 0.0066800 secs]
> 1.031: [GC 17792K->1481K(83008K), 0.0015400 secs]
> 1.352: [GC 17142K(83008K), 0.0041840 secs]
> 1.371: [GC 18505K->2249K(83008K), 0.0097240 secs]
> 34.779: [GC 28384K(4177280K), 0.0014050 secs]
>
> Anything I can tweak to make it work?
>
> --
> Regards,
> Premal Shah.
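
For reference, applied to the CTAS from your mail it would look roughly like this (just a sketch reusing your table names; 8192 is only the value that happened to work for me, so tune it for your data):

create table orc_table
stored as orc
tblproperties ("orc.compress.size"="8192")
as select * from text_table;

I believe Hive 0.13 also adds a hive.exec.orc.default.buffer.size setting if you'd rather change the default session-wide instead of per table, but I haven't tried that myself.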