Re: Hive 0.12 ORC Heap Issues on Write

2014-04-28 Thread Prasanth Jayachandran
Glad that presentation was useful to you :) hive.exec.orc.memory.pool is the fraction of heap memory that ORC writers are allowed to use. If your heap size is 1GB and hive.exec.orc.memory.pool is set to 0.5, then ORC writers can use a maximum of 500MB of memory. If there are more ORC writers and if
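
The arithmetic in the snippet above can be sketched as follows. This is only an illustration of the stated example (1GB heap, pool fraction 0.5), not ORC's actual implementation; the function name `orc_writer_pool_bytes` is hypothetical, and 1GB is treated as 10^9 bytes to match the 500MB figure quoted in the message.

```python
def orc_writer_pool_bytes(heap_bytes: int, pool_fraction: float) -> int:
    """Return the share of the heap available to all ORC writers combined.

    heap_bytes:    total JVM heap size in bytes (assumed known to the caller)
    pool_fraction: the value of hive.exec.orc.memory.pool, between 0 and 1
    """
    if not 0.0 <= pool_fraction <= 1.0:
        raise ValueError("pool_fraction must be between 0 and 1")
    return int(heap_bytes * pool_fraction)


# Example from the message: 1GB heap with the pool set to 0.5.
heap = 1_000_000_000  # assumption: 1GB taken as 10^9 bytes
pool = orc_writer_pool_bytes(heap, 0.5)
print(pool // 1_000_000, "MB")
```

With the same heap, lowering the fraction to 0.25 (one of the values tried later in the thread) would leave 250MB for the writers under this simple model.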

Re: Hive 0.12 ORC Heap Issues on Write

2014-04-28 Thread John Omernik
Prasanth - This is easily the best and most complete explanation I've received to any online posted question ever. I know that sounds like an overstatement, but this answer is awesome. :) I really appreciate your insight on this. My only follow-up is asking how the memory.pool percentage pla

Re: Hive 0.12 ORC Heap Issues on Write

2014-04-27 Thread John Omernik
So one more follow-up: The 16-.25-Success turns to a fail if I throw more data (and hence more partitions) at the problem. Could there be some sort of issue that rears its head based on the number of output dynamic partitions? Thanks all! On Sun, Apr 27, 2014 at 3:33 PM, John Omernik wrote:

Re: Hive 0.12 ORC Heap Issues on Write

2014-04-27 Thread John Omernik
Here is some testing. I focused on two variables (not really understanding what they do): orc.compress.size (256k by default) and hive.exec.orc.memory.pool (0.50 by default). The job I am running is an admittedly complex job running through a Python Transform script. However, as noted above, RCFile wri

Hive 0.12 ORC Heap Issues on Write

2014-04-27 Thread John Omernik
Hello all, I am working with Hive 0.12 right now on YARN. When I am writing a table that is admittedly quite "wide" (there are lots of columns, near 60, including one binary field that can get quite large), some tasks fail on ORC file write with Java heap space errors. I have confirmed th