I am realizing one of my challenges is that I have quite a few cores and map tasks per node, but (I didn't set it up) I am only running 4 GB per physical core (12 cores) with 18 map slots. My guess right now is that, with 18 map slots, the ~1.8 GB total I am assigning to the sort buffers is undersized; yet I am constrained on memory, so I can't just raise it. Working on getting things upgraded. Thanks to all; I appreciate the thoughts.
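[A quick sanity check of the ~1.8 GB figure above. The numbers are taken from the message (18 map slots) plus the stock Hadoop 1.x default of io.sort.mb = 100 MB per map task; the default is an assumption about this cluster, not something stated in the thread.]

```python
# Back-of-the-envelope sort-buffer budget for the setup described above.
# Assumes the stock Hadoop 1.x default io.sort.mb = 100 (MB per map task).
map_slots = 18
io_sort_mb = 100                 # per-task in-memory sort buffer, in MB

total_mb = map_slots * io_sort_mb
print(total_mb)                  # 1800 MB, i.e. the ~1.8 GB mentioned above
```

So the 1.8 GB is simply slots × io.sort.mb: raising io.sort.mb helps each task but multiplies across all 18 slots at once, which is why the memory ceiling bites.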
On Wed, Jan 30, 2013 at 10:40 AM, Dean Wampler <dean.wamp...@thinkbiganalytics.com> wrote:

> We didn't ask yet, but to be sure: are all the slave nodes configured the same, both in terms of hardware and of any other apps running on them?
>
> On Wed, Jan 30, 2013 at 10:14 AM, Richard Nadeau <strout...@gmail.com> wrote:
>
>> What do you have set in core-site.xml for io.sort.mb, io.sort.factor, and io.file.buffer.size? You should be able to adjust these and get past the heap issue. Be careful about how much RAM you have, though, and don't set them too high.
>>
>> Rick
>>
>> On Jan 30, 2013 8:55 AM, "John Omernik" <j...@omernik.com> wrote:
>>
>>> So it's filling up at the emitting stage, so I need to look at the task logs and/or my script that's printing to stdout as the likely culprits, I'm guessing.
>>>
>>> On Wed, Jan 30, 2013 at 9:11 AM, Philip Tromans <philip.j.trom...@gmail.com> wrote:
>>>
>>>> That particular OutOfMemoryError is happening on one of your hadoop nodes. It's the heap within the process forked by the hadoop tasktracker, I think.
>>>>
>>>> Phil.
>>>>
>>>> On 30 January 2013 14:28, John Omernik <j...@omernik.com> wrote:
>>>>
>>>>> So, just a follow-up. I am less looking for specific troubleshooting of my problem and more for a general understanding of heap space usage with Hive. When I get an error like this, is it heap space on a node, or heap space on my Hive server? Is it the heap of the tasktracker? The heap of the job kicked off on the node? Which heap is being affected? If it's not clear in my output, where can I better understand this? I am sorely out of my league when it comes to the JVM interactions of Hive and Hadoop, i.e. where Hive runs vs. where the tasktrackers run, etc.
>>>>>
>>>>> Thanks in advance!
>>>>>
>>>>> On Tue, Jan 29, 2013 at 7:43 AM, John Omernik <j...@omernik.com> wrote:
>>>>>
>>>>>> I am running a transform script that parses through a bunch of binary data. In 99% of cases it runs fine, but on certain files I get a failure (as seen below). The funny thing is, I can run a job with "only" the problem source file and it works fine; but when it's part of a group of files, I get these warnings. I guess what I am asking is this: where is the heap error? Is it occurring on the nodes themselves, or, since the script is emitting records (and potentially large ones at that) and my Hive server running the job may be light on memory, could the issue actually be heap on the Hive server itself? My setup is 1 Hive node (woefully underpowered, under-memoried, and under-disk-I/O'ed) and 4 beefy hadoop nodes. I guess my question is: is the heap issue on the sender or the receiver? :)
>>>>>>
>>>>>> 2013-01-29 08:20:24,107 INFO org.apache.hadoop.hive.ql.io.CodecPool: Got brand-new compressor
>>>>>> 2013-01-29 08:20:24,107 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 12 forwarding 1 rows
>>>>>> 2013-01-29 08:20:24,410 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: 3 forwarding 10 rows
>>>>>> 2013-01-29 08:20:24,410 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 4 forwarding 10 rows
>>>>>> 2013-01-29 08:20:24,411 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarding 10 rows
>>>>>> 2013-01-29 08:20:24,411 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 6 forwarding 10 rows
>>>>>> 2013-01-29 08:20:24,411 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 8 forwarding 10 rows
>>>>>> 2013-01-29 08:20:24,411 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 9 forwarding 10 rows
>>>>>> 2013-01-29 08:20:24,411 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 10 forwarding 10 rows
>>>>>> 2013-01-29 08:20:24,412 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 12 forwarding 10 rows
>>>>>> 2013-01-29 08:20:27,170 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: 3 forwarding 100 rows
>>>>>> 2013-01-29 08:20:27,170 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 4 forwarding 100 rows
>>>>>> 2013-01-29 08:20:27,170 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarding 100 rows
>>>>>> 2013-01-29 08:20:27,171 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 6 forwarding 100 rows
>>>>>> 2013-01-29 08:20:27,171 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 8 forwarding 100 rows
>>>>>> 2013-01-29 08:20:27,171 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 9 forwarding 100 rows
>>>>>> 2013-01-29 08:20:27,171 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 10 forwarding 100 rows
>>>>>> 2013-01-29 08:20:27,171 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 12 forwarding 100 rows
>>>>>> 2013-01-29 08:21:16,247 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: 3 forwarding 1000 rows
>>>>>> 2013-01-29 08:21:16,247 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 4 forwarding 1000 rows
>>>>>> 2013-01-29 08:21:16,247 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarding 1000 rows
>>>>>> 2013-01-29 08:21:16,247 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 6 forwarding 1000 rows
>>>>>> 2013-01-29 08:21:16,248 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 8 forwarding 1000 rows
>>>>>> 2013-01-29 08:21:16,248 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 9 forwarding 1000 rows
>>>>>> 2013-01-29 08:21:16,248 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 10 forwarding 1000 rows
>>>>>> 2013-01-29 08:21:16,248 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 12 forwarding 1000 rows
>>>>>> 2013-01-29 08:25:47,532 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: 3 forwarding 10000 rows
>>>>>> 2013-01-29 08:25:47,532 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 4 forwarding 10000 rows
>>>>>> 2013-01-29 08:25:47,532 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 5 forwarding 10000 rows
>>>>>> 2013-01-29 08:25:47,532 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 6 forwarding 10000 rows
>>>>>> 2013-01-29 08:25:47,532 INFO org.apache.hadoop.hive.ql.exec.FilterOperator: 8 forwarding 10000 rows
>>>>>> 2013-01-29 08:25:47,532 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 9 forwarding 10000 rows
>>>>>> 2013-01-29 08:25:47,532 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 10 forwarding 10000 rows
>>>>>> 2013-01-29 08:25:47,532 INFO org.apache.hadoop.hive.ql.exec.SelectOperator: 12 forwarding 10000 rows
>>>>>> 2013-01-29 08:27:34,276 WARN org.apache.hadoop.hive.ql.exec.ScriptOperator: Exception in StreamThread.run(): Java heap space
>>>>>> Cause: null
>>>>>> 2013-01-29 08:27:34,277 WARN org.apache.hadoop.hive.ql.exec.ScriptOperator: java.lang.OutOfMemoryError: Java heap space
>>>>>>         at java.util.Arrays.copyOfRange(Arrays.java:3209)
>>>>>>         at java.lang.String.<init>(String.java:215)
>>>>>>         at java.nio.HeapCharBuffer.toString(HeapCharBuffer.java:542)
>>>>>>         at java.nio.CharBuffer.toString(CharBuffer.java:1157)
>>>>>>         at org.apache.hadoop.io.Text.decode(Text.java:350)
>>>>>>         at org.apache.hadoop.io.Text.decode(Text.java:327)
>>>>>>         at org.apache.hadoop.io.Text.toString(Text.java:254)
>>>>>>         at java.lang.String.valueOf(String.java:2826)
>>>>>>         at java.lang.StringBuilder.append(StringBuilder.java:115)
>>>>>>         at org.apache.hadoop.hive.ql.exec.FunctionRegistry.invoke(FunctionRegistry.java:873)
>>>>>>         at org.apache.hadoop.hive.ql.udf.generic.GenericUDFBridge.evaluate(GenericUDFBridge.java:181)
>>>>>>         at org.apache.hadoop.hive.ql.exec.ExprNodeGenericFuncEvaluator.evaluate(ExprNodeGenericFuncEvaluator.java:163)
>>>>>>         at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:76)
>>>>>>         at org.apache.hadoop.hive.ql.exec.Operator.process(Operator.java:471)
>>>>>>         at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:762)
>>>>>>         at org.apache.hadoop.hive.ql.exec.ScriptOperator$OutputStreamProcessor.processLine(ScriptOperator.java:477)
>>>>>>         at org.apache.hadoop.hive.ql.exec.ScriptOperator$StreamThread.run(ScriptOperator.java:563)
>>>>>>
>>>>>> 2013-01-29 08:27:34,306 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: ErrorStreamProcessor calling reporter.progress()
>>>>>> 2013-01-29 08:27:34,307 INFO org.apache.hadoop.hive.ql.exec.ScriptOperator: StreamThread ErrorProcessor done
>>>>>> 2013-01-29 08:27:34,307 ERROR org.apache.hadoop.hive.ql.exec.ScriptOperator: Script failed with code 1
>
> --
> *Dean Wampler, Ph.D.*
> thinkbiganalytics.com
> +1-312-339-1330
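[For readers landing on this thread: the properties Rick mentions are plain `*-site.xml` entries. A sketch with the stock Hadoop 1.x default values follows; the values are illustrative, not tuning recommendations, and in stock Hadoop 1.x `io.sort.mb`/`io.sort.factor` are typically placed in mapred-site.xml rather than core-site.xml.]

```xml
<!-- Illustrative only: stock Hadoop 1.x defaults for the properties
     discussed above. io.sort.mb is allocated per map task, so multiply
     by the number of map slots when budgeting a node's RAM. -->
<property>
  <name>io.sort.mb</name>
  <value>100</value>       <!-- MB of in-memory buffer for map-side sorting -->
</property>
<property>
  <name>io.sort.factor</name>
  <value>10</value>        <!-- number of streams merged at once when sorting -->
</property>
<property>
  <name>io.file.buffer.size</name>
  <value>4096</value>      <!-- bytes; read/write buffer size for I/O -->
</property>
```

Note that these buffers live inside the task child JVM's heap (sized by `mapred.child.java.opts`), which per the stack trace above is the heap that is overflowing; the Hive server's heap is not involved.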