Re: The dreaded Heap Space Issue on a Transform

2013-01-30 Thread John Omernik
I am realizing one of my challenges is that I have quite a few cores and map tasks per node, but (I didn't set it up) I am only running 4 GB per physical core (12) with 18 map slots. I am guessing right now that at any given time, with 18 map slots, the 1.8 total GB of RAM I am assigning to the so

Re: The dreaded Heap Space Issue on a Transform

2013-01-30 Thread Dean Wampler
We didn't ask yet, but to be sure, are all the slave nodes configured the same, both in terms of hardware and any other apps running on them? On Wed, Jan 30, 2013 at 10:14 AM, Richard Nadeau wrote: > What do you have set in core-site.XML for io.sort.mb, io.sort.factor, and > io.file.

Re: The dreaded Heap Space Issue on a Transform

2013-01-30 Thread Richard Nadeau
What do you have set in core-site.XML for io.sort.mb, io.sort.factor, and io.file.buffer.size? You should be able to adjust these and get past the heap issue. Be careful about how much RAM you have though, and don't set them too high. Rick On Jan 30, 2013 8:55 AM, "John Omernik" wrote: > So it's f
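
To make the "don't set them too high" caution concrete, here is a rough sketch of the arithmetic involved. It assumes the sort buffer (io.sort.mb) is carved out of each task's JVM heap and that every slot on a node can be busy at once. The function name and all numbers except the 18 slots mentioned in this thread are hypothetical, chosen only for illustration.

```python
def check_memory_budget(node_ram_mb, task_slots, task_heap_mb, io_sort_mb):
    """Return warnings for a hypothetical per-node memory budget (all values in MB)."""
    warnings = []
    # The sort buffer lives inside the task's heap, so it has to fit there
    # with room left over for the task's own objects.
    if io_sort_mb >= task_heap_mb:
        warnings.append("io.sort.mb does not fit inside the task heap")
    # Worst case: every slot runs a task at the same time.
    total_heap_mb = task_slots * task_heap_mb
    if total_heap_mb > node_ram_mb:
        warnings.append(
            f"{task_slots} slots x {task_heap_mb} MB = {total_heap_mb} MB exceeds node RAM"
        )
    return warnings


# Illustrative values only: 18 slots is from the thread, the rest is assumed.
print(check_memory_budget(node_ram_mb=16384, task_slots=18, task_heap_mb=1024, io_sort_mb=256))
```

An empty list means the assumed budget fits. This ignores memory used by the tasktracker, datanode, and any transform script processes, so treat it as a lower bound on what a node needs.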

Re: The dreaded Heap Space Issue on a Transform

2013-01-30 Thread John Omernik
So it's filling up on the emitting stage, so I need to look at the task logs and/or my script that's printing to stdout as the likely culprits, I am guessing. On Wed, Jan 30, 2013 at 9:11 AM, Philip Tromans wrote: > That particular OutOfMemoryError is happening on one of your hadoop nodes. > It'

Re: The dreaded Heap Space Issue on a Transform

2013-01-30 Thread Philip Tromans
That particular OutOfMemoryError is happening on one of your hadoop nodes. It's the heap within the process forked by the hadoop tasktracker, I think. Phil. On 30 January 2013 14:28, John Omernik wrote: > So just a follow-up. I am less looking for specific troubleshooting on how > to fix my pr

Re: The dreaded Heap Space Issue on a Transform

2013-01-30 Thread John Omernik
So just a follow-up. I am less looking for specific troubleshooting on how to fix my problem, and more looking for a general understanding of heap space usage with Hive. When I get an error like this, is it heap space on a node, or heap space on my hive server? Is it the heap space of the tasktra

The dreaded Heap Space Issue on a Transform

2013-01-29 Thread John Omernik
I am running a transform script that parses through a bunch of binary data. In 99% of the cases it runs, it runs fine, but on certain files I get a failure (as seen below). Funny thing is, I can run a job with "only" the problem source file, and it will work fine, but when it runs as part of a group of files, I g