I removed the part of the SerDe that handled the arbitrary key/value pairs and I was able to process my entire data set. Sadly the part I removed has all the interesting data.
I'll play more with the heap settings and see if that lets me process the key/value pairs. Is the below the correct way to set the child heap value?

Thanks,
Pat

From: Christopher, Pat
Sent: Thursday, January 27, 2011 10:27 AM
To: user@hive.apache.org
Subject: RE: Hive Error on medium sized dataset

It will be tricky to clean up the data format as I'm operating on somewhat arbitrary key-value pairs in part of the record. I will try and create something similar, though it might take a bit. Thanks.

I've tried resetting the heap size, I think. I added the following block to my mapred-site.xml:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xm512M</value>
</property>

Is that how I'm supposed to do that?

Thanks,
Pat

From: hadoop n00b [mailto:new2h...@gmail.com]
Sent: Wednesday, January 26, 2011 9:09 PM
To: user@hive.apache.org
Subject: Re: Hive Error on medium sized dataset

We typically get this error while running complex queries on our 4-node setup when the child JVM runs out of heap size. Would be interested in what the experts have to say about this error.

On Thu, Jan 27, 2011 at 7:27 AM, Ajo Fod <ajo....@gmail.com> wrote:

Any chance you can convert the data to a tab-separated text file and try the same query? It may not be the SerDe, but it may be good to isolate that away as a potential source of the problem.

-Ajo

On Wed, Jan 26, 2011 at 5:47 PM, Christopher, Pat <patrick.christop...@hp.com> wrote:

Hi,

I'm attempting to load a small to medium sized log file, ~250MB, and produce some basic reports from it (counts, etc.). Nothing fancy. However, whenever I try to read the entire dataset, ~330k rows, I get the following error:

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

This error is produced by basic queries like:

SELECT count(1) FROM medium_table;

However, if I do the following:

SELECT count(1) FROM (SELECT col1 FROM medium_table LIMIT 70000) tbl;

it works okay until I get to around 70,800 or so, then I get the first error message again.

I'm running my HDFS system in single-node, pseudo-distributed mode with 1.5GB of memory and 20GB of disk as a virtual machine, and I am using a custom SerDe. I don't think it's the SerDe, but I'm open to suggestions for how I can check whether it is causing the problem. I can't see anything in the data that would be causing it, though.

Anyone have any ideas of what might be causing this or something I can check?

Thanks,
Pat
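
One note on the heap setting quoted above: the standard JVM flag for the maximum heap size is -Xmx, so a value of -Xm512M would most likely keep the child JVM from starting at all. A minimal sketch of the usual mapred-site.xml entry, plus a per-session equivalent (assuming the Hive CLI in use passes job properties through SET), would look like:

<property>
  <name>mapred.child.java.opts</name>
  <!-- -Xmx sets the maximum heap for each child task JVM -->
  <value>-Xmx512m</value>
</property>

hive> SET mapred.child.java.opts=-Xmx512m;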
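
On the suggestion of testing against a tab-separated copy of the data: one possible way to do that without leaving Hive, assuming CTAS is available in the Hive version in use and using a hypothetical table name medium_table_txt, is to copy the table into a plain delimited text table and rerun the failing query against the copy:

-- copy the data into a tab-delimited text table (hypothetical name)
CREATE TABLE medium_table_txt
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE
AS SELECT * FROM medium_table;

-- rerun the failing count against the plain-text copy
SELECT count(1) FROM medium_table_txt;

The copy itself still reads the source through the custom SerDe once, so if even the CREATE TABLE ... AS SELECT fails, the SerDe (or the heap it needs) is implicated; if it succeeds, later queries against medium_table_txt bypass the SerDe entirely.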