When I ran the query below with mapred.job.reuse.jvm.num.tasks=20, some of
the map tasks failed with "Error: Java heap space", causing the job to fail.
After I changed the setting to mapred.job.reuse.jvm.num.tasks=1, the job
succeeded. The query:

FROM (
  FROM intable1
  SELECT acct_id, esn) b
JOIN (
  FROM intable2
  SELECT acct_id, xid, devtype_id, esn,
         other_properties['cdndldist'] AS cdndldist
  WHERE (dateint>=20110201 AND dateint<=20110228)
    AND (other_properties['cdndldist'] IS NOT NULL)
    AND (client_msg_type='endplay')
    AND (devtype_id=272 OR devtype_id=129 OR devtype_id=12 OR devtype_id=13
         OR devtype_id=14)) c
ON (b.acct_id=c.acct_id) AND (b.esn=c.esn)
INSERT OVERWRITE TABLE outtable PARTITION (dateint='20110201-20110228')
SELECT b.acct_id, c.xid, c.devtype_id, c.esn, c.cdndldist;

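
For anyone trying to reproduce this, the two runs differ only in the JVM-reuse setting. A minimal sketch, assuming the property is set per session from the Hive CLI (it could equally be set in the job or cluster configuration):

```sql
-- Failing run: each task JVM is reused for up to 20 tasks of the same job.
SET mapred.job.reuse.jvm.num.tasks=20;

-- Succeeding run: one task per JVM, i.e. no reuse (the Hadoop default).
SET mapred.job.reuse.jvm.num.tasks=1;
```

Only one of the two SET statements is issued before the query in a given run.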
I'm on a pre-release version of Hive 0.7 with Hadoop 0.20. Does anyone know
of a Hive or Hadoop issue that could be related to this?
Thanks.
Steven