
Re: Hive map join - process a little larger tables with moderate number of rows

2011-04-01 Thread Viral Bajaria

Bejoy, we still use an older version of Hive (0.5). In that version the join order mattered: you needed to keep the largest table rightmost in your JOIN sequence so that it is streamed, which avoids the OOM exceptions caused by mappers which load the entire table in
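A minimal HiveQL sketch of the join ordering described in this message, assuming three hypothetical tables (small_dim, medium_dim, big_fact) that do not appear in the thread. The only point it illustrates is that the largest table is placed last in the JOIN sequence so that it is the one streamed, while the earlier tables are buffered:

```sql
-- Hypothetical tables for illustration only:
--   small_dim, medium_dim : small enough to be buffered in memory
--   big_fact              : the largest table, placed last so it is streamed
SELECT f.id, s.name, m.label
FROM small_dim s
JOIN medium_dim m ON (s.key = m.key)
JOIN big_fact f ON (m.key = f.key);
```

Whether reordering alone is enough depends on the Hive version and on how large the buffered tables actually are.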

Re: Hive map join - process a little larger tables with moderate number of rows

2011-03-31 Thread yongqiang he

Can you try "hive.mapred.local.mem" (in MB)? It controls the heap size of the join's local child process. You can also try increasing HADOOP_HEAPSIZE for your Hive client. But all of this depends on how big your small file is. Thanks, Yongqiang On Thu, Mar 31, 2011 at 10:15 PM, w
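As a rough sketch of the first suggestion, the property can be set at the top of the Hive script before the failing join. The value 1024 is an assumption chosen for illustration; the message gives the unit (MB) but no number:

```sql
-- Heap size, in MB, for the local child process that builds the map-join table.
-- 1024 is an illustrative value, not one taken from the thread.
SET hive.mapred.local.mem=1024;

-- HADOOP_HEAPSIZE is an environment variable, not a Hive property, so it is
-- set in the shell that launches the Hive client rather than in the script.
```

A suitable value depends on the size of the smaller table, as the message notes.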

Re: Hive map join - process a little larger tables with moderate number of rows

2011-03-31 Thread bejoy_ks

Thanks, Yongqiang, for your reply. I'm running a Hive script that contains nearly 10 joins. Of those, all the map joins involving smaller tables (9 of them involve one small table) run fine. Just 1 join is on two larger tables, and this map join fails; however, since the back up task(c