Hi Experts,

I'm currently working with Hive 0.7, mostly with JOINs. In all permissible cases I'm using map joins by setting hive.auto.convert.join=true. Local map joins have given a considerable performance improvement in my Hive queries. So far I have used them only with the default Hive configuration parameters, and now I'd like to dig deeper. I want to try the local map join on somewhat bigger tables with more rows. Given below is the failure log of one of my local map tasks, which in turn triggers execution of its backup common join task:

2011-03-31 09:56:54 Starting to launch local task to process map join; maximum memory = 932118528
2011-03-31 09:56:57 Processing rows: 200000 Hashtable size: 199999 Memory usage: 115481024 rate: 0.124
2011-03-31 09:57:00 Processing rows: 300000 Hashtable size: 299999 Memory usage: 169344064 rate: 0.182
2011-03-31 09:57:03 Processing rows: 400000 Hashtable size: 399999 Memory usage: 232132792 rate: 0.249
2011-03-31 09:57:06 Processing rows: 500000 Hashtable size: 499999 Memory usage: 282338544 rate: 0.303
2011-03-31 09:57:10 Processing rows: 600000 Hashtable size: 599999 Memory usage: 336738640 rate: 0.361
2011-03-31 09:57:14 Processing rows: 700000 Hashtable size: 699999 Memory usage: 391117888 rate: 0.42
2011-03-31 09:57:22 Processing rows: 800000 Hashtable size: 799999 Memory usage: 453906496 rate: 0.487
2011-03-31 09:57:27 Processing rows: 900000 Hashtable size: 899999 Memory usage: 508306552 rate: 0.545
2011-03-31 09:57:34 Processing rows: 1000000 Hashtable size: 999999 Memory usage: 562706496 rate: 0.604
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapredLocalTask
ATTEMPT: Execute BackupTask: org.apache.hadoop.hive.ql.exec.MapRedTask
Launching Job 4 out of 6

I'd like to get this local map task to run to completion. To that end I tried setting the following Hive parameters:

hive -f HiveJob.txt -hiveconf hive.mapjoin.maxsize=1000000 -hiveconf hive.mapjoin.smalltable.filesize=40000000 -hiveconf hive.auto.convert.join=true

But setting these two config parameters doesn't let my local map task proceed beyond this stage. I didn't try overriding hive.mapjoin.localtask.max.memory.usage=0.90, because my task log shows that the memory usage rate is just 0.604, so I assume that setting it to a larger value won't solve the problem in my case.

Could someone please guide me on which parameters I should actually set, and to what values, to get things rolling?

Thank you.

Regards,
Bejoy.K.S
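
P.S. For anyone reading the log: the "rate" column seems to be the memory usage divided by the maximum memory printed on the first log line. That is only my reading of the numbers, not something I have confirmed in the Hive source. A small Python sketch that checks this assumption against a few of the log lines above (the function name check_rates is just something I made up for illustration):

```python
import re

# Maximum memory reported on the first line of the log above.
MAX_MEMORY = 932118528

# A few progress lines copied verbatim from the log above.
LOG_LINES = [
    "2011-03-31 09:56:57 Processing rows: 200000 Hashtable size: 199999 Memory usage: 115481024 rate: 0.124",
    "2011-03-31 09:57:10 Processing rows: 600000 Hashtable size: 599999 Memory usage: 336738640 rate: 0.361",
    "2011-03-31 09:57:34 Processing rows: 1000000 Hashtable size: 999999 Memory usage: 562706496 rate: 0.604",
]

# Pull the "Memory usage" and "rate" fields out of a progress line.
PATTERN = re.compile(r"Memory usage:\s*(\d+)\s+rate:\s*([\d.]+)")


def check_rates(lines, max_memory):
    """Return (usage, logged_rate, computed_rate) for each progress line.

    computed_rate is usage / max_memory, which is the ASSUMED definition
    of the logged rate, not one taken from the Hive source.
    """
    results = []
    for line in lines:
        match = PATTERN.search(line)
        if match is None:
            continue  # not a progress line
        usage = int(match.group(1))
        logged = float(match.group(2))
        computed = usage / max_memory
        results.append((usage, logged, computed))
    return results


for usage, logged, computed in check_rates(LOG_LINES, MAX_MEMORY):
    print(f"usage={usage} logged={logged} computed={computed:.4f}")
```

If the computed values match the logged ones to three decimal places, then the last rate before the failure (0.604) really is the fraction of the maximum memory in use, which is why I compared it against the 0.90 threshold.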