Hive map join - process a little larger tables with moderate number of rows

Bejoy Ks Thu, 31 Mar 2011 07:26:19 -0700

Hi Experts
    I'm currently working with Hive 0.7, mostly with JOINs. In all permissible 
cases I'm using map joins by setting the hive.auto.convert.join=true 
parameter. 
Local map joins have considerably improved the performance of my Hive 
queries. So far I have used local map joins only with the default Hive 
configuration parameters, and now I'd like to dig deeper. I want to try 
local map joins on somewhat bigger tables with more rows. Given below is the 
failure log of one of my local map tasks, which then falls back to executing 
its backup common join task:


2011-03-31 09:56:54     Starting to launch local task to process map join;      maximum memory = 932118528
2011-03-31 09:56:57     Processing rows:        200000  Hashtable size: 199999  Memory usage:   115481024       rate:   0.124
2011-03-31 09:57:00     Processing rows:        300000  Hashtable size: 299999  Memory usage:   169344064       rate:   0.182
2011-03-31 09:57:03     Processing rows:        400000  Hashtable size: 399999  Memory usage:   232132792       rate:   0.249
2011-03-31 09:57:06     Processing rows:        500000  Hashtable size: 499999  Memory usage:   282338544       rate:   0.303
2011-03-31 09:57:10     Processing rows:        600000  Hashtable size: 599999  Memory usage:   336738640       rate:   0.361
2011-03-31 09:57:14     Processing rows:        700000  Hashtable size: 699999  Memory usage:   391117888       rate:   0.42
2011-03-31 09:57:22     Processing rows:        800000  Hashtable size: 799999  Memory usage:   453906496       rate:   0.487
2011-03-31 09:57:27     Processing rows:        900000  Hashtable size: 899999  Memory usage:   508306552       rate:   0.545
2011-03-31 09:57:34     Processing rows:        1000000 Hashtable size: 999999  Memory usage:   562706496       rate:   0.604
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapredLocalTask
ATTEMPT: Execute BackupTask: org.apache.hadoop.hive.ql.exec.MapRedTask
Launching Job 4 out of 6
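
For anyone reading the log: the "rate" column appears to be the reported 
memory usage divided by the "maximum memory" value on the first line. That 
is an inference from the numbers in this log, not something taken from the 
Hive source. Below is a minimal Python sketch that checks this on a few of 
the lines above (with the e-mail line wraps rejoined). The function name 
parse_local_task_log and the regular expressions are my own, made up for 
illustration.

```python
import re

# Sample lines copied from the log in this post, with line wraps rejoined.
LOG = """\
2011-03-31 09:56:54     Starting to launch local task to process map join;      maximum memory = 932118528
2011-03-31 09:56:57     Processing rows:        200000  Hashtable size: 199999  Memory usage:   115481024       rate:   0.124
2011-03-31 09:57:34     Processing rows:        1000000 Hashtable size: 999999  Memory usage:   562706496       rate:   0.604
"""

MAX_RE = re.compile(r"maximum memory = (\d+)")
ROW_RE = re.compile(
    r"Processing rows:\s+(\d+)\s+Hashtable size:\s+(\d+)\s+"
    r"Memory usage:\s+(\d+)\s+rate:\s+([\d.]+)"
)


def parse_local_task_log(text):
    """Return (max_memory, samples) parsed from a local map join task log."""
    max_memory = int(MAX_RE.search(text).group(1))
    samples = []
    for match in ROW_RE.finditer(text):
        rows, _table_size, used, rate = match.groups()
        samples.append({
            "rows": int(rows),
            "used": int(used),
            "logged_rate": float(rate),
            # Assumption being checked: rate = memory usage / maximum memory.
            "computed_rate": round(int(used) / max_memory, 3),
        })
    return max_memory, samples


max_memory, samples = parse_local_task_log(LOG)
peak = max(samples, key=lambda s: s["computed_rate"])
for s in samples:
    print(s["rows"], s["logged_rate"], s["computed_rate"])
```

If the computed and logged rates agree on every line, the last sample (here 
held in peak) is the memory rate the task had reached when it failed.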


I'd like to get this local map task to run to completion. To that end, I 
tried setting the following Hive parameters:
hive -f HiveJob.txt -hiveconf hive.mapjoin.maxsize=1000000 -hiveconf hive.mapjoin.smalltable.filesize=40000000 -hiveconf hive.auto.convert.join=true
But setting the two config parameters doesn't make my local map task proceed 
beyond this stage. I didn't try overriding 
hive.mapjoin.localtask.max.memory.usage=0.90, because my task log shows that 
the memory usage rate is just 0.604, so I assume that setting it to a larger 
value won't solve the problem in my case. Could someone please guide me on 
which parameters and values I should set to get things rolling?
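
In case it helps anyone reproduce this while trying different values, here 
is a small Python sketch that assembles the same command line from a 
dictionary of settings. The helper name build_hive_command is hypothetical; 
the script name and the three parameters are the ones quoted above, and only 
the -f and -hiveconf options already used in that command are assumed.

```python
import shlex


def build_hive_command(script, settings):
    """Build the argument list for `hive -f <script>` with -hiveconf overrides."""
    args = ["hive", "-f", script]
    for key, value in settings.items():
        args += ["-hiveconf", f"{key}={value}"]
    return args


# The values tried in this post; change them here to test other settings.
settings = {
    "hive.mapjoin.maxsize": 1000000,
    "hive.mapjoin.smalltable.filesize": 40000000,
    "hive.auto.convert.join": "true",
}

cmd = build_hive_command("HiveJob.txt", settings)
print(shlex.join(cmd))
```

The list in cmd can be passed to subprocess.run on a machine where the hive 
client is installed.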


Thank You

Regards 
Bejoy.K.S
