Hi guys, I am trying to optimize a Hive join query that joins two big tables. The join is taking too long: no matter how many reducers I set, there are always two reducers struggling to finish at the end of the job. The job does not always finish; sometimes it fails with memory problems.
In the reducers that complete quickly I can see: 7688459 rows: used memory = 991337736
In the long-running reducers: 43363436 rows: used memory = 1142368456

At first I thought I was dealing with a skewed key, but I set hive.optimize.skewjoin to true and it didn't change a thing. I played with hive.skewjoin.key and that also didn't change a thing.

Any other ideas I can try?

I am using Hive 0.10 from CDH4.2.1, and the source tables use customized SerDes.

Thanks,
Guy Doulberg
Team leader @ Perion
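
P.S. For anyone trying to reproduce this, here is a rough sketch of the two skew-join settings mentioned above, plus a query for checking whether a couple of join keys dominate the data. The threshold value, the table name big_table_a, and the column name join_key are placeholders for illustration, not the actual values or names from the job.

```sql
-- Skew-join settings named in the post. The threshold below is illustrative only.
SET hive.optimize.skewjoin=true;
SET hive.skewjoin.key=100000;  -- row count per key above which the key is treated as skewed

-- Hypothetical table and column names: count rows per join key
-- to see whether one or two keys account for most of the rows.
SELECT join_key, COUNT(*) AS cnt
FROM big_table_a
GROUP BY join_key
ORDER BY cnt DESC
LIMIT 20;
```

If two keys (possibly NULL or an empty/default value) show up far ahead of the rest, that would be consistent with exactly two reducers lagging regardless of the reducer count.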
