Hi Hive team, I have a Hive query that is translated into 2000+ map tasks and 1009 reduce tasks. The reduce tasks are configured to start only after all map tasks have completed. In the reduce phase, 1008 of the reduce tasks complete within 5 minutes, but the last one takes more than 14 hours.
I would expect the reducers to finish at roughly the same time once data skew is handled, so I set the following parameters, but they didn't help:

set hive.optimize.skewjoin=true;
set hive.skewjoin.key=100000000;

Which other parameters should I set, or how else can I reduce the run time of the reduce phase? The query is as follows:

WITH uaf AS (
  SELECT user_id
  FROM db1.table1
  WHERE ds = '2018-11-25'
    AND is_valid
    AND days_since_last_visit = 0
)
SELECT *
FROM db2.table2 c
WHERE c.user_id IN (SELECT user_id FROM uaf)
  AND Substr(datehour, 1, 8) = '20181125'
LIMIT 10

Da
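
A single reducer running for hours while the rest finish in minutes usually points to one key value carrying most of the rows. A quick way to check is to count rows per user_id in db2.table2 for the same day. The query below is only a diagnostic sketch: it assumes user_id is the column the rows are shuffled on, and the row_cnt alias and the LIMIT of 20 are arbitrary choices for illustration.

```sql
-- Diagnostic sketch (assumption: user_id is the shuffle key).
-- Lists the user_id values with the most rows for the day in question,
-- so a dominant key (for example NULL or a placeholder id) stands out.
SELECT c.user_id,
       COUNT(*) AS row_cnt
FROM db2.table2 c
WHERE substr(c.datehour, 1, 8) = '20181125'
GROUP BY c.user_id
ORDER BY row_cnt DESC
LIMIT 20;
```

The counts can also be compared against hive.skewjoin.key, which is a per-key row-count threshold: with the value 100000000 set above, a key is treated as skewed only if it has more than 100 million rows, so a key with fewer rows than that would not be handled by the skew join optimization.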