Please refer to the document below as well:

Hive on Tez Performance Tuning - Determining Reducer Counts
https://community.hortonworks.com/articles/22419/hive-on-tez-performance-tuning-determining-reducer.html

I hope it gives you some clue to understanding Tez internals.

2017-01-21 23:35 GMT+09:00 Mahender Sarangam <mahender.bigd...@outlook.com>:

> Yes, I tried the option below, but I'm not sure about the workload (data
> ingestion), so I can't go with a fixed, hard-coded value. I would like to
> know the reason for getting 1009 reducer tasks.
>
> On 1/20/2017 7:45 PM, goun na wrote:
>
> > Hi Mahender,
> >
> > 1st: Didn't the following option work in Tez?
> >
> > set mapreduce.job.reduces=100
> > or
> > set mapred.reduce.tasks=100 (deprecated)
> >
> > 2nd: Possibly data skew. It sometimes happens when handling NULLs.
> >
> > Goun
> >
> > 2017-01-21 9:58 GMT+09:00 Mahender Sarangam <mahender.bigd...@outlook.com>:
> >
> >> Hi All,
> >>
> >> We have an ORC table that is 2 GB in size. When we perform an operation
> >> on top of this ORC table, Tez always deduces 1009 reducers, every time.
> >> I found that 1009 is treated as the maximum number of Tez reduce tasks.
> >> Is there a way to reduce the number of reducers? I also see that some of
> >> the files underlying the ORC table are 500 MB, 1 GB, etc. Is there a way
> >> to redistribute the files to a uniform size?
> >>
> >> My second scenario: we have a join across 5 tables, all of them left
> >> joins. The query runs fast until it reaches 99%; from 99% to 100% it
> >> takes too much time. We are not involving our partition column in the
> >> LEFT JOIN statement. Is there a better way to resolve this hang at 99%?
> >> My table is 20 GB, and we are left joining it with another table of
> >> 9,00,00,000 records.
> >>
> >> Mahens
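
As far as I know, the 1009 seen in the thread is simply the default value of hive.exec.reducers.max (since Hive 0.14). Hive estimates the reducer count roughly as min(hive.exec.reducers.max, ceil(estimated input bytes / hive.exec.reducers.bytes.per.reducer)), and the estimate is based on the raw (uncompressed) data size, not the compressed ORC file size. At the default of 256 MB per reducer, a genuinely 2 GB input would need only about 8 reducers; landing on exactly 1009 suggests the size estimate is large enough to hit the cap. A sketch of the settings that commonly control this (values illustrative; defaults may differ by distribution, so verify on your cluster):

```sql
-- Raise the target bytes per reducer so fewer reducers are estimated
-- (default is 256 MB in Hive 0.14+):
set hive.exec.reducers.bytes.per.reducer=1073741824;  -- 1 GB

-- Or lower the hard cap on reducers (default 1009):
set hive.exec.reducers.max=100;

-- Let Tez shrink reducer parallelism at runtime based on actual
-- intermediate output sizes, rather than the compile-time estimate:
set hive.tez.auto.reducer.parallelism=true;
```

Raising bytes-per-reducer is usually safer than hard-coding mapreduce.job.reduces, because it still scales with the data volume as ingestion grows.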
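
On the 99% hang: a classic cause is one reducer receiving a heavily skewed join key, very often NULL, so the whole stage waits on that single task. One widely used workaround is to scatter the NULL keys with a synthetic value that can never match the other side. A hedged sketch, where big_t, dim_t, k, and val are hypothetical names for illustration only, and which assumes dim_t.k never contains values of the form 'skew_null_%':

```sql
-- LEFT JOIN semantics are preserved: rows with NULL k still survive,
-- but their synthetic keys spread across many reducers instead of one.
SELECT b.*, d.val
FROM big_t b
LEFT JOIN dim_t d
  ON coalesce(b.k, concat('skew_null_', rand())) = d.k;

-- Skew-handling settings are also worth trying, though their effectiveness
-- on the Tez engine varies by Hive version -- verify on yours:
set hive.optimize.skewjoin=true;
set hive.skewjoin.key=100000;
```

Checking the key distribution first (e.g. a GROUP BY count on the join key of the 9,00,00,000-row table) will confirm whether NULLs or another hot key are the culprit before applying either workaround.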