Hi Nikolaos

+user@hive list

Hive is not running a Tez job here because of the fetch task optimization, 
which, for a specific set of simple queries, fetches the data directly and 
runs it through the operator pipeline instead of launching a job.

If you want to disable it completely, try "set hive.fetch.task.conversion=none;".

If you want the optimization to kick in only for much smaller data sizes, 
lower the value of hive.fetch.task.conversion.threshold (the input size limit, 
in bytes).
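
As a rough sketch of the two options above, run in the same Hive session before the query from case A. The 10 MB threshold is only an illustrative value, not a recommendation, and "more" is assumed to be the conversion mode already in effect (it is the default in recent Hive releases):

```sql
-- Option 1: turn fetch task conversion off for this session, so that even
-- a simple SELECT ... WHERE query is compiled into a Tez job.
set hive.fetch.task.conversion=none;

-- Option 2 (alternative to option 1): keep the optimization, but only for
-- small inputs. The threshold is in bytes; 10485760 (10 MB) is an
-- illustrative value chosen for this example.
set hive.fetch.task.conversion=more;
set hive.fetch.task.conversion.threshold=10485760;

-- Check the plan before running: with the conversion active the plan
-- should contain only a fetch stage, otherwise it should show a Tez stage.
explain
select dt from my_db_schema.my_table
where dt in ('2018-03-10','2018-03-09') and header = 'xxx';
```

These are session-level settings; to make either one permanent it would have to go into hive-site.xml instead.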

Thanks
Prasanth

On Jun 28, 2018, at 10:50 AM, Nikolaos Tsipas <nicktg...@gmail.com> wrote:

Hi,

I'm using Tez with Hive to query data on S3 and I notice the following two 
cases.

Case A

When the query covers a smaller amount of data, a TEZ job (yarn 
application) is not created:

select dt from my_db_schema.my_table where dt in ('2018-03-10','2018-03-09') 
and header ='xxx';

The output in the above case is:

OK
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
2018-03-10
2018-03-10
2018-03-09
2018-03-09
Time taken: 7.043 seconds, Fetched: 4 row(s)


Case B

When the query is scanning more data

select dt from my_db_schema.my_table where  header ='xxx';

then the output is as follows and I can see a TEZ job logged in the TEZ ui and 
in yarn.

----------------------------------------------------------------------------------------------
        VERTICES      MODE        STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
----------------------------------------------------------------------------------------------
Map 1 .......... container     SUCCEEDED     22         22        0        0       0       0
----------------------------------------------------------------------------------------------
VERTICES: 01/01  [==========================>>] 100%  ELAPSED TIME: 38.12 s
----------------------------------------------------------------------------------------------
OK
2018-03-05
2018-03-05
2018-03-06
2018-03-06
2018-03-07
2018-03-07
2018-03-08
2018-03-08
2018-03-09
2018-03-09
2018-03-10
2018-03-10
2018-03-25
2018-03-25
2018-03-26
2018-03-26
2018-03-28
2018-03-28
2018-05-09
2018-05-09
2018-05-10
2018-05-10
Time taken: 47.197 seconds, Fetched: 22 row(s)

The problem in case A is that sometimes Hive decides not to trigger a TEZ job 
and the query takes a long time to complete. In this case the worker nodes 
are not utilised at all; only the master node executes the query.

Is there a way to force Hive to always trigger a TEZ job?
