[ 
https://issues.apache.org/jira/browse/HIVE-25557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421247#comment-17421247
 ] 

Stamatis Zampetakis commented on HIVE-25557:
--------------------------------------------

I am not sure I understand whether the problem is in Tez, in Parquet, or in the 
combination of the two. Is the COUNT query fast with MR and Parquet? Is it 
fast with Tez and another format, e.g., ORC? 

Please also include the plans ({{EXPLAIN}}) for the queries you are testing.
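
As a rough sketch of the comparisons asked for above (the table names 
{{t_parquet}} and {{t_orc}} are placeholders, not taken from the report, and 
copying the full table into ORC may be expensive, so a smaller sample may be 
preferable):

{code:sql}
-- Placeholder names: t_parquet stands for the reporter's Parquet table,
-- t_orc for a copy of the same data stored as ORC.

-- 1. Same table and format, different execution engines
SET hive.execution.engine=mr;
SELECT COUNT(*) FROM t_parquet;

SET hive.execution.engine=tez;
SELECT COUNT(*) FROM t_parquet;

-- 2. Same engine (Tez), different file format
CREATE TABLE t_orc STORED AS ORC AS SELECT * FROM t_parquet;
SELECT COUNT(*) FROM t_orc;

-- 3. Plans to attach to the issue
EXPLAIN SELECT COUNT(*) FROM t_parquet;
EXPLAIN SELECT COUNT(*) FROM t_orc;
{code}

Comparing the timings from steps 1 and 2 should show whether the slowdown 
follows the engine, the format, or only the combination.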

> Hive 3.1.2 with Tez is slow to count data in Parquet format
> -----------------------------------------------------------
>
>                 Key: HIVE-25557
>                 URL: https://issues.apache.org/jira/browse/HIVE-25557
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 3.1.2
>         Environment: Tez *0.10.1*
>            Reporter: katty he
>            Priority: Major
>
> Recently I tested a query like select count(*) from table in Hive 3.1.2 with 
> Tez, where the table is in Parquet format. Normally, when counting, the query 
> engine can read the metadata instead of reading the full data, but in my case 
> Tez cannot get the count from metadata only; it reads the data, so it is slow. 
> When counting 2 billion rows, Tez takes 500s and spends 60s initializing. 
> Is that a problem?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
