[ https://issues.apache.org/jira/browse/HIVE-25557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17421905#comment-17421905 ]

katty he commented on HIVE-25557:
---------------------------------

count(*) on MR will be faster than on Tez. Normally, a count operation only needs
to read the Parquet metadata, but in this case it reads all the data and computes
the count, so I am confused. Here is the plan:

!image-2021-09-29-11-07-04-118.png!
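Parquet file footers record a row count for every row group, so a count can in principle be answered by summing those numbers instead of decoding column data. The sketch below is a hypothetical in-memory model (the `RowGroup` and `ParquetFileModel` classes are illustrative, not a real Parquet or Hive API) that contrasts the metadata-only count with a full scan.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class RowGroup:
    # Row count as recorded in this row group's (hypothetical) footer entry.
    num_rows: int
    # Stand-in for the column data a full scan would have to decode.
    rows: List[tuple] = field(default_factory=list)


@dataclass
class ParquetFileModel:
    # Hypothetical model of one Parquet file: just its row groups.
    row_groups: List[RowGroup]


def count_from_metadata(files: List[ParquetFileModel]) -> int:
    # Sum the footer row counts; no row data is touched.
    return sum(rg.num_rows for f in files for rg in f.row_groups)


def count_by_scan(files: List[ParquetFileModel]) -> int:
    # Visit every row, which is roughly what the reported plan appears to do.
    return sum(1 for f in files for rg in f.row_groups for _ in rg.rows)


# Two files, three row groups, five rows in total.
files = [
    ParquetFileModel([RowGroup(2, [(1,), (2,)]), RowGroup(1, [(3,)])]),
    ParquetFileModel([RowGroup(2, [(4,), (5,)])]),
]
```

Both functions return the same total when the footers are correct; the difference is cost, since `count_from_metadata` grows with the number of row groups while `count_by_scan` grows with the number of rows.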

> Hive 3.1.2 with Tez is slow to count data in parquet format
> ------------------------------------------------------------
>
>                 Key: HIVE-25557
>                 URL: https://issues.apache.org/jira/browse/HIVE-25557
>             Project: Hive
>          Issue Type: Improvement
>    Affects Versions: 3.1.2
>         Environment: Tez *0.10.1*
>            Reporter: katty he
>            Priority: Major
>         Attachments: image-2021-09-29-11-07-04-118.png
>
>
> Recently I tested a query like select count(*) from table in Hive 3.1.2 with
> Tez, where the table is in Parquet format. Normally, when counting, the query
> engine can read metadata instead of reading the full data, but in my case
> Tez cannot get the count from metadata alone; it reads the data, so it is slow.
> When counting 2 billion rows, Tez takes 500 s and spends 60 s initializing.
> Is that a problem?
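As far as I know, Hive usually takes the count(*) shortcut from table statistics stored in the metastore rather than from Parquet footers at query time: with `hive.compute.query.using.stats` enabled and accurate basic stats (gathered, for example, by `ANALYZE TABLE ... COMPUTE STATISTICS`), the query can be answered without launching a scan. The sketch below models that decision only; `TableStats`, `answer_count`, and the `accurate` flag are hypothetical simplifications, not Hive's actual optimizer code.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class TableStats:
    # Row count stored in the metastore, or None if never computed (hypothetical model).
    num_rows: Optional[int]
    # Whether the stored stats are known to match the current data (hypothetical flag).
    accurate: bool


def answer_count(stats: TableStats, use_stats: bool,
                 scan: Callable[[], int]) -> Tuple[int, str]:
    # use_stats stands in for the hive.compute.query.using.stats setting.
    if use_stats and stats.accurate and stats.num_rows is not None:
        return stats.num_rows, "metastore"
    # Otherwise fall back to running the job and counting rows.
    return scan(), "scan"
```

Under this model, missing or stale statistics force the `"scan"` path, which would match the behaviour described in the report; whether that is the actual cause here is an assumption that the table's statistics would need to confirm.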



--
This message was sent by Atlassian Jira
(v8.3.4#803005)