Hari Sekhon created HIVE-12359:
----------------------------------

             Summary: Hive ORC table reports different counts between select * and count(*)
                 Key: HIVE-12359
                 URL: https://issues.apache.org/jira/browse/HIVE-12359
             Project: Hive
          Issue Type: Bug
          Components: CBO, HiveServer2, ORC, Statistics
    Affects Versions: 1.2.1
         Environment: HDP 2.3 + Kerberos
            Reporter: Hari Sekhon
            Assignee: Vaibhav Gumashta


I have an ORC table which is giving different figures between select count( * ) 
and select *:
{code}> select count(*) from myTable;
+--------+--+
|  _c0   |
+--------+--+
| 56471  |
+--------+--+
{code}
{code}> select * from myTable;
...
109,295 rows selected (62.993 seconds)
{code}
At first I thought the fix was obvious: run "analyze table ... compute 
statistics" and it would correct itself. However, I've tried that, as well as 
adding "for columns", and the results remain the same. The select count( * ) 
returns very quickly, so it must be using the pre-computed stats.
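
For reference, a minimal sketch of the two statements meant here, using the 
table name from the example above. The exact invocations are not recorded in 
this report, so treat these as illustrative:
{code}
-- Table-level stats (row count, sizes) written to the Metastore
analyze table myTable compute statistics;
-- Column-level stats
analyze table myTable compute statistics for columns;
{code}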

When I convert the table to text or copy it to another ORC table, select 
count( * ) on the new table returns the correct number.
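
A sketch of the ORC-to-ORC copy, assuming a hypothetical target table name 
myTable_copy (not from the original report):
{code}
-- Copy every row into a fresh ORC table, then count the copy
create table myTable_copy stored as orc as select * from myTable;
select count(*) from myTable_copy;
{code}
The text variant would be the same with "stored as textfile".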

I've even tried disabling stats, CBO, the works, and restarting, with the same 
result: select count( * ) still returns very quickly each time, which suggests 
it is using either pre-computed stats stored in the Metastore or the stats 
stored in the ORC files themselves. I'm not sure how ORC could store the wrong 
count, though, especially as a CTAS to another ORC table gives the correct 
count when I run select count( * ) on the new table.
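
The settings usually involved in answering select count( * ) from stats are 
sketched below. Which of these were actually changed is not recorded in this 
report, so the list is an assumption rather than the exact steps taken:
{code}
-- Do not answer count(*)-style queries from Metastore stats
set hive.compute.query.using.stats=false;
-- Disable the cost-based optimizer
set hive.cbo.enable=false;
-- Do not gather stats automatically on insert
set hive.stats.autogather=false;

select count(*) from myTable;

-- The row count recorded in the Metastore appears as numRows
-- under "Table Parameters"
describe formatted myTable;
{code}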



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
