One possibility is that count(*) is answered from cached statistics, while 
count(distinct field) actually reads the data and performs the computation.

Try setting the following and running the test again:

set hive.compute.query.using.stats=false;
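
As a rough sketch of that test (assuming the table and column names from the 
original message, test_table and field), both queries could be re-run in the 
same session after disabling the stats shortcut. If the two counts then become 
consistent, the stored statistics were probably stale, and recomputing them 
with ANALYZE TABLE may be worth trying:

```sql
-- Answer count(*) by scanning the data instead of using stored statistics
set hive.compute.query.using.stats=false;

select count(*) from test_table;
select count(distinct field) from test_table;

-- Optional: refresh the table-level statistics in case they were stale
analyze table test_table compute statistics;
```

Since count(distinct field) ignores NULLs, it should never exceed count(*) when 
both are computed from the same data.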



From: Igor Kuzmenko [mailto:f1she...@gmail.com]
Sent: Monday, August 21, 2017 10:01 AM
To: user@hive.apache.org
Subject: Unexpected query result

Running a simple 'select count(*) from test_table' query returned 500_000.
But when I run 'select count(distinct field) from test_table', the result is 
500_001.

How could it happen that a table with 500_000 records has 500_001 unique 
field values?

I'm using Hive from the HDP 2.5.0 platform.
The table is stored as ORC.
