One possibility is that count(*) returns a cached statistic, while count(distinct field) actually reads the data and performs the computation.
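A rough way to check this, assuming the table is `test_table` as in the original question: `DESCRIBE FORMATTED` lists the table parameters, where `numRows` is the stored statistic that count(*) may be answered from. If that value looks stale, `ANALYZE TABLE ... COMPUTE STATISTICS` recomputes it. This is only a sketch; the exact output layout varies between Hive versions.

```sql
-- Show table metadata; look for numRows under "Table Parameters".
DESCRIBE FORMATTED test_table;

-- If numRows looks stale, recompute the basic table statistics.
ANALYZE TABLE test_table COMPUTE STATISTICS;
```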
Try setting the following and test again:

set hive.compute.query.using.stats=false;

From: Igor Kuzmenko [mailto:f1she...@gmail.com]
Sent: Monday, August 21, 2017 10:01 AM
To: user@hive.apache.org
Subject: Unexpected query result

Running a simple 'select count(*) from test_table' query returned 500_000. But when I run 'select count(distinct field) from test_table', the result is 500_001. How could it happen that a table with 500_000 records has 500_001 unique field values?
I'm using Hive from the HDP 2.5.0 platform. The table is stored as ORC.