OK, I get similar diffs with ORC, so the problem is not in Parquet.
The expected .out files were created by running mvn test on Windows, so the issue
is Windows-specific, not Parquet-specific. I'll investigate...

From: Remus Rusanu [mailto:rem...@microsoft.com]
Sent: Monday, February 17, 2014 3:59 PM
To: dev@hive.apache.org
Cc: Brock Noland
Subject: Why do I get statistics diff in EXPLAIN for Parquet?

Looking at the failed Jenkins runs for HIVE-5998, I see diffs in the
statistics in the EXPLAIN output:

Running: diff -a /root/hive/itests/qtest/../../itests/qtest/target/qfile-results/clientpositive/vectorized_parquet.q.out /root/hive/itests/qtest/../../ql/src/test/results/clientpositive/vectorized_parquet.q.out
72c72
<             Statistics: Num rows: 12288 Data size: 73728 Basic stats: COMPLETE Column stats: NONE
---
>             Statistics: Num rows: 2072 Data size: 257046 Basic stats: COMPLETE Column stats: NONE
75c75
<               Statistics: Num rows: 6144 Data size: 36864 Basic stats: COMPLETE Column stats: NONE
---
>               Statistics: Num rows: 1036 Data size: 128523 Basic stats: COMPLETE Column stats: NONE
79c79
<                 Statistics: Num rows: 6144 Data size: 36864 Basic stats: COMPLETE Column stats: NONE
---
>                 Statistics: Num rows: 1036 Data size: 128523 Basic stats: COMPLETE Column stats: NONE
82c82
<                   Statistics: Num rows: 10 Data size: 60 Basic stats: COMPLETE Column stats: NONE
---
>                   Statistics: Num rows: 10 Data size: 1240 Basic stats: COMPLETE Column stats: NONE
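
One way to compare the two sides of the diff is to look at the data size per row rather than at the raw numbers. The following is a minimal Python sketch that does only that arithmetic; the names DIFF_LINES, STATS_RE and bytes_per_row are made up for illustration, and the input is the Statistics lines copied from the diff above with their indentation dropped. It is not part of the Hive test harness.

```python
import re

# Statistics lines copied from the diff above (indentation dropped).
# "<" is the Jenkins run output, ">" is the checked-in expected .q.out file.
DIFF_LINES = [
    "< Statistics: Num rows: 12288 Data size: 73728 Basic stats: COMPLETE Column stats: NONE",
    "> Statistics: Num rows: 2072 Data size: 257046 Basic stats: COMPLETE Column stats: NONE",
    "< Statistics: Num rows: 6144 Data size: 36864 Basic stats: COMPLETE Column stats: NONE",
    "> Statistics: Num rows: 1036 Data size: 128523 Basic stats: COMPLETE Column stats: NONE",
    "< Statistics: Num rows: 6144 Data size: 36864 Basic stats: COMPLETE Column stats: NONE",
    "> Statistics: Num rows: 1036 Data size: 128523 Basic stats: COMPLETE Column stats: NONE",
    "< Statistics: Num rows: 10 Data size: 60 Basic stats: COMPLETE Column stats: NONE",
    "> Statistics: Num rows: 10 Data size: 1240 Basic stats: COMPLETE Column stats: NONE",
]

# Hypothetical helper pattern: captures the diff side, row count and data size.
STATS_RE = re.compile(r"^([<>])\s*Statistics: Num rows: (\d+) Data size: (\d+)")


def bytes_per_row(lines):
    """Return (side, rows, size, size / rows) for every Statistics line."""
    result = []
    for line in lines:
        match = STATS_RE.match(line)
        if match is None:
            continue  # skip anything that is not a Statistics line
        side = match.group(1)
        rows = int(match.group(2))
        size = int(match.group(3))
        result.append((side, rows, size, size / rows))
    return result


if __name__ == "__main__":
    for side, rows, size, ratio in bytes_per_row(DIFF_LINES):
        print(f"{side} rows={rows:>6} size={size:>7} bytes/row={ratio:.2f}")
```

On these numbers every "<" line works out to 6 bytes per row and every ">" line to about 124 bytes per row, so the two sides differ in both the estimated row count and the estimated row size. That is only an observation from the figures in the diff, not a confirmed cause.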

What would cause such statistics diffs? The Parquet file is created as:

create table if not exists alltypes_parquet (
  cint int,
  ctinyint tinyint,
  csmallint smallint,
  cfloat float,
  cdouble double,
  cstring1 string) stored as parquet;

insert overwrite table alltypes_parquet
  select cint,
    ctinyint,
    csmallint,
    cfloat,
    cdouble,
    cstring1
  from alltypesorc;

Note that there are no diffs in the actual query results.

Thanks,
~Remus
