Sergio Peña created HIVE-11763:
----------------------------------

             Summary: Use * instead of sum(hash(*)) on Parquet predicate (PPD) 
integration tests
                 Key: HIVE-11763
                 URL: https://issues.apache.org/jira/browse/HIVE-11763
             Project: Hive
          Issue Type: Sub-task
            Reporter: Sergio Peña


The integration tests for Parquet predicate push down (PPD) use the following 
query to validate the values filtered:
{noformat}
select sum(hash(*)) from ...
{noformat}

It would be better to use {{select * from ...}} instead, so that we can see 
that the filtered values are correct. It is difficult to tell whether a value 
was filtered by looking only at the hash.
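
As a rough sketch of the difference, using the newtypestbl table from the 
example in this issue (the column name b and the predicate are assumptions for 
illustration only; the actual test queries may filter on other columns):
{noformat}
-- current style: one opaque number, so it is hard to tell which rows passed the filter
select sum(hash(*)) from newtypestbl where b = true;

-- proposed style: the filtered rows themselves show up in the expected output
select * from newtypestbl where b = true;
{noformat}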

Also, we can limit the number of rows produced by the INSERT ... SELECT 
statement to avoid displaying many rows when validating the data. I think a 
LIMIT 2 on each of the SELECT statements would be enough.

For example, the parquet_ppd_boolean.ppd has this:
{noformat}
insert overwrite table newtypestbl select * from (select cast("apple" as 
char(10)), cast("bee" as varchar(10)), 0.22, true from src src1 union all 
select cast("hello" as char(10)), cast("world" as varchar(10)), 11.22, false 
from src src2) uniontbl;
{noformat}

If we use LIMIT 2, we reduce the number of rows:
{noformat}
insert overwrite table newtypestbl select * from (select cast("apple" as 
char(10)), cast("bee" as varchar(10)), 0.22, true from src src1 LIMIT 2 union 
all select cast("hello" as char(10)), cast("world" as varchar(10)), 11.22, 
false from src src2 LIMIT 2) uniontbl;
{noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
