[ https://issues.apache.org/jira/browse/HIVE-11763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14739992#comment-14739992 ]
Ferdinand Xu commented on HIVE-11763:
-------------------------------------

+1 pending tests.

> Use * instead of sum(hash(*)) on Parquet predicate (PPD) integration tests
> --------------------------------------------------------------------------
>
>                 Key: HIVE-11763
>                 URL: https://issues.apache.org/jira/browse/HIVE-11763
>             Project: Hive
>          Issue Type: Sub-task
>            Reporter: Sergio Peña
>            Assignee: Sergio Peña
>         Attachments: HIVE-11763.1.patch
>
>
> The integration tests for Parquet predicate push down (PPD) use the following
> query to validate the filtered values:
> {noformat}
> select sum(hash(*)) from ...
> {noformat}
> It would be better to use {{select * from ...}} instead, so we can see that
> those values are correct. It is difficult to tell whether a value was filtered
> just by looking at the hash.
> Also, we can try to limit the number of rows produced by the INSERT ... SELECT
> statement to avoid displaying many rows when validating the data. I suggest a
> LIMIT 2 on each of the SELECTs.
> For example, parquet_ppd_boolean.ppd has this:
> {noformat}
> insert overwrite table newtypestbl select * from (select cast("apple" as
> char(10)), cast("bee" as varchar(10)), 0.22, true from src src1 union all
> select cast("hello" as char(10)), cast("world" as varchar(10)), 11.22, false
> from src src2) uniontbl;
> {noformat}
> If we use LIMIT 2, we reduce the number of rows:
> {noformat}
> insert overwrite table newtypestbl select * from (select cast("apple" as
> char(10)), cast("bee" as varchar(10)), 0.22, true from src src1 LIMIT 2 union
> all select cast("hello" as char(10)), cast("world" as varchar(10)), 11.22,
> false from src src2 LIMIT 2) uniontbl;
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)