[ https://issues.apache.org/jira/browse/HIVE-12898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HIVE-12898:
----------------------------------
    Labels: pull-request-available  (was: )

> Hive should support ORC block skipping on nested fields
> -------------------------------------------------------
>
>                 Key: HIVE-12898
>                 URL: https://issues.apache.org/jira/browse/HIVE-12898
>             Project: Hive
>          Issue Type: Improvement
>          Components: ORC
>    Affects Versions: 0.14.0, 1.2.1
>            Reporter: Michael Haeusler
>            Assignee: Ashish Sharma
>            Priority: Major
>              Labels: pull-request-available
>
> Hive supports predicate pushdown (block skipping) for ORC tables only on
> top-level fields. Hive should also support block skipping on nested fields
> (within structs).
>
> Example top-level: the following query selects 0 rows, using a predicate on
> top-level column foo. We also see 0 INPUT_RECORDS in the summary:
> {code:sql}
> SET hive.tez.exec.print.summary=true;
> CREATE TABLE t_toplevel STORED AS ORC AS SELECT 23 AS foo;
> SELECT * FROM t_toplevel WHERE foo=42 ORDER BY foo;
> [...]
> VERTICES  TOTAL_TASKS  FAILED_ATTEMPTS  KILLED_TASKS  DURATION_SECONDS  CPU_TIME_MILLIS  GC_TIME_MILLIS  INPUT_RECORDS  OUTPUT_RECORDS
> Map 1               1                0             0              1.22            2,640             102              0               0
> {code}
>
> Example nested: the following query also selects 0 rows, but using a
> predicate on nested column foo.bar. Unfortunately we see 1 INPUT_RECORDS in
> the summary:
> {code:sql}
> SET hive.tez.exec.print.summary=true;
> CREATE TABLE t_nested STORED AS ORC AS SELECT NAMED_STRUCT('bar', 23) AS foo;
> SELECT * FROM t_nested WHERE foo.bar=42 ORDER BY foo;
> [...]
> VERTICES  TOTAL_TASKS  FAILED_ATTEMPTS  KILLED_TASKS  DURATION_SECONDS  CPU_TIME_MILLIS  GC_TIME_MILLIS  INPUT_RECORDS  OUTPUT_RECORDS
> Map 1               1                0             0              3.66            5,210              68              1               0
> {code}
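
For readers unfamiliar with block skipping, the requested behaviour can be
sketched in a few lines of Python. This is only an illustration, not Hive or
ORC code. It assumes each block keeps min/max statistics per leaf field, and
that nested struct fields are addressed by a dotted path such as foo.bar. All
names (ColumnStats, flatten, block_stats, can_skip, rows_read) are
hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ColumnStats:
    """Min/max statistics for one leaf field within one block."""
    minimum: Any
    maximum: Any


def flatten(row: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts (structs) into dotted paths such as 'foo.bar'."""
    flat: Dict[str, Any] = {}
    for name, value in row.items():
        path = f"{prefix}{name}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=path + "."))
        else:
            flat[path] = value
    return flat


def block_stats(rows: List[Dict[str, Any]]) -> Dict[str, ColumnStats]:
    """Collect min/max per leaf path over all rows of a block."""
    stats: Dict[str, ColumnStats] = {}
    for row in rows:
        for path, value in flatten(row).items():
            current = stats.get(path)
            if current is None:
                stats[path] = ColumnStats(value, value)
            else:
                current.minimum = min(current.minimum, value)
                current.maximum = max(current.maximum, value)
    return stats


def can_skip(stats: Dict[str, ColumnStats], path: str, wanted: Any) -> bool:
    """True if the equality predicate `path = wanted` cannot match in this block."""
    column = stats.get(path)
    if column is None:
        return False  # no statistics for this path, so the block must be read
    return wanted < column.minimum or wanted > column.maximum


def rows_read(blocks: List[List[Dict[str, Any]]], path: str, wanted: Any) -> int:
    """Count the rows in blocks that the predicate does not allow us to skip."""
    return sum(
        len(rows)
        for rows in blocks
        if not can_skip(block_stats(rows), path, wanted)
    )


# One block with one row each, mirroring t_toplevel and t_nested from the issue.
top_level = [[{"foo": 23}]]
nested = [[{"foo": {"bar": 23}}]]

print(rows_read(top_level, "foo", 42))   # 0
print(rows_read(nested, "foo.bar", 42))  # 0
print(rows_read(nested, "foo.bar", 23))  # 1
```

In this sketch the nested predicate skips the block just as the top-level one
does, because statistics are looked up by leaf path rather than by top-level
column only. That matches the 0 INPUT_RECORDS result the issue asks for in the
nested case.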