[ https://issues.apache.org/jira/browse/HIVE-9983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Shelukhin updated HIVE-9983:
-----------------------------------
    Component/s: Vectorization

> Vectorizer doesn't vectorize (1) partitions with different schema (2) any
> MapWork with >1 table scans in MR
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-9983
>                 URL: https://issues.apache.org/jira/browse/HIVE-9983
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>            Reporter: Sergey Shelukhin
>            Assignee: Matt McCline
>
> For some test, tables are created as such:
> {noformat}
> CREATE TABLE orc_llap_part(
>     csmallint SMALLINT,
>     cint INT,
>     cbigint BIGINT,
>     cfloat FLOAT,
>     cdouble DOUBLE,
>     cstring1 STRING,
>     cstring2 STRING,
>     ctimestamp1 TIMESTAMP,
>     ctimestamp2 TIMESTAMP,
>     cboolean1 BOOLEAN,
>     cboolean2 BOOLEAN
> ) PARTITIONED BY (ctinyint TINYINT) STORED AS ORC;
> CREATE TABLE orc_llap_dim_part(
>     cbigint BIGINT
> ) PARTITIONED BY (ctinyint TINYINT) STORED AS ORC;
> INSERT OVERWRITE TABLE orc_llap_part PARTITION (ctinyint)
> SELECT csmallint, cint, cbigint, cfloat, cdouble, cstring1, cstring2,
> ctimestamp1, ctimestamp2, cboolean1, cboolean2, ctinyint FROM alltypesorc;
> INSERT OVERWRITE TABLE orc_llap_dim_part PARTITION (ctinyint)
> SELECT sum(cbigint) as cbigint, ctinyint FROM alltypesorc WHERE ctinyint > 10
> AND ctinyint < 21 GROUP BY ctinyint;
> {noformat}
> The query is:
> {noformat}
> explain
> SELECT oft.ctinyint, oft.cint FROM orc_llap_part oft
> INNER JOIN orc_llap_dim_part od ON oft.ctinyint = od.ctinyint;
> {noformat}
> This results in a failure to vectorize in MR:
> {noformat}
> Could not vectorize partition
> pfile:/Users/sergey/git/hive3/itests/qtest/target/warehouse/orc_llap_dim_part/ctinyint=11.
> Its column names cbigint do not match the other column names
> csmallint,cint,cbigint,cfloat,cdouble,cstring1,cstring2,ctimestamp1,ctimestamp2,cboolean1,cboolean2
> {noformat}
> This is comparing schemas from different tables because MapWork has 2
> TableScan-s; in Tez this error will never happen as MapWork will not have 2
> scans.
> In Tez (and MR as well), the other case can happen, namely partitions of the
> same table having different schemas.
> Tez case can be solved by making a super-schema to include all variations and
> handling missing columns where necessary.
> MR case may be harder to solve.
> Of note is that despite schema being different (and not a prefix of a schema
> by coincidence or some such), query passes if validation is commented out.
> Perhaps in some cases it can work?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
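
For illustration only, the super-schema idea mentioned for the Tez case (partitions of the same table with different schemas) could look roughly like the sketch below. It is written in Python for brevity and is not Hive's Vectorizer code. The names build_super_schema and project_row are hypothetical, and the sketch assumes that a column name always keeps the same type across partitions. It does not address the MR case with two table scans in one MapWork.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Column = Tuple[str, str]  # (column name, type name)


def build_super_schema(partition_schemas: Sequence[Sequence[Column]]) -> List[Column]:
    """Union the columns of all partition schemas, keeping first-seen order."""
    merged: Dict[str, str] = {}
    for schema in partition_schemas:
        for name, type_name in schema:
            seen = merged.get(name)
            if seen is None:
                merged[name] = type_name
            elif seen != type_name:
                # Assumption: type changes are out of scope for this sketch.
                raise ValueError(
                    f"column {name!r} has conflicting types: {seen} vs {type_name}"
                )
    return list(merged.items())


def project_row(
    row: Sequence[object],
    partition_schema: Sequence[Column],
    super_schema: Sequence[Column],
) -> List[Optional[object]]:
    """Lay out a partition's row in super-schema order; missing columns become None."""
    by_name = {name: value for (name, _), value in zip(partition_schema, row)}
    return [by_name.get(name) for name, _ in super_schema]


if __name__ == "__main__":
    # Two hypothetical partitions of one table; the newer one has an extra column.
    old_partition = [("cint", "int"), ("cbigint", "bigint")]
    new_partition = [("cint", "int"), ("cbigint", "bigint"), ("cstring1", "string")]
    super_schema = build_super_schema([old_partition, new_partition])
    print(super_schema)
    print(project_row([7, 42], old_partition, super_schema))
```

Rows from the older partition are padded with None for cstring1, which stands in for "handling missing columns where necessary"; a real implementation would have to decide how such columns are represented in a vectorized batch.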