Sergey Shelukhin created HIVE-9983:
--------------------------------------
Summary: Vectorizer doesn't vectorize (1) partitions with
different schema (2) any MapWork with >1 table scans in MR
Key: HIVE-9983
URL: https://issues.apache.org/jira/browse/HIVE-9983
Project: Hive
Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Matt McCline
For a test, the tables are created as follows:
{noformat}
CREATE TABLE orc_llap_part(
csmallint SMALLINT,
cint INT,
cbigint BIGINT,
cfloat FLOAT,
cdouble DOUBLE,
cstring1 STRING,
cstring2 STRING,
ctimestamp1 TIMESTAMP,
ctimestamp2 TIMESTAMP,
cboolean1 BOOLEAN,
cboolean2 BOOLEAN
) PARTITIONED BY (ctinyint TINYINT) STORED AS ORC;
CREATE TABLE orc_llap_dim_part(
cbigint BIGINT
) PARTITIONED BY (ctinyint TINYINT) STORED AS ORC;
INSERT OVERWRITE TABLE orc_llap_part PARTITION (ctinyint)
SELECT csmallint, cint, cbigint, cfloat, cdouble, cstring1, cstring2,
ctimestamp1, ctimestamp2, cboolean1, cboolean2, ctinyint FROM alltypesorc;
INSERT OVERWRITE TABLE orc_llap_dim_part PARTITION (ctinyint)
SELECT sum(cbigint) as cbigint, ctinyint FROM alltypesorc WHERE ctinyint > 10
AND ctinyint < 21 GROUP BY ctinyint;
{noformat}
The query is:
{noformat}
explain
SELECT oft.ctinyint, oft.cint FROM orc_llap_part oft
INNER JOIN orc_llap_dim_part od ON oft.ctinyint = od.ctinyint;
{noformat}
This results in a failure to vectorize in MR:
{noformat}
Could not vectorize partition
pfile:/Users/sergey/git/hive3/itests/qtest/target/warehouse/orc_llap_dim_part/ctinyint=11.
Its column names cbigint do not match the other column names
csmallint,cint,cbigint,cfloat,cdouble,cstring1,cstring2,ctimestamp1,ctimestamp2,cboolean1,cboolean2
{noformat}
The check is comparing schemas from different tables because the MapWork has 2
TableScans; in Tez this error will never happen, as a MapWork will not have 2
scans.
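To illustrate the difference, here is a rough sketch (in Python, not Hive's actual Vectorizer code) of a column-name check done once over the whole MapWork versus once per table. The function name, the tuple layout and the shortened column lists are all made up for the example.

```python
from collections import defaultdict

def mismatched_partitions(partitions, per_table=True):
    """Return paths of partitions whose columns differ from the first
    partition seen in the same group.

    partitions: list of (table_name, partition_path, column_names) tuples
    (a hypothetical layout, not Hive's PartitionDesc).
    per_table=False puts every partition in one group, like the check
    described above; per_table=True only compares within a table.
    """
    groups = defaultdict(list)
    for table, path, columns in partitions:
        key = table if per_table else None
        groups[key].append((path, tuple(columns)))

    bad = []
    for members in groups.values():
        reference = members[0][1]  # columns of the first partition in the group
        bad.extend(path for path, cols in members[1:] if cols != reference)
    return bad

# Column lists abbreviated for the example.
parts = [
    ("orc_llap_part", "orc_llap_part/ctinyint=11", ["csmallint", "cint", "cbigint"]),
    ("orc_llap_dim_part", "orc_llap_dim_part/ctinyint=11", ["cbigint"]),
]
```

With `per_table=False` the dimension table's partition is reported as a mismatch; with `per_table=True` nothing is reported, since each table is consistent with itself.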
In Tez (and in MR as well), the other case can happen, namely partitions of the
same table having different schemas.
The Tez case can be solved by building a super-schema that includes all
variations and handling missing columns where necessary.
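A minimal sketch of the super-schema idea, in Python rather than Hive code: take the union of columns across the partitions of one table and fill columns a partition lacks with NULL. The function names are hypothetical, and rejecting conflicting types is an assumption of this sketch; how type changes should be handled is not decided here.

```python
def build_super_schema(partition_schemas):
    """Union of columns across partitions, in first-seen order.

    partition_schemas: list of schemas, each a list of (name, type) pairs.
    Assumption: a column that appears with two different types is an error.
    """
    super_schema = {}
    for schema in partition_schemas:
        for name, col_type in schema:
            existing = super_schema.setdefault(name, col_type)
            if existing != col_type:
                raise ValueError(
                    f"conflicting types for {name}: {existing} vs {col_type}")
    return list(super_schema.items())

def project_row(row, partition_schema, super_schema):
    """Map a row read from one partition onto the super-schema.
    Columns missing from the partition become None (NULL)."""
    values = dict(zip((name for name, _ in partition_schema), row))
    return [values.get(name) for name, _ in super_schema]

# Two partitions of the same table, the second with an extra column.
old = [("cint", "INT"), ("cbigint", "BIGINT")]
new = [("cint", "INT"), ("cbigint", "BIGINT"), ("cstring1", "STRING")]
schema = build_super_schema([old, new])
```

A row from the older partition, e.g. `project_row([1, 2], old, schema)`, comes back padded with a NULL for `cstring1`.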
The MR case may be harder to solve.
Of note: although the schemas are different (and one is not a prefix of the
other by coincidence or some such), the query passes if the validation is
commented out. Perhaps in some cases it can work?