[ https://issues.apache.org/jira/browse/HIVE-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190627#comment-14190627 ]
Jitendra Nath Pandey commented on HIVE-8498:
--------------------------------------------

It might be ok to re-use the vectorization context on the branches of tablescan:
1) Same rowbatch is passed to each branch as table scan is emitting same row-schema to each branch. The temporary columns are re-used across the branches. If same rowbatch can be made to work, there shouldn't be a reason to create a new context.
2) A branch may have an operator that changes the rowbatch and might need a different vectorization context. However, this case is already handled using VectorizationContextRegion.

This issue is being caused because we are not preserving the rowbatch to pass it to other children of table scan. The in-place filtering in the selected vector makes the rowbatch unusable for the other branches. I will post a patch with this fix shortly.

> Insert into table misses some rows when vectorization is enabled
> ----------------------------------------------------------------
>
>                 Key: HIVE-8498
>                 URL: https://issues.apache.org/jira/browse/HIVE-8498
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 0.14.0, 0.13.1
>            Reporter: Prasanth J
>            Assignee: Jitendra Nath Pandey
>            Priority: Critical
>              Labels: vectorization
>         Attachments: HIVE-8498.01.patch, HIVE-8498.02.patch
>
>
> Following is a small reproducible case for the issue
> create table orc1
> stored as orc
> tblproperties("orc.compress"="ZLIB")
> as
> select rn
> from
> (
> select cast(1 as int) as rn from src limit 1
> union all
> select cast(100 as int) as rn from src limit 1
> union all
> select cast(10000 as int) as rn from src limit 1
> ) t;
> create table orc_rn1 (rn int);
> create table orc_rn2 (rn int);
> create table orc_rn3 (rn int);
> // These inserts should produce 3 rows but only 1 row is produced
> from orc1 a
> insert overwrite table orc_rn1 select a.* where a.rn < 100
> insert overwrite table orc_rn2 select a.* where a.rn >= 100 and a.rn < 1000
> insert overwrite table orc_rn3 select a.* where a.rn >= 1000;
> select * from orc_rn1
> union all
> select * from orc_rn2
> union all
> select * from orc_rn3;
> The expected output of the query is
> 1
> 100
> 10000
> But with vectorization enabled we get
> 1

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
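
The failure mode described in the comment above (in-place filtering of the selected vector leaving the rowbatch unusable for the other table scan branches) can be illustrated with a small toy model. This is only a sketch under stated assumptions: ToyBatch, filter, runShared and runPreserving are hypothetical names invented for the illustration, not Hive classes or methods. The snapshot-and-restore approach shown is just one possible way to preserve a batch and is not claimed to be what the HIVE-8498 patches do. The data and predicates mirror the reproducible case (rn values 1, 100, 10000 and the three insert branches).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntPredicate;

// Toy model only: ToyBatch is a hypothetical stand-in, not Hive's row batch class.
class SelectedVectorDemo {

    static final class ToyBatch {
        final int[] col;        // single int column ("rn")
        final int[] selected;   // indices of the rows that are still live
        boolean selectedInUse;  // false means rows 0..size-1 are all live
        int size;               // number of live rows

        ToyBatch(int[] col) {
            this.col = col;
            this.selected = new int[col.length];
            this.size = col.length;
        }
    }

    // In-place filter: overwrites the batch's selected vector and size.
    static void filter(ToyBatch b, IntPredicate p) {
        int newSize = 0;
        for (int j = 0; j < b.size; j++) {
            int i = b.selectedInUse ? b.selected[j] : j;
            if (p.test(b.col[i])) {
                b.selected[newSize++] = i;
            }
        }
        b.selectedInUse = true;
        b.size = newSize;
    }

    // Collects the values of the rows that are currently live.
    static List<Integer> liveRows(ToyBatch b) {
        List<Integer> out = new ArrayList<>();
        for (int j = 0; j < b.size; j++) {
            out.add(b.col[b.selectedInUse ? b.selected[j] : j]);
        }
        return out;
    }

    // The three predicates from the multi-insert in the issue description.
    static List<IntPredicate> branches() {
        List<IntPredicate> list = new ArrayList<>();
        list.add(rn -> rn < 100);
        list.add(rn -> rn >= 100 && rn < 1000);
        list.add(rn -> rn >= 1000);
        return list;
    }

    // Problematic shape: each branch filters the same batch, one after another,
    // so later branches only see rows that survived the earlier filters.
    static List<List<Integer>> runShared(int[] values) {
        ToyBatch b = new ToyBatch(values);
        List<List<Integer>> results = new ArrayList<>();
        for (IntPredicate p : branches()) {
            filter(b, p);
            results.add(liveRows(b));
        }
        return results;
    }

    // Preserving shape (an assumption for illustration): snapshot the selection
    // state once and restore it after each branch has consumed the batch.
    static List<List<Integer>> runPreserving(int[] values) {
        ToyBatch b = new ToyBatch(values);
        int[] savedSelected = b.selected.clone();
        boolean savedInUse = b.selectedInUse;
        int savedSize = b.size;
        List<List<Integer>> results = new ArrayList<>();
        for (IntPredicate p : branches()) {
            filter(b, p);
            results.add(liveRows(b));
            System.arraycopy(savedSelected, 0, b.selected, 0, savedSelected.length);
            b.selectedInUse = savedInUse;
            b.size = savedSize;
        }
        return results;
    }

    public static void main(String[] args) {
        int[] rn = {1, 100, 10000};
        System.out.println("shared:     " + runShared(rn));     // [[1], [], []]
        System.out.println("preserving: " + runPreserving(rn)); // [[1], [100], [10000]]
    }
}
```

In the shared run only the first branch produces a row, which matches the single row "1" reported in the issue. In the preserving run each branch sees all three input rows and the three branches together produce 1, 100 and 10000.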