[jira] [Commented] (HIVE-8498) Insert into table misses some rows when vectorization is enabled

Jitendra Nath Pandey (JIRA) Thu, 30 Oct 2014 12:07:16 -0700

    [ https://issues.apache.org/jira/browse/HIVE-8498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14190627#comment-14190627 ]


Jitendra Nath Pandey commented on HIVE-8498:
--------------------------------------------

It might be OK to reuse the vectorization context across the branches of a
table scan:
1) The same row batch is passed to each branch, because the table scan emits
the same row schema to each branch. The temporary columns are reused across
the branches. If the same row batch can be made to work, there shouldn't be a
reason to create a new context.
2) A branch may have an operator that changes the row batch and might need a
different vectorization context. However, this case is already handled using
VectorizationContextRegion.

This issue occurs because we do not preserve the row batch before passing it
to the other children of the table scan. The in-place filtering in the
selected vector makes the row batch unusable for the other branches.
I will post a patch with this fix shortly.
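
To illustrate the failure mode described above, here is a minimal sketch in
plain Java. It is not the patch and does not use Hive's classes: Batch,
filterInPlace and rowsPerBranch are hypothetical stand-ins, and the
save/restore step is only an assumption about what "preserving the rowbatch"
could look like. Each branch filters the shared batch in place by narrowing
the selected vector, so later branches only see rows that earlier branches
kept unless the selection state is restored between branches.

```java
import java.util.Arrays;

class SharedBatchSketch {
    /** Simplified stand-in for a vectorized row batch (hypothetical, not Hive's class). */
    static final class Batch {
        final int[] col;        // single int column
        int[] selected;         // indices of rows that are still "alive"
        boolean selectedInUse;  // false means all rows 0..size-1 are alive
        int size;               // number of alive rows

        Batch(int[] col) {
            this.col = col;
            this.selected = new int[col.length];
            this.size = col.length;
        }
    }

    /** Keeps rows with lo <= value < hi by rewriting the selected vector in place. */
    static void filterInPlace(Batch b, int lo, int hi) {
        int newSize = 0;
        for (int j = 0; j < b.size; j++) {
            int i = b.selectedInUse ? b.selected[j] : j;
            if (b.col[i] >= lo && b.col[i] < hi) {
                b.selected[newSize++] = i;
            }
        }
        b.selectedInUse = true;
        b.size = newSize;
    }

    /** Runs every branch filter on one shared batch; returns rows seen per branch. */
    static int[] rowsPerBranch(int[] data, int[][] ranges, boolean preserve) {
        Batch b = new Batch(data);
        int[] counts = new int[ranges.length];
        for (int k = 0; k < ranges.length; k++) {
            // Snapshot the selection state before the branch mutates it.
            int savedSize = b.size;
            boolean savedInUse = b.selectedInUse;
            int[] savedSelected = b.selected.clone();

            filterInPlace(b, ranges[k][0], ranges[k][1]);
            counts[k] = b.size;

            if (preserve) {
                // Assumed fix: restore the batch for the next sibling branch.
                b.size = savedSize;
                b.selectedInUse = savedInUse;
                b.selected = savedSelected;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] data = {1, 100, 10000};
        // Mirrors the three predicates in the report: rn < 100,
        // 100 <= rn < 1000, rn >= 1000 (upper bound approximated).
        int[][] ranges = {
            {Integer.MIN_VALUE, 100}, {100, 1000}, {1000, Integer.MAX_VALUE}
        };
        System.out.println(Arrays.toString(rowsPerBranch(data, ranges, false))); // [1, 0, 0]
        System.out.println(Arrays.toString(rowsPerBranch(data, ranges, true)));  // [1, 1, 1]
    }
}
```

Without the restore step only the first branch sees a row, which is
consistent with the single row in the report. With it, each branch sees its
own matching row.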

> Insert into table misses some rows when vectorization is enabled
> ----------------------------------------------------------------
>
>                 Key: HIVE-8498
>                 URL: https://issues.apache.org/jira/browse/HIVE-8498
>             Project: Hive
>          Issue Type: Bug
>          Components: Vectorization
>    Affects Versions: 0.14.0, 0.13.1
>            Reporter: Prasanth J
>            Assignee: Jitendra Nath Pandey
>            Priority: Critical
>              Labels: vectorization
>         Attachments: HIVE-8498.01.patch, HIVE-8498.02.patch
>
>
>  Following is a small reproducible case for the issue
> create table orc1
>   stored as orc
>   tblproperties("orc.compress"="ZLIB")
>   as
>     select rn
>     from
>     (
>       select cast(1 as int) as rn from src limit 1
>       union all
>       select cast(100 as int) as rn from src limit 1
>       union all
>       select cast(10000 as int) as rn from src limit 1
>     ) t;
> create table orc_rn1 (rn int);
> create table orc_rn2 (rn int);
> create table orc_rn3 (rn int);
> // These inserts should produce 3 rows but only 1 row is produced
> from orc1 a
> insert overwrite table orc_rn1 select a.* where a.rn < 100
> insert overwrite table orc_rn2 select a.* where a.rn >= 100 and a.rn < 1000
> insert overwrite table orc_rn3 select a.* where a.rn >= 1000;
> select * from orc_rn1
> union all
> select * from orc_rn2
> union all
> select * from orc_rn3;
> The expected output of the query is
> 1
> 100
> 10000
> But with vectorization enabled we get
> 1



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
