[jira] [Commented] (HIVE-23230) "get_splits" udf ignores limit constraint while creating splits

Shubham Chaurasia (Jira) Thu, 23 Apr 2020 22:13:43 -0700


    [ https://issues.apache.org/jira/browse/HIVE-23230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091203#comment-17091203 ]


Shubham Chaurasia commented on HIVE-23230:
------------------------------------------

[~adeshrao] 

HIVE-23230.2.patch looks good to me for fixing the limit issue. However, these test
failures seem related, since all of them use get_splits(). I cannot access the test
report links above. Could you please check these locally and reattach the
same patch?


cc [~sankarh]

> "get_splits" udf ignores limit constraint while creating splits
> ---------------------------------------------------------------
>
>                 Key: HIVE-23230
>                 URL: https://issues.apache.org/jira/browse/HIVE-23230
>             Project: Hive
>          Issue Type: Bug
>          Components: HiveServer2
>    Affects Versions: 3.1.0
>            Reporter: Adesh Kumar Rao
>            Assignee: Adesh Kumar Rao
>            Priority: Major
>         Attachments: HIVE-23230.1.patch, HIVE-23230.2.patch, HIVE-23230.patch
>
>
> Issue: Running the query {noformat}select * from <table> limit n{noformat}
> from Spark via the Hive Warehouse Connector may return more than "n" rows.
> This happens because the "get_splits" UDF creates splits without taking the limit
> constraint into account. When these splits are submitted to multiple LLAP daemons,
> each one returns up to "n" rows, so the combined result can exceed "n".
> How to reproduce: this needs spark-shell, hive-warehouse-connector, and Hive on
> LLAP with more than one LLAP daemon running.
> Run the commands below via beeline to create and populate the table:
>  
> {noformat}
> create table test (id int);
> insert into table test values (1);
> insert into table test values (2);
> insert into table test values (3);
> insert into table test values (4);
> insert into table test values (5);
> insert into table test values (6);
> insert into table test values (7);
> delete from test where id = 7;{noformat}
> Now, running the query below via spark-shell
> {noformat}
> import com.hortonworks.hwc.HiveWarehouseSession 
> val hive = HiveWarehouseSession.session(spark).build() 
> hive.executeQuery("select * from test limit 1").show()
> {noformat}
> returns more than 1 row.
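The mechanism described in the issue can be illustrated with a toy model in plain Scala. This is a hypothetical sketch, not Hive or HWC code: the names `SplitLimitSketch`, `perSplitLimit` and `globalLimit` are made up for illustration, a "split" is modeled simply as the list of rows one LLAP daemon would scan, and the three-split layout of the `test` table is an invented example. It contrasts applying the limit inside each split (the reported symptom) with enforcing it once over the combined result. The latter is only one way to honor the constraint and is not necessarily what HIVE-23230.2.patch does.

```scala
// Hypothetical model of the HIVE-23230 symptom (not Hive/HWC code).
object SplitLimitSketch {
  // Assumption: a split is just the rows a single LLAP daemon would read.
  type Split = Seq[Int]

  // Reported behavior: LIMIT n is applied only within each split, so the
  // client can receive up to n rows per split.
  def perSplitLimit(splits: Seq[Split], n: Int): Seq[Int] =
    splits.flatMap(_.take(n))

  // Global limit: stream rows from all splits and stop after n in total.
  def globalLimit(splits: Seq[Split], n: Int): Seq[Int] =
    splits.iterator.flatMap(_.iterator).take(n).toList

  def main(args: Array[String]): Unit = {
    // Invented layout: rows 1..6 of the `test` table spread over three splits.
    val splits: Seq[Split] = Seq(Seq(1, 2), Seq(3, 4), Seq(5, 6))
    println(perSplitLimit(splits, 1)) // 3 rows for "limit 1"
    println(globalLimit(splits, 1))   // 1 row for "limit 1"
  }
}
```

With one row per split surviving the local limit, three splits yield three rows for `limit 1`, which matches the "more than 1 row" result in the reproduction above.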



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
