[ 
https://issues.apache.org/jira/browse/HIVE-27480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-27480:
----------------------------------
    Labels: pull-request-available  (was: )

> OFFSET without ORDER BY generates wrong results
> -----------------------------------------------
>
>                 Key: HIVE-27480
>                 URL: https://issues.apache.org/jira/browse/HIVE-27480
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 2.4.0, 3.1.3, 4.0.0-alpha-2
>            Reporter: okumin
>            Assignee: okumin
>            Priority: Major
>              Labels: pull-request-available
>
> Without ORDER BY, Hive doesn't add an additional single reducer, and it 
> pushes OFFSET and LIMIT to the original vertex. It can generate incorrect 
> results because OFFSET should be counted globally unlike LIMIT.
> We would make the following changes.
> - To fix the incorrect behavior
> - To add a new `hive.strict.checks.*` to prevent this usage. Mostly, OFFSET 
> without ORDER BY is meaningless
> We can reproduce the issue in the following steps.
> *Prepare test data*
> The following SQLs generate a test table with multiple files.
> {code:sql}
> $ beeline -e "
>   drop table test;
>   create table test(id int);
>   insert into test values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10);
>   insert into test values (11), (12), (13), (14), (15), (16), (17), (18), 
> (19), (20);
>   insert into test values (21), (22), (23), (24), (25), (26), (27), (28), 
> (29), (30);
> "
> {code}
> *Reproduce the issue*
> No rows are returned with `limit 10 offset 10`.
> {code:sql}
> $ beeline -e "select * from test limit 10, 10" --hiveconf 
> hive.fetch.task.conversion=none --hiveconf tez.grouping.min-size=1 --hiveconf 
> tez.grouping.max-size=1
> ...
> +----------+
> | test.id  |
> +----------+
> +----------+
> No rows selected (13.595 seconds)
> {code}
> *Expected behavior*
> It should return any 10 rows like this.
> {code:sql}
> $ beeline -e "select * from test limit 10, 10"
> ...
> +----------+
> | test.id  |
> +----------+
> | 11       |
> | 12       |
> | 13       |
> | 14       |
> | 15       |
> | 16       |
> | 17       |
> | 18       |
> | 19       |
> | 20       |
> +----------+
> 10 rows selected (0.175 seconds)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

Reply via email to