[ https://issues.apache.org/jira/browse/HIVE-27480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
ASF GitHub Bot updated HIVE-27480: ---------------------------------- Labels: pull-request-available (was: ) > OFFSET without ORDER BY generates wrong results > ----------------------------------------------- > > Key: HIVE-27480 > URL: https://issues.apache.org/jira/browse/HIVE-27480 > Project: Hive > Issue Type: Bug > Components: Query Planning > Affects Versions: 2.4.0, 3.1.3, 4.0.0-alpha-2 > Reporter: okumin > Assignee: okumin > Priority: Major > Labels: pull-request-available > > Without ORDER BY, Hive doesn't add an additional single reducer, and it > pushes OFFSET and LIMIT to the original vertex. It can generate incorrect > results because OFFSET should be counted globally unlike LIMIT. > We would make the following changes. > - To fix the incorrect behavior > - To add a new `hive.strict.checks.*` to prevent this usage. Mostly, OFFSET > without ORDER BY is meaningless > We can reproduce the issue in the following steps. > *Prepare test data* > The following SQLs generate a test table with multiple files. > {code:sql} > $ beeline -e " > drop table test; > create table test(id int); > insert into test values (1), (2), (3), (4), (5), (6), (7), (8), (9), (10); > insert into test values (11), (12), (13), (14), (15), (16), (17), (18), > (19), (20); > insert into test values (21), (22), (23), (24), (25), (26), (27), (28), > (29), (30); > " > {code} > *Reproduce the issue* > No rows are returned with `limit 10 offset 10`. > {code:sql} > $ beeline -e "select * from test limit 10, 10" --hiveconf > hive.fetch.task.conversion=none --hiveconf tez.grouping.min-size=1 --hiveconf > tez.grouping.max-size=1 > ... > +----------+ > | test.id | > +----------+ > +----------+ > No rows selected (13.595 seconds) > {code} > *Expected behavior* > It should return any 10 rows like this. > {code:sql} > $ beeline -e "select * from test limit 10, 10" > ... > +----------+ > | test.id | > +----------+ > | 11 | > | 12 | > | 13 | > | 14 | > | 15 | > | 16 | > | 17 | > | 18 | > | 19 | > | 20 | > +----------+ > 10 rows selected (0.175 seconds) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)