[ https://issues.apache.org/jira/browse/HIVE-25856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Stamatis Zampetakis resolved HIVE-25856. ---------------------------------------- Fix Version/s: 4.0.0 Resolution: Fixed Fixed in [https://github.com/apache/hive/commit/587c698fa25ca6da46d9c02e4199689426fec40f.] Thanks for the review [~kkasa] ! > Intermittent null ordering in plans of queries with GROUP BY and LIMIT > ---------------------------------------------------------------------- > > Key: HIVE-25856 > URL: https://issues.apache.org/jira/browse/HIVE-25856 > Project: Hive > Issue Type: Bug > Components: CBO > Reporter: Stamatis Zampetakis > Assignee: Stamatis Zampetakis > Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > Time Spent: 50m > Remaining Estimate: 0h > > {code:sql} > CREATE TABLE person (id INTEGER, country STRING); > EXPLAIN CBO SELECT country, count(1) FROM person GROUP BY country LIMIT 5; > {code} > The {{EXPLAIN}} query produces a slightly different plan (ordering of nulls) > from one execution to another. > {noformat} > CBO PLAN: > HiveSortLimit(sort0=[$1], dir0=[ASC-nulls-first], fetch=[5]) > HiveProject(country=[$0], $f1=[$1]) > HiveAggregate(group=[{1}], agg#0=[count()]) > HiveTableScan(table=[[default, person]], table:alias=[person]) > {noformat} > {noformat} > CBO PLAN: > HiveSortLimit(sort0=[$1], dir0=[ASC], fetch=[5]) > HiveProject(country=[$0], $f1=[$1]) > HiveAggregate(group=[{1}], agg#0=[count()]) > HiveTableScan(table=[[default, person]], table:alias=[person]) > {noformat} > This is unlikely to cause wrong results cause most aggregate functions (not > all) do not return nulls thus null ordering doesn't matter much but it can > lead to other problems such as: > * intermittent CI failures > * query/plan caching > I bumped into this problem after investigating test failures in CI. The > following query in > [offset_limit_ppd_optimizer.q|https://github.com/apache/hive/blob/9cfdac44975bf38193de7449fc21b9536109daea/ql/src/test/queries/clientpositive/offset_limit_ppd_optimizer.q] > returns different plan when it runs individually and when it runs along with > some other qtest files. > {code:sql} > explain > select * from > (select key, count(1) from src group by key order by key limit 10,20) subq > join > (select key, count(1) from src group by key limit 20,20) subq2 > on subq.key=subq2.key limit 3,5; > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001)