[ https://issues.apache.org/jira/browse/IGNITE-23968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Konstantin Orlov updated IGNITE-23968:
--------------------------------------
    Fix Version/s: 3.1

> Sql. Improve row count estimation for joins
> -------------------------------------------
>
>                 Key: IGNITE-23968
>                 URL: https://issues.apache.org/jira/browse/IGNITE-23968
>             Project: Ignite
>          Issue Type: Improvement
>          Components: sql
>            Reporter: Konstantin Orlov
>            Assignee: Konstantin Orlov
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.1
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The current row count estimation significantly underestimates the result
> set of joins, which causes the optimizer to pick suboptimal plans in
> certain cases.
> For example, consider the query below:
> {code:sql}
> create table CATALOG_RETURNS
> (
>     <...>
>     constraint CATALOG_RETURNS_PK
>         primary key (CR_ITEM_SK, CR_ORDER_NUMBER)
> );
> create table CATALOG_SALES
> (
>     <...>
>     constraint CATALOG_SALES_PK
>         primary key (CS_ITEM_SK, CS_ORDER_NUMBER)
> );
> explain plan for
> select *
>   from catalog_sales
>       ,catalog_returns
>  where cs_item_sk = cr_item_sk
>    and cs_order_number = cr_order_number;
> --------------
> HashJoin(...): rowcount = 22500.0
>   TableScan(table=[[PUBLIC, CATALOG_RETURNS]]): rowcount = 1000000.0
>   Exchange(...): <...>
>     TableScan(table=[[PUBLIC, CATALOG_SALES]]): rowcount = 1000000.0
> {code}
> When joining two tables of 1 million rows each by primary key, the
> estimated result set size is only 22.5k rows. Things get even worse when
> there are several joins with dimension tables: after a few joins, the
> estimated result set is close to 1 (a single row).
> Since we neither support foreign keys nor have proper statistics yet, we
> need to introduce heuristics to improve row count estimation for joins.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
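
To make the arithmetic in the description concrete, here is a minimal sketch of how an estimate of 22,500 rows can arise and of one possible key-based heuristic. It is not the change made for this issue. Everything in it is an assumption for illustration: the class and method names are hypothetical, the 0.15 per-equality selectivity is assumed (it happens to reproduce the reported figure, since 1,000,000 * 0.15 * 0.15 is about 22,500), and the "join condition covers the primary key" flags stand in for whatever key metadata the planner actually has.

```java
// Hypothetical sketch; not Ignite's or Calcite's actual estimator.
final class JoinRowCountSketch {
    // Assumed default selectivity applied per equality predicate.
    static final double EQUALITY_SELECTIVITY = 0.15;

    // Naive estimate: scale the larger input by one selectivity factor
    // per equality predicate in the join condition.
    static double naiveEstimate(double leftRows, double rightRows, int equalityPredicates) {
        double selectivity = Math.pow(EQUALITY_SELECTIVITY, equalityPredicates);
        return Math.max(leftRows, rightRows) * selectivity;
    }

    // Key-aware heuristic for an inner equi-join: if the join columns cover
    // the primary key of one side, each row of the other side matches at most
    // one row, so the other side's row count is an upper bound on the result
    // and is used here as the estimate.
    static double keyAwareEstimate(double leftRows, double rightRows, int equalityPredicates,
                                   boolean coversLeftKey, boolean coversRightKey) {
        if (coversLeftKey && coversRightKey) {
            return Math.min(leftRows, rightRows);
        }
        if (coversLeftKey) {
            return rightRows;
        }
        if (coversRightKey) {
            return leftRows;
        }
        // No key information: fall back to the naive estimate.
        return naiveEstimate(leftRows, rightRows, equalityPredicates);
    }

    public static void main(String[] args) {
        // The example from the description: two 1M-row tables joined on
        // both primary-key columns.
        System.out.println("naive:     " + naiveEstimate(1_000_000, 1_000_000, 2));
        System.out.println("key-aware: " + keyAwareEstimate(1_000_000, 1_000_000, 2, true, true));
    }
}
```

For the example query, the naive path yields roughly 22,500 rows, while the key-aware path yields 1,000,000. The key-aware value is an upper bound rather than a true cardinality, but it avoids the compounding underestimation across several joins that the description mentions.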