[ 
https://issues.apache.org/jira/browse/IGNITE-23968?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Konstantin Orlov updated IGNITE-23968:
--------------------------------------
    Fix Version/s: 3.1

> Sql. Improve row count estimation for joins
> -------------------------------------------
>
>                 Key: IGNITE-23968
>                 URL: https://issues.apache.org/jira/browse/IGNITE-23968
>             Project: Ignite
>          Issue Type: Improvement
>          Components: sql
>            Reporter: Konstantin Orlov
>            Assignee: Konstantin Orlov
>            Priority: Major
>              Labels: ignite-3
>             Fix For: 3.1
>
>          Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> The current row count estimation significantly underestimates the result set of 
> joins, which causes the optimizer to pick suboptimal plans in certain cases.
> For example, consider the query below:
> {code:sql}
> create table CATALOG_RETURNS
> (
>     <...>
>     constraint CATALOG_RETURNS_PK
>         primary key (CR_ITEM_SK, CR_ORDER_NUMBER)
> );
> create table CATALOG_SALES
> (
>     <...>
>     constraint CATALOG_SALES_PK
>         primary key (CS_ITEM_SK, CS_ORDER_NUMBER)
> );
> explain plan for
> select *
>   from catalog_sales
>       ,catalog_returns
>   where cs_item_sk = cr_item_sk
>     and cs_order_number = cr_order_number;
> --------------
> HashJoin(...): rowcount = 22500.0
>   TableScan(table=[[PUBLIC, CATALOG_RETURNS]]): rowcount = 1000000.0
>   Exchange(...): <...>
>     TableScan(table=[[PUBLIC, CATALOG_SALES]]): rowcount = 1000000.0
> {code}
> When joining two tables with 1 million rows each by primary key, the estimated result 
> set size is only 22.5k rows. Things get even worse when there are several 
> joins with dimension tables: after a few joins, the estimated result set is close 
> to 1 (a single row). 
> Given that we don't support foreign keys and don't have proper 
> statistics yet, we need to introduce heuristics to improve row count 
> estimation for joins.
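
The numbers in the plan can be reproduced with a little arithmetic. The sketch below is illustrative only and is not taken from the Ignite code base. It assumes the estimate comes from scaling one input by a fixed selectivity per equality predicate; 0.15 is the default guess Apache Calcite applies to an equality predicate when no statistics are available, and 1,000,000 x 0.15 x 0.15 matches the 22500.0 shown above. Whether the planner derives the figure exactly this way is an assumption. The class, the method names, and the primary-key heuristic are hypothetical, and the ticket does not say which heuristics will actually be introduced.

```java
final class JoinRowCountSketch {
    // Assumed selectivity of a single equality predicate (illustrative value).
    static final double EQUALITY_SELECTIVITY = 0.15;

    /** Naive estimate: scale one input by the selectivity of every equality predicate. */
    static double naiveEstimate(double inputRows, int equalityPredicates) {
        return inputRows * Math.pow(EQUALITY_SELECTIVITY, equalityPredicates);
    }

    /**
     * Hypothetical heuristic: if the join keys cover the primary key of one side,
     * every row of the other side matches at most one row, so the other side's
     * row count is a reasonable estimate. Otherwise fall back to the naive estimate.
     */
    static double pkAwareEstimate(double leftRows, double rightRows,
                                  boolean keysCoverLeftPk, boolean keysCoverRightPk,
                                  int equalityPredicates) {
        if (keysCoverLeftPk && keysCoverRightPk) {
            return Math.min(leftRows, rightRows);
        }
        if (keysCoverLeftPk) {
            return rightRows;
        }
        if (keysCoverRightPk) {
            return leftRows;
        }
        return naiveEstimate(Math.max(leftRows, rightRows), equalityPredicates);
    }

    public static void main(String[] args) {
        // Two equality predicates over 1,000,000 rows, as in the query above.
        System.out.println(naiveEstimate(1_000_000, 2));

        // Chained joins: the naive estimate shrinks with every additional join.
        double rows = 1_000_000;
        for (int join = 1; join <= 4; join++) {
            rows = naiveEstimate(rows, 2);
            System.out.println("after join " + join + ": " + rows);
        }

        // Both sides are joined on their full primary key.
        System.out.println(pkAwareEstimate(1_000_000, 1_000_000, true, true, 2));
    }
}
```

With the primary-key heuristic, the join of CATALOG_SALES and CATALOG_RETURNS would be estimated at the size of the smaller input instead of a small fraction of it, so the estimate no longer collapses toward a single row after a few joins.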



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
