dir_current changes often, but is analyzed after every significant change, so in practice it is analyzed roughly once an hour. The approximate ratio of rows with volume_id=5 to the total number of rows does not change (i.e. volume_id=5 appears in roughly 1.5M-2M rows, out of around 750-800M rows total). dir_process is created once, analyzed, and does not change afterwards.
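Since the plan choice here hinges on statistics freshness, one way to verify when each table was last analyzed is to query the standard statistics view (a diagnostic sketch; the table names are taken from the discussion above, and it must be run in the database that holds them):

```sql
-- Show when planner statistics were last refreshed, and how many rows
-- have been modified since the last analyze (this counter is what
-- triggers autoanalyze).
SELECT relname, last_analyze, last_autoanalyze, n_mod_since_analyze
FROM pg_stat_user_tables
WHERE relname IN ('dir_current', 'dir_process');
```

If the volume_id distribution turns out to be skewed, raising the per-column statistics target (e.g. `ALTER TABLE dir_current ALTER COLUMN volume_id SET STATISTICS 1000;` followed by `ANALYZE dir_current;`) may give the planner a larger MCV list and better selectivity estimates.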
Assuming dir_process is the outer side: in the plans shown here it contains only duplicates, i.e. all of its rows have volume_id=5 in this example.

Do you think there is anything that could be changed in the query itself? Any hints would be appreciated.

On Wed, Mar 17, 2021 at 20:47 Tom Lane <t...@sss.pgh.pa.us> wrote:
> Marcin Gozdalik <goz...@gmail.com> writes:
> > Sometimes Postgres will choose a very inefficient plan, which involves
> > looping many times over the same rows, producing hundreds of millions
> > or billions of rows:
>
> Yeah, this can happen if the outer side of the join has a lot of
> duplicate rows. The query planner is aware of that effect and will
> charge an increased cost when it applies, so I wonder if your
> statistics for the tables being joined are up-to-date.
>
> regards, tom lane

--
Marcin Gozdalik