On Fri, Aug 14, 2020 at 03:40:40PM -0700, Ken Tanzer wrote:
> On Fri, Aug 14, 2020 at 3:04 PM Justin Pryzby <pry...@telsasoft.com> wrote:
> > Due to the over-estimated rowcount, the planner believes that (more) rows
> > will be output (sooner) than they actually are:
> >
> >    ->  Nested Loop Semi Join  (cost=47.11..31367302.81 ROWS=611586
> > width=336) (actual time=47.098..97236.123 ROWS=25 loops=1)
> >
> > So it thinks there's something to be saved/gained by using a plan that has a
> > low startup cost.  But instead, it ends up running for a substantial 
> > fraction
> > of the total (estimated) cost.
>
> Got it.  Is there any way to address this other than re-writing the
> query?  (Statistics? Or something else?)

A usual trick is to change to write something like:
|ORDER BY added_at + '0 seconds'::interval"
which means an index scan on added_at doesn't match the ORDER BY exactly (the
planner isn't smart enough to know better).

You could try to address the misestimate, which is probably a good idea anyway.

Or make it believe that it's going to work harder to return those 25 rows.
Maybe you could change the "25" to a bind parameter, like LIMIT $N, or however
the ORM wants it.

You could change the index to a BRIN index, which doesn't help with ORDER BY
(but will also affect other queries which want to ORDER BY).

Maybe you could add an index on this expression and ANALYZE the table.  I think
it might help the estimate.  Or it might totally change the shape of the plan,
like allowing indexonly scan (which would probably require VACUUM)..
       Group Key: CASE WHEN (tbl_reference.to_table = 'client'::name) THEN 
tbl_reference.from_id WHEN (tbl_reference.from_table = 'client'::name) THEN 
tbl_reference.to_id ELSE NULL::integer END

I know you didn't want to rewrite the query, but it looks to me like adjusting
the schema or query might be desirable.

agency=> EXPLAIN (ANALYZE,VERBOSE,BUFFERS,TIMING) SELECT * FROM Log WHERE 
log_id IN (SELECT CASE WHEN to_table='client' THEN from_id WHEN 
from_table='client' THEN to_id END FROM reference WHERE ((from_id_field = 
E'client_id'
        AND from_id =  E'34918'
        AND from_table =  E'client'
        AND to_table =  E'log'
        )
        OR  (to_id_field =  E'client_id'
        AND to_id =  E'34918'
        AND to_table =  E'client'
        AND from_table =  E'log'
        ))) ORDER BY added_at DESC;

To me that smells like a UNION (maybe not UNION ALL):
SELECT FROM log WHERE EXISTS (SELECT 1 FROM reference ref WHERE 
log.log_id=ref.from_id AND to_table='client' AND from_id_field='client_id' AND 
from_id=$1 AND from_table='client' AND to_table='log')
UNION
SELECT FROM log WHERE EXISTS (SELECT 1 FROM reference ref WHERE 
log.log_id=ref.to_id AND from_table='client' AND to_id_field='client_id' AND 
to_id=$1 AND to_table='client' AND from_table='log')

I guess you might know that various indexes are redundant:

    "index_tbl_log_log_type_code" btree (log_type_code)
    "tbl_log_log_type_code" btree (log_type_code)
    "tbl_log_test2" btree (log_type_code, added_at)

    "tbl_log_added_at" btree (added_at)
    "tbl_log_test" btree (added_at, log_type_code)

    "index_tbl_reference_to_table" btree (to_table)
    "index_tbl_reference_to_table_id" btree (to_table, to_id)

    "index_tbl_reference_is_deleted" btree (is_deleted)
=> Maybe that would be better as a WHERE NOT is_deleted clause on various 
indexes (?)

-- 
Justin


Reply via email to