On Friday, June 23rd, 2023 at 2:52 PM, Laurenz Albe <laurenz.a...@cybertec.at> wrote:

> On Fri, 2023-06-23 at 12:08 +0000, Nicolas Seinlet wrote:
> > we faced an issue with a SELECT query on a relatively large table in our
> > database. The query involves a single table with more than 10 million
> > records. It is mainly composed of varchar fields, has a primary key (id) of
> > type serial, and when records of this table are shown to users, they are
> > sorted using 2 fields: display_name (varchar) and id (the primary key).
> > Because this table is heavily used in various contexts in our application,
> > we have multiple indexes on it. Among other indexes, we have GIN indexes on
> > some fields of the table.
> >
> > The btree index res_partner_displayname_id_idx has been added recently and
> > perfectly matches a criterion (where active) and the sorting (display_name, id)
> > that we have in almost all our queries on this table.
> >
> > The query that causes the issue is this one:
> >
> > SELECT "res_partner"."id"
> > FROM "res_partner"
> > WHERE (("res_partner"."active" = true) AND
> >        ((((
> >          (unaccent("res_partner"."display_name"::text) ilike unaccent('%nse%'))
> >          OR (unaccent("res_partner"."email"::text) ilike unaccent('%nse%')))
> >          OR (unaccent("res_partner"."ref"::text) ilike unaccent('%nse%')))
> >          OR (unaccent("res_partner"."vat"::text) ilike unaccent('%nse%')))
> >          OR (unaccent("res_partner"."company_registry"::text) ilike unaccent('%nse%'))))
> >   AND ((("res_partner"."type" != 'private') OR "res_partner"."type" IS NULL)
> >        OR "res_partner"."type" IS NULL)
> > ORDER BY "res_partner"."display_name", "res_partner"."id"
> > LIMIT 100
> >
> > We have the common criterion (active = true), the common sorting, a limit,
> > and a search on various fields. The fields we're searching with criteria
> > like '%whatever%' are GIN indexed.
> >
> > Here is the query plan:
> >
> > Limit  (cost=0.56..10703.36 rows=100 width=25) (actual time=56383.794..86509.036 rows=1 loops=1)
> >   Output: id, display_name
> >   Buffers: shared hit=4322296 read=1608998 dirtied=1 written=1247
> >   ->  Index Scan using res_partner_displayname_id_idx on public.res_partner  (cost=0.56..1200212.37 rows=11214 width=25) (actual time=56383.793..86509.022 rows=1 loops=1)
> >         Output: id, display_name
> >         Filter: ((((res_partner.type)::text <> 'private'::text) OR (res_partner.type IS NULL) OR (res_partner.type IS NULL)) AND ((unaccent((res_partner.display_name)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.email)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.ref)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.vat)::text) ~~* '%nse%'::text) OR (unaccent((res_partner.company_registry)::text) ~~* '%nse%'::text)))
> >         Rows Removed by Filter: 6226870
> >         Buffers: shared hit=4322296 read=1608998 dirtied=1 written=1247
> > Planning Time: 0.891 ms
> > Execution Time: 86509.070 ms
> > (10 rows)
> >
> > It's not using our GIN indexes at all, but the btree one.
>
> The problem is that PostgreSQL estimates that the index scan will return 11214
> rows, when it actually returns only one. That makes the plan that scans the
> table using an index matching the ORDER BY clause look appealing: we might
> find 100 rows quickly and avoid a sort.
>
> You can try to improve the estimates with more detailed statistics,
> but if that doesn't do the job, you can modify the ORDER BY clause so
> that it cannot use the bad index:
>
> ORDER BY res_partner.display_name, res_partner.id + 0
>
> Yours,
> Laurenz Albe

Hello,

First of all, thanks, this solves the issue for the given query.
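
For the archives, a minimal sketch of the change (assuming the query keeps the
same shape as above; only the ORDER BY differs, so the sort can no longer be
satisfied by res_partner_displayname_id_idx):

SELECT "res_partner"."id"
FROM "res_partner"
WHERE ...  -- same WHERE clause as in the original query
ORDER BY "res_partner"."display_name", "res_partner"."id" + 0
LIMIT 100

As for improving the estimates with more detailed statistics, I assume that
means something along the lines of raising the per-column statistics target
and re-analyzing, e.g. (column and target chosen arbitrarily here):

ALTER TABLE res_partner ALTER COLUMN display_name SET STATISTICS 1000;
ANALYZE res_partner;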

Some more questions, then:

> we might find 100 rows quickly

The cost estimate for 11214 rows is 1200212.37.
If I look at the other plan, none of the estimated costs reach such levels
(~2k for the indexes + 1k for the BitmapOr, 3k for the Bitmap Heap Scan, and
finally 1k for the sort and limit), roughly 7k in total.

And that's the part I don't understand: how is the first cost estimated?
Even if we divide the cost by ~110 (to go from the estimated 11214 rows down
to 100), it's still ~10k, more than the other plan.
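
For what it's worth, a back-of-the-envelope check (my reading of the numbers,
not an authoritative answer): the Limit node's cost looks like the Index
Scan's startup cost plus its remaining cost, scaled by the fraction of rows
the planner thinks it needs, i.e. 100 out of the estimated 11214:

SELECT 0.56 + (1200212.37 - 0.56) * 100 / 11214;
-- ≈ 10703.36, matching the cost shown on the Limit node

If that reading is right, the scaling explains where the ~10703 figure comes
from, even though it stays above the other plan's ~7k.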

Thanks again,

Nicolas.
