On Mon, Dec 19, 2016 at 12:05 PM, Christophe Escobar < christophe.e...@gmail.com> wrote:
> Hi, > > I am using PostgreSQL 9.4.8 on x86_64-unknown-linux-gnu, compiled by gcc > (Debian 4.9.2-10) 4.9.2, 64-bit. > > I have a notification table with about ~45 000 000 rows. > > I have some performance issues when trying to fetch rows from the table > with a specific query, I suspect the planner to choose the wrong index > because of the limit. > The query look like this: SELECT * FROM notifications WHERE bucket_id IN > (30231,30230,30104) AND type IN ('foo', 'bar') ORDER BY created_at DESC > LIMIT 20; > It returns 0 rows for this example. > > The table looks like this with the indices: > > Table "notifications" > Column | Type | > Modifiers > ----------------------------+-----------------------------+- > ----------------------------------------------------------- > id | integer | not null > default nextval('notifications_id_seq'::regclass) > account_id | integer | > version_id | integer | > item_id | integer | > type | character varying(255) | > created_at | timestamp without time zone | > updated_at | timestamp without time zone | > meta_data | text | > bucket_id | integer | > Indexes: > "notifications_pkey" PRIMARY KEY, btree (id) > "index_notifications_on_account_id" btree (account_id) > "index_notifications_on_bucket_id" btree (bucket_id) > "index_notifications_on_item_id" btree (item_id) > "index_notifications_on_created_at_and_bucket_id" btree (created_at, > bucket_id) > "index_notifications_on_type_and_bucket_id" btree (type, bucket_id) > "index_notifications_on_version_id" btree (version_id) > > Before testing, I have run a VACUUM ANALYZE, and here are the statistics > on the indices: > > SELECT relname, relkind, reltuples, relpages > FROM pg_class > WHERE relname LIKE '%notifications%'; > relname | relkind | reltuples > | relpages > ---------------------------------------------------+-------- > -+-------------+---------- > notifications_pkey | i | 4.55221e+07 > | 124820 > index_notifications_on_account_id | i | > 4.55221e+07 | 124820 > index_notifications_on_bucket_id | i | > 4.55221e+07 | 124819 > index_notifications_on_item_id | i | > 4.55221e+07 | 124821 > index_notifications_on_version_id | i | > 4.55221e+07 | 124821 > index_notifications_on_created_at_and_bucket_id | i | > 4.55221e+07 | 175281 > index_notifications_on_type_and_bucket_id | i | > 4.55221e+07 | 188281 > notifications_id_seq | S | 1 > | 1 > notifications | r | 4.55225e+07 > | 566412 > > > I tried three different EXPLAIN ANALYZE, on a subset of my table (with the > entire table, I have yet to see what is the total duration of the query > when using LIMIT 20, but it takes more than 5 minutes which is not > acceptable for my use case). > > ** Without limit ** > > EXPLAIN ANALYZE SELECT * FROM notifications WHERE bucket_id IN > (30231,30230,30104) AND type IN ('foo', 'bar') ORDER BY created_at DESC; > > ------------------------------------------------------------ > ------------------------------------------------------------ > ------------------------------------ > Sort (cost=7474.92..7480.08 rows=2061 width=187) (actual > time=0.149..0.149 rows=0 loops=1) > Sort Key: created_at > Sort Method: quicksort Memory: 25kB > -> Bitmap Heap Scan on notifications (cost=71.68..7361.47 rows=2061 > width=187) (actual time=0.136..0.136 rows=0 loops=1) > Recheck Cond: (((type)::text = ANY ('{foo,bar}'::text[])) AND > (bucket_id = ANY ('{30231,30230,30104}'::integer[]))) > -> Bitmap Index Scan on index_notifications_on_type_and_bucket_id > (cost=0.00..71.16 rows=2061 width=0) (actual time=0.135..0.135 rows=0 > loops=1) > Index Cond: (((type)::text = ANY ('{foo,bar}'::text[])) AND > (bucket_id = ANY ('{30231,30230,30104}'::integer[]))) > Planning time: 0.715 ms > Execution time: 0.198 ms > > ** With LIMIT 20 ** > > EXPLAIN ANALYZE SELECT * FROM notifications WHERE bucket_id IN > (30231,30230,30104) AND type IN ('foo', 'bar') ORDER BY created_at DESC > limit 20; > > ------------------------------------------------------------ > ------------------------------------------------------------ > ------------------------------------------------------------------------ > Limit (cost=0.43..3341.84 rows=20 width=187) (actual > time=60133.701..60133.701 rows=0 loops=1) > -> Index Scan Backward using > index_notifications_on_created_at_and_bucket_id > on notifications (cost=0.43..344332.66 rows=2061 width=187) (actual > time=60133.695..60133.695 rows=0 loops=1) > Filter: (((type)::text = ANY ('{foo,bar}'::text[])) AND > (bucket_id = ANY ('{30231,30230,30104}'::integer[]))) > Rows Removed by Filter: 3441510 > Planning time: 1.034 ms > Execution time: 60133.740 ms > > ** With limit 50 ** > > EXPLAIN ANALYZE SELECT * FROM notifications WHERE bucket_id IN > (30231,30230,30104) AND type IN ('foo', 'bar') ORDER BY created_at DESC > limit 50; > > ------------------------------------------------------------ > ------------------------------------------------------------ > ------------------------------------------ > Limit (cost=7429.94..7430.06 rows=50 width=187) (actual > time=0.111..0.111 rows=0 loops=1) > -> Sort (cost=7429.94..7435.09 rows=2061 width=187) (actual > time=0.110..0.110 rows=0 loops=1) > Sort Key: created_at > Sort Method: quicksort Memory: 25kB > -> Bitmap Heap Scan on notifications (cost=71.68..7361.47 > rows=2061 width=187) (actual time=0.107..0.107 rows=0 loops=1) > Recheck Cond: (((type)::text = ANY ('{foo,bar}'::text[])) > AND (bucket_id = ANY ('{30231,30230,30104}'::integer[]))) > -> Bitmap Index Scan on > index_notifications_on_type_and_bucket_id > (cost=0.00..71.16 rows=2061 width=0) (actual time=0.105..0.105 rows=0 > loops=1) > Index Cond: (((type)::text = ANY > ('{foo,bar}'::text[])) AND (bucket_id = ANY ('{30231,30230,30104}'::intege > r[]))) > Planning time: 0.151 ms > Execution time: 0.139 ms > > > As you can see, when I have the LIMIT 20, the execution time takes around > 1 minutes (on a very small subset of the entire table). > Actually I have tried different LIMIT, and when the LIMIT is <= 45, it > will use the index scan backward. > > Removing the index 'index_notifications_on_created_at_and_bucket_id' may > prevent the planner from choosing the index scan backward for this query, > but this index is used for other querying on that table... > > 1) Why is the planner changing index scanning at the threshold of 45 for > the LIMIT ? Why not 50 ? 100 ? I may take the solution in my application to > have a LIMIT > 45 in order to prevent the performance issue, but am I sure > that this threshold will always be the same ? > > 2) Is it possible for a specific query to force the planner on choosing a > given index or preventing it from choosing one ? > > What kind of other options do I have to solve this performance issue ? > > Thanks in advance for any help, > > Regards, > > -- > Christophe Escobar > *You can temporarily disable index scanning for a session with* *SET enable_indexscan = off;* *and/orSET enable_indexonlyscan = off;* -- *Melvin Davidson* I reserve the right to fantasize. Whether or not you wish to share my fantasy is entirely up to you.