Hi all,
I'm having a good bit of trouble making production-ready a query I wouldn't have thought would be too challenging. Below is the relevant portions of my table, which is partitioned on values of part_key from 1-5. The number of rows is on the order of 10^7, and the number of unique "parent_id" values is on the order of 10^4. "tmstmp" is continuous (more or less). Neither parent_id nor tmstmp is nullable. The table receives a fair amount of INSERTS, and also a number of UPDATES. db=> \d a Table "public.a" Column | Type | Collation | Nullable | Default -------------------------+--------------------------+-----------+----------+---------------------------------------------- id | integer | | | nextval('a_id_seq'::regclass) parent_id | integer | | not null | tmstmp | timestamp with time zone | | not null | part_key | integer | | | Partition key: LIST (part_key) Number of partitions: 5 (Use \d+ to list them.) db=> \d a_partition1 Table "public.a_partition1" Column | Type | Collation | Nullable | Default -------------------------+--------------------------+-----------+----------+---------------------------------------------- id | integer | | | nextval('a_id_seq'::regclass) parent_id | integer | | not null | tmstmp | timestamp with time zone | | not null | part_key | integer | | | Partition of: a FOR VALUES IN (1) Indexes: "a_pk_idx1" UNIQUE, btree (id) "a_tmstmp_idx1" btree (tmstmp) "a_parent_id_idx1" btree (parent_id) Check constraints: "a_partition_check_part_key1" CHECK (part_key = 1) Foreign-key constraints: "a_partition_parent_id_fk_b_id1" FOREIGN KEY (parent_id) REFERENCES b(id) DEFERRABLE INITIALLY DEFERRED db=> SELECT relname, relpages, reltuples, relallvisible, relkind, relnatts, relhassubclass, reloptions, pg_table_size(oid) FROM pg_class WHERE relname like 'a_partition%'; relname | relpages | reltuples | relallvisible | relkind | relnatts | relhassubclass | reloptions | pg_table_size -----------------------------------+----------+-------------+---------------+---------+----------+----------------+------------+--------------- a_partition1 | 152146 | 873939 | 106863 | r | 26 | f | | 287197233152 a_partition2 | 669987 | 3.62268e+06 | 0 | r | 26 | f | | 161877745664 a_partition3 | 562069 | 2.94414e+06 | 213794 | r | 26 | f | | 132375994368 a_partition4 | 729880 | 3.95513e+06 | 69761 | r | 26 | f | | 188689047552 a_partition5 | 834132 | 4.9748e+06 | 52622 | r | 26 | f | | 218596630528 (5 rows) I'm interested in filtering by parent_id (an indexed foreign-key relationship, though I think it shouldn't matter in this context), then sorting the result by a timestamp field (also indexed). I only need 20 at a time. The naive query works well for a very small number of ids (up to 3): EXPLAIN (ANALYZE, BUFFERS) SELECT "a"."id" FROM "a" WHERE "a"."parent_id" IN ( 37066,41174,28843 ) ORDER BY "a"."tmstmp" DESC LIMIT 20; Limit (cost=9838.23..9838.28 rows=20 width=12) (actual time=13.307..13.307 rows=0 loops=1) Buffers: shared hit=29 read=16 -> Sort (cost=9838.23..9860.12 rows=8755 width=12) (actual time=13.306..13.306 rows=0 loops=1) Sort Key: a_partition1.tmstmp DESC Sort Method: quicksort Memory: 25kB Buffers: shared hit=29 read=16 -> Append (cost=0.43..9605.26 rows=8755 width=12) (actual time=13.302..13.302 rows=0 loops=1) Buffers: shared hit=29 read=16 -> Index Scan using a_parent_id_idx1 on a_partition1 (cost=0.43..4985.45 rows=4455 width=12) (actual time=4.007..4.007 rows=0 loops=1) Index Cond: (parent_id = ANY ('{37066,41174,28843}'::integer[])) Buffers: shared hit=6 read=3 -> Index Scan using a_parent_id_idx2 on a_partition2 (cost=0.43..1072.79 rows=956 width=12) (actual time=3.521..3.521 rows=0 loops=1) Index Cond: (parent_id = ANY ('{37066,41174,28843}'::integer[])) Buffers: shared hit=5 read=4 -> Index Scan using a_parent_id_idx3 on a_partition3 (cost=0.43..839.30 rows=899 width=12) (actual time=2.172..2.172 rows=0 loops=1) Index Cond: (parent_id = ANY ('{37066,41174,28843}'::integer[])) Buffers: shared hit=6 read=3 -> Index Scan using a_parent_id_idx4 on a_partition4 (cost=0.43..1041.11 rows=959 width=12) (actual time=1.822..1.822 rows=0 loops=1) Index Cond: (parent_id = ANY ('{37066,41174,28843}'::integer[])) Buffers: shared hit=6 read=3 -> Index Scan using a_parent_id_idx5 on a_partition5 (cost=0.43..1666.61 rows=1486 width=12) (actual time=1.777..1.777 rows=0 loops=1) Index Cond: (parent_id = ANY ('{37066,41174,28843}'::integer[])) Buffers: shared hit=6 read=3 Planning time: 0.559 ms Execution time: 13.343 ms (25 rows) But as soon as the number included in the filter goes up slightly (at about 4), the query plan changes the indexes it uses in a way that makes it impossibly slow. The below is only an EXPLAIN because the query times out: EXPLAIN SELECT "a"."id" FROM "a" WHERE "a"."parent_id" IN ( 34226,24506,40987,27162 ) ORDER BY "a"."tmstmp" DESC LIMIT 20; Limit (cost=2.22..8663.99 rows=20 width=12) -> Merge Append (cost=2.22..5055447.67 rows=11673 width=12) Sort Key: a_partition1.tmstmp DESC -> Index Scan Backward using a_tmstmp_idx1 on a_partition1 (cost=0.43..1665521.55 rows=5940 width=12) Filter: (parent_id = ANY ('{34226,24506,40987,27162}'::integer[])) -> Index Scan Backward using a_tmstmp_idx2 on a_partition2 (cost=0.43..880517.20 rows=1274 width=12) Filter: (parent_id = ANY ('{34226,24506,40987,27162}'::integer[])) -> Index Scan Backward using a_tmstmp_idx3 on a_partition3 (cost=0.43..639224.73 rows=1199 width=12) Filter: (parent_id = ANY ('{34226,24506,40987,27162}'::integer[])) -> Index Scan Backward using a_tmstmp_idx4 on a_partition4 (cost=0.43..852881.68 rows=1278 width=12) Filter: (parent_id = ANY ('{34226,24506,40987,27162}'::integer[])) -> Index Scan Backward using a_tmstmp_idx5 on a_partition5 (cost=0.43..1017137.75 rows=1982 width=12) Filter: (parent_id = ANY ('{34226,24506,40987,27162}'::integer[])) (13 rows) Something about the estimated row counts (this problem persisted after I tried ANALYZEing) forces usage of the tmstmp index and Merge Append (which seems wise) but also a filter condition on parent_id over an index condition, which is apparently prohibitively slow. I tried creating a multicolumn index like: CREATE INDEX "a_partition1_parent_and_tmstmp_idx" on "a_partition2" USING btree ("parent_id", "tmstmp" DESC); But this didn't help (it wasn't used). I also found that removing the LIMIT makes an acceptably fast (though different) query plan which uses the parent_id instead instead of the tmstmp one: EXPLAIN (ANALYZE, BUFFERS) SELECT "a"."id" FROM "a" WHERE "a"."parent_id" IN ( 17942,32006,36733,9698,27948 ) ORDER BY "a"."tmstmp" DESC; Sort (cost=17004.82..17041.29 rows=14591 width=12) (actual time=109.650..109.677 rows=127 loops=1) Sort Key: a_partition1.tmstmp DESC Sort Method: quicksort Memory: 30kB Buffers: shared hit=60 read=142 written=12 -> Append (cost=0.43..15995.64 rows=14591 width=12) (actual time=9.206..109.504 rows=127 loops=1) Buffers: shared hit=60 read=142 written=12 -> Index Scan using a_parent_id_idx1 on a_partition1 (cost=0.43..8301.03 rows=7425 width=12) (actual time=9.205..11.116 rows=1 loops=1) Index Cond: (parent_id = ANY ('{17942,32006,36733,9698,27948}'::integer[])) Buffers: shared hit=10 read=6 -> Index Scan using a_parent_id_idx2 on a_partition2 (cost=0.43..1786.52 rows=1593 width=12) (actual time=7.116..76.000 rows=98 loops=1) Index Cond: (parent_id = ANY ('{17942,32006,36733,9698,27948}'::integer[])) Buffers: shared hit=15 read=97 written=10 -> Index Scan using a_parent_id_idx3 on a_partition3 (cost=0.43..1397.74 rows=1498 width=12) (actual time=3.160..3.160 rows=0 loops=1) Index Cond: (parent_id = ANY ('{17942,32006,36733,9698,27948}'::integer[])) Buffers: shared hit=10 read=5 -> Index Scan using a_parent_id_idx4 on a_partition4 (cost=0.43..1733.77 rows=1598 width=12) (actual time=1.975..16.960 rows=28 loops=1) Index Cond: (parent_id = ANY ('{17942,32006,36733,9698,27948}'::integer[])) Buffers: shared hit=14 read=30 written=2 -> Index Scan using a_parent_id_idx5 on a_partition5 (cost=0.43..2776.58 rows=2477 width=12) (actual time=2.155..2.155 rows=0 loops=1) Index Cond: (parent_id = ANY ('{17942,32006,36733,9698,27948}'::integer[])) Buffers: shared hit=11 read=4 Planning time: 0.764 ms Execution time: 109.748 ms (23 rows) Eventually, I stumbled upon these links: - https://stackoverflow.com/a/27237698 - http://datamangling.com/2014/01/17/limit-1-and-performance-in-a-postgres-query/ And from these, was able to partially resolve my issue by attaching an extra ORDER BY: EXPLAIN (ANALYZE, BUFFERS) SELECT "a"."id" FROM "a" WHERE "a"."parent_id" IN (3909,26840,32181,22998,9632) ORDER BY "a"."tmstmp" DESC, "a"."id" DESC LIMIT 20; Limit (cost=16383.91..16383.96 rows=20 width=12) (actual time=29.804..29.808 rows=6 loops=1) Buffers: shared hit=39 read=42 -> Sort (cost=16383.91..16420.38 rows=14591 width=12) (actual time=29.803..29.804 rows=6 loops=1) Sort Key: a_partition1.tmstmp DESC, a_partition1.id DESC Sort Method: quicksort Memory: 25kB Buffers: shared hit=39 read=42 -> Append (cost=0.43..15995.64 rows=14591 width=12) (actual time=12.509..29.789 rows=6 loops=1) Buffers: shared hit=39 read=42 -> Index Scan using a_parent_id_idx1 on a_partition1 (cost=0.43..8301.03 rows=7425 width=12) (actual time=4.046..4.046 rows=0 loops=1) Index Cond: (parent_id = ANY ('{3909,26840,32181,22998,9632}'::integer[])) Buffers: shared hit=7 read=8 -> Index Scan using a_parent_id_idx2 on a_partition2 (cost=0.43..1786.52 rows=1593 width=12) (actual time=5.447..5.447 rows=0 loops=1) Index Cond: (parent_id = ANY ('{3909,26840,32181,22998,9632}'::integer[])) Buffers: shared hit=7 read=8 -> Index Scan using a_parent_id_idx3 on a_partition3 (cost=0.43..1397.74 rows=1498 width=12) (actual time=3.013..3.618 rows=1 loops=1) Index Cond: (parent_id = ANY ('{3909,26840,32181,22998,9632}'::integer[])) Buffers: shared hit=9 read=7 -> Index Scan using a_parent_id_idx4 on a_partition4 (cost=0.43..1733.77 rows=1598 width=12) (actual time=7.337..7.337 rows=0 loops=1) Index Cond: (parent_id = ANY ('{3909,26840,32181,22998,9632}'::integer[])) Buffers: shared hit=8 read=7 -> Index Scan using a_parent_id_idx5 on a_partition5 (cost=0.43..2776.58 rows=2477 width=12) (actual time=3.401..9.332 rows=5 loops=1) Index Cond: (parent_id = ANY ('{3909,26840,32181,22998,9632}'::integer[])) Buffers: shared hit=8 read=12 Planning time: 0.601 ms Execution time: 29.851 ms (25 rows) This query plan (which is the same as when LIMIT is removed) has been a good short term solution when the number of "parent_id"s I'm using is still relatively small, but unfortunately queries grow untenably slow as the number of "parent_id"s involved increases: EXPLAIN (ANALYZE, BUFFERS) SELECT "a"."id" FROM "a" WHERE "a"."parent_id" IN ( 49694,35150,48216,45743,49529,47239,40752,46735,7692,33855,25062,47821,6947,9735,12611,33190,9523,39200,50056,14912,9624,35552,35058,35365,47363,30358,853,21719,20568,16211,49372,6087,16111,38337,38891,10792,4556,27171,37731,38587,40402,26109,1477,36932,12191,5459,49307,21132,8697,4131,47869,49246,30447,23795,14389,19743,1369,15689,1820,7826,50623,2179,28090,46430,40117,32603,6886,35318,1026,6991,21360,50370,21721,33558,39162,49753,17974,45599,23256,8483,20864,33426,990,10068,38,13186,19338,43727,12319,50658,19243,6267,12498,12214,542,43339,5933,11376,49748,9335,22248,14763,13375,48554,50595,27006,43481,2805,49012,20000,849,27214,38576,36449,39854,13708,26841,3837,39971,5090,35680,49468,24738,27145,15380,40463,1250,22007,20962,8747,1383,45856,20025,35346,17121,3387,42172,5340,13004,11554,4607,16991,45034,20212,48020,19356,41234,30633,33657,1508,30660,35022,41408,19213,32748,48274,7335,50376,36496,1217,5701,37507,23706,25798,15126,663,7699,36665,18675,32723,17437,24179,15565,26280,50598,44490,44275,24679,34843,50,11995,50661,5002,47661,14505,32048,46696,10699,13889,8921,4567,7012,832,2401,5055,33306,9385,50169,49990,42236 ) ORDER BY "a"."tmstmp" DESC, "a"."id" DESC LIMIT 20; Limit (cost=641000.51..641000.56 rows=20 width=12) (actual time=80813.649..80813.661 rows=20 loops=1) Buffers: shared hit=11926 read=93553 dirtied=2063 written=27769 -> Sort (cost=641000.51..642534.60 rows=613634 width=12) (actual time=80813.647..80813.652 rows=20 loops=1) Sort Key: a_partition1.tmstmp DESC, a_partition1.id DESC Sort Method: top-N heapsort Memory: 25kB Buffers: shared hit=11926 read=93553 dirtied=2063 written=27769 -> Append (cost=0.43..624671.93 rows=613634 width=12) (actual time=2.244..80715.314 rows=104279 loops=1) Buffers: shared hit=11926 read=93553 dirtied=2063 written=27769 -> Index Scan using a_parent_id_idx1 on a_partition1 (cost=0.43..304861.89 rows=300407 width=12) (actual time=2.243..34766.236 rows=46309 loops=1) Index Cond: (parent_id = ANY ('{49694,35150,48216,45743,49529,47239,40752,46735,7692,33855,25062,47821,6947,9735,12611,33190,9523,39200,50056,14912,9624,35552,35058,35365,47363,30358,853,21719,20568,16211,49372,6087,16111,38337,38891,10792,4556,27171,37731,38587,40402,26109,1477,36932,12191,5459,49307,21132,8697,4131,47869,49246,30447,23795,14389,19743,1369,15689,1820,7826,50623,2179,28090,46430,40117,32603,6886,35318,1026,6991,21360,50370,21721,33558,39162,49753,17974,45599,23256,8483,20864,33426,990,10068,38,13186,19338,43727,12319,50658,19243,6267,12498,12214,542,43339,5933,11376,49748,9335,22248,14763,13375,48554,50595,27006,43481,2805,49012,20000,849,27214,38576,36449,39854,13708,26841,3837,39971,5090,35680,49468,24738,27145,15380,40463,1250,22007,20962,8747,1383,45856,20025,35346,17121,3387,42172,5340,13004,11554,4607,16991,45034,20212,48020,19356,41234,30633,33657,1508,30660,35022,41408,19213,32748,48274,7335,50376,36496,1217,5701,37507,23706,25798,15126,663,7699,36665,18675,32723,17437,24179,15565,26280,50598,44490,44275,24679,34843,50,11995,50661,5002,47661,14505,32048,46696,10699,13889,8921,4567,7012,832,2401,5055,33306,9385,50169,49990,42236}'::integer[])) Buffers: shared hit=9485 read=35740 dirtied=2033 written=12713 -> Index Scan using a_parent_id_idx2 on a_partition2 (cost=0.43..80641.75 rows=75349 width=12) (actual time=8.280..12640.675 rows=16628 loops=1) Index Cond: (parent_id = ANY ('{49694,35150,48216,45743,49529,47239,40752,46735,7692,33855,25062,47821,6947,9735,12611,33190,9523,39200,50056,14912,9624,35552,35058,35365,47363,30358,853,21719,20568,16211,49372,6087,16111,38337,38891,10792,4556,27171,37731,38587,40402,26109,1477,36932,12191,5459,49307,21132,8697,4131,47869,49246,30447,23795,14389,19743,1369,15689,1820,7826,50623,2179,28090,46430,40117,32603,6886,35318,1026,6991,21360,50370,21721,33558,39162,49753,17974,45599,23256,8483,20864,33426,990,10068,38,13186,19338,43727,12319,50658,19243,6267,12498,12214,542,43339,5933,11376,49748,9335,22248,14763,13375,48554,50595,27006,43481,2805,49012,20000,849,27214,38576,36449,39854,13708,26841,3837,39971,5090,35680,49468,24738,27145,15380,40463,1250,22007,20962,8747,1383,45856,20025,35346,17121,3387,42172,5340,13004,11554,4607,16991,45034,20212,48020,19356,41234,30633,33657,1508,30660,35022,41408,19213,32748,48274,7335,50376,36496,1217,5701,37507,23706,25798,15126,663,7699,36665,18675,32723,17437,24179,15565,26280,50598,44490,44275,24679,34843,50,11995,50661,5002,47661,14505,32048,46696,10699,13889,8921,4567,7012,832,2401,5055,33306,9385,50169,49990,42236}'::integer[])) Buffers: shared hit=558 read=16625 dirtied=8 written=6334 -> Index Scan using a_parent_id_idx3 on a_partition3 (cost=0.43..57551.91 rows=65008 width=12) (actual time=3.721..13759.664 rows=12973 loops=1) Index Cond: (parent_id = ANY ('{49694,35150,48216,45743,49529,47239,40752,46735,7692,33855,25062,47821,6947,9735,12611,33190,9523,39200,50056,14912,9624,35552,35058,35365,47363,30358,853,21719,20568,16211,49372,6087,16111,38337,38891,10792,4556,27171,37731,38587,40402,26109,1477,36932,12191,5459,49307,21132,8697,4131,47869,49246,30447,23795,14389,19743,1369,15689,1820,7826,50623,2179,28090,46430,40117,32603,6886,35318,1026,6991,21360,50370,21721,33558,39162,49753,17974,45599,23256,8483,20864,33426,990,10068,38,13186,19338,43727,12319,50658,19243,6267,12498,12214,542,43339,5933,11376,49748,9335,22248,14763,13375,48554,50595,27006,43481,2805,49012,20000,849,27214,38576,36449,39854,13708,26841,3837,39971,5090,35680,49468,24738,27145,15380,40463,1250,22007,20962,8747,1383,45856,20025,35346,17121,3387,42172,5340,13004,11554,4607,16991,45034,20212,48020,19356,41234,30633,33657,1508,30660,35022,41408,19213,32748,48274,7335,50376,36496,1217,5701,37507,23706,25798,15126,663,7699,36665,18675,32723,17437,24179,15565,26280,50598,44490,44275,24679,34843,50,11995,50661,5002,47661,14505,32048,46696,10699,13889,8921,4567,7012,832,2401,5055,33306,9385,50169,49990,42236}'::integer[])) Buffers: shared hit=592 read=12943 dirtied=2 written=3136 -> Index Scan using a_parent_id_idx4 on a_partition4 (cost=0.43..70062.42 rows=67402 width=12) (actual time=5.999..5949.033 rows=7476 loops=1) Index Cond: (parent_id = ANY ('{49694,35150,48216,45743,49529,47239,40752,46735,7692,33855,25062,47821,6947,9735,12611,33190,9523,39200,50056,14912,9624,35552,35058,35365,47363,30358,853,21719,20568,16211,49372,6087,16111,38337,38891,10792,4556,27171,37731,38587,40402,26109,1477,36932,12191,5459,49307,21132,8697,4131,47869,49246,30447,23795,14389,19743,1369,15689,1820,7826,50623,2179,28090,46430,40117,32603,6886,35318,1026,6991,21360,50370,21721,33558,39162,49753,17974,45599,23256,8483,20864,33426,990,10068,38,13186,19338,43727,12319,50658,19243,6267,12498,12214,542,43339,5933,11376,49748,9335,22248,14763,13375,48554,50595,27006,43481,2805,49012,20000,849,27214,38576,36449,39854,13708,26841,3837,39971,5090,35680,49468,24738,27145,15380,40463,1250,22007,20962,8747,1383,45856,20025,35346,17121,3387,42172,5340,13004,11554,4607,16991,45034,20212,48020,19356,41234,30633,33657,1508,30660,35022,41408,19213,32748,48274,7335,50376,36496,1217,5701,37507,23706,25798,15126,663,7699,36665,18675,32723,17437,24179,15565,26280,50598,44490,44275,24679,34843,50,11995,50661,5002,47661,14505,32048,46696,10699,13889,8921,4567,7012,832,2401,5055,33306,9385,50169,49990,42236}'::integer[])) Buffers: shared hit=449 read=7665 dirtied=10 written=1242 -> Index Scan using a_parent_id_idx5 on a_partition5 (cost=0.43..111553.96 rows=105468 width=12) (actual time=7.806..13519.564 rows=20893 loops=1) Index Cond: (parent_id = ANY ('{49694,35150,48216,45743,49529,47239,40752,46735,7692,33855,25062,47821,6947,9735,12611,33190,9523,39200,50056,14912,9624,35552,35058,35365,47363,30358,853,21719,20568,16211,49372,6087,16111,38337,38891,10792,4556,27171,37731,38587,40402,26109,1477,36932,12191,5459,49307,21132,8697,4131,47869,49246,30447,23795,14389,19743,1369,15689,1820,7826,50623,2179,28090,46430,40117,32603,6886,35318,1026,6991,21360,50370,21721,33558,39162,49753,17974,45599,23256,8483,20864,33426,990,10068,38,13186,19338,43727,12319,50658,19243,6267,12498,12214,542,43339,5933,11376,49748,9335,22248,14763,13375,48554,50595,27006,43481,2805,49012,20000,849,27214,38576,36449,39854,13708,26841,3837,39971,5090,35680,49468,24738,27145,15380,40463,1250,22007,20962,8747,1383,45856,20025,35346,17121,3387,42172,5340,13004,11554,4607,16991,45034,20212,48020,19356,41234,30633,33657,1508,30660,35022,41408,19213,32748,48274,7335,50376,36496,1217,5701,37507,23706,25798,15126,663,7699,36665,18675,32723,17437,24179,15565,26280,50598,44490,44275,24679,34843,50,11995,50661,5002,47661,14505,32048,46696,10699,13889,8921,4567,7012,832,2401,5055,33306,9385,50169,49990,42236}'::integer[])) Buffers: shared hit=842 read=20580 dirtied=10 written=4344 Planning time: 8.366 ms Execution time: 80813.714 ms (25 rows) Though it still finishes, I'd really like to speed this up, especially because there are many situations where I will be using many more than the 200 "parent_id"s above. I'd be very grateful for help with one or both of these questions: 1) Why is adding an unnecessary (from the perspective of result correctness) ORDER BY valuable for forcing the parent_id index usage, and does that indicate that there is something wrong with my table/indexes/statistics/etc.? 2) Is there any way I can improve my query time when there are many "parent_id"s involved? I seem to only be able to get the query plan to use at most one of the parent_id index and the tmstmp index at a time. Perhaps the correct multicolumn index would help? A couple of notes (happy to be scolded for either of these things if they're problematic): 1) I've masked these queries somewhat for privacy reasons and cleaned up the table "a" of some extra fields and indexes. 2) I'm using different sets of "parent_id"s to try to overcome caching so the BUFFERS aspect can be as accurate as possible. As for other maybe relevant information: - version: 10.3 - hardware is AWS so probably not huge issue. - work_mem is quite high (order of GBs) Thanks in advance for your help!