On Mon, 2021-12-06 at 18:20 +0100, Francisco Olarte wrote: > > Can you post an explain analyze? To me it seems like the planner > thinks shipment_import_id is randomly distributed and the table is > well correlated with it's PK, so scanning it for the first id > should > be fast.
#explain analyze SELECT "shipment_import_records".* FROM shipment_import_records" WHERE shipment_import_records"."shipment_import_id" = 5090609 ORDER BY "shipment_import_records"."id" ASC LIMIT 1; ------------------------------------------------------------------ --- ------------------------------------------------------------------ --- ------------------------------------- Limit (cost=0.44..873.08 rows=1 width=243) (actual time=31689.725..31689.726 rows=1 loops=1) -> Index Scan using shipment_import_records_pkey on shipment_import_records (cost=0.44..5122405.71 rows=5870 width=243) (actual time=31689.723..31689.724 rows=1 loops=1) Filter: (shipment_import_id = 5090609) Rows Removed by Filter: 28710802 Planning Time: 0.994 ms Execution Time: 31689.744 ms (6 rows) The biggest one (but yes "earlier"): # explain analyze SELECT "shipment_import_records".* FROM "shipment_import_records" WHERE "shipment_import_records"."shipment_import_id" = 1247888 ORDER BY "shipment_import_records"."id" ASC LIMIT 1; QUERY PLAN --------------------------------------------------------------------- --------------------------------------------------------------------- ---------------------------- -------- Limit (cost=0.44..426.59 rows=1 width=243) (actual time=8007.069..8007.070 rows=1 loops=1) -> Index Scan using shipment_import_records_pkey on shipment_import_records (cost=0.44..5126628.40 rows=12030 width=243) (actual time=8007.068..8007.068 rows=1 l oops=1) Filter: (shipment_import_id = 1247888) Rows Removed by Filter: 10929193 Planning Time: 0.584 ms Execution Time: 8007.086 ms (6 rows) And the smallest/latest, which actually uses the "right" index: # explain analyze SELECT "shipment_import_records".* FROM "shipment_import_records" WHERE "shipment_import_records"."shipment_import_id" = 5116174 ORDER BY "shipment_import_records"."id" ASC LIMIT 1; QUERY PLAN --------------------------------------------------------------------- --------------------------------------------------------------------- ---------------------------- ------------------------ Limit (cost=145.44..145.44 rows=1 width=243) (actual time=0.018..0.018 rows=1 loops=1) -> Sort (cost=145.44..145.64 rows=79 width=243) (actual time=0.017..0.018 rows=1 loops=1) Sort Key: id Sort Method: quicksort Memory: 26kB -> Index Scan using index_shipment_import_records_on_shipment_import_id on shipment_import_records (cost=0.44..145.05 rows=79 width=243) (actual time=0.013 ..0.014 rows=1 loops=1) Index Cond: (shipment_import_id = 5116174) Planning Time: 0.104 ms Execution Time: 0.032 ms (8 rows) > > But from the names of the field you may have correlation between > shipment_import_id and id hidden somewhere ( like they are two > serial > growing together, you query for the latest shipment ids and it > scans > all the table ). An explain analyze should show that ( or three, > one > for that shipment import id, one for 1, one for a really big one ) This is definitely the case. And we are generally looking for newer data for most operations. Thanks for looking at it.