Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT

Alan Hodgson Mon, 06 Dec 2021 09:58:19 -0800

On Mon, 2021-12-06 at 18:20 +0100, Francisco Olarte wrote:
> 
> Can you post an explain analyze? To me it seems like the planner
> thinks shipment_import_id is randomly distributed and the table is
> well correlated with it's PK, so scanning it for the first id
> should
> be fast.


   #explain analyze SELECT "shipment_import_records".* FROM 
   shipment_import_records" WHERE 
   shipment_import_records"."shipment_import_id" = 5090609 ORDER BY
   "shipment_import_records"."id" ASC LIMIT 1;
   ------------------------------------------------------------------
---
   ------------------------------------------------------------------
---
   -------------------------------------
    Limit  (cost=0.44..873.08 rows=1 width=243) (actual
   time=31689.725..31689.726 rows=1 loops=1)
      ->  Index Scan using shipment_import_records_pkey on
   shipment_import_records  (cost=0.44..5122405.71 rows=5870
width=243)
   (actual time=31689.723..31689.724 rows=1 loops=1)
            Filter: (shipment_import_id = 5090609)
            Rows Removed by Filter: 28710802
    Planning Time: 0.994 ms
    Execution Time: 31689.744 ms
   (6 rows)


      The biggest one (but yes "earlier"):

   # explain analyze SELECT "shipment_import_records".* FROM
   "shipment_import_records" WHERE
   "shipment_import_records"."shipment_import_id" = 1247888 ORDER BY 
   "shipment_import_records"."id" ASC LIMIT 1; 
                                                                        
               QUERY PLAN
                                                                        
       
           
   ---------------------------------------------------------------------
   ---------------------------------------------------------------------
   ----------------------------
   --------
   Limit  (cost=0.44..426.59 rows=1 width=243) (actual
   time=8007.069..8007.070 rows=1 loops=1)
     ->  Index Scan using shipment_import_records_pkey on
   shipment_import_records  (cost=0.44..5126628.40 rows=12030 width=243)
   (actual time=8007.068..8007.068 rows=1 l
   oops=1)
           Filter: (shipment_import_id = 1247888)
           Rows Removed by Filter: 10929193
   Planning Time: 0.584 ms
   Execution Time: 8007.086 ms
   (6 rows)
   
   
   And the smallest/latest, which actually uses the "right" index:
   
    # explain analyze SELECT "shipment_import_records".* FROM
   "shipment_import_records" WHERE
   "shipment_import_records"."shipment_import_id" = 5116174 ORDER BY 
   "shipment_import_records"."id" ASC LIMIT 1;      
                                                                       
                QUERY PLAN
                                                                    
                           
   ---------------------------------------------------------------------
   ---------------------------------------------------------------------
   ----------------------------
   ------------------------
   Limit  (cost=145.44..145.44 rows=1 width=243) (actual
   time=0.018..0.018 rows=1 loops=1)
     ->  Sort  (cost=145.44..145.64 rows=79 width=243) (actual
   time=0.017..0.018 rows=1 loops=1)
           Sort Key: id
           Sort Method: quicksort  Memory: 26kB
           ->  Index Scan using
   index_shipment_import_records_on_shipment_import_id on
   shipment_import_records  (cost=0.44..145.05 rows=79 width=243)
   (actual time=0.013
   ..0.014 rows=1 loops=1)
                 Index Cond: (shipment_import_id = 5116174)
   Planning Time: 0.104 ms
   Execution Time: 0.032 ms
   (8 rows)
   
> 
> But from the names of the field you may have correlation between
> shipment_import_id and id hidden somewhere ( like they are two
> serial
> growing together, you query for the latest shipment ids and it
> scans
> all the table ). An explain analyze should show that ( or three,
> one
> for that shipment import id, one for 1, one for a really big one )


This is definitely the case. And we are generally looking for newer
data for most operations.

Thanks for looking at it.

Re: Query planner issue with preferring primary key over a better index when using ORDER BY and LIMIT

Reply via email to