[HACKERS] Postgres 8.4 planner question - bad plan, good plan for almost same queries.

Дмитрий Tue, 20 Mar 2012 01:08:58 -0700

Running PostgreSQL 8.4.8 on x86_64-pc-linux-gnu, compiled by GCCgcc-4.6.real (Ubuntu/Linaro 4.6.0-7ubuntu1) 4.6.1, 64-bit

under Ubuntu 11.10.

Got large table with geography (PostGIS 1.5) data, running two almostsame queries with same result, but very different performance.


First query:

with uuu as (
    select dml.id
      from mp_locs12 dml
    where (complex conditions, leaving about 100 rows from millions)
  )

select label, country, region, parish, city, district,st_geometrytype(loc::geometry) as gtype,

    '0x' || to_hex(type) as n_type, file_name, line
    from mp_locs12 dml1
    where dml1.id in (select uu.id from uuu uu)
        and not exists (
          select 1 from mp_locs12 dml2
            where dml2.id in (select uu.id from uuu uu)
              and dml2.id <> dml1.id
              and not st_contains(dml1.loc::geometry, dml2.loc::geometry)
        );

Planner choose to seqscan dml2 table in NOT EXISTS condition, verysurprising for me - "dml2.id in (select uu.id from uuu uu)" is a bestcut off here I believe, but planner thinks different (for dml1 it does not).



                  QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop Anti Join  (cost=2451523.67..8486747.31 rows=1 width=2130)

Join Filter: ((dml2.id <> dml1.id) AND ((NOT ((dml1.loc)::geometry&& (dml2.loc)::geometry)) OR (NOT _st_contains((dml1.loc)::geometry,(dml2.loc)::geometry))))

   CTE uuu
     ->  Seq Scan on mp_locs12 dml  (cost=0.00..2451523.62 rows=1 width=4)

Filter: (complex conditions, leaving about 100 rows frommillions)

   ->  Nested Loop  (cost=0.02..8.85 rows=1 width=2134)
         ->  HashAggregate  (cost=0.02..0.03 rows=1 width=4)
               ->  CTE Scan on uuu uu  (cost=0.00..0.02 rows=1 width=4)

-> Index Scan using mp_locs12_pkey on mp_locs12 dml1(cost=0.00..8.81 rows=1 width=2134)

               Index Cond: (dml1.id = uu.id)

-> Seq Scan on mp_locs12 dml2 (cost=0.02..1750366.67 rows=15581266width=1946)

         Filter: (hashed SubPlan 2)
         SubPlan 2
           ->  CTE Scan on uuu uu  (cost=0.00..0.02 rows=1 width=4)
(14 rows)


Now try to use bit modified query:

explain with uuu as (
    select dml.id
      from mp_locs12 dml
    where (complex conditions, leaving about 100 rows from millions)
  )

select label, country, region, parish, city, district,st_geometrytype(loc::geometry) as gtype,

    '0x' || to_hex(type) as n_type, file_name, line
    from mp_locs12 dml1
    where dml1.id in (select uu.id from uuu uu)

and lower(st_geometrytype(dml1.loc::geometry)) not in('st_geometrycollection')

        and not exists (
          select 1 from (
              select dml2.id as id from mp_locs12 dml2
                where dml2.id in (select uu.id from uuu uu)

and not st_contains(dml1.loc::geometry,dml2.loc::geometry)and lower(st_geometrytype(dml2.loc::geometry)) not in('st_geometrycollection')

            ) vv
            where vv.id <> dml1.id
        );

Only thing I changed - scanned CTE for ids to get, then compared takenids with dml1.id. And now planner chose best plan (I think so):



                                                 QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=2139898.32..2139916.30 rows=1 width=2130)
   CTE uuu
     ->  Seq Scan on mp_locs12 dml  (cost=0.00..2139898.30 rows=1 width=4)

Filter: (complex conditions, leaving about 100 rows frommillions)

   ->  HashAggregate  (cost=0.02..0.03 rows=1 width=4)
         ->  CTE Scan on uuu uu  (cost=0.00..0.02 rows=1 width=4)

-> Index Scan using mp_locs12_pkey on mp_locs12 dml1(cost=0.00..17.95 rows=1 width=2134)

         Index Cond: (dml1.id = uu.id)

Filter: ((lower(st_geometrytype((dml1.loc)::geometry)) <>'st_geometrycollection'::text) AND (NOT (SubPlan 2)))

         SubPlan 2
           ->  Nested Loop  (cost=0.02..9.13 rows=1 width=0)
                 ->  HashAggregate  (cost=0.02..0.03 rows=1 width=4)

-> CTE Scan on uuu uu (cost=0.00..0.02 rows=1width=4)-> Index Scan using mp_locs12_pkey on mp_locs12 dml2(cost=0.00..9.08 rows=1 width=4)

                       Index Cond: (dml2.id = uu.id)

Filter: ((dml2.id <> $2) AND(lower(st_geometrytype((dml2.loc)::geometry)) <>'st_geometrycollection'::text) AND ((NOT (($1)::geometry &&(dml2.loc)::geometry)) OR (NOT _st_contains(($1)::geometry,(dml2.loc)::geometry))))

(16 rows)

What makes planner o choose so differently in almost same cases? Is it abug, or I misunderstood something?


Regards
Dmitry

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Postgres 8.4 planner question - bad plan, good plan for almost same queries.

Reply via email to