On Nov 30, 2005, at 01:55, Martijn van Oosterhout wrote:

On Tue, Nov 29, 2005 at 09:58:49PM -0500, John D. Burger wrote:
I could use some help with the following:

I have a database of geographic entities with attributes spread across
several tables.  I need to determine which entities are unique with
respect to some of those attributes.  I'm using the following query:

<snip>

If you put the gazPlaceID as a result of the uniqs subquery, that would
avoid the second lookup, right? Whether it's much faster is the
question. So something like:

select p1.gazPlaceID
     from gazPlaces as p1
       join gazNamings as n1 using (gazPlaceID)
       join gazContainers as c1 using (gazPlaceID)
group by p1.gazPlaceID, p1.featureType, n1.placeNameID, c1.containerID
     having count(*) = 1

The problem is that then every row is unique, because gazPlaceID is a primary key. As far as I can see, I need to group on just the other three columns - they're the ones for which I'm interested in uniqueness.

Secondly, what does the plan look like? Is it materialising or sorting at any stage?
Finally, what version of postgres?

Version is 7.4.8 for Solaris. Below is (a version of) the query again, as well as the plan. No materialization, I think, but it appears to be sorting the first three-way join to do the counts, then sorting the second one to merge. Cost estimates are way off, as the final result has almost 10M rows, but everything is analyzed, with indexes on every column, although none of them get used.

Again, any suggestions on tweaking this query, or a completely different approach to finding the entities with unique combinations, would be much appreciated.

- John Burger
  MITRE

select p2.gazPlaceID, u.*
  into table tempCanonical_nameMatchEquiver_3435_1
  from
  (select p1.featureType, n1.placeNameID, c1.containerID
    from gazPlaces as p1
      join gazNamings as n1 using (gazPlaceID)
        join gazContainers as c1 using (gazPlaceID)
    group by p1.featureType, n1.placeNameID, c1.containerID
    having count(*) = 1) as u,
  gazPlaces as p2
    join gazNamings as n2 using (gazPlaceID)
      join gazContainers as c2 using (gazPlaceID)
  where u.featureType = p2.featureType
  and u.placeNameID = n2.placeNameID
  and u.containerID = c2.containerID;

 Hash Join  (cost=4009535.96..4316595.92 rows=306 width=16)
Hash Cond: (("outer".gazplaceid = "inner".gazplaceid) AND ("outer".containerid = "inner".containerid)) -> Seq Scan on gazcontainers c2 (cost=0.00..141636.45 rows=9193945 width=8)
   ->  Hash  (cost=4006472.81..4006472.81 rows=282029 width=20)
-> Merge Join (cost=3777226.54..4006472.81 rows=282029 width=20) Merge Cond: (("outer".featuretype = "inner".featuretype) AND ("outer".placenameid = "inner".placenameid)) -> Subquery Scan u (cost=2107001.20..2259698.67 rows=5552635 width=12) -> GroupAggregate (cost=2107001.20..2204172.32 rows=5552635 width=12)
                           Filter: (count(*) = 1)
-> Sort (cost=2107001.20..2120882.79 rows=5552635 width=12) Sort Key: p1.featuretype, n1.placenameid, c1.containerid -> Hash Join (cost=688064.17..1217844.46 rows=5552635 width=12) Hash Cond: ("outer".gazplaceid = "inner".gazplaceid) -> Seq Scan on gaznamings n1 (cost=0.00..156331.05 rows=10147805 width=8) -> Hash (cost=642816.39..642816.39 rows=6128713 width=16) -> Hash Join (cost=160244.91..642816.39 rows=6128713 width=16) Hash Cond: ("outer".gazplaceid = "inner".gazplaceid) -> Seq Scan on gazcontainers c1 (cost=0.00..141636.45 rows=9193945 width=8) -> Hash (cost=120982.13..120982.13 rows=6128713 width=8) -> Seq Scan on gazplaces p1 (cost=0.00..120982.13 rows=6128713 width=8) -> Sort (cost=1670225.33..1685547.11 rows=6128713 width=16)
                     Sort Key: p2.featuretype, n2.placenameid
-> Hash Join (cost=160244.91..684040.19 rows=6128713 width=16) Hash Cond: ("outer".gazplaceid = "inner".gazplaceid) -> Seq Scan on gaznamings n2 (cost=0.00..156331.05 rows=10147805 width=8) -> Hash (cost=120982.13..120982.13 rows=6128713 width=8) -> Seq Scan on gazplaces p2 (cost=0.00..120982.13 rows=6128713 width=8)


---------------------------(end of broadcast)---------------------------
TIP 1: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to [EMAIL PROTECTED] so that your
      message can get through to the mailing list cleanly

Reply via email to