Re: [GENERAL] Finding (and deleting) dupes in relation table

Markus Wollny Fri, 23 Sep 2005 03:07:13 -0700

Hello!

CSN [EMAIL PROTECTED] wrote:
> I have a table that relates id's of two other tables:
> 
> table1id, table2id
> 
> Dupes have found their way into it (create unique index
> across both fields fails). Is there a quick and easy way to
> find and delete the dupes (there are tens of thousands of records)?
> 
> Thanks,
> CSN


If your table was created WITH OIDS you could identify the duplicates
thus:

select a.table1id
     , a.table12d
     , max(a.oid) as maxoid
     , count(a.oid) as coid
  from schema.mytable a,
       schema.mytable b
 where a.table1id = b.table1id
   and a.table2id=b.table2id
   and a.oid <> b.oid
 group by a.table1id, a.table2id
 order by a.table1id;

If you wish to delete surplus rows, you might do the following:

delete from schema.mytable where oid in (
        select maxoid from (
                        select a.table1id, a.table12d, max(a.oid) as
maxoid, count(a.oid) as coid
                                from schema.mytable a,
                                     schema.mytable b
                         where a.table1id = b.table1id
                           and a.table2id=b.table2id
                           and a.oid <> b.oid
                         group by a.table1id, a.table2id
                         order by a.table1id ) as foo
          where coid >1 );

This will delete the oldest tuple of a duplicate set of rows; if there
are more than two tuples in a set, you'll want to execute this a couple
of times until there's no duplicate left, as the delete will only reduce
a set by one tuple at a time. I'd also recommend to apply a PRIMARY KEY
constraint afterwards instead of just a unique index - this will prevent
NULL-entries as well as creating the desired unique index - and I think
it's good practice to have a primary key on about every table there is,
except when it's just a junk data table like a logging table where
content is regularly evaluated and discarded.

Kind regards

   Markus

---------------------------(end of broadcast)---------------------------
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match

Re: [GENERAL] Finding (and deleting) dupes in relation table

Reply via email to