"Roman F" <[EMAIL PROTECTED]> writes:
> ...  Executing something like
> the following statement would work, but even with indexes it takes an
> insane amount of time to execute for each of the tables:

> DELETE FROM child_table WHERE parentid NOT IN
>   (SELECT parentid FROM parent_table)

Uh, what sort of query plan are you getting for that?

PG 7.4 and up can do a reasonable job with NOT IN if the sub-select is
small enough to fit into an in-memory hash table (of size sort_mem).
I'm betting that your sort_mem setting is not high enough to encourage
the planner to try the hash method.  You could try increasing sort_mem
... but given the size of your tables, you might end up with a hash
table large enough to drive the system into swapping, in which case
it'll still be mighty slow.

Another idea is to try an outer join:

        SELECT child_table.parentid INTO tmp_table
        FROM child_table LEFT JOIN parent_table
             ON (child_table.parentid = parent_table.parentid)
        WHERE parent_table.parentid IS NULL;

which essentially does a join and then pulls out just the child_table
rows that failed to match.  This will probably end up getting done via a
merge join or hybrid hash join, either of which are more scalable than
the NOT IN code.  You still have to do the actual deletions in
child_table, but as long as there aren't too many, a NOT IN using
tmp_table should work OK.

                        regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq

Reply via email to