Robert Haas <robertmh...@gmail.com> writes: > With respect to the root of the issue (why does the anti-join take so > long?), my first thought was that perhaps the OP was very unlucky and > had a lot of values that hashed to the same bucket. But that doesn't > appear to be the case.
Well, yes it is. Notice what the subquery is doing: for each row in "box", it's pulling all matching "box_id"s from message and running a self-join across those rows. The hash join condition is a complete no-op. And some of the box_ids have hundreds of thousands of rows. I'd just write it off as being a particularly stupid way to find the max(), except I'm not sure why deleting just a few thousand rows improves things so much. It looks like it ought to be an O(N^2) situation, so the improvement should be noticeable but not amazing. And if you force it to not use a hashjoin, suddenly things are better. Nestloop should also be O(N^2) in this situation, but seemingly it avoids whatever weird corner case we are hitting here. As Cedric says, the lack of any CHECK_FOR_INTERRUPTS in this loop is also problematic. I'm not sure that right there is an ideal place to put it, but we need one somewhere in the loop. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers