Sean Shanny <[EMAIL PROTECTED]> writes: > Sort of piggybacking on this thread but why the suggestion to drop the > use of DISTINCT in 7.4?
Because the 7.4 planner can decide for itself whether DISTINCT'ifying the sub-select output is the best way to proceed or not. There is more than one good way to do an "IN sub-SELECT" operation, and the 7.4 planner knows several. (Pre-7.4 planners didn't know any :-( ... but I digress.) When you write "foo IN (SELECT DISTINCT ...)", the DISTINCT doesn't change the semantics at all, it just adds overhead. In fact it's worse than that: if the planner decides that the best way to proceed is to make the subselect output unique, it will throw another layer of sort/unique processing on top of what you did. So writing DISTINCT is actually a pessimization in 7.4. > Example: > -> HashAggregate (cost=1020025.13..1020025.13 rows=51 width=4) > (actual time=954049.317..954450.065 rows=82208 loops=1) > -> Subquery Scan "IN_subquery" (cost=983429.20..1020025.00 > rows=51 width=4) (actual time=856641.244..953639.116 rows=82208 loops=1) > -> Unique (cost=983429.20..1020024.49 rows=51 width=4) > (actual time=856641.230..952939.539 rows=82208 loops=1) > -> Sort (cost=983429.20..1001726.84 rows=7319058 > width=4) (actual time=856641.215..906429.835 rows=11067735 loops=1) > Sort Key: useragent_key > -> Index Scan using The sort/unique steps are coming from the DISTINCT. The HashAggregate step is the planner making sure the output rows are distinct :-( I just a couple days ago added some logic to CVS tip to notice that the sub-select has a DISTINCT clause, and not add unnecessary unique-ifying processing on top of it. So in 7.5, writing a DISTINCT clause will amount to forcing a particular query plan, which might or might not be the best thing but hopefully won't be too terrible. But in 7.4 it has nothing to recommend it ... regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 4: Don't 'kill -9' the postmaster