On Sun, 28 Sep 2003, Tom Lane wrote: > Stephan Szabo <[EMAIL PROTECTED]> writes: > > On Sat, 27 Sep 2003, Tom Lane wrote: > >> I thought of what seems to be a better design for the check query: use > >> a LEFT JOIN and check for NULL in the righthand joined column. > > > Hmm, my initial testing showed that it really was a little slower > > than a more complicated one with NOT EXISTS so I'd abandoned it. How does > > it fare for you compared to: > > select f1, f2 from fk where not exists (select 1 from pk where pk.f1=fk.f1 > > and pk.f2=pk.f2) where fk.f1 is not null and fk.f2 is not null; > > Were you testing against 7.3 or 7.4? On what kinds of tables? > > In 7.4 I think that the JOIN would yield as good or better a plan. The > best possible plan for the NOT EXISTS query is effectively a nestloop > with inner indexscan, which is great if the FK table is small and the > PK table is large, but it sucks otherwise. The planner should choose a > plan of this form for the LEFT JOIN given that combination of table > sizes, and so there shouldn't be any great difference in runtime in that > case. But in other combinations, such as large FK and small PK, other > plan types will beat the pants off nestloop.
As an update, so far I still am getting better results with NOT EXISTS than the left join. For a 50m row fk, 10k row pk where the rows are just the keys, I'm getting a plan like Merge Join Index scan on pktable Sort Seqscan on fktable which is taking about 2-4 times longer for me than the not exists depending on sort_mem (at 4096,64000,128000). When I lowered random_page_cost to 1, I got an indexscan on fktable, but that hadn't seemed to finish after about 2 hours (as opposed to about 30-35 minutes for the not exists and about 30 min - 1 1/2 hours for the sort+seqscan version. I want to do some more tests where there's extraneous data in both tables and see what that does to the results. ---------------------------(end of broadcast)--------------------------- TIP 8: explain analyze is your friend