> James Harper wrote:
> > The subquery returns a very small result set (0 or 1, assuming you use
> > DISTINCT) and so isn't too inefficient. It's when you say 'WHERE NOT
> > EXISTS (SOME QUERY WITH LOTS OF RESULTS)' that you start to really bog
> > down
>
> True, but if the outer query contains a very large number of records, the
> inner query is executed over and over and over and ...
>
> This multiplies the small execution time by a large number.
>
> Think of it like a WHILE loop that executes a block of code each time
> round. The inner block of code might be quick, but if the loop goes round
> several million times, the cumulative time becomes significant.
>
> One general rule of thumb of SQL execution is that bulk processing is
> almost always faster than processing records individually in a loop.
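For anyone skimming the thread, the shape being discussed is a correlated NOT EXISTS, where the inner SELECT refers to the current outer row. Roughly speaking - and using made-up table names here rather than the real Bacula schema - it looks like this:

  -- Correlated form: the inner SELECT references the outer row (o.Name),
  -- so, read literally, it has to be evaluated once per row of outer_table.
  SELECT COUNT(*)
  FROM outer_table AS o
  WHERE NOT EXISTS (
      SELECT 1 FROM inner_table AS i WHERE i.Name = o.Name
  );

  -- Conceptually, each outer row triggers a probe like:
  --   SELECT 1 FROM inner_table WHERE Name = '<current outer Name>';
  -- With an index on inner_table(Name) each probe is cheap, but several
  -- million probes still add up - hence the WHILE loop analogy above.

That's the per-row reading of it, anyway; whether the planner actually executes it that way is another matter.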
I would have thought that the query planner would have reduced both solutions to the same thing (e.g. for every record in the outer or left part, match it up with the inner or right part). I agree that the JOIN lends itself better to optimization though, and that's certainly the way I would have written it, as it is far clearer to read and scales to far more situations than the 'NOT EXISTS' solution does.

I constructed a simple test scenario: I created a table 'james' with a few hundred thousand rows of both real filenames and corrupted versions of real filenames, and ran the following tests:

  RESET QUERY CACHE;   -- can't remember if that was the exact syntax

  SELECT COUNT(DISTINCT a.Name)
  FROM james AS a
  WHERE NOT EXISTS (
      SELECT f.Name FROM Filename AS f WHERE f.Name = a.Name
  );

and

  RESET QUERY CACHE;

  SELECT COUNT(DISTINCT a.Name)
  FROM james AS a
  LEFT JOIN Filename AS f ON a.Name = f.Name
  WHERE f.Name IS NULL;

and compared the times over half a dozen runs. There wasn't a measurable difference between the results, but the server was busy enough at the time that a significant amount of 'noise' was added in - both queries ran for between 2 and 6 seconds. The simple timing test also completely fails to measure memory and I/O usage.

I'm sure someone could construct more precise tests that would eliminate any speculation, but on face value you aren't going to get an order-of-magnitude improvement, or anything close, just by tinkering with this query, unless you can identify some indexes that could be added to the batch table (which is temporary, isn't it?). Although, given that Filename already has an index on the only field being used, I suspect that even that won't do much for you.

Does MySQL have a 'profiler' tool like MSSQL does? I spend most of my time in MSSQL, and if some operation is running slow I just attach the profiler to it, capture the queries, and focus on the ones that are taking most of the time.

James
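P.S. If anyone wants to take the speculation out of whether the planner treats the two forms the same way, running EXPLAIN on both should show it. Something along these lines - I'm reusing my throwaway 'james' table here, and the exact plan output varies a bit between MySQL versions:

  -- Anti-join via correlated NOT EXISTS
  EXPLAIN
  SELECT COUNT(DISTINCT a.Name)
  FROM james AS a
  WHERE NOT EXISTS (SELECT 1 FROM Filename AS f WHERE f.Name = a.Name);

  -- Anti-join via LEFT JOIN ... IS NULL
  EXPLAIN
  SELECT COUNT(DISTINCT a.Name)
  FROM james AS a
  LEFT JOIN Filename AS f ON a.Name = f.Name
  WHERE f.Name IS NULL;

If both plans show the same index lookup on Filename(Name) with similar row estimates, that would back up the "planner reduces them to the same thing" theory; if the first one shows a DEPENDENT SUBQUERY being re-run per row, that would back up the loop argument instead.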
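Also, on the "indexes on the batch table" idea, what I had in mind was something like the following. I'm guessing at the table and column names (a temporary table called 'batch' with a Name column), since I don't have the schema in front of me, so treat it as a sketch rather than a recipe:

  -- Hypothetical: index the column used in the join/anti-join against Filename.
  -- 'batch' and 'Name' are assumptions about the temporary table's schema.
  CREATE INDEX idx_batch_name ON batch (Name);

  -- If Name is a TEXT/BLOB column, MySQL needs a prefix length instead:
  -- CREATE INDEX idx_batch_name ON batch (Name(255));

Whether that helps at all depends on which side of the join the optimizer drives from, which is another thing EXPLAIN would show.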