In response to "Yuri Timofeev" <[EMAIL PROTECTED]>: > 2008/9/9 Bill Moran <[EMAIL PROTECTED]>: > > In response to Bob Hetzel <[EMAIL PROTECTED]>: > >. Additionally, doing that > > rewrite so that it results in improvements to all SQL platforms is > > probably going to require that the code be broken into SQL-dependent > > and SQL-independent sections. > > It may be that in the critical sections have to do it.
Agreed.  This was demonstrated last year by some massive speed
improvements in the way that job records are saved.

> The problem is that different DBMS have different optimizers.
> For example, in PostgreSQL (during my tests) the EXPLAIN operator does
> not show that indices will be used:
>
> bacula=# EXPLAIN SELECT File.FileId, Job.JobId FROM File LEFT OUTER JOIN Job ON
> (File.JobId = Job.JobId) WHERE Job.JobId IS NULL LIMIT 300000;
>
>                                  QUERY PLAN
> ---------------------------------------------------------------------------
>  Limit  (cost=78.86..24169.76 rows=300000 width=8)
>    ->  Hash Left Join  (cost=78.86..181966.90 rows=2265021 width=8)
>          Hash Cond: (file.jobid = job.jobid)
>          Filter: (job.jobid IS NULL)
>          ->  Seq Scan on file  (cost=0.00..113937.42 rows=4530042 width=8)
>          ->  Hash  (cost=58.38..58.38 rows=1638 width=4)
>                ->  Seq Scan on job  (cost=0.00..58.38 rows=1638 width=4)
> (7 rows)
>
> However, PostgreSQL worked quickly.

That's because using indexes on that query is a horrifically bad idea:

bacula=# EXPLAIN SELECT File.FileId, Job.JobId FROM File LEFT OUTER JOIN Job ON
bacula-# (File.JobId = Job.JobId) WHERE Job.JobId IS NULL LIMIT 300000;
                                  QUERY PLAN
-----------------------------------------------------------------------------
 Limit  (cost=53.15..14650.94 rows=300000 width=8)
   ->  Hash Left Join  (cost=53.15..2115902.45 rows=43482932 width=8)
         Hash Cond: ("outer".jobid = "inner".jobid)
         Filter: ("inner".jobid IS NULL)
         ->  Seq Scan on file  (cost=0.00..1463605.32 rows=43482932 width=8)
         ->  Hash  (cost=50.52..50.52 rows=1052 width=4)
               ->  Seq Scan on job  (cost=0.00..50.52 rows=1052 width=4)
(7 rows)

bacula=# set enable_seqscan=off;
SET
Time: 0.210 ms
bacula=# EXPLAIN SELECT File.FileId, Job.JobId FROM File LEFT OUTER JOIN Job ON
bacula-# (File.JobId = Job.JobId) WHERE Job.JobId IS NULL LIMIT 300000;
                                               QUERY PLAN
------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..1042544.60 rows=300000 width=8)
   ->  Merge Right Join  (cost=0.00..151109653.30 rows=43482932 width=8)
         Merge Cond: ("outer".jobid = "inner".jobid)
         Filter: ("outer".jobid IS NULL)
         ->  Index Scan using job_pkey on job  (cost=0.00..175.80 rows=1052 width=4)
         ->  Index Scan using file_jobid_idx on file  (cost=0.00..150565938.23 rows=43482932 width=8)
(6 rows)

Notice that PostgreSQL estimates the index scan on file_jobid_idx at a
cost of 150,565,938 all by itself, whereas the entire sequential-scan
plan is estimated at only 1,463,605.  The seq scans are a clear winner
in this case.

> MySQL (theoretically) should use an index (on multiple columns) that
> it already has in make_mysql_tables.in:
>
>   CREATE TABLE File (...
>     INDEX (JobId, PathId, FilenameId)
>
> but MySQL worked slowly.

I don't know much about the innards of MySQL's query planner, but it
doesn't seem to be as robust as Postgres'.  Index scanning is usually
the best way to go, but (as shown above) not always.  I have trouble
believing that the MySQL developers always use available indexes, since
that would create a laundry list of corner cases where queries perform
horribly, but it seems that their planning logic fails in this
particular case.  That's not unusual: if you mess with Postgres long
enough, you'll find odd queries that perform poorly even though there's
a much faster way to get the same result.

And that's the rub.  If I were writing a system purely for PostgreSQL,
I could reorder joins or make other tweaks until the query ran quickly,
but then I might have a query that runs like molasses on MySQL.  The
inverse is true as well.
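For anyone who wants to check the planner's guesses against reality,
EXPLAIN ANALYZE actually executes the query and prints real row counts
and timings next to the estimates.  A minimal sketch with the same
query (output will obviously vary per installation):

  -- EXPLAIN ANALYZE runs the query for real, so the estimated and
  -- actual row counts can be compared line by line.
  EXPLAIN ANALYZE
  SELECT File.FileId, Job.JobId
    FROM File LEFT OUTER JOIN Job ON (File.JobId = Job.JobId)
   WHERE Job.JobId IS NULL LIMIT 300000;

  -- enable_seqscan only affects the current session; RESET restores
  -- the default so later queries are planned normally.
  SET enable_seqscan = off;
  EXPLAIN ANALYZE
  SELECT File.FileId, Job.JobId
    FROM File LEFT OUTER JOIN Job ON (File.JobId = Job.JobId)
   WHERE Job.JobId IS NULL LIMIT 300000;
  RESET enable_seqscan;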
My point is that such work is definitely warranted in the director; I'm
not sure it's warranted in something like dbcheck.  If you've already
got it done, there's no reason to waste it, but suggesting that someone
review dbcheck and optimize the SQL is like asking me to clean toilets
with a toothbrush -- it's a useful exercise to teach privates not to
mouth off to sergeants, but it's not a particularly efficient way to
get the toilet clean.

I expect that, on Postgres, the fastest way to delete orphaned records
would be:

  delete from filename where filename.filenameid not in
    (select filenameid from file);

Assuming that is fast, would such a query even work on MySQL?  (A
couple of more portable spellings are sketched after my sig.)

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
[EMAIL PROTECTED]
Phone: 412-422-3463x4023
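P.S.  Two more portable ways to spell that cleanup -- a sketch against
the Filename/File tables, untested here.  MySQL has historically
planned NOT IN (subquery) poorly, and NOT IN also matches nothing if
the subquery can return a NULL, so NOT EXISTS or an explicit anti-join
is usually the safer formulation:

  -- Correlated NOT EXISTS; both PostgreSQL and MySQL can plan this as
  -- an anti-join, and it is immune to the NOT IN NULL gotcha.
  -- (Sketch only; column names assumed from Bacula's schema.)
  DELETE FROM Filename
   WHERE NOT EXISTS (SELECT 1
                       FROM File
                      WHERE File.FilenameId = Filename.FilenameId);

  -- MySQL's multi-table DELETE form of the same anti-join.
  -- (PostgreSQL does not accept this syntax; use NOT EXISTS there.)
  DELETE Filename
    FROM Filename LEFT JOIN File
      ON File.FilenameId = Filename.FilenameId
   WHERE File.FilenameId IS NULL;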