On Wed, 2010-03-03 at 17:26 +0100, mk wrote:
> So there *may* be some evidence that joins are indeed bad in practice.
> If someone has smth specific/interesting on the subject, please post.
I have found joins to cause problems in a few cases - I'm talking about relatively large tables though, roughly of order 10^8 rows. I'm normally on MySQL, but that shouldn't make any difference - I've seen almost the same situation on Oracle.

Consider this simple example:

/* Table A */
CREATE TABLE TableA (
    project_id BIGINT NOT NULL,
    cost INT,
    date DATETIME,
    PRIMARY KEY (project_id, date)
);

/* Table projects */
CREATE TABLE projects (
    client_id BIGINT NOT NULL,
    project_id BIGINT NOT NULL,
    INDEX(client_id)
);

Now, the index on TableA has been optimised for queries against date ranges on specific project ids, whose rows should be more or less sequential (under a load of other assumptions) - but that reduces the efficiency of the query under a join with the table "projects".

If you denormalise the table and change the index to be on (client_id, project_id, date), the query can end up running far more quickly. Assuming you still need the original (project_id, date) access path, you keep storing the first table as well, with stored procedures to ensure the data stays correct in all the tables. There's a rough sketch of what I mean below.

I'm glossing over the details, but I've definitely had situations where I've had to choose denormalisation over purity of data.

Rolled-up data tables are another such situation - when you know half your queries are grouping by field "A", it's sometimes a requirement to store that aggregate as well (second sketch below).
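To make the denormalised case concrete, here is roughly what I mean - all the names below are made up for illustration, and I'm using a trigger where I said stored procedures, which is just one way to keep the copies in step:

/* Denormalised copy of TableA carrying client_id, so per-client
   queries no longer need the join against projects. */
CREATE TABLE TableA_by_client (
    client_id  BIGINT NOT NULL,
    project_id BIGINT NOT NULL,
    cost       INT,
    date       DATETIME,
    PRIMARY KEY (client_id, project_id, date)
);

/* A per-client query: all rows for one client are now clustered
   together in the index, so this reads one contiguous range
   instead of doing scattered lookups through a join. */
SELECT project_id, SUM(cost)
FROM TableA_by_client
WHERE client_id = 42
  AND date BETWEEN '2010-01-01' AND '2010-02-01'
GROUP BY project_id;

/* Keep the copy consistent when rows land in TableA. */
DELIMITER //
CREATE TRIGGER TableA_ins AFTER INSERT ON TableA
FOR EACH ROW
    INSERT INTO TableA_by_client (client_id, project_id, cost, date)
    SELECT p.client_id, NEW.project_id, NEW.cost, NEW.date
    FROM projects p
    WHERE p.project_id = NEW.project_id
//
DELIMITER ;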
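And a minimal sketch of the rolled-up case, again with made-up names, assuming the common grouping is daily cost per client:

/* Pre-aggregated daily cost per client, for the queries that
   would otherwise GROUP BY over 10^8 rows every time. */
CREATE TABLE cost_by_client_day (
    client_id  BIGINT NOT NULL,
    day        DATE NOT NULL,
    total_cost BIGINT,
    PRIMARY KEY (client_id, day)
);

/* Rebuilt periodically - or maintained incrementally by the same
   stored procedures that keep the denormalised table correct. */
REPLACE INTO cost_by_client_day (client_id, day, total_cost)
SELECT client_id, DATE(date), SUM(cost)
FROM TableA_by_client
GROUP BY client_id, DATE(date);

Tim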