[BUGS] BUG #4284: Optimizer chooses bad plan with LEFT join
The following bug has been logged online: Bug reference: 4284 Logged by: David Rowley Email address: [EMAIL PROTECTED] PostgreSQL version: 8.3.3 Operating system: Windows XP and 2003 Description:Optimizer chooses bad plan with LEFT join Details: I was asked a few days ago to gather certain information from our database. The schema was never really designed for this query and it’s quite a bodge job, but it gets the intended results. I was quite surprised that the query I came up with took over 2 minutes to run, giving that the biggest of the tables is only around 30k rows. A quick look at the query plan showed me that the optimizer was choosing to perform a nested loop for the left join, probably since it thought there was only 1 row in that (derived) table. I have managed to re-create this and I have included a full re-creation script including create tables and inserts to generate some mocked up data. I’ve only tested on 8.3.3, I will test if required on other versions but hopefully the recreation script will stop the need. I’ve tuned the number of rows in the mocked up tables so the example query only takes around 10 seconds (on this computer). That should be enough time to show that it’s not the best plan. Let me know if you are unable to re-created and I’ll see what I can do to provide more information, (settings etc.) I can get the query to run more quickly in the meantime just by using a FULL OUTER JOIN and adding a WHERE clause to filter out the NULLs Anyway… less talk more facts you say… CREATE TABLE production ( despid SERIAL, productiondate DATE NOT NULL, lineid INT NOT NULL, partcode VARCHAR(20) NOT NULL, quantity INT NOT NULL, PRIMARY KEY(despid) ); CREATE TABLE production_line ( lineid INT PRIMARY KEY, linename VARCHAR(16) NOT NULL ); CREATE TABLE batches ( batchid SERIAL, batchcode VARCHAR(16) NOT NULL, lineid INT NOT NULL, partcode VARCHAR(20) NOT NULL, productiondate DATE NOT NULL, bestbefore DATE NOT NULL, PRIMARY KEY(batchid) ); /* Table to help create mock data */ CREATE TABLE parts ( partcode VARCHAR(20) NOT NULL, PRIMARY KEY(partcode) ); /* Create fake parts */ INSERT INTO parts (partcode) VALUES('PART1'),('PART2'),('PART3'),('PART4'),('PART5'),('PART6'),('PART7'); /* Table to help create mock data */ CREATE TABLE calendar ( date DATE, PRIMARY KEY(date) ); /* 200 rows here is ok for testing */ INSERT INTO calendar (date) SELECT '2008-06-30'::DATE + num * '1 day'::INTERVAL FROM (select generate_series(1,200) AS num) AS series; INSERT INTO production_line (lineid,linename) VALUES(1,'Line 1'),(2,'Line 2'),(3,'Line 3'); /* Populate production with the cartasian product of calendar, production_line and parts * We need a table big enough to get the optimizer not to nest loop. Or at least to produce * a query where a nested loop would be slower than a merge join. */ INSERT INTO production (productiondate,lineid,partcode,quantity) SELECT cal.date, l.lineid, p.partcode, CAST(RANDOM() * 1000 AS INT) AS qty FROM calendar AS cal, production_line AS l, parts AS p; /* Now to populate the batches table, cartasian product once again. */ INSERT INTO batches (batchcode,lineid,partcode,productiondate,bestbefore) SELECT 'ABC123', l.lineid, p.partcode, cal.date, cal.date + '100 days'::INTERVAL FROM calendar AS cal, production_line AS l, parts AS p; /* Ensure we have stats for these tables */ ANALYZE; EXPLAIN ANALYZE SELECT t1.productiondate, t3.linename, t1.partcode, t1.batchcode, t1.bestbefore, t4.quantity / t2.number_of_bbdates AS est_packs_in_batch, t2.number_of_bbdates AS number_of_batches FROM batches t1 INNER JOIN (SELECT productiondate, lineid, partcode, COUNT(DISTINCT bestbefore) AS number_of_bbdates FROM batches GROUP BY productiondate,lineid,partcode ) AS t2 ON t1.productiondate = t2.productiondate AND t1.lineid = t2.lineid AND t1.partcode = t2.partcode INNER JOIN production_line t3 ON t1.lineid = t3.lineid LEFT OUTER JOIN (SELECT productiondate, lineid, partcode, SUM(quantity) AS quantity FROM production GROUP BY productiondate,partcode,lineid ) t4 ON t1.productiondate = t4.productiondate AND t1.lineid = t4.lineid AND t1.partcode = t4.partcode ; /* test=# SELECT VERSION(); version - PostgreSQL 8.3.3, compiled by Visual C++ build 1400 EXPLAIN ANALYZE output QUERY PLAN - - Nested L
Re: [BUGS] BUG #4274: uuid returns duplicate values
Hi. This can obtain a comfortable result. http://winpg.jp/~saito/pg_work/OSSP_win32/pg8.3.3-win-bin-uuid-ossp-20080706.zip I will adjust with Rarf-san. Thanks! Regards, Hiroshi Saito - Original Message - From: "Hiroshi Saito" <[EMAIL PROTECTED]> To: "Eric P. Melbardis" <[EMAIL PROTECTED]>; "Nicolas ANTONINI" <[EMAIL PROTECTED]> Cc: Sent: Saturday, July 05, 2008 1:28 AM Subject: Re: [BUGS] BUG #4274: uuid returns duplicate values Hi. uuid of MS is the approach different from ossp. Then, I regret that it can't provide on other platformsBut, I tried it. http://winpg.jp/~saito/pg_work/OSSP_win32/try-uuidwin32/ If someone wishes, I think that this is the place where pgfoundry is suitable. Therefore, I will work an effort to clear the bug of uuid-ossp persistently. Please give me time to a slight degree. I appreciate your perseverance. thanks! Regards, Hiroshi Saito - Original Message - From: "Hiroshi Saito" <[EMAIL PROTECTED]> Hi. Sorry, late reactionNow, private time has only a few. Umm, It seems that the bug is probably contained in another part. I will investigate at a weekend again. Anyway, thank you for useful information.! Regards, Hiroshi Saito - Original Message - From: "Eric P. Melbardis" <[EMAIL PROTECTED]> Hi I would suspect that the clock used may not be precise enough Not with standing, why not use the windows CreateUUID?? Slower system seems to fare better... (please see results below) Regards Eric --- My office system is xeon (duo core 5160) 3.00 ghz, 2 gig-ram psql:q.sql:168: INFO: uuid = b2631eb7-26ad-4b40-9276-ed1fc43f0944 psql:q.sql:168: INFO: uuid = 45472e3e-a88e-46ff-b2b1-cfa229b8c568 psql:q.sql:168: INFO: uuid = 45472e3e-a88e-46ff-b2b1-cfa229b8c568 psql:q.sql:168: INFO: uuid = 45472e3e-a88e-46ff-b2b1-cfa229b8c568 psql:q.sql:168: INFO: uuid = 45472e3e-a88e-46ff-b2b1-cfa229b8c568 psql:q.sql:168: INFO: uuid = 45472e3e-a88e-46ff-b2b1-cfa229b8c568 psql:q.sql:168: INFO: uuid = 45472e3e-a88e-46ff-b2b1-cfa229b8c568 psql:q.sql:168: INFO: uuid = 45472e3e-a88e-46ff-b2b1-cfa229b8c568 psql:q.sql:168: INFO: uuid = efd009ab-c38b-4efa-8b0c-4ef0280dc781 psql:q.sql:168: INFO: uuid = 10c8c7aa-1367-44fa-ac76-a0f9b1ca69b7 psql:q.sql:168: INFO: uuid = 10c8c7aa-1367-44fa-ac76-a0f9b1ca69b7 psql:q.sql:168: INFO: uuid = 10c8c7aa-1367-44fa-ac76-a0f9b1ca69b7 psql:q.sql:168: INFO: uuid = 10c8c7aa-1367-44fa-ac76-a0f9b1ca69b7 psql:q.sql:168: INFO: uuid = 10c8c7aa-1367-44fa-ac76-a0f9b1ca69b7 psql:q.sql:168: INFO: uuid = 10c8c7aa-1367-44fa-ac76-a0f9b1ca69b7 psql:q.sql:168: INFO: uuid = 10c8c7aa-1367-44fa-ac76-a0f9b1ca69b7 psql:q.sql:168: INFO: uuid = 10c8c7aa-1367-44fa-ac76-a0f9b1ca69b7 psql:q.sql:168: INFO: uuid = 10c8c7aa-1367-44fa-ac76-a0f9b1ca69b7 psql:q.sql:168: INFO: uuid = abc4884d-273e-49ce-82aa-46226af549c3 psql:q.sql:168: INFO: uuid = 5d5fce48-987a-46e9-ac03-e567e703aafb psql:q.sql:168: INFO: uuid = 5d5fce48-987a-46e9-ac03-e567e703aafb psql:q.sql:168: INFO: uuid = 5d5fce48-987a-46e9-ac03-e567e703aafb psql:q.sql:168: INFO: uuid = 5d5fce48-987a-46e9-ac03-e567e703aafb psql:q.sql:168: INFO: uuid = 5d5fce48-987a-46e9-ac03-e567e703aafb psql:q.sql:168: INFO: uuid = 5d5fce48-987a-46e9-ac03-e567e703aafb psql:q.sql:168: INFO: uuid = 5d5fce48-987a-46e9-ac03-e567e703aafb psql:q.sql:168: INFO: uuid = 5d5fce48-987a-46e9-ac03-e567e703aafb psql:q.sql:168: INFO: uuid = 5d5fce48-987a-46e9-ac03-e567e703aafb psql:q.sql:168: INFO: uuid = 5d5fce48-987a-46e9-ac03-e567e703aafb psql:q.sql:168: INFO: uuid = 6c3da39d-467e-484c-9b68-6a20f8980c3a psql:q.sql:168: INFO: uuid = 6c3da39d-467e-484c-9b68-6a20f8980c3a psql:q.sql:168: INFO: uuid = 6c3da39d-467e-484c-9b68-6a20f8980c3a psql:q.sql:168: INFO: uuid = 6c3da39d-467e-484c-9b68-6a20f8980c3a psql:q.sql:168: INFO: uuid = 6c3da39d-467e-484c-9b68-6a20f8980c3a psql:q.sql:168: INFO: uuid = 6c3da39d-467e-484c-9b68-6a20f8980c3a psql:q.sql:168: INFO: uuid = 6c3da39d-467e-484c-9b68-6a20f8980c3a psql:q.sql:168: INFO: uuid = 6c3da39d-467e-484c-9b68-6a20f8980c3a psql:q.sql:168: INFO: uuid = 754268bc-411d-48dd-b68d-5b449c18ecab psql:q.sql:168: INFO: uuid = a0ae6b93-adec-4523-b422-8eba041d76de - --- My home system is Pentium D 3.2 hz 2g ram (NON-xeon!) psql:q.sql:168: INFO: uuid = 5dce8d5f-0f36-40fc-a34c-af74991e7fb3 psql:q.sql:168: INFO: uuid = ae9b28c3-1614-4092-93dd-ba1d8392622c psql:q.sql:168: INFO: uuid = ae9b28c3-1614-4092-93dd-ba1d8392622c psql:q.sql:168: INFO: uuid = 14146307-231e-4feb-aa97-a709f6515d05 psql:q.sql:168: INFO: uuid = 6a70d36c-de75-44ca-947b-3785eac42d49 psql:q.sql:168: INFO: uuid = 6a70d36c-de75-44ca-947b-3785eac42d49 psql:q.sql:168: INFO: uuid = 849650aa-1004-4c28-b7b5-dce9bcff22e4 psql:q.sql:168: INFO: uuid = f950d8d9-c257-41cc-8206-7267698755f1 psql:q.sql:168: INFO: uuid = f950d8d9-c257-41cc-8206-7267698755f1 psql:q.sql:168: INFO: uuid = f950d8d9-c257-41cc-8206-7267698755f1 psql:q.sql:168: INFO: uuid = df203405-a3d9-40