On Thu, Nov 06, 2014 at 02:55:20PM +0000, Shaun Thomas wrote: > > These updates aren't equivalent. It's very important you know this, because > you're also inflating your table with a lot of extra updated rows. > > Take the first UPDATE: > > > UPDATE second SET time1 = orig.time1 > > FROM orig > > WHERE second.key1 = orig.key1; > > If you wrote this as a SELECT, it would look like this: > > SELECT second.time1, orig.time1 > FROM second > JOIN ORIG ON (second.key1 = orig.key1) > > Since second is a many to one subset of orig, you now have several > simultaneous updates. Your second UPDATE: > > > UPDATE second SET time1 = (SELECT orig.time1 FROM orig,second > > WHERE orig.key1 = second.key1 LIMIT 1); > > Is equivalent to this SELECT: > > SELECT second.time1, > (SELECT orig.time1 FROM orig,second > WHERE orig.key1 = second.key1 LIMIT 1) > FROM second; > > Meaning you'd only get as many updates as there are rows in second. The > difference is your LIMIT 1. However, since you're not using an ORDER BY > clause, the actual value you get for time1 will be indeterminate. Something > like this would remove the row inflation and fix the random time1 behavior, > but I'm not sure it was your intent: > > UPDATE second > SET time1 = orig.time1 > FROM (SELECT DISTINCT ON (key1) key1, time1 > FROM orig > ORDER BY key1, time1 DESC) sub > WHERE second.key1 = sub.key1;
I see now that I made more than one mistake. 1) I forgot to INCLUDE INDEXES when creating second. I would have seen dup keys when filling it. CREATE TABLE second (LIKE orig INCLUDING INDEXES); 2) I should have used something like this to fill second: INSERT INTO second (key1) SELECT key1 FROM orig ORDER BY random() LIMIT 400000; 3) I then incorrectly remembered the query I had written at work. It should have been: EXPLAIN ANALYZE UPDATE second se SET time1 = (SELECT time1 FROM orig WHERE orig.key1 = se.key1); Once the second table is filled with unique keys, then both UPDATES should have produced the same results, but the UPDATE FROM is faster than the UPDATE = SELECT, which is documented. My original intent was to find out what the performance differences between the two are. Thanks for pointing these things out! PJ -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general