On Thu, Nov 06, 2014 at 02:55:20PM +0000, Shaun Thomas wrote:
> 
> These updates aren't equivalent. It's very important you know this, because 
> you're also inflating your table with a lot of extra updated rows.
> 
> Take the first UPDATE:
> 
> > UPDATE second SET time1 = orig.time1
> > FROM orig
> > WHERE second.key1 = orig.key1;
> 
> If you wrote this as a SELECT, it would look like this:
> 
> SELECT second.time1, orig.time1
>   FROM second
>   JOIN ORIG ON (second.key1 = orig.key1)
> 
> Since second is a many to one subset of orig, you now have several 
> simultaneous updates. Your second UPDATE:
> 
> > UPDATE second SET time1 = (SELECT orig.time1 FROM orig,second
> > WHERE orig.key1 = second.key1 LIMIT 1);
> 
> Is equivalent to this SELECT:
> 
> SELECT second.time1,
>        (SELECT orig.time1 FROM orig,second
>          WHERE orig.key1 = second.key1 LIMIT 1)
>   FROM second;
> 
> Meaning you'd only get as many updates as there are rows in second. The 
> difference is your LIMIT 1. However, since you're not using an ORDER BY 
> clause, the actual value you get for time1 will be indeterminate. Something 
> like this would remove the row inflation and fix the random time1 behavior, 
> but I'm not sure it was your intent:
> 
> UPDATE second
>    SET time1 = orig.time1
>   FROM (SELECT DISTINCT ON (key1) key1, time1
>           FROM orig
>          ORDER BY key1, time1 DESC) sub
>  WHERE second.key1 = sub.key1;

I see now that I made more than one mistake.

1) I forgot to INCLUDE INDEXES when creating second.  I would have
   seen dup keys when filling it.

        CREATE TABLE second (LIKE orig INCLUDING INDEXES);

2) I should have used something like this to fill second:

        INSERT INTO second (key1)
                SELECT key1 FROM orig
                ORDER BY random()
                LIMIT 400000;

3) I then incorrectly remembered the query I had written at work.  It
   should have been:

        EXPLAIN ANALYZE
        UPDATE second se SET time1 = (SELECT time1 FROM orig
                                WHERE orig.key1 = se.key1);


Once the second table is filled with unique keys, then both UPDATES
should have produced the same results, but the UPDATE FROM is faster
than the UPDATE = SELECT, which is documented.

My original intent was to find out what the performance differences
between the two are.

Thanks for pointing these things out!
PJ


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Reply via email to