Just out of curiosity, what kind of trigger are you using, a row level trigger 
or a statement level trigger?  If you are using a row level trigger, see if you 
can achieve your requirements using a statement level trigger instead.  I’m 
relatively new to Postgres, so there could be some limit that I’m not aware of, 
but my understanding is that you have access to the old and new values of the 
updated rows in the after statement trigger.  It would likely be much more 
performant to do your operation once after the statement is done rather than 
firing a trigger on every changed row.

Regards,

Mark Z.


From: Israel Brewster <ijbrews...@alaska.edu>
Sent: Monday, January 6, 2020 1:24 PM
To: Michael Lewis <mle...@entrata.com>
Cc: Rob Sargent <robjsarg...@gmail.com>; Alban Hertroys <haram...@gmail.com>; 
Christopher Browne <cbbro...@gmail.com>; pgsql-generallists.postgresql.org 
<pgsql-general@lists.postgresql.org>
Subject: Re: UPDATE many records


On Jan 6, 2020, at 11:54 AM, Michael Lewis 
<mle...@entrata.com<mailto:mle...@entrata.com>> wrote:

I’m thinking it might be worth it to do a “quick” test on 1,000 or so records 
(or whatever number can run in a minute or so), watching the processor 
utilization as it runs. That should give me a better feel for where the 
bottlenecks may be, and how long the entire update process would take. I’m 
assuming, of course, that the total time would scale more or less linearly with 
the number of records.

I think that depends on how your identify and limit the update to those 1000 
records. If it is using a primary key with specific keys in an array, probably 
close to linear increase because the where clause isn't impactful to the 
overall execution time. If you write a sub-query that is slow, then you would 
need to exclude that from the time. You can always run explain analyze on the 
update and rollback rather than commit.
So a test run on 9,299 records took about 7 seconds to complete (EXPLAIN 
ANALYZE output at https://explain.depesz.com/s/lIYn if it matters), during 
which time I did see a postmaster process consuming 100% CPU. Upping the test 
to 20,819 records took about 16.5 seconds, so that looks relatively linear to 
me. Also, CPU bound. So by my calculations, doing all 50M records would take 
around 10 hours.

One potentially significant note: most of the execution time is spent in a 
trigger. This trigger is actually what’s doing the REAL update that I need to 
happen. If it would make a difference, I could easily pull the trigger code out 
to a separate function that I just call directly (with triggers temporarily 
disabled). My thinking is that calling a function is calling a function, and 
the fact that it is currently called via a trigger rather than direct is of 
little consequence, but I’m willing to be corrected on that :-)

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

Reply via email to