One potential issue I just thought of with this approach: disk space. Will I be 
doubling the amount of space used while both tables exist? If so, that would 
prevent this from working; I don’t have that much space available at the 
moment.
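
For what it’s worth, the table’s current total on-disk size (heap plus indexes), 
which is roughly the extra space a full copy with the same indexes would need, 
can be checked with something like this, where table_name is a placeholder for 
the actual table:

    SELECT pg_size_pretty(pg_total_relation_size('table_name'));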
---
Israel Brewster
Software Engineer
Alaska Volcano Observatory 
Geophysical Institute - UAF 
2156 Koyukuk Drive 
Fairbanks AK 99775-7320
Work: 907-474-5172
cell:  907-328-9145

> On Jan 7, 2020, at 10:09 AM, Mark Zellers <ma...@adaptiveinsights.com> wrote:
> 
> You don’t tell us whether other users will be concurrently changing any of 
> the records involved.  If you could guarantee that the table won’t be 
> changed, you might be better off doing a CREATE TABLE table_new AS SELECT … 
> FROM table_old, dropping table_old, and finally renaming table_new.  Given 
> the way Postgres handles updates (every updated row becomes a new row 
> version, so updating all 50 million rows effectively rewrites the table 
> anyway), I would think that might perform significantly better.  Even doing 
> the work in batches (create a new table, insert/select from the old table, 
> drop, rename) could well be better, especially if you re-create the indexes 
> after all the data is moved.
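> 
> Roughly, that pattern might look like the following sketch (the column names 
> here are hypothetical, and the index/constraint re-creation would need to 
> match the real schema):
> 
>     CREATE TABLE table_new AS
>         SELECT id, other_column,              -- hypothetical column names
>                new_value() AS changed_field   -- recompute the changed field once
>         FROM table_old;
>     -- re-create indexes, constraints, triggers, and grants on table_new here
>     DROP TABLE table_old;
>     ALTER TABLE table_new RENAME TO table_old;
>     -- (the drop and rename can be wrapped in a single transaction if needed)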
>  
>  
>  
> From: Israel Brewster <ijbrews...@alaska.edu> 
> Sent: Monday, January 6, 2020 10:36 AM
> To: pgsql-general@lists.postgresql.org
> Subject: UPDATE many records
>  
> Thanks to a change in historical data, I have a need to update a large number 
> of records (around 50 million). The update itself is straightforward, as I 
> can just issue an "UPDATE table_name SET changed_field=new_value();" (yes, 
> new_value is the result of a stored procedure, if that makes a difference) 
> command via psql, and it should work. However, due to the large number of 
> records this command will obviously take a while, and if anything goes wrong 
> during the update (one bad value in row 45 million, a lost connection, etc.), 
> all the work already done will be lost due to the transactional nature of 
> such a command (unless I am missing something).
>  
> Given that each row update is completely independent of any other row, I have 
> the following questions:
>  
> 1) Is there any way to set the command such that each row change is committed 
> as it is calculated?
> 2) Is there some way to run this command in parallel in order to better 
> utilize multiple processor cores, other than manually breaking the data into 
> chunks and running a separate psql/update process for each chunk (roughly the 
> approach sketched below)? Honestly, manual parallelizing wouldn’t be too bad 
> (there are a number of logical segregations I can apply); I’m just wondering 
> if there is a more automatic option.
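> 
> For illustration, the manual chunking I have in mind would look roughly like 
> this (database name, column names, and id ranges are placeholders; it assumes 
> an integer id column to range over). Each psql call commits independently, so 
> a failure only loses that one chunk, and running several in the background 
> uses several cores at once:
> 
>     psql -d mydb -c "UPDATE table_name SET changed_field=new_value() WHERE id BETWEEN 1 AND 5000000;" &
>     psql -d mydb -c "UPDATE table_name SET changed_field=new_value() WHERE id BETWEEN 5000001 AND 10000000;" &
>     # ...one background psql process per chunk...
>     wait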
> ---
> Israel Brewster
> Software Engineer
> Alaska Volcano Observatory 
> Geophysical Institute - UAF 
> 2156 Koyukuk Drive 
> Fairbanks AK 99775-7320
> Work: 907-474-5172
> cell:  907-328-9145
