One potential issue I just thought of with this approach: disk space. Will I be doubling the amount of space used while both tables exist? If so, that would prevent this from working - I don’t have that much space available at the moment.

---
Israel Brewster
Software Engineer
Alaska Volcano Observatory
Geophysical Institute - UAF
2156 Koyukuk Drive
Fairbanks AK 99775-7320
Work: 907-474-5172
cell: 907-328-9145
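(For what it is worth, a rough way to gauge how much extra space the copy would need - table_old below is only a placeholder for the real table name - is to compare the table's current size against the free space on the volume:

    -- heap plus TOAST: roughly what CREATE TABLE ... AS SELECT has to write out
    SELECT pg_size_pretty(pg_table_size('table_old'));

    -- heap plus TOAST plus indexes: the full footprint once indexes are re-created
    SELECT pg_size_pretty(pg_total_relation_size('table_old'));

Until the old table is dropped, both copies exist on disk, so the peak extra space needed is at least the first number, plus index space as the indexes are rebuilt.)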
> On Jan 7, 2020, at 10:09 AM, Mark Zellers <ma...@adaptiveinsights.com> wrote:
> 
> You don’t tell us if other users will be concurrently changing any of the
> records involved. If you could guarantee that the table won’t be changed,
> you might be better off doing a CREATE TABLE table_new AS SELECT … FROM
> table_old, dropping table_old, and finally renaming table_new. Given the
> way Postgres handles updates, I would think that might perform significantly
> better. Even if you did the work in batches (create a new table,
> insert/select from the old table, drop, rename), that could well be better,
> especially if you re-create the indexes after all the data is moved.
> 
> From: Israel Brewster <ijbrews...@alaska.edu>
> Sent: Monday, January 6, 2020 10:36 AM
> To: pgsql-general@lists.postgresql.org
> Subject: UPDATE many records
> 
> Thanks to a change in historical data, I have a need to update a large number
> of records (around 50 million). The update itself is straightforward, as I
> can just issue an "UPDATE table_name SET changed_field=new_value();" (yes,
> new_value is the result of a stored procedure, if that makes a difference)
> command via psql, and it should work. However, due to the large number of
> records this command will obviously take a while, and if anything goes wrong
> during the update (one bad value in row 45 million, a lost connection, etc.),
> all the work that has been done already will be lost due to the transactional
> nature of such commands (unless I am missing something).
> 
> Given that each row update is completely independent of any other row, I have
> the following questions:
> 
> 1) Is there any way to set the command such that each row change is committed
> as it is calculated?
> 2) Is there some way to run this command in parallel in order to better
> utilize multiple processor cores, other than manually breaking the data into
> chunks and running a separate psql/update process for each chunk? Honestly,
> manual parallelizing wouldn’t be too bad (there are a number of logical
> segregations I can apply), I’m just wondering if there is a more automatic
> option.
> ---
> Israel Brewster
> Software Engineer
> Alaska Volcano Observatory
> Geophysical Institute - UAF
> 2156 Koyukuk Drive
> Fairbanks AK 99775-7320
> Work: 907-474-5172
> cell: 907-328-9145
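(A minimal sketch of the rewrite-and-swap approach Mark describes above, assuming the table is not being written to concurrently; table_old, table_new, and the id column are placeholders, and new_value() is the stored procedure from the original post:

    -- build the corrected copy in a single pass; unlike UPDATE, this leaves no
    -- dead row versions behind in the existing table
    CREATE TABLE table_new AS
        SELECT id,                        -- placeholder for the unchanged columns
               new_value() AS changed_field
        FROM table_old;

    -- swap the names; indexes, constraints, defaults, and grants are NOT copied
    -- by CREATE TABLE AS and must be re-created on the new table afterwards
    BEGIN;
    DROP TABLE table_old;
    ALTER TABLE table_new RENAME TO table_old;
    COMMIT;

Re-creating the indexes once, after all the data is in place, is generally much cheaper than maintaining them row by row through a 50-million-row UPDATE.)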
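(On the two original questions: a single UPDATE statement cannot commit row by row, but a common workaround is to batch by key range so that each batch runs in its own transaction; the id column and the range boundaries below are placeholders:

    -- one batch; script the ranges so each UPDATE commits independently,
    -- limiting how much work is lost if something fails partway through
    UPDATE table_name
       SET changed_field = new_value()
     WHERE id >= 1 AND id < 100001;

Because separate ranges touch disjoint rows, several psql sessions can each work through their own set of ranges at the same time, which amounts to the manual parallelization mentioned in question 2.)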