> On 6 Jan 2020, at 21:15, Israel Brewster <ijbrews...@alaska.edu> wrote: > >> On Jan 6, 2020, at 10:08 AM, Christopher Browne <cbbro...@gmail.com> wrote: >> >> On Mon, 6 Jan 2020 at 13:36, Israel Brewster <ijbrews...@alaska.edu> wrote: >> Thanks to a change in historical data, I have a need to update a large >> number of records (around 50 million). The update itself is straight >> forward, as I can just issue an "UPDATE table_name SET >> changed_field=new_value();" (yes, new_value is the result of a stored >> procedure, if that makes a difference) command via psql, and it should work. >> However, due to the large number of records this command will obviously take >> a while, and if anything goes wrong during the update (one bad value in row >> 45 million, lost connection, etc), all the work that has been done already >> will be lost due to the transactional nature of such commands (unless I am >> missing something). >> >> Given that each row update is completely independent of any other row, I >> have the following questions: >> >> 1) Is there any way to set the command such that each row change is >> committed as it is calculated? >> 2) Is there some way to run this command in parallel in order to better >> utilize multiple processor cores, other than manually breaking the data into >> chunks and running a separate psql/update process for each chunk? Honestly, >> manual parallelizing wouldn’t be too bad (there are a number of logical >> segregations I can apply), I’m just wondering if there is a more automatic >> option. >> >> Yeah, I'd be inclined to do this in batches.
I think you’re overcomplicating the matter. I’d just do it as a single update in one transaction. It’s only 50M rows. It may take half an hour or so on decent hardware, depending on how resource-intensive your function is. If that fails[1], only then would I start looking into batching things. But then you still need to figure out why it fails and what to do about that; if it fails it will probably fail fast, and if not, then you’re looking at a one-off situation that won’t require more than a few workarounds - after which you can just run the update again. Ad 1). No harm has been done, it’s a single transaction that rolled back. Alban Hertroys -- If you can't see the forest for the trees, cut the trees and you'll find there is no forest.