> On 6 Jan 2020, at 21:15, Israel Brewster <ijbrews...@alaska.edu> wrote:
> 
>> On Jan 6, 2020, at 10:08 AM, Christopher Browne <cbbro...@gmail.com> wrote:
>> 
>> On Mon, 6 Jan 2020 at 13:36, Israel Brewster <ijbrews...@alaska.edu> wrote:
>> Thanks to a change in historical data, I have a need to update a large 
>> number of records (around 50 million). The update itself is straight 
>> forward, as I can just issue an "UPDATE table_name SET 
>> changed_field=new_value();" (yes, new_value is the result of a stored 
>> procedure, if that makes a difference) command via psql, and it should work. 
>> However, due to the large number of records this command will obviously take 
>> a while, and if anything goes wrong during the update (one bad value in row 
>> 45 million, lost connection, etc), all the work that has been done already 
>> will be lost due to the transactional nature of such commands (unless I am 
>> missing something).
>> 
>> Given that each row update is completely independent of any other row, I 
>> have the following questions:
>> 
>> 1) Is there any way to set the command such that each row change is 
>> committed as it is calculated?
>> 2) Is there some way to run this command in parallel in order to better 
>> utilize multiple processor cores, other than manually breaking the data into 
>> chunks and running a separate psql/update process for each chunk? Honestly, 
>> manual parallelizing wouldn’t be too bad (there are a number of logical 
>> segregations I can apply), I’m just wondering if there is a more automatic 
>> option.
>> 
>> Yeah, I'd be inclined to do this in batches.

I think you’re overcomplicating the matter.

I’d just do it as a single update in one transaction. It’s only 50M rows. It 
may take half an hour or so on decent hardware, depending on how 
resource-intensive your function is.

If that fails[1], only then would I start looking into batching things. But 
then you still need to figure out why it fails and what to do about that; if it 
fails it will probably fail fast, and if not, then you’re looking at a one-off 
situation that won’t require more than a few workarounds - after which you can 
just run the update again.

Ad 1). No harm has been done, it’s a single transaction that rolled back.

Alban Hertroys
--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.



Reply via email to