Hi,

You can solve this problem using Citus in PostgreSQL, which is specifically
designed for parallelism

SELECT create_distributed_table('tab_part1', 'partition_key');
SELECT create_distributed_table('reference_tab', 'reference_key');

UPDATE tab_part1
SET column1 = reftab.code
FROM reference_tab reftab
WHERE tab_part1.column1 = reftab.column1;


On Sat, 14 Sept 2024 at 08:22, yudhi s <learnerdatabas...@gmail.com> wrote:

> Hello,
> We have to update a column value(from numbers like '123' to codes like
> 'abc' by looking into a reference table data) in a partitioned table with
> billions of rows in it, with each partition having 100's millions rows. As
> we tested for ~30million rows it's taking ~20minutes to update. So if we go
> by this calculation, it's going to take days for updating all the values.
> So my question is
>
> 1) If there is any inbuilt way of running the update query in parallel
> (e.g. using parallel hints etc) to make it run faster?
> 2) should we run each individual partition in a separate session (e.g.
> five partitions will have the updates done at same time from 5 different
> sessions)? And will it have any locking effect or we can just start the
> sessions and let them run without impacting our live transactions?
>
> UPDATE tab_part1
> SET column1 = reftab.code
> FROM reference_tab reftab
> WHERE tab_part1.column1 = subquery.column1;
>
> Regards
> Yudhi
>

Reply via email to