Hi,

You could solve this with Citus, a PostgreSQL extension that shards a table across worker nodes and runs statements against the shards in parallel. Roughly:
SELECT create_distributed_table('tab_part1', 'partition_key');
-- Replicate the lookup table to every worker so the UPDATE below can
-- join on column1, which is not the distribution column:
SELECT create_reference_table('reference_tab');

UPDATE tab_part1
SET    column1 = reftab.code
FROM   reference_tab reftab
WHERE  tab_part1.column1 = reftab.column1;

On Sat, 14 Sept 2024 at 08:22, yudhi s <learnerdatabas...@gmail.com> wrote:

> Hello,
> We have to update a column value (from numbers like '123' to codes like
> 'abc', by looking them up in a reference table) in a partitioned table
> with billions of rows, each partition holding hundreds of millions of
> rows. In a test on ~30 million rows the update took ~20 minutes, so at
> that rate updating all the values would take days. My questions:
>
> 1) Is there any built-in way to run the update query in parallel (e.g.
> using parallel hints) to make it faster?
> 2) Or should we run each partition's update in a separate session (e.g.
> five partitions updated at the same time from five sessions)? Would that
> have any locking effect, or can we just start the sessions and let them
> run without impacting our live transactions?
>
> UPDATE tab_part1
> SET column1 = reftab.code
> FROM reference_tab reftab
> WHERE tab_part1.column1 = reftab.column1;
>
> Regards
> Yudhi
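If adding an extension isn't an option: stock PostgreSQL won't run an
UPDATE with a parallel plan (parallel query applies only to the read side
of a plan, not to statements that write rows), so for your second question,
running each partition's update from its own session is a reasonable way to
get parallelism. Because the partitions hold disjoint rows, the sessions
take row locks on different rows and won't block each other; readers are
never blocked, only other writers touching the same rows. A minimal sketch,
assuming the partitions are named tab_part1, tab_part2, and so on (adjust
to your real names):

-- Run each statement from its own psql session.

-- session 1
UPDATE tab_part1 t
SET    column1 = r.code
FROM   reference_tab r
WHERE  t.column1 = r.column1;

-- session 2
UPDATE tab_part2 t
SET    column1 = r.code
FROM   reference_tab r
WHERE  t.column1 = r.column1;

-- ...and likewise for the remaining partitions.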
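Whichever route you take, you may also want to batch the work so no single
transaction runs for hours. A hedged sketch (same assumed names as above);
it relies on the fact that once a row is updated its column1 holds a code
rather than a number, so it no longer joins to the reference table, and
rerunning the statement until it reports UPDATE 0 converges, provided the
codes never collide with the numeric values:

-- Update roughly 50,000 matching rows per run; repeat until UPDATE 0.
UPDATE tab_part1 t
SET    column1 = r.code
FROM   reference_tab r
WHERE  t.column1 = r.column1
AND    t.ctid IN (
         SELECT t2.ctid
         FROM   tab_part1 t2
         JOIN   reference_tab r2 ON t2.column1 = r2.column1
         LIMIT  50000
       );

Short transactions keep the row locks short-lived and give autovacuum a
chance to reclaim the dead tuples the update leaves behind.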