Hi Julian, Sorry for the slow response. I think I will need to chop up the query some how, but have not yet found an efficient way to do that. row_id is the primary key in both tables, so that might work.
Here is the explain: urls_jrubin_merged=# EXPLAIN UPDATE table1 SET row_id = table2.row_id FROM table2 WHERE table1.row_id = table2.row_id; QUERY PLAN -------------------------------------------------------------------------------------- Merge Join (cost=57257969.62..12983795937.97 rows=4308749788074 width=121) Merge Cond: (table2.row_id = table1.row_id) -> Sort (cost=15885110.79..16029412.85 rows=288604128 width=8) Sort Key: table2.row_id -> Seq Scan on table2 (cost=0.00..2137231.26 rows=288604128 width=8) -> Materialize (cost=41372858.83..42105903.14 rows=293217725 width=121) -> Sort (cost=41372858.83..41519467.69 rows=293217725 width=121) Sort Key: table1.row_id -> Seq Scan on todo (cost=0.00..5922587.45 rows=293217725 width=121) (9 rows) Thanks, -- Joshua Rubin On Tue, Jul 13, 2010 at 5:08 PM, Julian Mehnle <jul...@mehnle.net> wrote: > Joshua Rubin wrote: > > > I have two tables each with nearly 300M rows. There is a 1:1 > > relationship between the two tables and they are almost always joined > > together in queries. The first table has many columns, the second has > > a foreign key to the primary key of the first table and one more > > column. It is expected that for every row in table1, there is a > > corresponding row in table2. We would like to just add the one column > > to the first table and drop the second table to allow us to index this > > extra column. > > > > This query would work after adding the column to the first table: > > UPDATE table1 SET new_column = table2.new_column FROM table2 WHERE > > table1.row_id = table2.row_id; > > > > However, this will take much too long, I have not successfully > > completed this on our staging server after running it for 3+ days. > > Can you get the query plan (EXPLAIN) of the update query? My guess is the > join cost scales superlinearly. > > You might be able to chop this up into smaller UPDATEs by limiting the > rows to be updated in each round by the primary key. > > E.g.: > > UPDATE table1 SET new_column = table2.new_column FROM table2 > WHERE > table1.row_id = table2.row_id and > table1.row_id >= 0e6 and table1.row_id < 1e6 and > table2.row_id >= 0e6 and table2.row_id < 1e6; > > for a moving row_id window. > > This has helped me in the past with a similar scenario (where both tables > were partitioned by the PK, but it would presumably still work in the > unpartitioned case). > > -Julian >