Re: [GENERAL] Efficient Way to Merge Two Large Tables

Joshua Rubin Fri, 16 Jul 2010 12:06:45 -0700

Hi Julian,

Sorry for the slow response. I think I will need to chop up the query some
how, but have not yet found an efficient way to do that. row_id is the
primary key in both tables, so that might work.


Here is the explain:
urls_jrubin_merged=# EXPLAIN UPDATE table1 SET row_id = table2.row_id FROM
table2 WHERE table1.row_id = table2.row_id;
                                      QUERY PLAN

--------------------------------------------------------------------------------------
 Merge Join  (cost=57257969.62..12983795937.97 rows=4308749788074 width=121)
   Merge Cond: (table2.row_id = table1.row_id)
   ->  Sort  (cost=15885110.79..16029412.85 rows=288604128 width=8)
         Sort Key: table2.row_id
         ->  Seq Scan on table2  (cost=0.00..2137231.26 rows=288604128
width=8)
   ->  Materialize  (cost=41372858.83..42105903.14 rows=293217725 width=121)
         ->  Sort  (cost=41372858.83..41519467.69 rows=293217725 width=121)
               Sort Key: table1.row_id
               ->  Seq Scan on todo  (cost=0.00..5922587.45 rows=293217725
width=121)
(9 rows)


Thanks,
--
Joshua Rubin


On Tue, Jul 13, 2010 at 5:08 PM, Julian Mehnle <jul...@mehnle.net> wrote:

> Joshua Rubin wrote:
>
> > I have two tables each with nearly 300M rows. There is a 1:1
> > relationship between the two tables and they are almost always joined
> > together in queries. The first table has many columns, the second has
> > a foreign key to the primary key of the first table and one more
> > column. It is expected that for every row in table1, there is a
> > corresponding row in table2. We would like to just add the one column
> > to the first table and drop the second table to allow us to index this
> > extra column.
> >
> > This query would work after adding the column to the first table:
> > UPDATE table1 SET new_column = table2.new_column FROM table2 WHERE
> > table1.row_id = table2.row_id;
> >
> > However, this will take much too long, I have not successfully
> > completed this on our staging server after running it for 3+ days.
>
> Can you get the query plan (EXPLAIN) of the update query?  My guess is the
> join cost scales superlinearly.
>
> You might be able to chop this up into smaller UPDATEs by limiting the
> rows to be updated in each round by the primary key.
>
> E.g.:
>
>  UPDATE table1 SET new_column = table2.new_column FROM table2
>    WHERE
>       table1.row_id = table2.row_id and
>      table1.row_id >= 0e6 and table1.row_id < 1e6 and
>      table2.row_id >= 0e6 and table2.row_id < 1e6;
>
> for a moving row_id window.
>
> This has helped me in the past with a similar scenario (where both tables
> were partitioned by the PK, but it would presumably still work in the
> unpartitioned case).
>
> -Julian
>

Re: [GENERAL] Efficient Way to Merge Two Large Tables

Reply via email to