On 2005-02-25, "Dave Held" <[EMAIL PROTECTED]> wrote:
> A possibility that I would like to pursue is to keep the staging data
> from the previous day, do a COPY TO, import the new data into
> another staging table with a COPY FROM, then export the fresh
> data with another COPY TO.  Then, I can write a fast C/C++
> program to do a line-by-line comparison of each record, isolating
> the ones that have changed from the previous day.  I can then
> emit those records in a change file that should be relatively small
> and easy to update.

I have an application that does something like this, but rather than use an
external program, I do the comparison in the database itself:

  - import data from external system into a temporary table
  - compare the temporary table against the live data (a full outer join
    is a convenient way of doing this - I create an index on the temp table
    first)
  - perform insert/update/delete for each record that was added, changed
    or removed

In my case the compare/update is in a pl/pgsql function. My data is only
2-3 million rows, a bit smaller than yours, but I have to update hourly,
not daily, and spend no more than 5-10 minutes on each update (currently
I can do it in 5: 2 to load the data, 3 to do the compare/update).

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services

---------------------------(end of broadcast)---------------------------
TIP 9: the planner will ignore your desire to choose an index scan if your
      joining column's datatypes do not match

Reply via email to