Hannu Krosing <ha...@2ndquadrant.com> wrote:
> On Tue, 2010-04-27 at 17:28 +0200, Csaba Nagy wrote:
>> One use case we would have is to dump only the changes from the
>> last backup of a single table. This table takes 30% of the DB
>> disk space, it is in the order of ~400GB, and it's only inserted,
>> never updated, then after ~1 year the old entries are archived.
>> There's ~10M new entries daily in this table. If the backup would
>> be smart enough to only read the changed blocks (in this case
>> only for newly inserted records), it would be a fairly big win...

That is covered pretty effectively in PITR-style backups with the
hard link and rsync approach cited earlier in the thread. Those 1GB
table segment files which haven't changed aren't read or written,
and only those portions of the other files which have actually
changed are sent over the wire (although the entire disk file is
written on the receiving end).

> The standard trick for this kind of table is having this table
> partitioned by insertion date

That doesn't always work. In our situation the supreme court sets
records retention rules which can be quite complex, but usually key
on *final disposition* of a case rather than insertion date; that
is, the earliest date on which the data related to a case is
*allowed* to be deleted isn't known until weeks or years after
insertion. Additionally, it is the elected clerk of court in each
county who determines when and if data for that county will be
purged once it has reached the minimum retention threshold set by
supreme court rules.

That's not to say that partitioning couldn't help with some backup
strategies; just that it doesn't solve all "insert-only" (with
eventual purge) use cases. One of the nicest things about
PostgreSQL is the availability of several easy and viable backup
strategies, so that you can tailor one to fit your environment.

-Kevin
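
For readers who haven't seen the hard link idea, here is a rough
sketch of the file-level part of it in Python. It is not rsync and not
anyone's actual backup script: the function name and directory layout
are made up for illustration, it copies a changed file whole rather
than sending only the changed portions, and it ignores the WAL
archiving that a real PITR backup also needs. It only shows why
unchanged segment files cost next to nothing in the new backup.

```python
import os
import shutil


def link_or_copy_backup(src_dir, prev_backup, new_backup):
    """Populate new_backup from src_dir (hypothetical helper).

    A file whose size and mtime (whole seconds) match the copy in
    prev_backup is hard-linked from there instead of being copied.
    This is a quick heuristic similar to rsync's default check; a
    same-size change within the same second would be missed.
    Hard links require both backups to be on one filesystem.
    Returns two lists of file names: (linked, copied).
    """
    os.makedirs(new_backup, exist_ok=True)
    linked, copied = [], []
    for name in sorted(os.listdir(src_dir)):
        src = os.path.join(src_dir, name)
        if not os.path.isfile(src):
            continue  # the sketch handles a flat directory only
        dst = os.path.join(new_backup, name)
        old = os.path.join(prev_backup, name) if prev_backup else None
        if old and os.path.isfile(old):
            s, o = os.stat(src), os.stat(old)
            if s.st_size == o.st_size and int(s.st_mtime) == int(o.st_mtime):
                os.link(old, dst)  # unchanged: share the old copy's data
                linked.append(name)
                continue
        shutil.copy2(src, dst)  # changed or new: copy, preserving mtime
        copied.append(name)
    return linked, copied
```

With an insert-only table stored as segment files such as 16384,
16384.1 and 16384.2, a second run against the previous backup would
normally link the older segments and copy only the one still being
appended to.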
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers