Hi!

> On 9 Apr 2019, at 20:48, Robert Haas <robertmh...@gmail.com> wrote:
>
> Thoughts?

Thanks for this long and thoughtful post!
At Yandex, we have been using incremental backups for some years now. Initially we used a patched pgbarman, then we implemented this functionality in WAL-G. There is still a lot to be done. We have more than 1 PB of clusters backed up with this technology. Most of the time we use it as part of the HA setup in our managed PostgreSQL service. So our main goals are to operate backups cheaply and to restore a new node quickly.

Here's what I see from our perspective.

1. Yes, this feature is important.
2. The importance does not come from reduced disk storage: magnetic disks and object storage are very cheap.
3. Incremental backups save a lot of network bandwidth. It is non-trivial for a storage system to ingest hundreds of TB daily.
4. Incremental backups are redundant with WAL; their value is that they can be applied in parallel. An incremental backup applied sequentially is not very useful: in many cases it will not be much faster than simple WAL replay.
5. Since increments duplicate what WAL already provides, it is not worth pursuing tradeoffs to reduce storage utilization. We scan WAL during archiving, extract the numbers of changed blocks and store a changemap for a group of WAL files in the archive.
6. These changemaps can be used for the increment of the visibility map (if I recall correctly). But you cannot compare LSNs on a visibility map page: some operations do not bump them.
7. We use changemaps during backups and during WAL replay: we know which blocks will change far in advance and prefetch them into the page cache, like pg_prefaulter does.
8. There is similar functionality in RMAN for one well-known database. They used to store 8 sets of change maps. That database also has a cool feature, "increment for catchup".
9. We call an incremental backup a "delta backup". This wording describes the purpose more precisely: it is not the "next version of the DB", it is the "difference between two DB states". But the choice of wording does not matter much.

Here are slides from my talk at PgConf.APAC[0].
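
To make points 5 and 7 a bit more concrete, here is a rough sketch of the changemap idea. This is not WAL-G code: `BlockRef`, `build_changemap`, `merge_changemaps` and `blocks_to_read` are hypothetical names, and the input is assumed to be an already-decoded list of (relation file, block number) pairs rather than real WAL records.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple, Set, Tuple


class BlockRef(NamedTuple):
    """Hypothetical, simplified reference to a block touched by a WAL record."""
    relfile: str   # relation file path (illustrative only)
    block_no: int  # block number within that file


def build_changemap(refs: Iterable[BlockRef]) -> Dict[str, Set[int]]:
    """Collect the set of changed block numbers per relation file."""
    changemap: Dict[str, Set[int]] = defaultdict(set)
    for ref in refs:
        changemap[ref.relfile].add(ref.block_no)
    return dict(changemap)


def merge_changemaps(maps: Iterable[Dict[str, Set[int]]]) -> Dict[str, Set[int]]:
    """Union the changemaps of several WAL groups into one."""
    merged: Dict[str, Set[int]] = defaultdict(set)
    for changemap in maps:
        for relfile, blocks in changemap.items():
            merged[relfile] |= blocks
    return dict(merged)


def blocks_to_read(changemap: Dict[str, Set[int]]) -> List[Tuple[str, int]]:
    """Blocks a delta backup or a prefetcher would read, sorted by file and block."""
    return [(relfile, block)
            for relfile in sorted(changemap)
            for block in sorted(changemap[relfile])]


# Two groups of WAL, each summarized by its own changemap (made-up data).
group_a = build_changemap([
    BlockRef("base/1/16384", 3),
    BlockRef("base/1/16384", 3),   # the same block changed twice: stored once
    BlockRef("base/1/16390", 0),
])
group_b = build_changemap([BlockRef("base/1/16384", 7)])

delta = blocks_to_read(merge_changemaps([group_a, group_b]))
```

The same merged list can drive either the delta backup (copy only these blocks) or prefetching ahead of replay; how the changemap is actually encoded and stored in the archive is left out here.
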
I've proposed a talk on this matter to PgCon, but it was not accepted. I will try next year :)

> On 9 Apr 2019, at 20:48, Robert Haas <robertmh...@gmail.com> wrote:
>
> - This is just a design proposal at this point; there is no code. If
> this proposal, or some modified version of it, seems likely to be
> acceptable, I and/or my colleagues might try to implement it.

I'll be happy to help with code, discussion and patch review.

Best regards, Andrey Borodin.

[0] https://yadi.sk/i/Y_S1iqNN5WxS6A