On Wed, Apr 10, 2019 at 10:57 AM Jehan-Guillaume de Rorthais <j...@dalibo.com> wrote: > My idea would be create a new tool working on archived WAL. No burden > server side. Basic concept is: > > * parse archives > * record latest relevant FPW for the incr backup > * write new WALs with recorded FPW and removing/rewriting duplicated > walrecords. > > It's just a PoC and I hadn't finished the WAL writing part...not even talking > about the replay part. I'm not even sure this project is a good idea, but it > is > a good educational exercice to me in the meantime. > > Anyway, using real life OLTP production archives, my stats were: > > # WAL xlogrec kept Size WAL kept > 127 39% 50% > 383 22% 38% > 639 20% 29% > > Based on this stats, I expect this would save a lot of time during recovery in > a first step. If it get mature, it might even save a lot of archives space or > extend the retention period with degraded granularity. It would even help > taking full backups with a lower frequency. > > Any thoughts about this design would be much appreciated. I suppose this > should > be offlist or in a new thread to avoid polluting this thread as this is a > slightly different subject.
Interesting idea, but I don't see how it can work if you only deal with the FPWs and not the other records. For instance, suppose that you take a full backup at time T0, and then at time T1 there are two modifications to a certain block in quick succession. That block is then never touched again. Since no checkpoint intervenes between the modifications, the first one emits an FPI and the second does not. Capturing the FPI is fine as far as it goes, but unless you also do something with the non-FPI change, you lose that second modification. You could fix that by having your tool replicate the effects of WAL apply outside the server, but that sounds like a ton of work and a ton of possible bugs. I have a related idea, though. Suppose that, as Peter says upthread, you have a replication slot that prevents old WAL from being removed. You also have a background worker that is connected to that slot. It decodes WAL and produces summary files containing all block-references extracted from those WAL records and the associated LSN (or maybe some approximation of the LSN instead of the exact value, to allow for compression and combining of nearby references). Then you hold onto those summary files after the actual WAL is removed. Now, when somebody asks the server for all blocks changed since a certain LSN, it can use those summary files to figure out which blocks to send without having to read all the pages in the database. Although I believe that a simple system that finds modified blocks by reading them all is good enough for a first version of this feature and useful in its own right, a more efficient system will be a lot more useful, and something like this seems to me to be probably the best way to implement it. The reason why I think this is likely to be superior to other possible approaches, such as the ptrack approach Konstantin suggests elsewhere on this thread, is because it pushes the work of figuring out which blocks have been modified into the background. With a ptrack-type approach, the server has to do some non-zero amount of extra work in the foreground every time it modifies a block. With an approach based on WAL-scanning, the work is done in the background and nobody has to wait for it. It's possible that there are other considerations which aren't occurring to me right now, though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company