On Thu, Jul 11, 2019 at 5:00 PM Jeevan Chalke <jeevan.cha...@enterprisedb.com> wrote:
> Hi Anastasia,
>
> On Wed, Jul 10, 2019 at 11:47 PM Anastasia Lubennikova
> <a.lubennik...@postgrespro.ru> wrote:
>
>> 23.04.2019 14:08, Anastasia Lubennikova wrote:
>> > I'm volunteering to write a draft patch or, more likely, set of
>> > patches, which will allow us to discuss the subject in more detail.
>> > And to do that I wish we agree on the API and data format (at least
>> > broadly).
>> > Looking forward to hearing your thoughts.
>>
>> Though the previous discussion stalled,
>> I still hope that we could agree on basic points such as a map file
>> format and protocol extension,
>> which is necessary to start implementing the feature.
>>
>
> It's great that you too come up with the PoC patch. I didn't look at your
> changes in much details but we at EnterpriseDB too working on this feature
> and started implementing it.
>
> Attached series of patches I had so far... (which needed further
> optimization and adjustments though)
>
> Here is the overall design (as proposed by Robert) we are trying to
> implement:
>
> 1. Extend the BASE_BACKUP command that can be used with replication
> connections. Add a new [ LSN 'lsn' ] option.
>
> 2. Extend pg_basebackup with a new --lsn=LSN option that causes it to send
> the option added to the server in #1.
>
> Here are the implementation details when we have a valid LSN
>
> sendFile() in basebackup.c is the function which mostly does the thing for
> us. If the filename looks like a relation file, then we'll need to consider
> sending only a partial file. The way to do that is probably:
>
> A. Read the whole file into memory.
>
> B. Check the LSN of each block. Build a bitmap indicating which blocks
> have an LSN greater than or equal to the threshold LSN.
>
> C. If more than 90% of the bits in the bitmap are set, send the whole file
> just as if this were a full backup. This 90% is a constant now; we might
> make it a GUC later.
>
> D. Otherwise, send a file with .partial added to the name.
> The .partial file contains an indication of which blocks were changed at
> the beginning, followed by the data blocks. It also includes a checksum/CRC.
> Currently, a .partial file format looks like:
>  - start with a 4-byte magic number
>  - then store a 4-byte CRC covering the header
>  - then a 4-byte count of the number of blocks included in the file
>  - then the block numbers, each as a 4-byte quantity
>  - then the data blocks
>
>
> We are also working on combining these incremental back-ups with the full
> backup and for that, we are planning to add a new utility called
> pg_combinebackup. Will post the details on that later once we have on the
> same page for taking backup.
>

For combining a full backup with one or more incremental backups, we are
adding a new utility called pg_combinebackup in src/bin.

Here is the overall design as proposed by Robert.

pg_combinebackup starts from the LAST backup specified and works backward.
It must NOT start with the full backup and work forward. This is important
both for reasons of efficiency and of correctness. For example, if you start
by copying over the full backup and then later apply the incremental backups
on top of it, then you'll copy data and later end up overwriting it or
removing it. Any files left over at the end that do not appear in the final
incremental backup, even as .partial files, would need to be removed, or the
result is wrong. We should aim for a system where every block in the output
directory is written exactly once and nothing ever has to be created and
then removed.

To make that work, we should start by examining the final incremental
backup. We should proceed with one file at a time. For each file:

1. If the complete file is present in the incremental backup, then just
copy it to the output directory - and move on to the next file.

2. Otherwise, we have a .partial file. Work backward through the backup
chain until we find a complete version of the file.
That might happen when we get back to the full backup at the start of the
chain, but it might also happen sooner - at which point we do not need to
and should not look at earlier backups for that file. During this phase, we
should read only the HEADER of each .partial file, building a map of which
blocks we're ultimately going to need to read from each backup. We can also
compute the offset within each file where that block is stored at this
stage, again using the header information.

3. Now, we can write the output file - reading each block in turn from the
correct backup and writing it to the output file, using the map we
constructed in the previous step. We should probably keep all of the input
files open over steps 2 and 3 and then close them at the end because
repeatedly closing and opening them is going to be expensive. When that's
done, go on to the next file and start over at step 1.

We have already started working on this design.

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
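
To make the per-file steps 1 to 3 above concrete, here is a rough Python
sketch of the idea, not the actual patch (which is C). Several things in it
are assumptions that the description above does not pin down: the magic
value, little-endian byte order, the use of zlib's CRC-32 as a stand-in
checksum computed over the block count plus the block numbers, and all of
the function names. It also ignores files that were truncated between
backups, since the header layout described above carries no file length.

```python
import struct
import zlib

BLCKSZ = 8192        # PostgreSQL's default block size
MAGIC = 0x50415254   # hypothetical magic value; none is given in the email


def build_partial(blocks):
    """Serialize {block_number: block_bytes} using the described layout:
    magic, CRC of the header, block count, block numbers, data blocks."""
    numbers = sorted(blocks)
    body = struct.pack("<I", len(numbers))
    body += struct.pack(f"<{len(numbers)}I", *numbers)
    crc = zlib.crc32(body)  # stand-in checksum (assumption)
    return struct.pack("<II", MAGIC, crc) + body + b"".join(
        blocks[n] for n in numbers)


def read_partial_header(data):
    """Read only the header; return {block_number: offset of its data}."""
    magic, crc, count = struct.unpack_from("<III", data, 0)
    if magic != MAGIC:
        raise ValueError("bad magic number")
    data_start = 12 + 4 * count
    if zlib.crc32(data[8:data_start]) != crc:
        raise ValueError("header CRC mismatch")
    numbers = struct.unpack_from(f"<{count}I", data, 12)
    return {n: data_start + i * BLCKSZ for i, n in enumerate(numbers)}


def plan_file(chain):
    """Step 2: walk the chain newest-first and map each block number to
    (index of the backup to read it from, byte offset in that file).
    chain is a list of ("partial" | "full", file_bytes), newest first."""
    plan = {}
    for idx, (kind, data) in enumerate(chain):
        if kind == "full":
            # Complete copy found: take the remaining blocks and stop;
            # older backups are never examined for this file.
            for n in range(len(data) // BLCKSZ):
                plan.setdefault(n, (idx, n * BLCKSZ))
            return plan
        for n, offset in read_partial_header(data).items():
            plan.setdefault(n, (idx, offset))  # newest version wins
    raise ValueError("no complete copy of the file in the backup chain")


def combine_file(chain):
    """Step 3: emit every block exactly once, from the backup the plan
    chose. Step 1 falls out naturally when chain[0] is a full file."""
    plan = plan_file(chain)
    out = bytearray()
    for n in sorted(plan):
        idx, offset = plan[n]
        out += chain[idx][1][offset:offset + BLCKSZ]
    return bytes(out)
```

With a three-block full backup followed by two incrementals that changed
block 1 and then block 2, combine_file() given the chain newest-first reads
block 0 from the full backup, block 1 from the first incremental and block 2
from the second, and writes each output block once.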