On Tue, Jul 30, 2019 at 1:58 AM Robert Haas <robertmh...@gmail.com> wrote:
>
> I haven't had a chance to look at Jeevan's patch at all, or yours in
> any detail, as yet, so these are just some very preliminary comments.
> It will be good, however, if we can agree on who is going to do what
> part of this as we try to drive this forward together. I'm sorry that
> I didn't communicate EDB's plans to work on this more clearly;
> duplicated effort serves nobody well.

I had a look over Anastasia's PoC patch to understand the approach she
has taken, and here are my observations:

1. The patch first creates a .blockmap file for each relation file,
containing an array of all modified block numbers. It builds this by
reading the file in chunks of 4 blocks (32kB in total) in a loop and
comparing each page's LSN with the given LSN. Later, to create the
.partial file, the relation file is opened again and read in chunks of
4 blocks in the same way; blocks found to be modified are copied into
a separate buffer, and once all 4 blocks in a chunk have been scanned,
the copied blocks are written to the .partial file. In this approach,
each file is opened and read twice, which looks more expensive to me,
whereas in my patch I do that just once. However, I read the entire
file into memory to check which blocks are modified, while in
Anastasia's design at most TAR_SEND_SIZE (32kB) is read at a time, in
a loop. I need the whole file because we want to know how heavily the
file was modified, so that we can send the entire file if it was
modified beyond the threshold (currently 90%). (A rough sketch of this
kind of LSN-based scan appears at the end of this mail.)

2. Also, while sending the modified blocks, they are copied into
another buffer; instead, they could be sent directly from the buffer
the file was read into (in BLCKSZ units). The .blockmap created
earlier is not used here at all. In my implementation, we send just a
.partial file with a header containing all the required details, i.e.
the number of changed blocks and their block numbers, along with a
CRC, followed by the blocks themselves.

3. I tried compiling Anastasia's patch, but I am getting an error, so
I could not see or test how it behaves. Also, like the normal backup
path, the incremental backup path needs to verify checksums if that
was requested.

4. While combining the full and incremental backups, files from the
incremental backup are simply copied into the full backup directory.
In the design I posted earlier, we do it the other way round, to avoid
overwriting and the other issues I explained before. I am almost done
writing the patch for pg_combinebackup and will post it soon.

> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

Thanks

--
Jeevan Chalke
Technical Architect, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company
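PS: To make the LSN check in point 1 concrete, here is a minimal
standalone sketch of that kind of scan, assuming the default BLCKSZ of
8192 and the standard PostgreSQL page layout, where the page LSN is
stored in the first 8 bytes of each page as two 32-bit values. The
function names (page_lsn, scan_file_for_changes) are invented for
illustration; this is not code from either patch.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLCKSZ          8192    /* default PostgreSQL block size (assumed) */
#define BLOCKS_PER_READ 4       /* 4 blocks = 32kB per read, as in the PoC */

/*
 * Extract the page LSN from a raw page image.  On disk, the LSN is two
 * 32-bit values: xlogid (high half) followed by xrecoff (low half).
 */
static uint64_t
page_lsn(const unsigned char *page)
{
    uint32_t    xlogid;
    uint32_t    xrecoff;

    memcpy(&xlogid, page, sizeof(xlogid));
    memcpy(&xrecoff, page + sizeof(xlogid), sizeof(xrecoff));
    return ((uint64_t) xlogid << 32) | xrecoff;
}

/*
 * Read "path" 32kB at a time and report blocks whose LSN is newer than
 * start_lsn, i.e. blocks modified since the LSN of the previous backup.
 */
static void
scan_file_for_changes(const char *path, uint64_t start_lsn)
{
    unsigned char buf[BLOCKS_PER_READ * BLCKSZ];
    FILE       *fp = fopen(path, "rb");
    size_t      nread;
    unsigned    blkno = 0;

    if (fp == NULL)
    {
        perror(path);
        return;
    }

    while ((nread = fread(buf, BLCKSZ, BLOCKS_PER_READ, fp)) > 0)
    {
        for (size_t i = 0; i < nread; i++, blkno++)
        {
            if (page_lsn(buf + i * BLCKSZ) > start_lsn)
                printf("block %u modified\n", blkno);
        }
    }
    fclose(fp);
}

int
main(int argc, char **argv)
{
    if (argc != 3)
    {
        fprintf(stderr, "usage: %s relation-file start-lsn\n", argv[0]);
        return 1;
    }
    scan_file_for_changes(argv[1], strtoull(argv[2], NULL, 0));
    return 0;
}

Reading a fixed 32kB chunk at a time, as above, keeps memory use
bounded; reading the whole file up front, as my patch does, instead
makes it trivial to compute the fraction of modified blocks for the
90% whole-file threshold.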