I had a quick look at your 0001-refactor patch. Here are some comments:
1. The patch fails to compile. Sorry if I am missing something, but I am
not able to understand why, in the new function collectTablespaces(), you
have added an extra NULL parameter while calling sendTablespace(); it
fails the compilation:

+ ti->size = infotbssize ? sendTablespace(fullpath, true, NULL) : -1;

gcc -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement
-Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security
-fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument -g -g -O0
-Wall -Werror -I../../../../src/include -c -o xlog.o xlog.c -MMD -MP -MF
.deps/xlog.Po
xlog.c:12253:59: error: too many arguments to function call, expected 2, have 3
        ti->size = infotbssize ? sendTablespace(fullpath, true, NULL) : -1;
                                 ~~~~~~~~~~~~~~                 ^~~~

2. I think the patch needs to be run through pgindent. It does not follow
the 80-column width. e.g.:

+void
+collectTablespaces(List **tablespaces, StringInfo tblspcmapfile, bool infotbssize, bool needtblspcmapfile)
+{

3. Some comments in the refactored code appear to be redundant. For
example, the comment:

/* Setup and activate network throttling, if client requested it */

appears three times: before the call to setup_throttle(), in the prologue
of the function setup_throttle(), and above the if() inside that function.
Similarly, the comment /* Collect information about all tablespaces */ in
collectTablespaces().

4. In the function include_wal_files(), why is the TimeLineID parameter
(endtli) needed? I don't see it being used in the function at all. I think
you can safely get rid of it.

+include_wal_files(XLogRecPtr endptr, TimeLineID endtli)

Regards,
Jeevan Ladhe

On Wed, Oct 16, 2019 at 6:49 PM Asif Rehman <asifr.reh...@gmail.com> wrote:
>
> On Mon, Oct 7, 2019 at 6:35 PM Asif Rehman <asifr.reh...@gmail.com> wrote:
>>
>> On Mon, Oct 7, 2019 at 6:05 PM Robert Haas <robertmh...@gmail.com> wrote:
>>>
>>> On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman <asifr.reh...@gmail.com>
>>> wrote:
>>> > Sure.
>>> > Though the backup manifest patch calculates and includes the
>>> > checksum of backup files, and does so while the file is being
>>> > transferred to the frontend, the manifest file itself is copied at
>>> > the very end of the backup. In parallel backup, I need the list of
>>> > filenames before file contents are transferred, in order to divide
>>> > them into multiple workers. For that, the manifest file has to be
>>> > available when START_BACKUP is called.
>>> >
>>> > That means the backup manifest should support its creation while
>>> > excluding the checksum during START_BACKUP(). I also need the
>>> > directory information, for two reasons:
>>> >
>>> > - In plain format, the base path has to exist before we can write
>>> >   the file. We can extract the base path from the file, but doing
>>> >   that for all files does not seem a good idea.
>>> > - A base backup does not include the content of some directories,
>>> >   but those directories, although empty, are still expected in
>>> >   PGDATA.
>>> >
>>> > I can make these changes part of parallel backup (which would be on
>>> > top of the backup manifest patch), or these changes can be done as
>>> > part of the manifest patch and then parallel backup can use them.
>>> >
>>> > Robert, what do you suggest?
>>>
>>> I think we should probably not use backup manifests here, actually. I
>>> initially thought that would be a good idea, but after further thought
>>> it seems like it just complicates the code to no real benefit.
>>
>> Okay.
>>
>>> I suggest that the START_BACKUP command just return a result set, like
>>> a query, with perhaps four columns: file name, file type ('d' for
>>> directory or 'f' for file), file size, file mtime. pg_basebackup will
>>> ignore the mtime, but some other tools might find that useful
>>> information.
>>
>> Yes, the current patch already returns the result set. I will add the
>> additional information.
>>
>>> I wonder if we should also split START_BACKUP (which should enter
>>> non-exclusive backup mode) from GET_FILE_LIST, in case some other
>>> client program wants to use one of those but not the other. I think
>>> that's probably a good idea, but not sure.
>>
>> Currently pg_basebackup does not enter exclusive backup mode, and
>> other tools have to use the pg_start_backup() and pg_stop_backup()
>> functions to achieve that. Since we are breaking the backup into
>> multiple commands, I believe it would be a good idea to have this
>> option. I will include it in the next revision of this patch.
>>
>>> I still think that the files should be requested one at a time, not a
>>> huge long list in a single command.
>>
>> Sure, will make the change.
>
> I have refactored the functionality into multiple smaller patches in
> order to make the review process easier. I have divided the code into
> backend changes and pg_basebackup changes. The backend replication
> system now supports the following commands:
>
> - START_BACKUP
> - SEND_FILE_LIST
> - SEND_FILES_CONTENT
> - STOP_BACKUP
>
> START_BACKUP will no longer return the list of files; instead,
> SEND_FILE_LIST is used for that. START_BACKUP now calls pg_start_backup
> and returns the starting WAL position, the tablespace header
> information, and the content of the backup label file. Initially I was
> using temp files to store the backup_label content, but that turned out
> to be a bad idea, because there can be multiple non-exclusive backups
> running. The backup label information is needed by stop_backup, so
> pg_basebackup will send it as part of STOP_BACKUP.
>
> SEND_FILE_LIST will return the list of files, as a result set having
> four columns (filename, type, size, mtime). SEND_FILES_CONTENT can
> return a single file or multiple files, as required. There is not much
> change required to support both, so I believe it will be more usable
> this way if other tools want to utilise it.
>
> As per the suggestion from Robert, I am currently working on making
> changes in pg_basebackup to fetch files one by one. However, that is
> not complete yet, and the attached patch still uses the old method of
> multi-file fetching to test the backend commands. I will send an
> updated patch containing the changes to fetch files one by one.
>
> I wanted to share the backend patch to get some feedback in the
> meantime.
>
> Thanks,
>
> --
> Asif Rehman
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca