On Mon, Oct 7, 2019 at 6:35 PM Asif Rehman <asifr.reh...@gmail.com> wrote:
> > > On Mon, Oct 7, 2019 at 6:05 PM Robert Haas <robertmh...@gmail.com> wrote: > >> On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman <asifr.reh...@gmail.com> >> wrote: >> > Sure. Though the backup manifest patch calculates and includes the >> checksum of backup files and is done >> > while the file is being transferred to the frontend-end. The manifest >> file itself is copied at the >> > very end of the backup. In parallel backup, I need the list of >> filenames before file contents are transferred, in >> > order to divide them into multiple workers. For that, the manifest file >> has to be available when START_BACKUP >> > is called. >> > >> > That means, backup manifest should support its creation while excluding >> the checksum during START_BACKUP(). >> > I also need the directory information as well for two reasons: >> > >> > - In plain format, base path has to exist before we can write the file. >> we can extract the base path from the file >> > but doing that for all files does not seem a good idea. >> > - base backup does not include the content of some directories but >> those directories although empty, are still >> > expected in PGDATA. >> > >> > I can make these changes part of parallel backup (which would be on top >> of backup manifest patch) or >> > these changes can be done as part of manifest patch and then parallel >> can use them. >> > >> > Robert what do you suggest? >> >> I think we should probably not use backup manifests here, actually. I >> initially thought that would be a good idea, but after further thought >> it seems like it just complicates the code to no real benefit. > > > Okay. > > >> I >> suggest that the START_BACKUP command just return a result set, like a >> query, with perhaps four columns: file name, file type ('d' for >> directory or 'f' for file), file size, file mtime. pg_basebackup will >> ignore the mtime, but some other tools might find that useful >> information. >> > yes current patch already returns the result set. will add the additional > information. > > >> I wonder if we should also split START_BACKUP (which should enter >> non-exclusive backup mode) from GET_FILE_LIST, in case some other >> client program wants to use one of those but not the other. I think >> that's probably a good idea, but not sure. >> > > Currently pg_basebackup does not enter in exclusive backup mode and other > tools have to > use pg_start_backup() and pg_stop_backup() functions to achieve that. > Since we are breaking > backup into multiple command, I believe it would be a good idea to have > this option. I will include > it in next revision of this patch. > > >> >> I still think that the files should be requested one at a time, not a >> huge long list in a single command. >> > sure, will make the change. > > > I have refactored the functionality into multiple smaller patches in order to make the review process easier. I have divided the code into backend changes and pg_basebackup changes. The backend replication system now supports the following commands: - START_BACKUP - SEND_FILE_LIST - SEND_FILES_CONTENT - STOP_BACKUP The START_BACKUP will not return the list of files, instead SEND_FILE_LIST is used for that. The START_BACKUP now calls pg_start_backup and returns starting WAL position, tablespace header information and content of backup label file. Initially I was using tmp files to store the backup_label content but that turns out to be bad idea, because there can be multiple non-exclusive backups running. The backup label information is needed by stop_backup so pg_basebackup will send it as part of STOP_BACKUP. The SEND_FILE_LIST will return the list of files. It will be returned as resultset having four columns (filename, type, size, mtime). The SEND_FILES_CONTENT can now return the single file or multiple files as required. There is not much change required to support both, so I believe it will be much useable this way if other tools want to utilise it. As per suggestion from Robert, I am currently working on making changes in pg_basebackup to fetch files one by one. However that's not complete and the attach patch is still using the old method of multi-file fetching to test the backend commands. I will send an updated patch which will contain the changes on fetching file one by one. I wanted to share the backend patch to get some feedback in the mean time. Thanks, -- Asif Rehman Highgo Software (Canada/China/Pakistan) URL : www.highgo.ca
0001-Refactor-some-basebackup-code-to-increase-reusabilit.patch
Description: Binary data
0004-parallel-backup-testcase.patch
Description: Binary data
0003-pg_basebackup-changes-for-parallel-backup.patch
Description: Binary data
0002-backend-changes-for-parallel-backup.patch
Description: Binary data