On Wed, Aug 21, 2019 at 9:53 AM Asif Rehman <asifr.reh...@gmail.com> wrote:
> - BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA
> If the parallel option is there, then it will only do pg_start_backup, scans
> PGDATA and sends a list of file names.

So IIUC, this would mean that BASE_BACKUP without PARALLEL returns
tarfiles, and BASE_BACKUP with PARALLEL returns a result set with a
list of file names. I don't think that's a good approach. It's too
confusing to have one replication command that returns totally
different things depending on whether some option is given.

> - SEND_FILES_CONTENTS (file1, file2,...) - returns the files in given list.
> pg_basebackup will then send back a list of filenames in this command. This
> command will be sent by each worker and that worker will be getting the said
> files.

Seems reasonable, but I think you should just pass one file name and
use the command multiple times, once per file.

> - STOP_BACKUP
> when all workers finish then, pg_basebackup will send STOP_BACKUP command.

This also seems reasonable, but surely the matching command should
then be called START_BACKUP, not BASE_BACKUP PARALLEL.

> I have done a basic proof of concept (POC), which is also attached. I would
> appreciate some input on this. So far, I am simply dividing the list equally
> and assigning them to worker processes. I intend to fine tune this by taking
> into consideration file sizes. Further to add tar format support, I am
> considering that each worker process, processes all files belonging to a
> tablespace in its list (i.e. creates and copies tar file), before it
> processes the next tablespace. As a result, this will create tar files that
> are disjointed with respect to tablespace data. For example:

Instead of doing this, I suggest that you should just maintain a list
of all the files that need to be fetched and have each worker pull a
file from the head of the list and fetch it when it finishes receiving
the previous file. That way, if some connections go faster or slower
than others, the distribution of work ends up fairly even. If you
instead pre-distribute the work, you're guessing what's going to
happen in the future instead of just waiting to see what actually does
happen.
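
To make the shared-list idea concrete, here is a minimal Python sketch.
It is only an illustration, not PostgreSQL code: `fetch_file`,
`run_workers`, and the file names are hypothetical, and threads stand
in for the pg_basebackup worker connections. Each worker takes the next
name from one shared queue as soon as it finishes the previous file, so
nothing is assigned in advance.

```python
import queue
import threading


def fetch_file(name):
    """Hypothetical stand-in for fetching one file over a worker connection."""
    return name


def run_workers(file_names, n_workers, fetch=fetch_file):
    """Fetch every file, letting each worker pull from one shared list.

    Returns a per-worker list of the files that worker ended up fetching.
    """
    pending = queue.Queue()
    for name in file_names:
        pending.put(name)

    fetched = [[] for _ in range(n_workers)]

    def worker(idx):
        while True:
            try:
                # Take the file at the head of the shared list.
                name = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to fetch
            fetch(name)
            fetched[idx].append(name)

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return fetched
```

With this shape, a slow worker simply ends up with fewer files; no
up-front estimate of connection speed or file size is needed.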
Guessing isn't intrinsically bad, but guessing when you could be sure
of doing the right thing *is* bad.

If you want to be really fancy, you could start by sorting the files
in descending order of size, so that big files are fetched before
small ones. Since the largest possible file is 1GB and any database
where this feature is important is probably hundreds or thousands of
GB, this may not be very important. I suggest not worrying about it
for v1.

> Say, tablespace t1 has 20 files and we have 5 worker processes and tablespace
> t2 has 10. Ignoring all other factors for the sake of this example, each
> worker process will get a group of 4 files of t1 and 2 files of t2. Each
> process will create 2 tar files, one for t1 containing 4 files and another
> for t2 containing 2 files.

This is one of several possible approaches. If we're doing a
plain-format backup in parallel, we can just write each file where it
needs to go and call it good. But, with a tar-format backup, what
should we do? I can see three options:

1. Error! Tar format parallel backups are not supported.

2. Write multiple tar files. The user might reasonably expect that
they're going to end up with the same files at the end of the backup
regardless of whether they do it in parallel. A user with this
expectation will be disappointed.

3. Write one tar file. In this design, the workers have to take turns
writing to the tar file, so you need some synchronization around that.
Perhaps you'd have N threads that read and buffer a file, and N+1
buffers. Then you have one additional thread that reads the complete
files from the buffers and writes them to the tar file. There's
obviously some possibility that the writer won't be able to keep up
and writing the backup will therefore be slower than it would be with
approach (2).
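
For option (3), the buffering scheme could look roughly like the
following Python sketch. Everything here is an assumption made for
illustration: `write_single_tar` is a hypothetical name, the `files`
dict stands in for data fetched from the server, and a semaphore with
N+1 permits models the N+1 buffers. N reader threads fill buffers, and
one writer thread is the only one that touches the tar file.

```python
import io
import queue
import tarfile
import threading


def write_single_tar(files, n_readers, out):
    """Write all of `files` (name -> bytes) into one tar stream on `out`."""
    todo = queue.Queue()
    for item in files.items():
        todo.put(item)

    buffers = threading.Semaphore(n_readers + 1)  # at most N+1 filled buffers
    done = queue.Queue()                          # complete files for the writer

    def reader():
        while True:
            buffers.acquire()            # wait for a free buffer
            try:
                name, data = todo.get_nowait()
            except queue.Empty:
                buffers.release()
                done.put(None)           # tell the writer this reader is finished
                return
            done.put((name, bytes(data)))  # "read and buffer" the whole file

    def writer():
        finished = 0
        with tarfile.open(fileobj=out, mode="w") as tar:
            while finished < n_readers:
                item = done.get()
                if item is None:
                    finished += 1
                    continue
                name, data = item
                info = tarfile.TarInfo(name=name)
                info.size = len(data)
                tar.addfile(info, io.BytesIO(data))
                buffers.release()        # buffer is free for the next file

    threads = [threading.Thread(target=reader) for _ in range(n_readers)]
    threads.append(threading.Thread(target=writer))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Because only the writer thread appends to the archive, no locking is
needed around the tar file itself; the cost is that readers stall
whenever all N+1 buffers are waiting on the writer.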

There's probably also a possibility that approach (2) would thrash the
disk head back and forth between multiple files that are all being
written at the same time, and approach (3) will therefore win by not
thrashing the disk head. But, since spinning media are becoming less
and less popular and are likely to have multiple disk heads under the
hood when they are used, this is probably not too likely.

I think your choice to go with approach (2) is probably reasonable,
but I'm not sure whether everyone will agree.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company