Hi Asif,

I was looking at the patch and tried compiling it. However, I got a few
errors and warnings.
I have fixed those in the attached patch.

On Fri, Sep 27, 2019 at 9:30 PM Asif Rehman <asifr.reh...@gmail.com> wrote:
> Hi Robert,
>
> Thanks for the feedback. Please see my comments below:
>
> On Tue, Sep 24, 2019 at 10:53 PM Robert Haas <robertmh...@gmail.com>
> wrote:
>
>> On Wed, Aug 21, 2019 at 9:53 AM Asif Rehman <asifr.reh...@gmail.com>
>> wrote:
>> > - BASE_BACKUP [PARALLEL] - returns a list of files in PGDATA
>> > If the parallel option is there, then it will only do
>> > pg_start_backup, scan PGDATA, and send a list of file names.
>>
>> So IIUC, this would mean that BASE_BACKUP without PARALLEL returns
>> tarfiles, and BASE_BACKUP with PARALLEL returns a result set with a
>> list of file names. I don't think that's a good approach. It's too
>> confusing to have one replication command that returns totally
>> different things depending on whether some option is given.
>
> Sure. I will add a separate command (START_BACKUP) for parallel.
>
>> > - SEND_FILES_CONTENTS (file1, file2, ...) - returns the files in the
>> > given list. pg_basebackup will then send back a list of filenames in
>> > this command. This command will be sent by each worker, and that
>> > worker will receive the listed files.
>>
>> Seems reasonable, but I think you should just pass one file name and
>> use the command multiple times, once per file.
>
> I considered this approach initially; however, I adopted the current
> strategy to avoid multiple round trips between the server and clients
> and to save on query-processing time by issuing a single command rather
> than multiple ones. Furthermore, fetching multiple files at once will
> also aid in supporting the tar format by utilising the existing
> ReceiveTarFile() function, and will make it possible to create one
> tarball per tablespace per worker.
>
>> > - STOP_BACKUP
>> > When all workers finish, pg_basebackup will send the STOP_BACKUP
>> > command.
>>
>> This also seems reasonable, but surely the matching command should
>> then be called START_BACKUP, not BASEBACKUP PARALLEL.
>>
>> > I have done a basic proof of concept (POC), which is also attached.
>> > I would appreciate some input on this. So far, I am simply dividing
>> > the list equally and assigning the sublists to the worker processes.
>> > I intend to fine-tune this by taking file sizes into consideration.
>> > Further, to add tar format support, I am considering that each
>> > worker process processes all files belonging to a tablespace in its
>> > list (i.e. creates and copies a tar file) before it processes the
>> > next tablespace. As a result, this will create tar files that are
>> > disjoint with respect to the tablespace data. For example:
>>
>> Instead of doing this, I suggest that you should just maintain a list
>> of all the files that need to be fetched and have each worker pull a
>> file from the head of the list and fetch it when it finishes receiving
>> the previous file. That way, if some connections go faster or slower
>> than others, the distribution of work ends up fairly even. If you
>> instead pre-distribute the work, you're guessing what's going to
>> happen in the future instead of just waiting to see what actually does
>> happen. Guessing isn't intrinsically bad, but guessing when you could
>> be sure of doing the right thing *is* bad.
>>
>> If you want to be really fancy, you could start by sorting the files
>> in descending order of size, so that big files are fetched before
>> small ones. Since the largest possible file is 1GB and any database
>> where this feature is important is probably hundreds or thousands of
>> GB, this may not be very important. I suggest not worrying about it
>> for v1.
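To make the scheduling idea above concrete, here is a minimal sketch of
such a shared work list. This is illustrative only: FileEntry, FileQueue,
queue_init(), and queue_pop() are invented names, not anything from the
patch, and since the POC forks worker processes rather than spawning
threads, a real implementation would either keep this state in shared
memory or have the leader hand out file names over the existing
connections.

#include <pthread.h>
#include <stdlib.h>

/* Invented bookkeeping for the shared work list; the patch itself
 * keeps a List of pathinfo entries instead. */
typedef struct FileEntry
{
	const char *path;
	long		size;			/* collected while building the file list */
} FileEntry;

typedef struct FileQueue
{
	FileEntry  *entries;
	int			nentries;
	int			next;			/* next entry to hand out */
	pthread_mutex_t lock;
} FileQueue;

/* Biggest files first, per the suggestion above. */
static int
cmp_size_desc(const void *a, const void *b)
{
	const FileEntry *fa = (const FileEntry *) a;
	const FileEntry *fb = (const FileEntry *) b;

	if (fa->size != fb->size)
		return (fa->size < fb->size) ? 1 : -1;
	return 0;
}

/* Called once, before any worker starts fetching. */
static void
queue_init(FileQueue *q)
{
	qsort(q->entries, q->nentries, sizeof(FileEntry), cmp_size_desc);
	q->next = 0;
	pthread_mutex_init(&q->lock, NULL);
}

/* Each worker calls this whenever it finishes receiving a file;
 * NULL means there is no work left. */
static const char *
queue_pop(FileQueue *q)
{
	const char *path = NULL;

	pthread_mutex_lock(&q->lock);
	if (q->next < q->nentries)
		path = q->entries[q->next++].path;
	pthread_mutex_unlock(&q->lock);

	return path;
}

With this shape, a slow connection simply comes back for its next file
later, so the work stays balanced without any up-front guessing.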
>
> Ideally, I would like to support the tar format as well, which would be
> much easier to implement when fetching multiple files at once, since
> that would enable the existing functionality to be used without much
> change.
>
> Your idea of sorting the files in descending order of size seems very
> appealing. I think we can do this and have the files divided among the
> workers one by one, i.e. the first file in the list goes to worker 1,
> the second to worker 2, and so on.
>
>> > Say tablespace t1 has 20 files and we have 5 worker processes, and
>> > tablespace t2 has 10. Ignoring all other factors for the sake of
>> > this example, each worker process will get a group of 4 files of t1
>> > and 2 files of t2. Each process will create 2 tar files, one for t1
>> > containing 4 files and another for t2 containing 2 files.
>>
>> This is one of several possible approaches. If we're doing a
>> plain-format backup in parallel, we can just write each file where it
>> needs to go and call it good. But, with a tar-format backup, what
>> should we do? I can see three options:
>>
>> 1. Error! Tar format parallel backups are not supported.
>>
>> 2. Write multiple tar files. The user might reasonably expect that
>> they're going to end up with the same files at the end of the backup
>> regardless of whether they do it in parallel. A user with this
>> expectation will be disappointed.
>>
>> 3. Write one tar file. In this design, the workers have to take turns
>> writing to the tar file, so you need some synchronization around that.
>> Perhaps you'd have N threads that each read and buffer a file, and N+1
>> buffers. Then you have one additional thread that reads the complete
>> files from the buffers and writes them to the tar file. There's
>> obviously some possibility that the writer won't be able to keep up
>> and writing the backup will therefore be slower than it would be with
>> approach (2).
>>
>> There's probably also a possibility that approach (2) would thrash the
>> disk head back and forth between multiple files that are all being
>> written at the same time, and approach (3) will therefore win by not
>> thrashing the disk head. But, since spinning media are becoming less
>> and less popular and are likely to have multiple disk heads under the
>> hood when they are used, this is probably not too likely.
>>
>> I think your choice to go with approach (2) is probably reasonable,
>> but I'm not sure whether everyone will agree.
>
> Yes, for the tar format support, approach (2) is what I had in mind.
> Currently I'm working on the implementation and will share the patch in
> a couple of days.
>
> --
> Asif Rehman
> Highgo Software (Canada/China/Pakistan)
> URL : www.highgo.ca
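By the way, for anyone skimming the patch below: only fragments of
getFiles() appear in the hunks, but each worker's command is assembled
with a PQExpBuffer, roughly along these lines. This is a reconstruction
for illustration only; build_fetch_command() is an invented name, and the
exact SEND_FILES_CONTENTS quoting rules are an assumption here.

#include "pqexpbuffer.h"
#include "fe_utils/simple_list.h"

static PQExpBuffer
build_fetch_command(const SimpleStringList *files)
{
	PQExpBuffer buf = createPQExpBuffer();
	SimpleStringListCell *cell;

	appendPQExpBufferStr(buf, "SEND_FILES_CONTENTS (");
	for (cell = files->head; cell != NULL; cell = cell->next)
	{
		if (cell != files->head)
			appendPQExpBufferStr(buf, ", ");
		appendPQExpBuffer(buf, "'%s'", cell->val);
	}
	appendPQExpBufferStr(buf, " )");

	/* caller sends buf->data and frees it with destroyPQExpBuffer() */
	return buf;
}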
--
Jeevan Chalke
Associate Database Architect & Team Lead, Product Development
EnterpriseDB Corporation
The Enterprise PostgreSQL Company

commit 0d7433e44123b486b48f2071b24f1eaef46f4849
Author: Jeevan Chalke <jeevan.cha...@enterprisedb.com>
Date:   Thu Oct 3 12:58:55 2019 +0530

    Fix errors and warnings.

diff --git a/src/backend/replication/basebackup.c b/src/backend/replication/basebackup.c
index ef55bd0..32ed160 100644
--- a/src/backend/replication/basebackup.c
+++ b/src/backend/replication/basebackup.c
@@ -346,6 +346,7 @@ perform_base_backup(basebackup_options *opt)
 	{
 		/* save backup label into temp file for now. So stop backup can send it to pg_basebackup later on. */
 		FILE *fp = AllocateFile(TMP_BACKUP_LABEL_FILE, "w");
+
 		if (!fp)
 			ereport(ERROR,
 					(errcode_for_file_access(),
@@ -1627,8 +1628,8 @@ StopBackup(basebackup_options *opt)
 	XLogRecPtr	endptr;
 	char	   *labelfile;
 	struct stat statbuf;
-	int			r;
 	StringInfoData buf;
+	FILE	   *lfp;
 
 	/* Disable throttling. */
 	throttling_counter = -1;
@@ -1648,7 +1649,7 @@ StopBackup(basebackup_options *opt)
 	sendFile(TMP_BACKUP_LABEL_FILE, BACKUP_LABEL_FILE, &statbuf, false, InvalidOid);
 
 	/* read backup_label file into buffer, we need it for do_pg_stop_backup */
-	FILE *lfp = AllocateFile(TMP_BACKUP_LABEL_FILE, "r");
+	lfp = AllocateFile(TMP_BACKUP_LABEL_FILE, "r");
 	if (!lfp)
 	{
 		ereport(ERROR,
@@ -1658,7 +1659,7 @@ StopBackup(basebackup_options *opt)
 	}
 
 	labelfile = palloc(statbuf.st_size + 1);
-	r = fread(labelfile, statbuf.st_size, 1, lfp);
+	fread(labelfile, statbuf.st_size, 1, lfp);
 
 	labelfile[statbuf.st_size] = '\0';
 
@@ -1692,6 +1693,7 @@ SendBackupFileList(List *tablespaces)
 {
 	StringInfoData buf;
 	ListCell   *lc;
+	pathinfo   *pi;
 	List	   *files = NIL;
 
 	foreach(lc, tablespaces)
@@ -1704,7 +1706,6 @@ SendBackupFileList(List *tablespaces)
 	}
 
 	// add backup label file
-	pathinfo *pi;
 	MAKE_PATHINFO(false, TMP_BACKUP_LABEL_FILE);
 	files = lcons(pi, files);
 
@@ -1736,15 +1737,15 @@ SendBackupFileList(List *tablespaces)
 	{
 		pathinfo   *pi = (pathinfo *) lfirst(lc);
 		char	   *path = pi->path;
+		Size		len;
 
 		/* Send one datarow message */
 		pq_beginmessage(&buf, 'D');
 		pq_sendint16(&buf, 2);	/* number of columns */
 
-		int32 isdir = pi->isdir ? 1 : 0;
-		send_int8_string(&buf, isdir);
+		send_int8_string(&buf, (pi->isdir ? 1 : 0));
 
-		Size len = strlen(path);
+		len = strlen(path);
 		pq_sendint32(&buf, len);
 		pq_sendbytes(&buf, path, len);
 
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 1637735..a316cc6 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1864,6 +1864,7 @@ BaseBackup(void)
 {
 	SimpleStringList files = {NULL, NULL};
 	SimpleStringList **worker_files;
+	int			num_files;
 
 	/*
 	 * Get the header
@@ -1881,7 +1882,7 @@ BaseBackup(void)
 		exit(1);
 	}
 
-	int num_files = 0;
+	num_files = 0;
 	for (i = 0; i < PQntuples(res); i++)
 	{
 		bool		isdir = atoi(PQgetvalue(res, i, 0));
@@ -2537,6 +2538,7 @@ getFiles(SimpleStringList *files)
 	SimpleStringListCell *cell;
 	PGresult   *res = NULL;
 	int			i = 0;
+	PGconn	   *worker_conn;
 	PQExpBuffer buf;
 
 	buf = createPQExpBuffer();
@@ -2561,7 +2563,7 @@ getFiles(SimpleStringList *files)
 	}
 	appendPQExpBufferStr(buf, " )");
 
-	PGconn *worker_conn = GetConnection();
+	worker_conn = GetConnection();
 	if (!worker_conn)
 		return 1;
 
@@ -2623,9 +2625,10 @@ divideFilesList(SimpleStringList *files, int numFiles)
 static void
 create_workers_and_fetch(SimpleStringList **worker_files)
 {
+	int			i;
+
 	worker_process = (int*) palloc(sizeof(int) * numWorkers);
-	int			status;
-	int			pid, i;
+
 	for (i = 0; i < numWorkers; i++)
 	{
 		worker_process[i] = fork();
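The last hunk above ends mid-function, so for context, one plausible
shape for the rest of create_workers_and_fetch() is sketched below. This
is illustrative only and not the patch's actual code: getFiles(),
worker_files, worker_process, and numWorkers come from the patch, while
the waitpid() bookkeeping is an assumption (it needs <sys/wait.h>, plus
<stdio.h> and <stdlib.h> for the error paths, and uses pg_basebackup's
usual progname global).

	/* Illustrative continuation, not patch text: fork one worker per
	 * sublist and let each fetch its files over its own connection. */
	for (i = 0; i < numWorkers; i++)
	{
		worker_process[i] = fork();

		if (worker_process[i] == 0)
		{
			/* child: fetch this worker's share of files, then exit;
			 * getFiles() returns nonzero on failure */
			_exit(getFiles(worker_files[i]));
		}
		else if (worker_process[i] < 0)
		{
			fprintf(stderr, "%s: could not fork worker process\n", progname);
			exit(1);
		}
	}

	/* parent: wait for every worker and propagate failures */
	for (i = 0; i < numWorkers; i++)
	{
		int			status;

		if (waitpid(worker_process[i], &status, 0) < 0 ||
			!WIFEXITED(status) || WEXITSTATUS(status) != 0)
		{
			fprintf(stderr, "%s: worker %d failed\n", progname, i);
			exit(1);
		}
	}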