On Sat, May 9, 2020 at 2:23 AM Robert Haas <robertmh...@gmail.com> wrote: > > Hi, > > I'd like to propose a fairly major refactoring of the server's > basebackup.c. The current code isn't horrific or anything, but the > base backup mechanism has grown quite a few features over the years > and all of the code knows about all of the features. This is going to > make it progressively more difficult to add additional features, and I > have a few in mind that I'd like to add, as discussed below and also > on several other recent threads.[1][2] The attached patch set shows > what I have in mind. It needs more work, but I believe that there's > enough here for someone to review the overall direction, and even some > of the specifics, and hopefully give me some useful feedback. > > This patch set is built around the idea of creating two new > abstractions, a base backup sink -- or bbsink -- and a base backup > archiver -- or bbarchiver. Each of these works like a foreign data > wrapper or custom scan or TupleTableSlot. That is, there's a table of > function pointers that act like method callbacks. Every implementation > can allocate a struct of sufficient size for its own bookkeeping data, > and the first member of the struct is always the same, and basically > holds the data that all implementations must store, including a > pointer to the table of function pointers. If we were using C++, > bbarchiver and bbsink would be abstract base classes. > > They represent closely-related concepts, so much so that I initially > thought we could get by with just one new abstraction layer. I found > on experimentation that this did not work well, so I split it up into > two and that worked a lot better. The distinction is this: a bbsink is > something to which you can send a bunch of archives -- currently, each > would be a tarfile -- and also a backup manifest. A bbarchiver is > something to which you send every file in the data directory > individually, or at least the ones that are getting backed up, plus > any that are being injected into the backup (e.g. the backup_label). > Commonly, a bbsink will do something with the data and then forward it > to a subsequent bbsink, or a bbarchiver will do something with the > data and then forward it to a subsequent bbarchiver or bbsink. For > example, there's a bbarchiver_tar object which, like any bbarchiver, > sees all the files and their contents as input. The output is a > tarfile, which gets send to a bbsink. As things stand in the patch set > now, the tar archives are ultimately sent to the "libpq" bbsink, which > sends them to the client. > > In the future, we could have other bbarchivers. For example, we could > add "pax", "zip", or "cpio" bbarchiver which produces archives of that > format, and any given backup could choose which one to use. Or, we > could have a bbarchiver that runs each individual file through a > compression algorithm and then forwards the resulting data to a > subsequent bbarchiver. That would make it easy to produce a tarfile of > individually compressed files, which is one possible way of creating a > seekable achive.[3] Likewise, we could have other bbsinks. For > example, we could have a "localdisk" bbsink that cause the server to > write the backup somewhere in the local filesystem instead of > streaming it out over libpq. Or, we could have an "s3" bbsink that > writes the archives to S3. We could also have bbsinks that compresses > the input archives using some compressor (e.g. lz4, zstd, bzip2, ...) > and forward the resulting compressed archives to the next bbsink in > the chain. I'm not trying to pass judgement on whether any of these > particular things are things we want to do, nor am I saying that this > patch set solves all the problems with doing them. However, I believe > it will make such things a whole lot easier to implement, because all > of the knowledge about whatever new functionality is being added is > centralized in one place, rather than being spread across the entirety > of basebackup.c. As an example of this, look at how 0010 changes > basebackup.c and basebackup_tar.c: afterwards, basebackup.c no longer > knows anything that is tar-specific, whereas right now it knows about > tar-specific things in many places. > > Here's an overview of this patch set: > > 0001-0003 are cleanup patches that I have posted for review on > separate threads.[4][5] They are included here to make it easy to > apply this whole series if someone wishes to do so. > > 0004 is a minor refactoring that reduces by 1 the number of functions > in basebackup.c that know about the specifics of tarfiles. It is just > a preparatory patch and probably not very interesting. > > 0005 invents the bbsink abstraction. > > 0006 creates basebackup_libpq.c and moves all code that knows about > the details of sending archives via libpq there. The functionality is > exposed for use by basebackup.c as a new type of bbsink, bbsink_libpq. > > 0007 creates basebackup_throttle.c and moves all code that knows about > throttling backups there. The functionality is exposed for use by > basebackup.c as a new type of bbsink, bbsink_throttle. This means that > the throttling logic could be reused to throttle output to any final > destination. Essentially, this is a bbsink that just passes everything > it gets through to the next bbsink, but with a rate limit. If > throttling's not enabled, no bbsink_throttle object is created, so all > of the throttling code is completely out of the execution pipeline. > > 0008 creates basebackup_progress.c and moves all code that knows about > progress reporting there. The functionality is exposed for use by > basebackup.c as a new type of bbsink, bbsink_progress. Since the > abstraction doesn't fit perfectly in this case, some extra functions > are added to work around the problem. This is not entirely elegant, > but I don't think it's still an improvement over what we have now, and > I don't have a better idea. > > 0009 invents the bbarchiver abstraction. > > 0010 invents two new bbarchivers, a tar bbarchiver and a tarsize > bbarchiver, and refactors basebackup.c to make use of them. The tar > bbarchiver puts the files it sees into tar archives and forwards the > resulting archives to a bbsink. The tarsize bbarchiver is used to > support the PROGRESS option to the BASE_BACKUP command. It just > estimates the size of the backup by summing up the file sizes without > reading them. This approach is good for a couple of reasons. First, > without something like this, it's impossible to keep basebackup.c from > knowing something about the tar format, because the PROGRESS option > doesn't just figure out how big the files to be backed up are: it > figures out how big it thinks the archives will be, and that involves > tar-specific considerations. This area needs more work, as the whole > idea of measuring progress by estimating the archive size is going to > break down as soon as server-side compression is in the picture. > Second, this makes the code path that we use for figuring out the > backup size details much more similar to the path we use for > performing the actual backup. For instance, with this patch, we > include the exact same files in the calculation that we will include > in the backup, and in the same order, something that's not true today. > The basebackup_tar.c file added by this patch is sadly lacking in > comments, which I will add in a future version of the patch set. I > think, though, that it will not be too unclear what's going on here. > > 0011 invents another new kind of bbarchiver. This bbarchiver just > eavesdrops on the stream of files to facilitate backup manifest > construction, and then forwards everything through to a subsequent > bbarchiver. Like bbsink_throttle, it can be entirely omitted if not > used. This patch is a bit clunky at the moment and needs some polish, > but it is another demonstration of how these abstractions can be used > to simplify basebackup.c, so that basebackup.c only has to worry about > determining what should be backed up and not have to worry much about > all the specific things that need to be done as part of that. > > Although this patch set adds quite a bit of code on net, it makes > basebackup.c considerably smaller and simpler, removing more than 400 > lines of code from that file, about 20% of the current total. There > are some gratifying changes vs. the status quo. For example, in > master, we have this: > > sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces, > bool sendtblspclinks, backup_manifest_info *manifest, > const char *spcoid) > > Notably, the sizeonly flag makes the function not do what the name of > the function suggests that it does. Also, we've got to pass some extra > fields through to enable specific features. With the patch set, the > equivalent function looks like this: > > archive_directory(bbarchiver *archiver, const char *path, int basepathlen, > List *tablespaces, bool sendtblspclinks) > > The question "what should I do with the directories and files we find > as we recurse?" is now answered by the choice of which bbarchiver to > pass to the function, rather than by the values of sizeonly, manifest, > and spcoid. That's not night and day, but I think it's better, > especially as you imagine adding more features in the future. The > really important part, for me, is that you can make the bbarchiver do > anything you like without needing to make any more changes to this > function. It just arranges to invoke your callbacks. You take it from > there. > > One pretty major question that this patch set doesn't address is what > the user interface for any of the hypothetical features mentioned > above ought to look like, or how basebackup.c ought to support them. > The syntax for the BASE_BACKUP command, like the contents of > basebackup.c, has grown organically, and doesn't seem to be very > scalable. Also, the wire protocol - a series of CopyData results which > the client is entirely responsible for knowing how to interpret and > about which the server provides only minimal information - doesn't > much lend itself to extensibility. Some careful design work is likely > needed in both areas, and this patch does not try to do any of it. I > am quite interested in discussing those questions, but I felt that > they weren't the most important problems to solve first. > > What do you all think?
The overall idea looks quite nice. I had a look at some of the patches at least 0005 and 0006. At first look, I have one comment. +/* + * Each archive is set as a separate stream of COPY data, and thus begins + * with a CopyOutResponse message. + */ +static void +bbsink_libpq_begin_archive(bbsink *sink, const char *archive_name) +{ + SendCopyOutResponse(); +} Some of the bbsink_libpq_* functions are directly calling pq_* e.g. bbsink_libpq_begin_backup whereas others are calling SendCopy* functions and therein those are calling pq_* functions. I think bbsink_libpq_* function can directly call pq_* functions instead of adding one more level of the function call. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com