Adding pipe support to pg_dump and pg_restore
Hello,

I recently wanted a way to encrypt/decrypt backups while still utilizing the parallel dump/restore functionality. I couldn't see a way to do this so I experimented a bit with the directory backup format. If there's in fact already a way to do this, please tell me now :-)

The idea is to add a --pipe option to pg_dump / pg_restore where you can specify a custom shell command that is used to write / read each .dat-file. Usage examples include encryption with pgp and/or custom compression pipelines. %p in the command is expanded to the path to write to / read from. The pipe command is not applied to the toc.

The current version is attached. Could something like this be acceptable for inclusion?

From 27f6c541be6546edfef62646f514fe1a92042705 Mon Sep 17 00:00:00 2001
From: David Hedberg
Date: Sat, 29 Sep 2018 12:55:52 +0200
Subject: [PATCH] Add support for --pipe to pg_dump and pg_restore

---
 src/bin/pg_dump/compress_io.c         | 97 ---
 src/bin/pg_dump/compress_io.h         |  6 +-
 src/bin/pg_dump/pg_backup.h           |  2 +
 src/bin/pg_dump/pg_backup_directory.c | 14 ++--
 src/bin/pg_dump/pg_dump.c             | 17 -
 src/bin/pg_dump/pg_restore.c          |  7 ++
 6 files changed, 121 insertions(+), 22 deletions(-)

diff --git a/src/bin/pg_dump/compress_io.c b/src/bin/pg_dump/compress_io.c
index a96da15dc1..64c06d7eae 100644
--- a/src/bin/pg_dump/compress_io.c
+++ b/src/bin/pg_dump/compress_io.c
@@ -443,6 +443,9 @@ struct cfp
 static int hasSuffix(const char *filename, const char *suffix);
 #endif
 
+static void
+expand_shell_command(char *buf, size_t bufsize, const char *cmd, const char *filepath);
+
 /* free() without changing errno; useful in several places below */
 static void
 free_keep_errno(void *p)
@@ -464,24 +467,26 @@ free_keep_errno(void *p)
  * On failure, return NULL with an error code in errno.
  */
 cfp *
-cfopen_read(const char *path, const char *mode)
+cfopen_read(const char *path, const char *mode, const char *pipecmd)
 {
 	cfp *fp;
 
+	if (pipecmd)
+		fp = cfopen(path, mode, 0, pipecmd);
 #ifdef HAVE_LIBZ
-	if (hasSuffix(path, ".gz"))
-		fp = cfopen(path, mode, 1);
+	else if (hasSuffix(path, ".gz"))
+		fp = cfopen(path, mode, 1, NULL);
 	else
 #endif
 	{
-		fp = cfopen(path, mode, 0);
+		fp = cfopen(path, mode, 0, NULL);
 #ifdef HAVE_LIBZ
 		if (fp == NULL)
 		{
 			char *fname;
 
 			fname = psprintf("%s.gz", path);
-			fp = cfopen(fname, mode, 1);
+			fp = cfopen(fname, mode, 1, NULL);
 			free_keep_errno(fname);
 		}
 #endif
@@ -501,19 +506,19 @@ cfopen_read(const char *path, const char *mode)
  * On failure, return NULL with an error code in errno.
  */
 cfp *
-cfopen_write(const char *path, const char *mode, int compression)
+cfopen_write(const char *path, const char *mode, int compression, const char *pipecmd)
 {
 	cfp *fp;
 
 	if (compression == 0)
-		fp = cfopen(path, mode, 0);
+		fp = cfopen(path, mode, 0, pipecmd);
 	else
 	{
 #ifdef HAVE_LIBZ
 		char *fname;
 
 		fname = psprintf("%s.gz", path);
-		fp = cfopen(fname, mode, compression);
+		fp = cfopen(fname, mode, compression, pipecmd);
 		free_keep_errno(fname);
 #else
 		exit_horribly(modulename, "not built with zlib support\n");
@@ -530,11 +535,32 @@ cfopen_write(const char *path, const char *mode, int compression)
  * On failure, return NULL with an error code in errno.
  */
 cfp *
-cfopen(const char *path, const char *mode, int compression)
+cfopen(const char *path, const char *mode, int compression, const char *pipecmd)
 {
 	cfp *fp = pg_malloc(sizeof(cfp));
 
-	if (compression != 0)
+	if (pipecmd)
+	{
+		char cmd[MAXPGPATH];
+		char pmode[2];
+
+		if ( !(mode[0] == 'r' || mode[0] == 'w') ) {
+			exit_horribly(modulename, "Pipe does not support mode %s", mode);
+		}
+		pmode[0] = mode[0];
+		pmode[1] = '\0';
+
+		expand_shell_command(cmd, MAXPGPATH, pipecmd, path);
+
+		fp->compressedfp = NULL;
+		fp->uncompressedfp = popen(cmd, pmode);
+		if (fp->uncompressedfp == NULL)
+		{
+			free_keep_errno(fp);
+			fp = NULL;
+		}
+	}
+	else if (compression != 0)
 	{
 #ifdef HAVE_LIBZ
 		if (compression != Z_DEFAULT_COMPRESSION)
@@ -731,5 +757,54 @@ hasSuffix(const char *filename, const char *suffix)
 		suffix, suffixlen) == 0;
 }
 
-
 #endif
+
+/*
+ * Expand a shell command
+ *
+ * Replaces %p in cmd with the path in filepath and writes the result to buf.
+ */
+static void
+expand_shell_command(char *buf, size_t bufsize, const char *cmd, const char *filepath)
+{
+	char *dp;
+	char *endp;
+	const char *sp;
+
+	dp = buf;
+	endp = buf + bufsize - 1;
+	*endp = '\0';
+
+	for (sp = cmd; *sp; sp++)
+	{
+		if (*sp == '%')
+		{
+			switch (sp[1])
+			{
+				case 'p':
+					/* %p: absolute path of file */
+					sp++;
+					strlcpy(dp, filepath, endp - dp);
+					dp += strlen(dp);
+					break;
+				case '%&
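
Since the patch text above is cut off inside expand_shell_command, here is a minimal standalone sketch of the same %p substitution idea. It is not the code from the patch: the name expand_pipe_command, the int return value, and the handling of "%%" as a literal percent sign are all assumptions made for illustration. The proposal itself only states that %p expands to the file path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): expand "%p" in cmd to filepath
 * and "%%" to a literal '%' (the latter is an assumption), writing a
 * NUL-terminated result into buf.
 * Returns 0 on success, -1 if the result does not fit in bufsize bytes.
 */
static int
expand_pipe_command(char *buf, size_t bufsize, const char *cmd,
                    const char *filepath)
{
    size_t      used = 0;       /* bytes written so far; always < bufsize */
    size_t      pathlen = strlen(filepath);
    const char *sp;

    assert(buf != NULL && bufsize > 0);

    for (sp = cmd; *sp != '\0'; sp++)
    {
        if (sp[0] == '%' && sp[1] == 'p')
        {
            /* need room for the whole path plus the terminator */
            if (pathlen >= bufsize - used)
                return -1;
            memcpy(buf + used, filepath, pathlen);
            used += pathlen;
            sp++;               /* skip the 'p' */
        }
        else
        {
            if (sp[0] == '%' && sp[1] == '%')
                sp++;           /* collapse "%%" into a single '%' */
            if (used + 1 >= bufsize)
                return -1;
            buf[used++] = *sp;
        }
    }
    buf[used] = '\0';
    return 0;
}
```

Called with the command "gpg -e -o %p.gpg" and the path "dump/1.dat", this sketch fills the buffer with "gpg -e -o dump/1.dat.gpg". Unlike a silently truncating copy, it reports overflow to the caller, so a shortened command is never handed to the shell. The path is inserted without shell quoting, so a path containing shell metacharacters would be interpreted by the shell.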
Re: Adding pipe support to pg_dump and pg_restore
On Sat, Sep 29, 2018 at 5:56 PM, David Fetter wrote:
> On Sat, Sep 29, 2018 at 11:42:40AM -0400, Tom Lane wrote:
>> Stephen Frost writes:
>> > * David Hedberg (david.hedb...@gmail.com) wrote:
>> >> The idea is to add a --pipe option to pg_dump / pg_restore where
>> >> you can specify a custom shell command that is used to write /
>> >> read each .dat-file. Usage examples include encryption with pgp
>> >> and/or custom compression pipelines. %p in the command is
>> >> expanded to the path to write to / read from. The pipe command is
>> >> not applied to the toc.
>>
>> > I would certainly think that we'd want to have support for custom
>> > format dumps too..
>>
>> This seems like rather a kluge :-(. In the context of encrypted
>> dumps in particular, I see no really safe way to pass an encryption
>> key down to the custom command --- either you put it in the command
>> line to be exec'd, or you put it in the process environment, and
>> neither of those are secure on all platforms.
>
> As I understand it, those are the options for providing secrets in
> general. At least in the case of encryption, one good solution would
> be to use an asymmetric encryption scheme, i.e. one where encrypting
> doesn't expose a secret in any way.
>
> As to decryption, that's generally done with more caution in
> environments where things are being routinely encrypted in the first
> place.
>

Yes; in my specific case the idea is to use public key encryption with gpg. In that scenario the secret does not need to be on the server at all.

>> The assumption that the TOC doesn't need encryption seems pretty
>> shaky as well.
>
> That it does.
>

I don't think there's any inherent reason it can't be applied to the TOC as well. It's mostly an accident of me following the existing compression code.

On Sat, Sep 29, 2018 at 5:03 PM, Stephen Frost wrote:
> At least for my 2c, I'm not completely against it, but I'd much rather
> see us providing encryption directly and for all of the formats we
> support, doing intelligent things like encrypting the TOC for a custom
> format dump independently so we can still support fast restore of
> individual objects and such. I'm also not entirely sure about how well
> this proposed approach would work on Windows..

I haven't tested it on Windows, but I did see that there's already a popen function in src/port/system.c so my guess was going to be that it can work..

Generally, my thinking is that this can be pretty useful besides encryption. For other formats the dumps can already be written to standard output and piped through, for example, gpg or a custom compression application of the administrator's choice, so in a sense this functionality would merely add the same feature to the directory format.

My main wish here is to be able to combine a parallel dump/restore with encryption without having to first write the dump unencrypted and then loop over and rewrite the files encrypted in an extra step. This can surely be quite a large win as the dumps grow larger..

/ David
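
As a rough illustration of the per-file piping discussed in this message, the sketch below streams a buffer into a shell command with POSIX popen() and checks the command's exit status through pclose(). The helper name write_through_pipe is hypothetical and this is not pg_dump code; it only assumes a POSIX system with /bin/sh. Windows behaviour, which was questioned above, is not covered.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/wait.h>

/*
 * Hypothetical helper: stream len bytes of data into the standard input
 * of a shell command. Returns 0 if every byte was accepted and the command
 * exited with status 0, -1 otherwise.
 */
static int
write_through_pipe(const char *cmd, const char *data, size_t len)
{
    FILE   *fp;
    int     ok;
    int     status;

    assert(cmd != NULL && data != NULL);

    fp = popen(cmd, "w");
    if (fp == NULL)
        return -1;

    ok = (fwrite(data, 1, len, fp) == len);

    /* pclose() flushes, waits for the child and returns its wait status */
    status = pclose(fp);
    if (!ok || status == -1)
        return -1;

    return (WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A call such as write_through_pipe("gzip > table.dat.gz", buf, len) would compress one data file this way, where the file name is just an example. Checking the pclose() result matters because a failing encryption or compression command would otherwise leave a damaged file unnoticed. A real implementation would also have to deal with SIGPIPE when the command exits before reading all of its input.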
Re: Adding pipe support to pg_dump and pg_restore
Hi,

On Sat, Sep 29, 2018 at 7:01 PM, Stephen Frost wrote:
> Greetings,
>
> * David Hedberg (david.hedb...@gmail.com) wrote:
>> On Sat, Sep 29, 2018 at 5:56 PM, David Fetter wrote:
>> > On Sat, Sep 29, 2018 at 11:42:40AM -0400, Tom Lane wrote:
>> > As I understand it, those are the options for providing secrets in
>> > general. At least in the case of encryption, one good solution would
>> > be to use an asymmetric encryption scheme, i.e. one where encrypting
>> > doesn't expose a secret in any way.
>> >
>> > As to decryption, that's generally done with more caution in
>> > environments where things are being routinely encrypted in the first
>> > place.
>>
>> Yes; in my specific case the idea is to use public key encryption with
>> gpg. In that scenario the secret does not need to be on the server at
>> all.
>
> Using public key encryption doesn't mean you get to entirely avoid the
> question around how to handle secrets- you'll presumably want to
> actually restore the dump at some point.
>

You are right of course. But I don't see how it's more difficult to pass the secret to the piped commands than it is to pass it to postgres.

You wouldn't want to pass the secrets as options to the commands of course. In the case of gpg you would probably let gpg store and handle them, which seems to me about the same as letting postgres store them.

>> On Sat, Sep 29, 2018 at 5:03 PM, Stephen Frost wrote:
>> Generally, my thinking is that this can be pretty useful in general
>> besides encryption. For other formats the dumps can already be written
>> to standard output and piped through for example gpg or a custom
>> compression application of the administrators choice, so in a sense
>> this functionality would merely add the same feature to the directory
>> format.
>
> That's certainly not the same though. One of the great advantages of
> custom and directory format dumps is the TOC and the ability to
> selectively extract data from them without having to read the entire
> dump file. You end up losing that if you have to pass the entire dump
> through something else because you're using the pipe.
>

I can maybe see the problem here, but I apologize if I'm missing the point.

Since all the files are individually passed through separate instances of the pipe, they can also be individually restored. I guess the --list option could be (adapted to be) used to produce a clear text TOC to further use in selective decryption of the rest of the archive? Possibly combined with an option to not apply the pipeline commands to the TOC during dump and/or restore, if there's any need for that.

I do think that I understand the advantages of having a TOC that describes the exact format of the dump and how to restore it, and I am in no way arguing against having encryption included natively in the format as a default option.

But I think the pipe option, or one like it, could be used to easily extend the format. It could easily support a different compression algorithm, a different encryption method or even a different storage method such as uploading the files directly to a bucket in S3. In this way I think that it's similar to being able to write the other formats to stdout; there are probably many different usages of it out there, including custom compression or encryption.

If this is simply outside the scope of the directory or the custom format, that is certainly understandable (and, to me, somewhat regrettable :-) ).

Thank you for the answers,
David
Re: Adding pipe support to pg_dump and pg_restore
Hi,

On Sat, Sep 29, 2018 at 8:03 PM, Stephen Frost wrote:
> Greetings,
>
> * David Hedberg (david.hedb...@gmail.com) wrote:
>> On Sat, Sep 29, 2018 at 7:01 PM, Stephen Frost wrote:
>> > * David Hedberg (david.hedb...@gmail.com) wrote:
>> >> On Sat, Sep 29, 2018 at 5:03 PM, Stephen Frost wrote:
>> >> Generally, my thinking is that this can be pretty useful in general
>> >> besides encryption. For other formats the dumps can already be written
>> >> to standard output and piped through for example gpg or a custom
>> >> compression application of the administrators choice, so in a sense
>> >> this functionality would merely add the same feature to the directory
>> >> format.
>> >
>> > That's certainly not the same though. One of the great advantages of
>> > custom and directory format dumps is the TOC and the ability to
>> > selectively extract data from them without having to read the entire
>> > dump file. You end up losing that if you have to pass the entire dump
>> > through something else because you're using the pipe.
>>
>> I can maybe see the problem here, but I apologize if I'm missing the point.
>>
>> Since all the files are individually passed through separate instances
>> of the pipe, they can also be individually restored. I guess the
>> --list option could be (adopted to be) used to produce a clear text
>> TOC to further use in selective decryption of the rest of the archive?
>

I admit that my understanding of the custom format was naive (I have never actually used it).

>> If this is simply outside the scope of the directory or the custom
>> format, that is certainly understandable (and, to me, somewhat
>> regrettable :-) ).
>
> What I think isn't getting through is that while this is an interesting
> approach, it really isn't a terribly good one, regardless of how
> flexible you view it to be. The way to move this forward seems pretty
> clearly to work on adding generalized encryption support to
> pg_dump/restore that doesn't depend on calling external programs
> underneath of the directory format with a pipe.

I did get the message that it wasn't the optimal way of doing it, and I have now also gotten the message that it's probably not really wanted at all.

Thank you for your insights,
David