On Thu, Mar 14, 2019 at 9:26 AM Antonin Houska <a...@cybertec.at> wrote:
> The point of #ifdef USE_ENCRYPTION is that we rely on OpenSSL, so the
> code needs to compile even w/o OpenSSL (of course the encryption won't
> be enabled in that case). I'll try to reduce the use of this construct
> only to the code blocks that really reference the OpenSSL functions.
That sounds like a good place to start. The other part will be
avoiding referencing the OpenSSL functions from too many places. For
example, right now we have secure_read() -- which I know some people
hate, or at least hate the name of -- which lets you read data from
the client without having to care whether that data is going to be
SSL-encrypted on its way over the network. One can argue about whether
that particular abstraction is properly designed, but as Tom recently
felt the urge to remind me on another thread, the value of abstraction
in general is not in doubt. I think we need one here, so that the
SSL-specific bits can be isolated in a relatively small number of
files, and then clients can ask to write a file, or write a block to a
file, or whatever, and ignore the question of whether that is going to
get encrypted somehow behind the scenes.

> Since buffile.c needs to handle this kind of problem (and it already
> does in the last patch version), I think even files other than
> temporary ones could be handled by this module. The patch below adds
> functions BufFileOpenTransient(), BufFileWriteTransient(), etc. that
> can replace OpenTransientFile(), write(), etc. respectively. Once we
> implement the encryption, these functions will also hide the handling
> of the encryption blocks from the user. In this (preliminary) patch,
> as an example, I applied this approach to
> ReorderBufferSerializeChange() and the function seems a bit simpler
> now. (ReorderBufferRestoreChange() would need more work to adopt this
> approach.)

Yeah, I don't have time to look at this in detail right now, but this
kind of thing seems like a promising approach to me.

> > The interaction of this capability with certain tricks that
> > PostgreSQL plays needs some thought -- and documentation, at least
> > developer documentation, maybe user documentation. One area is
> > recovery. Writing WAL relies on the fact that if you are in the
> > midst of rewriting a block and the power fails, bytes that are the
> > same in both the old and new block images will be undisturbed;
> > encryption will make that false. How's that going to work?
>
> Yes, if only a single bit is different, decryption will turn the
> whole encryption block (16 bytes) into garbage. So we need to enforce
> full_page_writes=on if encryption is enabled. The last version of the
> patch does not do that.

I believe that we need full_page_writes=on and wal_log_hints=on, but
they don't fix the problem that I'm describing. Suppose that the very
last page in the write-ahead log contains 3.5kB of good data and 4.5kB
of zeroes. Now, somebody comes along and writes a new write-ahead log
record that is 1kB in size. In memory, we update the block: it now
contains 4.5kB of good data and 3.5kB of zeroes. Now, as we're writing
the block to disk, the power is removed. Consequently, the first 4kB
hits the platter and the other 4kB disappears into the void.

If the WAL is not encrypted, the state at this point is that the 3.5kB
of good data that we had previously is still there, and the first 512
bytes of the new WAL record are also there, and the rest is missing.
Recovery will detect that there's a valid-looking record at the end of
WAL, but when it tries to checksum that record, it will fail, and so
that trailing half-record will just get ignored. And that's fine.
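To spell out that arithmetic, here is a toy program (the buffers and
sizes are the made-up ones from the example above; this is obviously
not PostgreSQL code) showing why the unencrypted case degrades
gracefully:

/*
 * Toy illustration of a torn write to an unencrypted WAL page:
 * 3.5kB of old data plus a new 1kB record, of which only the
 * first 4kB reaches the platter.
 */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

#define PAGE_SIZE 8192

int
main(void)
{
	uint8_t		on_disk[PAGE_SIZE] = {0};
	uint8_t		in_memory[PAGE_SIZE];

	memset(on_disk, 'A', 3584);	/* the pre-existing 3.5kB of WAL */
	memcpy(in_memory, on_disk, PAGE_SIZE);
	memset(in_memory + 3584, 'B', 1024);	/* the new 1kB record */

	/* Power fails mid-write: only the first 4kB hits the platter. */
	memcpy(on_disk, in_memory, 4096);

	/* The old 3.5kB is undisturbed... */
	printf("old data intact: %s\n",
		   memcmp(on_disk, in_memory, 3584) == 0 ? "yes" : "no");

	/* ...and the new record is only half there, so its checksum
	 * fails during recovery and it is treated as the end of WAL. */
	printf("new record complete: %s\n",
		   memcmp(on_disk + 3584, in_memory + 3584, 1024) == 0 ?
		   "yes" : "no");
	return 0;
}

The encrypted case, as described next, can break the first of those
two guarantees, not just the second.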
If the WAL *is* encrypted, the state at this point is that the block
is unreadable, because the first 4kB of the block is the first half of
the bits that resulted from encrypting 8kB of data that includes the
new record, and the second 4kB of the block is the second half of the
bits that resulted from encrypting 8kB of data that did not include
the new record. Since the new record perturbed every bit on the page
with probability ~50%, that means you now have garbage. Not only did
we lose the new record, but we also lost the 3.5kB of good data which
the page previously contained. That's NOT OK. Some of the changes
associated with those WAL records may have been flushed to disk
already, or there may be commits in there that were acknowledged to
the client, and we can't just lose them.

One idea I have for fixing this problem is to encrypt the individual
WAL records rather than the blocks. That leaks some information, of
course, but maybe not in a way that really matters.

> > Hint bits also rely on this principle. I thought there might be
> > some interaction between this work and wal_log_hints for this
> > reason, but I see nothing of that sort in the patch.
>
> I'm not sure if the hint bit is still a problem if we first copy the
> shared buffer to backend-local memory and encrypt it there. That's
> what the patch does.

That has the same problem that I described in the previous paragraph,
except for the relation data files rather than the WAL itself. See the
manual's description of wal_log_hints:

   <para>
    When this parameter is <literal>on</literal>, the
    <productname>PostgreSQL</productname> server writes the entire
    content of each disk page to WAL during the first modification of
    that page after a checkpoint, even for non-critical modifications
    of so-called hint bits.
   </para>

For encryption to work, you need that. A hint bit write might convert
the page to garbage, just as in the previous example, and this option
makes sure that if that happens, there will be an FPW, allowing you to
recover... unless of course you also hit the problem I described
above, but assuming that's fixed somehow, you still have this issue.
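To see that failure mode in miniature, here is a companion to the toy
program above. It assumes one plausible scheme -- the whole 8kB page
encrypted as a unit with AES-256-CTR and a fresh IV for each write --
so the exact damage pattern here is an assumption, not what any patch
actually does; but the underlying issue, that unchanged plaintext
bytes no longer map to unchanged bytes on disk, holds for any mode
that re-randomizes the page ciphertext. Build with
"cc demo.c -lcrypto".

/*
 * Torn write to an encrypted WAL page.  The page is encrypted as
 * one 8kB unit with AES-256-CTR under a per-write IV (an assumed
 * scheme, purely for illustration).
 */
#include <stdio.h>
#include <string.h>
#include <openssl/evp.h>
#include <openssl/rand.h>

#define PAGE 8192

static void
ctr_xcrypt(const unsigned char *key, const unsigned char *iv,
		   const unsigned char *in, unsigned char *out)
{
	EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
	int			len;

	/* In CTR mode, encryption and decryption are the same operation. */
	EVP_EncryptInit_ex(ctx, EVP_aes_256_ctr(), NULL, key, iv);
	EVP_EncryptUpdate(ctx, out, &len, in, PAGE);
	EVP_CIPHER_CTX_free(ctx);
}

int
main(void)
{
	unsigned char key[32], iv_old[16], iv_new[16];
	unsigned char pt_old[PAGE] = {0}, pt_new[PAGE];
	unsigned char ct_old[PAGE], ct_new[PAGE], disk[PAGE], rec[PAGE];

	RAND_bytes(key, sizeof key);
	RAND_bytes(iv_old, sizeof iv_old);
	RAND_bytes(iv_new, sizeof iv_new);

	memset(pt_old, 'A', 3584);	/* 3.5kB of good WAL data */
	memcpy(pt_new, pt_old, PAGE);
	memset(pt_new + 3584, 'B', 1024);	/* the new 1kB record */

	ctr_xcrypt(key, iv_old, pt_old, ct_old);	/* what was on disk */
	ctr_xcrypt(key, iv_new, pt_new, ct_new);	/* what we began writing */

	/* Torn write: only the first 4kB of the new image hits the platter. */
	memcpy(disk, ct_new, 4096);
	memcpy(disk + 4096, ct_old + 4096, 4096);

	/* Recovery decrypts the page with the IV of the newest write. */
	ctr_xcrypt(key, iv_new, disk, rec);

	printf("first half readable:  %s\n",
		   memcmp(rec, pt_new, 4096) == 0 ? "yes" : "no");
	printf("second half readable: %s\n",
		   memcmp(rec + 4096, pt_old + 4096, 4096) == 0 ? "yes" : "no");
	return 0;
}

The half that never reached the platter now decrypts to garbage rather
than to the old zeroes, and if the torn boundary had fallen inside the
previously-written 3.5kB -- sectors are not guaranteed to reach the
platter in order or in 4kB units -- the old data itself would be the
garbage. Encrypting individual records instead would leave each
record's ciphertext stable once written.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company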