On Fri, Oct 04, 2019 at 03:57:32PM -0400, Bruce Momjian wrote:
On Fri, Oct  4, 2019 at 09:18:58AM -0400, Robert Haas wrote:
I think everyone would agree that if you have no information about a
database other than the contents of pg_clog, that's not a meaningful
information leak. You would be able to tell which transactions
committed and which transactions aborted, but since you know nothing
about the data inside those transactions, it's of no use to you.
However, in that situation, you probably wouldn't be attacking the
database in the first place. Most likely you have some knowledge about
what it contains. Maybe there's a stream of sensor data that flows
into the database, and you can see that stream.  By watching pg_clog,
you can see when a particular bit of data is rejected. That could be
valuable.

It is certainly true that seeing activity in _any_ cluster file could
leak information.  However, even if we encrypted all the cluster files,
bad actors could still get information by analyzing the file sizes and
size changes of relation files, and the speed of WAL creation, and even
monitor WAL for write activity (WAL file byte changes).  I would think
that would leak more information than clog.


Yes, those information leaks seem unavoidable.
I am not sure how you could secure against that information leak.  While
file system encryption might do that at the storage layer, it doesn't do
anything at the mounted file system layer.


That's because FDE is only meant to protect against a passive attacker,
essentially stealing the device. It's useless when someone gains access
to a mounted disk, so these information leaks are irrelevant.

(I'm only talking about encryption at the block device level. I'm not
sure about details e.g. for the encryption built into ext4, etc.)

The current approach is to encrypt anything that contains user data,
which includes heap, index, and WAL files.  I think replication slots
and logical replication might also fall into that category, which is why
I started this thread.


Yes, I think those bits have to be encrypted too.

BTW I'm not sure why you list replication slots and logical replication
independently, those are mostly the same thing I think. For physical
slots we probably don't need to encrypt anything, but for logical slots
we may spill decoded data to files (so those will contain user data).

I can see some saying that all cluster files should be encrypted, and I
can respect that argument.  However, as outlined in the diagram linked
to from the blog entry:

        https://momjian.us/main/blogs/pgblog/2019.html#September_27_2019

I feel that TDE, since it has limited value, and can't really avoid all
information leakage, should strive to find the intersection of ease of
implementation, security, and compliance.  If people don't think that
limited file encryption is secure, I get it.  However, encrypting most
or all files I think would lead us into such a "difficult to implement"
scope that I would no longer be able to work on this feature.  I think
the code complexity, fragility, potential unreliability, and even
overhead of trying to encrypt most/all files would lead TDE to be
greatly delayed or never implemented.  I just couldn't recommend it.
Now, I might be totally wrong, and encryption of everything might be
just fine, but I have to pick my projects, and such an undertaking seems
far too risky for me.


I agree some trade-offs will be needed to make the implementation at
all possible (irrespective of the exact design). But I think those
trade-offs need to be conscious, based on some technical arguments why
it's OK to consider a particular information leak acceptable, etc. For
example it may be fine when assuming the attacker only gets a single
static copy of the data directory, but not when having the ability to
observe changes made by a running instance.

In a way, my concern is somewhat the opposite of yours - that we'll end
up with a feature (which necessarily adds complexity) that however does
not provide sufficient security for various use cases.

And I don't know where exactly the middle ground is, TBH.

Just for some detail, we have solved the block-level encryption problem
by using CTR mode in most cases, but there is still a requirement for a
nonce for every encryption operation.  You can use derived keys too, but
you need to set up those keys for every write to encrypt files.  Maybe
it is possible to set up a write API that handles this transparently in
the code, but I don't know how to do that cleanly, and I doubt the
value of encrypting everything is worth it.
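To make the "nonce for every encryption operation" and "derived keys" points concrete, here is a minimal sketch using only the Python standard library. Everything in it is an illustrative assumption, not the actual TDE design: the nonce is built from a block number plus the page LSN, per-file keys are derived with HKDF (RFC 5869) using a hypothetical "relfile:<oid>" label, and HMAC-SHA256 stands in for AES as the counter-mode PRF only so the example needs no third-party crypto library. It also shows why the nonce must change on every write: if an attacker can observe two versions of a page encrypted under the same nonce, XORing the ciphertexts yields the XOR of the plaintexts.

```python
import hashlib
import hmac
import struct

DIGEST_LEN = hashlib.sha256().digest_size  # 32 bytes


def hkdf_sha256(master_key: bytes, info: bytes, length: int = 32,
                salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) extract-then-expand with HMAC-SHA256."""
    if length > 255 * DIGEST_LEN:
        raise ValueError("requested key too long for HKDF-SHA256")
    if not salt:
        salt = b"\x00" * DIGEST_LEN
    prk = hmac.new(salt, master_key, hashlib.sha256).digest()  # extract
    okm, block, counter = b"", b"", 1
    while len(okm) < length:  # expand: T(i) = HMAC(PRK, T(i-1) | info | i)
        block = hmac.new(prk, block + info + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def make_nonce(blkno: int, lsn: int) -> bytes:
    """Hypothetical 12-byte nonce: 4-byte block number + 8-byte page LSN."""
    return struct.pack(">IQ", blkno, lsn)


def ctr_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """Counter-mode stream cipher; HMAC-SHA256 is a stand-in PRF for AES.

    Encryption and decryption are the same operation (XOR with keystream).
    """
    out = bytearray()
    for offset in range(0, len(data), DIGEST_LEN):
        counter_block = nonce + struct.pack(">I", offset // DIGEST_LEN)
        keystream = hmac.new(key, counter_block, hashlib.sha256).digest()
        chunk = data[offset:offset + DIGEST_LEN]
        out.extend(b ^ k for b, k in zip(chunk, keystream))
    return bytes(out)


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


if __name__ == "__main__":
    master = bytes(range(32))
    key = hkdf_sha256(master, b"relfile:16384")  # per-file derived key
    p1 = b"sensor reading: 17.2".ljust(32, b"\0")
    p2 = b"sensor reading: 99.9".ljust(32, b"\0")

    # Nonce reuse: same (blkno, LSN) for two versions of a page leaks p1^p2.
    same = make_nonce(7, 0x1000)
    leaked = xor_bytes(ctr_xor(key, same, p1), ctr_xor(key, same, p2))
    print("nonce reuse leaks plaintext XOR:", leaked == xor_bytes(p1, p2))

    # Fresh nonce (the LSN advanced on write): no such relation holds.
    fresh = make_nonce(7, 0x2000)
    safe = xor_bytes(ctr_xor(key, same, p1), ctr_xor(key, fresh, p2))
    print("fresh nonce leaks plaintext XOR:", safe == xor_bytes(p1, p2))
```

The point of the sketch is that every write path has to know which key and which nonce to use; a page with an LSN gets a nonce "for free", but files without one (spill files, slot state, etc.) would need their own nonce source, which is the complexity the paragraph above describes.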

As far as encrypting the log file, I can see us adding documentation to
warn about that, and even issue a server log message if encryption is
enabled and syslog is not being used.  (I don't know how to test if
syslog is being shipped to a remote server.)


Not sure. I wonder if it's possible to set up syslog so that it encrypts
the data on storage, and if that would be a suitable solution e.g. for
PCI DSS purposes. (It seems at least rsyslogd supports that.)


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
