On Sat, Apr 11, 2020 at 01:56:10PM -0400, Tom Lane wrote:
Tomas Vondra <tomas.von...@2ndquadrant.com> writes:
I don't think "commit is atomic" really implies "data should be released
at commit". This is precisely what makes the feature extremely hard to
implement, IMHO.

Why wouldn't it be acceptable to do something like this?

     BEGIN;
     ...
     DROP TABLE x ERASE;
     ...
     COMMIT;  <-- Don't do the data erasure here, just add "x" to the erasure queue.

     -- wait for another process to complete the erasure
     SELECT pg_wait_for_erasure();

That means we're not running any custom commands / code during commit,
which should (hopefully) make it easier to handle errors.
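(Editorial aside: the proposed flow can be sketched outside the server as a toy
illustration. The names pg_wait_for_erasure and the "queue at commit" idea come
from the proposal above; the queue/worker mechanics here are invented stand-ins,
not PostgreSQL internals.)

```python
# Toy model of the proposal: COMMIT does nothing that can fail, it only
# enqueues the dropped relation's files; a separate worker overwrites and
# unlinks them; wait_for_erasure() plays the role of pg_wait_for_erasure().
import os
import queue
import tempfile
import threading

erase_queue: "queue.Queue[str]" = queue.Queue()

def commit_drop_erase(path: str) -> None:
    """At COMMIT: just remember the file; no erasure work happens here."""
    erase_queue.put(path)

def eraser_worker() -> None:
    """Background 'process': overwrite on-disk contents, then unlink."""
    while True:
        path = erase_queue.get()
        size = os.path.getsize(path)
        with open(path, "r+b") as f:
            f.write(b"\0" * size)      # zero out the old contents
            f.flush()
            os.fsync(f.fileno())       # make sure the zeros reach disk
        os.unlink(path)
        erase_queue.task_done()

def wait_for_erasure() -> None:
    """Analogue of SELECT pg_wait_for_erasure(): block until queue drains."""
    erase_queue.join()

threading.Thread(target=eraser_worker, daemon=True).start()

# usage: "drop" a file containing secrets, then wait for erasure
fd, path = tempfile.mkstemp()
os.write(fd, b"secret data")
os.close(fd)
commit_drop_erase(path)
wait_for_erasure()
print(os.path.exists(path))  # False
```

The point of the shape is that the commit path only does an enqueue, which is
trivially safe, while all the work that can fail happens afterwards.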

Yeah, adding actions-that-could-fail to commit is a very hard sell,
so something like this API would probably have a better chance.

However ... the whole concept of erasure being a committable action
seems basically misguided from here.  Consider this scenario:

        begin;

        create table full_o_secrets (...);

        ... manipulate secret data in full_o_secrets ...

        drop table full_o_secrets erase;

        ... do something that unintentionally fails, causing xact abort ...

        commit;

Now what?  Your secret data is all over the disk and you have *no*
recourse to get rid of it; that's true even at a very low level,
because we unlinked the file when rolling back the transaction.
If the error occurred before getting to "drop table full_o_secrets
erase" then there isn't even any way in principle for the server
to know that you might not be happy about leaving that data lying
around.

And I haven't even spoken of copies that may exist in WAL, or
have been propagated to standby servers by now.

I have no idea what an actual solution that accounted for those
problems would look like.  But as presented, this is a toy feature
offering no real security gain, if you ask me.


Yeah, unfortunately the feature as proposed has these weaknesses.

This is why I proposed that a solution based on encryption and throwing
away a key might be more reliable - if you don't have a key, who cares
if the encrypted data file (or parts of it) is still on disk?
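(Editorial aside: a toy illustration of that "throw away the key" idea, often
called crypto-shredding. This is NOT real cryptography - a SHA-256 counter-mode
keystream stands in for a proper cipher like AES-CTR, and all names here are
invented - but it shows why leftover ciphertext copies stop mattering.)

```python
# Crypto-shredding sketch: encrypt before writing; "erasure" is then just
# destroying the key, so stale copies on disk, in WAL, or on standbys are
# useless without it.
import hashlib
import secrets

def keystream_xor(key: bytes, data: bytes) -> bytes:
    """XOR data with a SHA-256 counter-mode keystream (toy cipher)."""
    out = bytearray()
    for block in range((len(data) + 31) // 32):
        pad = hashlib.sha256(key + block.to_bytes(8, "big")).digest()
        chunk = data[block * 32:(block + 1) * 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)

key = secrets.token_bytes(32)
plaintext = b"full_o_secrets rows"
ciphertext = keystream_xor(key, plaintext)

# With the key, decryption works (XOR stream ciphers are symmetric):
assert keystream_xor(key, ciphertext) == plaintext

# "Erasure": discard the key. Whatever ciphertext is still lying around
# on disk can no longer be turned back into the plaintext.
key = None
```

Note this also sidesteps Tom's rollback scenario: the on-disk bytes were never
plaintext in the first place, so there is nothing to scrub at abort time.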

It has issues too, though - a query might need a temporary file to do a
sort, a hash join might spill to disk, and so on. Those files won't be
encrypted without some executor changes (e.g. we might propagate a
"needs erasure" flag to temp files, and erase them when necessary).
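(Editorial aside: a toy sketch of that flag propagation. SpillFile and the
needs_erasure flag are invented names for illustration; PostgreSQL's executor
does nothing like this today.)

```python
# Sketch: a spill-file wrapper that inherits a "needs erasure" flag from
# the source relation and overwrites its contents before unlinking.
import os
import tempfile

class SpillFile:
    def __init__(self, needs_erasure: bool):
        self.needs_erasure = needs_erasure   # propagated from the relation
        self.fd, self.path = tempfile.mkstemp()

    def write(self, data: bytes) -> None:
        os.write(self.fd, data)

    def close(self) -> None:
        if self.needs_erasure:
            size = os.lseek(self.fd, 0, os.SEEK_END)
            os.lseek(self.fd, 0, os.SEEK_SET)
            os.write(self.fd, b"\0" * size)  # scrub before unlink
            os.fsync(self.fd)
        os.close(self.fd)
        os.unlink(self.path)

# usage: a hash-join batch spilled from an ERASE-marked table
spill = SpillFile(needs_erasure=True)
spill.write(b"hash join batch with secret rows")
spill.close()
```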

I'm afraid a perfect solution would be so complex it's not feasible in
practice, especially in v1. So maybe the best thing we can do is to
document those limitations, but I'm not sure where to draw the line
between acceptable and unacceptable limitations.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
