On Wed, Jun 15, 2022 at 5:53 PM Peter Geoghegan <p...@bowt.ie> wrote:
> I think that it's worth doing the following exercise (humor me): Why
> wouldn't it be okay to just encrypt the tuple space and the line
> pointer array, leaving both the page header and page special area
> unencrypted? What kind of user would find that trade-off to be
> unacceptable, and why? What's the nuance of it?
Let's consider a continuum where, on the one end, you encrypt the entire disk. Then, consider a solution where you encrypt each individual file, block by block. Next, let's imagine that we don't encrypt some kinds of files at all, if we think the data in them isn't sensitive enough. CLOG, maybe. Perhaps pg_class, because that'd be useful for debugging, and how sensitive can the names of the database tables be? Then, let's adopt your proposal here and leave some parts of each block unencrypted for debuggability. As a next step, we could take the further step of separately encrypting each tuple, but only the data, leaving the tuple header unencrypted. Then, going further, we could encrypt each individual column value within the tuple separately, rather than encrypting the tuple as a whole. Then, let's additionally decide that we're not going to encrypt all the columns, but just the ones the user says are sensitive. Now I think we've pretty much reached the other end of the continuum, unless someone is going to propose something like encrypting only part of each column, or storing some unencrypted data along with each encrypted column that is somehow dependent on the column contents.

I think it is undeniable that every step along that continuum weakens security in some way. The worst case scenario for an attacker must be that the entire disk is encrypted and they can gain no meaningful information at all without breaking that encryption. As the encryption touches fewer things, it becomes easier and easier to make inferences about the unseen data based on the data that you can see. One can sit around and argue about whether the amount of information that is leaked at any given step is enough for anyone to care, but to some extent that's an opinion question where any position can be defended by someone.

I would argue that even leaking the lengths of the files is not great at all. Imagine that the table is scheduled_nuclear_missile_launches.
I definitely do not want my adversaries to know even as much as whether that table is zero-length or non-zero-length. In fact, I would prefer that they be unable to infer that I have such a table at all. Back in 2019 I constructed a similar example of how access to pg_clog could leak meaningful information: http://postgr.es/m/ca+tgmozhbeymroaccj1ocn03jz2uak18qn4afx4wd7g+j7s...@mail.gmail.com

Now, obviously, anyone can debate how realistic such cases are, but they definitely exist. If you can read btpo_prev, btpo_next, btpo_level, and btpo_flags for every page in the btree, you can probably infer some things about the distribution of keys in the table -- especially if you can read all the pages at time T1 and then read them all again later at time T2 (and maybe further times T3..Tn). You can make inferences about which parts of the keyspace are receiving new index insertions and which are not. If that's the index on the current_secret_missions.country_code column, well, then that sucks. Your adversary may be able to infer where in the world your secret organization is operating and round up all your agents.

Now, I do realize that if we're ever going to get TDE in PostgreSQL, we will probably have to make some compromises. Actually concealing file lengths would require a redesign of the entire storage system, and so is probably not practical in the short term. Concealing SLRU contents would require significant changes too, some of which I think are things Thomas wants to do anyway, but we might have to punt that goal for the first version of a TDE feature as well. Certainly that weakens security, but if it gets us to a feature that some people can use before the heat death of the universe, there's a reasonable argument that that's better than nothing. Still, conceding that we may not realistically be able to conceal all the information in v1 is different from arguing that concealing it isn't desirable, and I think the latter argument is pretty hard to defend.
People who want to break into computers have gotten incredibly good at exploiting incredibly subtle bits of information in order to infer the contents of unseen data. https://en.wikipedia.org/wiki/Spectre_(security_vulnerability) is a good example: somebody figured out that the branch prediction hardware could initiate speculative accesses to RAM that the user doesn't actually have permission to read, and thus a JavaScript program running in your browser can read out the entire contents of RAM by measuring exactly how long mis-predicted code takes to execute. There's got to be at least one chip designer out there somewhere who was involved in the design of that branch prediction system, knew that it didn't perform the permission checks before accessing RAM, and thought to themselves "that should be OK - what's the worst that can happen?". I imagine that (those) chip designer(s) had a really bad day when they found out someone had written a program to use that information leakage to read out the entire contents of RAM ... not even using C, but using JavaScript running inside a browser!

That's only an example, but I think it's pretty typical of how these sorts of things go. The computer security literature is literally riddled with attacks where the exposure of seemingly innocent information turned out to be a big problem. And I don't think the information exposed in the btree special space is very innocent: it's not the keys themselves, but if you have the contents of every special space in the btree, there are definitely cases where you can draw inferences from that information.

> I also expect only a small benefit. But that isn't a particularly
> important factor in my mind.
>
> Let's suppose that it turns out to be significantly more useful than
> we originally expected, for whatever reason. Assuming all that, what
> else can be said about it now?
> Isn't it now *relatively* likely that
> including that status bit metadata will be *extremely* valuable, and
> not merely somewhat more valuable?

This is too hypothetical for me to have an intelligent opinion.

--
Robert Haas
EDB: http://www.enterprisedb.com