On Wed, Apr 6, 2022 at 7:18 AM Filip Janus <fja...@redhat.com> wrote:
> A few months ago a group of researchers published a paper about an LZ77
> vulnerability[1], and it also affects PGLZ. From my point of view, it
> could be a really dangerous issue for some kinds of applications. If I
> understand it correctly, there is a possibility of leaking approx. 24
> bytes of secret data per hour (but it depends on the HW configuration).
>
> I understand that there is no simple and easy solution. But I would
> like to know your opinion on this. Or if you have any plan on how to
> deal with this?
I hadn't heard of this before. It seems to be a real vulnerability in
PGLZ. Fortunately, the attack relies on conditions that may not always
be present, and the rate of data leakage is pretty slow. Some threats
of this kind are going to need to be addressed outside the database.
For example, you could rate-limit attempts to access your web
application to make it harder to accumulate enough accesses to get any
meaningful data leakage, and you could store highly secret data in a
different place than you store data that the user has the ability to
modify. It sounds like even just putting those things in separate
jsonb columns rather than the same one would block this particular
attack. A user could also choose to disable compression for a certain
column entirely if they're worried about this kind of thing (see the
sketch below).

However, there are new attacks all the time, and it's going to be
really hard to block them all. Variable latency is extremely difficult
to avoid, because pretty much every piece of code anyone writes has if
statements and loops that can iterate a different number of times on
different inputs, and then there are CPU effects like caching and
branch prediction that add to the problem. There are tons of attacks
like this, and even if we could somehow, by magic, secure PostgreSQL
against this one completely, there will be lots more in the future. I
think it's inevitable that there will be more and more papers
demonstrating that a determined attacker can leak information out of
system A by very carefully measuring the latency of operation X under
different conditions, and there is no real solution to that problem in
general.

One thing that we could do internally to PostgreSQL is add more
possible TOAST compression algorithms. In addition to PGLZ, which the
attack in the paper targets, we now have LZ4 as an option. That's
probably vulnerable too, and probably zstd is as well, but if a
state-of-the-art algorithm emerges that somehow isn't vulnerable, we
can consider adding support for it.
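To make the column-separation idea above concrete, here's a minimal
sketch against a hypothetical table (the table and column names are
made up for illustration). SET STORAGE EXTERNAL forces a column to be
stored out of line but uncompressed, which takes PGLZ out of the
picture for that column, at the cost of more disk space:

    -- Hypothetical schema: keep attacker-modifiable data and secrets
    -- in separate columns, so they are never compressed together.
    CREATE TABLE user_profile (
        id          bigint PRIMARY KEY,
        user_blob   jsonb,  -- data the user can modify
        secret_blob jsonb   -- highly secret data
    );

    -- Disable compression for the sensitive column: EXTERNAL means
    -- out-of-line storage without compression. Note this only affects
    -- newly stored values, not existing ones.
    ALTER TABLE user_profile ALTER COLUMN secret_blob SET STORAGE EXTERNAL;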
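And picking a different TOAST compression method per column is already
possible on v14 or newer, if the server was built with --with-lz4
(again using the hypothetical table above):

    -- Switch one column's TOAST compression method from pglz to lz4;
    -- only newly stored values use the new method.
    ALTER TABLE user_profile ALTER COLUMN user_blob SET COMPRESSION lz4;

    -- Or change the default for columns that don't specify a method.
    SET default_toast_compression = 'lz4';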
I don't think that as a project we really ought to be in the business
of trying to design our own compression algorithms. PGLZ does a good
job for something that was written by a PostgreSQL hacker, and many
years ago at that, but, not surprisingly, people who spend all day
thinking about compression are really, really good at it. We should
leave it up to them to figure out whether there's something to be done
here, and if the answer is yes, then we can consider adopting whatever
they come up with. Personally, I don't quite see how such a thing
would be possible, but I'm not a compression expert.

One last thought: I don't think it's right to suppose that every
security vulnerability is the result of some design flaw, or that
every security vulnerability must be patched. Imagine, for example,
that someone posted a paper showing that they could break into your
house. Your reaction to that paper would probably depend on how they
did it. If it turns out that the lock you have on your front door will
unlock if you give it a hard bump with your fist, you'd probably want
to replace the lock with one that doesn't have that design flaw. But
if the paper showed that they could break into your house by breaking
one of the windows with a crowbar, would you replace all of those
windows with solid steel? Most people understand that a window is
likely to be made of a more breakable substance than whatever
surrounds it, because it has an additional design constraint: it has
to let light pass through. We accept that as a trade-off when we
choose to live in a house rather than a bunker.

In the same way, without denying that there's a real vulnerability
here, I don't think that anyone who understands a little bit about how
compression and decompression work would expect decompression to take
the same amount of time on every input. Pretty much every compression
algorithm has a mode where incompressible data is copied through byte
for byte, and other modes that take advantage of repeated byte
sequences. It's only reasonable to suppose that those various code
paths are not all going to run at the same speed, and nobody would
want them to: it would mean slowing down the fast paths through the
code to the speed of the slow paths, and because decompression speed
is so important, that sounds like something most people would not
want.

Do you have any suggestions on what we should do here?

--
Robert Haas
EDB: http://www.enterprisedb.com