On 30.03.23 20:35, Stephen Frost wrote:
I do feel that column encryption is a useful capability and there's
large parts of this approach that I agree with, but I dislike the idea
of having our clients be able to depend on what gets returned for
non-encrypted columns while not being able to trust what encrypted
column results are and then trying to say it's 'transparent'.  To that
end, it seems like just saying they get back a bytea and making it clear
that they have to provide the validation would be clear, while keeping
much of the rest.

[Note that the word "transparent" has been removed from the feature name. I just didn't change the email thread name.]

These thoughts are reasonable, but I think there is a tradeoff to be made between having featureful data validation and enhanced security. If you want your database system to validate your data, you have to send it in plain text. If you want to have your database system not see the plain text, then it cannot validate it. But there is still utility in it.

You can't really depend on what gets returned even in the non-encrypted case, unless you cryptographically sign the schema against modification or something like that. So realistically, a client that cares strongly about the data it receives has to do some kind of client-side validation anyway.

Note also that higher-level client libraries like JDBC effectively do client-side data validation, for example when you call ResultSet.getInt() etc.

This is also one of the reasons why the user facing type declaration exists. You could just make all encrypted columns of type "opaque" or something and not make any promises about what's inside. But client APIs sort or rely on the application being able to ask the result set for what's inside a column value. If we just say, we don't know, then applications (or driver APIs) will have to be changed to accommodate that, but the intention was to not require that. So instead we say, it's supposed to be int, and then if it's sometimes actually not int, then your application throws an exception you can deal with. This is arguably a better developer experience, even if it concerns the data type purist.

But do you have a different idea about how it should work?

Expanding out from that I'd imagine, pie-in-the-sky
and in some far off land, having our data type in/out validation
functions moved to the common library and then adding client-side
validation of the data going in/out of the encrypted columns would allow
application developers to be able to trust what we're returning (as long
as they're using libpq- and we'd have to document that independent
implementations of the protocol have to provide this or just continue to
return bytea's).

As mentioned, some client libraries effectively already do that. But even if we could make this much more comprehensive, I don't see how this can ever actually satisfy your point. It would require that all participating clients apply validation all the time, and all other clients can rely on that happening, which is impossible.


Reply via email to