Re: Limiting the operations that client-side code can perform upon its database backend's artifacts

Bryn Llewellyn Thu, 29 Sep 2022 20:25:23 -0700

> t...@sss.pgh.pa.us wrote:
> 
>> b...@yugabyte.com writes:
>> 
>>> rjuju...@gmail.com wrote:
>>> 
>>> I'm not convinced... that the authorization system can prevent an untrusted 
>>> user with a direct SQL access from actually hurting you.
>> 
>> What do you mean by "untrusted"? Any person who is given the credentials to 
>> start a database session is trusted—even a person who can connect as a 
>> superuser and do untold harm. So focus on a person who has the credentials 
>> to connect as "client" in my example. But imagine a design that exposes 
>> functionality to "client" sessions exclusively through a carefully designed 
>> and implemented API that's expressed exclusively with user-defined functions 
>> and procedures.
> 
> Sure. That is called an application server. What we are is a SQL server, and 
> that means that the API is SQL commands, and the authorization model is what 
> the SQL spec says it is. [So]... any session authenticated as user X has the 
> same privileges as any other session authenticated as user X, so there is not 
> a lot of point in user X mounting defenses against user X. So I think the 
> concerns you're expressing here would be better addressed at the next level 
> up the stack.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
Julien replied to the same message from me that Tom did thus:

> rjuju...@gmail.com wrote 
> 
> You mean like if the application takes care of checking that the logged-in 
> user is allowed to insert data based on whatever application defined rules / 
> user profile, while the SQL role can simply insert data and/or call the 
> carefully written functions? Yes the data will be consistent, but if your 
> role just transferred money from an account to another that's not really the 
> problem.
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
Peter Holzer replied to an earlier message fro be in this branching thread thus:

> hjp-pg...@hjp.at wrote:
> 
> I think you did [misunderstand Tom]. What he was saying was that a database 
> design which uses only one single role which both owns all the objects and 
> executes all code on behalf of the user can not be secure. At the very least 
> you need two roles: One which owns the objects and one which can only use the 
> objects in a way allowed by the business logic and is not allowed to change 
> that logic (This is the case you demonstrated.)

Yes, indeed it is. That was my aim.

> In many cases this should be even more fine-grained, and at the extreme end 
> every user could actually have several roles, each with only the minimal 
> privileges required.
> 
> (I have often observed that we tend to throw away and build permission 
> systems at every layer of the stack: The OS has a system of users and 
> permissions. But the database server runs as a single user (postgres) which 
> has access to all the data files. So it has to implement its own system of 
> roles and permissions. Then an application developer comes along and writes 
> an app which uses a single database role which has access to all the data.

It often is done like this. But it doesn't have to be done this way. This is 
where a user-defined within-RDBMS functions and procedures, and so-called "run 
authority" in the wider discourse, come to the rescue.

> So it again has to implement its own system of users and permissions...)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 

*SUMMARY*

I'd better simply attempt to stop further discussion in this thread by saying 
that no mutual understanding has been reached. I'm sad about this outcome. But 
it would seem to be unfair to invite anybody to spend any more time on these 
exchanges.

*MORE... ONLY IF YOU CAN BEAR TO READ IT*

First, lest a really awful misunderstanding lingers on, I must stress that, 
about this from Tom:

> any session authenticated as user X has the same privileges as any other 
> session authenticated as user X

well, yes. That statement of outcome is a tautology. I don't think that I wrote 
anything to say that I thought otherwise. About the notion that Ms. Jones, 
authorized as X can kill Ms. Smith's session authorized as X, and vice versa... 
this seems now to be just a red herring. Either the designer wants this or they 
 don't. And if they don't, they can prevent it by a simple revocation of one 
object privilege. It's been argued that some other evil doing can't be 
prevented. Well, fair enough. But this doesn't seem to justify not preventing 
any bad thing that *can* be prevented. 

There was an era, now long gone, when an outfit had its own production RDBMS 
(or a small few of these) in a locked and guarded room in the very same 
building within which all human end users of such a system worked. In those 
days, each human user was mapped one-to-one to its own within-RDBMS role. And 
the application under the end-users fingertips connected directly to the RDBMS 
by authenticating as that person-specific RDBMS role. (There'd also be some 
RDBMS roles for human admins and developers.) Then, no two end-user sessions 
would authorize as the same role, and native RDBMS security notions were relied 
upon to protect in the way that's come up for discussion here.

We're no longer in that era. Client-server is dead. Long-live multi-tier. I 
have a few bank accounts. And I also do Internet shopping. I went through the 
usual steps to set myself up to use each of these. And I have no choice but to 
put my faith in their authorization and security schemes—notwithstanding 
periodic troubling revelations of security loopholes. I've also discussed such 
systems with end-user organizations as part of the duties of my former job. Of 
course, these end-user organizations were self-selected for me because they all 
used an RDBMS as the database of record at the bottom of it all.

None of these end-user organizations mapped human end users each to their own 
RDBMS role. We all know this. And we all know the reasons. The concept of 
connection pooling, where all connections in the pool authorize as the same 
RDBMS role is just one reason among many. I've called this connection-pool-role 
"client"—and I'll continue to do so here.

This pre-amble sets the stage for the discussion about the overall security 
proposition when two sessions authorize as the same database role "client" (by 
all means, not even concurrently) but on behalf of different human end-users. 
I've not had the chance yet to have even a single face-to-face discussion with 
someone who knows what approaches are used to bring the required isolation 
between human end-users when the database of record is PostgreSQL. And I'd very 
much like to read some nicely-written essays about this topic. Can anybody 
recommend any such pieces for me to study?

Meanwhile, I do understand the general principles. Software tiers that stand 
above the RDBMS tier in the overall stack take care of authorizing a human 
being so that the SQL call that the application code tier passes through the 
connection pooling tier on behalf of the human user also passes a unique ID for 
the human. The SQL execution therefore knows whom it's acting for—and can look 
up salient facts about this principal in table(s) that have been designed and 
populated for that purpose. The details are unimportant here. We have no end of 
existence proofs that such schemes are in daily use and are judged to be 
adequately robust. (For example, Salesforce.com have talked and written 
publicly about how they do application-level multitenancy within a single RDBMS 
by striping the critical tables with an ID for the tenant organization. Then 
they model human users within tenant organizations on top of that.)

(Yes, Peter, I appreciate that this is exactly what you said! But it's hardly a 
bad thing. It's just an inevitable fact of life.)

Paraphrasing Peter, the design of the application's RDBMS backend has to 
implement its own notions of roles and privileges as a new layer on top of 
whatever the native RDBMS mechanisms provide. Some RDBMSs have native 
primitives that help the implementation of this next, application-specific, 
roles-and-privileges regime. I've yet to discover what PG has in this space. 
But one thing is clear. The SQLs that rely on these home-grown designs for the 
application-specific privilege regime must be designed, implemented, and tested 
very carefully by specialists. In my book, these specialists have enough to 
grapple with without embracing the skills of client-side programming languages, 
the declarative mechanisms that various frameworks rely on, and UI design. It's 
nice that PG has the mechanisms to support this separation of duties.

All this background lets me focus on what I started with: how to make use of 
PG's native mechanisms for security and (within this framework) "security 
definer" subprograms to limit the scope for what SQL calls from next-tier code 
that's authorized as "client" can do. The design spectrum here spans from this:

—Make no use of the RDBMS's mechanisms: in other words, implement the entire 
RDBMS application backend using only the single role "client" where client 
owns, crudely stated, only tables. Of course, there are closely associated 
artifacts like indexes and constraints—even though (as the myth has it) these 
cause performance problems and, as is claimed, such rules are better 
implemented in application-tier code. I kid you not. I've heard people say 
this. And I expect that you all have too. In this case, the engineers who write 
the application-tier code (or use schemes that generate it) take all the 
responsibility for correctness, data integrity, and data security. I call this 
approach the "bag of tables paradigm". And, yes, the phrase is meant to connote 
nastiness.

—Make maximal use of the RDBMS's mechanisms: in other words, exploit PG's 
role-based security mechanisms, and especially "security definer" subprograms, 
so that application-tier code (acting as it must, by having authorized as 
"client" to service requests for all human end-users) can most reliably 
implement whatever scheme is invented for the within-RDBMS component of the 
application-level notions for roles and privileges. I call this approach the 
"hard shell paradigm". (I didn't invent that term of art.) The phrase is meant 
to connote goodness.

You've noticed that I stand at the "hard shell" end of the spectrum. For 
example, I have a demo application that deals with data held in a master-detail 
table pair. (Please don't tell me, Julien, that this is such an unrealistic toy 
that it makes not a single pedagogic point.) The API is exposed via the 
"client" role as just a set of JSON-in, JSON-out procedures. And there's many 
roles where often each owns more than one schema, hidden behind the scenes. 
Sessions authorized as "client" can't detect any facts about these 
implementation details because they can't query any of the catalog relations. 
After all, why should they be able to do this? I see no need for it. And it's 
easy to prevent it. Moreover, and very significantly, it's hugely more 
difficult to allow catalog access and then to reason, case by case, about what 
"client" can see (like the source code of a procedure) can't be exploited to do 
harm. 

Here's some examples of API calls with their return values. Each of the 
"masters" and the "details" tables has just a single "payload" column, "v text" 
with some constraints. The other columns are the master-detail plumbing and 
don't surface into the business functionality world.

Firstly, inserting data:

call insert_master_and_details()

{"m": "mary", "ds": ["shampoo", "soap", "toothbrush", "towel"]}

→ {"status": "success"}

{"m": "arthur", "ds": ["scissors", "saucer", "spatula", "spatula", "scissors"]}

→ {"reason": "New master 'arthur' bad duplicate details: 
'scissors','spatula',", "status": "user error"}

{"m": "Christopher", "ds": []}

→ {"status": "unexpected error", "ticket": 1}

The "Christopher" error reflects the fact that a rule (like names must be all 
lower case) is violated. The API contract says that the database will reject 
bad values—and will not coerce a bad value into a good value. Rather, it's the 
application-tier's responsibility to present only good values. That's why the 
status is "unexpected error". The ticket number is the auto-generated surrogate 
key for a row in the behind-the-scenes "incidents" table. An authorized support 
staff member can access that table for that ticket and see this:

unit:                 procedure code.insert_master_and_details(text, text)
returned_sqlstate:    23514
message_text:         new row for relation "masters" violates check constraint 
"masters_v_chk"
pg_exception_detail:  Failing row contains 
(cc93bd34-b68a-4d47-b9e9-0033031cefb7, Christopher).
constraint_name:      masters_v_chk
table_name:           masters
schema_name:          data

pg_exception_context
--------------------
SQL statement "insert into data.masters(v) values(m_and_ds.m) returning mk"
PL/pgSQL function code.insert_master_and_details(text,text) line 17 at SQL 
statement
SQL statement "call code.insert_master_and_details(j, outcome)"
PL/pgSQL function insert_master_and_details(text,text) line 3 at CALL

This is the full story of what "get stacked diagnostics" delivers in an 
"others" handler—together with whatever other facts about the context that the 
designer as decided might be useful (exemplified here by the redundant fact 
"unit").

Secondly, reporting on data:

call do_master_and_details_report()

{"key": "mary"}

→ {"status": "m-and-ds report success",
   "m_and_ds": {"m": "mary", "ds": ["shampoo", "soap", "toothbrush", "towel"]}}

{"key": "bill"}

→ {"reason": "The master business key 'bill' doesn't exist", "status": "user 
error"}

{"ket": "fred"}

→ {"reason": "Bad JSON in: {\"ket\": \"fred\"}", "status": "client code error"}

Obviously, it's the responsibility of the application-tier code to manage the 
UI flow, paint panes that allow data entry for new facts and for parameterizing 
queries, and for presenting the return values for the good outcomes and the 
error outcomes sensibly.

For sport, I have a variant of this demo that enforces the famous mandatory 1:M 
rule. I used an approach that you can find described on the Internet. The 
concept is a mutual FK relationship between the "masters" and the "details" 
tables so that one detail is special in that it's the parent of its own parent. 
This works perfectly well w.r.t. the rule itself—and I don't need to worry 
about race conditions because the native support for deferred FK constraints 
looks after this. There is a twist, though. I have to do a "twizzle" when there 
remain many details for a given master and when somebody wants to delete the 
currently "special" detail. But the solution is doable by using a trigger to 
perform the twizzle to appoint a surviving detail to "special" status.

Significantly, the details of the enforcement scheme don't matter to sessions 
that connect as "client". To prove that point, I implemented an alternative 
scheme that uses a trigger naïvely to cause an error when it sees that you're 
about to leave a master row with no details. This scheme does, of course, 
require "serializable" isolation—and so I prefer the mutual FK scheme.

My point here is that the design and implementation of the mandatory 1:M rule 
enforcement, whatever it is, is entirely hidden from "client" sessions and is 
immune from their tampering attempts. Moreover, you can change the 
implementation, in an application upgrade exercise, without "client" sessions 
even knowing that this was done. (I like to dream that, one day, it could be 
replaced with a single declaration of intent, expressed as a "create assertion" 
statement.)

I regard this feature of the scheme (changing the implementation without 
needing to tell clients) as a good thing. It's something that the "bag of 
tables" paradigm cannot, in general, support. When you describe the paradigm as 
"hiding all implementations behind an API that articulates the business purpose 
of each of its members", you see the "hard shell" paradigm for what it is. One 
of the central pillars of software engineering. And it's as old as the 
discipline itself.

*FINALLY*

I've discussed these competing paradigms, over many years, with all sorts of 
people—both face-to-face and in written exchanges. I'm a "hard shell" devotee. 
And there are lots of "hard shell" devotees out there. But there are also many 
"bag of tables" devotees out there too. The second camp is probably more 
populous than the first one. I've come to accept that one's affiliation here is 
religion. In almost all cases, a devotee of one paradigm simply cannot convince 
devotees of the other paradigm to change their minds—or even to concede ground 
on just one tiny detail. I can't fix this. But I know where I stand in the 
debate.

Re: Limiting the operations that client-side code can perform upon its database backend's artifacts

Reply via email to