Tom Lane wrote: > KaiGai Kohei <kai...@ak.jp.nec.com> writes: >> The attached patch is the revised one for largeobejct access controls, >> because it conflicted to the "GRANT/REVOKE ON ALL xxx IN SCHEMA". > > I started to look through this patch with the hope of committing it, but > found out that it's not really ready. > > The most serious problem is that you ripped out myLargeObjectExists(), > apparently because you didn't understand what it's there for. The reason > it's there is to ensure that pg_dump runs don't fail. pg_dump is expected > to produce a consistent dump of all large objects that existed at the > time of its transaction snapshot. With this code, pg_dump would get a > "large object doesn't exist" error on any LO that is deleted between the > time of the snapshot and the time pg_dump actually tries to dump it --- > which could be quite a large window in a large database. > > The reason this is a significant problem and not just an easily fixable > oversight is that it's not entirely clear what to do instead. We can > certainly make the pure existence test use the query snapshot instead of > SnapshotNow, but what about the implied permissions tests? Should we > attempt to make them run using the version of the LO's ACL found in the > query-snapshot-time metadata row? The trouble with that is it might refer > to roles that don't exist anymore, perhaps resulting in failures down > inside the ACL checking routines. It would be safer to rely on the > current metadata row contents, but then we have the question of whether to > allow the access if the row doesn't exist according to SnapshotNow. > > Now these issues arise to some extent already in pg_dump, but the current > window for failure is quite short because it obtains access share locks on > all the tables it will dump at the start of the run. With large objects > the window in which things could have changed is very much longer. > > Of course, in the cases that people are most concerned about, pg_dump is > running as superuser and so the actual ACL contents don't really matter > anyway, so long as we don't fail outright before getting to the check. > So I'm kind of inclined to say that the least evil solution is to apply > the permissions check using the query-snapshot-time metadata row. > It's definitely a debatable question though. We'd also want to make sure > that the aclcheck code doesn't fail if it finds a nonexistent role ID > in the ACL.
The origin of this matter is the basis of existence was changed. Our current basis is whether pg_largeobject has one or more data chunk for the given loid in the correct snapshot, or not. The newer one is whether pg_largeobject_metadata has the entry for the given loid in the SnapshotNow, or not. The newer basis is same as other database objects, such as functions. But why pg_dump performs correctly for these database objects? In my understanding, because it reads the system catalog directly in the query snapshot. So, it will be visible, if concurrent transaction dropped a function to be backed up. Now, pg_dump uses libpq's large object interface which internally uses loread()/lowrite(). If pg_dump fetches data chunks from the system catalog, it seems to me the matter is reasonably solvable. My assumption is that you're not talking about a generic situation when a certain database object is unavailable even if we can find it within the system catalog, apart from large object backups. For example, we can easily produce a similar behavior when we tries to use a function which can be found in pg_proc, but concurrent transaction already removed it. I don't believe PostgreSQL guarantees equivalence between the visibility of system catalog and the availability of the related database object. So, is it the simplest approach to patch on the pg_dump? > Moving on to lesser but still significant problems, you probably already > guessed my next one: there's no pg_dump support. If the system tracks > owner and ACL for large objects, pg_dump *must* be prepared to dump that > information. It'd also be worthwhile to teach pg_dump that in servers >= > 8.5, it can look in the metadata catalog to make the list of LO OIDs > instead of having to do a SELECT DISTINCT from pg_largeobject. Hmm. I planed to add support to the pg_dump next to the serve-side enhancement. If both of patches are necessary soon, I'll include it in this phase. > I notice that the patch decides to change the pg_description classoid for > LO comments from pg_largeobject's OID to pg_largeobject_metadata's. This > will break existing clients that look at pg_description (eg, pg_dump and > psql, plus any other clients that have any intelligence about comments, > for instance it probably breaks pgAdmin). And there's just not a lot of > return that I can see. I agree that using pg_largeobject_metadata would > be more consistent given the new catalog layout, but I'm inclined to think > we should stick to the old convention on compatibility grounds. Given > that choice, for consistency we'd better also use pg_largeobject's OID not > pg_largeobject_metadata's in pg_shdepend entries for LOs. It seems to me reasonable. I'll fix it with comments why it uses LargeObjectRelationId not LargeObjectMetadataRelationId here. > In the category of lesser issues that have still got to be fixed: > > * You need to pay more attention to updating comments. For example > your changes to LargeObjectCreate render its header comment a complete > lie, but you didn't change the comment. OK, I'll fix it. > (And what is the purpose of > renaming it to CreateLargeObject, and similarly for the adjacent > routines? You didn't change the API meaningfully, so there is no > reason to break calling code that way.) Yes, ExecAlterOwnerStmt() calls AlterXXXXOwner() but only largeobject has different naming convention, if we follow the existing names such as LargeObjectCreate(). But I don't have any strong motication for the name. I'll fix it. > * The documentation needs work too, eg a new system catalog requires a > page in catalogs.sgml, and I'll bet there's a few references to large > objects and/or permissions that need to be updated. I left for adding the page for pg_largeobject_metadata. I'll add it. > * "largeobject" is not an English word. The occurrences in user-visible > messages and documentation must get changed to "large object". I've got > mixed emotions even about using it in code identifiers, although we > certainly aren't going to rename pg_largeobject, so anything that's named > in parallel to that should stay as-is. OK, I'll fix it. > In the same vein, "acl" is not > a word we want to expose to users, so "largeobject_check_acl" is doubly > bad as a GUC variable name. Perhaps "large_object_privilege_checks" > would do. Hmm. OK, I'll fix it. > * I'm not really happy with the ac_largeobject_foo shim layer, and would > frankly prefer to rip it out and put those tests inline. It's poorly > thought out IMO --- eg, what the heck is that cascade boolean --- and > doesn't look like any of the rest of the Postgres code stylistically, and > it makes the calling code look different from similar tests elsewhere too. > When and if SELinux support ever gets in, I'd expect that the stubs for > it would get put into the aclchk.c routines not into all their callers. > So this doesn't seem to me that it's going in the right direction even if > we posit that that support will get in. At the beginning of this commit fest, I was not clear which patch is merged earlier, so I separated access control routines for easy integration in the future. However, it is not a matter to deploy inlined ACL checks in this stage. If arguable, I'll implement it with the current style in this patch. BTW, we tried to put SELinux hooks within pg_xxx_aclcheck() routines but it was already failed on the first commit fest, so we are now working for the abstranction layer for access controls. > * Lastly, that change in psql is neither version-aware nor schema-safe. > psql is expected to still work with older server versions, so you need > two versions of that query, not just a replacement. And don't omit the > "pg_catalog." qualifier. (Also, I wonder whether it shouldn't have a way > to display permissions for LOs, though maybe that ought to be conditional. > Time for "\lo_list+" perhaps?) In the meantime, I'll add two versions of that query here. If we need "\lo_list+", it can be fixed later. Thanks, -- OSS Platform Development Division, NEC KaiGai Kohei <kai...@ak.jp.nec.com> -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers