And in case there *is* ACL present then each user mentioned in the ACL adds more overhead
Also the separate GRANT calls cause bloat as the pg_largeoject_metadata row gets updated for each ALTER USER or GRANT The following is for 10 million LOs with 1 and 3 users being GRANTed SELECT on each object (with no grants the pg_restore run was 10 minutes) Nr of GRANTS | pg_dump time | pg_restore time --------------+--------------+---------------- 0 | 0m 10s | 10m 5s 1 | 0m 17s | 15m 3s 3 | 0m 21s | 27m 15s NB! - I left out the --verbose flag from pg_dump as used by pg_upgrade, as it will emit one line per LO dumped ## 1 GRANT / LO hannuk@db01-c1a:~/work/lo-testing$ time pg_dump --schema-only --quote-all-identifiers --binary-upgrade --format=custom --file=lodb10m.dump -p 5433 lodb10m real 0m17.022s user 0m2.956s sys 0m1.453s hannuk@db01-c1a:~/work/lo-testing$ time pg_restore -p 5434 --exit-on-error --transaction-size=1000 --dbname lodb10m lodb10m.dump real 15m3.136s user 0m28.991s sys 2m54.164s ## 3 GRANTs / LO make sample LO with 3 grants ALTER LARGE OBJECT 1 OWNER TO "hannuk"; GRANT SELECT ON LARGE OBJECT 1 TO "bob"; GRANT SELECT ON LARGE OBJECT 1 TO "joe"; GRANT SELECT ON LARGE OBJECT 1 TO "tom"; lodb10m=# select * from pg_shdepend where objid = 1; ┌───────┬─────────┬───────┬──────────┬────────────┬──────────┬─────────┐ │ dbid │ classid │ objid │ objsubid │ refclassid │ refobjid │ deptype │ ├───────┼─────────┼───────┼──────────┼────────────┼──────────┼─────────┤ │ 16406 │ 2613 │ 1 │ 0 │ 1260 │ 16384 │ o │ │ 16406 │ 2613 │ 1 │ 0 │ 1260 │ 16393 │ a │ │ 16406 │ 2613 │ 1 │ 0 │ 1260 │ 16394 │ a │ │ 16406 │ 2613 │ 1 │ 0 │ 1260 │ 16395 │ a │ └───────┴─────────┴───────┴──────────┴────────────┴──────────┴─────────┘ lodb10m=# select * from pg_largeobject_metadata ; ┌─────┬──────────┬───────────────────────────────────────────────────────────┐ │ oid │ lomowner │ lomacl │ ├─────┼──────────┼───────────────────────────────────────────────────────────┤ │ 1 │ 16384 │ {hannuk=rw/hannuk,bob=r/hannuk,joe=r/hannuk,tom=r/hannuk} │ └─────┴──────────┴───────────────────────────────────────────────────────────┘ Make the remaining 10M-1 LOs lodb10m=# insert into pg_largeobject_metadata(oid, lomowner, lomacl) SELECT i, 16384, '{hannuk=rw/hannuk,bob=r/hannuk,joe=r/hannuk,tom=r/hannuk}' FROM generate_series(2, 10_000_000) g(i); INSERT 0 9999999 Time: 18859.341 ms (00:18.859) And add their sharedeps lodb10m=# WITH refdeps (robj, rdeptype) AS ( VALUES (16384, 'o'), (16393, 'a'), (16394, 'a'), (16395, 'a') ) INSERT INTO pg_shdepend SELECT 16396, 2613, i, 0, 1260, robj, rdeptype FROM generate_series(2, 10_000_000) g(i) , refdeps ; INSERT 0 39999996 Time: 116697.342 ms (01:56.697) Time pg_upgrade's pg_dump and pg_reload hannuk@db01-c1a:~/work/lo-testing$ time pg_dump --schema-only --quote-all-identifiers --binary-upgrade --format=custom --file=lodb10m-3grants.dump -p 5433 lodb10m real 0m21.519s user 0m2.951s sys 0m1.723s hannuk@db01-c1a:~/work/lo-testing$ time pg_restore -p 5434 --exit-on-error --transaction-size=1000 --dbname lodb10m lodb10m-3grants.dump real 27m15.372s user 0m45.157s sys 4m57.513s On Fri, Apr 11, 2025 at 10:11 PM Nathan Bossart <nathandboss...@gmail.com> wrote: > > On Tue, Apr 08, 2025 at 12:22:00PM -0500, Nathan Bossart wrote: > > On Tue, Apr 08, 2025 at 01:07:09PM -0400, Tom Lane wrote: > >> Nathan Bossart <nathandboss...@gmail.com> writes: > >>> I do think it's worth considering going back to copying > >>> pg_largobject_metadata's files for upgrades from v16 and newer. > >> > >> (If we do this) I don't see why we'd need to stop at v16. I'm > >> envisioning that we'd use COPY, which will be dealing in the > >> text representation of aclitems, and I don't think that's changed > >> in a long time. The sort of thing that would break it is changes > >> in the set of available/default privilege bits for large objects. > > > > I was thinking of actually reverting commit 12a53c7 for upgrades from v16, > > which AFAICT is the last release where any relevant storage formats changed > > (aclitem changed in v16). But if COPY gets us pretty close to that and is > > less likely to be disrupted by future changes, it could be a better > > long-term approach. > > > >> That is, where the dump currently contains something like > >> > >> SELECT pg_catalog.lo_create('2121'); > >> ALTER LARGE OBJECT 2121 OWNER TO postgres; > >> GRANT ALL ON LARGE OBJECT 2121 TO joe; > >> > >> we'd have > >> > >> COPY pg_largeobject_metadata FROM STDIN; > >> ... > >> 2121 10 {postgres=rw/postgres,joe=rw/postgres} > >> ... > >> > >> and some appropriate COPY data for pg_shdepend too. > > I did some more research here. For many large objects without ACLs to > dump, I noticed that the vast majority of time is going to restoring the > ALTER OWNER commands. For 1 million such large objects, restoring took ~73 > seconds on my machine. If I instead invented an lo_create_with_owner() > function and created 100 per SELECT command, the same restore takes ~7 > seconds. Copying the relevant pg_shdepend rows out and back in takes ~2.5 > seconds. I imagine using COPY for pg_largeobject_metadata would also take > a couple of seconds in this case. > > For upgrading, I don't think there's any huge benefit to optimizing the > restore commands versus using COPY. It might make future catalog changes > for large object stuff easier, but I'd expect those to be rare. However, > the optimized restore commands could be nice for non-pg_upgrade use-cases. > > -- > nathan