And in case there *is* ACL present then each user mentioned in the ACL
adds more overhead

Also the separate GRANT calls cause bloat as the
pg_largeoject_metadata row gets updated for each ALTER USER or GRANT

The following is for 10 million LOs with 1 and 3 users being GRANTed
SELECT on each object (with no grants the pg_restore run was 10
minutes)

Nr of GRANTS  | pg_dump time | pg_restore time
--------------+--------------+----------------
       0      |    0m 10s    |   10m  5s
       1      |    0m 17s    |   15m  3s
       3      |    0m 21s    |   27m 15s

NB! - I left out the --verbose flag from pg_dump as used by
pg_upgrade, as it will emit one line per LO dumped

## 1 GRANT / LO

hannuk@db01-c1a:~/work/lo-testing$ time pg_dump --schema-only
--quote-all-identifiers --binary-upgrade --format=custom
--file=lodb10m.dump -p 5433 lodb10m
real    0m17.022s
user    0m2.956s
sys     0m1.453s

hannuk@db01-c1a:~/work/lo-testing$ time pg_restore -p 5434
--exit-on-error --transaction-size=1000 --dbname lodb10m lodb10m.dump
real    15m3.136s
user    0m28.991s
sys     2m54.164s

## 3 GRANTs / LO

make sample LO with 3 grants

ALTER LARGE OBJECT 1 OWNER TO "hannuk";
GRANT SELECT ON LARGE OBJECT 1 TO "bob";
GRANT SELECT ON LARGE OBJECT 1 TO "joe";
GRANT SELECT ON LARGE OBJECT 1 TO "tom";
lodb10m=# select *  from pg_shdepend where objid = 1;

┌───────┬─────────┬───────┬──────────┬────────────┬──────────┬─────────┐
│ dbid  │ classid │ objid │ objsubid │ refclassid │ refobjid │ deptype │
├───────┼─────────┼───────┼──────────┼────────────┼──────────┼─────────┤
│ 16406 │    2613 │     1 │        0 │       1260 │    16384 │ o       │
│ 16406 │    2613 │     1 │        0 │       1260 │    16393 │ a       │
│ 16406 │    2613 │     1 │        0 │       1260 │    16394 │ a       │
│ 16406 │    2613 │     1 │        0 │       1260 │    16395 │ a       │
└───────┴─────────┴───────┴──────────┴────────────┴──────────┴─────────┘

lodb10m=# select *  from pg_largeobject_metadata ;
┌─────┬──────────┬───────────────────────────────────────────────────────────┐
│ oid │ lomowner │                          lomacl                           │
├─────┼──────────┼───────────────────────────────────────────────────────────┤
│   1 │    16384 │ {hannuk=rw/hannuk,bob=r/hannuk,joe=r/hannuk,tom=r/hannuk} │
└─────┴──────────┴───────────────────────────────────────────────────────────┘

Make the remaining 10M-1 LOs

lodb10m=# insert into pg_largeobject_metadata(oid, lomowner, lomacl)
SELECT i, 16384,
'{hannuk=rw/hannuk,bob=r/hannuk,joe=r/hannuk,tom=r/hannuk}' FROM
generate_series(2, 10_000_000) g(i);
INSERT 0 9999999
Time: 18859.341 ms (00:18.859)

And add their sharedeps

lodb10m=# WITH refdeps (robj, rdeptype)
AS ( VALUES
  (16384, 'o'),
  (16393, 'a'),
  (16394, 'a'),
  (16395, 'a')
)
INSERT INTO pg_shdepend SELECT 16396, 2613, i, 0, 1260, robj, rdeptype
  FROM generate_series(2, 10_000_000) g(i)
     , refdeps
;
INSERT 0 39999996
Time: 116697.342 ms (01:56.697)

Time pg_upgrade's pg_dump and pg_reload

hannuk@db01-c1a:~/work/lo-testing$ time pg_dump --schema-only
--quote-all-identifiers --binary-upgrade --format=custom
--file=lodb10m-3grants.dump -p 5433 lodb10m
real    0m21.519s
user    0m2.951s
sys     0m1.723s

hannuk@db01-c1a:~/work/lo-testing$ time pg_restore -p 5434
--exit-on-error --transaction-size=1000 --dbname lodb10m
lodb10m-3grants.dump
real    27m15.372s
user    0m45.157s
sys     4m57.513s

On Fri, Apr 11, 2025 at 10:11 PM Nathan Bossart
<nathandboss...@gmail.com> wrote:
>
> On Tue, Apr 08, 2025 at 12:22:00PM -0500, Nathan Bossart wrote:
> > On Tue, Apr 08, 2025 at 01:07:09PM -0400, Tom Lane wrote:
> >> Nathan Bossart <nathandboss...@gmail.com> writes:
> >>> I do think it's worth considering going back to copying
> >>> pg_largobject_metadata's files for upgrades from v16 and newer.
> >>
> >> (If we do this) I don't see why we'd need to stop at v16.  I'm
> >> envisioning that we'd use COPY, which will be dealing in the
> >> text representation of aclitems, and I don't think that's changed
> >> in a long time.  The sort of thing that would break it is changes
> >> in the set of available/default privilege bits for large objects.
> >
> > I was thinking of actually reverting commit 12a53c7 for upgrades from v16,
> > which AFAICT is the last release where any relevant storage formats changed
> > (aclitem changed in v16).  But if COPY gets us pretty close to that and is
> > less likely to be disrupted by future changes, it could be a better
> > long-term approach.
> >
> >> That is, where the dump currently contains something like
> >>
> >> SELECT pg_catalog.lo_create('2121');
> >> ALTER LARGE OBJECT 2121 OWNER TO postgres;
> >> GRANT ALL ON LARGE OBJECT 2121 TO joe;
> >>
> >> we'd have
> >>
> >> COPY pg_largeobject_metadata FROM STDIN;
> >> ...
> >> 2121 10      {postgres=rw/postgres,joe=rw/postgres}
> >> ...
> >>
> >> and some appropriate COPY data for pg_shdepend too.
>
> I did some more research here.  For many large objects without ACLs to
> dump, I noticed that the vast majority of time is going to restoring the
> ALTER OWNER commands.  For 1 million such large objects, restoring took ~73
> seconds on my machine.  If I instead invented an lo_create_with_owner()
> function and created 100 per SELECT command, the same restore takes ~7
> seconds.  Copying the relevant pg_shdepend rows out and back in takes ~2.5
> seconds.  I imagine using COPY for pg_largeobject_metadata would also take
> a couple of seconds in this case.
>
> For upgrading, I don't think there's any huge benefit to optimizing the
> restore commands versus using COPY.  It might make future catalog changes
> for large object stuff easier, but I'd expect those to be rare.  However,
> the optimized restore commands could be nice for non-pg_upgrade use-cases.
>
> --
> nathan

Reply via email to