Re: improve performance of pg_dump with many sequences

2024-07-31 Thread Nathan Bossart
Committed. -- nathan

Re: improve performance of pg_dump with many sequences

2024-07-24 Thread Nathan Bossart
I ran Euler's tests again on the v6 patch set. for i in `seq 1 1`; do psql postgres -c "CREATE SEQUENCE s$i;"; done time pg_dump -f - -s -d postgres > /dev/null HEAD:0.607s 0001 + 0002: 0.094s all patches: 0.094s Barring additional feedback, I

Re: improve performance of pg_dump with many sequences

2024-07-18 Thread Nathan Bossart
I fixed a compiler warning on Windows in v6 of the patch set. Sorry for the noise. -- nathan >From 139826d3b9290d547ae9b3446ad455ba713211e0 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Mon, 15 Jul 2024 13:13:05 -0500 Subject: [PATCH v6 1/4] parse sequence information --- src/bin/pg_dum

Re: improve performance of pg_dump with many sequences

2024-07-18 Thread Nathan Bossart
On Wed, Jul 17, 2024 at 11:58:21PM -0400, Tom Lane wrote: > ... okay, I lied, I looked at the patch. Why are you testing > > + if (pg_class_aclcheck(relid, GetUserId(), ACL_SELECT | ACL_USAGE) == > ACLCHECK_OK && > > ? This is a substitute for a SELECT from the sequence and it seems > like

Re: improve performance of pg_dump with many sequences

2024-07-17 Thread Tom Lane
Nathan Bossart writes: > Here is an attempt at adding a new function that returns the sequence tuple > and using that to avoid querying each sequence relation individually in > dumpSequenceData(). Didn't read the patch yet, but ... > If we instead wanted to change pg_sequence_last_value() to ret

Re: improve performance of pg_dump with many sequences

2024-07-17 Thread Nathan Bossart
Here is an attempt at adding a new function that returns the sequence tuple and using that to avoid querying each sequence relation individually in dumpSequenceData(). If we instead wanted to change pg_sequence_last_value() to return both is_called and last_value, I think we could modify the pg_se

Re: improve performance of pg_dump with many sequences

2024-07-17 Thread Tom Lane
Nathan Bossart writes: > On Wed, Jul 17, 2024 at 02:59:26PM -0400, Tom Lane wrote: >> Uh ... why do we need a function, rather than just >> select * from pg_sequence > We can use that for dumpSequence(), but dumpSequenceData() requires > information from the sequence tuple itself. Right now, we

Re: improve performance of pg_dump with many sequences

2024-07-17 Thread Nathan Bossart
On Wed, Jul 17, 2024 at 02:59:26PM -0400, Tom Lane wrote: > Nathan Bossart writes: >> On second thought, I worry that this change might needlessly complicate the >> pg_sequences system view. Maybe we should just add a >> pg_sequence_get_tuple() function that returns everything in >> FormData_pg_s

Re: improve performance of pg_dump with many sequences

2024-07-17 Thread Tom Lane
Nathan Bossart writes: > On second thought, I worry that this change might needlessly complicate the > pg_sequences system view. Maybe we should just add a > pg_sequence_get_tuple() function that returns everything in > FormData_pg_sequence_data for a given sequence OID... Uh ... why do we need

Re: improve performance of pg_dump with many sequences

2024-07-17 Thread Nathan Bossart
On Tue, Jul 16, 2024 at 10:23:08PM -0500, Nathan Bossart wrote: > On Wed, Jul 17, 2024 at 11:30:04AM +0900, Michael Paquier wrote: >> Yeah, I have bumped on the same issue. In the long term, I also think >> that we'd better have pg_sequence_last_value() return a row with >> is_called and the value

Re: improve performance of pg_dump with many sequences

2024-07-16 Thread Nathan Bossart
On Wed, Jul 17, 2024 at 11:30:04AM +0900, Michael Paquier wrote: > Yeah, I have bumped on the same issue. In the long term, I also think > that we'd better have pg_sequence_last_value() return a row with > is_called and the value scanned. As you say, it won't help except > when upgrading from ver

Re: improve performance of pg_dump with many sequences

2024-07-16 Thread Michael Paquier
On Tue, Jul 16, 2024 at 04:36:15PM -0500, Nathan Bossart wrote: > Unfortunately, I've also discovered a problem with 0003. > pg_sequence_last_value() returns NULL when is_called is false, in which > case we assume last_value == seqstart, which is, sadly, bogus due to > commands like ALTER SEQUENCE

Re: improve performance of pg_dump with many sequences

2024-07-16 Thread Nathan Bossart
On Thu, Jul 11, 2024 at 09:09:17PM -0500, Nathan Bossart wrote: > On second thought, maybe we should just limit this improvement to the minor > releases with the fix so that we _can_ get rid of the workaround. Or we > could use the hacky workaround only for versions with the bug. Here is a new ve

Re: improve performance of pg_dump with many sequences

2024-07-11 Thread Nathan Bossart
On Wed, Jul 10, 2024 at 11:52:33PM -0300, Euler Taveira wrote: > On Wed, Jul 10, 2024, at 7:05 PM, Nathan Bossart wrote: >> Unfortunately, I think we have to keep this workaround since older minor >> releases of PostgreSQL don't have the fix. > > Hmm. Right. On second thought, maybe we should jus

Re: improve performance of pg_dump with many sequences

2024-07-10 Thread Euler Taveira
On Wed, Jul 10, 2024, at 7:05 PM, Nathan Bossart wrote: > I'm not following why that would be a better approach. strncpy() will add > a NUL to the end of the string unless it doesn't fit in the buffer, in > which case we'll add our own via "seqtype[sizeof(seqtype) - 1] = '\0'". > Furthermore, the

Re: improve performance of pg_dump with many sequences

2024-07-10 Thread Nathan Bossart
On Wed, Jul 10, 2024 at 05:08:56PM -0300, Euler Taveira wrote: > Nice improvement. The numbers for a realistic scenario (10k sequences) are Thanks for taking a look! > You are changing internal representation from char to int64. Is the main goal > to > validate catalog data? What if there is a n

Re: improve performance of pg_dump with many sequences

2024-07-10 Thread Euler Taveira
On Tue, Jul 9, 2024, at 4:11 PM, Nathan Bossart wrote: > rebased Nice improvement. The numbers for a realistic scenario (10k sequences) are for i in `seq 1 1`; do echo "CREATE SEQUENCE s$i;"; done > /tmp/s.sql master: real 0m1,141s user 0m0,056s sys 0m0,147s patched: real 0m0,410s user 0m0,

Re: improve performance of pg_dump with many sequences

2024-07-09 Thread Nathan Bossart
rebased -- nathan >From 8cab570c9c5771d58860ff4e1d8cbd38e2792d80 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Tue, 30 Apr 2024 14:41:36 -0500 Subject: [PATCH v2 1/3] parse sequence information --- src/bin/pg_dump/pg_dump.c | 64 --- 1 file changed, 26

improve performance of pg_dump with many sequences

2024-05-02 Thread Nathan Bossart
Similar to 'pg_dump --binary-upgrade' [0], we can speed up pg_dump with many sequences by gathering the required information in a single query instead of two queries per sequence. The attached patches are works-in-progress, but here are the results I see on my machine for 'pg_dump --schema-only --