For temporary tables; TRUNCATE vs ON COMMIT DELETE ROWS

2023-03-22 Thread Jim Vanns
Does anyone have any idea which is generally better (by better I mean
most efficient/quickest!) in this simple scenario? I have a temporary
table which I wish to retain for the duration of a long-running
session. However, for each transaction it must be purged. So, is it
better to;

a) Always CREATE TEMPORARY TABLE + ON COMMIT DROP for every transaction or;
b) TRUNCATE TABLE or;
c) ON COMMIT DELETE ROWS

Both b and c avoid re-creating the table each time, but at the cost of a
purge of some form (explicit or implicit).
I would assume that, despite its name, the latter (option c) is more
akin to a truncation? Or does it actually perform a deletion?
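
For concreteness, the three variants would look something like this (the
table definition here is purely hypothetical);

-- a) Re-create the table in every transaction; it vanishes at commit
CREATE TEMPORARY TABLE scratch (id BIGINT, payload TEXT) ON COMMIT DROP;

-- b) Create the table once per session, then explicitly purge it per transaction
CREATE TEMPORARY TABLE scratch (id BIGINT, payload TEXT);
TRUNCATE scratch;  -- issued in each transaction

-- c) Create the table once per session; commit purges the rows automatically
CREATE TEMPORARY TABLE scratch (id BIGINT, payload TEXT) ON COMMIT DELETE ROWS;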

Cheers

Jim

-- 
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London




CREATE TEMPORARY TABLE LIKE

2023-05-25 Thread Jim Vanns
Hi everyone,

When basing a temporary table on a source table, are triggers included
by default? I have this statement;

CREATE TEMPORARY TABLE dev_main (
LIKE prod_main
INCLUDING ALL
EXCLUDING INDEXES
EXCLUDING CONSTRAINTS
) ON COMMIT DELETE ROWS;

I'm wondering, if there is a trigger (row-based AFTER) on prod_main,
whether it'll also fire on dev_main? I can't find anything in the
documentation that says either way, nor can I see an explicit
EXCLUDING option to be sure triggers aren't copied.

Does anyone know?

Cheers,

Jim

-- 
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London




Re: CREATE TEMPORARY TABLE LIKE

2023-05-25 Thread Jim Vanns
I just wrote a little test case... it appears not. Triggers aren't
fired in the temporary table.
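
For anyone curious, this is roughly the kind of standalone test I used
(the trigger body and columns are made up);

CREATE TABLE prod_main (id BIGINT PRIMARY KEY, note TEXT);
CREATE TABLE audit (msg TEXT);

CREATE FUNCTION log_insert() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    INSERT INTO audit VALUES ('fired');
    RETURN NEW;
END;
$$;

CREATE TRIGGER prod_main_after_insert
AFTER INSERT ON prod_main FOR EACH ROW EXECUTE FUNCTION log_insert();

CREATE TEMPORARY TABLE dev_main (
LIKE prod_main
INCLUDING ALL
EXCLUDING INDEXES
EXCLUDING CONSTRAINTS
) ON COMMIT DELETE ROWS;

INSERT INTO prod_main VALUES (1, 'a');  -- trigger fires; audit gains a row
INSERT INTO dev_main  VALUES (2, 'b');  -- no trigger fires on the temp table
SELECT count(*) FROM audit;             -- still 1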

Jim

On Thu, 25 May 2023 at 14:06, Jim Vanns  wrote:
>
> Hi everyone,
>
> When basing a temporary table of a source table, are triggers included
> by default? I have this statement;
>
> CREATE TEMPORARY TABLE dev_main (
> LIKE prod_main
> INCLUDING ALL
> EXCLUDING INDEXES
> EXCLUDING CONSTRAINTS
> ) ON COMMIT DELETE ROWS;
>
> And wondering if there is a trigger (row-based after) on prod_main
> it'll fire also on dev_main? I can't find anything in the
> documentation that suggests either way nor can I see an explicit
> EXCLUDING option to be sure triggers aren't copied.
>
> Does anyone know?
>
> Cheers,
>
> Jim
>
> --
> Jim Vanns
> Principal Production Engineer
> Industrial Light & Magic, London



-- 
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London




Dynamic creation of list partitions in highly concurrent write environment

2023-06-01 Thread Jim Vanns
Hello everyone.

We are attempting to evaluate list partitioning over hash partitioning
(which we currently use) to better facilitate dropping the tables that
distinctly model devices we wish to purge. We don't want to use a DML
statement for cleanup, since these tables can otherwise contain billions
of rows. Using PG15, we've hit a snag;

We're attempting to use declarative partitioning syntax in a function run
as an AFTER trigger, and we hit deadlocks. If we try the older
inheritance alternative, we still hit deadlocks. I appreciate this is
generally because creating tables isn't a concurrency-friendly operation.
Here's the general approach we have, with a hypothetical model rather than
our real code, since I'm writing this hastily;

table devices; <-- After trigger on here
table routes; <-- To create physical partition for this logical table
table route_for_device_N; <-- The dynamically created partition

We don't know the device names and thus partitions up front, hence the need
for dynamic creation. How do you go about doing this or is it folly!?
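
Roughly, the trigger does something like this (hypothetical names and
columns, not our real code);

CREATE TABLE devices (device_id BIGINT PRIMARY KEY, name TEXT NOT NULL);

CREATE TABLE routes (device_id BIGINT NOT NULL, route TEXT NOT NULL)
PARTITION BY LIST (device_id);

CREATE FUNCTION create_route_partition() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    -- Takes an ACCESS EXCLUSIVE lock on routes, hence the deadlocks under
    -- concurrent inserts
    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS route_for_device_%s PARTITION OF routes FOR VALUES IN (%s)',
        NEW.device_id, NEW.device_id);
    RETURN NULL;
END;
$$;

CREATE TRIGGER devices_after_insert
AFTER INSERT ON devices FOR EACH ROW EXECUTE FUNCTION create_route_partition();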

Cheers

Jim

-- 
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London


Re: Dynamic creation of list partitions in highly concurrent write environment

2023-06-01 Thread Jim Vanns
Yeah, that's a known concern - hence the evaluation as part of the list
vs. hash comparison ;)

Jim

On Thu, 1 Jun 2023 at 14:32, Tom Lane  wrote:

> Jim Vanns  writes:
> > Here's the general approach we have, without code and with hypothetical
> > model, since I'm writing this hastily;
>
> > table devices; <-- After trigger on here
> > table routes; <-- To create physical partition for this logical table
> > table route_for_device_N; <-- The dynamically created partition
>
> > We don't know the device names and thus partitions up front, hence the
> need
> > for dynamic creation. How do you go about doing this or is it folly!?
>
> Sounds like folly, TBH.  You're going to end with a ridiculous number
> of partitions of the routes table, which will not be good for performance.
>
> regards, tom lane
>


-- 
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London


Tips on troubleshooting slow DELETE (suspect cascades)

2024-01-16 Thread Jim Vanns
Hi,

I have a slow (CPU-bound) DELETE statement I'm attempting to debug, and I
suspect that it's actually the ON DELETE CASCADE on the foreign key that's
causing it. I suspect this because the dry-run mode of the same query (a
SELECT instead of a DELETE) doesn't suffer the same fate. The statement is
effectively;

-- Dry mode
SELECT prune_function(timestamp);
-- Destructive mode
DELETE FROM foobar p USING prune_function(timestamp) AS e WHERE p.id = e.prune_id;

The logs seem to hold no information on the progress of the statement but
the CPU is pegged at 100% for hours. The SELECT equivalent runs in under a
minute.

What I need is a way to see into this statement as it executes to confirm
my suspicion - does anyone have any tips on that?

Cheers

Jim

-- 
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London


Re: Tips on troubleshooting slow DELETE (suspect cascades)

2024-01-18 Thread Jim Vanns
Hi Tom/Adrian.

I should have already stated that I did begin with EXPLAIN, but given it
doesn't easily work with the internals of stored functions/procedures, it
wasn't useful in this case. Also, I keep having to terminate the statement
because it never runs to completion and produces a plan (at least in
ANALYZE VERBOSE mode, anyway).

I have, however, pulled the function body code out and produced an
isolated case that can be EXPLAINED. The table in question is a
curious one since it models a hierarchy as an adjacency list and so
the fkey reference is back to itself (to a primary key - so is an
additional index required?):

CREATE TABLE tree (
   ts TIMESTAMPTZ NOT NULL,
   tree_id BIGINT NOT NULL,
   parent_id BIGINT NULL,
   --
   CONSTRAINT cstr_tree_pky PRIMARY KEY (tree_id) INCLUDE (parent_id),
   FOREIGN KEY (parent_id) REFERENCES tree(tree_id)
  ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
);
CREATE INDEX ON tree USING BRIN (ts);

The tree table has 95,915,630 rows.

I've not yet got a complete or reliable plan :( I have made a DB copy
and will be dropping the constraint to see what effect that has.

Cheers,

Jim

On Tue, 16 Jan 2024 at 22:16, Tom Lane  wrote:
>
> Adrian Klaver  writes:
> > On 1/16/24 09:45, Jim Vanns wrote:
> >> I have a slow (CPU bound) DELETE statement I'm attempting to debug and I
> >> suspect that its actually the ON DELETE CASCADE on the foreign key thats
> >> causing it.
>
> 99% of the time, the cause is lack of an index on the foreign key's
> referencing columns.  We make you have a unique index on the
> referenced columns, because otherwise the FK constraint's semantics
> are unclear.  But you're not required to make one on the other side.
>
> >> What I need is a way to see into this statement as it executes to
> >> confirm my suspicion - does anyone have any tips on that?
>
> > Explain:
> > https://www.postgresql.org/docs/current/sql-explain.html
>
> Specifically, if EXPLAIN ANALYZE shows a lot of time spent in the
> enforcement trigger for the FK, this is likely what's happening.
>
> regards, tom lane



-- 
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London




Re: Tips on troubleshooting slow DELETE (suspect cascades)

2024-01-18 Thread Jim Vanns
After dropping the constraint entirely the DELETE completes in 4
minutes (the same time as the dry-run using SELECT against the
function instead of a DELETE). A marked improvement on 3 hours
followed by a pg_cancel_backend()!
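
(For completeness, the more conventional fix implied by Tom's point below -
an index on the referencing column, so the cascaded deletes don't have to
scan the whole table - would presumably be something like;

CREATE INDEX ON tree (parent_id);

but dropping the constraint was the quickest thing to try on the copy.)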

Jim

On Thu, 18 Jan 2024 at 16:37, Jim Vanns  wrote:
>
> Hi Tom/Adrian.
>
> I should have already stated I did begin with EXPLAIN but given they
> don't easily work with (the internals) stored/procedures, it wasn't
> useful in this case. Also, I keep having to terminate the statement
> because it never runs to completion and produces the plan (at least in
> ANALYZE VERBOSE mode anyway).
>
> I have, however, pulled the function body code out and produced an
> isolated case that can be EXPLAINED. The table in question is a
> curious one since it models a hierarchy as an adjacency list and so
> the fkey reference is back to itself (to a primary key - so is an
> additional index required?):
>
> CREATE TABLE tree (
>ts TIMESTAMPTZ NOT NULL
>tree_id BIGINT NOT NULL,
>parent_id BIGINT NULL,
>--
>CONSTRAINT cstr_tree_pky PRIMARY KEY (tree_id) INCLUDE (parent_id),
>FOREIGN KEY (parent_id) REFERENCES tree(tree_id)
>   ON DELETE CASCADE DEFERRABLE INITIALLY DEFERRED
> );
> CREATE INDEX ON tree USING BRIN (ts);
>
> The tree table has 95,915,630 rows.
>
> I've not yet got a complete or reliable plan :( I have made a DB copy
> and will be dropping the constraint to see what effect that has.
>
> Cheers,
>
> Jim
>
> On Tue, 16 Jan 2024 at 22:16, Tom Lane  wrote:
> >
> > Adrian Klaver  writes:
> > > On 1/16/24 09:45, Jim Vanns wrote:
> > >> I have a slow (CPU bound) DELETE statement I'm attempting to debug and I
> > >> suspect that its actually the ON DELETE CASCADE on the foreign key thats
> > >> causing it.
> >
> > 99% of the time, the cause is lack of an index on the foreign key's
> > referencing columns.  We make you have a unique index on the
> > referenced columns, because otherwise the FK constraint's semantics
> > are unclear.  But you're not required to make one on the other side.
> >
> > >> What I need is a way to see into this statement as it executes to
> > >> confirm my suspicion - does anyone have any tips on that?
> >
> > > Explain:
> > > https://www.postgresql.org/docs/current/sql-explain.html
> >
> > Specifically, if EXPLAIN ANALYZE shows a lot of time spent in the
> > enforcement trigger for the FK, this is likely what's happening.
> >
> > regards, tom lane
>
>
>
> --
> Jim Vanns
> Principal Production Engineer
> Industrial Light & Magic, London



-- 
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London




Fwd: Suggestions to overcome 'multixact "members" limit exceeded' in temporary tables

2024-07-31 Thread Jim Vanns
(resending to general since I believe I originally sent it to hackers by
mistake)

I've reached the limit of my understanding and attempts at correcting my
code/use of temporary tables in the face of multixact members and have come
to ask for your help! Here's a brief description of my software;

Pool of N connection sessions, persistent for the duration of the program
lifetime.
Upon each session initialisation, a set of CREATE TEMPORARY TABLE ... ON
COMMIT DELETE ROWS statements are made for bulk ingest.
Each session is acquired by a thread for use when ingesting data, and
therefore each temporary table remains until the session is terminated.
The thread performs a COPY <temp table> FROM STDIN in binary format.
Then an INSERT INTO <main table> SELECT ... FROM <temp table> WHERE ...

This has been working great for a while and with excellent throughput.
However, upon scaling up I eventually hit this error;

ERROR:  multixact "members" limit exceeded
DETAIL:  This command would create a multixact with 2 members, but the
remaining space is only enough for 0 members.
HINT:  Execute a database-wide VACUUM in database with OID 16467 with
reduced vacuum_multixact_freeze_min_age and
vacuum_multixact_freeze_table_age settings.

And it took me quite a while to identify that it appears to be coming from
the temporary tables (the other 'main' tables were being autovacuumed OK) -
which makes sense, because they have a long lifetime, aren't autovacuumed,
and are shared by transactions (in turn).

I first attempted to overcome this by introducing an initial step of always
creating the temporary table before the copy (and using ON COMMIT DROP),
but this led to a terrible performance degradation.
Next, I reverted the above and instead introduced a VACUUM step every
100 (configurable) ingest operations.
Finally, I introduced a TRUNCATE step in addition to the occasional VACUUM,
since the TRUNCATE allowed the COPY option of FREEZE.
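
In sketch form, the per-batch sequence is now something like this (table
and column names hypothetical);

-- Once per session, at initialisation
CREATE TEMPORARY TABLE ingest_tt (id BIGINT, payload TEXT) ON COMMIT DELETE ROWS;

-- Then, per ingest batch
BEGIN;
TRUNCATE ingest_tt;  -- truncated in this transaction, so COPY ... FREEZE is allowed
COPY ingest_tt FROM STDIN WITH (FORMAT binary, FREEZE);  -- client streams the binary payload
INSERT INTO main_table SELECT id, payload FROM ingest_tt;  -- main_table stands in for the real target
COMMIT;  -- ON COMMIT DELETE ROWS empties the temporary table again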

The new overhead appears minimal until after several hours, when again I
hit a performance degradation seemingly dominated by the TRUNCATE.

My questions are;

1) Is the VACUUM necessary if I use TRUNCATE + COPY FREEZE (on the
temporary table)?
2) Is there really any benefit to using FREEZE here or is it best to just
VACUUM the temporary tables occasionally?
3) Is there a better way of managing all this!? Perhaps re-CREATING the TT
every day or something?

I understand that I can create a Linux tmpfs partition for a tablespace for
the temporary tables and that may speed up the TRUNCATE but that seems like
a hack and I'd rather not do it at all if it's avoidable.

Thanks for your help,

Jim

PS. PG version in use is 15.4 if that matters here

-- 
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London


Re: Suggestions to overcome 'multixact "members" limit exceeded' in temporary tables

2024-07-31 Thread Jim Vanns
I've been able to observe that the performance degradation with TRUNCATE
appears to happen when other ancillary processes are running that are also
heavy users of temporary tables. If I used an exclusive tablespace, would
that improve things?

Cheers

Jim

On Wed, 31 Jul 2024 at 19:27, Jim Vanns  wrote:

> (resending to general since I believe I originally sent it to hackers by
> mistake)
>
> I've reached the limit of my understanding and attempts at correcting my
> code/use of temporary tables in the face of multixact members and have come
> to ask for your help! Here's a brief description of my software;
>
> Pool of N connection sessions, persistent for the duration of the program
> lifetime.
> Upon each session initialisation, a set of CREATE TEMPORARY TABLE ON
> COMMIT DELETE ROWS statements are made for bulk ingest.
> Each session is acquired by a thread for use when ingesting data and
> therefore each temporary table remains until the session is terminated
> The thread performs a COPY <temp table> FROM STDIN in binary format.
> Then an INSERT INTO <main table> SELECT ... FROM <temp table> WHERE ...
>
> This has been working great for a while and with excellent throughput.
> However, upon scaling up I eventually hit this error;
>
> ERROR:  multixact "members" limit exceeded
> DETAIL:  This command would create a multixact with 2 members, but the
> remaining space is only enough for 0 members.
> HINT:  Execute a database-wide VACUUM in database with OID 16467 with
> reduced vacuum_multixact_freeze_min_age and
> vacuum_multixact_freeze_table_age settings.
>
> And it took me quite a while to identify that it appears to be coming from
> the temporary table (the other 'main' tables were being autovacuumed OK) -
> which makes sense because they have a long lifetime, aren't auto vacuumed
> and shared by transactions (in turn).
>
> I first attempted to overcome this by introducing an initial step of
> always creating the temporary table before the copy (and using on commit
> drop) but this lead to a terrible performance degradation.
> Next, I reverted the above and instead I introduced a VACUUM step every
> 100 (configurable) ingest operations
> Finally, I introduced a TRUNCATE step in addition to the occasional VACUUM
> since the TRUNCATE allowed the COPY option of FREEZE.
>
> The new overhead appears minimal until after several hours and again I've
> hit a performance degradation seemingly dominated by the TRUNCATE.
>
> My questions are;
>
> 1) Is the VACUUM necessary if I use TRUNCATE + COPY FREEZE (on the
> temporary table)?
> 2) Is there really any benefit to using FREEZE here or is it best to just
> VACUUM the temporary tables occasionally?
> 3) Is there a better way of managing all this!? Perhaps re-CREATING the TT
> every day or something?
>
> I understand that I can create a Linux tmpfs partition for a tablespace
> for the temporary tables and that may speed up the TRUNCATE but that seems
> like a hack and I'd rather not do it at all if it's avoidable.
>
> Thanks for your help,
>
> Jim
>
> PS. PG version in use is 15.4 if that matters here
>
> --
> Jim Vanns
> Principal Production Engineer
> Industrial Light & Magic, London
>


VACUUM on temp table blocks VACUUM on another...

2024-08-01 Thread Jim Vanns
I have two sessions, each creating a temporary table of the same name -
'foobar'. Why would a vacuum of 'foobar' from one session block the vacuum
of 'foobar' in the other session? They're in separate schemas
(pg_temp_NN) and effectively distinct with respect to the sessions.

Cheers,

Jim

-- 
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London


Re: VACUUM on temp table blocks VACUUM on another...

2024-08-01 Thread Jim Vanns
Sorry, PG15.

Jim

On Thu, 1 Aug 2024 at 20:25, Adrian Klaver 
wrote:

> On 8/1/24 12:12, Tom Lane wrote:
> > Jim Vanns  writes:
> >> I have two sessions, each creating a temporary table of the same name -
> >> 'foobar'. Why would a vacuum of 'foobar' from one session block the
> vacuum
> >> of 'foobar' in the other session?
> >
> > That's fairly hard to believe.  Can you provide a self-contained test
> > case?
>
> Would also be nice to know what version of Postgres and where you are
> running the server?
>
> >
> >   regards, tom lane
> >
> >
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>

-- 
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London


Re: VACUUM on temp table blocks VACUUM on another...

2024-08-01 Thread Jim Vanns
I don't have one to hand, but the query I used to identify this was;

SELECT state, pid, pg_blocking_pids(pid), query
FROM pg_stat_activity
WHERE backend_type='client backend';

On Thu, 1 Aug 2024 at 20:12, Tom Lane  wrote:

> Jim Vanns  writes:
> > I have two sessions, each creating a temporary table of the same name -
> > 'foobar'. Why would a vacuum of 'foobar' from one session block the
> vacuum
> > of 'foobar' in the other session?
>
> That's fairly hard to believe.  Can you provide a self-contained test
> case?
>
> regards, tom lane
>


-- 
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London


Re: VACUUM on temp table blocks VACUUM on another...

2024-08-01 Thread Jim Vanns
PG 15.4

On Thu, 1 Aug 2024 at 22:27, Ron Johnson  wrote:

> PG 15.0 or 15.7?
>
> On Thu, Aug 1, 2024 at 3:58 PM Jim Vanns  wrote:
>
>> Sorry, PG15.
>>
>> Jim
>>
>> On Thu, 1 Aug 2024 at 20:25, Adrian Klaver 
>> wrote:
>>
>>> On 8/1/24 12:12, Tom Lane wrote:
>>> > Jim Vanns  writes:
>>> >> I have two sessions, each creating a temporary table of the same name
>>> -
>>> >> 'foobar'. Why would a vacuum of 'foobar' from one session block the
>>> vacuum
>>> >> of 'foobar' in the other session?
>>> >
>>> > That's fairly hard to believe.  Can you provide a self-contained test
>>> > case?
>>>
>>> Would also be nice to know what version of Postgres and where you are
>>> running the server?
>>>
>>> >
>>> >   regards, tom lane
>>> >
>>> >
>>>
>>> --
>>> Adrian Klaver
>>> adrian.kla...@aklaver.com
>>>
>>>
>>
>> --
>> Jim Vanns
>> Principal Production Engineer
>> Industrial Light & Magic, London
>>
>

-- 
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London


Re: VACUUM on temp table blocks VACUUM on another...

2024-08-02 Thread Jim Vanns
Official PGDB RPM build for Centos 7.9

On Thu, 1 Aug 2024 at 23:31, Adrian Klaver 
wrote:

> On 8/1/24 15:13, Jim Vanns wrote:
> > PG 15.4
>
> Community edition or some variant?
>
>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>

-- 
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London


Re: VACUUM on temp table blocks VACUUM on another...

2024-08-02 Thread Jim Vanns
*PGDG

On Fri, 2 Aug 2024 at 08:38, Jim Vanns  wrote:

> Official PGDB RPM build for Centos 7.9
>
> On Thu, 1 Aug 2024 at 23:31, Adrian Klaver 
> wrote:
>
>> On 8/1/24 15:13, Jim Vanns wrote:
>> > PG 15.4
>>
>> Community edition or some variant?
>>
>>
>>
>>
>> --
>> Adrian Klaver
>> adrian.kla...@aklaver.com
>>
>>
>
> --
> Jim Vanns
> Principal Production Engineer
> Industrial Light & Magic, London
>


-- 
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London


Trouble understanding how to avoid/manage ERROR: multixact "members" limit exceeded

2024-08-09 Thread Jim Vanns
Hi pggen community!

I am struggling with this error almost daily now and, despite various
efforts, am not succeeding in avoiding or dealing with it;

ERROR:  multixact "members" limit exceeded
DETAIL:  This command would create a multixact with 2 members, but the
remaining space is only enough for 0 members.
HINT:  Execute a database-wide VACUUM in database with OID 16467 with
reduced vacuum_multixact_freeze_min_age and
vacuum_multixact_freeze_table_age settings.

Runtime details follow below, but before that; I am now seeing the above
error almost daily, after approximately 12 hours of normal/expected
behaviour and throughput. Then it hits, all writes are blocked, and
the service is largely unusable/unable to recover. Restarting PG does allow
autovacuum processes to kick in with aggressive vacuuming to handle the
multixact freezing, but that isn't a suitable solution! Although I have now
read sources that explain why multixact XIDs exist and when they're used,
I am not able to properly figure out how to plan for them or configure
PostgreSQL appropriately to handle them given our workload.

My questions are;

1) How should I be managing this? Although not new to PG, I am new to this
particular problem.
2) How can I confirm what processes/tables are contributing to this
multixact "members" limit? (Is it something like the catalogue query
I've sketched after this list?)
3) What are the units for
vacuum_multixact_freeze_min_age/vacuum_multixact_freeze_table_age and how
should I be
setting them appropriately for my rates etc.? I can't really find
anything that explains this clearly.
4) How can I check that autovacuum workers are specifically able to freeze
multixact XIDs and thus avoid this?
5) Can I check if autovacuum is actually succeeding in its work?
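
For (2), the closest I've come up with is a catalogue query along these
lines (I'm not sure it's even the right way to look at it);

SELECT c.oid::regclass AS relation,
       c.relpersistence,                  -- 't' marks temporary tables
       mxid_age(c.relminmxid) AS mxid_age,
       age(c.relfrozenxid) AS xid_age
FROM pg_class c
WHERE c.relkind IN ('r', 't', 'm')        -- plain tables, TOAST tables, matviews
ORDER BY mxid_age DESC
LIMIT 20;

SELECT datname, mxid_age(datminmxid) AS mxid_age
FROM pg_database
ORDER BY mxid_age DESC;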

Schema (brief):
10 tables
1 table (the 'hottest') partitioned by 2h ranges; so an extra 12x
partitions/relations a day, though only the most recent one might be
considered 'active'

System (brief):
PG: 15.5 w/ TimescaleDB 2.14 extension
Write heavy workload;
Mean Txn/s (low):   8k
Mean Txn/s (high): 10k
Mean rows/s: 100k
Concurrency: 32 threads (local socket sessions) for 'upserts' via primary
service plus auxiliary processes (background workers/autovacuum workers
etc.)

Pattern (brief):
COPY (binary) to session-local temporary tables (ON COMMIT DELETE)
INSERT FROM tt TO main_table(s)
UPDATE FROM tt TO main_table(s)
VACUUM tt (every 100k txns)

Config (excerpt):
# - Transactions - (based on a period mean of ~8k txn/s)
# See/ref;
# www.postgresql.org/docs/15/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
#
www.percona.com/blog/tuning-autovacuum-in-postgresql-and-autovacuum-internals
# blog.sentry.io/transaction-id-wraparound-in-postgres
#
https://blog.danslimmon.com/2023/12/11/concurrent-locks-and-multixacts-in-postgres
#
https://pganalyze.com/blog/5mins-postgres-multiXact-ids-foreign-keys-performance
# What unit is 'age' here? No. of transactions?
vacuum_freeze_min_age = 2880 # 1h @ 8k/s
vacuum_multixact_freeze_min_age = 2880 # 1h @ 8k/s
autovacuum_freeze_max_age = 15760 # 2h @ 8k/s
autovacuum_multixact_freeze_max_age = 5760 # 2h @ 8k/s
vacuum_multixact_freeze_table_age = 11520 # 4h @ 8k/s

# We don't UPDATE or DELETE often; we never DELETE and only perform
# UPDATE operations every 6h due to internal cache(s). So we set this to
# zero so the formula doesn't consider it a major factor
autovacuum_vacuum_threshold = 0 # In tuples/rows UPDATEd or DELETEd
autovacuum_vacuum_insert_threshold = 5000 # In tuples/rows INSERTed

autovacuum_vacuum_scale_factor = 0.1 # 10%
autovacuum_analyze_scale_factor = 0.1 # 10%
autovacuum_vacuum_insert_scale_factor = 0.1 # 10%

autovacuum_naptime = 60
autovacuum_max_workers = 8

# Give autovacuum more credits to ensure a better chance at scanning
autovacuum_vacuum_cost_limit = 2000 # 10x the default

Thanks for your help and any guidance/knowledge you can share!

Jim

-- 
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London


Re: Trouble understanding how to avoid/manage ERROR: multixact "members" limit exceeded

2024-08-09 Thread Jim Vanns
To clear up any confusion, I tried the reductions as hinted in the
error message and after a few iterations still faced the same error.
Also, re. the sample configuration I provided - this is the result of
said iterations after initially using the stock values from the PGDG
RPM.

Jim

On Fri, 9 Aug 2024 at 11:26, Jim Vanns  wrote:
>
> Hi pggen community!
>
> I am struggling with this error almost daily now and despite various efforts, 
> not succeeding in avoiding or dealing with it;
>
> ERROR:  multixact "members" limit exceeded
> DETAIL:  This command would create a multixact with 2 members, but the 
> remaining space is only enough for 0 members.
> HINT:  Execute a database-wide VACUUM in database with OID 16467 with reduced 
> vacuum_multixact_freeze_min_age and vacuum_multixact_freeze_table_age 
> settings.
>
> Runtime details follow below, but before that; I am now seeing the above 
> error almost daily after approximately 12 hours of normal or expected 
> behaviour and throughput. Then it hits and all writes are blocked etc. and 
> the service is largely unusable/unable to recover. Restarting PG does allow 
> autovacuum processes to kick in with aggressive vacuuming to handle the 
> multixact freezing, but that isn't a suitable solution! Although having read 
> sources that now explain why multixact XIDs exist and when they're used, I am 
> not able to properly figure out how to plan for it or configure postgresql 
> appropriately to handle it given our workload.
>
> My questions are;
>
> 1) How should I be managing this? Although not new to PG, I am new to this 
> particular problem.
> 2) How can I confirm what processes/tables are contributing to this multixact 
> "members" limit?
> 3) What are the units for 
> vacuum_multixact_freeze_min_age/vacuum_multixact_freeze_table_age and how 
> should I be
> setting them appropriately for my rates etc.? I can't really find 
> anything that explains this clearly.
> 4) How can I check that autovacuum workers are specifically able to freeze 
> multixact XIDs and thus avoid this?
> 5) Can I check if autovacuum is actually succeeding in its work?
>
> Schema (brief):
> 10 tables
> 1 table (the 'hottest) partitioned by 2h ranges; so an extra 12x 
> partitions/relations a day, though only the most recent one might be 
> considered 'active'
>
> System (brief):
> PG: 15.5 w/ TimescaleDB 2.14 extension
> Write heavy workload;
> Mean Txn/s (low):   8k
> Mean Txn/s (high): 10k
> Mean rows/s: 100k
> Concurrency: 32 threads (local socket sessions) for 'upserts' via primary 
> service plus auxiliary processes (background workers/autovacuum workers etc.)
>
> Pattern (brief):
> COPY (binary) to session-local temporary tables (ON COMMIT DELETE)
> INSERT FROM tt TO main_table(s)
> UPDATE FROM tt TO main_table(s)
> VACUUM tt (every 100k txns)
>
> Config (excerpt):
> # - Transactions - (based on a period mean of ~8k txn/s)
> # See/ref;
> # www.postgresql.org/docs/15/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND
> # 
> www.percona.com/blog/tuning-autovacuum-in-postgresql-and-autovacuum-internals
> # blog.sentry.io/transaction-id-wraparound-in-postgres
> # 
> https://blog.danslimmon.com/2023/12/11/concurrent-locks-and-multixacts-in-postgres
> # 
> https://pganalyze.com/blog/5mins-postgres-multiXact-ids-foreign-keys-performance
> # What unit is 'age' here? No. of. transactions?
> vacuum_freeze_min_age = 2880 # 1h @ 8k/s
> vacuum_multixact_freeze_min_age = 2880 # 1h @ 8k/s
> autovacuum_freeze_max_age = 15760 # 2h @ 8k/s
> autovacuum_multixact_freeze_max_age = 5760 # 2h @ 8k/s
> vacuum_multixact_freeze_table_age = 11520 # 4h @ 8k/s
>
> # We don't UPDATE or DELETE often; we never DELETE and only perform
> # UPDATE operations every 6h due to internal cache(s). So we set this to
> # zero so the formula doesn't consider it a major factor
> autovacuum_vacuum_threshold = 0 # In tuples/rows UPDATEd or DELETEd
> autovacuum_vacuum_insert_threshold = 5000 # In tuples/rows INSERTed
>
> autovacuum_vacuum_scale_factor = 0.1 # 10%
> autovacuum_analyze_scale_factor = 0.1 # 10%
> autovacuum_vacuum_insert_scale_factor = 0.1 # 10%
>
> autovacuum_naptime = 60
> autovacuum_max_workers = 8
>
> # Give autovacuum more credits to ensure a better chance at scanning
> autovacuum_vacuum_cost_limit = 2000 # 10x the default
>
> Thanks for your help and any guidance/knowledge you can share!
>
> Jim
>
> --
> Jim Vanns
> Principal Production Engineer
> Industrial Light & Magic, London



-- 
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London




BitmapOr node not used in plan for ANY/IN but is for sequence of ORs ...

2024-11-12 Thread Jim Vanns
In PG16.4, we have a table of key/value pair data (around 30M rows) where
there are about 7 distinct keys, each with a conditional (partial) index on
it (the distribution is different for each key/value pair combination).
I've found that when a query uses an OR, those partial indexes are used,
but not if the query is written to use ANY/IN, which is more convenient
from a programmer's POV (especially with 3rd-party query generators etc.).
Naturally, the result sets returned by the queries are identical due to the
filter semantics of any of the 3 solution variants.

Here's a shareable MRP;

https://dbfiddle.uk/OKs_7HWv

Is there any trick I can use to get the planner to make use of the
conditional/partial indexes? Or is this simply an unoptimised code path yet
to be exploited!?

Cheers,

Jim

-- 
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London


Parallel workers via functions?

2025-01-27 Thread Jim Vanns
Apologies for the rushed question ...

If I have a function that is marked 'stable parallel safe' and returns
a table, can a calling function or procedure (marked volatile parallel
unsafe) still take advantage of the parallel workers from the first
function - as the data source? I.e.

func_a(); // selects, returns table, parallel safe
func_b() {
   insert into foo
   select * from func_a(); // will func_a still use parallel workers to fetch the data?
}

Or even if func_b() uses 'create temporary table as select * from
func_a()' and then insert?
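
To flesh out the pseudo-code, the shape of the two functions is roughly
this (bodies and table names are made up);

CREATE FUNCTION func_a()
RETURNS TABLE (id BIGINT, payload TEXT)
LANGUAGE sql STABLE PARALLEL SAFE
AS $$
    SELECT id, payload FROM big_source_table;  -- the expensive, parallelisable SELECT
$$;

CREATE FUNCTION func_b()
RETURNS void
LANGUAGE plpgsql VOLATILE PARALLEL UNSAFE
AS $$
BEGIN
    INSERT INTO foo SELECT * FROM func_a();
END;
$$;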

I ask because when I simply call func_a() from a psql shell, I see the
parallel workers run and everything is nice and swift. But when called
from a data-modifying function like func_b(), no workers are spawned
:( Even from the read-part of the code.

Are there differences between functions and stored procedures that might
cause the planner to disregard parallel workers?

Cheers

Jim

-- 
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London




Re: Parallel workers via functions?

2025-01-28 Thread Jim Vanns
Thanks for the reply Laurenz. Inline replies follow...

On Tue, 28 Jan 2025 at 04:47, Laurenz Albe  wrote:
>
> On Mon, 2025-01-27 at 18:08 +0000, Jim Vanns wrote:
> > If I have a function that is marked 'stable parallel safe' and returns
> > a table, can a calling function or procedure (marked volatile parallel
> > unsafe) still take advantage of the parallel workers from the first
> > function - as the data source. I.e.
> >
> > func_a(); // selects, returns table, parallel safe
> > func_b() {
> >insert into foo
> >select * from func_a(); // Will func_a still execute parallel
> > workers to fetch the data?
> > }
> >
> > Or even if func_b() uses 'create temporary table as select * from
> > func_a()' and then insert?
> >
> > I ask because when I simply call func_a() from a psql shell, I see the
> > parallel workers run and everything is nice and swift. But when called
> > from a data-modifying function like func_b(), no workers are spawned
> > :( Even from the read-part of the code.
> >
> > Are there differences in functions vs. stored procedures that might
> > affect the behaviour of the planner to disregard workers?
>
> See 
> https://www.postgresql.org/docs/current/when-can-parallel-query-be-used.html

Thanks. Yup, read that. Seems easy enough to understand... however...

> The problem here is the INSERT.  Data modifying statements won't use
> parallel query.

OK, that's clear enough.

> There are exceptions: CREATE TABLE ... AS SELECT ... should be able
> to use parallel query.

I've been experimenting with this. The problem deepens... It seems
that actually, it's the function itself - func_a() in my example
above. Even simply calling that from psql doesn't spawn parallel
workers to run as part of the query defined in the function body. But
if I copy the body of the function and paste it into a psql shell, it
does parallelise. This function is marked STABLE PARALLEL SAFE though.
Are there limitations or restrictions I'm missing!? I'll try to find
the time to provide an MRP, but I'm hoping somebody will just magically
know what the problem is, or at least could be!

So... I am still confused! This is PG 15.5 BTW.

Jim

> Yours,
> Laurenz Albe




--
Jim Vanns
Principal Production Engineer
Industrial Light & Magic, London