Upgrade 96 -> 11

2019-09-01 Thread James Sewell
Hi,

I'm in the process of upgrading from 9.6 -> 11 (on RHEL 7.3). Both
versions have PostGIS 2.5.1 installed and working.

pg_upgrade fails with:

pg_restore: [archiver (db)] Error from TOC entry 440; 1259 537086 TABLE
tablename databasename
pg_restore: [archiver (db)] could not execute query: ERROR:  relation
"public.spatial_ref_sys" does not exist
LINE 39: "location_pt" "public"."geography"(Point,4283),

On looking further at the sequence of events I can see that:

   1. The PostGIS extension is created (but somehow the related tables, such
   as spatial_ref_sys, do not get created)
   2. The tablename table gets created, causing the above error
   3. At some point later in the upgrade, spatial_ref_sys is created


Is there any way around this type of issue? (I guess forcing spatial_ref_sys
to be created either with the extension as normal, or just before any tables
which rely on it.)

Cheers,
James Sewell,



Re: Upgrade 96 -> 11

2019-09-02 Thread James Sewell
On Tue, 3 Sep 2019 at 7:15 am, Adrian Klaver wrote:

> On 9/2/19 2:04 PM, James Sewell wrote:
> Please reply to list also.
> Ccing list.
> >
> >
> > On Mon, 2 Sep 2019 at 11:56 pm, Adrian Klaver
> > <adrian.kla...@aklaver.com> wrote:
> >
> > On 9/1/19 9:03 PM, James Sewell wrote:
> >  > Hi,
> >  >
> >  > I'm in the process of upgrading from 9.6 -> 11 (on RHEL 7.3). Both
> >  > versions have PostGIS 2.5.1 installed and working.
> >  >
> >  > pg_upgrade fails with:
> >  >
> >  > pg_restore: [archiver (db)] Error from TOC entry 440; 1259 537086
> > TABLE
> >  > tablename databasename
> >  > pg_restore: [archiver (db)] could not execute query: ERROR:
> >   relation
> >  > "public.spatial_ref_sys" does not exist
> >  > LINE 39: "location_pt" "public"."geography"(Point,4283),
> >  >
> >
> > You used the 11 version of pg_upgrade, correct?
> >
> >
> > Correct
> >
> >
> >
> >
> >  > On looking further at the sequence of events I can see that:
> >  >
> >  >  1. The PostGIS extension is created (but somehow the related
> tables
> >  > such as spatial_ref_sys do not get created)
> >  >  2. The tablename table gets created causing the above error
> >  >  3. At some point later in the upgrade spatial_ref_sys is to be
> > created
> >
> >
> > Questions:
> >
> > 1) How was PostGIS installed on the 9.6.? and 11.? versions?
> > Were extensions used, or was the manual method used?
> >
> >
> > Both from extensions, although 9.6 was an older version which was
> > upgraded (with ALTER)
> >
> >
> >
> > 2) Did you end up with a working install in 11.?
> >
> >
> > Yes. If I create a cluster and start it and create the extension all is
> > fine (postgis tables created).
>
> So this is a separate cluster from the one you used pg_upgrade on?


In that case, yes, it was separate


>
> In other words when doing pg_upgrade you could not get a working
> cluster, correct?


pg_upgrade does not complete - but as it hasn't started migrating data I
can start it and it's fine (except for postgis, which is now in a partial
install state)


>
>
>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
-- 
James Sewell,
Chief Architect


Re: Upgrade 96 -> 11

2019-09-02 Thread James Sewell
On Tue, 3 Sep 2019 at 7:41 am, Adrian Klaver wrote:

> On 9/2/19 2:20 PM, James Sewell wrote:
>
> >
> > So this is a separate cluster from the one you used pg_upgrade on?
> >
> >
> > In that case, yes, it was separate
> >
> >
> >
> > In other words when doing pg_upgrade you could not get a working
> > cluster, correct?
> >
> >
> > Pg_upgrade does not complete - but as it hasn’t started migrating data I
> > can start it and it’s fine (except for postgis which is now in a partial
> > install state)
>
> Now I will actually respond to list:)
>
> So from your original post:
>
> 1) "The tablename table gets created causing the above error"
>
> Is 'tablename' a user table or part of PostGIS?
>
>
The tablename table is a user table; spatial_ref_sys is a PostGIS table.

2) "Both the versions have PostGIS 2.5.1 installed and working"
>
> How do you know it is working on the 11 version?


By version I mean 11 - I can init a new cluster and it’s fine

The more I look at this, the more it seems like:

A) pg_upgrade somehow creates postgis without supporting tables
B) while the tables would be created from the schema dump files, they
happen too late


>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
-- 
James Sewell,
Chief Architect


Re: Upgrade 96 -> 11

2019-09-02 Thread James Sewell
>
>
>
> > I am going to assume then that it has to do with this:
> > "LINE 39: "location_pt" "public"."geography"(Point,4283), "
> >
> > What is the above pointing to?
> >
> >
> > This needs the PostGIS types and tables to work - they don't exist as
> > they were not created with CREATE EXTENSION, but rather are to be
> > created later in the restore.
> >
> >
> >
> >  >
> >  > 2) "Both the versions have PostGIS 2.5.1 installed and
> working"
> >  >
> >  > How do you know it is working on the 11 version?
> >  >
> >  >
> >  > By version I mean 11 - I can init a new cluster and it’s fine
> >  >
> >  > The more I look at this it seems like:
> >  >
> >  > A) pg_upgrade somehow creates postgis without supporting tables
> >
> > So you see a CREATE EXTENSION postgis?
> >
> > Is there anything in the logs that pertains to the above?
> >
> >
> > Yes and no. Just the create extension.
> >
> >
> >
> >  > B) while the tables would be created from the schema dump files,
> > they
> >  > happen too late
> >
> > What happens if you use pg_dump from version 11 against the 9.6
> cluster
> > with the -s and -p?
> >
> > Does it show the schema creation in the order you need?
> >
> >
> > Yes. It's only in binary-upgrade mode that I see issues (because the
> > extension objects aren't created with CREATE EXTENSION)
> >
>
> What is the pg_upgrade command you are using?
>

pg_upgrade --link --username postgres

I have set PGBINOLD, PGBINNEW, PGDATAOLD, PGDATANEW correctly.


>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>



Re: Upgrade 96 -> 11

2019-09-02 Thread James Sewell
On Tue, 3 Sep 2019 at 07:55, Adrian Klaver wrote:

> On 9/2/19 2:45 PM, James Sewell wrote:
> >
> >
> > On Tue, 3 Sep 2019 at 7:41 am, Adrian Klaver
> > <adrian.kla...@aklaver.com> wrote:
> >
> > On 9/2/19 2:20 PM, James Sewell wrote:
> >
> >  >
> >  > So this is a separate cluster from the one you used
> > pg_upgrade on?
> >  >
> >  >
> >  > In that case, yes, it was separate
> >  >
> >  >
> >  >
> >  > In other words when doing pg_upgrade you could not get a
> working
> >  > cluster, correct?
> >  >
> >  >
> >  > Pg_upgrade does not complete - but as it hasn’t started migrating
> > data I
> >  > can start it and it’s fine (except for postgis which is now in a
> > partial
> >  > install state)
> >
> > Now I will actually respond to list:)
> >
> > So from your original post:
> >
> > 1) "The tablename table gets created causing the above error"
> >
> > Is 'tablename' a user table or part of PostGIS?
> >
> >
> > The tablename table is a user table, spatial_ref_sys is a postgis table.
>
> I am going to assume then that it has to do with this:
> "LINE 39: "location_pt" "public"."geography"(Point,4283), "
>
> What is the above pointing to?
>

This needs the PostGIS types and tables to work - they don't exist as they
were not created with CREATE EXTENSION, but rather are to be created later
in the restore.

>
>
> >
> > 2) "Both the versions have PostGIS 2.5.1 installed and working"
> >
> > How do you know it is working on the 11 version?
> >
> >
> > By version I mean 11 - I can init a new cluster and it’s fine
> >
> > The more I look at this it seems like:
> >
> > A) pg_upgrade somehow creates postgis without supporting tables
>
> So you see a CREATE EXTENSION postgis?
>
> Is there anything in the logs that pertains to the above?
>

Yes and no. Just the create extension.


>
>
> > B) while the tables would be created from the schema dump files, they
> > happen too late
>
> What happens if you use pg_dump from version 11 against the 9.6 cluster
> with the -s and -p?
>
> Does it show the schema creation in the order you need?
>

Yes. It's only in binary-upgrade mode that I see issues (because the
extension objects aren't created with CREATE EXTENSION)


>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>



Re: Upgrade 96 -> 11

2019-09-02 Thread James Sewell
On Tue, 3 Sep 2019 at 10:34, Adrian Klaver wrote:

> On 9/2/19 5:20 PM, James Sewell wrote:
> >
>
> > What is the pg_upgrade command you are using?
> >
> >
> > pg_upgrade --link --username postgres
>
> Where are you in being able to?:
> https://www.postgresql.org/docs/11/pgupgrade.html
>
> 16. Reverting to old cluster
>
> ...
>
> If the --link option was used, the data files might be shared between
> the old and new cluster:
>
>
It's still creating the schema elements when it fails; it hasn't started
linking yet


> >
> > I have set PGBINOLD, PGBINNEW, PGDATAOLD, PGDATANEW correctly.
> >
> >
> >
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>



Re: Upgrade 96 -> 11

2019-09-03 Thread James Sewell
On Wed, 4 Sep 2019 at 5:47 am, Adrian Klaver wrote:

> On 9/2/19 5:52 PM, Adrian Klaver wrote:
>
> >> It's still creating the schema elements when it fails, it hasn't
> >> started linking yet
> >
> > Alright, at least you still have a working 9.6 cluster.
> >
> > Not sure where to go from here. Like you I am not sure how it can CREATE
> > EXTENSION and not actually follow through on that. Especially with no
> > errors for that operation. I'm going to have to think on this. Hopefully
> > someone else has an idea on this and can chime in.
>
> Aah. I don't have postgis installed, still:
>
> pg_dump --binary-upgrade -s -d production -U postgres >
> production_binary.sql
>
> -- For binary upgrade, create an empty extension and insert objects into it
> DROP EXTENSION IF EXISTS tablefunc;
> SELECT pg_catalog.binary_upgrade_create_empty_extension('tablefunc',
> 'public', true, '1.0', NULL, NULL, ARRAY[]::pg_catalog.text[]);
>
>
> Try the above on your schema and see what you get.


Yep - an empty extension. I think this logic is wrong. Creating an empty
extension is fine and makes sense, but extension-owned relations should be
created as the next step, not just at some time later.

>
>
>
> >
> >>
> >>  >
> >>  > I have set PGBINOLD, PGBINNEW, PGDATAOLD, PGDATANEW correctly.
> >>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
-- 
James Sewell,
Chief Architect


Re: Upgrade 96 -> 11

2019-09-03 Thread James Sewell
If anyone hits this: it is an issue with using the geography type with a
non-4326 SRID on a table, combined with pg_upgrade.

It should be fixed (the check dropped, as it's something of a relic) in the
next version of PostGIS. In the meantime you would have to patch it out
yourself.

(
https://github.com/postgis/postgis/blob/svn-trunk/postgis/gserialized_typmod.c#L296
 )
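
For anyone wanting to reproduce it: a column definition like the one in the
failing restore is enough to trip it (a minimal sketch - the table and column
names are just the ones from the error above):

-- A geography column with a non-4326 SRID. I believe validating the SRID
-- makes PostGIS 2.5.x consult spatial_ref_sys, which does not exist yet
-- at this point in a binary-upgrade restore.
CREATE TABLE tablename (
    location_pt public.geography(Point, 4283)
);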


On Wed, 4 Sep 2019 at 10:03, Adrian Klaver wrote:

> On 9/3/19 3:45 PM, James Sewell wrote:
> >
> >
>
> >
> > -- For binary upgrade, create an empty extension and insert objects
> > into it
> > DROP EXTENSION IF EXISTS tablefunc;
> > SELECT pg_catalog.binary_upgrade_create_empty_extension('tablefunc',
> > 'public', true, '1.0', NULL, NULL, ARRAY[]::pg_catalog.text[]);
> >
> >
> > Try the above on your schema and see what you get.
> >
> >
> > Yep - an empty extension. I think this logic is wrong. Creating an empty
> > extension is fine and makes sense but extension owned relations should
> > be created as the next step, not just at some time later.
> >
>
> So to be clear you ran pg_dump with --binary-upgrade and the extension
> elements were created after the user table that tripped the error?
>
>
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>



Max locks

2019-12-19 Thread James Sewell
Hello all,

I have a system which is giving me the log hint to increase
max_locks_per_transaction. This is somewhat expected due to the workload -
but what I can't understand is the numbers:

Based on the docs I calculate my theoretical max locks as:

max_locks_per_transaction * (max_connections + max_prepared_transactions)
256 * (600 + 0) = 153600
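
(As a sanity check, the same ceiling can be computed from the live settings -
a quick sketch:)

-- The documented lock-table ceiling, computed from pg_settings.
SELECT (SELECT setting::int FROM pg_settings WHERE name = 'max_locks_per_transaction')
     * ((SELECT setting::int FROM pg_settings WHERE name = 'max_connections')
        + (SELECT setting::int FROM pg_settings WHERE name = 'max_prepared_transactions'))
       AS theoretical_max_locks;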

However, looking at my Prometheus monitoring (polling every 15s), which
does a SELECT from pg_locks and groups by mode, I can see there are
consistently over 500K AccessShareLocks (up to around 570K at peak).

How is this possible?

Cheers,

James Sewell,



Partitioned tables and locks

2019-12-19 Thread James Sewell
Is it expected that a lock on a partitioned table will take out two locks
per child, regardless of the number of children which are excluded at plan
time?

For example, I can select count(*) from a table with 3500 partitions in a
transaction, and see 7000 AccessShareLocks that persist until I finish the
transaction.

Bonus question - sometimes, if I query pg_locks fast enough, I can see ~30K
locks. How could this possibly be the case? This is on an otherwise idle
system.
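
For reference, this is the sort of check I'm running to see it (a sketch -
'measurements' stands in for the real partitioned table):

-- Hold a transaction open and count the relation locks this backend holds.
BEGIN;
SELECT count(*) FROM measurements;

SELECT mode, count(*)
FROM pg_locks
WHERE locktype = 'relation'
  AND pid = pg_backend_pid()
GROUP BY mode;

COMMIT;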

Cheers,
James



Re: Partitioned tables and locks

2019-12-19 Thread James Sewell
> > Is it expected that a lock on a partitioned table will take out 2 locks per
> > child regardless of the number of children which are excluded at plan time?
>
> Depends on the details of your query, and on which PG version you're
> using, but it's by no means surprising for each child table to get
> locked.  (I'm not sure where *two* locks would come from, though.)


The queries were just select count(*), both with a where clause which
excludes some partitions and without. I'm on 11.

The table has over 3K partitions.

>
>
> If you're working with massively partitioned tables, increasing
> max_locks_per_transaction is a good idea.


It’s 256 at the moment, with 600 connections - guess I need more.
-- 
James Sewell,
Chief Architect


Re: Max locks

2019-12-19 Thread James Sewell
> pg_locks shows exactly two types of locks: "heavy" locks (which are not
> merely relation locks but also object, tuple, extension, transaction and
> advisory locks), and predicate locks (SIReadLock) which are limited by a
> separate configuration parameter (and are not relevant in this case
> based on the prior IRC discussion).
>
>  Peter> Filter by locktype = 'relation' to get the appropriate view.
>
> This is incorrect (some predicate locks may be tagged 'relation' as
> well, and some heavy locks will not be relation locks).
>
> There's also the question of fastpath locks, but I believe there can
> only be a small number of these (16?) per backend, so that wouldn't
> account for this.


OK - so how would I calculate the total number of locks which are relevant
here? (It's a bit unclear whether that means all heavy locks, and how to
find them.) Ideally I want to alert on locks > configured locks.
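
Right now I'm experimenting with something like this for the alert (a sketch,
with the caveats above - it assumes everything in pg_locks other than
SIReadLock lives in the main lock table):

-- Current heavy-lock usage vs the configured ceiling.
-- Predicate locks (mode SIReadLock) are sized separately, so exclude them.
SELECT count(*) FILTER (WHERE mode <> 'SIReadLock') AS heavy_locks,
       (SELECT setting::int FROM pg_settings WHERE name = 'max_locks_per_transaction')
     * ((SELECT setting::int FROM pg_settings WHERE name = 'max_connections')
        + (SELECT setting::int FROM pg_settings WHERE name = 'max_prepared_transactions'))
       AS configured_ceiling
FROM pg_locks;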


>
> I think what's going on is that the max size of the lock hashtable isn't
> strictly enforced; it'll add enough memory space for the configured
> number of locks to the total size of the shared memory segment, but it
> won't actually report an error until shared memory is actually
> exhausted, and it's possible that there may be unused space.
> (Performance may degrade if there are more locks than the configured
> maximum, because the hash table will have been sized for that maximum
> and can't be grown.) See comments for ShmemInitHash.


How much memory is consumed per configured lock? Is this removed from the
shared buffers size or added to it?

Thanks for the follow-up!

>
> --
James Sewell,
Chief Architect



Lock leaking out of Transaction?

2020-01-14 Thread James Sewell
Hi all,

I am trying to chase down a locking issue - it looks like a materialized
view refresh is being held up by a relation lock which is held by an
out-of-transaction session. My understanding was that this was not possible
(see SQL output below).

The locking session is making progress (I can see query_start advancing),
which makes it even more confusing.

Any advice?

# select * from pg_locks l join pg_stat_activity a on l.pid = a.pid where
relation = 1438729::regclass;
-[ RECORD 1
]--+-
locktype | relation
database | 16428
relation | 1438729
page |
tuple |
virtualxid |
transactionid |
classid |
objid |
objsubid |
virtualtransaction | 3/26588281
pid | 88955
mode | ShareUpdateExclusiveLock
granted | f
fastpath | f
datid | 16428
datname | monitoring
pid | 88955
usesysid |
usename |
application_name |
client_addr |
client_hostname |
client_port |
backend_start | 14-JAN-20 11:50:25.139819 +11:00
xact_start | 14-JAN-20 16:27:40.534726 +11:00
query_start | 14-JAN-20 16:27:40.534726 +11:00
state_change | 14-JAN-20 16:27:40.534726 +11:00
wait_event_type | Lock
wait_event | relation
state | active
backend_xid |
backend_xmin | 1655752595
query | autovacuum: VACUUM supply_nodes (to prevent wraparound)
backend_type | autovacuum worker
-[ RECORD 2
]--+-
locktype | relation
database | 16428
relation | 1438729
page |
tuple |
virtualxid |
transactionid |
classid |
objid |
objsubid |
virtualtransaction | 254/8624453
pid | 6839
mode | ExclusiveLock
granted | f
fastpath | f
datid | 16428
datname | monitoring
pid | 6839
usesysid | 10
usename | postgres
application_name | psql.bin
client_addr |
client_hostname |
client_port | -1
backend_start | 14-JAN-20 17:02:53.860451 +11:00
xact_start | 14-JAN-20 18:01:49.211728 +11:00
query_start | 14-JAN-20 18:01:49.211728 +11:00
state_change | 14-JAN-20 18:01:49.21173 +11:00
wait_event_type | Lock
wait_event | relation
state | active
backend_xid |
backend_xmin | 1689815577
query | REFRESH MATERIALIZED VIEW CONCURRENTLY supply_nodes ;
backend_type | client backend
-[ RECORD 3
]--+-
locktype | relation
database | 16428
relation | 1438729
page |
tuple |
virtualxid |
transactionid |
classid |
objid |
objsubid |
virtualtransaction | 355/0
pid | 65447
mode | ExclusiveLock
granted | t
fastpath | f
datid | 16428
datname | monitoring
pid | 65447
usesysid | 169436
usename | f_process
application_name | PostgreSQL JDBC Driver
client_addr | 10.153.154.36
client_hostname |
client_port | 40899
backend_start | 14-JAN-20 18:00:02.784211 +11:00
xact_start |
query_start | 14-JAN-20 18:02:26.831979 +11:00
state_change | 14-JAN-20 18:02:26.833197 +11:00
wait_event_type | Client
wait_event | ClientRead
state | idle
backend_xid |
backend_xmin |
query | COMMIT
backend_type | client backend
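
(As a cross-check, pg_blocking_pids() - available since 9.6 - reports which
PIDs each waiter is stuck behind:)

-- Which PIDs are blocking each backend waiting on a heavyweight lock.
SELECT pid, wait_event_type, wait_event, pg_blocking_pids(pid) AS blocked_by
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';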

James Sewell,



Re: Inherited an 18TB DB & need to backup

2020-05-17 Thread James Sewell
On Fri, 15 May 2020 at 17:09, Suhail Bamzena wrote:

> Hello All,
> I have very recently inherited an 18 TB DB that is running version 9.2.
> Apparently this database has never been backed up and I have been tasked to
> set in a periodic backup routine (weekly full & daily incremental) and dump
> it into a NAS. What is the best way to go about this? Did some reading and
> hear that pgbackrest does a good job with such huge sizes. Your expert
> advise is needed.
>
>
Just chiming in for pgbackrest - we are backing up multiple databases in
the 50TB - 150TB range across multiple customers and it works very well
(and most importantly at that scale, very efficiently).

I've found the team really responsive to issues / bugs / errors reported
via GitHub when we hit them, which is always a bonus.

James Sewell,
Chief Architect, Jirotech


Safe switchover

2020-07-09 Thread James Sewell
Hi all,

I'm trying to work out a procedure for a safe zero-data-loss switchover
under (high) load, which allows the old master to be reconnected without
the use of pg_rewind.

Would the following be sane?

- open connection to database
- smart shutdown master
- terminate all other connections
- wait for shutdown (archiving will finish)
- identify last archived WAL file name (ensure it's on the backup server)
- wait till a standby has applied this WAL (LSN checks sketched below)
- promote that standby
- attach old master to new master
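
For the WAL steps, the checks I have in mind are roughly these (a sketch -
the function names are the PG10+ ones):

-- On the master, just before shutdown: note the current write position.
SELECT pg_current_wal_lsn(), pg_walfile_name(pg_current_wal_lsn());

-- On the candidate standby: poll until replay has reached the master's
-- final position (checked after the master has shut down cleanly).
SELECT pg_last_wal_replay_lsn();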

Cheers,
James Sewell


Re: Safe switchover

2020-07-10 Thread James Sewell
> - open connection to database
> - smart shutdown master
> - terminate all other connections
> - wait for shutdown (archiving will finish)
>

OK, despite what it looked like from the code - upon testing, it seems like
even a fast shutdown will wait for logs to be archived *as long as progress
is being made* (the same as a smart shutdown).



Re: determine what column(s) form the primary key, in C extension

2020-08-30 Thread James Sewell
>
> (2)
>
> I'll use this C code as an example to build an extension in Rust. The
> PostgreSQL bindings for Rust I have don't contain a definition of
> `FirstLowInvalidHeapAttributeNumber` for some reason. I can define it
> myself, since it's simply a single-digit constant.
>

Not an answer to your question - but use better bindings!
 https://github.com/zombodb/pgx  

[nix-shell:~/rust/pgx_master]$ grep -R FirstLowInvalidHeapAttributeNumber
pgx-pg-sys/
pgx-pg-sys/generated-bindings/pg12_specific.rs:pub const
FirstLowInvalidHeapAttributeNumber: i32 = -7;

Which is obviously not quite right still, so I pushed a version with some
extra includes. Now you will get:

[nix-shell:~/rust/pgx_master]$ grep -R FirstLowInvalidHeapAttributeNumber
./pgx-pg-sys/
./pgx-pg-sys/generated-bindings/pg10_specific.rs:pub const
FirstLowInvalidHeapAttributeNumber: i32 = -8;
./pgx-pg-sys/generated-bindings/pg12_specific.rs:pub const
FirstLowInvalidHeapAttributeNumber: i32 = -7;
./pgx-pg-sys/generated-bindings/pg11_specific.rs:pub const
FirstLowInvalidHeapAttributeNumber: i32 = -8;

You'll need to use the GitHub version, not the crates.io one, until the
next release if you want all the versions.


Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

2019-02-18 Thread James Sewell
>
> Right, the first step would be for a WSL user to figure out what's
> wrong with builds on the WSL and show us how to fix it; I heard
> through the grapevine that if you try it, initdb doesn't work (it must
> be something pretty subtle in the configure phase or something like
> that, since the Ubuntu .deb apparently works, except for the issue
> reported in this thread).


That's correct - initdb doesn't work when you've built on WSL, as
*somehow* HAVE_FDATASYNC is set to 1 by configure - but the code it guards
ends up not being included by the #ifdef blocks. This causes the following
PANIC:

PANIC:  unrecognized wal_sync_method: 1


This happens because wal_sync_method is set to 1, but in
src/backend/access/transam/xlog.c the corresponding case in the switch is
inside the #ifdef, so it never gets matched.

--
James



Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

2019-02-18 Thread James Sewell
> What do you mean by "not being included by #ifdef blocks"? The only
> guard in issue_xlog_fsync() is #ifdef HAVE_FDATASYNC, which ought to be
> independent of any includes?  I can see how this'd go wrong if configure
> did *not* detect fdatasync, because then
>

And now this looks like it works again from a clean build - something
screwy with WSL perhaps? Or me?

Either way, I can't reproduce - annoyingly.



Re: WSL (windows subsystem on linux) users will need to turn fsync off as of 11.2

2019-02-18 Thread James Sewell
> Here's a starter patch that shows one of the approaches discussed.  It
> gets WSL users to a better place than they were before, by suppressing
> further warnings after the first one.
>

This wasn't quite right - updated to check errno for ENOSYS (not rc).

This compiles and stops the panic on WSL (with a single warning).

I haven't tested if a version compiled on Linux will behave the same way -
but based on the error messages in the top post it looks like the behavior
is the same.

--
James

From 9580cab0728a509e79be9d639937e0537f5d4cf6 Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Tue, 19 Feb 2019 10:47:14 +1300
Subject: [PATCH] Tolerate ENOSYS failure from sync_file_range().

One unintended consequence of commit 9ccdd7f6 was that Windows WSL
users now get a PANIC whenever PostgreSQL attempt to start flushing
data with sync_file_range(), because Windows does not implement that
system call.  Previously, it generated a stream of periodic warnings.

Prevent that by handling ENOSYS specially to avoid promotion to
PANIC, and suppress all attempts after the first.

Updated to correct the ENOSYS test to use errno rather than rc

Reported-by: Bruce Klein
Discussion: https://postgr.es/m/CA+mCpegfOUph2U4ZADtQT16dfbkjjYNJL1bSTWErsazaFjQW9A@mail.gmail.com
---
 src/backend/storage/file/fd.c | 21 ++++++++++++++++++++-
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index 213de76..7544bd5 100644
--- a/src/backend/storage/file/fd.c
+++ b/src/backend/storage/file/fd.c
@@ -420,6 +420,10 @@ pg_flush_data(int fd, off_t offset, off_t nbytes)
 #if defined(HAVE_SYNC_FILE_RANGE)
 	{
 		int			rc;
+		static bool	not_implemented_by_kernel = false;
+
+		if (not_implemented_by_kernel)
+			return;
 
 		/*
 		 * sync_file_range(SYNC_FILE_RANGE_WRITE), currently linux specific,
@@ -434,7 +438,22 @@ pg_flush_data(int fd, off_t offset, off_t nbytes)
 			 SYNC_FILE_RANGE_WRITE);
 		if (rc != 0)
 		{
-			ereport(data_sync_elevel(WARNING),
+			int			elevel;
+
+			/*
+			 * For systems that don't have an implementation of
+			 * sync_file_range() such as Windows WSL, generate only one
+			 * warning and then suppress all further attempts by this process.
+			 */
+			if (errno == ENOSYS)
+			{
+				elevel = WARNING;
+				not_implemented_by_kernel = true;
+			}
+			else
+				elevel = data_sync_elevel(WARNING);
+
+			ereport(elevel,
 	(errcode_for_file_access(),
 	 errmsg("could not flush dirty data: %m")));
 		}
-- 
1.9.1



Max connections reached without max connections reached

2021-11-22 Thread James Sewell
Hi,

This is a bit of an odd one - I'm on PostgreSQL 11.9 with a single
streaming replica.

The system handles a lot of connections - we have a max_connections of 600.
Most are long-lived JDBC connections, but there are a lot of ETL / ad-hoc
jobs, etc.

Connections normally sit at 300ish, with 70 active at the most. The
machines have 32 CPU cores. PgBouncer is sadly not an option here, as we
are using many long-lived connections which make use of prepared statements.

Sometimes a strange condition occurs. The number of connections is well
under 600 (and dropping), but new connections are not being allowed into
the database. I can see this message in the logs:

   (0:53300)FATAL:  remaining connection slots are reserved for
non-replication superuser connections
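
(For reference, the arithmetic behind that message - ordinary users run out
of slots at max_connections minus the reserved ones. A quick check:)

-- Slots available to ordinary users before the FATAL above starts firing.
SELECT (SELECT setting::int FROM pg_settings WHERE name = 'max_connections')
     - (SELECT setting::int FROM pg_settings WHERE name = 'superuser_reserved_connections')
       AS slots_for_ordinary_users;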

From ps I can see a lot of processes like this:

  postgres: accessor monitoring 10.10.7.54[41655] startup

The number of these grows, and no new connections are allowed in. These
startup connections do not appear in pg_stat_activity, so I can't find what
they are waiting on.

Removing some long-lived connections sometimes appears to help clear the
startup processes and return the system to normal - but I have not been
able to find a correlation. It's more blind luck at the moment.

Any pointers on where I could start prodding?

Cheers,
James



Re: Max connections reached without max connections reached

2021-11-23 Thread James Sewell
>
> What are you looking at to claim the number of connections is under 600?
> Maybe there's some disconnect between what you're measuring and what the
> database thinks.
>
> A different line of thought is that ProcArray slots can be consumed by
> things that aren't client connection processes, in particular
> (1) parallel-query workers
> (2) autovacuum workers
> Looking into pg_stat_activity when you see this issue might help
> clarify that.
>

Sorry, I could have been clearer - pg_stat_activity is what I'm looking at
- I'm recording connection info from here every 15 seconds (from a
superuser account so I don't get locked out). It never peaks above 300 (in
fact when the incident happens no new connections can come in so it falls
to around 100) - yet I'm seeing the log lines claim that I'm still hitting
597 (600 - 3 reserved).

If I measure from `ps -ef | grep postgres` and look at the connections,
then I can see that with the startup connections I am hitting this limit.

So client processes which are listed to the OS as "startup" ARE counted
towards the 597 connections, but are NOT reported in pg_stat_activity

James



Re: Max connections reached without max connections reached

2021-11-23 Thread James Sewell
>
> > Sorry, I could have been clearer - pg_stat_activity is what I'm looking
> > at - I'm recording connection info from here every 15 seconds (from a
> > superuser account so I don't get locked out). It never peaks above 300
> > (in fact when the incident happens no new connections can come in so it
> > falls to around 100) - yet I'm seeing the log lines claim that I'm still
> > hitting 597 (600 - 3 reserved).
>
> What is the query you are using against pg_stat_activity?
>
>
For ad-hoc general counts we just use:

SELECT count(*) FROM pg_stat_activity

The monitoring system runs:

SELECT CASE WHEN usename IS NOT NULL THEN usename ELSE 'SYSTEM' END as
role,
datname AS database,
state,
wait_event,
count(*) AS connection
FROM pg_stat_activity
GROUP BY 1,2,3,4



Re: Max connections reached without max connections reached

2021-11-23 Thread James Sewell
>
>
> So I guess the question becomes why are they spending so much time in
> the startup state.  That should take mere milliseconds, unless the
> clients are being slow to handle the authentication exchange?
>
> I'm also wondering a bit about whether they're being blocked on a lock,
> eg. due to something taking an exclusive lock on pg_authid or pg_database.
> pg_locks might be interesting to check.
>

It's very odd, as it happens only *sometimes* - having said that, it's
happening right this moment:

[enterprisedb@oprpgs001 edb-as-11]$ ps -ef|grep postgres:  | wc -l
517

[enterprisedb@oprpgs001 ~]$ ps -ef|grep postgres: | grep -i start | wc -l
480

postgres=# select count(*) from pg_stat_activity;
 count
---
   97
(1 row)

postgres=# select * from pg_locks where not granted;
 locktype | database | relation | page | tuple | virtualxid | transactionid
| classid | objid | objsubid | virtualtransaction | pid | mode | granted |
fastpath
--+--+--+--+---++---+-+---+--++-+--+-+--
(0 rows)

Logs are showing a stream of:

2021-11-24 10:50:58 AEDT [869]: [1-1]
user=u_rundeck_rw,db=rundeck_cluster,client=10.154.19.23
10.154.19.23(57122) (0:53300)FATAL:  remaining connection slots are
reserved for non-replication superuser connections
2021-11-24 10:50:58 AEDT [870]: [1-1]
user=u_rundeck_rw,db=rundeck_cluster,client=10.154.19.23
10.154.19.23(57124) (0:53300)FATAL:  remaining connection slots are
reserved for non-replication superuser connections
2021-11-24 10:50:58 AEDT [871]: [1-1]
user=u_rundeck_rw,db=rundeck_cluster,client=10.154.19.23
10.154.19.23(57126) (0:53300)FATAL:  remaining connection slots are
reserved for non-replication superuser connections

James



Re: Max connections reached without max connections reached

2021-11-23 Thread James Sewell
> The enterprisedb is one of their customized versions?
>
> rundeck_cluster below refers to https://digital.ai/technology/rundeck?
>
>
re: EnterpriseDB - yes it is; I'm having this same discussion with them in
parallel.
re: rundeck - yes, but this is only one of many things connecting. It's not
doing anything special.



Re: Max connections reached without max connections reached

2021-11-23 Thread James Sewell
> > re: EnterpriseDB yes it is - I'm having this same discussion with them
> > in parallel
>
> What version of theirs?
>

PostgreSQL 11.9 (EnterpriseDB Advanced Server 11.9.18) on
x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5 20150623 (Red Hat
4.8.5-36), 64-bit

>
>
> > re: rundeck, yes - but this is only one of many things connecting. it's
> > not doing anything special.
>
> Except sending a stream of connections to the server.
>
> Any idea what they are doing?
>

It's used to trigger ETL jobs. There are other, bigger consumers of
connections - the issue isn't the stream of jobs; that works fine under
normal operation. The issue is that when connections stay in "startup" and
don't enter pg_stat_activity, the system spirals downwards and no new
connections are allowed.

James



Re: Max connections reached without max connections reached

2021-11-23 Thread James Sewell
>
>
> > It's used to trigger ETL jobs. There are other bigger consumers of
> > connections - the issue isn't the stream of jobs, that works fine under
> > normal operation. The issue is that when jobs stay in "startup" and
> > don't enter pg_stat_activity the system spirals downwards and no new
> > connections are allowed.
>
> Is there a correlation to these ETL jobs or the other consumers?
>
> If so what are the consumers trying to do at that time?
>

Not really, no - it seems to be pretty randomly timed.



Re: Max connections reached without max connections reached

2021-11-24 Thread James Sewell
>
> The hypothesis I'm thinking of is that incoming sessions are being blocked
> somewhere before they can acquire a ProcArray entry; if so, they'd not
> show up in either pg_stat_activity or pg_locks.  What we have to look for
> then is evidence of somebody holding a strong lock on a shared relation.
> Try "select * from pg_locks where locktype = 'relation' and database = 0".
>
>
That doesn't show anything when the issue is happening.

Strace shows the startup processes looping like so:

13:51:15.510850 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358,
tv_nsec=733559666}) = 0
13:51:15.511254 futex(0x7ee891405238, FUTEX_WAKE, 1) = 1
13:51:15.511422 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358,
tv_nsec=734037545}) = 0
13:51:15.511690 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.520470 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358,
tv_nsec=743136450}) = 0
13:51:15.520772 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358,
tv_nsec=743376746}) = 0
13:51:15.521005 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.539309 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358,
tv_nsec=761920081}) = 0
13:51:15.539508 futex(0x7ee891406bb8, FUTEX_WAKE, 1) = 1
13:51:15.539690 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358,
tv_nsec=762298276}) = 0
13:51:15.539882 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.547311 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358,
tv_nsec=769929341}) = 0
13:51:15.547757 futex(0x7ee89140a738, FUTEX_WAKE, 1) = 1
13:51:15.547931 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358,
tv_nsec=770524120}) = 0
13:51:15.548089 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.551482 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = -1 EAGAIN
(Resource temporarily unavailable)
13:51:15.551950 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.593631 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = -1 EAGAIN
(Resource temporarily unavailable)
13:51:15.593807 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.626167 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.627590 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358,
tv_nsec=850204434}) = 0
13:51:15.627817 futex(0x7ee8914033b8, FUTEX_WAKE, 1) = 1
13:51:15.627987 open("pg_subtrans/A100", O_RDWR) = 16
13:51:15.628270 lseek(16, 131072, SEEK_SET) = 131072
13:51:15.628461 read(16,
"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
8192) = 8192
13:51:15.628677 close(16)   = 0
13:51:15.628911 open("pg_subtrans/A0E1", O_RDWR) = 16
13:51:15.629132 lseek(16, 122880, SEEK_SET) = 122880
13:51:15.629309 read(16,
"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,
8192) = 8192
13:51:15.629503 close(16)   = 0
13:51:15.629685 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358,
tv_nsec=852272645}) = 0
13:51:15.629863 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.656694 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358,
tv_nsec=879314944}) = 0
13:51:15.656923 futex(0x7ee8914125b8, FUTEX_WAKE, 1) = 1
13:51:15.657129 open("pg_subtrans/A0EE", O_RDWR) = 16
13:51:15.657385 lseek(16, 73728, SEEK_SET) = 73728
13:51:15.657638 read(16,
"\0\0\0\0\0H\356\240\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5H\356\240\0\0\0\0"...,
8192) = 8192
13:51:15.657907 close(16)   = 0
13:51:15.658141 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358,
tv_nsec=880753952}) = 0
13:51:15.658346 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.689705 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358,
tv_nsec=912331058}) = 0
13:51:15.689950 futex(0x7ee89140a0b8, FUTEX_WAKE, 1) = 1
13:51:15.690182 futex(0x7ee8914118b8, FUTEX_WAKE, 1) = 1
13:51:15.690420 open("pg_subtrans/A0EE", O_RDWR) = 16
13:51:15.690742 lseek(16, 73728, SEEK_SET) = 73728
13:51:15.691037 read(16,
"\0\0\0\0\0H\356\240\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\5H\356\240\0\0\0\0"...,
8192) = 8192
13:51:15.691303 close(16)   = 0
13:51:15.691582 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358,
tv_nsec=914202777}) = 0
13:51:15.691837 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.713319 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.716423 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358,
tv_nsec=939067321}) = 0
13:51:15.716737 futex(0x7ee89140a0b8, FUTEX_WAKE, 1) = 1
13:51:15.717010 futex(0x7ee89140f3b8, FUTEX_WAKE, 1) = 1
13:51:15.717286 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358,
tv_nsec=939899218}) = 0
13:51:15.717578 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.722270 futex(0x7ee891405938, FUTEX_WAIT, 0, NULL) = 0
13:51:15.746391 clock_gettime(CLOCK_MONOTONIC, {tv_sec=19439358,
tv_nsec=969020657}) = 0
13:51:15.746664 futex(0x7ee89140fa38, FUTEX_WAKE, 1) = 1
13:51:15.746877 open("pg_subtrans/A0EE", O_RDWR) = 16


While gdb shows something similar to:

(gdb) bt
#0  0x7f18fb9deb3b in do_futex_wait.constprop.1 () from /lib64/libpthread.so.0
#1  0x7f18fb9debcf in __new_sem_wait_slow.constprop.0 () from /lib64/libpthread.so.0
#2  0x7f18fb9dec6b in sem_wait@@GLIBC_2.

Re: Max connections reached without max connections reached

2021-11-24 Thread James Sewell
> > Seems like some of the processes are taking a long time or stuck while
> > reading/writing SLRU pages, and due to that while creating a new
> > connection the backend process is not able to check the transaction
> > status (from pg_xlog) of the pg_class tuple and gets stuck/taking a
> > long time in a startup.
>
> My next question would be whether this particular process shown is
> stack is stuck forever or finally, it is able to make a connection?  I
> want to understand is this just due to slow I/O or something else?
>

They don't ever seem to clear (more and more show up), and IO doesn't seem
to be deadlocked at that time.

It's a really odd one!



Re: Max connections reached without max connections reached

2021-11-29 Thread James Sewell
>
> How did you verify that, maybe some process started IO and stuck
> there? Can we check pg_stat_activity that is there some process that
> shows in the wait event as SLRURead/SLRUWrite and not coming out of
> that state?


There are no interesting entries in pg_stat_activity, and no wait events of
those types DURING the problem periods (I poll this every 15s).

It's an odd one.

- James



Re: Max connections reached without max connections reached

2021-12-01 Thread James Sewell
> Looking again into the back trace[1], it appeared that the backend is
> getting stuck while getting the subtransaction's parent information,
> and that can only happen if the snapshot is getting marked as
> overflowed.  So it seems that some of the scripts are using a lot of
> sub-transaction (> 64) within a transaction and in such case, we can
> not cache the sub-transaction information and the snapshot will be
> marked overflowed.  After that, for checking the visibility of any
> tuple we need to consult the SLRU to get TopMost transaction
> information and if the subtransaction SLRU cache is full then it will
> have to do the I/O.  So in this case it appeared that a lot of
> parallel scans/new connections are trying to read the pg_subtrans and
> due to limited cache size that is not even able to load the data in
> the cache and getting frequently blocked.  So I assume that when you
> say the startup is getting stuck forever, how long did you wait, maybe
> it is moving but getting stuck again for the next tuple, so to confirm
> that attach the process with gdb put breakpoint in XidInMVCCSnapshot()
> and check whether it is hitting that breakpoint again or not.  Also,
> check whether you are seeing "SubtransControlLock" in
> pg_stat_activity.  There is one nice blog[2] explaining this issue so
> you can refer to that as well.


Ah. Snap. This makes a lot of sense. The developers have a nasty habit of
using LOOP over rows which has never been stamped out. I bet they have
applied this to something with a lot of rows.

Is there a way of seeing which statements / backends / processes have
greater than 64 subtransactions? I'd like to monitor this.
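
In the meantime I'm watching for the symptom directly (a sketch - I believe
the relevant wait event on 11 is the LWLock named SubtransControlLock,
renamed SubtransSLRU in later releases):

-- Backends piling up on the subtransaction SLRU while checking visibility.
SELECT pid, state, wait_event_type, wait_event, query
FROM pg_stat_activity
WHERE wait_event IN ('SubtransControlLock', 'SubtransSLRU');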

- James



Re: Max connections reached without max connections reached

2021-12-02 Thread James Sewell
>
> I expect my 'vote' counts for naught, but I fully expect seeing these show
> up in the logs would have helped me much more quickly have insight into
> what was going on during times of very high concurrency and extreme
> slowness with many processes showing as waiting on LwLocks.
>

Is there any mechanism I could use from a C extension and a hook to warn
about >64 subtransaction overflows?

- James



Re: Max connections reached without max connections reached

2021-12-05 Thread James Sewell
>
> Agreed with both points.  What about we add, subxid count and overflow
> status in LocalPgBackendStatus and through that, we can show in
> pg_stat_activity.  That way we don't have to report it ever and
> whenever the user is running pg_stat_activity they can fetch it
> directly from "proc->subxidStatus", along with fetching the proc.xid
> and proc.xmin.  Does this make sense?
>

Not sure about the overhead and performance aspects, but I like this
solution from a functional point of view.



Re: Max connections reached without max connections reached

2021-12-05 Thread James Sewell
> +1, I too like the idea. The patch doesn't seem to be doing any heavy
> lifting, I think that much overhead should be acceptable.
>

I'm guessing this won't be back-patched? Is it possible to somehow read
this information from a C function?

- James



Re: Max connections reached without max connections reached

2021-12-20 Thread James Sewell
>
>
> I'm guessing this won't be back-patched? Is it possible to somehow read
> this information from a C function?
>
>

OK, it is possible - I've got a C extension up and running which hooks
ExecutorStart, then once for each transaction (I monitor the nesting depth,
like pg_stat_statements does, and only attach at the top level) attaches a
subtransaction callback, tracking subtransaction-start events, incrementing
a counter, and keeping track of the worst offenders in a hashmap.

This seems to work very well - but I've got a question. How many
subtransactions would you expect the following anonymous block and function
to create, respectively? The first seems to report 9, and the second 10 -
is this expected? It feels like it might be - I just want to make sure.

do $$
begin
  for counter in 1..10 loop
    begin
      INSERT INTO a VALUES (counter);
    exception
      when no_data_found then raise exception 'gosh';
    end;
  end loop;
end;
$$;


CREATE OR REPLACE FUNCTION public.create_subtransactions()
 RETURNS void
 LANGUAGE plpgsql
AS $function$
begin
  for counter in 1..10 loop
    begin
      INSERT INTO a VALUES (counter);
    exception
      when no_data_found then raise exception 'gosh';
    end;
  end loop;
end;
$function$;

- James



Re: Max connections reached without max connections reached

2021-12-20 Thread James Sewell
> Ok it is possible, I've got a C extension up and running which hooks
> ExecutorStart, then once for each TX ( I monitor the nesting depth like in
> pg_stat_statements, and only attach at the top level) attaches a sub
> transaction callback, tracking start subtransaction events and incrementing
> a counter / keeping track of the worst offenders in a hashmap.
>

Sorry - that should be ExecutorRun, although I'm a little confused about
which is the correct choice.

- James
