dead tuple difference between pgstattuple and pg_stat_user_tables

2024-08-23 Thread Matthew Tice
Hi All,

I'm trying to understand why there's a difference between what pgstattuple
reports and pg_stat_user_tables reports (for the number of dead tuples).

As I understand it, pgstattuple and pgstattuple_approx return the exact number
of dead tuples (as noted in the documentation), and, per an older Stack
Overflow answer, the value returned from pg_stat_user_tables "uses the most
recent data collected by ANALYZE".

Why would it be that, even after analyzing a table, the n_dead_tup value is
still vastly different from dead_tuple_count?

> SELECT * FROM (SELECT dead_tuple_count FROM pgstattuple_approx('oban.oban_jobs')) a,
  (SELECT n_dead_tup, last_autovacuum, last_analyze, now(), autovacuum_count
   FROM pg_stat_user_tables WHERE relname = 'oban_jobs' AND schemaname = 'oban') b;
-[ RECORD 1 ]----+------------------------------
dead_tuple_count | 3736
n_dead_tup       | 1127044
last_autovacuum  | 2024-08-23 16:00:30.983141+00
last_analyze     | 2024-08-23 15:33:50.628422+00
now              | 2024-08-23 16:01:19.915893+00
autovacuum_count | 446478
SELECT 1

> vacuum (verbose,analyze) oban.oban_jobs;

vacuuming "oban.oban_jobs"
table "oban_jobs": index scan bypassed: 29341 pages from table (0.79% of
total) have 747 dead item identifiers
launched 2 parallel vacuum workers for index cleanup (planned: 2)
index "oban_jobs_args_index" now contains 18281 row versions in 10232 pages
0 index row versions were removed.
0 index pages were newly deleted.
56 index pages are currently deleted, of which 833 are currently reusable.
CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s.
index "oban_jobs_meta_index" now contains 18281 row versions in 9698 pages
0 index row versions were removed.
0 index pages were newly deleted.
35 index pages are currently deleted, of which 621 are currently reusable.
CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s.
table "oban_jobs": found 855 removable, 9661 nonremovable row versions in
29341 out of 3727204 pages
1330 dead row versions cannot be removed yet, oldest xmin: 1378705314
Skipped 0 pages due to buffer pins, 3696951 frozen pages.
912 skipped pages using mintxid fork.
CPU: user: 0.12 s, system: 0.08 s, elapsed: 0.22 s.
vacuuming "pg_toast.pg_toast_72454950"
table "pg_toast_72454950": found 0 removable, 0 nonremovable row versions
in 0 out of 0 pages
0 dead row versions cannot be removed yet, oldest xmin: 1378705314
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 skipped pages using mintxid fork.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
analyzing "oban.oban_jobs"
"oban_jobs": scanned 3 of 3727204 pages, containing 75 live rows and
10501 dead rows; 75 rows in sample, 9318 estimated total rows
VACUUM

> SELECT * FROM (SELECT dead_tuple_count FROM pgstattuple_approx('oban.oban_jobs')) a,
  (SELECT n_dead_tup, last_autovacuum, last_analyze, now(), autovacuum_count
   FROM pg_stat_user_tables WHERE relname = 'oban_jobs' AND schemaname = 'oban') b;
-[ RECORD 1 ]----+------------------------------
dead_tuple_count | 1701
n_dead_tup       | 1306009
last_autovacuum  | 2024-08-23 16:01:31.034229+00
last_analyze     | 2024-08-23 16:01:47.85574+00
now              | 2024-08-23 16:01:55.734589+00
autovacuum_count | 446479

This is a Google Alloy DB instance running:
> select version();
-[ RECORD 1 ]-
version | PostgreSQL 14.10 on x86_64-pc-linux-gnu, compiled by Debian clang version 12.0.1, 64-bit
SELECT 1


Re: dead tuple difference between pgstattuple and pg_stat_user_tables

2024-08-23 Thread Matthew Tice
On Fri, Aug 23, 2024 at 10:26 AM Adrian Klaver wrote:

> On 8/23/24 09:14, Matthew Tice wrote:
> > Hi All,
> >
> > I'm trying to understand why there's a difference between what
> > pgstattuple reports and pg_stat_user_tables reports (for the number of
> > dead tuples).
> >
> > As I understand, pgstattuple and pgstattuple_approx return the exact
> > number of dead tuples (as noted in the documentation) and based on an
>
> https://www.postgresql.org/docs/current/pgstattuple.html
>
> pgstattuple_approx(regclass) returns record
>
>  pgstattuple_approx is a faster alternative to pgstattuple that
> returns approximate results.
>
> Not sure how you get exact count out of that?
>

Maybe the wording is a little confusing to me. Under the section
for pgstattuple_approx:
"pgstattuple_approx tries to avoid the full-table scan and returns exact
dead tuple statistics along with an approximation of the number and size of
live tuples and free space."
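
For what it's worth, a fuller side-by-side comparison of the three sources (a
sketch; it assumes the pgstattuple extension is installed, and note that plain
pgstattuple() does a full-table scan, so it's slow on a table this size):

```
-- Exact count (full scan) vs. approximate-scan count vs. the cumulative
-- statistics estimate, all for the same table.
SELECT
  (SELECT dead_tuple_count FROM pgstattuple('oban.oban_jobs'))        AS pgstattuple_exact,
  (SELECT dead_tuple_count FROM pgstattuple_approx('oban.oban_jobs')) AS pgstattuple_approx,
  (SELECT n_dead_tup
     FROM pg_stat_user_tables
    WHERE schemaname = 'oban' AND relname = 'oban_jobs')              AS pg_stat_estimate;
```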


>
> > This is a Google Alloy DB instance running:
>
> https://cloud.google.com/alloydb/docs/overview
>
> "AlloyDB for PostgreSQL is a fully managed, PostgreSQL-compatible
> database service that's designed for your most demanding workloads,
> including hybrid transactional and analytical processing. AlloyDB pairs
> a Google-built database engine with a cloud-based, multi-node
> architecture to deliver enterprise-grade performance, reliability, and
> availability."
>
> Where the important parts are 'PostgreSQL-compatible' and 'Google-built
> database engine'. You probably need to reach out to Google to see what
> that means for this situation.
>
Got it, thanks Adrian.


>
> >  > select version();
> > -[ RECORD 1 ]-
> > version | PostgreSQL 14.10 on x86_64-pc-linux-gnu, compiled by Debian
> > clang version 12.0.1, 64-bit
> > SELECT 1
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


datfrozenxid not dropping after vacuum

2021-09-01 Thread Matthew Tice
Hi,

Starting this morning at 0830 local time I noticed that my
datfrozenxid started moving past the `autovacuum_freeze_max_age` value
of 200000000.  When we encountered this in the past the solution has
been to do one of the following:

1. This is related to an error similar to
```
found xmin 2675436435 from before relfrozenxid 321165377
```
where the solution has been to move the `pg_internal.init` file out of
the way and let PostgreSQL recreate it.  Or;

2. A long-running transaction.  Typically I'll just find the `idle in
transaction` transactions that have a `query_start` around when my
alarm went off notifying me when `datfrozenxid` breaches
`autovacuum_freeze_max_age`.  Using a query similar to
```
SELECT pid, query_start, datname, usename, state, backend_xmin,
age(backend_xmin)
FROM pg_stat_activity
WHERE state = 'idle in transaction';
```

3. The autovacuum process seemed to be "stuck" on a particular table.
We would kill the pid of the autovacuum process.
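
When #2 or #3 applies, the intervention from SQL has been the usual
cancel/terminate (a sketch; the pid is illustrative and comes from the query
above or from pg_stat_activity):
```
SELECT pg_cancel_backend(12345);     -- polite cancel first (illustrative pid)
SELECT pg_terminate_backend(12345);  -- terminate if the cancel doesn't take
```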

The problem is that none of these solutions has seemed to drop
`datfrozenxid` back down, and there is one specific database in this
cluster that's holding onto it.

Using these queries from CrunchyData:

# Show oldest current xid
# WITH max_age AS (
SELECT 2000000000 as max_old_xid
, setting AS autovacuum_freeze_max_age
FROM pg_catalog.pg_settings
WHERE name = 'autovacuum_freeze_max_age' )
, per_database_stats AS (
SELECT datname
, m.max_old_xid::int
, m.autovacuum_freeze_max_age::int
, age(d.datfrozenxid) AS oldest_current_xid
FROM pg_catalog.pg_database d
JOIN max_age m ON (true)
WHERE d.datallowconn )
SELECT max(oldest_current_xid) AS oldest_current_xid
, max(ROUND(100*(oldest_current_xid/max_old_xid::float))) AS
percent_towards_wraparound
, max(ROUND(100*(oldest_current_xid/autovacuum_freeze_max_age::float)))
AS percent_towards_emergency_autovac
FROM per_database_stats;

 oldest_current_xid | percent_towards_wraparound | percent_towards_emergency_autovac
--------------------+----------------------------+-----------------------------------
          230935699 |                         12 |                               115

# Top 8 individual databases
SELECT datname
, age(datfrozenxid)
, current_setting('autovacuum_freeze_max_age')
FROM pg_database
ORDER BY 2 DESC LIMIT 8;

       datname       |    age    | current_setting
---------------------+-----------+-----------------
 siteservice         | 230935699 | 200000000
 coupon-ws           | 217613246 | 200000000
 contextchangestore  | 211343280 | 200000000
 template0           | 210351639 | 200000000
 productmanager      | 207876167 | 200000000
 rhistory            | 207876167 | 200000000
 smsservice          | 207876167 | 200000000
 techservice         | 205432524 | 200000000

That top database `siteservice` is the "problematic" one where a
manual vacuum won't coerce it to free up the `datfrozenxid`.

Looking at the tables in that database:

# SELECT c.oid::regclass
, age(c.relfrozenxid)
, pg_size_pretty(pg_total_relation_size(c.oid))
FROM pg_class c
JOIN pg_namespace n on c.relnamespace = n.oid
WHERE relkind IN ('r', 't', 'm')
AND n.nspname NOT IN ('pg_toast')
ORDER BY 2 DESC LIMIT 10;

        oid         |    age    | pg_size_pretty
--------------------+-----------+----------------
 pg_database        | 230935699 | 4264 kB
 pg_proc            |  93543215 | 976 kB
 pg_collation       |  93543215 | 560 kB
 pg_attribute       |  93543215 | 600 kB
 pg_shdepend        |  59515320 | 15 MB
 pg_statistic       |  53828900 | 464 kB
 pg_subscription    |  53172718 | 16 kB
 pg_pltemplate      |  53172718 | 56 kB
 pg_authid          |  53172718 | 8616 kB
 pg_db_role_setting |  53172718 | 64 kB
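
For completeness, the kind of targeted manual vacuum that would normally
advance the worst offenders at the top of that list (a sketch, run as a
superuser; pg_database is a shared catalog, so it can be vacuumed from any
database):
```
VACUUM (FREEZE, VERBOSE) pg_catalog.pg_database;
VACUUM (FREEZE, VERBOSE) pg_catalog.pg_authid;
```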

I thought maybe it had to do with my replication slots somehow:

# select slot_name, slot_type, database, active, catalog_xmin, restart_lsn,
confirmed_flush_lsn from pg_replication_slots;

        slot_name         | slot_type |        database        | active | catalog_xmin |  restart_lsn  | confirmed_flush_lsn
--------------------------+-----------+------------------------+--------+--------------+---------------+---------------------
 dbs1db02                 | physical  |                        | f      |              |               |
 dbs1db01                 | physical  |                        | t      |              | 4D25/ACE6EE08 |
 dbs1db03                 | physical  |                        | t      |              | 4D25/ACE6EE08 |
 dbs2db01                 | physical  |                        | t      |              | 4D25/ACE6EE08 |
 debezium_cmanager        | logical   | campaign-manager       | t      |   2152258063 | 4D25/A421A6C8 | 4D25/ABC18C88
 debezium_rservice        | logical   | retail-content-service | t      |   2152238060 | 4D25/8EC403B0 | 4D25/A6105DF8
 debezium_partnerservice  | logical   | partnerservice         | t      |   2152238060 | 4D25/8EC403B0 | 4D25/A5446630
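
To put a number on how far back those logical slots reach (a sketch using
age() on each slot's catalog_xmin, since a stale slot can hold back vacuuming
of the system catalogs):
```
SELECT slot_name, active, catalog_xmin, age(catalog_xmin) AS catalog_xmin_age
FROM pg_replication_slots
WHERE catalog_xmin IS NOT NULL
ORDER BY catalog_xmin_age DESC;
```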
 

Re: datfrozenxid not dropping after vacuum

2021-09-01 Thread Matthew Tice
Hi Alvaro, thanks for the quick reply.

I'm scheduled to do my patching maintenance at the end of this month -
but at this point I don't think I'm going to make it.

Other than patching, is there a workaround?  For example, in #2 above:
>The fix for 2) is simpler,
>simply always remove both the shared and local init files.

I'm not familiar with the differences between 'shared' and 'local'
init files (I'd imagine I referenced a 'local' file in my original
post)?
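
For my own reference while digging, this is how I've been locating the two
candidate files (a sketch; my assumption is that 'shared' means the copy under
global/ and 'local' means the per-database copy):
```
-- Paths are relative to the data directory; <datoid> is this database's OID.
SHOW data_directory;
SELECT oid AS datoid FROM pg_database WHERE datname = current_database();
-- shared init file:        <data_directory>/global/pg_internal.init
-- per-database init file:  <data_directory>/base/<datoid>/pg_internal.init
```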



Thanks!

Matt

On Wed, Sep 1, 2021 at 3:00 PM Alvaro Herrera  wrote:
>
> On 2021-Sep-01, Matthew Tice wrote:
>
> [ problem table is pg_database ]
>
> > My primary, read/write database is Postgresql 10.4 (CentOS 7) while my
> > standby databases have been patched to 10.17.
>
> Hmm, I think there was a bug in the early 10.x versions where advancing
> the xid age of shared tables would not work correctly for some reason ...
> Ah yes, this was fixed in 10.5, a mere three years ago:
>
> Author: Andres Freund 
> Branch: master Release: REL_11_BR [a54e1f158] 2018-06-12 11:13:21 -0700
> Branch: REL_10_STABLE Release: REL_10_5 [2ce64caaf] 2018-06-12 11:13:21 -0700
> Branch: REL9_6_STABLE Release: REL9_6_10 [6a46aba1c] 2018-06-12 11:13:21 -0700
> Branch: REL9_5_STABLE Release: REL9_5_14 [14b3ec6f3] 2018-06-12 11:13:21 -0700
> Branch: REL9_4_STABLE Release: REL9_4_19 [817f9f9a8] 2018-06-12 11:13:22 -0700
> Branch: REL9_3_STABLE Release: REL9_3_24 [9b9b622b2] 2018-06-12 11:13:22 -0700
>
> Fix bugs in vacuum of shared rels, by keeping their relcache entries 
> current.
>
> When vacuum processes a relation it uses the corresponding relcache
> entry's relfrozenxid / relminmxid as a cutoff for when to remove
> tuples etc. Unfortunately for nailed relations (i.e. critical system
> catalogs) bugs could frequently lead to the corresponding relcache
> entry being stale.
>
> This set of bugs could cause actual data corruption as vacuum would
> potentially not remove the correct row versions, potentially reviving
> them at a later point.  After 699bf7d05c some corruptions in this vein
> were prevented, but the additional error checks could also trigger
> spuriously. Examples of such errors are:
>   ERROR: found xmin ... from before relfrozenxid ...
> and
>   ERROR: found multixact ... from before relminmxid ...
> To be caused by this bug the errors have to occur on system catalog
> tables.
>
> The two bugs are:
>
> 1) Invalidations for nailed relations were ignored, based on the
>theory that the relcache entry for such tables doesn't
>change. Which is largely true, except for fields like relfrozenxid
>etc.  This means that changes to relations vacuumed in other
>sessions weren't picked up by already existing sessions.  Luckily
>autovacuum doesn't have particularly longrunning sessions.
>
> 2) For shared *and* nailed relations, the shared relcache init file
>was never invalidated while running.  That means that for such
>tables (e.g. pg_authid, pg_database) it's not just already existing
>sessions that are affected, but even new connections are as well.
>That explains why the reports usually were about pg_authid et. al.
>
> To fix 1), revalidate the rd_rel portion of a relcache entry when
> invalid. This implies a bit of extra complexity to deal with
> bootstrapping, but it's not too bad.  The fix for 2) is simpler,
> simply always remove both the shared and local init files.
>
> Author: Andres Freund
> Reviewed-By: Alvaro Herrera
> Discussion:
> https://postgr.es/m/20180525203736.crkbg36muzxrj...@alap3.anarazel.de
> 
> https://postgr.es/m/CAMa1XUhKSJd98JW4o9StWPrfS=11bpgg+_gdmxe25tvuy4s...@mail.gmail.com
> 
> https://postgr.es/m/cakmfjucqbuodrfxpdx39wha3vjyxwerg_zdvxzncr6+5wog...@mail.gmail.com
> 
> https://postgr.es/m/cagewt-ujgpmlq09gxcufmzazsgjc98vxhefbf-tppb0fb13...@mail.gmail.com
> Backpatch: 9.3-
>
>
> --
> Álvaro Herrera   39°49'30"S 73°17'W
> "El número de instalaciones de UNIX se ha elevado a 10,
> y se espera que este número aumente" (UPM, 1972)




Re: datfrozenxid not dropping after vacuum

2021-09-02 Thread Matthew Tice
Interestingly enough, I hopped on the database system this morning and
found the `datfrozenxid` dropped back down below
`autovacuum_freeze_max_age` around 0200 local time (roughly 18 hours
after the fact).

Looking through the PostgreSQL logs I don't see anything standing out
at that time.

I still plan on patching to 10.17 tonight.

Matt





Re: What are best practices wrt passwords?

2024-10-16 Thread Matthew Tice



> On Oct 16, 2024, at 10:50 AM, Christophe Pettus  wrote:
> 
> 
> 
>> On Oct 16, 2024, at 09:47, Tom Lane  wrote:
>> I believe it depends on your platform --- some BSDen are pretty
>> permissive about this, if memory serves.  On a Linux box it seems
>> to work for processes owned by yourself even if you're not superuser.
> 
> I just tried it on an (admittedly kind of old) Ubuntu system and macOS 14, and
> it looks like it shows everything owned by everyone, even from a non-sudoer user.
> 
Interesting, that’s not my experience.  Only root can see the env variables of 
another user.

Terminal 1

$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.6 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/";
SUPPORT_URL="https://help.ubuntu.com/";
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/";
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy";
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal

$ whoami
testusr

$ export FOOBAR=true

$ bash

$ env | grep FOOBAR
FOOBAR=true

Terminal 2
$  whoami
mtice

$  ps e -U testusr | grep -c FOOBAR
0

$  sudo ps e -U testusr | grep -c FOOBAR
1