>  1) Do the 77 share some trait the other 80 don't.

No pattern found yet, but I'm still verifying a few things.

> 2) Do the OS system logs reveal anything?

Nothing found in syslog

> 3) What was happening in the databases just prior to the time the stats
> reset?

Here's an example (log extracts) for a stats reset occurrence:

select datname, stats_reset, now()-stats_reset as since_reset
from pg_stat_database
where ( now()-stats_reset ) < interval '1 day'
order by 3  limit 1;

    datname     |          stats_reset          |   since_reset
----------------+-------------------------------+-----------------
 MyDB           | 2024-11-21 13:48:34.332785+00 | 00:00:22.266304

<--LOGS-->
2024-11-21 13:48:34.324 UTC pid=[322035][2]  db=[MyDB] usr=[user1]
client=[host1] app=[[unknown]]LOG:  connection authorized: user=user1
database=MyDB application_name=app1 <..>

<.. no calls at "2024-11-21 13:48:34.332" - WHY?? ..>

2024-11-21 13:48:34.336 UTC pid=[322035][3]  db=[MyDB] usr=[user1]
client=[host1] app=[app1]LOG:  duration: 1.071 ms  parse <unnamed>: SELECT
<..>
<--LOGS-->

As you can see above, the stats for MyDB were reset at ".332". The only
log entries for the db just before and after were the connection (at .324)
and the parse (at .336). NB: I also checked the logs at ".333" in case the
timestamp had been rounded up, but nothing relevant was found. That said,
I have only verified one occurrence so far; tomorrow I'll check a few more
to validate.
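
To repeat this check for more occurrences, a minimal sketch along the following lines could pull every log line within a small window around a given stats_reset time. It assumes each log entry starts with a "YYYY-MM-DD HH:MM:SS.mmm" timestamp as in the extract above; the function name, the 50 ms window, and the sample lines are made up for illustration. It deliberately does not filter on db=, so entries from other databases or backends at the same instant would show up too.

```python
from datetime import datetime, timedelta

def lines_near_reset(log_lines, reset_ts, window_ms=50):
    """Return log lines whose leading timestamp is within window_ms of reset_ts."""
    window = timedelta(milliseconds=window_ms)
    hits = []
    for line in log_lines:
        try:
            # Assumed prefix, as in the extract: "2024-11-21 13:48:34.324 UTC ..."
            ts = datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S.%f")
        except ValueError:
            continue  # continuation line, or a different log_line_prefix
        if abs(ts - reset_ts) <= window:
            hits.append(line)
    return hits

# stats_reset value taken from the pg_stat_database output above
reset = datetime(2024, 11, 21, 13, 48, 34, 332785)

# Abbreviated, made-up sample lines in the same shape as the extract
sample = [
    "2024-11-21 13:48:34.324 UTC pid=[322035][2]  db=[MyDB] LOG:  connection authorized",
    "2024-11-21 13:48:34.336 UTC pid=[322035][3]  db=[MyDB] LOG:  duration: 1.071 ms",
    "2024-11-21 13:49:10.000 UTC pid=[322040][1]  db=[MyDB] LOG:  unrelated entry",
]

for hit in lines_near_reset(sample, reset):
    print(hit)
```

In practice the lines would come from the server log file rather than a list, and the window can be widened if nothing turns up.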


> 4) Do you have external tools accessing these databases?

We have internal micro-services accessing the databases, as well as a
monitoring tool (Netdata), and some of the Devs use pgAdmin. I ruled out
the scenario where someone inadvertently runs "pg_stat_reset" via pgAdmin,
because a lot of databases have their stats reset within a short period of
time.

On the other hand, Netdata does connect to most (if not all) databases
frequently by its nature - so as a test, I stopped the Netdata service
today to see if tomorrow we're still seeing the stats reset or not. I can
report back tomorrow on this.
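
To make the "many databases reset within a short period" observation easier to compare against a tool's polling schedule, the stats_reset values from pg_stat_database could be grouped into bursts. The sketch below is only an illustration: the function name, the 5-second gap, and the sample rows are assumptions, not values from this cluster.

```python
from datetime import datetime, timedelta

def group_resets(resets, max_gap_s=5):
    """Group (datname, stats_reset) pairs into bursts of closely spaced resets."""
    max_gap = timedelta(seconds=max_gap_s)
    bursts = []
    for name, ts in sorted(resets, key=lambda r: r[1]):
        # Extend the current burst if this reset follows the previous one closely
        if bursts and ts - bursts[-1][-1][1] <= max_gap:
            bursts[-1].append((name, ts))
        else:
            bursts.append([(name, ts)])
    return bursts

# Made-up rows standing in for "select datname, stats_reset from pg_stat_database"
rows = [
    ("db_a", datetime(2024, 11, 21, 13, 48, 34)),
    ("db_b", datetime(2024, 11, 21, 13, 48, 36)),
    ("db_c", datetime(2024, 11, 21, 17, 5, 0)),
]

for burst in group_resets(rows):
    print(burst[0][1], [name for name, _ in burst])
```

If the bursts line up with a regular interval, that would point toward a scheduled client rather than a manual action.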

> 5) Is the cluster directly open to the world?

No. It's an on-premise installation. Only local applications can connect to
it.


-Steeve

On Thu, Nov 21, 2024 at 4:32 PM Adrian Klaver <adrian.kla...@aklaver.com>
wrote:

> On 11/21/24 13:31, Steeve Boulanger wrote:
> >  > All I can think to do is look at the logs around the stats_reset
> >  > times for the databases and see if there is anything relevant.
> >
> > That was already done, but nothing relevant was found unfortunately.
>
> Unless it was not recognized as relevant. Since for the time being I am
> eliminating magic as the cause, something concrete is causing this and
> it should be leaving a trace. In your post you had this affecting 77 out
> of 157 databases in the cluster.
>
> 1) Do the 77 share some trait the other 80 don't.
>
> 2) Do the OS system logs reveal anything?
>
> 3) What was happening in the databases just prior to the time the stats
> reset?
>
> 4) Do you have external tools accessing these databases?
>
> 5) Is the cluster directly open to the world?
>
> >
> > -Steeve
> >
> > On Thu, Nov 21, 2024 at 3:12 PM Adrian Klaver <adrian.kla...@aklaver.com>
> > wrote:
> >
> >     On 11/21/24 12:57, Steeve Boulanger wrote:
> >      >
> >      >  > Please reply to list also.
> >      >
> >      > My apologies - I thought I did a "Reply all", but apparently not.
> >      > I'm a little bit of a noob with email distrib lists.
> >      >
> >      >  > 1) What is log_min_error_statement set to?
> >      >
> >      >            name           | setting | pending_restart
> >      > -------------------------+---------+-----------------
> >      >   log_min_error_statement | error   | f
> >      >
> >      >  > 2) Did you reload the server when changing?:
> >      >
> >      > yes - pg_reload_conf()
> >
> >     All I can think to do is look at the logs around the stats_reset
> >     times for the databases and see if there is anything relevant.
> >
> >      >
> >      > -Steeve
> >
> >
> >     --
> >     Adrian Klaver
> >     adrian.kla...@aklaver.com
> >
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>
