> 1) Do the 77 share some trait the other 80 don't. No pattern found yet .. but still verifying a few things
> 2) Do the OS system logs reveal anything? Nothing found in syslog > 3) What was happening in the databases just prior to the time the stats reset? Here's an example (log extracts) for a stats reset occurrence: select datname, stats_reset, now()-stats_reset as since_reset from pg_stat_database where ( now()-stats_reset ) < interval '1 day' order by 3 limit 1; datname | stats_reset | since_reset ----------------+-------------------------------+----------------- MyDB | *2024-11-21 13:48:34.332*785+00 | 00:00:22.266304 <--LOGS--> 2024-11-21 13:48:34.324 UTC pid=[322035][2] db=[MyDB] usr=[user1] client=[host1] app=[[unknown]]LOG: connection authorized: user=user1 database=MyDB applicatio n_name=app1 <..> <.. no calls at "2024-11-21 13:48:34.332" - WHY?? ..> 2024-11-21 13:48:34.336 UTC pid=[322035][3] db=[MyDB] usr=[user1] client=[host1] app=[app1]LOG: duration: 1.071 ms parse <unnamed>: SELECT <..> <--LOGS--> As you can see from above, the stats for MyDB were reset at ".332" . The only logs before/after for the db was the connection (at .324), and then the parse (at .336). NB: I also checked the logs at ".333" in case there would have been a rounding up, but nothing relevant was found. With that said, I only verified one occurence - tomorrow I'll check a few more just to validate. > 4) Do you have external tools accessing these databases? We have internal micro-services accessing the databases, as well as a monitoring tool (Netdata), and some of the Devs use pgAdmin. I discarded the scenario where someone would inadvertently do a "pg_stat_reset" via pgAdmin, just because a lot of databases have their stats reset within a short period of time. On the other hand, Netdata does connect to most (if not all) databases frequently by its nature - so as a test, I stopped the Netdata service today to see if tomorrow we're still seeing the stats reset or not. I can report back tomorrow on this. > 5) Is the cluster directly open to the world? No. It's an on-premise installation. Only local applications can connect to it. -Steeve On Thu, Nov 21, 2024 at 4:32 PM Adrian Klaver <adrian.kla...@aklaver.com> wrote: > On 11/21/24 13:31, Steeve Boulanger wrote: > > > All I can think to do is look at the logs around the stats_reset > times > > > for the databases and see if there is anything relevant. > > > > That was already done, but nothing relevant was found unfortunately. > > Unless it was not recognized as relevant. Since for the time being I am > eliminating magic as the cause, something concrete is causing this and > it should be leaving a trace. In your post you had this affecting 77 out > of 157 databases in the cluster. > > 1) Do the 77 share some trait the other 80 don't. > > 2) Do the OS system logs reveal anything? > > 3) What was happening in the databases just prior to the time the stats > reset? > > 4) Do you have external tools accessing these databases? > > 5) Is the cluster directly open to the world? > > > > > -Steeve > > > > On Thu, Nov 21, 2024 at 3:12 PM Adrian Klaver <adrian.kla...@aklaver.com > > <mailto:adrian.kla...@aklaver.com>> wrote: > > > > On 11/21/24 12:57, Steeve Boulanger wrote: > > > > > > > Please reply to list also. > > > > > > My apologies - I thought I did a "Reply all", but apparently not. > > I'm a > > > little bit of a noob with email distrib lists. > > > > > > > 1) What is log_min_error_statement set to? > > > > > > name | setting | pending_restart > > > -------------------------+---------+----------------- > > > log_min_error_statement | error | f > > > > > > > 2) Did you reload the server when changing?: > > > > > > yes - pg_reload_conf() > > > > All I can think to do is look at the logs around the stats_reset > times > > for the databases and see if there is anything relevant. > > > > > > > > -Steeve > > > > > > -- > > Adrian Klaver > > adrian.kla...@aklaver.com <mailto:adrian.kla...@aklaver.com> > > > > -- > Adrian Klaver > adrian.kla...@aklaver.com > >