On Sat, 25 Jan 2025 at 21:01, Frits Hoogland <frits.hoogl...@gmail.com> wrote:
> I am looking at whether sampling key database catalog information per
> second would have any drawback whatsoever. I think you're saying that
> you think that isn't the case, except maybe for pg_database, and I
> figure that is because of the frozen and multi-xact fields per database.
>
> If the database client application is too unpredictable to know what
> SQL it will produce, then having runtime data available at that
> granularity, so that what is going on can reasonably be reconstructed,
> is very convenient and allows tremendous insight. It would also allow
> using the wait events to spot any weird behaviour, such as short-lived
> peaks. (pg_stat_statements can do that on a busy database, for
> example.) And if there is no known drawback, and such a low interval
> can be organized: why not? I am not saying you are doing it wrong; this
> is about trying to figure out the boundaries of what is technically
> possible without unreasonably affecting the database - a thought
> experiment.
>
> Of course the gathered data needs to be organized so that you don't
> drown in it, and the monitoring data shouldn't swamp the system either,
> in memory or on disk, but that is a given.
>
> Why would per second be too much for locks? Is there overhead to select
> from pg_locks, or pg_blocking_pids()?

When you have queries that take about 10 ms, then a lock held for 50 ms
is already significant overhead, and one-second sampling will miss it.
And queries against pg_locks are queries like any other - there is the
usual overhead of planner, executor and locking. Fortunately, it is
almost entirely CPU-bound.
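If you do sample, a targeted query is cheaper than select * from
pg_locks. As a rough sketch (untested, just to show the idea), something
like this returns rows only for sessions that are actually blocked,
using pg_blocking_pids():

    select pid,
           pg_blocking_pids(pid) as blocked_by,
           wait_event_type,
           wait_event,
           state,
           query
      from pg_stat_activity
     where cardinality(pg_blocking_pids(pid)) > 0;

On a healthy system this returns no rows, so the per-sample cost is
mostly the scan of pg_stat_activity itself.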
> Again, please realise I am happy and appreciative of the time you take;
> I am toying with the idea described above.
>
> *Frits Hoogland*
>
> On 25 Jan 2025, at 19:18, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>
> Hi
>
> On Sat, 25 Jan 2025 at 18:00, Frits Hoogland <frits.hoogl...@gmail.com>
> wrote:
>
>> Thank you Pavel, that is really useful. I can imagine other people
>> thinking about getting fine-grained data from postgres wondering the
>> same thing I do about this. And really, from a computer's perspective,
>> once a second isn't really a high frequency, is it?
>
> I usually work with minute sampling, and usually it is good enough (the
> statistics are cumulative, so you can lose a timestamp, but you never
> lose data). Only when we investigate some special case do I use second
> sampling. And when you investigate lock issues, even one-second
> sampling is too coarse.
>
> Regards
>
> Pavel
>
>> If I time these queries, together they take around 20 ms (local
>> connection), so for a relatively long time none of the objects,
>> including pg_database, are being actively queried.
>>
>> I git grepped the source code; it seems there is a RowExclusiveLock on
>> pg_database for manipulation (addition, removal and change of a
>> database) in dbcommands.c, but I think your reasoning is based on the
>> columns datfrozenxid and datminmxid?
>>
>> There is a lock for updating the frozen xid and mxid of a database
>> (vacuum.c: LockDatabaseFrozenIds, ExclusiveLock), but it seems a
>> select should play nicely with that?
>>
>> Btw, it's interesting to see that both datfrozenxid and datminmxid are
>> updated in place, with no read consistency provided.
>>
>> *Frits Hoogland*
>>
>> On 25 Jan 2025, at 14:32, Pavel Stehule <pavel.steh...@gmail.com> wrote:
>>
>> Hi
>>
>> On Sat, 25 Jan 2025 at 12:23, Frits Hoogland
>> <frits.hoogl...@gmail.com> wrote:
>>
>>> For monitoring database behaviour and trying to build a history of
>>> activity: if I create an application that opens a single connection
>>> and executes something like
>>>
>>> select * from pg_stat_activity;
>>> select * from pg_stat_database;
>>> select * from pg_stat_bgwriter;
>>> select * from pg_stat_wal;
>>> select * from pg_settings;
>>> select * from pg_database;
>>>
>>> with the queries prepared, and executes them every 1 second, would
>>> there be any realistic danger or overhead that should be considered?
>>> My thinking is that the data for these catalogs is all in shared
>>> memory, so when the queries are executed serially they should not
>>> take any significant resources?
>>
>> Querying all of these except pg_database every 1 sec will probably
>> have zero impact on performance.
>>
>> I am not sure about pg_database - it is a very important table, and
>> your query can block operations that need an exclusive lock on it. So
>> theoretically, there can be some impact on performance.
>>
>> Regards
>>
>> Pavel
>>
>>> Thanks,
>>>
>>> *Frits Hoogland*
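p.s. On the pg_database concern: the freeze columns move slowly, so you
don't need second granularity for them. A minimal sketch of a query that
is fine to run once a minute or less often, using the standard age() and
mxid_age() functions:

    select datname,
           age(datfrozenxid)    as xid_age,
           mxid_age(datminmxid) as mxid_age
      from pg_database
     order by xid_age desc;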
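p.p.s. For ad-hoc second sampling you don't even need to write an
application: psql can repeat the current query buffer for you. For
example:

    select * from pg_stat_activity;
    \watch 1

\watch re-executes the last query every given number of seconds until it
is interrupted.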