On Fri, 27 Dec 2024 15:15:40 +0100 "Jelte Fennema-Nio" <postg...@jeltef.nl> wrote:
> On Tue Dec 24, 2024 at 4:52 PM CET, Tom Lane wrote:
>> torikoshia <torikos...@oss.nttdata.com> writes:
>>> I have attached a PoC patch that modifies EXPLAIN to include page
>>> fault information during both the planning and execution phases of
>>> a query.
>>
>> Surely these numbers would be too unstable to be worth anything.
>
> What makes you think that? I'd expect them to be similarly stable to
> the numbers we get for BUFFERS, i.e. sure, they won't be completely
> stable, but I expect them to be quite helpful when debugging perf
> issues, because large numbers indicate that the query is disk-bound
> and small numbers indicate that it is not.
>
> These numbers seem especially useful for setups where shared_buffers
> is significantly smaller than the total memory available to the
> system. In those cases the output from BUFFERS might give the
> impression that you're disk-bound, but if your working set still
> fits into the OS cache then the number of page faults is likely still
> low, telling you that the numbers you get back from BUFFERS are not
> as big of a problem as they might seem.

We'd probably need to combine both pg_buffercache_evict() and
/proc/sys/vm/drop_caches to get stable numbers - which is something I
have done in the past for testing.

Another thought would be splitting out the IO timing information into
two values: IO timing for reads that triggered major faults, versus IO
timing for reads that did not. And system views like pg_stat_database
seem worth considering too.

-Jeremy
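P.S. For reference, the cold-cache setup mentioned above can be scripted roughly like this. This is a sketch, not a definitive recipe: it assumes a local cluster reachable via plain psql, the pg_buffercache extension installed, pg_buffercache_evict() available (PostgreSQL 17+, superuser only), and root access for drop_caches.

```shell
# Sketch: start an EXPLAIN test from cold caches.
# Assumes: local psql as a superuser, pg_buffercache extension,
# PostgreSQL 17+ (for pg_buffercache_evict), and sudo for drop_caches.

psql -c "CREATE EXTENSION IF NOT EXISTS pg_buffercache;"

# Evict every buffer currently in shared_buffers.
# pg_buffercache_evict() takes a buffer id and evicts that one buffer,
# so iterate over the pg_buffercache view.
psql -c "SELECT count(*) FROM (
           SELECT pg_buffercache_evict(bufferid) FROM pg_buffercache
         ) AS evicted;"

# Flush dirty pages to disk, then drop the kernel page cache,
# dentries, and inodes (value 3 = pagecache + slab objects).
sync
echo 3 | sudo tee /proc/sys/vm/drop_caches

# Now both shared_buffers and the OS cache are cold, so the page-fault
# counts from the proposed EXPLAIN output should be reproducible.
psql -c "EXPLAIN (ANALYZE, BUFFERS) SELECT 1;"
```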