On Sat, Feb 29, 2020 at 11:22 AM Jeff Janes <jeff.ja...@gmail.com> wrote:
> On Thu, Feb 27, 2020 at 11:33 AM Ben Snaidero <bensnaid...@geotab.com>
> wrote:
>
>> I have the following query that was on average running in ~2ms suddenly
>> jump up to on average ~25ms.
>
> What are you averaging over?  The plan you show us is slow enough that if
> you were averaging over the last 1000 executions, that one execution could
> skew the entire average just by itself.  When individual execution times
> can vary over 4 powers of 10, I don't think averages are a very good way
> of analyzing things.
>
>> This query is called millions of times per day and there were cases of
>> the query taking 20-30 seconds.  Below is the explain analyze of one such
>> example.
>> When seeing this issue, the server was under some CPU pressure but even
>> with that, I would not think it should get as slow as shown below as we
>> are using SSDs and none of the Windows disk counters (IOPS, queue length)
>> show any value that would be of concern.
>
> What is the average and expected random read latency on your SSDs?  Have
> you benchmarked them (outside of the database system) to see if they are
> performing as expected?
>
>> Rows Removed by Filter: 91686
>> Buffers: shared hit=12102 read=13259 written=111
>
> Do the faster executions have fewer rows removed by the filter (and fewer
> buffers read), or are they just faster despite having about the same
> values?
>
>> We took a perfview during the issue and below is the call stack from a
>> process running this query, two call paths are shown.
>
> I've never used perfview.  But if I try to naively interpret it similar
> to a gdb backtrace, it doesn't make much sense to me.  InitBufTable is
> only called by "postmaster" while starting the database, how could it be
> part of call paths during regular operations?  Are these views of the
> slow-running back end itself, or of some other postgresql process which
> was idle at the time the snapshot was taken?
>
> Cheers,
>
> Jeff

Query statistics were averaged over ~3 million calls, so I don't think a
single outlier would skew the results too much.

The perfview call stack is similar to a gdb backtrace. I am 99% sure that
this call path is from the backend running this query, as we queried
pg_stat_activity at the time of the perfview and cross-referenced the PIDs;
a sketch of that kind of cross-check is below. That said, I am going to try
building on Windows with debug symbols enabled and see if I can use gdb to
debug and confirm.
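For reference, the PID cross-check only needs a query along these lines
(a sketch using the standard pg_stat_activity columns; the LIKE pattern is
illustrative, not the exact text we matched on):

    -- sketch: list backends currently executing the query of interest,
    -- so their PIDs can be matched against the PIDs seen in the perfview
    SELECT pid, state, query_start, wait_event_type, wait_event
    FROM pg_catalog.pg_stat_activity
    WHERE state = 'active'
      AND query LIKE '%<text unique to the query in question>%';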