Hi,

On 2021-02-24 14:59:19 -0500, Greg Stark wrote:
> I guess what I would be looking for in stats would be a way to tell
> what the bandwidth, latency, and queue depth is. Ideally one day
> broken down by relation/index and pg_stat_statement record.

I think doing it at that granularity will likely be too expensive...

> I think seeing the actual in flight async requests in a connection is
> probably not going to be very useful in production.

I think it's good for analyzing concrete performance issues, but
probably not that much more. Although, it's not too hard to build
sampling based on top of it with a tiny bit of work (should display the
relfilenode etc).

> So number of async reads we've initiated, how many callbacks have been
> called, total cumulative elapsed time between i/o issued and i/o
> completed, total bytes of i/o initiated, total bytes of i/o completed.

Much of that is already in pg_stat_aio_backends - but is lost after
disconnect (easy to solve). We don't track bytes of IO currently, but
that'd not be hard.

However, it's surprisingly hard to do the measurement between "issued"
and "completed" in a meaningful way. It's obviously not hard to measure
the time at which the request was issued, but there's no real way to
determine the time at which it was completed. If a backend is busy doing
other things (e.g. invoke aggregate transition functions), we'll not see
the completion immediately, and therefore not have an accurate
timestamp.

With several methods of doing AIO we can set up signals that fire on
completion, but that's pretty darn expensive. And it's painful to write
such signal handlers in a safe way.

> I have some vague idea that we should have a generic infrastructure
> for stats that automatically counts things associated with plan nodes
> and automatically bubbles that data up to the per-transaction,
> per-backend, per-relation, and pg_stat_statements stats. But that's a
> whole other ball of wax :)

Heh, yea, let's tackle that separately ;)

Greetings,

Andres Freund