On Wed, Oct 02, 2024 at 11:12:37AM +0200, Benoit Lobréau wrote:
> My colleagues and I had a discussion about what could be done to improve
> parallelism observability in PostgreSQL [0]. We thought about several
> places to do it for several use cases.
>
> Guillaume Lelarge worked on pg_stat_statements [1].

Thanks, missed that. I will post a reply there.

There is a good overlap with everything you are doing here, because each one of you wishes to track more data in the executor state and push it to a different part of the system, be it a system view or just an extension. Tracking the number of workers launched and planned in the executor state is the strict minimum for a lot of these things, as far as I can see. Once the nodes are able to push this data, extensions can feed on it the way they want. So that's a good idea on its own, and it covers two of the counters posted here:
https://www.postgresql.org/message-id/CAECtzeWtTGOK0UgKXdDGpfTVSa5bd_VbUt6K6xn8P7X%2B_dZqKw%40mail.gmail.com

Could you split the patch based on that? I'd recommend moving es_workers_launched and es_workers_planned closer to the top, say near es_total_processed, and documenting what these counters are here for.

After that comes the problem of where to push this data.

> Lastly the number would be more precise/easier to make sense of, since
> pg_stat_statement has a limited size.

Upper bound that can be configured. When looking for query-level patterns or specific SET tuning, query-level data speaks more than the same data pushed at database level.

TBH, I am +-0 about pushing this data to pg_stat_database so that we would be able to tune database-level GUCs. That does not help with SET commands tweaking the number of workers to use. Well, perhaps few rely on SET and most rely on the system-level GUCs in their applications, meaning that I'm wrong, which would make your point about publishing this data at database level better, but I'm not really sure. If others have an opinion, feel free.

Anyway, what I am sure of is that publishing the same set of data everywhere leads to bloat, and I'd rather avoid that. Aggregating the query-level data to get an impression of the whole database also offers an equivalent of what would be stored in pg_stat_database, assuming that the load is steady.

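To illustrate the aggregation argument above, here is a minimal Python sketch of rolling per-query worker counters up to a per-database total. Everything in it is hypothetical: the StatementEntry record, its field names, and the sample numbers are invented for illustration and are not the actual pg_stat_statements or pg_stat_database columns.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class StatementEntry:
    """Hypothetical per-query record; not a real pg_stat_statements row."""
    dbname: str
    query: str
    workers_planned: int
    workers_launched: int


def aggregate_by_database(entries):
    """Sum the per-query counters to get one pair of totals per database."""
    totals = defaultdict(lambda: {"workers_planned": 0, "workers_launched": 0})
    for entry in entries:
        totals[entry.dbname]["workers_planned"] += entry.workers_planned
        totals[entry.dbname]["workers_launched"] += entry.workers_launched
    return dict(totals)


# Invented sample data, for illustration only.
entries = [
    StatementEntry("app", "SELECT ... FROM orders", 4, 4),
    StatementEntry("app", "SELECT ... FROM items", 2, 1),
    StatementEntry("reporting", "SELECT ... FROM facts", 8, 6),
]

totals = aggregate_by_database(entries)
for dbname, counters in sorted(totals.items()):
    print(dbname, counters)
```

The per-database totals are only as complete as the set of tracked queries, which is where the configurable upper bound and the steady-load assumption come in.
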
Your point about pg_stat_statements not being set is also true, even if some cloud vendors enable it by default. Table/index-level data can be really interesting because we can cross-check what's happening for more complex queries if there are many Gather nodes with complex JOINs.

Utilities (vacuum, btree, brin) are straightforward and best tracked at query level, making pg_stat_statements their best match. And there is no need for four counters if the data is pushed at this level: two are able to do the job, as utility and non-utility statements are separated depending on their PlannedStmt, leading to separate entries in PGSS.
--
Michael