Hi Tomas, Thank you for the review.
> > > 1) I read through the thread, and in general I agree with the reasoning > for removing the file part - it seems perfectly fine to just dump as > much as we can fit into a buffer, and then summarize the rest. But do we > need to invent a "new" limit here? The other places logging memory > contexts do something like this: > > MemoryContextStatsDetail(TopMemoryContext, 100, 100, false); > > Which means we only print the 100 memory contexts at the top, and that's > it. Wouldn't that give us a reasonable memory limit too? > > I think this prints more than 100 memory contexts, since 100 denotes the max_level and contexts at each level could have upto 100 children. This limit seems much higher than what I am currently storing in DSA which is approx. 7000 contexts. I will verify this again. > 2) I see the function got renamed to pg_get_process_memory_contexts(), > bu the comment still says pg_get_remote_backend_memory_contexts(). > > Fixed > > 3) I don't see any SGML docs for this new function. I was a bit unsure > what the "summary" argument is meant to do. The comment does not explain > that either. > > Added docs. Intention behind adding a summary argument is to report statistics of contexts at level 0 and 1 i.e TopMemoryContext and its immediate children. 4) I wonder if the function needs to return PID. I mean, the caller > knows which PID it is for, so it seems rather unnecessary. > > Perhaps it can be used to ascertain that the information indeed belongs to the requested pid. 5) In the "summary" mode, it might be useful to include info about how > many child contexts were aggregated. It's useful to know whether there > was 1 child or 10000 children. In the regular (non-summary) mode it'd > always be "1", probably, but maybe it'd interact with the limit in (1). > Not sure. > > Sure, I will add this in the next iteration. > > 6) I feel a bit uneasy about the whole locking / communication scheme. > In particular, I'm worried about lockups, deadlocks, etc. So I decided > to do a trivial stress-test - just run the new function through pgbench > with many clients. > > The memstats.sql script does just this: > > SELECT * FROM pg_get_process_memory_contexts( > (SELECT pid FROM pg_stat_activity > WHERE pid != pg_backend_pid() > ORDER BY random() LIMIT 1) > , false); > > where the inner query just picks a PID for some other backend, and asks > for memory context stats for that. > > And just run it like this on a scale 1 pgbench database: > > pgbench -n -f memstats.sql -c 10 test > > And it gets stuck *immediately*. I've seen it to wait for other client > backends and auxiliary processes like autovacuum launcher. > > This is absolutely idle system, there's no reason why a process would > not respond almost immediately. In my reproduction, this issue occurred because the process was terminated while the requesting backend was waiting on the condition variable to be signaled by it. I don’t see any solution other than having the waiting client backend timeout using ConditionVariableTimedSleep. In the patch, since the timeout was set to a high value, pgbench ended up stuck waiting for the timeout to occur. The failure happens less frequently after I added an additional check for the process's existence, but it cannot be entirely avoided. This is because a process can terminate after we check for its existence but before it signals the client. In such cases, the client will not receive any signal. I wonder if e.g. autovacuum launcher may > not be handling these requests, or what if client backends can wait in a > cycle. Did not see a cyclic wait in client backends due to the pgbench stress test. > > 7) I've also seen this error: > > pgbench: error: client 6 script 0 aborted in command 0 query 0: \ > ERROR: can't attach the same segment more than once > I haven't investigated it, but it seems like a problem handling errors, > where we fail to detach from a segment after a timeout. Thanks for the hint, fixed by adding a missing call to dsa_detach after timeout. > > > I opted for DSAs over DSMs to enable memory reuse by freeing > > segments for subsequent statistics copies of the same backend, > > without needing to recreate DSMs for each request. > > I feel like this might be a premature optimization - I don't have a > clear idea how expensive it is to create DSM per request, but my > intuition is that it's cheaper than processing the contexts and > generating the info. > > I'd just remove that, unless someone demonstrates it really matters. I > don't really worry about how expensive it is to process a request > (within reason, of course) - it will happen only very rarely. It's more > important to make sure there's no overhead when no one asks the backend > for memory context info, and simplicity. > > Also, how expensive it is to just keep the DSA "just in case"? Imagine > someone asks for the memory context info once - isn't it a was to still > keep the DSA? I don't recall how much resources could that be. > > I don't have a clear opinion on that, I'm more asking for opinions. Imagining a tool that periodically queries the backends for statistics, it would be beneficial to avoid recreating the DSAs for each call. Currently, DSAs of size 1MB per process (i.e., a maximum of 1MB * (MaxBackends + auxiliary processes)) would be created and pinned for subsequent reporting. This size does not seem excessively high, even for approx 100 backends and auxiliary processes. > 8) Two minutes seems pretty arbitrary, and also quite high. If a timeout > is necessary, I think it should not be hard-coded. > > Not sure which is the ideal value. Changed it to 15 secs and added a #define as of now. Something that gives enough time for the process to respond but does not hold up the client for too long would be ideal. 15 secs seem to be not enough for github CI tests, which fail with timeout error with this setting. PFA an updated patch with the above changes.
v6-Function-to-report-memory-context-stats-of-any-backe.patch
Description: Binary data