Re: Adding column "mem_usage" to view pg_prepared_statements

Daniel Migowski Mon, 05 Aug 2019 13:47:36 -0700

On 05.08.2019 at 19:16, Andres Freund wrote:

On 2019-07-28 06:20:40 +0000, Daniel Migowski wrote:

how do you want to generalize it? Are you thinking about a view solely
for the display of the memory usage of different objects?

I'm not quite sure. I'm just not sure that adding separate
infrastructure for various objects is a sustainable approach. We'd likely
want to have this for prepared statements, for cursors, for the current
statement, for various caches, ...


I think an approach would be to add an 'owning_object' field to memory
contexts, which has to point to a Node type if set. A table-returning reporting
function could recursively walk through the memory contexts, starting at
TopMemoryContext. Whenever it encounters a context with owning_object
set, it prints a string version of nodeTag(owning_object). For some node
types it knows about (e.g. PreparedStatement, Portal, perhaps some of
the caches), it prints additional metadata specific to the type (so for
prepared statement it'd be something like 'prepared statement', '[name
of prepared statement]'), and prints information about the associated
context and all its children.
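
As a rough illustration of the walk described above, here is a minimal
Python sketch. It is a toy model, not PostgreSQL code: the Owner and
Context classes, the used_bytes field and the report() function are
assumptions made up for this example, and only the idea of an optional
owning object on each context comes from the proposal.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple


@dataclass
class Owner:
    # Hypothetical stand-in for a Node: a tag plus an optional object name.
    tag: str
    name: Optional[str] = None


@dataclass
class Context:
    # Toy memory context; `owner` models the proposed owning_object field.
    name: str
    used_bytes: int
    owner: Optional[Owner] = None
    children: List["Context"] = field(default_factory=list)


def subtree_bytes(ctx: Context) -> int:
    """Bytes used by a context and all of its descendants."""
    return ctx.used_bytes + sum(subtree_bytes(c) for c in ctx.children)


def report(ctx: Context) -> Iterator[Tuple[str, Optional[str], str, int]]:
    """Yield (node tag, object name, context name, subtree bytes) per owned context."""
    if ctx.owner is not None:
        yield (ctx.owner.tag, ctx.owner.name, ctx.name, subtree_bytes(ctx))
    # Keep recursing so owned contexts deeper in the tree are found too.
    for child in ctx.children:
        yield from report(child)


# Example tree with made-up sizes.
top = Context("TopMemoryContext", 100, children=[
    Context("CachedPlanSource", 2048, Owner("PreparedStatement", "stmt1"),
            [Context("CachedPlanQuery", 1024)]),
    Context("PortalContext", 512, Owner("Portal", "c1")),
])
owned_rows = list(report(top))
for row in owned_rows:
    print(row)
```

Contexts without an owner produce no row in this variant; they only
contribute to the subtree total of the nearest owned ancestor.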

I understand. So it would be something like the output of
MemoryContextStatsInternal, but in table form with some extra columns. I
would have loved this extra information already in
MemoryContextStatsInternal btw., so it might be a good idea to upgrade
it first to find the information and wrap a table function over it
afterwards.

The general context information probably should be something like:
context_name, context_ident,
context_total_bytes, context_total_blocks, context_total_freespace,
context_total_freechunks, context_total_used, context_total_children,
context_self_bytes, context_self_blocks, context_self_freespace,
context_self_freechunks, context_self_used, context_self_children
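
To make the difference between the self_ and total_ columns concrete,
here is a small Python sketch under stated assumptions. The Ctx class and
its numbers are invented for the example, "used" is taken to be bytes
minus free space, self_children is read as the number of direct children,
and total_children as the number of all descendants. None of this is
existing PostgreSQL code.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Ctx:
    # Toy memory context with the raw per-context counters.
    name: str
    ident: str
    nbytes: int
    blocks: int
    freespace: int
    freechunks: int
    children: List["Ctx"] = field(default_factory=list)


FIELDS = ("bytes", "blocks", "freespace", "freechunks", "used", "children")


def self_counters(ctx: Ctx) -> Dict[str, int]:
    """Counters for this context alone (assumed: used = bytes - freespace)."""
    return {
        "bytes": ctx.nbytes,
        "blocks": ctx.blocks,
        "freespace": ctx.freespace,
        "freechunks": ctx.freechunks,
        "used": ctx.nbytes - ctx.freespace,
        "children": len(ctx.children),
    }


def context_rows(ctx: Ctx) -> List[dict]:
    """Return one row per context in pre-order; the first row is ctx itself."""
    own = self_counters(ctx)
    total = dict(own)
    below: List[dict] = []
    for child in ctx.children:
        child_rows = context_rows(child)
        below.extend(child_rows)
        # The child's own row already holds the totals of its whole subtree.
        for f in FIELDS:
            total[f] += child_rows[0]["context_total_" + f]
    row = {"context_name": ctx.name, "context_ident": ctx.ident}
    row.update({"context_total_" + f: total[f] for f in FIELDS})
    row.update({"context_self_" + f: own[f] for f in FIELDS})
    return [row] + below


# Example with made-up numbers.
tree = Ctx("TopMemoryContext", "", 8192, 1, 1024, 2,
           [Ctx("CacheMemoryContext", "", 4096, 2, 512, 1)])
result = context_rows(tree)
for r in result:
    print(r["context_name"], r["context_self_bytes"], r["context_total_bytes"])
```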

It might make sense to have said function return a row for the contexts
it encounters that do not have an owner set too (that way we'd e.g. get
CacheMemoryContext handled), but then still recurse.

A nice way to learn about the internals of the server and to analyze the
effects of memory-reducing enhancements.

Arguably the proposed owning_object field would be a bit redundant with
the already existing ident/MemoryContextSetIdentifier field, which
e.g. already associates the query string with the contexts used for a
prepared statement. But I'm not convinced that's going to be enough
context in a lot of cases, because e.g. for prepared statements it could
be interesting to have access to both the prepared statement name, and
the statement.

The identifier seems to be more like a category at the moment, because
it does not seem to hold any relevant information about the object in
question. So a more specific name would be nice.

The reason I like something like this is that we wouldn't add new
columns to a number of views, and lack views to associate such
information to for some objects. And it'd be disproportional to add all
the information to numerous places anyway.

I understand your argument, but things like cursors and portals are
rather short-lived, while prepared statements seem to be the place where
memory really builds up.

One counter-argument is that it'd be more expensive to get information
specific to prepared statements (or other object types) that way. I'm
not sure I buy that that's a problem - this isn't something that's
likely going to be used at a high frequency. But if it becomes a
problem, we can add a function that starts that process at a distinct
memory context (e.g. a function that does this just for a single
prepared statement, identified by name) - but I'd not start there.

I also see no problem here, and with Konstantin Knizhnik's autoprepare I
wouldn't use this very often anyway, more just for monitoring purposes,
where I don't care if my query is a bit more complex.

While being interesting I still believe monitoring the mem usage of
prepared statements is a bit more important than that of other objects
because of how they change memory consumption of the server without
using any DDL or configuration options and I am not aware of other
objects with the same properties, or are there some? And for the other
volatile objects like tables and indexes and their contents PostgreSQL
already has its information functions.

Plenty of other objects have that property. E.g. cursors. And for the
catalog/relation/... caches it's even more pernicious - the client might
have closed all its "handles", but we still use memory (and it's
absolutely crucial for performance).

Maybe we can do both? Add a single column to pg_prepared_statements, and
add another table for the output of MemoryContextStatsDetail? This has
the advantage that the single real memory indicator useful for end users
(answering the question: how much memory does my sh*t take up?) is in
pg_prepared_statements, with some more intrinsic information in a detail
view.

Thinking about the latter, I am against such a table, at least in the
form where it gives information like context_total_freechunks, because
it would only be useful for us developers. Why should any end user care
how many chunks are still open in a MemoryContext, unless he is working
on C extensions? It could just be a source of confusion for them.

Let's think about the goal this should have: the end user should be able
to monitor the memory consumption of things he is in control of or that
could affect system performance. Should such a table automatically
aggregate some information? I think so. I would not add more than two
memory columns to the view, just mem_used and mem_reserved. And even
mem_used is questionable, because in his eyes only the memory he cannot
use for other stuff because of object x matters (that was the reason I
added just one column). He would even ask: WHY is there 50% more memory
reserved than used, and how can I optimize it? (That might lead to more
curious PostgreSQL developers, so maybe that's a plus.)

Something that also clearly speaks FOR such a table and against my
proposal is that if someone cares about memory, he would most likely
care about ALL his memory, and in that case monitoring prepared
statements would be just a small subset of the stuff to monitor. OK, I
am defeated and will rewrite my patch if the next proposal finds
approval:

I would propose a table pg_mem_usage containing the columns
object_class, name, detail, mem_usage (rename them if it fits the style
of the other tables better). The name would be empty for some objects
like the unnamed prepared statement; the query strings would be in the
detail column. One could add a final "Other" row containing the memory
that no specific output line has accounted for. It could also contain
lines for cursors and other stuff I am too novice to think of here.
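
To show what such a result set could look like, here is a short Python
sketch. Only the four column names come from the proposal; the
MemUsageRow type, the with_other_row() helper, the sample rows and the
backend total are hypothetical, and the empty name is modelled as None.

```python
from typing import List, NamedTuple, Optional


class MemUsageRow(NamedTuple):
    # Columns of the proposed pg_mem_usage table.
    object_class: str
    name: Optional[str]      # None for e.g. the unnamed prepared statement
    detail: Optional[str]    # e.g. the query string
    mem_usage: int           # bytes


def with_other_row(rows: List[MemUsageRow], total_bytes: int) -> List[MemUsageRow]:
    """Append a final 'Other' row for memory no specific row accounts for."""
    accounted = sum(r.mem_usage for r in rows)
    return list(rows) + [MemUsageRow("Other", None, None, total_bytes - accounted)]


# Made-up sample data: two prepared statements and one cursor.
sample = [
    MemUsageRow("prepared statement", "stmt1", "SELECT 1", 3072),
    MemUsageRow("prepared statement", None, "SELECT 2", 1024),
    MemUsageRow("cursor", "c1", None, 512),
]
table = with_other_row(sample, total_bytes=10000)
for r in table:
    print(r)
```

With this shape, summing mem_usage over all rows always gives the total
for the backend, which is what someone monitoring ALL memory would want.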

And last: a reason why we still need a child-parent relationship in this
table (and distinct this_ and total_ mem functions) is that prepared
statements start to use much more memory once the generic plan is stored
in them after a few uses. As a user I always assumed that preparing a
statement would already do all the work required to be fast, but a
statement only becomes blazingly fast when the generic plan is available
(and used), and it would be nice to see for which statements that plan
has already been generated and is consuming memory. I believe the reason
for this would be the fear of excessive memory usage.

On the other hand: the generic plan has been created for the first
invocation of the prepared statement, so why not store it immediately?
A statement is named for a reason: it is intended to be reused, even if
just twice, and since memory does not seem to be seen as a scarce
resource in this context, why not store the plan immediately? That would
also drop the need for a hierarchy here.


Any comments?

Regards,
Daniel Migowski
