Am 05.08.2019 um 19:16 schrieb Andres Freund:
On 2019-07-28 06:20:40 +0000, Daniel Migowski wrote:
how do you want to generalize it? Are you thinking about a view solely
for the display of the memory usage of different objects?
I'm not quite sure. I'm just not sure that adding separate
infrastructure for various objects is a sutainable approach. We'd likely
want to have this for prepared statements, for cursors, for the current
statement, for various caches, ...

I think an approach would be to add an 'owning_object' field to memory
contexts, which has to point to a Node type if set. A table returning reporting
function could recursively walk through the memory contexts, starting at
TopMemoryContext. Whenever it encounters a context with owning_object
set, it prints a string version of nodeTag(owning_object). For some node
types it knows about (e.g. PreparedStatement, Portal, perhaps some of
the caches), it prints additional metadata specific to the type (so for
prepared statement it'd be something like 'prepared statement', '[name
of prepared statement]'), and prints information about the associated
context and all its children.
I understand. So it would be something like the output of MemoryContextStatsInternal, but in table form with some extra columns. I would have loved this extra information already in MemoryContextStatsInternal btw., so it might be a good idea to upgrade it first to find the information and wrap a table function over it afterwards.
The general context information probably should be something like:
context_name, context_ident,
context_total_bytes, context_total_blocks, context_total_freespace, 
context_total_freechunks, context_total_used, context_total_children
context_self_bytes, context_self_blocks, context_self_freespace, 
context_self_freechunks, context_self_used, context_self_children,

It might make sense to have said function return a row for the contexts
it encounters that do not have an owner set too (that way we'd e.g. get
CacheMemoryContext handled), but then still recurse.
A nice way to learn about the internals of the server and to analyze the effects of memory reducing enhancements.
Arguably the proposed owning_object field would be a bit redundant with
the already existing ident/MemoryContextSetIdentifier field, which
e.g. already associates the query string with the contexts used for a
prepared statement. But I'm not convinced that's going to be enough
context in a lot of cases, because e.g. for prepared statements it could
be interesting to have access to both the prepared statement name, and
the statement.
The identifier seems to be more like a category at the moment, because it does not seem to hold any relevant information about the object in question. So a more specific name would be nice.
The reason I like something like this is that we wouldn't add new
columns to a number of views, and lack views to associate such
information to for some objects. And it'd be disproportional to add all
the information to numerous places anyway.
I understand your argumentation, but things like Cursors and Portals are rather short living while prepared statements seem to be the place where memory really builds up.
One counter-argument is that it'd be more expensive to get information
specific to prepared statements (or other object types) that way. I'm
not sure I buy that that's a problem - this isn't something that's
likely going to be used at a high frequency. But if it becomes a
problem, we can add a function that starts that process at a distinct
memory context (e.g. a function that does this just for a single
prepared statement, identified by name) - but I'd not start there.
I also see no problem here, and with Konstantin Knizhnik's autoprepare I wouldn't use this very often anyway, more just for monitoring purposes, where I don't care if my query is a bit more complex.
While being interesting I still believe monitoring the mem usage of
prepared statements is a bit more important than that of other objects
because of how they change memory consumption of the server without
using any DDL or configuration options and I am not aware of other
objects with the same properties, or are there some? And for the other
volatile objects like tables and indexes and their contents PostgreSQL
already has it's information functions.
Plenty other objects have that property. E.g. cursors. And for the
catalog/relation/... caches it's even more pernicious - the client might
have closed all its "handles", but we still use memory (and it's
absolutely crucial for performance).

Maybe we can do both? Add a single column to pg_prepared_statements, and add another table for the output of MemoryContextStatsDetail? This has the advantage that the single real memory indicator useful for end users (to the question: How much mem takes my sh*t up?) is in pg_prepared_statements and some more intrinsic information in a detail view.

Thinking about the latter I am against such a table, at least in the form where it gives information like context_total_freechunks, because it would just be useful for us developers. Why should any end user care for how many chunks are still open in a MemoryContext, except when he is working on C-style extensions. Could just be a source of confusion for them.

Let's think about the goal this should have: The end user should be able to monitor the memory consumption of things he's in control of or could affect the system performance. Should such a table automatically aggregate some information? I think so. I would not add more than two memory columns to the view, just mem_used and mem_reserved. And even mem_used is questionable, because in his eyes only the memory he cannot use for other stuff because of object x is important for him (that was the reason I just added one column). He would even ask: WHY is there 50% more memory reserved than used, and how I can optimize it? (Would lead to more curious PostgreSQL developers maybe, so that's maybe a plus).

Something that also clearly speaks FOR such a table and against my proposal is, that if someone cares for memory, he would most likely care for ALL his memory, and in that case monitoring prepared statements would just be a small subset of stuff to monitor. Ok, I am defeated and will rewrite my patch if the next proposal finds approval:

I would propose a table pg_mem_usage containing the columns object_class, name, detail, mem_usage (rename them if it fits the style of the other tables more). The name would be empty for some objects like the unnamed prepared statement, the query strings would be in the detail column. One could add a final "Other" row containing the mem no specific output line has been accounted for. Also it could contain lines for Cursors and other stuff I am to novice to think of here.

And last: A reason why still we need a child-parent-relationship in this table (and distinct this_ and total_ mem functions), is that prepared statements start up to use much more memory when the Generic Plan is stored in it after a few uses. As a user I always had the assumption that prepared a statement would already do all the required work to be fast, but a statement just becomes blazingly fast when the Generic Plan is available (and used), and it would be nice to see for which statements that plan has already been generated to consume his memory. I believe the reason for this would be the fear of excessive memory usage.

On the other hand: The Generic Plan had been created for the first invocation of the prepared statement, why not store it immediatly. It is a named statement for a reason that it is intended to be reused, even when it is just twice, and since memory seems not to be seen as a scarce resource in this context why not store that immediately. Would drop the need for a hierarchy here also.

Any comments?

Regards,
Daniel Migowski



Reply via email to