Tom Lane t...@sss.pgh.pa.us wrote:


> Hm. We do not have in-memory tables, although in some cases a temporary table 
> is close enough.

Yay, I didn't somehow overlook them.

> But there is one other pre-existing mechanism that might help you: "expanded 
> objects". The idea there is that you have some "flat" representation of your 
> data type that can go into a table at need, but you also have an in-memory 
> representation that is better suited to computation, and most of your 
> complicated operations prefer to work on the expanded representation. The 
> name of the game then becomes how to minimize the number of times a value 
> gets flattened and expanded as you push it around in a computation.
> 
> As of v18, we have a pretty decent story for that when it comes to values you 
> are manipulating within pl/pgsql functions, although less so if you need to 
> use other languages. (The expanded-object functionality exists before v18, 
> but it's hard for user-defined types to avoid excess flattening in earlier 
> versions.)
>
> If that sounds promising, see...

I don't yet have enough command of the internals to engage with the
complexities of Expanded Objects, which I had mentally filed under the TOAST
domain I was trying not to get too caught up in just yet.

I suspect that putting emphasis on the opacity of the aggregate value made it
unclear that I'd like values of my UDT to look a lot like ordinary
variable-length binary data values, for which the <, > and = operations apply
with no decoding required. I only need the sticky decoding for more involved
operations: aggregation, various set-type operations (including the simplistic
case of adding or removing a single scalar value rather than a set of size
one), and testing set membership of scalar index values with whichever of IN
or ANY is more appropriate. I implemented the first trivial / naïve attempts
in plpgsql, but that environment really wasn't conducive to getting the real
functions written. I once spoke C better than English (not a joke), though, so
I'm pretty confident I can and should do it there.
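
To make that concrete, for the cheap operators I picture something like this
untested sketch of a btree-style comparator that works directly on the flat
bytes (myset_cmp is a made-up name, and it assumes my flat encoding happens to
sort meaningfully byte-wise):

#include "postgres.h"
#include "fmgr.h"

PG_FUNCTION_INFO_V1(myset_cmp);

Datum
myset_cmp(PG_FUNCTION_ARGS)
{
    bytea  *a = PG_GETARG_BYTEA_PP(0);
    bytea  *b = PG_GETARG_BYTEA_PP(1);
    int     len_a = VARSIZE_ANY_EXHDR(a);
    int     len_b = VARSIZE_ANY_EXHDR(b);
    int     cmp;

    /* Compare the common prefix first, then break ties on length,
     * much as bytea's own comparator does.  No decoding anywhere. */
    cmp = memcmp(VARDATA_ANY(a), VARDATA_ANY(b), Min(len_a, len_b));
    if (cmp == 0 && len_a != len_b)
        cmp = (len_a < len_b) ? -1 : 1;

    PG_RETURN_INT32(cmp);
}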

Another, who shall not be named, suggested I take a look at the different
memory contexts, so I had a look. If I understood correctly, the User Defined
Type itself does store something analogous to the aggcontext in an aggregate,
but in the input and output functions, as well as in other operators and
functions on the UDT, I could define an internal memory structure, switch to a
memory context with a suitable lifespan, allocate and manage memory using the
appropriate functions along the lines of what StringInfo does, and remember to
switch back to the original memory context each function gets called in.
There's even a suggestion that UDT-specific custom memory contexts may be
created under TopMemoryContext using AllocSetContextCreate with some name or
identifier. I'd have to figure out what to use for such an identifier; maybe
the UDT instance already has a database or tuple ID that can be adopted for
that purpose. It looks like I'd have to find a way to broker between "normal
function memory contexts" and per-tuple aggregate memory contexts, but there
is language in the comments around that about what is and isn't kosher that I
don't know how to interpret.

I don't particularly trust the source. I've paraphrased the above but not all 
the terms mentioned could be found in the GitHub code base. So it could be 
utter bollocks, outdated information, never meant for public consumption, or 
entirely accurate and useful.
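
Be that as it may, to check whether I've got the mechanics straight, this is
roughly the shape I'd expect the allocation side to take (an untested sketch;
MySetState, myset_alloc_decoded and the context name are placeholders of my
own invention):

#include "postgres.h"
#include "utils/memutils.h"

/* Hypothetical decoded representation; the real one is TBD. */
typedef struct MySetState
{
    int32   nelems;
    Datum  *elems;
} MySetState;

static MemoryContext myset_context = NULL;

static MySetState *
myset_alloc_decoded(int32 nelems)
{
    MemoryContext oldcxt;
    MySetState *state;

    /* Create the long-lived private context on first use. */
    if (myset_context == NULL)
        myset_context = AllocSetContextCreate(TopMemoryContext,
                                              "myset decoded values",
                                              ALLOCSET_DEFAULT_SIZES);

    oldcxt = MemoryContextSwitchTo(myset_context);
    state = (MySetState *) palloc(sizeof(MySetState));
    state->nelems = nelems;
    state->elems = (Datum *) palloc(nelems * sizeof(Datum));
    MemoryContextSwitchTo(oldcxt);      /* back to the caller's context */

    return state;
}

Of course, anything hung off TopMemoryContext like that lives until I
explicitly delete it, which is exactly the lifespan question I ask below.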

Does hearing me mention such things cause you nightmares, either for the perils 
I'd be facing or the mess it is likely to cause in your beloved database, or 
could it be the start of a plan that just might work? 

Essentially the aggregate functions would still be front and centre as defined
for the user defined type, and though the user defined type itself would be
largely unaware of it, all the individual functions that manipulate values of
the UDT would go through the same process: get at the value in decoded form if
it already exists, and only call the decoding routines if it doesn't. If I
choose the right memory context, would that simply age out when the session,
transaction, query or aggregate is done, or what else would know we're done
with the memory so we can let go of it? As for transitioning between the
per-tuple aggregate context and normal function contexts, a plan can be
devised to copy stuff across in memory, provided the transition points can be
detected. It should be possible, e.g., to do that in a final function, even
without disturbing the value on the aggregate side of things, as advised
before.
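
Concretely, I imagine the final function doing something like this (untested;
it leans on the hypothetical MySetState from above and a made-up
myset_flatten() encoder that pallocs its result in the per-call context):

#include "postgres.h"
#include "fmgr.h"

typedef struct MySetState MySetState;   /* decoded form from earlier sketch */
extern bytea *myset_flatten(MySetState *state); /* hypothetical encoder */

PG_FUNCTION_INFO_V1(myset_final);

Datum
myset_final(PG_FUNCTION_ARGS)
{
    MySetState *state;
    bytea      *flat;

    /* Only meaningful when called as an aggregate final function. */
    if (!AggCheckCallContext(fcinfo, NULL))
        elog(ERROR, "myset_final called in non-aggregate context");

    if (PG_ARGISNULL(0))
        PG_RETURN_NULL();

    state = (MySetState *) PG_GETARG_POINTER(0);

    /* The flat copy is palloc'd in CurrentMemoryContext, i.e. the
     * per-call context, so it belongs to the caller while the decoded
     * state in the aggregate context is left undisturbed. */
    flat = myset_flatten(state);

    PG_RETURN_BYTEA_P(flat);
}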

You're forgiven for thinking I am crazy to consider manually manipulating
memory contexts simpler than the high-level support functions created to toast
and detoast values at the plpgsql level of abstraction. I'm a big fan of
failing early and hard when things (besides user input) aren't as expected,
and also a big fan of explicit programming, i.e. as few magical side effects
as possible.

The SuiteSparse:GraphBLAS stuff that sparked the conversation you referred to
is very likely to play a role in my project somewhere down the line. They
don't know the first thing about what I'm doing and why, I might not always
see eye to eye with all the players from the domains where their work has
found fertile ground, and I'm far from ready for such complications, but it
seems inevitable that our paths will cross at some point. Thanks for the
reference, however unintentional.

Regards,
Marthin Laubscher