On Wed, Jul 23, 2025 at 12:37 AM Tom Lane <t...@sss.pgh.pa.us> wrote: > > There have been some complaints recently about how jsonb_agg() > is a lot slower than json_agg() [1]. That's annoying considering > that the whole selling point of JSONB is to have faster processing > than the original JSON type, so I poked into that. What I found > is that jsonb_agg() and its variants are just really inefficiently > implemented. Basically, for each aggregate input value, they will: > > 1. Build a JsonbValue tree representation of the input value. > 2. Flatten the JsonbValue tree into a Jsonb in on-disk format. > 3. Iterate through the Jsonb, building a JsonbValue that is part > of the aggregate's state stored in aggcontext, but is otherwise > identical to what phase 1 built. > > The motivation for this seems to have been to make sure that any > memory leakage during phase 1 does not happen in the long-lived > aggcontext. But it's hard not to call it a Rube Goldberg contraption. > > The attached patch series gets rid of phases 2 and 3 by refactoring > pushJsonbValue() and related functions so that the JsonbValue tree > they construct can be constructed in a context that's not > CurrentMemoryContext. With that and some run-of-the-mill optimization > work, I'm getting 2.5X speedup for jsonb_agg on a text column (as > measured by the attached test script) and a bit over 2X on an int8 > column. It's still a little slower than json_agg, but no longer > slower by integer multiples. > > 0001 is a somewhat invasive refactoring of the API for > pushJsonbValue and friends. It doesn't in itself have any > measurable speed consequences as far as I can tell, but I think > it makes the code nicer in any case. (I really do not like the > existing coding setup where sometimes it's important to capture > the result of pushJsonbValue and sometimes it's not; that > seems awfully confusing and bug-prone.) The real point though > is to have a way of commanding pushJsonbValue to build the > JsonbValue tree somewhere other than CurrentMemoryContext. >
in v1-0001 static void copyScalarSubstructure(JsonbValue *v, MemoryContext outcontext) { } outcontext will always be NULL in 0001. that means copyScalarSubstructure is never really called. it also means that JsonbInState->MemoryContext was always as NULL in 00001 maybe we don't need JsonbInState->MemoryContext in 0001 So 0001 refactoring change would be less invasive. minor issue: copyScalarSubstructure (v->type) order aligned with (enum jbvType) would be great.