Re: WIP Incremental JSON Parser

Andrew Dunstan Wed, 03 Jan 2024 07:00:06 -0800


On 2024-01-03 We 08:45, Robert Haas wrote:

On Wed, Jan 3, 2024 at 6:57 AM Andrew Dunstan <and...@dunslane.net> wrote:

Yeah. One idea I had yesterday was to stash the field names, which in
large JSON docs tend to be pretty repetitive, in a hash table instead of
pstrduping each instance. The name would be valid until the end of the
parse, and would only need to be duplicated by the callback function if
it were needed beyond that. That's not the case currently with the
parse_manifest code. I'll work on using a hash table.

IMHO, this is not a good direction. Anybody who is parsing JSON
probably wants to discard the duplicated labels and convert other
heavily duplicated strings to enum values or something. (e.g. if every
record has {"color":"red"} or {"color":"green"}). So the hash table
lookups will cost but won't really save anything more than just
freeing the memory not needed, but will probably be more expensive.



I don't quite follow.

Say we have a document with an array of 1m objects, each with a field called "color". As it stands we'll allocate space for that field name 1m times. Using a hash table we'd allocate space for it once. And allocating the memory isn't free, although it might be cheaper than doing hash lookups.

I guess we can benchmark it and see what the performance impact of using a hash table might be.
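
To make the comparison concrete, here is a minimal standalone sketch of the hash table idea: each distinct field name is copied once and later occurrences get the shared pointer back. It uses plain malloc/free rather than palloc/pstrdup, and every name in it (InternTable, intern_name, intern_free, the fixed bucket count) is hypothetical, not anything in the parser or in parse_manifest.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define INTERN_BUCKETS 256		/* hypothetical fixed size, never resized */

typedef struct InternEntry
{
	struct InternEntry *next;	/* chain of entries sharing a bucket */
	char	   *name;			/* the single stored copy of the name */
} InternEntry;

typedef struct InternTable
{
	InternEntry *buckets[INTERN_BUCKETS];
	size_t		nallocs;		/* how many names were actually copied */
} InternTable;

/* FNV-1a string hash; any reasonable hash function would do here. */
unsigned int
intern_hash(const char *s)
{
	unsigned int h = 2166136261u;

	for (; *s != '\0'; s++)
	{
		h ^= (unsigned char) *s;
		h *= 16777619u;
	}
	return h;
}

/*
 * Return a shared copy of "name" that stays valid until intern_free().
 * Only the first occurrence of a name allocates. Returns NULL on OOM.
 */
const char *
intern_name(InternTable *tab, const char *name)
{
	unsigned int b;
	InternEntry *e;

	assert(tab != NULL && name != NULL);
	b = intern_hash(name) % INTERN_BUCKETS;

	for (e = tab->buckets[b]; e != NULL; e = e->next)
	{
		if (strcmp(e->name, name) == 0)
			return e->name;		/* seen before: lookup cost only */
	}

	e = malloc(sizeof(InternEntry));
	if (e == NULL)
		return NULL;
	e->name = malloc(strlen(name) + 1);
	if (e->name == NULL)
	{
		free(e);
		return NULL;
	}
	strcpy(e->name, name);
	e->next = tab->buckets[b];
	tab->buckets[b] = e;
	tab->nallocs++;
	return e->name;
}

/* Release every stored name, e.g. at the end of the parse. */
void
intern_free(InternTable *tab)
{
	size_t		i;

	for (i = 0; i < INTERN_BUCKETS; i++)
	{
		InternEntry *e = tab->buckets[i];

		while (e != NULL)
		{
			InternEntry *next = e->next;

			free(e->name);
			free(e);
			e = next;
		}
		tab->buckets[i] = NULL;
	}
	tab->nallocs = 0;
}
```

With this, interning "color" for each of the 1m objects does one string copy and 1m hash lookups, which is exactly the trade-off a benchmark would have to measure against 1m allocations.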

Another possibility would be simply to have the callback free the field name after use. For the parse_manifest code that could be a one-line addition to the code at the bottom of json_object_manifest_field_start().
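
As a rough illustration of that alternative, the sketch below shows a field-start callback that maps the name to an enum value and then frees the name immediately. It is standalone C with free() standing in for pfree(); ParseState, FieldKind, field_start and dup_name are hypothetical names, and it assumes the parser hands the callback ownership of a freshly allocated copy of the name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical set of fields the caller cares about. */
typedef enum FieldKind
{
	FIELD_UNKNOWN,
	FIELD_COLOR,
	FIELD_SIZE
} FieldKind;

typedef struct ParseState
{
	FieldKind	current_field;	/* what the next value belongs to */
	size_t		names_freed;	/* bookkeeping for this example only */
} ParseState;

/* Stand-in for the parser duplicating a field name before the callback. */
char *
dup_name(const char *s)
{
	char	   *copy = malloc(strlen(s) + 1);

	if (copy != NULL)
		strcpy(copy, s);
	return copy;
}

/*
 * Field-start callback: assumed to own "fname". It records which field
 * was seen as an enum value, after which the string is no longer needed.
 */
void
field_start(ParseState *state, char *fname)
{
	assert(state != NULL);
	if (fname == NULL)
		return;					/* allocation failed upstream */

	if (strcmp(fname, "color") == 0)
		state->current_field = FIELD_COLOR;
	else if (strcmp(fname, "size") == 0)
		state->current_field = FIELD_SIZE;
	else
		state->current_field = FIELD_UNKNOWN;

	/* The one-line addition: release the name once it has been matched. */
	free(fname);
	state->names_freed++;
}
```

This keeps peak memory flat without any lookup structure, at the cost of one allocate/free pair per field.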

The parse_manifest code does seem to pfree the scalar values it no
longer needs fairly well, so maybe we don't need to do anything there.

Hmm. This makes me wonder if you've measured how much actual leakage there is?

No, I haven't. I have simply theorized about how much memory we might consume if nothing were done by the callers to free the memory.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
