Thanks for looking at this again!

On 3/4/20 1:33 PM, Alvaro Herrera wrote:
I came across an interesting thing, namely multirange_canonicalize()'s
use of qsort_arg with a callback of range_compare().  range_compare()
calls range_deserialize() (non-trivial parsing) for each input range;
multirange_canonicalize() later does a few extra deserialize calls of
its own.  Call me a premature optimization guy if you will, but I think
it makes sense to have a different struct (let's call it
"InMemoryRange") which stores the parsed representation of each range;
then we can deserialize all ranges up front, and use that as many times
as needed, without having to deserialize each range every time.

I don't know, this sounds like a drastic change. I agree that multirange_deserialize and range_deserialize do a lot of copying (not really any parsing though, and they both assume their inputs are already de-TOASTED). But they are used very extensively, so if you wanted to remove them you'd have to rewrite a lot.

I interpreted the intention of range_deserialize to be a way to keep the range struct fairly "private" and give a standard interface to extracting its attributes. Its motive seems akin to deconstruct_array. So I wrote multirange_deserialize to follow that principle. Both functions also handle memory alignment issues for you. With multirange_deserialize, there isn't actually much structure (just the list of ranges), so perhaps you could more easily omit it and give callers direct access into the multirange contents. That still seems risky though, and less well encapsulated.

My preference would be to see if these functions are really a performance problem first, and only redo the in-memory structures if they are. Also that seems like something you could do as a separate project. (I wouldn't mind working on it myself, although I'd prefer to do actual temporal database features first.) There are no backwards-compatibility concerns to changing the in-memory structure, right? (Even if there are, it's too late to avoid them for ranges.)

While I'm at this, why not name the new file simply multiranges.c
instead of multirangetypes.c?

As someone who doesn't do a lot of Postgres hacking, I tried to follow the approach in rangetypes.c as closely as I could, especially for naming things. So I named the file multirangetypes.c because there was already rangetypes.c. But also I can see how the "types" emphasizes that ranges and multiranges are not concrete types themselves, but more like abstract data types or generics (like arrays).

Yours,

--
Paul              ~{:-)
p...@illuminatedcomputing.com


Reply via email to