Hi,

On 2021-08-04 09:31:25 -0400, Robert Haas wrote:
> This is pretty integer-centric, though. If your pass-by-value type is
> storing timestamps, for example, they're not likely to be especially
> close to zero. Since a 64-bit address is pretty big, perhaps they're
> still close enough to zero that this will work out to a win, but I
> don't know, that seems a bit cheesy.
Yea, that's fair. The really bad™ example probably is negative numbers -
which wouldn't be easy to do something about in a datatype-agnostic way.

> I grant that it could work out to a win -- pass-by-value data types whose
> distribution is very different from what's typical for integers, or for that
> matter columns full of integers that all happen to be toward the extreme
> values the data type can store, are probably not that common.

It'd work out as a wash for common timestamps:

./varint_test -u 681413261095983
processing unsigned
unsigned: 681413261095983
input bytes: 00 02 6b bd e3 5f 74 2f
8 output bytes: 01 02 6b bd e3 5f 74 2f
decoded: 681413261095983

I don't think there are many workloads where plain integers would skew
extreme enough for this to come out as a loss often enough to matter.
But:

> I just don't really like making such assumptions on a system-wide basis (as
> opposed to a per-datatype basis where it's easier to reason about the
> consequences).

I'd not at all be opposed to datatypes having influence over the on-disk
encoding. I was just musing about a default heuristic that could make
sense. I do think you'd want something that chooses the encoding for one
pg_attribute's values based on preceding columns.
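To make the negative-number problem concrete, here's a minimal sketch -
a plain continuation-bit varint plus the "zigzag" signed mapping that
e.g. Protocol Buffers uses, *not* the encoding from this patch. The
sign-extended bit pattern of -1 needs the maximum ten bytes, while a
zigzag mapping gets it down to one:

/*
 * Sketch only: LEB128-style varint + zigzag signed mapping (as in
 * Protocol Buffers); not the pg_varint encoding discussed upthread.
 */
#include <stdint.h>
#include <stdio.h>

/* 7 payload bits per byte, high bit set on all but the last byte */
static int
varint_encode_u64(uint64_t v, uint8_t *buf)
{
	int			len = 0;

	do
	{
		uint8_t		b = v & 0x7F;

		v >>= 7;
		if (v != 0)
			b |= 0x80;			/* more bytes follow */
		buf[len++] = b;
	} while (v != 0);

	return len;
}

/* map 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ... */
static uint64_t
zigzag(int64_t v)
{
	/* arithmetic right shift assumed, as everywhere postgres runs */
	return ((uint64_t) v << 1) ^ (uint64_t) (v >> 63);
}

int
main(void)
{
	uint8_t		buf[10];

	/* -1 sign-extends to 0xFFFFFFFFFFFFFFFF: worst case, 10 bytes */
	printf("raw    -1: %d bytes\n", varint_encode_u64((uint64_t) -1, buf));
	/* zigzag(-1) == 1: one byte */
	printf("zigzag -1: %d bytes\n", varint_encode_u64(zigzag(-1), buf));

	return 0;
}

The catch, of course, is that zigzag only helps if the encoder knows the
column holds a signed integer - exactly the kind of per-datatype
knowledge you're arguing for.

Greetings,

Andres Freund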