Hi,

On 2021-08-04 09:31:25 -0400, Robert Haas wrote:
> This is pretty integer-centric, though. If your pass-by-value type is
> storing timestamps, for example, they're not likely to be especially
> close to zero. Since a 64-bit address is pretty big, perhaps they're
> still close enough to zero that this will work out to a win, but I
> don't know, that seems a bit cheesy.
Yea, that's fair. The really bad™ example probably is negative numbers -
which wouldn't be easy to do something about in a datatype-agnostic way.

> I grant that it could work out to a win -- pass-by-value data types whose
> distribution is very different from what's typical for integers, or for that
> matter columns full of integers that all happen to be toward the extreme
> values the data type can store, are probably not that common.

It'd work out as a wash for common timestamps:

./varint_test -u 681413261095983
processing unsigned
unsigned: 681413261095983
input bytes: 00 02 6b bd e3 5f 74 2f
8 output bytes: 01 02 6b bd e3 5f 74 2f
decoded: 681413261095983

I don't think there are many workloads where plain integers would skew
extreme enough for this to come out as a loss often enough to matter.
But:

> I just don't really like making such assumptions on a system-wide basis (as
> opposed to a per-datatype basis where it's easier to reason about the
> consequences).

I'd not at all be opposed to datatypes having influence over the on-disk
encoding. I was just musing about a default heuristic that could make
sense. I do think you'd want something that chooses the encoding for one
pg_attribute's values based on preceding columns.
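To make the negative-number problem concrete, here's a minimal sketch -
a plain continuation-bit varint plus the "zigzag" signed mapping that
e.g. Protocol Buffers uses, *not* the encoding from this patch. The
sign-extended bit pattern of -1 needs the maximum ten bytes, while a
zigzag mapping gets it down to one:

/*
 * Sketch only: LEB128-style varint + zigzag signed mapping (as in
 * Protocol Buffers); not the pg_varint encoding discussed upthread.
 */
#include <stdint.h>
#include <stdio.h>

/* 7 payload bits per byte, high bit set on all but the last byte */
static int
varint_encode_u64(uint64_t v, uint8_t *buf)
{
	int			len = 0;

	do
	{
		uint8_t		b = v & 0x7F;

		v >>= 7;
		if (v != 0)
			b |= 0x80;			/* more bytes follow */
		buf[len++] = b;
	} while (v != 0);

	return len;
}

/* map 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ... */
static uint64_t
zigzag(int64_t v)
{
	/* arithmetic right shift assumed, as everywhere postgres runs */
	return ((uint64_t) v << 1) ^ (uint64_t) (v >> 63);
}

int
main(void)
{
	uint8_t		buf[10];

	/* -1 sign-extends to 0xFFFFFFFFFFFFFFFF: worst case, 10 bytes */
	printf("raw    -1: %d bytes\n", varint_encode_u64((uint64_t) -1, buf));
	/* zigzag(-1) == 1: one byte */
	printf("zigzag -1: %d bytes\n", varint_encode_u64(zigzag(-1), buf));

	return 0;
}

The catch, of course, is that zigzag only helps if the encoder knows the
column holds a signed integer - exactly the kind of per-datatype
knowledge you're arguing for.

Greetings,

Andres Freund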