Tom Lane wrote:
Tom Dunstan <[EMAIL PROTECTED]> writes:
On disk, enums will occupy 4 bytes: the high 22 bits will be an enum
identifier, with the bottom 10 bits being the enum value. This allows
1024 values for a given enum, and 2^22 different enum types, both of
which should be heaps. The exact distribution of bits doesn't matter all that much, we just picked some that we were comfortable with.
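
To make the proposed layout concrete, here is a minimal C sketch of packing a 22-bit enum identifier and a 10-bit value into 4 bytes. The macro and function names are made up for illustration and are not taken from any patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names: low 10 bits hold the value, high 22 bits the enum id. */
#define ENUM_VALUE_BITS 10u
#define ENUM_VALUE_MASK ((1u << ENUM_VALUE_BITS) - 1u)  /* 0x3FF, 1024 values */
#define ENUM_ID_MAX     ((1u << 22) - 1u)               /* 2^22 enum types */

/* Combine an enum identifier and a value into one 4-byte datum. */
static uint32_t enum_pack(uint32_t enum_id, uint32_t value)
{
    assert(enum_id <= ENUM_ID_MAX);
    assert(value <= ENUM_VALUE_MASK);
    return (enum_id << ENUM_VALUE_BITS) | value;
}

/* Recover the enum identifier from the high 22 bits. */
static uint32_t enum_type_id(uint32_t packed)
{
    return packed >> ENUM_VALUE_BITS;
}

/* Recover the enum value from the low 10 bits. */
static uint32_t enum_value(uint32_t packed)
{
    return packed & ENUM_VALUE_MASK;
}
```

Moving the split point only changes ENUM_VALUE_BITS, which is why the exact distribution of bits is not critical.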


I think this is excessive concern for bit-shaving.  Make the on-disk
representation be 8 bytes instead of 4, then you can store the OID
directly and have no need for the separate identifier concept.  This
in turn eliminates one index, one syscache, and one set of lookup/cache
routines.  And you can have as many values of an enum as you darn please.

That's all true. It's a bit depressing to think that IMO 99% of users of this will have enum values whose range would fit into 1 byte, but we'll be using 8 to store it on disk. I had convinced myself that 4 was ok on the basis that alignment issues in surrounding columns would pad out the remaining bits anyway much of the time. Was I correct in that assumption? Would e.g. an int after a char require 3 bytes of padding?
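
As a rough analogy for the padding question, a C compiler lays out a struct with a char followed by a 4-byte int as sketched below. This is only an illustration: the on-disk tuple layout is governed by each type's declared alignment rather than by C struct rules, and the struct and function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static_assert(sizeof(int32_t) == 4, "int32_t is 4 bytes");

/* Hypothetical row shape: a 1-byte column followed by a 4-byte column. */
struct char_then_int
{
    char    c;
    int32_t i;
};

/* Bytes the compiler inserts between c and i to align i. */
static size_t padding_after_char(void)
{
    return offsetof(struct char_then_int, i) - sizeof(char);
}
```

On platforms where int32_t is 4-byte aligned, padding_after_char() comes out as 3, so a 1-byte value in that position would save nothing over a 4-byte one.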

Ok, I'll run one more idea up the flagpole before giving up on a 4-byte on-disk representation. :) How about assigning a unique 4-byte id to each enum value, and storing that on disk? This would be unique across the database, not per enum type. The structure of pg_enum would be a bit different, as the per-type enum id would be gone, and there would be multiple rows for each enum type. The columns would be: the type oid, the associated unique id and the textual representation. That would probably simplify the caching mechanism as well, since input function lookups could do a straight syscache lookup on type oid and text representation, and the output function could do a straight lookup on the unique id. No need to muck around creating a little dynahash or whatever to attach to the fn_extra pointer.

It does still require the extra syscache, but it removes the limitations on the number of enum types and the number of values per type while keeping the on-disk size smallish. I like that better than the original idea, actually.
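
A small sketch of the two lookups this would need, assuming a pg_enum with one row per value. The struct, the function names, the OIDs and the ids are all invented for illustration, and a linear scan stands in for what would really be syscache lookups.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical shape of one pg_enum row under this proposal. */
typedef struct EnumRow
{
    uint32_t    type_oid;   /* OID of the enum type this row belongs to */
    uint32_t    value_id;   /* database-wide unique id, stored on disk */
    const char *label;      /* textual representation */
} EnumRow;

/* Made-up contents: two enum types that share the label "red". */
static const EnumRow enum_rows[] = {
    {16384, 101, "red"},
    {16384, 102, "green"},
    {16390, 103, "red"},
    {16390, 104, "amber"},
};

#define N_ENUM_ROWS      (sizeof(enum_rows) / sizeof(enum_rows[0]))
#define INVALID_VALUE_ID 0u

/* Input direction: (type oid, label) -> unique id, or INVALID_VALUE_ID. */
static uint32_t enum_in_lookup(uint32_t type_oid, const char *label)
{
    assert(label != NULL);
    for (size_t i = 0; i < N_ENUM_ROWS; i++)
    {
        if (enum_rows[i].type_oid == type_oid &&
            strcmp(enum_rows[i].label, label) == 0)
            return enum_rows[i].value_id;
    }
    return INVALID_VALUE_ID;
}

/* Output direction: unique id -> label, or NULL if the id is unknown. */
static const char *enum_out_lookup(uint32_t value_id)
{
    for (size_t i = 0; i < N_ENUM_ROWS; i++)
    {
        if (enum_rows[i].value_id == value_id)
            return enum_rows[i].label;
    }
    return NULL;
}
```

Because the id is unique across the database, the output direction needs no type oid at all, while the input direction needs it to tell the two "red" rows apart.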


If you didn't notice already: typcache is the place to put any
type-related caching you need to add.

I hadn't. I'll investigate. Thanks.

Cheers

Tom

