We forgot to mention that we'll need to implement domains over enums and arrays of enums too.
cheers andrew Tom Dunstan wrote: > Hi guys > > Andrew and I got together and worked out a more detailed idea of how we > want to add enums to the postgresql core. This follows on from his > original enumkit prototype last year [1]. Here's a more formal proposal > / design with what we came up with. Comments / criticism hereby solicited. > > > How they will work (once created) is more or less the same as last time > with the enumkit, with the exception of how they're created. > > Enum types will be created with a specialised version of the CREATE TYPE > command thusly: > > CREATE TYPE rgb AS ENUM ('red', 'green', 'blue'); > > They can then be used as column types, being input in quoted string form > as with other user types: > > CREATE TABLE enumtest (col rgb); > INSERT INTO enumtest VALUES ('red'); > > Input is to be case sensitive, and ordering is to be in the definition > order, not the collation order of the text values (ie 'red' < 'green' in > the example above). See the original thread for more discussion and > usage examples. > > > The implementation will work as below. I've included something of a list > of stuff to do as well. > > On disk, enums will occupy 4 bytes: the high 22 bits will be an enum > identifier, with the bottom 10 bits being the enum value. This allows > 1024 values for a given enum, and 2^22 different enum types, both of > which should be heaps. The exact distribution of bits doesn't matter all > that much, we just picked some that we were comfortable with. > > The identifier is required as output functions are not fed information > about which exact type they are being asked to format (see below). > > The creation of a new pg_enum catalog is required. This will hold: > - the type OID for the enum, from pg_type > - the enum identifier for on disk storage > - the enum values in definition order, as an array of text values > > The CREATE TYPE command will create a row in pg_type and a row in > pg_enum. We will get a new enum id by scanning pg_enum and looking for > the first unused value, rather than using a sequence, to make reuse of > enum ids more predictable. > > Two new syscaches on pg_enum will be created to simplify lookup in the > i/o functions: one indexed by type oid for the input function, and one > indexed by enum id for the output function. > > All functions will be builtins; there will be no duplicate entries of > them in pg_proc as was required for the enumkit. > > The i/o functions will both cache enum info in the same way that the > domain and composite type i/o functions do, by attaching the data to the > fcinfo->flinfo->fn_extra pointer. The input function will look up the > enum data in the syscache using the type oid that it will be passed, and > cache it in a hashtable or binary tree for easy repeated lookup. The > output function will look up the enum data in the syscache using the > enum id stripped from the high 22 bits of the on-disk value and cache > the data as a straight array for easy access, with the enum value being > used as a index into the array. > > The other functions will all work pretty much like they did in the > enumkit, with comparison operators more or less treating the enum as its > integer representation. > > The grammar will have to be extended to support the new CREATE TYPE > syntax. This should not require making ENUM a reserved word. Likewise > psql will be extended to learn the new grammar. There's probably a bit > of work to do in DROP TYPE to make sure it deletes rows from pg_enum > when appropriate. > > pg_dump must be taught how to dump enums properly. > > We'll need some regression tests, maybe including one in one of the PL > testsuites to ensure that the io functions work happily when called from > a non-standard direction. > > Documentation etc. > > > General discussion: > > While we would really like to have had a 2 byte representation on disk > (or even 1 for most cases), with the stored value being *just* the enum > ordinal and not containing any type info about the enum type itself, > this is difficult. Since the output function cleanup [2] [3], postgresql > doesn't pass through the expected output type to output functions. This > makes it difficult to tell the difference between e.g. the first value > of the various enums, which would all have an integer representation of > 0. We could have gone down the path of having the output function look > up its expected type from the fcinfo->flinfo struct, as Martijn's tagged > types do [4], but that would have required extra entries in pg_proc for > every single enum. Alternatively we could have stored the full enum type > oid on disk, but that would have blown out the on-disk representation to > 5 or 6 bytes. The given approach of having a smaller enum id and the > enum ordinal value stored in the 4 bytes seems a reasonable tradeoff > given the current constraints. > > To preempt some questions (particularly some which came up in the > enumkit discussion), here's a list of stuff which will *not* be > implemented in the initial patch (and quite possibly never): > > - Support for ALTER TYPE to allow adding / modifying values etc. For > the time being you'll just have to create a new type, do a bunch of > ALTER TABLE commands, DROP the old type and rename the new one if you > want the old name back. > > - Inline column enum declarations a la MySQL. While this feature might > allow easier migration from MySQL, and we could theoretically do it by > creating an anonymous type when creating the table, the cleanup when the > column/table are dropped is a real problem, and pg_dump has to get a lot > smarter. Given the ugliness of suppporting something similar with SERIAL > columns [5], this is definitely not on the cards anytime soon. > > - Ordering by text value rather than the declaration order. If you > want this, you really want a varchar domain instead. Or alternately you > can order by e.g. colname::text if that does what you want. Doing > something like that sounds suspiciously like ordering something for > human consumption, though, which sounds like a really fast way to make > your application difficult to localize. Anyway, if that's the only > ordering you'll ever want, just define the values in alphabetical order. > :) > > - Access to the internal integer representation. If you need to modify > the values used or want to know what the integer is, use a lookup table > instead. Enums are the wrong abstraction for you. > > > Comments? Particularly on implementation strategy; the functionality was > thrashed out pretty well last time around. > > Cheers > > Tom "unholy chimera" Dunstan > > > [1] http://archives.postgresql.org/pgsql-hackers/2005-10/msg01243.php > [2] http://archives.postgresql.org/pgsql-hackers/2005-04/msg00998.php > [3] http://archives.postgresql.org/pgsql-hackers/2005-12/msg00454.php > [4] http://svana.org/kleptog/pgsql/taggedtypes.html > [5] http://archives.postgresql.org/pgsql-hackers/2006-04/msg01118.php > > ---------------------------(end of broadcast)--------------------------- TIP 6: explain analyze is your friend