> On 10/15/2010 04:33 AM, Dean Rasheed wrote: >> >> I started looking at this last night, but ran out of time. I'll >> continue this evening / over the weekend.
Continuing my review of this patch... Usability review ---------------- What the patch does: This patch adds syntax to allow additional enum values to be added to an enum type, at any position in the list. The syntax has already been discussed and a broad consensus reached. To me the new syntax seems very logical and easy to use/remember. Do we want that? Yes, I think so. I can think of a few past projects where I could have used this. A possible future extension would be the ability to remove enum values, but that is likely to have additional complications, and I think the number of use cases for it is smaller. I see no reason to insist that it be part of this patch. Do we already have it? No. Does it follow SQL spec, or the community-agreed behavior? Not in the SQL spec, but it does follow agreed behaviour. Does it include pg_dump support (if applicable)? Yes. Are there dangers? None that I can think of. Have all the bases been covered? I just noticed that a couple of columns have been added to pg_type, so there's another bit documentation that needs updating. Feature test ------------ Does the feature work as advertised? Yes. Are there corner cases the author has failed to consider? None that I could find. Are there any assertion failures or crashes? Yes, I'm afraid I managed to provoke a crash by running the regression tests with -DCATCACHE_FORCE_RELEASE, after spotting a suspicious bit of code (see below). Performance review ------------------ Does the patch slow down simple tests? There is a documented performance penalty when working with enum values that are not in OID order. To attempt to measure this I created 2 tables, foo and bar, each with 800,000 rows. foo had an enum column using the planets enum from the regression test, with values not in OID order. bar had a similar enum column, but with values in the default OID order. Performance differences for comparison-heavy operations are most noticable: SELECT MAX(p) FROM foo; max --------- neptune (1 row) Time: 132.305 ms SELECT MAX(p) FROM bar; max --------- neptune (1 row) Time: 93.313 ms SELECT p FROM foo ORDER BY p OFFSET 500000 LIMIT 1; p -------- saturn (1 row) Time: 1516.725 ms SELECT p FROM bar ORDER BY p OFFSET 500000 LIMIT 1; p -------- saturn (1 row) Time: 1043.010 ms (optimised build, asserts off) That's a bigger performance hit than a I would have expected, even though it is documented. enum_ccmp() is using bsearch(), so there's a fair amount of function call overhead there, and perhaps for small enums, a simple linear scan would be faster. I'm not sure to what extent this is worth worrying about. Code review ----------- I'm not familar enough with the pg_upgrade code to really comment on it. Looking at AddEnumLabel() I found it a bit hard to follow, but near the end of the block used when BEFORE/AFTER is specified, it does this: ReleaseCatCacheList(list); /* are the labels sorted by OID? */ if (result && newelemorder > 1) result = newOid > HeapTupleGetOid(existing[newelemorder-2]); if (result && newelemorder < nelems + 1) result = newOid < HeapTupleGetOid(existing[newelemorder-1]); It looks to me as though 'existing[...]' is a pointer into a member of the list that's just been freed, so it risks reading freed memory. That seems to be confirmed by running the tests with -DCATCACHE_FORCE_RELEASE. Doing so causes a number of the tests to fail/crash, but I didn't dig any deeper to confirm that this was the cause. For the most part, this patch looks good, but I think there is still a bit of tidying up to do. Regards, Dean -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers