I've been testing the new UUID functionality in 8.3dev and noticed that UUIDs are sorted using memcmp in their default in-memory layout, which is:
struct uuid { uint32_t time_low; uint16_t time_mid; uint16_t time_hi_and_version; uint8_t clock_seq_hi_and_reserved; uint8_t clock_seq_low; uint8_t node[_UUID_NODE_LEN]; }; When done that way, you're going to see a lot of index B-tree fragmentation with even DCE 1.1 (ISO/IEC 11578:1996) time based UUIDs, as described above. With random (version 4) or hashed based (version 3 or 5) UUIDs there's nothing that can be done to improve the situation, obviously. So I went down the path of changing the pgsql sorting order to instead sort by, from most significant to least: 1) Node (MAC address), 2) clock sequence, then 3) time. The implementation is as follows: /* internal uuid compare function */ static int uuid_internal_cmp(const pg_uuid_t *arg1, const pg_uuid_t *arg2) { int result; /* node */ if ((result = memcmp(&arg1->data[10], &arg2->data[10], 6)) != 0) return result; /* clock_seq_hi_and_reserved, clock_seq_low */ if ((result = memcmp(&arg1->data[8], &arg2->data[8], 2)) != 0) return result; /* time_hi_and_version */ if ((result = memcmp(&arg1->data[6], &arg2->data[6], 2)) != 0) return result; /* time_mid */ if ((result = memcmp(&arg1->data[4], &arg2->data[4], 2)) != 0) return result; /* time_low */ return memcmp(&arg1->data[0], &arg2->data[0], 4); } This results in much less fragmentation and reduced page hits when indexing a UUID column. When multiple UUID generators with different node values contribute to a single table concurrently, it should also result in better performance than if they sorted the way they do now or by time first. Sorting UUIDs when they are random/hashed with memcmp seems pretty darn useless in all scenarios and performs poorly on indexes. This method is equally poor with random/hashed UUIDs, but much better with version 1 time based UUIDs. What do you guys think about changing the default behavior of pgsql to compare UUIDs this way? -- Robert