After further thought about the mergejoinable-operators issue and some other longstanding planner problems, I have a modest proposal to make: we should invent the notion of "operator class groups", which identify sets of compatible operator classes. (I'm not wedded to the name "class group"; it seems a bit redundant. Anyone have a better idea?) We've touched on related ideas in past threads but never really put together a concrete proposal. Here is one.
A class group is associated with a specific index AM and can contain only opclasses for that AM. We might for instance invent "numeric" and "numeric_reverse" groups for btree, to contain the default opclasses and reverse-sort opclasses for the standard arithmetic types. We further require that any given opclass be a member of at most one class group (this simplifies matters, and there isn't any application I can see for one opclass being in more than one group), and that a class group contain at most one opclass for a given datatype (ditto). Recall that the member operators of an opclass all have the opclass' indexable datatype as their lefthand inputs, but the righthand input datatype can vary. These rules imply that any given operator can stand in at most one relationship to a class group: there is only one potential class member it could be in. It's still possible for an operator to appear in different groups (e.g., within both normal and reverse-sort groups). For btree opclasses, if two opclasses are in the same group, it means that they have compatible sort ordering and equality rules. In particular, for any "=" operators of member opclasses and values A, B, C of the appropriate datatypes, we require A = B and B = C implies A = C Likewise for "<" operators: A < B and B < C implies A < C and likewise for <=, >, >= (I think these cases are implied if the first two must hold). I believe these rules are sufficient to guarantee that any "=" operator in the opclass group can be mergejoined using "<" and ">" operators taken from the appropriate member opclasses. Therefore we can entirely eliminate the oprlsortop etc. fields of pg_operator and rely on looking into the opclass catalogs to determine how to do mergejoins. This is an improvement over the current situation (and also my proposal of yesterday) because it will be possible to support both ascending and descending mergejoins. In the current structure we can only associate one lsortop ("<") with any "=", but if there are reverse-sort opclasses in existence it would be nice to be able to perform a mergejoin using ">" as well. Furthermore, the class group structure provides a solid foundation for the planner's transitive-equality deductions, which are currently pretty bogus as discussed yesterday. Lastly, the group structure fixes a nasty problem in predicate testing: currently, we tend to fail to prove implications involving cross-datatype operators because we can't find operators that are in the same opclass and compare the two types we need to compare. With class groups, we can use any two operators in the same class group to make a deduction. (This also cleans up btree scankey analysis for cross-type situations; it has the same problem of not knowing how to determine which key values are lesser or greater.) For hash opclasses, placing two opclasses in the same group means that they have compatible hash functions. This allows us to support hashing of cross-type comparisons. As an example, "int4 eq int8" can be hashed if the int8 hash function guarantees that for any int8 value that could fit in int4, it will produce the same hash code as the int4 hash function does. (Which, in fact, our current integer hash functions are designed to do.) Currently we cannot exploit this because we have no way to identify which two hash functions to use. With opclass groups, we could put cross-type equality operators into hash opclasses the same way as we do for btree opclasses, and the rule would be to use the hash functions for the group's opclasses matching the left and right datatypes. I don't have any idea whether opclass groups would be useful for GiST or GIN indexes, but maybe Oleg and Teodor can think of applications. As far as syntax goes, I was envisioning CREATE/DROP OPERATOR CLASS GROUP name USING index_method; and adding an optional "GROUP name" clause to CREATE OPERATOR CLASS. We could imagine adding a lot of additional stuff such as being able to ALTER an existing opclass' group membership, but at least for the first cut I see no need for anything more than this. (In particular, since we'd probably only allow these commands to superusers, we needn't bother with a concept of ownership for class groups.) In the system catalogs, we'd need to add a "pg_opclassgroup" catalog with OIDs and columns for name, namespace, and index AM's OID; and pg_opclass would get a new column for the OID of the class's group, or 0 if not in any group (which really effectively means it's a group of one opclass). BTW, I'm also pretty strongly tempted to denormalize the catalogs a bit by adding group OID and index AM OID to individual pg_amop entries, thereby reducing lookup time when trying to identify an operator's properties. Currently we need an extra probe into pg_opclass to see which index AM the opclass belongs to, and with this proposal we'd also be looking at the group OID quite a lot. Comments? regards, tom lane ---------------------------(end of broadcast)--------------------------- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq