On Mon, 2023-11-13 at 22:36 +0100, Tomas Vondra wrote: > Yeah. I don't quite agree with the initial argument that not > specifying > the collation explicitly in CREATE TABLE or a query means the user > does > not care about the collation.
I didn't argue that the user doesn't care about collation -- we need to honor the collation semantics of the column. And a column with unspecified collation must be the database collation (otherwise what would the database collation mean?). But the index's collation is an implementation detail that is not necessary to provide the requested semantics. I'm arguing that pathkeys are often not even useful for providing the requested semantics, so why should the user have the pain of poor performance and versioning risks for every text index in their system? If the user just wants PK/FK constraints, and equality lookups, then an index with the "C" collation makes a lot of sense to serve those purposes. > For example, I don't see how we could arbitrarily override the > collation > for indexes backing primary keys, because how would you know the user > will never do a sort on it? The column collation and index collation are tracked separately in the catalog. The column collation cannot be overridden because it's semantically signficant, but there are at least some degrees of freedom we have with the index collation. I don't think we can completely change the default index collation to be "C", but perhaps there could be a database-level option to do so, and that would have no effect on semantics at all. If the user notices some queries that could benefit from an index with a non-"C" collation, they can add/replace an index as they see fit. > Not that uncommon with natural primary keys, > I think (not a great practice, but people do that). Natural keys often have an uncorrelated index, and if the index is not correlated, it's often not useful ORDER BY. When I actually think about schemas and plans I've seen in the wild, I struggle to think of many cases that would really benefit from an index in a non-"C" collation. The best cases I can think of are where it's doing some kind of prefix search. That's not rare, but it's also not so common that I'd like to risk index corruption on every index in the system by default in case a prefix search is performed. > Perhaps we could allow the PK index to have a different collation, > say > by supporting something like this: > > ALTER TABLE distributors ADD PRIMARY KEY (dist_id COLLATE "C"); Yes, I'd like something like that to be supported. We'd have to check that, if the collations are different, that both are deterministic. > And then the planner would just pick the right index, I think. Right now the planner doesn't seem to understand that an index in the "C" collation works just fine for answering equality queries. That should be fixed. > If the > user cares about ordering, they'll specify ORDER BY with either an > explicit or the default collation. If the index collation matches, it > may be useful for the ordering. Exactly. > Of course, if we feel entitled to create the primary key index with a > collation of our choosing, that'd make this unpredictable. I wouldn't describe it as "unpredictable". We'd have some defined way of defaulting the collation of an index which might be affected by a database option or something. In any case, it would be visible with \d. Regards, Jeff Davis >