Mark Dilger <mark.dil...@enterprisedb.com> writes: > According to Tom: >> (BTW, before v12 the text '=' operator indeed did not care for collation, >> so this example would've worked. But the change in behavior is a >> necessary consequence of having invented nondeterministic collations, >> not a bug.)
> I’m still struggling with that, because the four collations I used in > the example are all deterministic. I totally understand why having more > than one collation matters if you ask that your data be in sorted order, > as the system needs to know which ordering to use. But for equality, I > would think that deterministic collations are all interchangeable, > because they all agree on whether A = B, regardless of the collation > defined on column A and/or on column B. Maybe I’m wrong about that. Well, you're not wrong, but you're assuming much finer distinctions than the collation machinery actually makes (or than it'd be sane to ask it to make, IMO). We don't have a way to tell texteq that "well, we don't know what collation to assign to this operation, but it's okay to assume that it's deterministic". Nor does the parser have any way to know that texteq could be satisfied by that knowledge --- if it doesn't even know whether texteq cares about collation, how could it know that? There are other issues here too. Just because the query could theoretically be implemented without reference to any specific collation doesn't mean that that's a practical thing to do. It'd be unclear for instance whether we can safely use indexes that *do* have specific collations attached. We'd also lose the option to consider plans like mergejoins. If the parser understood that a particular operator behaved like text equality --- which it does not, and I guarantee you I will shoot down any proposal to hard-wire a parser test for that particular operator --- you could imagine assigning "C" collation when we have an unresolvable combination of deterministic collations for the inputs. That dodges the problem of not having any way to represent the situation. But it's still got implementation issues, in that such a collation choice probably won't match the collations of any indexes for the input columns. Another issue is that collations "bubble up" in the parse tree, so sneaking in a collation that's not supposed to be there per spec carries a risk of causing unexpected semantics further up. I think we could get away with that for the particular case of equality (which returns collation-less boolean), but this is another thing that makes the case narrower and less useful. In the end, TBH, I'm not finding your example compelling enough to be worth putting in weird hacks for such cases. If you're joining columns of dissimilar collations, you're going to be finding it necessary to specify what collation to use in a lot of places ... so where's the value in making a weird special case for equality? regards, tom lane