> On Jan 28, 2020, at 6:46 PM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> 
> Mark Dilger <mark.dil...@enterprisedb.com> writes:
>> While reviewing the partition-wise join patch, I ran into an issue that 
>> exists in master, so rather than responding to that patch, I’m starting this 
>> new thread.
>> I noticed that this seems similar to the problem that was supposed to have 
>> been fixed in the "Re: COLLATE: Hash partition vs UPDATE” thread.  As such, 
>> I’ve included Tom and Amit in the CC list.
> 
> Hm, I don't see any bug here.  You're asking it to join
> 
>>> CREATE TABLE alpha (a TEXT COLLATE "ja_JP", b TEXT COLLATE "sv_SE");
>>> CREATE TABLE beta (a TEXT COLLATE "tr_TR", b TEXT COLLATE "en_US");
> 
>>> SELECT t1.a, t2.a FROM alpha t1 INNER JOIN beta t2 ON (t1.a = t2.a) WHERE 
>>> t1.a IN ('äbç', 'ὀδυσσεύς');
> 
> so t1.a and t2.a have different collations, and the system can't resolve
> which to use for the comparison.
> 
> Now, I'd be the first to agree that this error could be reported better.
> The parser knows that it couldn't resolve a collation for t1.a = t2.a, but
> what it does *not* know is whether the '=' operator cares for collation.
> Throwing an error when the operator wouldn't care at runtime isn't going
> to make many people happy.  On the other hand, when the operator finally
> does run and can't get a collation, all it knows is that it didn't get a
> collation, not why.  So we can't produce an error message as specific as
> "ja_JP and tr_TR collations conflict".
> 
> Now that the collations feature has settled in, it'd be nice to go back
> and see if we can't improve that somehow.  Not sure how.
> 
> (BTW, before v12 the text '=' operator indeed did not care for collation,
> so this example would've worked.  But the change in behavior is a
> necessary consequence of having invented nondeterministic collations,
> not a bug.)

I contemplated that for a while before submitting the report.  I agree that for 
strings that are not binary equal, some collations might say the two strings 
are equal, and other collations may say that they are not.  But when does any 
collation say that a string is not equal to itself?  All the strings in these 
columns were loaded from the same source table, and they should always equal 
themselves, so the only problem I am aware of is if some of them equal others 
of them under one of the collations in question, where the other collation 
doesn’t think so.  I’m pretty sure that does not exist in this concrete example.

I guess I’m arguing that the system is giving up too soon, saying, “In theory 
there might be values I don’t know how to compare, so I’m going to give up now 
and not look”.

I think what is happening here is that the system thinks, “Hey, I can use a 
hash join for this”, and then later realizes, “Oh, no, I can’t” and instead of 
falling back to something other than hash join, it gives up.

Is there some more fundamental reason this query couldn’t correctly be 
completed?  I don’t mind being enlightened about the part that I’m missing.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Reply via email to