On 11.03.2016 15:45, Alexander Barkov wrote: > FYI, I have added a new task for this: > > https://jira.mariadb.org/browse/MDEV-9711 >
Alexander, not sure if my question makes any sense, but what about the _cs collations? Like latin1_general_cs, latin2_czech_cs? Don't we like to have these, too? Regards, Binarus > > On 03/11/2016 03:20 PM, Alexander Barkov wrote: >> Hello Binarus, Kristian, >> >> On 03/11/2016 02:25 PM, Kristian Nielsen wrote: >>> Binarus <li...@binarus.de> writes: >>> >>>> "All MySQL collations are of type PADSPACE. This means that all CHAR, >>>> VARCHAR, and TEXT values in MySQL are compared without regard to any >>>> trailing spaces. “Comparison” in this context does not include the >>> >>> Yes, I have always found this terminally stupid as well. But I think it >>> comes from the SQL standard. >> >> Probably this comes from here: >> >>> 2) A <standard character set name> specifies the name of a character set >>> that is defined by a national or >>> international standard. The character repertoire of CS is defined by the >>> standard defining the character set >>> identified by that <standard character set name>. The default collation of >>> the character set is defined by >>> the order of the characters in the standard and has the PAD SPACE >>> characteristic. >> >> So we follow the PAD SPACE requirement for default collations here. >> >> I guess the reasoning here is to treat CHAR and VARCHAR in a similar way >> by default. I agree with the standard here. We don't want different >> behavior for CHAR and VARCHAR. >> >> >> Btw, we don't follow the requirement that the default collation must be >> defined according to "the order of the characters in the standard". >> Default collations are traditionally case insensitive in MariaDB/MySQL. >> >>> >>> The only workaround I know of is to use VARBINARY instead of VARCHAR. I >>> think it works much the same in most respects. But obviously some semantics >>> is lost when the server no longer is aware of the character set used. >> >> Correct. >> >>> >>>> Since the index behaviour obviously depends on the collation, would >>>> building an own collation which does not PADSPACE be an option? I have >>> >>> That would be interesting, actually. I don't know what support there is for >>> non-PADSPACE collations. Maybe bar knows (Cc:'ed)? >> >> We don't have NO PAD collations yet. >> I don't remember that anybody ever asked for this before. >> >> From a glance, this should not be too hard to implement. >> >> You previously wrote you need utf8. >> So there are three options to make a new collation, >> based on one of these existing collations: >> >> - utf8_general_ci >> - utf8_unicode_ci >> - utf8_bin >> >> but with NO PAD characteristics. >> >> Suppose you need utf8_general_nopad_ci >> (i.e. based on utf8_general_ci) >> >> >> utf8_general_ci is implemented in strings/ctype-utf8.c >> >> A new collation handler should be added, similar to this one: >> >> static MY_COLLATION_HANDLER my_collation_utf8_general_ci_handler = >> { >> NULL, /* init */ >> my_strnncoll_utf8_general_ci, >> my_strnncollsp_utf8_general_ci, >> my_strnxfrm_unicode, >> my_strnxfrmlen_unicode, >> my_like_range_mb, >> my_wildcmp_utf8, >> my_strcasecmp_utf8, >> my_instr_mb, >> my_hash_sort_utf8, >> my_propagate_complex >> }; >> >> >> but these three virtual functions must be redefined to new >> similar functions that do not ignore trailing spaces: >> >> - my_strnncollsp_utf8_general_ci - this is used for BTREE indexes >> - my_hash_sort_utf8 - this is used for HASH indexes >> - my_strnxfrm_unicode - this is used for filesort >> (non-indexed ORDER BY) >> >> All other functions can be reused from the existing PAD SPACE collation. >> >> >> >> Then a new "struct charset_info_st" should be defined, >> similar to my_charset_utf8_general_ci. >> >> Sounds like a few hours of work. >> >> >> But then thorough testing will be needed, >> which will be the most time consuming part. >> >> >> Also, for consistency, it's worthy to implement >> at least utf8_nopad_bin and utf8_unicode_nopad_ci at once, >> and then eventually NO PAD collations for all other character sets. >> >>> >>> - Kristian. >>> _______________________________________________ Mailing list: https://launchpad.net/~maria-discuss Post to : maria-discuss@lists.launchpad.net Unsubscribe : https://launchpad.net/~maria-discuss More help : https://help.launchpad.net/ListHelp