It's years since I discovered that mysql's utf8 is broken in this way, but I can still feel the pain. What part of "universal" did they not understand? The mysql docs more or less say that "utf8" is deprecated, certainly not future-proof, and suggest you use utf8mb4. See https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8.html <https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8.html>)
> On Oct 14, 2021, at 1:17 PM, Sergey Dorofeev <[email protected]> wrote: > > Thank you, did not know about it. Real UTF-8 in mysql is utf8mb4, I think it > should be used here. > > --- > Sergey > > > Jaime Crespo писал 2021-10-14 18:32: > >> I agree that LOWER doesn't make much sense in binary collation. >> >> Sadly, a utf8 (3-byte UTF-8) conversion may fail for 4-byte characters, so >> at the very least it should be utf8mb4 (4-byte UTF-8). I am not so familiar >> with ListPager to say if there could be other issues arising from that- >> sending a code review would be easier for better context. >> >> On Thu, Oct 14, 2021 at 5:16 PM Sergey Dorofeev <[email protected] >> <mailto:[email protected]>> wrote: >> Hello, >> >> I have got issue with ListFiles page in mediawiki 1.35.1 >> Filtering worked not very good, was case-sensitive and not always got >> text in middle of file name. >> I looked in DB and saw that img_name column is varbinary, but >> pagers/ImageListPager.php tries to do case-insensitive select with >> LOWERing both sides of strings. But LOWER does not work for varbinary >> So I think that following change will be reasonable: >> >> --- ImageListPager.php.orig 2021-10-14 16:31:52.000000000 +0300 >> +++ ImageListPager.php 2021-10-14 16:00:10.127694733 +0300 >> @@ -90,9 +90,10 @@ >> >> if ( $nt ) { >> $dbr = wfGetDB( DB_REPLICA ); >> - $this->mQueryConds[] = 'LOWER(img_name)' >> . >> + $this->mQueryConds[] = >> 'LOWER(CONVERT(img_name USING utf8))' . >> $dbr->buildLike( >> $dbr->anyString(), >> - strtolower( >> $nt->getDBkey() ), $dbr->anyString() ); >> + mb_strtolower( >> $nt->getDBkey() ), $dbr->anyString() ); >> + >> } >> } >> >> @@ -161,9 +162,9 @@ >> $nt = Title::newFromText( $this->mSearch ); >> if ( $nt ) { >> $dbr = wfGetDB( DB_REPLICA ); >> - $conds[] = 'LOWER(' . $prefix . '_name)' >> . >> + $conds[] = 'LOWER(CONVERT(' . $prefix . >> '_name USING utf8))' . >> $dbr->buildLike( >> $dbr->anyString(), >> - strtolower( >> $nt->getDBkey() ), $dbr->anyString() ); >> + mb_strtolower( >> $nt->getDBkey() ), $dbr->anyString() ); >> } >> } >> >> >> >> -- >> Sergey >> _______________________________________________ >> Wikitech-l mailing list -- [email protected] >> <mailto:[email protected]> >> To unsubscribe send an email to [email protected] >> <mailto:[email protected]> >> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/ >> <https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/> >> >> -- >> Jaime Crespo >> <http://wikimedia.org <http://wikimedia.org/>> >> >> _______________________________________________ >> Wikitech-l mailing list -- [email protected] >> <mailto:[email protected]> >> To unsubscribe send an email to [email protected] >> <mailto:[email protected]> >> https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/ >> <https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/>_______________________________________________ > Wikitech-l mailing list -- [email protected] > To unsubscribe send an email to [email protected] > https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
_______________________________________________ Wikitech-l mailing list -- [email protected] To unsubscribe send an email to [email protected] https://lists.wikimedia.org/postorius/lists/wikitech-l.lists.wikimedia.org/
