Uh, ok, something obviously went wrong there. Checking.

On Sat, Jul 21, 2018 at 8:30 AM, Rasmus Lerdorf <ras...@lerdorf.com> wrote:

> For future reference, here is what I did to fix the encoding problem:
>
> MariaDB [phpbugsdb]> select sdesc from bugdb where id=76553;
> +--------------------------------------------------------------------------------+
> | sdesc                                                                          |
> +--------------------------------------------------------------------------------+
> | Ð˜Ð¼Ñ Ð¿ÐµÑ€ÐµÐ¼ÐµÐ½Ð½Ð¾Ð¹ Ð¼Ð¾Ð¶ÐµÑ‚ Ñ Ð¾Ð´ÐµÑ€Ð¶Ð°Ñ‚ÑŒ ÑƒÐ¿Ñ€Ð°Ð²Ð»ÑÑŽÑ‰Ð¸Ðµ |
> +--------------------------------------------------------------------------------+
> 1 row in set (0.00 sec)
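The garbled title is what UTF-8 bytes sitting in a latin1 column look like once the server converts them, one latin1 character per byte, for a utf8 client. A minimal Python sketch of that effect, assuming (as the MySQL manual describes) that MariaDB's latin1 is cp1252 with the five bytes cp1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D) passed through as U+0081 and so on; `mysql_latin1_decode` is a hypothetical helper name:

```python
# Sketch: how UTF-8 bytes stored in a latin1 column come back as mojibake.
# Assumption: MariaDB's latin1 is cp1252, except that the undefined bytes
# 0x81, 0x8D, 0x8F, 0x90 and 0x9D map to the same Unicode code points.


def mysql_latin1_decode(data: bytes) -> str:
    """Interpret each byte as one MariaDB latin1 character (hypothetical helper)."""
    chars = []
    for byte in data:
        try:
            chars.append(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(chr(byte))  # undefined in cp1252: same code point
    return "".join(chars)


title = "Имя переменной может содержать управляющие"
stored = title.encode("utf-8")       # the bytes the latin1 column really holds
shown = mysql_latin1_decode(stored)  # what a utf8 client gets back

print(shown[:5])               # Ð˜Ð¼Ñ
print(len(title), len(shown))  # 42 80
```

Every byte of the original UTF-8 becomes its own character, which is why the 42-character title comes back as 80 characters.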
>
> MariaDB [phpbugsdb]> alter table bugdb drop index email;
> Query OK, 76298 rows affected (0.85 sec)
> Records: 76298  Duplicates: 0  Warnings: 0
>
> MariaDB [phpbugsdb]> alter table bugdb modify sdesc varbinary(80) NOT NULL DEFAULT '', modify ldesc binary NOT NULL, modify email varbinary(40) NOT NULL DEFAULT '';
> Query OK, 76298 rows affected, 65535 warnings (0.65 sec)
> Records: 76298  Duplicates: 0  Warnings: 76091
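One thing worth double-checking in the statement above: in MySQL and MariaDB a `BINARY` column declared without a length is `BINARY(1)`, so `modify ldesc binary` most likely kept only the first byte of each long description, which would also account for most of the 76091 warnings. The binary counterpart of `TEXT` is `BLOB`. A hypothetical Python sketch of the difference, emulating a non-strict store into `BINARY(M)` (truncate, then right-pad with 0x00) versus `BLOB` (bytes kept as they are); `store_binary` and `store_blob` are illustrative names only:

```python
# Hypothetical emulation of what ALTER TABLE stores in non-strict mode;
# store_binary and store_blob are illustrative names, not a real API.


def store_binary(value: bytes, length: int = 1) -> bytes:
    """BINARY(length): truncate to `length` bytes, right-pad with 0x00.

    A bare `binary` in a column definition means length 1.
    """
    return value[:length].ljust(length, b"\x00")


def store_blob(value: bytes) -> bytes:
    """BLOB, the binary counterpart of TEXT: the bytes are kept unchanged
    (both types hold up to 65535 bytes)."""
    return value


ldesc = "Variable names can contain control characters.".encode("utf-8")
print(store_binary(ldesc))         # b'V'
print(store_blob(ldesc) == ldesc)  # True
```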
>
> MariaDB [phpbugsdb]> alter table bugdb modify sdesc varchar(80) CHARACTER SET utf8mb4 NOT NULL DEFAULT '', modify ldesc text CHARACTER SET utf8mb4 NOT NULL, modify email varchar(40) CHARACTER SET utf8mb4 NOT NULL DEFAULT '';
> Query OK, 76298 rows affected, 127 warnings (0.57 sec)
> Records: 76298  Duplicates: 0  Warnings: 127
>
> MariaDB [phpbugsdb]> alter table bugdb add FULLTEXT INDEX `email` (`email`,`sdesc`,`ldesc`);
> Query OK, 76298 rows affected (1.56 sec)
> Records: 76298  Duplicates: 0  Warnings: 0
>
> MariaDB [phpbugsdb]> select sdesc from bugdb where id=76553;
> +--------------------------------------------+
> | sdesc                                      |
> +--------------------------------------------+
> | Имя переменной может содержать управляющие |
> +--------------------------------------------+
> 1 row in set (0.00 sec)
>
> The trick was to convert the columns to binary first. When I went straight
> from latin1 to utf8, I got the utf8 encoding of each latin1 character, so
> the bytes that were already utf8 ended up encoded twice. Telling it first
> that the data was actually binary left the bytes untouched, and the
> conversion from binary to utf8 then just reinterpreted them as utf8, which
> appears to have worked. There were some warnings, which I assume come from
> invalid utf8 byte sequences somewhere.
>
> -Rasmus
>
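The explanation above can be modeled in a few lines of Python. This is a sketch, not MariaDB's actual code, and it assumes MariaDB's latin1 behaves like cp1252 with the five undefined bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) passed through unchanged; all function names are hypothetical. `direct_convert` transcodes each latin1 character to UTF-8, so bytes that were already UTF-8 get encoded a second time, while `binary_detour` keeps the bytes and only relabels them.

```python
from typing import Tuple


def mysql_latin1_decode(data: bytes) -> str:
    """One MariaDB latin1 (cp1252-based) character per byte; bytes cp1252
    leaves undefined map to the same code point (assumption)."""
    chars = []
    for byte in data:
        try:
            chars.append(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            chars.append(chr(byte))
    return "".join(chars)


def direct_convert(stored: bytes) -> bytes:
    """latin1 -> utf8mb4: each latin1 character is transcoded to UTF-8."""
    return mysql_latin1_decode(stored).encode("utf-8")


def binary_detour(stored: bytes) -> bytes:
    """latin1 -> varbinary -> utf8mb4: the bytes are kept and only relabeled."""
    return stored


def read_utf8(column: bytes) -> Tuple[str, bool]:
    """Read a utf8mb4 value back; the flag is False for invalid UTF-8.
    This sketch substitutes U+FFFD; MariaDB's own handling may differ."""
    try:
        return column.decode("utf-8"), True
    except UnicodeDecodeError:
        return column.decode("utf-8", errors="replace"), False


title = "Имя переменной может содержать управляющие"
stored = title.encode("utf-8")  # what the latin1 column actually held

print(read_utf8(direct_convert(stored))[0][:5])  # Ð˜Ð¼Ñ
print(read_utf8(binary_detour(stored)))  # ('Имя переменной может содержать управляющие', True)
print(read_utf8(binary_detour(stored[:5]))[1])  # False
```

The last line shows one plausible source of the remaining warnings, assuming some titles had been cut off at the old 80-byte limit in the middle of a multibyte character: such a value is not valid UTF-8, and relabeling the bytes cannot repair it.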
