> On 28 Nov 2023, at 21:47, Hans Henrik Bergan <divinit...@gmail.com> wrote:
>
>> What is the migration path for legacy code that use those directives?
>
> The migration path is to convert the legacy-encoding PHP files to UTF-8.
> Luckily this can be largely automated, here is my attempt:
> https://github.com/divinity76/php2utf8/blob/main/src/php2utf8.php
> but that code definitely needs some proof-reading and additions - idk
> if the approach used is even a good approach, it was just the first i
> could think of, feel free to write one from scratch

Hi,

Converting the character encoding of PHP files is by no means sufficient, except in the simplest cases. Strings of text are to be found in various places, such as:

1. in the PHP files, as literals;
2. in memory, at runtime;
3. in non-PHP data files stored on the server;
4. in the database;
5. as presented to the user (e.g. HTML document) and as received from them (e.g. form submission);
6. etc.

If you change the character encoding in (1), you necessarily change the encoding in (2), unless you wrap your literals with some function that performs the conversion in the other direction at runtime. And if you change the encoding in (2), you should be very careful when your text flows to and from (3), (4), (5) and (6): you should either change the encoding at those places, or make sure that proper conversion is done at the boundaries of those domains.

Also, mechanical conversion is not the whole story. For example, if you change the encoding in (5), you should not forget to adapt the <meta charset> tag and/or the Content-Type HTTP header. Also, not all strings are text, and only a human can decide whether the literal “\xe9” in a random location is meant to encode the raw byte 0xE9 or the character “é” in latin-1.

Of course, because we live in an interesting world, there will be situations where the encoding is unknown or ambiguous. Yuya mentioned the case of Shift-JIS, which has various incompatible variants, and I am happy not to have encountered such ambiguities (only unknown encodings) when I converted our code base from windows-1252 (loosely known as latin-1) to UTF-8 a few years ago.

—Claude
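
P.S. The conversion at the boundary between (2) and the domains (3) to (5), and the ambiguity of “\xe9”, can be sketched roughly as follows. This is only a minimal illustration, assuming the mbstring extension is loaded and the legacy encoding is known to be windows-1252. The helper names legacy_to_utf8() and utf8_to_legacy() are hypothetical and not taken from any existing code base.

```php
<?php
// Hypothetical helpers, named for illustration only.
// Assumption: mbstring is available and the legacy side is windows-1252.

// Text entering memory from a legacy domain (data file, database, form).
function legacy_to_utf8(string $text): string
{
    return mb_convert_encoding($text, 'UTF-8', 'Windows-1252');
}

// Text leaving memory towards a domain that was NOT converted.
function utf8_to_legacy(string $text): string
{
    return mb_convert_encoding($text, 'Windows-1252', 'UTF-8');
}

$legacy = "caf\xe9";               // "café" as windows-1252 bytes
$utf8   = legacy_to_utf8($legacy); // same text, now UTF-8 bytes

var_dump(bin2hex($utf8));                    // string(10) "636166c3a9"
var_dump(utf8_to_legacy($utf8) === $legacy); // bool(true): round trip is lossless here

// The same byte used as binary data is not text and must be left alone.
// On its own it is not even valid UTF-8, so no tool can guess the intent.
$rawByte = "\xe9";
var_dump(mb_check_encoding($rawByte, 'UTF-8')); // bool(false)
```

Whether a given “\xe9” should go through legacy_to_utf8() or be left untouched is exactly the decision that cannot be automated.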