2017-08-24 19:17 GMT+02:00 Andre Schappo via Unicode <unicode@unicode.org>:
> Because there are many systems that can now handle BMP characters but
> cannot handle SMP characters.
>
> One example being systems that use mysql utf8 (3 byte encoding) and have
> not yet updated to utf8mb4 (4 byte encoding)

MySQL's "utf8" is known to cause severe problems, notably on wikis installed with it by default. Any non-BMP character in the edited text will cause that text to be **silently** truncated when it is uploaded to the server and saved to the database, even if the unsaved preview was correct. Non-BMP characters are now very frequent: SMP characters such as emoji are available on almost all modern smartphones. You will only see the truncation when the page is loaded again.

MySQL's "utf8" should have been dropped long ago and replaced by utf8mb4. Failing that, it should have been set up so that sending such data to a "utf8"-encoded database causes an SQL error that cannot be silently ignored, instead of a truncation. At the very least, it should only filter out the non-BMP characters, without silently deleting everything that follows them.

This is an old, severe bug in MySQL (in the server itself, in the connection protocol, or in internal filters used by the MySQL client library) that has caused many severe security issues: discarded logs or to-do lists, lost pending commercial transactions (such as lists of payments to be processed by a bank), truncated bills sent to customers, lost contact names or addresses, broken delivery addresses for products sent to a customer, missing items in a delivered box, and products lost in the middle of their routing.

This demonstrates that it can hurt when an application is not notified of encoding errors, or when an API does not clearly specify that it may raise encoding exceptions that must be caught and must not be ignored. Even if you use "utf8mb4", encoding errors are still possible and must not be ignored, as the final result will be unpredictable.
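A minimal Python sketch of the three behaviours discussed above: silent truncation, filtering out only the non-BMP characters, and raising an error. It assumes a hypothetical `store_utf8mb3` helper that only models a column limited to 3-byte UTF-8 sequences. It is not MySQL's code or any client library's API. The underlying test is that any code point above U+FFFF needs four bytes in UTF-8, which such a column cannot hold.

```python
# Hypothetical sketch: model a column limited to 3-byte UTF-8 sequences
# (like MySQL's "utf8"). Illustrative only, not MySQL's implementation.

def first_non_bmp(text: str) -> int:
    """Return the index of the first code point above U+FFFF, or -1 if none."""
    for i, ch in enumerate(text):
        if ord(ch) > 0xFFFF:  # such code points take 4 bytes in UTF-8
            return i
    return -1


def store_utf8mb3(text: str, mode: str = "strict") -> str:
    """Return the text as a 3-byte-max column would keep it.

    mode="strict":   raise ValueError so the caller must handle the problem
    mode="filter":   drop only the non-BMP characters
    mode="truncate": cut everything from the first non-BMP character onward
                     (the silent truncation described in the post)
    """
    i = first_non_bmp(text)
    if i < 0:
        return text  # BMP-only text fits unchanged
    if mode == "strict":
        raise ValueError(f"U+{ord(text[i]):04X} at index {i} needs 4 bytes in UTF-8")
    if mode == "filter":
        return "".join(ch for ch in text if ord(ch) <= 0xFFFF)
    if mode == "truncate":
        return text[:i]
    raise ValueError(f"unknown mode: {mode!r}")


# Usage: a delivery note containing an emoji (U+1F69A, outside the BMP).
note = "Ship to Zo\u00eb \U0001F69A then bill 42 EUR"
print(repr(store_utf8mb3(note, "truncate")))  # 'Ship to Zoë ' : billing info lost
print(repr(store_utf8mb3(note, "filter")))    # 'Ship to Zoë  then bill 42 EUR'
try:
    store_utf8mb3(note)  # strict is the default
except ValueError as err:
    print("rejected:", err)  # rejected: U+1F69A at index 12 needs 4 bytes in UTF-8
```

Making "strict" the default means that lossy behaviour has to be requested explicitly, which matches the argument that encoding errors must be signalled rather than silently swallowed.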