https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95959
            Bug ID: 95959
           Summary: Error in conversion from UTF16 to UTF8
           Product: gcc
           Version: 10.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ada
          Assignee: unassigned at gcc dot gnu.org
          Reporter: simon at pushface dot org
  Target Milestone: ---

Created attachment 48799
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=48799&action=edit
Demonstration

There's an error in the conversion from UTF-16 to UTF-8 for code points in
the range U+10000 .. U+10FFFF (which require four UTF-8 bytes).

The attached demonstration shows this by taking a UTF-8 character (Clef,
U+1D11E), converting it to UTF-16, and converting back to UTF-8. This
should round-trip back to the same character, but doesn't: the third byte
of the final UTF-8 sequence is wrong.

$ ./utftest
Codepoint: 16#1D11E#
UTF-8: 4: 2#11110000# 2#10011101# 2#10000100# 2#10011110#
UTF-16: 2: 2#1101100000110100# 2#1101110100011110#
UTF-8: 4: 2#11110000# 2#10011101# 2#10010000# 2#10011110#
Bug

The attached patch corrects the problem.
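The attachment isn't inlined above, so for reference here is a minimal
sketch of the failing round trip, assuming the conversions go through
Ada.Strings.UTF_Encoding.Conversions (the procedure name Round_Trip is
made up for illustration):

   with Ada.Text_IO;
   with Ada.Strings.UTF_Encoding.Conversions;

   procedure Round_Trip is
      use Ada.Strings.UTF_Encoding;

      --  U+1D11E (MUSICAL SYMBOL G CLEF) encoded as UTF-8: F0 9D 84 9E.
      Original : constant UTF_8_String :=
        Character'Val (16#F0#) & Character'Val (16#9D#) &
        Character'Val (16#84#) & Character'Val (16#9E#);

      --  UTF-8 -> UTF-16 -> UTF-8; Back should equal Original.
      Pair : constant UTF_16_Wide_String := Conversions.Convert (Original);
      Back : constant UTF_8_String      := Conversions.Convert (Pair);
   begin
      Ada.Text_IO.Put_Line
        (if Back = Original then "round trip OK" else "Bug");
   end Round_Trip;

Built with gnatmake round_trip.adb, a correct runtime should print
"round trip OK"; a runtime with the defect described here prints "Bug".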
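The expected bytes can also be checked by hand from the UTF-16 pair
printed in the transcript (16#D834# / 16#DD1E#). This standalone snippet
(names made up) decodes the surrogate pair and recomputes the correct
four-byte UTF-8 encoding:

   with Ada.Text_IO; use Ada.Text_IO;
   with Interfaces;  use Interfaces;

   procedure Decode_Pair is
      High : constant Unsigned_32 := 16#D834#;
      Low  : constant Unsigned_32 := 16#DD1E#;

      --  Code point = 16#10000# + (High - 16#D800#) * 2**10
      --               + (Low - 16#DC00#)
      CP : constant Unsigned_32 :=
        16#1_0000# + Shift_Left (High - 16#D800#, 10) + (Low - 16#DC00#);

      --  Four-byte UTF-8 encoding, valid for 16#10000# .. 16#10FFFF#.
      B1 : constant Unsigned_32 := 16#F0# or Shift_Right (CP, 18);
      B2 : constant Unsigned_32 := 16#80# or (Shift_Right (CP, 12) and 16#3F#);
      B3 : constant Unsigned_32 := 16#80# or (Shift_Right (CP, 6) and 16#3F#);
      B4 : constant Unsigned_32 := 16#80# or (CP and 16#3F#);
   begin
      --  Prints 119070 (= 16#1D11E#) and 240 157 132 158
      --  (= 16#F0# 16#9D# 16#84# 16#9E#), the correct UTF-8 bytes.
      Put_Line ("Codepoint:" & Unsigned_32'Image (CP));
      Put_Line ("UTF-8:" & Unsigned_32'Image (B1) & Unsigned_32'Image (B2)
                & Unsigned_32'Image (B3) & Unsigned_32'Image (B4));
   end Decode_Pair;

In particular the third byte is 16#80# or ((16#1D11E# / 2**6) and
16#3F#) = 16#84# = 2#10000100#, matching the original input, not the
2#10010000# produced by the buggy conversion.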