efriedma added inline comments.
================ Comment at: llvm/lib/Support/ConvertUTFWrapper.cpp:176 + // enough that we can fit a null terminator without reallocating. + Out.resize(SrcBytes.size() * UNI_MAX_UTF8_BYTES_PER_CODE_POINT + 1); + UTF8 *Dst = reinterpret_cast<UTF8 *>(&Out[0]); ---------------- MarcusJohnson91 wrote: > efriedma wrote: > > `SrcBytes.size() * UNI_MAX_UTF8_BYTES_PER_CODE_POINT + 1` seems like way > > too much memory. > I copied that from the UTF16 code I'm not sure the math is right even for UTF-16, but anyway, UTF-32 is a little different from UTF-16. A 2-byte character in UTF-16 can translate to 3 bytes in UTF-8. That sort of thing is impossible in UTF-32: a UTF-32 string is never shorter than its translation to UTF-8. A codepoint in UTF-8 is at most 4 bytes. ================ Comment at: llvm/lib/Support/ConvertUTFWrapper.cpp:168 + std::vector<UTF32> ByteSwapped; + if (Src[0] == UNI_UTF16_BYTE_ORDER_MARK_SWAPPED) { + ByteSwapped.insert(ByteSwapped.end(), Src, SrcEnd); ---------------- efriedma wrote: > MarcusJohnson91 wrote: > > efriedma wrote: > > > Wrong constant. > > > > > > Is this really the function you want to be using from clang? I don't > > > really understand why you'd want to handle byte order marks. > > I don't really care about the BOM tbh, I just figured if I was in here, I > > should flesh out the UTF-32 interface. > The BOM handling is actually actively a problem if you're planning to use the > interface to interpret wprintf format strings. We don't want to byteswap > `L"\uFFFE%s"` or something like that. Any thoughts on this? CHANGES SINCE LAST ACTION https://reviews.llvm.org/D106753/new/ https://reviews.llvm.org/D106753 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits