efriedma added inline comments.

================
Comment at: llvm/lib/Support/ConvertUTFWrapper.cpp:176
+  // enough that we can fit a null terminator without reallocating.
+  Out.resize(SrcBytes.size() * UNI_MAX_UTF8_BYTES_PER_CODE_POINT + 1);
+  UTF8 *Dst = reinterpret_cast<UTF8 *>(&Out[0]);
----------------
MarcusJohnson91 wrote:
> efriedma wrote:
> > `SrcBytes.size() * UNI_MAX_UTF8_BYTES_PER_CODE_POINT + 1` seems like way 
> > too much memory.
> I copied that from the UTF16 code
I'm not sure the math is right even for UTF-16, but anyway, UTF-32 is a little 
different from UTF-16.  A 2-byte character in UTF-16 can translate to 3 bytes 
in UTF-8.  That sort of thing is impossible in UTF-32: a UTF-32 string is never 
shorter than its translation to UTF-8.  A codepoint in UTF-8 is at most 4 bytes.


================
Comment at: llvm/lib/Support/ConvertUTFWrapper.cpp:168
+  std::vector<UTF32> ByteSwapped;
+  if (Src[0] == UNI_UTF16_BYTE_ORDER_MARK_SWAPPED) {
+    ByteSwapped.insert(ByteSwapped.end(), Src, SrcEnd);
----------------
efriedma wrote:
> MarcusJohnson91 wrote:
> > efriedma wrote:
> > > Wrong constant.
> > > 
> > > Is this really the function you want to be using from clang?  I don't 
> > > really understand why you'd want to handle byte order marks.
> > I don't really care about the BOM tbh, I just figured if I was in here, I 
> > should flesh out the UTF-32 interface.
> The BOM handling is actually actively a problem if you're planning to use the 
> interface to interpret wprintf format strings.  We don't want to byteswap 
> `L"\uFFFE%s"` or something like that.
Any thoughts on this?


CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D106753/new/

https://reviews.llvm.org/D106753

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

Reply via email to