sammccall added a comment.

Yeah, I think there must be some confusion about what this code is doing. It's
specifically iterating over the Unicode codepoints of what are supposed to be
UTF-8-encoded input bytes.
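For illustration, here is a minimal sketch of that kind of loop. It is not the
code under review: `countCodepoints` is a hypothetical helper, and it assumes
the input is valid UTF-8. That assumption is checked only by an assert. Each
non-ASCII lead byte says how long its sequence is via its count of leading one
bits (110xxxxx = 2 bytes, 1110xxxx = 3, 11110xxx = 4).

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Number of leading one bits in C; for a valid UTF-8 lead byte this is the
// sequence length in bytes (0 means a one-byte ASCII character).
static std::size_t leadingOnes(unsigned char C) {
  std::size_t N = 0;
  while (N < 8 && (C & (0x80u >> N)))
    ++N;
  return N;
}

// Hypothetical helper: counts the codepoints in U8.
// Precondition: U8 is valid UTF-8, checked only by the assert below.
std::size_t countCodepoints(std::string_view U8) {
  std::size_t Count = 0;
  for (std::size_t I = 0; I < U8.size(); ++Count) {
    std::size_t Len = leadingOnes(static_cast<unsigned char>(U8[I]));
    if (Len == 0) { // ASCII byte: a one-byte codepoint.
      ++I;
      continue;
    }
    assert(Len >= 2 && Len <= 4 && I + Len <= U8.size() && "invalid UTF-8");
    I += Len; // Skip the lead byte and its continuation bytes.
  }
  return Count;
}
```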

The input turns out sometimes not to be UTF-8 (e.g. the file on disk is 
ISO-8859-1 and clang thinks it's UTF-8 and just loads the bytes). We can't give 
any sort of right answer in these cases - we don't know the actual encoding and 
we can't even always detect these cases!
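As a concrete illustration of "can't always detect", consider this
hypothetical structural check, `looksLikeUtf8`. It is a simplified sketch that
deliberately skips the overlong-encoding and surrogate checks a full validator
would perform. ISO-8859-1 "café" ends in the lone byte 0xE9. That byte looks
like the start of a 3-byte sequence that never arrives, so it is caught. But
ISO-8859-1 "Ã©" is the byte pair 0xC3 0xA9, which is also exactly the valid
UTF-8 encoding of "é". No check on the bytes alone can tell the two apart.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical, simplified validator: true if every lead byte announces a
// 1-4 byte sequence and is followed by that many 10xxxxxx continuation bytes.
// It does not reject overlong encodings or surrogates.
bool looksLikeUtf8(std::string_view U8) {
  for (std::size_t I = 0; I < U8.size();) {
    unsigned char C = static_cast<unsigned char>(U8[I]);
    std::size_t Len = C < 0x80 ? 1
                    : (C & 0xE0) == 0xC0 ? 2
                    : (C & 0xF0) == 0xE0 ? 3
                    : (C & 0xF8) == 0xF0 ? 4
                    : 0;
    if (Len == 0 || I + Len > U8.size())
      return false; // Stray continuation byte, bad lead byte, or truncation.
    for (std::size_t K = 1; K < Len; ++K)
      if ((static_cast<unsigned char>(U8[I + K]) & 0xC0) != 0x80)
        return false; // Expected a continuation byte.
    I += Len;
  }
  return true;
}
```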

What we can do is strengthen the contract: instead of UB (in practice, an
assertion failure), we can say it returns some garbage value but doesn't crash.
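A sketch of what that stronger contract could look like follows. It assumes a
function that measures UTF-16 length. `utf16Length` is a hypothetical name,
and this is not the patch itself. Any byte that cannot start a UTF-8 sequence
is counted as one unit, and a sequence truncated at the end of the buffer just
ends the loop. The result is exact for valid UTF-8 and arbitrary, but
well-defined, for anything else.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper with the "garbage but no crash" contract: returns the
// UTF-16 length of U8 if it is valid UTF-8, and some finite value otherwise.
std::size_t utf16Length(std::string_view U8) {
  std::size_t Units = 0;
  for (std::size_t I = 0; I < U8.size();) {
    unsigned char C = static_cast<unsigned char>(U8[I]);
    std::size_t Len = 0; // Leading one bits = claimed sequence length.
    while (Len < 8 && (C & (0x80u >> Len)))
      ++Len;
    if (Len == 0) { // ASCII: one byte, one UTF-16 unit.
      ++Units;
      ++I;
      continue;
    }
    if (Len == 1 || Len > 4) {
      // Stray continuation byte (10xxxxxx) or 11111xxx: not valid UTF-8,
      // e.g. ISO-8859-1 text. Pretend it is a single unit and move on.
      ++Units;
      ++I;
      continue;
    }
    // 4-byte sequences encode astral codepoints, which need a surrogate pair.
    Units += (Len == 4) ? 2 : 1;
    I += Len; // May overshoot a truncated sequence; the loop then just ends.
  }
  return Units;
}
```

Reads only happen while `I < U8.size()`, so malformed input can skew the count
but can never read out of bounds.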



https://reviews.llvm.org/D74731



_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
