| Field | Value |
|---|---|
| Issue | 138526 |
| Summary | [clang] Warning for comparing `char8_t` to `char32_t` |
| Labels | clang |
| Assignees | |
| Reporter | Eisenwave |

Consider the following code:
```cpp
#include <string_view>

bool contains_oe(std::u8string_view str) {
    for (char8_t c : str)
        if (c == U'ö') // comparison always fails for valid UTF-8 input
            return true;
    return false;
}
```
If `str` is a correctly encoded UTF-8 string, the comparison always fails: `ö` is U+00F6, and `0xF6` is never a valid UTF-8 code unit (`ö` is encoded as the two code units `0xC3 0xB6`). Comparing `charN_t` values with different `N` is virtually always a bug, and where it isn't, the code could just as well have used a literal of the matching type. Such comparisons give meaningful results only for U+007F and below, and even then, it's unclear why you wouldn't use the proper type.
I've floated the idea of deprecating this behavior in the C++ standard in a number of places, and it was received positively. StackOverflow users also suggested getting rid of it here: https://stackoverflow.com/q/79604433/5740428
**In the meantime, it would be useful to have a warning when `charN_t` is converted to a different Unicode character type.** This warning should be triggered for any implicit conversion, not just for comparisons, because the same bug can be produced through a function argument:
```cpp
bool contains_char(std::u8string_view str, char8_t c);
// ...
contains_char(str, U'ö'); // U'ö' is implicitly converted to char8_t
```