Issue 138526
Summary [clang] Warning for comparing `char8_t` to `char32_t`
Labels clang
Assignees
Reporter Eisenwave
Consider the following code:
```cpp
#include <string_view>

bool contains_oe(std::u8string_view str) {
    for (char8_t c : str)
        if (c == U'ö') // comparison always fails
            return true;
    return false;
}
```
If `str` is a correctly encoded UTF-8 string, the comparison always fails: `ö` is U+00F6, and no UTF-8 code unit can have the value `0xF6` (in UTF-8, `ö` is encoded as the two code units `0xC3 0xB6`). Comparing `charN_t` values with different `N` is virtually always a bug, or could just as well have been written with the matching kind of literal. Such comparisons give meaningful results only for U+007F and below, and even then, it's unclear why you wouldn't use the proper type.

I've floated the idea of deprecating this behavior in the C++ standard in a number of places, and it was received positively. StackOverflow users also suggested getting rid of it here: https://stackoverflow.com/q/79604433/5740428

**In the meantime, it would be useful to have a warning when `charN_t` is converted to a different Unicode character type.** This warning should be triggered for any implicit conversion, not just as part of a comparison, because the same bug can be produced like this:
```cpp
bool contains_char(std::u8string_view str, char8_t c);
// ...
contains_char(str, U'ö'); // U'ö' is implicitly converted to char8_t
```