https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84110

            Bug ID: 84110
           Summary: Null character in regex
           Product: gcc
           Version: 8.0.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: abigail.buccaneer at gmail dot com
  Target Milestone: ---

The following code, when compiled with libstdc++:

    #include <regex>
    int main() {
        // The explicit length of 1 makes the null byte part of the pattern.
        auto r = std::regex{"\0", std::size_t{1}};
    }

...results in std::regex_error being thrown at run time. As I read the
ECMAScript regex grammar, this pattern should be allowed, and a null character
in the pattern should match a literal null character:

    PatternCharacter ::
        SourceCharacter but not one of
        ^ $ \ . * + ? ( ) [ ] { } |
    SourceCharacter ::
        any Unicode code unit

(Elsewhere, the ECMAScript spec defines an unrelated grammar production as
'SourceCharacter but not one of " or \ or U+0000 through U+001F', which
explicitly excludes control characters. No such exclusion appears here, so it
seems reasonable to assume that SourceCharacter very intentionally includes
null.)

Clang/libc++ seems to agree with this reading, and successfully compiles and
runs the following:

    #include <cassert>
    #include <regex>

    int main() {
        auto null = std::string{"\0", std::size_t{1}};
        std::smatch match_results;
        assert(std::regex_match(null, match_results, std::regex{null}));
        assert(match_results.position() == 0
               && match_results.length() == 1
               && match_results[0] == null);
    }
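
To see which of the two behaviours a given standard library shows without
aborting on an uncaught exception, something like the following sketch may
help. The helper names make_null_string and null_pattern_compiles are
hypothetical, chosen only for illustration; the sketch assumes nothing beyond
std::regex, std::regex_error and std::string. It also shows why the explicit
length is needed: without it, the string is cut off at the null byte and ends
up empty.

```cpp
#include <cassert>
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: builds a one-character string holding a null byte.
// The explicit length matters; std::string("\0") would stop at the
// terminator and produce an empty string instead.
std::string make_null_string() {
    std::string s("\0", std::size_t{1});
    assert(s.size() == 1 && s[0] == '\0');
    return s;
}

// Hypothetical helper: returns true if the library in use accepts a pattern
// consisting of a single null byte, false if it throws std::regex_error.
// If 'what' is non-null, it receives the exception message on failure.
bool null_pattern_compiles(std::string* what = nullptr) {
    const std::string pattern = make_null_string();
    try {
        std::regex r(pattern);
        (void)r;
        return true;
    } catch (const std::regex_error& e) {
        if (what != nullptr) {
            *what = e.what();
        }
        return false;
    }
}
```

Calling null_pattern_compiles() from main and printing the result should be
enough to tell an implementation that rejects the pattern apart from one that
accepts it.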
