https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83601
Bug ID: 83601
Summary: std::regex_replace C++14 conformance issue: escaping in SED mode
Product: gcc
Version: 8.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libstdc++
Assignee: unassigned at gcc dot gnu.org
Reporter: andrey.y.guskov at intel dot com
Target Milestone: ---

C++14 standard (page 1107, see here:
http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4296.pdf#1121),
28.5.2 [Bitmask type regex_constants::match_flag_type]:

  ...
  format_sed
    When a regular expression match is to be replaced by a new string,
    the new string shall be constructed using the rules used by the sed
    utility in POSIX.
  ...

The rules which sed uses are documented in IEEE 1003.1 (p. 3221):

  An <ampersand> ('&') appearing in the replacement shall be replaced by
  the string matching the BRE. The special meaning of '&' in this context
  can be suppressed by preceding it by a <backslash>. The characters "\n",
  where n is a digit, shall be replaced by the text matched by the
  corresponding back-reference expression.
  ...
  The special meaning of "\n" where n is a digit in this context, can be
  suppressed by preceding it by a <backslash>.

The current implementation of std::regex_replace does not comply with the
standard: the special meanings of &, \0 and \2 cannot be suppressed by
escaping them with backslashes.

Reproducer:

  #include <cstdio>    // printf
  #include <iterator>  // std::back_inserter
  #include <regex>
  #include <string>

  // Replaces every match of (a*)(b+) in istr with rstr in format_sed mode
  // and prints the result next to the expected string ostr.
  int frep(const wchar_t *istr, const wchar_t *rstr, const wchar_t *ostr)
  {
      std::basic_regex<wchar_t> wrgx(L"(a*)(b+)");
      std::basic_string<wchar_t> wstr = istr, wret = ostr, test;
      std::regex_replace(std::back_inserter(test), wstr.begin(), wstr.end(),
                         wrgx, std::basic_string<wchar_t>(rstr),
                         std::regex_constants::format_sed);
      // Prints "==" between the strings on success, "!=" on mismatch.
      return !printf("'%ls' %c= '%ls'\n", test.c_str(),
                     (test == wret) ? '=' : '!', wret.c_str());
  }

  int main()
  {
      frep(L"xbbyabz", L"!\\\\2!", L"x!\\2!y!\\2!z");  // replacement is !\\2!
      frep(L"xbbyabz", L"!\\\\0!", L"x!\\0!y!\\0!z");  // replacement is !\\0!
      return frep(L"xbbyabz", L"!\\&!", L"x!&!y!&!z"); // replacement is !\&!
  }