https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98677

            Bug ID: 98677
           Summary: std::regex constructor triggers valgrind under clang++
                    with undefined sanitizer; possible use-after-move
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libstdc++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: egor_suvorov at mail dot ru
  Target Milestone: ---

Consider the following code:

#include <regex>
int main() {
    std::regex regex("x{2,}");
}

If I compile and run it on Ubuntu 20.04 with

clang++-10 -fsanitize=undefined -O2 -g a.cpp && valgrind ./a.out 

I get the following error:

==2367== Memcheck, a memory error detector
==2367== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==2367== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==2367== Command: ./a.out
==2367== 
==2367== Conditional jump or move depends on uninitialised value(s)
==2367==    at 0x45AC3C:
std::__detail::_StateSeq<std::__cxx11::regex_traits<char> >::_M_clone()
(regex_automaton.tcc:208)
==2367==    by 0x4341EA:
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_quantifier()
(regex_compiler.tcc:253)
==2367==    by 0x432F67:
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_term()
(regex_compiler.tcc:143)
==2367==    by 0x432B9A:
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()
(regex_compiler.tcc:123)
==2367==    by 0x427E00:
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction()
(regex_compiler.tcc:99)
==2367==    by 0x42747E:
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char
const*, char const*, std::locale const&,
std::regex_constants::syntax_option_type) (regex_compiler.tcc:84)
==2367==    by 0x427149: __compile_nfa<std::__cxx11::regex_traits<char>, const
char *> (regex_compiler.h:183)
==2367==    by 0x427149: std::__cxx11::basic_regex<char,
std::__cxx11::regex_traits<char> >::basic_regex<char const*>(char const*, char
const*, std::locale, std::regex_constants::syntax_option_type) (regex.h:763)
==2367==    by 0x427025: basic_regex<const char *> (regex.h:507)
==2367==    by 0x427025: basic_regex (regex.h:440)
==2367==    by 0x427025: main (a.cpp:3)
==2367== 
==2367== Conditional jump or move depends on uninitialised value(s)
==2367==    at 0x45AC3C:
std::__detail::_StateSeq<std::__cxx11::regex_traits<char> >::_M_clone()
(regex_automaton.tcc:208)
==2367==    by 0x434218:
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_quantifier()
(regex_compiler.tcc:257)
==2367==    by 0x432F67:
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_term()
(regex_compiler.tcc:143)
==2367==    by 0x432B9A:
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative()
(regex_compiler.tcc:123)
==2367==    by 0x427E00:
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction()
(regex_compiler.tcc:99)
==2367==    by 0x42747E:
std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char
const*, char const*, std::locale const&,
std::regex_constants::syntax_option_type) (regex_compiler.tcc:84)
==2367==    by 0x427149: __compile_nfa<std::__cxx11::regex_traits<char>, const
char *> (regex_compiler.h:183)
==2367==    by 0x427149: std::__cxx11::basic_regex<char,
std::__cxx11::regex_traits<char> >::basic_regex<char const*>(char const*, char
const*, std::locale, std::regex_constants::syntax_option_type) (regex.h:763)
==2367==    by 0x427025: basic_regex<const char *> (regex.h:507)
==2367==    by 0x427025: basic_regex (regex.h:440)
==2367==    by 0x427025: main (a.cpp:3)
==2367== 
==2367== 
==2367== HEAP SUMMARY:
==2367==     in use at exit: 0 bytes in 0 blocks
==2367==   total heap usage: 20 allocs, 20 frees, 76,776 bytes allocated
==2367== 
==2367== All heap blocks were freed -- no leaks are possible
==2367== 
==2367== Use --track-origins=yes to see where uninitialised values come from
==2367== For lists of detected and suppressed errors, rerun with: -s
==2367== ERROR SUMMARY: 3 errors from 2 contexts (suppressed: 0 from 0)

Any one of the following changes makes the error go away: replacing clang++
with g++, dropping -fsanitize=undefined, dropping -O2, or switching to
-stdlib=libc++.

Versions are:

clang version 10.0.0-4ubuntu1 
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin

valgrind-3.15.0

libstdc++-10-dev/focal-updates,focal-security,now 10.2.0-5ubuntu1~20.04 amd64
[installed,automatic]

A friend of mine suggested that the cause is probably a use-after-move of
`__dup` at regex_automaton.tcc:206 (commit
e45c41988bfd655b1df7cff8fcf111dc6fb732e3 on the GitHub mirror), and speculated
that clang++ may be starting to implement some kind of destructive move. In
the excerpt below, `__dup` is passed through std::move on the first line and
then queried again on the third:

          auto __id = _M_nfa._M_insert_state(std::move(__dup));
          __m[__u] = __id;
          if (__dup._M_has_alt())
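
To illustrate the pattern being suspected here, below is a minimal sketch of
how the query can be ordered before the move so that a moved-from object is
never inspected. Everything in it is hypothetical: `State`, `Nfa`,
`insert_state` and `insert_and_report_alt` are stand-ins invented for this
example, not libstdc++'s `_State` or `_M_insert_state`, and I am not claiming
this is the right fix for the library.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an NFA state; NOT libstdc++'s _State.
struct State {
    std::string label;
    bool has_alt = false;
};

// Hypothetical stand-in for the automaton that owns the states.
struct Nfa {
    std::vector<State> states;

    // Takes ownership of the state and returns its index.
    std::size_t insert_state(State s) {
        states.push_back(std::move(s));
        return states.size() - 1;
    }
};

// Reads the flag first and moves afterwards, so `dup` is never
// inspected once it has been handed to std::move.
bool insert_and_report_alt(Nfa& nfa, State dup, std::size_t& id_out) {
    const bool had_alt = dup.has_alt;           // query before the move
    id_out = nfa.insert_state(std::move(dup));  // then give the object away
    return had_alt;
}
```

The caller branches on the returned bool instead of calling a member function
on the moved-from object. Whether the read after the move is actually what
valgrind is complaining about in the real code is still just a guess.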
