Thank you, that was very helpful indeed! I've filed a bug report with GCC just in case: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118113
Best regards,
David Cortes

On Wed, 2024-12-18 at 17:34 +0300, Ivan Krylov wrote:
> On Tue, 17 Dec 2024 20:26:01 +0100,
> David Cortes <david.cortes.riv...@gmail.com> wrote:
>
> > I am seeing a curious error in an ASAN package check which is not
> > reproducible in the r-debug containers
> > (https://github.com/wch/r-debug), and which I'm suspecting might be
> > a compiler bug.
>
> r-debug differs from the gcc-ASAN special check in at least the
> compiler version. The log at [1] says it's running with GCC 14.2.0,
> while docker.io/wch1/r-debug uses GCC 12.3.0. Additionally, LTO was
> recently enabled for R but not the packages [2].
>
> The log says that the std::regex("\"") constructor somehow manages to
> read a byte past the end (after the 0-terminator) of its C-style
> string argument. While I wasn't able to reproduce it even after
> starting with docker.io/rocker/drd and rebuilding R according to [2],
> with GCC 14 and LTO for R but not packages, the following much simpler
> example does exhibit the same behaviour:
>
> #include <iostream>
> #include <regex>
> int main() {
>  std::string s{" gjdshlkhj \" lsjkhkljh "};
>  const char * rx = "\"";
>  std::cout
>   << std::regex_replace(s, std::regex(rx), "\\\"") // <-- line 7
>   << std::endl;
>  // the code below is required for the problem to happen above!
>  for (int i = 0; i < 100; ++i) volatile std::regex rxx(rx);
> }
>
> g++-14 -flto=10 -o foo -g -O2 -mtune=native \
>  -fsanitize=address,undefined,bounds-strict foo.cpp && ./foo
>
> ==648==ERROR: AddressSanitizer: global-buffer-overflow on address 0x556ed780fa02 at pc 0x556ed7731520 bp 0x7fff41781420 sp 0x7fff41781410
> READ of size 1 at 0x556ed780fa02 thread T0
>     #0 0x556ed773151f in std::__detail::_Scanner<char>::_M_scan_normal() /usr/include/c++/14/bits/regex_scanner.tcc:98
>     #1 0x556ed773151f in std::__detail::_Scanner<char>::_M_advance() /usr/include/c++/14/bits/regex_scanner.tcc:79
>     #2 0x556ed7734416 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_match_token(std::__detail::_ScannerBase::_TokenT) /usr/include/c++/14/bits/regex_compiler.tcc:575
>     #3 0x556ed7748374 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_atom() /usr/include/c++/14/bits/regex_compiler.tcc:310
>     #4 0x556ed7748374 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_term() /usr/include/c++/14/bits/regex_compiler.tcc:133
>     #5 0x556ed7748374 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative() /usr/include/c++/14/bits/regex_compiler.tcc:115
>     #6 0x556ed7747428 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative() /usr/include/c++/14/bits/regex_compiler.tcc:118
>     #7 0x556ed7753285 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction() /usr/include/c++/14/bits/regex_compiler.tcc:91
>     #8 0x556ed77df36e in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char const*, char const*, std::locale const&, std::regex_constants::syntax_option_type) /usr/include/c++/14/bits/regex_compiler.tcc:76
>     #9 0x556ed77df36e in std::__cxx11::basic_regex<char, std::__cxx11::regex_traits<char> >::_M_compile(char const*, char const*, std::regex_constants::syntax_option_type) [clone .constprop.0] /usr/include/c++/14/bits/regex.h:809
>     #10 0x556ed771b8cf in std::__cxx11::basic_regex<char, std::__cxx11::regex_traits<char> >::basic_regex(char const*, std::regex_constants::syntax_option_type) /usr/include/c++/14/bits/regex.h:473
>     #11 0x556ed771b8cf in main foo.cpp:7 // <-- see line 7 above
>
> 0x556ed780fa02 is located 0 bytes after global variable '*.LC45' defined in './foo.ltrans3.ltrans' (0x556ed780fa00) of size 2
>   '*.LC45' is ascii string '"'
>
> Disabling LTO or removing the loop that constructs additional
> regexps makes the error vanish.
>
> At the time of the error, members _M_current and _M_end (which should
> point at the current part of the string and past the end of the same
> string, respectively) point at completely different strings with the
> same content:
>
> (gdb) p _M_current-3
> $12 = 0x55ce00bb4a00 "\""
> (gdb) p _M_end-1
> $13 = 0x55ce00b956a0 "\""
> (gdb) p _M_end - _M_current
> $14 = -127842
> (gdb) p _M_current-4
> $15 = 0x55ce00bb49ff ""
> (gdb) p _M_end-2
> $16 = 0x55ce00b9569f ""
>
> Did the gcc-ASAN check enable LTO for packages too, not only R
> itself?
>
> For a quick workaround, I can only recommend writing your own
> replace_all() function using the std::string::find and
> std::string::replace methods in a loop. Thankfully, you replace plain
> strings, not complicated regular expressions.
>
______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
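
For reference, here is a minimal sketch of the replace_all() workaround suggested in the quoted message, assuming both the search and replacement text are plain strings rather than regular expressions. The function name comes from that suggestion; the signature, the by-value argument and the empty-pattern guard are assumptions made for illustration, not code from the thread or from any package.

```cpp
#include <string>

// Hypothetical helper (not from the original thread): returns a copy of `s`
// in which every occurrence of `from` is replaced by `to`. Both arguments
// are treated as plain strings, so no std::regex is constructed at all.
std::string replace_all(std::string s, const std::string& from,
                        const std::string& to) {
    if (from.empty()) return s;  // an empty pattern would match forever
    std::string::size_type pos = 0;
    while ((pos = s.find(from, pos)) != std::string::npos) {
        s.replace(pos, from.size(), to);
        pos += to.size();  // continue after the inserted text, never inside it
    }
    return s;
}
```

With this sketch, a call such as replace_all(s, "\"", "\\\"") could stand in for the std::regex_replace(s, std::regex(rx), "\\\"") call in the reproducer. Advancing pos past the inserted text matters here because the replacement itself contains the search string.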