On Tue, 17 Dec 2024 20:26:01 +0100
David Cortes <david.cortes.riv...@gmail.com> wrote:

> I am seeing a curious error in an ASAN package check which is not
> reproducible in the r-debug containers
> (https://github.com/wch/r-debug), and which I'm suspecting might be a
> compiler bug.

r-debug differs from the gcc-ASAN special check in at least the
compiler version. The log at [1] says it's running with GCC 14.2.0,
while docker.io/wch1/r-debug uses GCC 12.3.0. Additionally, LTO was
recently enabled for R itself but not for packages [2].

The log says that the std::regex("\"") constructor somehow manages to
read one byte past the end (after the NUL terminator) of its C-style
string argument. I wasn't able to reproduce the package failure itself,
even after starting from docker.io/rocker/drd and rebuilding R
according to [2] (GCC 14, LTO for R but not for packages). However, the
following much simpler example does exhibit the same behaviour:

#include <iostream>
#include <regex>
int main() {
 std::string s{" gjdshlkhj \" lsjkhkljh "};
 const char * rx = "\"";
 std::cout
  << std::regex_replace(s, std::regex(rx), "\\\"") // <-- line 7
  << std::endl;
 // the code below is required for the problem to happen above!
 for (int i = 0; i < 100; ++i) volatile std::regex rxx(rx);
}

g++-14 -flto=10 -o foo -g -O2 -mtune=native \
-fsanitize=address,undefined,bounds-strict foo.cpp && ./foo

==648==ERROR: AddressSanitizer: global-buffer-overflow on address 0x556ed780fa02 at pc 0x556ed7731520 bp 0x7fff41781420 sp 0x7fff41781410
READ of size 1 at 0x556ed780fa02 thread T0
    #0 0x556ed773151f in std::__detail::_Scanner<char>::_M_scan_normal() /usr/include/c++/14/bits/regex_scanner.tcc:98
    #1 0x556ed773151f in std::__detail::_Scanner<char>::_M_advance() /usr/include/c++/14/bits/regex_scanner.tcc:79
    #2 0x556ed7734416 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_match_token(std::__detail::_ScannerBase::_TokenT) /usr/include/c++/14/bits/regex_compiler.tcc:575
    #3 0x556ed7748374 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_atom() /usr/include/c++/14/bits/regex_compiler.tcc:310
    #4 0x556ed7748374 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_term() /usr/include/c++/14/bits/regex_compiler.tcc:133
    #5 0x556ed7748374 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative() /usr/include/c++/14/bits/regex_compiler.tcc:115
    #6 0x556ed7747428 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_alternative() /usr/include/c++/14/bits/regex_compiler.tcc:118
    #7 0x556ed7753285 in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_M_disjunction() /usr/include/c++/14/bits/regex_compiler.tcc:91
    #8 0x556ed77df36e in std::__detail::_Compiler<std::__cxx11::regex_traits<char> >::_Compiler(char const*, char const*, std::locale const&, std::regex_constants::syntax_option_type) /usr/include/c++/14/bits/regex_compiler.tcc:76
    #9 0x556ed77df36e in std::__cxx11::basic_regex<char, std::__cxx11::regex_traits<char> >::_M_compile(char const*, char const*, std::regex_constants::syntax_option_type) [clone .constprop.0] /usr/include/c++/14/bits/regex.h:809
    #10 0x556ed771b8cf in std::__cxx11::basic_regex<char, std::__cxx11::regex_traits<char> >::basic_regex(char const*, std::regex_constants::syntax_option_type) /usr/include/c++/14/bits/regex.h:473
    #11 0x556ed771b8cf in main foo.cpp:7 // <-- see line 7 above

0x556ed780fa02 is located 0 bytes after global variable '*.LC45' defined in './foo.ltrans3.ltrans' (0x556ed780fa00) of size 2
  '*.LC45' is ascii string '"'

Disabling LTO, or removing the loop that constructs the additional
regexes, makes the error go away.

At the time of the error, the scanner members _M_current and _M_end
(which should point at the current position in the pattern and one
past its end, respectively) point into two different strings with the
same content:

(gdb) p _M_current-3
$12 = 0x55ce00bb4a00 "\""
(gdb) p _M_end-1
$13 = 0x55ce00b956a0 "\""
(gdb) p _M_end - _M_current
$14 = -127842
(gdb) p _M_current-4
$15 = 0x55ce00bb49ff ""
(gdb) p _M_end-2
$16 = 0x55ce00b9569f ""

Did the gcc-ASAN check enable LTO for packages too, not only for R itself?

As a quick workaround, I can only suggest writing your own
replace_all() function that calls std::string::find and
std::string::replace in a loop. Fortunately, you are replacing plain
strings, not complicated regular expressions.

-- 
Best regards,
Ivan

[1]
https://www.stats.ox.ac.uk/pub/bdr/memtests/gcc-ASAN/isotree/00check.log

[2]
https://www.stats.ox.ac.uk/pub/bdr/memtests/README.txt

______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel