Hello,

On 13/08/18 03:51 PM, Assaf Gordon wrote:
> I suspect there is an uninitialized memory access deep inside regex_internal.c under very particular circumstances.
(continuation of https://lists.gnu.org/r/bug-gnulib/2018-08/msg00071.html )

I've pinpointed the change that causes the segfault, and it likely also affects glibc.

1. The input regex contains a multibyte character with different upper- and lower-case representations.
2. The input regex also contains a NUL character.
3. In regex_internal.c, function build_wcs_upper_buffer(), the code was changed like so:

   -  if (BE ((size_t) (mbclen + 2) > 2, 1))
   +  if (BE (mbclen < (size_t) -2, 1))

This changed code subtly treats the case of "mbclen == 0" differently. The old test is false for mbclen == 0, which is what mbrtowc() returns when it decodes a NUL character, while the new test is true, so a NUL now takes the branch meant for ordinary converted characters. This eventually leads to incorrect code flow, and then to a crash.

In gnulib, this was changed long ago:

===
https://git.savannah.gnu.org/cgit/gnulib.git/commit/?id=8335a4d6
commit 8335a4d6c7b4448cd0bcb6d0bebf1d456bcfdb17
Date:   Mon Apr 10 06:43:33 2006 +0000

    Merge regex changes from libc, removing some of our POSIX-conformance
    changes that were rejected and redoing them in a less-intrusive way.
===

And recently it was ported back to glibc:

===
https://sourceware.org/git/?p=glibc.git;a=commit;h=eb04c213
commit eb04c21373e2a2885f3d52ff192b0499afe3c672
Date:   Wed Dec 20 09:47:44 2017 -0200

    posix: Sync gnulib regex implementation
===

To reproduce (using gnulib's code), try the following:

    git clone git://git.sv.gnu.org/sed.git
    cd sed
    ./bootstrap

    # This patch adds the old code vs. the new code, selected with "#ifdef REGEX_FIX"
    patch -p1 < regex-internal-bug.patch

    ./configure --with-included-regex CFLAGS="-O0 -g"
    make

    printf "/\xe1\xbe\xbe\x5c\x00/I" > 1.sed

    # This will segfault:
    ./sed/sed -f 1.sed < /dev/null

    # Rebuild with the old code; this will not segfault:
    rm lib/regex.o ; make CFLAGS="-DREGEX_FIX"
    ./sed/sed -f 1.sed < /dev/null

====

Perhaps it is sufficient to just revert these two lines, but I'm not sure whether there will be other side effects.

Comments welcome,
 - assaf
--- gnulib/lib/regex_internal.c	2018-08-24 17:16:59.161610807 -0600
+++ lib/regex_internal.c	2018-08-24 17:08:07.985496439 -0600
@@ -317,7 +317,11 @@
 	  mbclen = __mbrtowc (&wc,
 			      ((const char *) pstr->raw_mbs + pstr->raw_mbs_idx
 			       + byte_idx), remain_len, &pstr->cur_state);
+#ifdef REGEX_FIX
+	  if (BE (mbclen + 2 > 2, 1))
+#else
 	  if (BE (mbclen < (size_t) -2, 1))
+#endif
 	    {
 	      wchar_t wcu = __towupper (wc);
 	      if (wcu != wc)
@@ -386,7 +390,11 @@
 	      else
 		p = (const char *) pstr->raw_mbs + pstr->raw_mbs_idx + src_idx;
 	      mbclen = __mbrtowc (&wc, p, remain_len, &pstr->cur_state);
+#ifdef REGEX_FIX
+	      if (BE (mbclen + 2 > 2, 1))
+#else
 	      if (BE (mbclen < (size_t) -2, 1))
+#endif
 		{
 		  wchar_t wcu = __towupper (wc);
 		  if (wcu != wc)
@@ -409,6 +417,7 @@
 		  if (pstr->offsets == NULL)
 		    {
 		      pstr->offsets = re_malloc (Idx, pstr->bufs_len);
+		      memset (pstr->offsets, 0xBC, sizeof(Idx)*pstr->bufs_len);
 
 		      if (pstr->offsets == NULL)
 			return REG_ESPACE;