Thanks for tracking this bug down. I introduced the bug in 2006 when I noticed
that the expression '(size_t) (mbclen + 2) > 2' can have undefined behavior on
(admittedly unlikely) platforms where size_t is one bit narrower than int. (Such
platforms have existed in the past - I even worked for a company that sold them!
- though these days I expect they're rarely used.) I replaced the expression
with 'mbclen < (size_t) -2' to avoid undefined behavior, but unfortunately my
replacement was incorrect: the two are not equivalent when mbclen == 0, where
the original expression is false and the replacement is true.
Please try the attached gnulib patch, which should fix the problem in a portable
way. Modern GCC optimizes the clear code just as well as the confusing code, so
we might as well write it clearly.
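
For readers who want to see the difference concretely, here is a minimal
sketch comparing the three tests on the interesting mbrtowc return values:
0 (null wide character), (size_t) -1 (encoding error), and (size_t) -2
(incomplete character). The function names old_check, buggy_check,
fixed_check, and check_examples are made up for illustration and do not
appear in gnulib. The sketch assumes an ordinary platform where size_t is
at least as wide as unsigned int, so the old expression wraps around
instead of overflowing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Pre-2006 test.  With unsigned wraparound, 0, (size_t) -1 and
   (size_t) -2 become 2, 1 and 0, so all three are rejected.  */
static bool
old_check (size_t mbclen)
{
  return (size_t) (mbclen + 2) > 2;
}

/* 2006 replacement.  Rejects (size_t) -1 and (size_t) -2,
   but wrongly accepts 0.  */
static bool
buggy_check (size_t mbclen)
{
  return mbclen < (size_t) -2;
}

/* Test used in the patch: same results as old_check on this
   platform, without relying on wraparound.  */
static bool
fixed_check (size_t mbclen)
{
  return 0 < mbclen && mbclen < (size_t) -2;
}

/* Illustrative self-check (hypothetical helper, not gnulib code).  */
void
check_examples (void)
{
  size_t samples[] = { 0, 1, 4, (size_t) -2, (size_t) -1 };
  size_t n = sizeof samples / sizeof samples[0];

  for (size_t i = 0; i < n; i++)
    assert (fixed_check (samples[i]) == old_check (samples[i]));

  /* The only sample where the 2006 replacement disagrees is 0.  */
  assert (buggy_check (0) && !old_check (0) && !fixed_check (0));
}
```

Calling check_examples () from a test program should complete without
tripping an assertion under the stated assumption.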
From 17542682f92da94550e275a58316c9ad96724374 Mon Sep 17 00:00:00 2001
From: Paul Eggert <egg...@cs.ucla.edu>
Date: Sat, 25 Aug 2018 00:35:05 -0700
Subject: [PATCH] regex: fix uninitialized memory access
Problem and draft fix reported by Assaf Gordon here:
https://lists.gnu.org/r/bug-gnulib/2018-08/msg00071.html
https://lists.gnu.org/r/bug-gnulib/2018-08/msg00142.html
I introduced this bug into gnulib in commit
8335a4d6c7b4448cd0bcb6d0bebf1d456bcfdb17 dated 2006-04-10.
* lib/regex_internal.c (build_wcs_upper_buffer):
Fix bug when mbrtowc returns 0.
---
ChangeLog | 11 +++++++++++
lib/regex_internal.c | 4 ++--
2 files changed, 13 insertions(+), 2 deletions(-)
diff --git a/ChangeLog b/ChangeLog
index acd3e2a05..da711a89d 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,14 @@
+2018-08-25 Paul Eggert <egg...@cs.ucla.edu>
+
+ regex: fix uninitialized memory access
+ Problem and draft fix reported by Assaf Gordon here:
+ https://lists.gnu.org/r/bug-gnulib/2018-08/msg00071.html
+ https://lists.gnu.org/r/bug-gnulib/2018-08/msg00142.html
+ I introduced this bug into gnulib in commit
+ 8335a4d6c7b4448cd0bcb6d0bebf1d456bcfdb17 dated 2006-04-10.
+ * lib/regex_internal.c (build_wcs_upper_buffer):
+ Fix bug when mbrtowc returns 0.
+
2018-08-23 Bruno Haible <br...@clisp.org>
getcwd: Add cross-compilation guesses.
diff --git a/lib/regex_internal.c b/lib/regex_internal.c
index 7f0083b91..b10588f1c 100644
--- a/lib/regex_internal.c
+++ b/lib/regex_internal.c
@@ -317,7 +317,7 @@ build_wcs_upper_buffer (re_string_t *pstr)
mbclen = __mbrtowc (&wc,
((const char *) pstr->raw_mbs + pstr->raw_mbs_idx
+ byte_idx), remain_len, &pstr->cur_state);
- if (BE (mbclen < (size_t) -2, 1))
+ if (BE (0 < mbclen && mbclen < (size_t) -2, 1))
{
wchar_t wcu = __towupper (wc);
if (wcu != wc)
@@ -386,7 +386,7 @@ build_wcs_upper_buffer (re_string_t *pstr)
else
p = (const char *) pstr->raw_mbs + pstr->raw_mbs_idx + src_idx;
mbclen = __mbrtowc (&wc, p, remain_len, &pstr->cur_state);
- if (BE (mbclen < (size_t) -2, 1))
+ if (BE (0 < mbclen && mbclen < (size_t) -2, 1))
{
wchar_t wcu = __towupper (wc);
if (wcu != wc)
--
2.17.1