package libc6
retitle 254314 regexec(): segfaults in UTF-8 locales under some circumstances
thanks

Hi,

the attached C program reproduces the bug in a UTF-8 locale. These are
the circumstances under which the bug seems to get triggered:

  - the locale must be UTF-8.
  - an invalid UTF-8 string must be given as input.
  - the regular expression must be compiled with the REG_ICASE flag.
  - the regular expression must contain a range, e.g. "[0-1]".

Oddly (for me), "[a-b]" doesn't trigger the bug, but "[a-b]+" does. I
guess it is just a matter of forcing a call to
find_collation_sequence_value().

If this is "expected behavior" and callers are expected to always give
*valid* input data, please reassign back to mutt.

cheers,

-- 
Adeodato Simó
    EM: asp16 [ykwim] alu.ua.es | PK: DA6AE621

Arguing with an engineer is like wrestling with a pig in mud: after a
while, you realize the pig is enjoying it.

/*
 * Little C program to reproduce Debian bug #254314.
 * Tested with LANG=es_ES.UTF-8 and libc6 version 2.3.2.ds1-13.
 *
 * Adeodato Simó <[EMAIL PROTECTED]>, 2004-06-15, public domain.
 */

#include <stdio.h>
#include <regex.h>
#include <locale.h>

int main (void)
{
  regex_t preg;
  const char *r = "[0-1]";          /* also [a-b]+ */
  const char *s = "\xc3\xa1\xc3";   /* Invalid UTF-8 string! */

  /* Pick up the locale from the environment (must be UTF-8). */
  setlocale(LC_ALL, "");

  if (regcomp(&preg, r, 0) == 0) /* ! REG_ICASE */
  {
    regexec(&preg, s, 0, NULL, 0);
    printf("%s\n", "case-sensitive: successful");
    regfree(&preg);
  }

  if (regcomp(&preg, r, REG_ICASE) == 0)
  {
    regexec(&preg, s, 0, NULL, 0);
    printf("%s\n", "case-insensitive: successful"); /* not reached for UTF-8 locales */
    regfree(&preg);
  }

  return 0;
}
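
In case the answer is that callers must always pass valid input, a
caller-side check could look roughly like the sketch below. This is only
an illustration: is_valid_utf8() is a hypothetical helper of mine, not
something from glibc or mutt, and it inspects the bytes directly instead
of consulting the current locale, so it only makes sense when the locale
is already known to be UTF-8.

```c
#include <stddef.h>

/*
 * Hypothetical helper (not part of glibc or mutt): returns 1 if the
 * NUL-terminated string s is well-formed UTF-8, 0 otherwise.
 */
static int is_valid_utf8(const char *s)
{
  const unsigned char *p = (const unsigned char *) s;

  while (*p != '\0')
  {
    unsigned long cp;   /* code point decoded so far */
    int extra;          /* number of continuation bytes expected */
    int i;

    if (*p < 0x80) { p++; continue; }                     /* ASCII */
    else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
    else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
    else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
    else return 0;      /* stray continuation byte or invalid lead byte */
    p++;

    for (i = 0; i < extra; i++, p++)
    {
      /* A NUL here means the sequence was cut short, as in the
         reproducer's "\xc3\xa1\xc3". */
      if ((*p & 0xC0) != 0x80)
        return 0;
      cp = (cp << 6) | (unsigned long) (*p & 0x3F);
    }

    /* Reject overlong encodings, surrogates and out-of-range values. */
    if ((extra == 1 && cp < 0x80) ||
        (extra == 2 && cp < 0x800) ||
        (extra == 3 && cp < 0x10000))
      return 0;
    if (cp >= 0xD800 && cp <= 0xDFFF)
      return 0;
    if (cp > 0x10FFFF)
      return 0;
  }

  return 1;
}
```

A caller would then skip the regexec() call whenever is_valid_utf8(s)
returns 0. For the string used in the reproducer the check fails on the
trailing "\xc3", which has no continuation byte.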