Hi Pádraig,

> I was testing multi-byte enhancements to coreutils/join with:
> LC_ALL=en_US.utf8 join -i <(env printf '1\x00\n') <(env printf '2\x00\n')
> and noticed it spun the CPU because of the NUL char.
> The attached fixes the issue for me. This function is supposed
> to skip over NUL chars like this right?

Right.

> --- a/lib/mbmemcasecoll.c 2010-01-04 01:10:30.000000000 +0000
> +++ b/lib/mbmemcasecoll.c 2010-09-21 23:52:23.000000000 +0000
> @@ -61,6 +63,8 @@
>          break;
>        if (n1 != (size_t)(-1))
>          {
> +          if (n1 == 0)
> +            n1 = 1; /* copy NUL characters. */
>            wint_t wc2 = towlower (wc1);
>
>            if (wc2 != wc1)

The patch is right, except that we stick with C89 syntax in gnulib: no
statements before declarations.

I'm adding the fix and a test case:


2010-09-22  Pádraig Brady  <p...@draigbrady.com>
            Bruno Haible  <br...@clisp.org>

        Fix endless loop in mbmemcasecoll.
        * lib/mbmemcasecoll.c (apply_towlower): When mbrtowc returns 0, copy
        1 byte.
        * tests/test-mbmemcasecmp.h (test_ascii): Test embedded NULs.

--- lib/mbmemcasecoll.c.orig Wed Sep 22 13:31:27 2010
+++ lib/mbmemcasecoll.c Wed Sep 22 13:30:00 2010
@@ -1,5 +1,5 @@
 /* Locale-specific case-ignoring memory comparison.
-   Copyright (C) 2001, 2009, 2010 Free Software Foundation, Inc.
+   Copyright (C) 2001, 2009-2010 Free Software Foundation, Inc.
    Written by Bruno Haible <br...@clisp.org>, 2001.
 
    This program is free software: you can redistribute it and/or modify it
@@ -61,8 +61,12 @@
         break;
       if (n1 != (size_t)(-1))
         {
-          wint_t wc2 = towlower (wc1);
+          wint_t wc2;
 
+          if (n1 == 0) /* NUL character? */
+            n1 = 1;
+
+          wc2 = towlower (wc1);
           if (wc2 != wc1)
             {
               size_t n2;
--- tests/test-mbmemcasecmp.h.orig Wed Sep 22 13:31:27 2010
+++ tests/test-mbmemcasecmp.h Wed Sep 22 13:21:32 2010
@@ -62,6 +62,12 @@
 
   ASSERT (my_casecmp ("para", 4, "paragraph", 9) < 0);
   ASSERT (my_casecmp ("paragraph", 9, "para", 4) > 0);
+
+  /* Embedded NULs. */
+  ASSERT (my_casecmp ("1\0", 2, "2\0", 2) < 0);
+  ASSERT (my_casecmp ("2\0", 2, "1\0", 2) > 0);
+  ASSERT (my_casecmp ("x\0""1", 3, "x\0""2", 3) < 0);
+  ASSERT (my_casecmp ("x\0""2", 3, "x\0""1", 3) > 0);
 }
 
 static void
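
For anyone reading the archive later, here is a minimal standalone sketch of why the n1 == 0 case matters. mbrtowc returns 0 when it converts a NUL character, even though that character occupies one byte of input, so a loop that advances by the return value never gets past it. The function count_chars below is a hypothetical helper written for this illustration; it is not gnulib code and is not how apply_towlower is structured. It also assumes that invalid or incomplete sequences should simply be skipped one byte at a time.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper, not part of gnulib: count the characters in
   buf[0..len-1], including embedded NULs.  Invalid or incomplete
   sequences are counted as one character per byte (an assumption
   made for this sketch).  */
size_t
count_chars (const char *buf, size_t len)
{
  mbstate_t state;
  size_t count = 0;

  assert (buf != NULL);
  memset (&state, 0, sizeof state);

  while (len > 0)
    {
      wchar_t wc;
      size_t n = mbrtowc (&wc, buf, len, &state);

      if (n == (size_t) -1 || n == (size_t) -2)
        {
          /* Invalid or incomplete sequence: skip one byte and
             return to the initial conversion state.  */
          n = 1;
          memset (&state, 0, sizeof state);
        }
      else if (n == 0)
        /* NUL character: mbrtowc reports 0, but one byte of input
           was consumed.  Without this, the loop would never advance.  */
        n = 1;

      buf += n;
      len -= n;
      count++;
    }

  return count;
}
```

With this, count_chars ("1\0", 2) counts two characters instead of spinning on the second byte. Dropping the n == 0 branch reproduces the kind of endless loop reported above.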