On 05.06.2016 22:12, Pedro F. Giffuni wrote: > --- head/lib/libc/regex/regcomp.c Sun Jun 5 18:16:33 2016 > (r301460) > +++ head/lib/libc/regex/regcomp.c Sun Jun 5 19:12:52 2016 > (r301461) > @@ -821,10 +821,10 @@ p_b_term(struct parse *p, cset *cs) > (void)REQUIRE((uch)start <= (uch)finish, > REG_ERANGE); > CHaddrange(p, cs, start, finish); > } else { > - (void)REQUIRE(__collate_range_cmp(table, start, > finish) <= 0, REG_ERANGE); > + (void)REQUIRE(__wcollate_range_cmp(table, > start, finish) <= 0, REG_ERANGE); > for (i = 0; i <= UCHAR_MAX; i++) { > - if ( __collate_range_cmp(table, > start, i) <= 0 > - && __collate_range_cmp(table, i, > finish) <= 0 > + if ( __wcollate_range_cmp(table, > start, i) <= 0 > + && __wcollate_range_cmp(table, i, > finish) <= 0 > ) > CHadd(p, cs, i); > } >
As I already mention in PR, we have broken regcomp after someone adds wchar_t support there. Now regcomp ranges works only for the first 256 wchars of the current locale, notice that loop upper limit: for (i = 0; i <= UCHAR_MAX; i++) { In general, ranges are either broken in regcomp now or are memory eating. We have bitmask only for the first 256 wchars, all other added to the range literally. Imagine what happens if someone specify full Unicode range in regexp. Proper fix will be adding bitmask for the whole Unicode range, and even in that case regcomp attempting to use collation in ranges will be _very_slow_ since needs to check all Unicode chars in its for (i = 0; i <= Max_Unicode_wchar; i++) { loop. Better stop pretending that we are able to do collation support in the ranges, since POSIX cares about its own locale only here: "In the POSIX locale, a range expression represents the set of collating elements that fall between two elements in the collation sequence, inclusive. In other locales, a range expression has unspecified behavior: strictly conforming applications shall not rely on whether the range expression is valid, or on the set of collating elements matched." Until whole Unicode range bitmask will be implemented (if ever), better stop pretending to honor collation order, we just can't do it with wchars now and do what NetBSD/OpenBSD does (using wchar_t) instead. It does not prevent memory eating on big ranges (bitmask is needed, see above), but at least fix the thing that only first 256 wchars are considered. _______________________________________________ svn-src-head@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/svn-src-head To unsubscribe, send any mail to "svn-src-head-unsubscr...@freebsd.org"