For background, this thread started here: http://thread.gmane.org/gmane.comp.gnu.coreutils.bugs/9581
Andi Kleen <[EMAIL PROTECTED]> wrote: >> [*] Can you even get a 2GB-long string into expr? > > At least on Linux you can't, the kernel limits arguments and environments > to 128K or so. There are some patches floating around to enlarge that > limit, but I don't think it will be ever 2GB because it would be > too risky to pin that much memory unswappable. > > I think the others can only get regexprs from command line options too. > It would only make sense if you have a option to input the regexpr > from a file, which neither expr nor nl nor tac have. > > So I guess at least on Linux this doesn't make sense. The length of the string being searched matters. More precisely, it's the offsets (into the buffer being searched) of the beginning and end of the match. I wanted to prove to myself that I could exploit the bug when using the libc regexp, so cooked up this example with nl: First, show how nl's "-b pREGEX" option works: (number only lines matching REGEXP) $ seq 5|sed s/5/:/|nl -b p: 1 2 3 4 1 : Create a sparse file, exactly 2GB long: $ dd if=/dev/null of=2gb bs=1M seek=2k count=0 choose a long marker string so the search will go faster: $ re=$(printf %01000d 0) append the marker to the file (the start-of-marker will have offset 2^31), which would overflow a 32-bit signed type: $ echo $re >> 2gb Run nl, so that it numbers the sole (2^31+-byte-long) line matching the regexp, $ nl -b p$re 2gb > k That failed on a 64-bit system with 2GB ram because linebuffer.c tried to allocate a buffer longer than 4GB (it may have been 8GB), and there wasn't enough virtual memory. linebuffer.c uses xalloc.h's x2realloc function, which was doubling the buffer size whenever necessary. Obviously, when the buffer is very large, doubling is wasteful. Here, it was fatal. I've just changed xalloc's x2nrealloc to do n = 3n/2, rather than n *= 2, and with that change, the above invocation of "nl" completed: [BTW, even with --without-included-regex, the above nl command worked fine. That's because nl doesn't use the would-have-overflowed offsets. However, it looks like ptx *does*. So, users of ptx with very long lines, beware! :-) ] Give tools a better chance to allocate space for very large buffers. * lib/xalloc.h (x2nrealloc): Use 3/2, not 2, as buffer size factor. Index: lib/xalloc.h =================================================================== RCS file: /sources/gnulib/gnulib/lib/xalloc.h,v retrieving revision 1.32 diff -u -p -r1.32 xalloc.h --- lib/xalloc.h 8 Nov 2006 00:22:30 -0000 1.32 +++ lib/xalloc.h 1 Feb 2007 23:32:57 -0000 @@ -1,7 +1,7 @@ /* xalloc.h -- malloc with out-of-memory checking Copyright (C) 1990, 1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, - 1999, 2000, 2003, 2004, 2006 Free Software Foundation, Inc. + 1999, 2000, 2003, 2004, 2006, 2007 Free Software Foundation, Inc. This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by @@ -139,10 +139,10 @@ xnrealloc (void *p, size_t n, size_t s) allocating an initial block with a nonzero size, or by allocating a larger block. - In the following implementation, nonzero sizes are doubled so that - repeated reallocations have O(N log N) overall cost rather than - O(N**2) cost, but the specification for this function does not - guarantee that sizes are doubled. + In the following implementation, nonzero sizes are increased by a + factor of approximately 1.5 so that repeated reallocations have + O(N log N) overall cost rather than O(N**2) cost, but the + specification for this function does not guarantee that rate. Here is an example of use: @@ -204,9 +204,9 @@ x2nrealloc (void *p, size_t *pn, size_t } else { - if (((size_t) -1) / 2 / s < n) + if ((2 * (((size_t) -1 - 1) / 3)) / s < n) xalloc_die (); - n *= 2; + n = n + n / 2 + 1; } *pn = n;