On 25/10/10 01:26, Pádraig Brady wrote: > I noticed large realloc()s still being done by fread_file() for the > common case of a regular file. As well as being inefficient, > it could needlessly fail for large files. > > $ truncate -s1M t.f > $ ltrace ./a.out < t.f > malloc(1048577) = 0xb77c0008 > fread(0xb77c0008, 1, 1048576, 0x7b7420) = 1048576 > realloc(0xb77c0008, 1572865) = 0xb763f008 > fread(0xb773f008, 1, 524288, 0x7b7420) = 0 > realloc(0xb763f008, 1048577) = 0xb763f008
> On a related note I was wondering if we should fall back > to increasing the buffer by +=BUFSIZ rather than *=2 > when we get ENOMEM? If we do then we'd need to be more careful > with the overflow checking with lines like the following: > > if (size + BUFSIZ + 1 > alloc) The above overflow issue might trigger with a large file that subsequently increases in size. I noticed a couple of other issues with the code so I simplified/rearranged it further as per the attached. In summary the changes are: * Don't do 2 unnecessary reallocs for regular files * Keep allocation to multiples of BUFSIZ which may be more efficient * Allow allocating up to SIZE_MAX for streams, rather than about SIZE_MAX - SIZE_MAX/5 * Fix possible overflow causing invalid operation * Explicitly check for overflow rather than using a roll over variable For my own reference, I notice that even though we now do a single fread() for a regular file, we still get 3 read syscalls. I.E. fread(0xb77e9008, 1, 512002, 0xd52440) = 512001 results in: read(0, "\0"..., 512000) = 512000 read(0, "\0", 4096) = 1 read(0, "", 4096) = 0 cheers, Pádraig.
>From d45e034e8ee0a456adeb7e9382fb3a059f0c7fcf Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?P=C3=A1draig=20Brady?= <p...@draigbrady.com> Date: Mon, 13 Dec 2010 08:08:23 +0000 Subject: [PATCH] read-file: reorganize to avoid various issues * lib/read-file.c (fread_file): Read 1 more byte than is currently in a regular file, to immediately detect EOF, and thus avoid any realloc()s. Allocate up to SIZE_MAX for streams, rather than limiting to about SIZE_MAX - SIZE_MAX/5. Don't use the 'size + BUFSIZ + 1' expression, which could overflow and cause invalid operation. Keep allocation to multiples of BUFSIZ, which is simpler and may be more efficient. As a style decision, explicitly check for overflow rather than using a temporary roll over variable (new_alloc). --- lib/read-file.c | 75 +++++++++++++++++++++++++----------------------------- 1 files changed, 35 insertions(+), 40 deletions(-) diff --git a/lib/read-file.c b/lib/read-file.c index 0a15c5a..ba3f172 100644 --- a/lib/read-file.c +++ b/lib/read-file.c @@ -39,12 +39,12 @@ and set *LENGTH to the length of the string. The string is zero-terminated, but the terminating zero byte is not counted in *LENGTH. On errors, *LENGTH is undefined, errno preserves the - values set by system functions (if any), and NULL is returned. */ + values set by system functions (if any), and NULL is returned. */ char * fread_file (FILE * stream, size_t * length) { char *buf = NULL; - size_t alloc = 0; + size_t alloc = BUFSIZ; /* For a regular file, allocate a buffer that has exactly the right size. This avoids the need to do dynamic reallocations later. */ @@ -59,59 +59,31 @@ fread_file (FILE * stream, size_t * length) { off_t alloc_off = st.st_size - pos; - if (SIZE_MAX <= alloc_off) + /* '1' below, accounts for the trailing NUL. */ + if (SIZE_MAX - 1 < alloc_off) { errno = ENOMEM; return NULL; } alloc = alloc_off + 1; - - buf = malloc (alloc); - if (!buf) - /* errno is ENOMEM. */ - return NULL; } } } + if (!(buf = malloc (alloc))) + return NULL; /* errno is ENOMEM. */ + { size_t size = 0; /* number of bytes read so far */ int save_errno; for (;;) { - size_t count; - size_t requested; - - if (size + BUFSIZ + 1 > alloc) - { - char *new_buf; - size_t new_alloc = alloc + alloc / 2; - - /* Check against overflow. */ - if (new_alloc < alloc) - { - save_errno = ENOMEM; - break; - } - - alloc = new_alloc; - if (alloc < size + BUFSIZ + 1) - alloc = size + BUFSIZ + 1; - - new_buf = realloc (buf, alloc); - if (!new_buf) - { - save_errno = errno; - break; - } - - buf = new_buf; - } - - requested = alloc - size - 1; - count = fread (buf + size, 1, requested, stream); + /* This reads 1 more than the size of a regular file + so that we get eof immediately. */ + size_t requested = alloc - size; + size_t count = fread (buf + size, 1, requested, stream); size += count; if (count != requested) @@ -121,7 +93,7 @@ fread_file (FILE * stream, size_t * length) break; /* Shrink the allocated memory if possible. */ - if (size + 1 < alloc) + if (size < alloc - 1) { char *smaller_buf = realloc (buf, size + 1); if (smaller_buf != NULL) @@ -132,6 +104,29 @@ fread_file (FILE * stream, size_t * length) *length = size; return buf; } + + { + char *new_buf; + + if (alloc == SIZE_MAX) + { + save_errno = ENOMEM; + break; + } + + if (alloc < SIZE_MAX - alloc / 2) + alloc = alloc + alloc / 2; + else + alloc = SIZE_MAX; + + if (!(new_buf = realloc (buf, alloc))) + { + save_errno = errno; + break; + } + + buf = new_buf; + } } free (buf); -- 1.7.3.2