On 25/10/10 01:26, Pádraig Brady wrote:
> I noticed large realloc()s still being done by fread_file() for the
> common case of a regular file.  As well as being inefficient,
> it could needlessly fail for large files.
> 
> $ truncate -s1M t.f
> $ ltrace ./a.out < t.f
> malloc(1048577)                                  = 0xb77c0008
> fread(0xb77c0008, 1, 1048576, 0x7b7420)          = 1048576
> realloc(0xb77c0008, 1572865)                     = 0xb763f008
> fread(0xb773f008, 1, 524288, 0x7b7420)           = 0
> realloc(0xb763f008, 1048577)                     = 0xb763f008

> On a related note I was wondering if we should fall back
> to increasing the buffer by +=BUFSIZ rather than *=2
> when we get ENOMEM? If we do then we'd need to be more careful
> with the overflow checking with lines like the following:
> 
> if (size + BUFSIZ + 1 > alloc)

The above overflow issue might trigger with a large
file that subsequently increases in size.
I noticed a couple of other issues with the code so I
simplified/rearranged it further as per the attached.

In summary the changes are:

* Don't do 2 unnecessary reallocs for regular files
* Keep allocation to multiples of BUFSIZ which may be more efficient
* Allow allocating up to SIZE_MAX for streams, rather than about SIZE_MAX - 
SIZE_MAX/5
* Fix possible overflow causing invalid operation
* Explicitly check for overflow rather than using a roll over variable

For my own reference, I notice that even though
we now do a single fread() for a regular file,
we still get 3 read syscalls. I.E.

  fread(0xb77e9008, 1, 512002, 0xd52440)  = 512001

results in:

  read(0, "\0"..., 512000) = 512000
  read(0, "\0", 4096)                     = 1
  read(0, "", 4096)                       = 0

cheers,
Pádraig.
>From d45e034e8ee0a456adeb7e9382fb3a059f0c7fcf Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?P=C3=A1draig=20Brady?= <p...@draigbrady.com>
Date: Mon, 13 Dec 2010 08:08:23 +0000
Subject: [PATCH] read-file: reorganize to avoid various issues

* lib/read-file.c (fread_file): Read 1 more byte than is
currently in a regular file, to immediately detect EOF,
and thus avoid any realloc()s.
Allocate up to SIZE_MAX for streams, rather than limiting
to about SIZE_MAX - SIZE_MAX/5.
Don't use the 'size + BUFSIZ + 1' expression, which
could overflow and cause invalid operation.
Keep allocation to multiples of BUFSIZ, which is simpler
and may be more efficient.
As a style decision, explicitly check for overflow rather
than using a temporary roll over variable (new_alloc).
---
 lib/read-file.c |   75 +++++++++++++++++++++++++-----------------------------
 1 files changed, 35 insertions(+), 40 deletions(-)

diff --git a/lib/read-file.c b/lib/read-file.c
index 0a15c5a..ba3f172 100644
--- a/lib/read-file.c
+++ b/lib/read-file.c
@@ -39,12 +39,12 @@
    and set *LENGTH to the length of the string.  The string is
    zero-terminated, but the terminating zero byte is not counted in
    *LENGTH.  On errors, *LENGTH is undefined, errno preserves the
-   values set by system functions (if any), and NULL is returned. */
+   values set by system functions (if any), and NULL is returned.  */
 char *
 fread_file (FILE * stream, size_t * length)
 {
   char *buf = NULL;
-  size_t alloc = 0;
+  size_t alloc = BUFSIZ;
 
   /* For a regular file, allocate a buffer that has exactly the right
      size.  This avoids the need to do dynamic reallocations later.  */
@@ -59,59 +59,31 @@ fread_file (FILE * stream, size_t * length)
           {
             off_t alloc_off = st.st_size - pos;
 
-            if (SIZE_MAX <= alloc_off)
+            /* '1' below, accounts for the trailing NUL.  */
+            if (SIZE_MAX - 1 < alloc_off)
               {
                 errno = ENOMEM;
                 return NULL;
               }
 
             alloc = alloc_off + 1;
-
-            buf = malloc (alloc);
-            if (!buf)
-              /* errno is ENOMEM.  */
-              return NULL;
           }
       }
   }
 
+  if (!(buf = malloc (alloc)))
+    return NULL; /* errno is ENOMEM.  */
+
   {
     size_t size = 0; /* number of bytes read so far */
     int save_errno;
 
     for (;;)
       {
-        size_t count;
-        size_t requested;
-
-        if (size + BUFSIZ + 1 > alloc)
-          {
-            char *new_buf;
-            size_t new_alloc = alloc + alloc / 2;
-
-            /* Check against overflow.  */
-            if (new_alloc < alloc)
-              {
-                save_errno = ENOMEM;
-                break;
-              }
-
-            alloc = new_alloc;
-            if (alloc < size + BUFSIZ + 1)
-              alloc = size + BUFSIZ + 1;
-
-            new_buf = realloc (buf, alloc);
-            if (!new_buf)
-              {
-                save_errno = errno;
-                break;
-              }
-
-            buf = new_buf;
-          }
-
-        requested = alloc - size - 1;
-        count = fread (buf + size, 1, requested, stream);
+        /* This reads 1 more than the size of a regular file
+           so that we get eof immediately.  */
+        size_t requested = alloc - size;
+        size_t count = fread (buf + size, 1, requested, stream);
         size += count;
 
         if (count != requested)
@@ -121,7 +93,7 @@ fread_file (FILE * stream, size_t * length)
               break;
 
             /* Shrink the allocated memory if possible.  */
-            if (size + 1 < alloc)
+            if (size < alloc - 1)
               {
                 char *smaller_buf = realloc (buf, size + 1);
                 if (smaller_buf != NULL)
@@ -132,6 +104,29 @@ fread_file (FILE * stream, size_t * length)
             *length = size;
             return buf;
           }
+
+        {
+          char *new_buf;
+
+          if (alloc == SIZE_MAX)
+            {
+              save_errno = ENOMEM;
+              break;
+            }
+
+          if (alloc < SIZE_MAX - alloc / 2)
+            alloc = alloc + alloc / 2;
+          else
+            alloc = SIZE_MAX;
+
+          if (!(new_buf = realloc (buf, alloc)))
+            {
+              save_errno = errno;
+              break;
+            }
+
+          buf = new_buf;
+        }
       }
 
     free (buf);
-- 
1.7.3.2

Reply via email to