https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91030

--- Comment #20 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
(In reply to David Edelsohn from comment #18)
> For GPFS, the striping unit is 16M.  The 8K buffer size chosen by GFortran
> is a huge performance sink. We have confirmed this with testing.

Could you share some benchmarks on this?  I'd really like the
gfortran maintainers to be able to form their own judgment on this,
based on numbers.

Here's a benchmark program:

#include <sys/time.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/statvfs.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

double walltime (void)
{
  struct timeval TV;
  double elapsed;
  gettimeofday(&TV, NULL);
  elapsed = (double) TV.tv_sec + 1.0e-6*((double) TV.tv_usec);
  return elapsed;
}

#define NAME "out.dat"
#define N 250000000

int main()
{
  int fd;
  double *p, *w;
  long i, size, blocksize, left, to_write;
  int bits;
  double t1, t2;
  struct statvfs buf;

  printf ("Test using %e doubles\n", N * 1.0);
  statvfs (".", &buf);
  /* f_bsize is an unsigned long, so print it with %lu.  */
  printf ("Block size of file system: %lu\n", buf.f_bsize);

  p = malloc(N * sizeof (*p));
  if (p == NULL)
    {
      perror ("malloc failed");
      exit(1);
    }
  for (i=0; i<N; i++)
    p[i] = i;

  for (bits = 10; bits < 27; bits++)
    {
      sync();
      /* blocksize counts doubles, not bytes.  */
      blocksize = 1 << bits;
      printf("bs = %10ld, ", blocksize);
      unlink (NAME);
      fd = open(NAME, O_WRONLY|O_CREAT, S_IRUSR | S_IWUSR);
      if (fd < 0)
        {
          perror ("Open of " NAME " failed");
          exit(1);
        }
      left = N;
      w = p;
      t1 = walltime();
      while (left > 0)
        {
          if (left >= blocksize)
            to_write = blocksize;
          else
            to_write = left;

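          /* Write only what is left; using blocksize here would read
             past the end of p on the final, partial block.  */
          write (fd, w, to_write * sizeof (double));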
          w += to_write;
          left -= to_write;
        }
      close (fd);
      t2 = walltime ();
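      /* N counts doubles, not bytes, so this figure is in units of
         2**20 doubles per second; multiply by sizeof (double) to get
         MiB/s.  */
      printf ("%.2f MiB/s\n", N / (t2-t1) / 1048576);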
    }
  free (p);
  unlink (NAME);

  return 0;
}

And here is some output on my home system (ext4):

Test using 2.500000e+08 doubles
Block size of file system: 4096
bs =       1024, 175.81 MiB/s
bs =       2048, 244.40 MiB/s
bs =       4096, 247.27 MiB/s
bs =       8192, 227.46 MiB/s
bs =      16384, 195.55 MiB/s
bs =      32768, 223.14 MiB/s
bs =      65536, 168.95 MiB/s
bs =     131072, 240.70 MiB/s
bs =     262144, 260.39 MiB/s
bs =     524288, 265.38 MiB/s
bs =    1048576, 261.67 MiB/s
bs =    2097152, 259.94 MiB/s
bs =    4194304, 258.71 MiB/s
bs =    8388608, 262.19 MiB/s
bs =   16777216, 260.19 MiB/s
bs =   33554432, 263.37 MiB/s
bs =   67108864, 264.47 MiB/s

And here is something on gcc135 (POWER9), also ext4:

Test using 2.500000e+08 doubles
Block size of file system: 4096
bs =       1024, 206.76 MiB/s
bs =       2048, 293.66 MiB/s
bs =       4096, 347.13 MiB/s
bs =       8192, 298.23 MiB/s
bs =      16384, 397.51 MiB/s
bs =      32768, 401.86 MiB/s
bs =      65536, 431.83 MiB/s
bs =     131072, 475.88 MiB/s
bs =     262144, 470.09 MiB/s
bs =     524288, 478.84 MiB/s
bs =    1048576, 485.68 MiB/s
bs =    2097152, 485.33 MiB/s
bs =    4194304, 483.96 MiB/s
bs =    8388608, 482.88 MiB/s
bs =   16777216, 485.04 MiB/s
bs =   33554432, 483.92 MiB/s
bs =   67108864, 485.55 MiB/s

So, write throughput seems to level out at a block size of around
131072 (2**17).

For Fortran, this is only really relevant for unformatted files.
