https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91030
--- Comment #20 from Thomas Koenig <tkoenig at gcc dot gnu.org> ---
(In reply to David Edelsohn from comment #18)
> For GPFS, the striping unit is 16M. The 8K buffer size chosen by GFortran
> is a huge performance sink. We have confirmed this with testing.

Could you share some benchmarks on this? I'd really like it if the
gfortran maintainers could form their own judgment on this, based
on numbers.

Here's a benchmark program:

#include <sys/time.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/statvfs.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Wall-clock time in seconds.  */
double walltime (void)
{
  struct timeval TV;
  double elapsed;
  gettimeofday(&TV, NULL);
  elapsed = (double) TV.tv_sec + 1.0e-6*((double) TV.tv_usec);
  return elapsed;
}

#define NAME "out.dat"
#define N 250000000

int main()
{
  int fd;
  double *p, *w;
  long i, blocksize, left, to_write;  /* All counted in doubles.  */
  int bits;
  double t1, t2;
  struct statvfs buf;

  printf ("Test using %e doubles\n", N * 1.0);
  statvfs (".", &buf);
  printf ("Block size of file system: %ld\n", (long) buf.f_bsize);

  p = malloc(N * sizeof (*p));
  if (p == NULL)
    {
      perror ("malloc failed");
      exit(1);
    }
  for (i=0; i<N; i++)
    p[i] = i;

  /* Try block sizes from 2**10 to 2**26 doubles per write().  */
  for (bits = 10; bits < 27; bits++)
    {
      sync();
      blocksize = 1L << bits;
      printf("bs = %10ld, ", blocksize);
      unlink (NAME);
      fd = open(NAME, O_WRONLY|O_CREAT, S_IRUSR | S_IWUSR);
      if (fd < 0)
        {
          perror ("Open of " NAME " failed");
          exit(1);
        }
      left = N;
      w = p;
      t1 = walltime();
      while (left > 0)
        {
          if (left >= blocksize)
            to_write = blocksize;
          else
            to_write = left;

          /* Write only what is left, so the last call does not read
             past the end of p.  */
          if (write (fd, w, to_write * sizeof (double)) < 0)
            {
              perror ("Write to " NAME " failed");
              exit(1);
            }
          w += to_write;
          left -= to_write;
        }
      close (fd);
      t2 = walltime ();
      /* N counts doubles, not bytes, so this figure is in units of
         2**20 doubles per second; multiply by sizeof (double) to get
         MiB/s.  */
      printf ("%.2f MiB/s\n", N / (t2-t1) / 1048576);
    }
  free (p);
  unlink (NAME);

  return 0;
}

And here is some output on my home system (ext4):

Test using 2.500000e+08 doubles
Block size of file system: 4096
bs =       1024, 175.81 MiB/s
bs =       2048, 244.40 MiB/s
bs =       4096, 247.27 MiB/s
bs =       8192, 227.46 MiB/s
bs =      16384, 195.55 MiB/s
bs =      32768, 223.14 MiB/s
bs =      65536, 168.95 MiB/s
bs =     131072, 240.70 MiB/s
bs =     262144, 260.39 MiB/s
bs =     524288, 265.38 MiB/s
bs =    1048576, 261.67 MiB/s
bs =    2097152, 259.94 MiB/s
bs =    4194304, 258.71 MiB/s
bs =    8388608, 262.19 MiB/s
bs =   16777216, 260.19 MiB/s
bs =   33554432, 263.37 MiB/s
bs =   67108864, 264.47 MiB/s

And here is something on gcc135 (POWER9), also ext4:

Test using 2.500000e+08 doubles
Block size of file system: 4096
bs =       1024, 206.76 MiB/s
bs =       2048, 293.66 MiB/s
bs =       4096, 347.13 MiB/s
bs =       8192, 298.23 MiB/s
bs =      16384, 397.51 MiB/s
bs =      32768, 401.86 MiB/s
bs =      65536, 431.83 MiB/s
bs =     131072, 475.88 MiB/s
bs =     262144, 470.09 MiB/s
bs =     524288, 478.84 MiB/s
bs =    1048576, 485.68 MiB/s
bs =    2097152, 485.33 MiB/s
bs =    4194304, 483.96 MiB/s
bs =    8388608, 482.88 MiB/s
bs =   16777216, 485.04 MiB/s
bs =   33554432, 483.92 MiB/s
bs =   67108864, 485.55 MiB/s

So, write throughput sort of seems to level out at a block size of
about 131072 (2**17) doubles, which is 1 MiB per write() call with
8-byte doubles.

For Fortran, this is only really relevant for unformatted files.
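
The bs values above count doubles (the program's blocksize variable), not bytes, and the printed rate is computed from N, which also counts doubles. To compare them with the 8K buffer size from the quoted comment, they have to be converted to bytes. Here is a minimal sketch of that conversion. The helper names bytes_per_write and mib_per_second are made up for illustration and are not part of the benchmark, and the numbers in the note below assume an 8-byte double.

```c
#include <stddef.h>

/* Hypothetical helper: bytes handed to one write() call when the block
   holds 2**bits doubles, as in the benchmark's loop over `bits`.  */
size_t bytes_per_write (int bits)
{
  return ((size_t) 1 << bits) * sizeof (double);
}

/* Hypothetical helper: throughput in MiB/s when n_doubles doubles were
   written in `seconds` seconds.  */
double mib_per_second (long n_doubles, double seconds)
{
  return (double) n_doubles * sizeof (double) / seconds / 1048576.0;
}
```

With 8-byte doubles, bytes_per_write (10) gives 8192, so the smallest block tested is the same size as the 8K buffer, and bytes_per_write (17) gives 1048576, i.e. 1 MiB.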