https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116128
--- Comment #5 from mjr19 at cam dot ac.uk ---
I think that, in general, using partial sums improves accuracy. If one assumes that all of the data have the same sign and similar magnitude, then by the time the sum is nearly complete one is adding a single element to the sum of a large number of elements, and adding a small number to a big number causes precision loss. For accuracy, the ideal might be to add the elements in pairs to form an array of half the original size, and then recurse: the items being added are then always likely to be of similar size. But this would be very expensive.

  real :: a(HUGE)
  a = 1
  t = 0
  do i = 1, HUGE
     t = t + a(i)
  enddo

will, for large values of HUGE, end with t=16777216, as 16777216+1=16777216 in single precision. A sum function using sixteen partial sums might keep going until about 268435456, which is 16x better. The recursive-pairs method would keep going until it eventually overflowed the real datatype, which is very much better again. (Rough sketches of both approaches follow at the end of this comment.) However, if the data are such that the sum is expected to be approximately zero, then all of these methods are equally good. A lot depends on what one thinks the values in a typical array to be summed would look like.

As for using a library function, there might be a degree of overhead in the call, and then in deciding which instruction set to use, whereas the inline compiled code may have fewer instruction sets to choose between. For the product code one would want SSE, AVX, AVX+FMA and AVX-512 paths in a library before one worries about detailed tuning for specific CPUs. I tend to be biased towards inline code for simple things, although I admit that there are good arguments on both sides. Inline code also wins if information about the array size is known at compile time, e.g. so that one can skip the checks for very small arrays.
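
To make the partial-sum point concrete, here is a minimal sketch (not gfortran's actual generated code; the array of ones and the explicit sixteen accumulators are just the example from above) showing the naive running sum stalling at 2**24 while sixteen partial sums carry on to the exact 2**25:

program partial_sum_demo
  implicit none
  integer, parameter :: n = 2**25          ! > 2**24, so the naive loop saturates
  real, allocatable :: a(:)
  real :: s(16), t_naive, t_partial
  integer :: i, j

  allocate(a(n))
  a = 1.0

  ! naive running sum: stops increasing once it reaches 2**24 = 16777216
  t_naive = 0.0
  do i = 1, n
     t_naive = t_naive + a(i)
  end do

  ! sixteen independent partial sums, combined at the end,
  ! roughly what a 16-lane vectorised reduction does
  s = 0.0
  do i = 1, n, 16
     do j = 0, 15
        s(j+1) = s(j+1) + a(i+j)
     end do
  end do
  t_partial = sum(s)

  print *, 'exact:', real(n), ' naive:', t_naive, ' partial sums:', t_partial
end program partial_sum_demo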
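
And a sketch of the recursive-pairs idea, adding neighbouring elements to form a half-size array and recursing; the function name and the handling of an odd-length tail are mine, and the temporaries allocated at each level (about n extra reals in total) are exactly the expense mentioned above:

program pairwise_demo
  implicit none
  integer, parameter :: n = 2**25
  real, allocatable :: a(:)

  allocate(a(n))
  a = 1.0
  ! prints 33554432 exactly, where the naive loop would give 16777216
  print *, 'pairwise sum:', pairwise_sum(a), ' exact:', real(n)

contains

  recursive function pairwise_sum(x) result(s)
    real, intent(in) :: x(:)
    real :: s
    real, allocatable :: half(:)
    integer :: m, i

    m = size(x)
    if (m == 1) then
       s = x(1)
       return
    end if
    allocate(half((m+1)/2))
    do i = 1, m/2
       half(i) = x(2*i-1) + x(2*i)            ! add neighbouring pairs
    end do
    if (mod(m,2) == 1) half((m+1)/2) = x(m)   ! carry an odd last element through
    s = pairwise_sum(half)                    ! recurse on the half-size array
  end function pairwise_sum

end program pairwise_demo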