Instead of automatically allocating the temporary array on the heap, the compiler should first perform a few checks: whether a temporary array is actually required, whether the memory could be reserved on the stack instead, etc.
The code produced by GCC for the following subroutine does the following:

i. Start the loop.
ii. Malloc the memory required to hold Ry(:,n) * Rx(:).
iii. Compute Ry(:,n) * Rx(:) and store the result in the malloced memory.
iv. Copy the result from the malloced memory to Ry(:,n).
v. Free the malloced memory.
vi. Go to the loop start.

This is very inefficient: a heap allocation and deallocation is performed on every iteration of the loop.

    subroutine malloc_test(Ry, Rx, ny)
      implicit none
      integer(kind=kind(1)), intent(in) :: ny
      real(kind=kind(1.0d0)), dimension(:,:), pointer :: Ry
      real(kind=kind(1.0d0)), dimension(:), pointer :: Rx
      integer(kind=kind(1)) :: n
      do n = 1,ny
        Ry(:,n) = Ry(:,n) * Rx(:)
      end do
    end subroutine malloc_test

Other relevant information:

1. Compile flags: -O3 -ffast-math -m64 -march=amdfam10
2. gfortran version (output of gfortran -v):
   Using built-in specs.
   Target: x86_64-unknown-linux-gnu
   Configured with: /tmp/src/gcc-4.3.0/configure --prefix=/opt/amd/gcc-4.3.0 --enable-languages=c,c++,fortran --enable-stage1-checking --with-as=/opt/amd/gcc-4.3.0/bin/as --with-ld=/opt/amd/gcc-4.3.0/bin/ld --with-mpfr=/tmp/install/mpfr-2.3.0 --with-gmp=/tmp/install/gmp-4.2.2
   Thread model: posix
   gcc version 4.3.1 20080312 (prerelease) (GCC)
3. model name: AMD Phenom(tm) 8650 Triple-Core Processor
4. flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow constant_tsc pni cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy altmovcr8 abm sse4a misalignsse 3dnowprefetch osvw

--
Summary: Fortran: Minimize heap allocation of temporary arrays.
Product: gcc
Version: 4.3.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: fortran
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: rajiv dot adhikary at amd dot com

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36842
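As a possible user-side workaround for the loop in malloc_test, the computation can be written so that no temporary is semantically required. The sketch below is an illustration only: the module and routine names are hypothetical and not part of the original report. It shows two variants: an explicit inner DO loop over the pointer arrays, which is evaluated element by element and so needs no temporary, and a version with non-pointer assumed-shape dummy arguments, for which the Fortran standard forbids the caller from passing overlapping Ry and Rx (since Ry is modified), so the compiler may be able to omit the temporary. Whether gfortran 4.3.1 actually avoids the malloc/free in either variant has not been verified here.

```fortran
! Hypothetical workaround sketch; names are illustrative only.
module malloc_test_workaround
  implicit none
  private
  public :: dp, scale_columns_loop, scale_columns_noptr
  integer, parameter :: dp = kind(1.0d0)
contains
  ! Same computation as malloc_test, written with an explicit inner DO loop.
  ! A DO loop has sequential element-by-element semantics, so the compiler
  ! has no need to buffer the right-hand side in a temporary array.
  subroutine scale_columns_loop(Ry, Rx, ny)
    real(dp), dimension(:,:), pointer :: Ry
    real(dp), dimension(:),   pointer :: Rx
    integer, intent(in) :: ny
    integer :: i, n, oy, ox
    ! Pointer arrays keep the bounds of their target, so index relative
    ! to the actual lower bounds, as the section Ry(:,n) * Rx(:) does.
    oy = lbound(Ry, 1) - 1
    ox = lbound(Rx, 1) - 1
    do n = 1, ny
      do i = 1, size(Rx)
        Ry(oy + i, n) = Ry(oy + i, n) * Rx(ox + i)
      end do
    end do
  end subroutine scale_columns_loop

  ! Same array assignment as malloc_test, but with non-pointer assumed-shape
  ! dummies. Ry and Rx may not overlap here, so a temporary is not needed
  ! (whether the compiler exploits this is an assumption, not verified).
  subroutine scale_columns_noptr(Ry, Rx, ny)
    real(dp), dimension(:,:), intent(inout) :: Ry
    real(dp), dimension(:),   intent(in)    :: Rx
    integer, intent(in) :: ny
    integer :: n
    do n = 1, ny
      Ry(:,n) = Ry(:,n) * Rx
    end do
  end subroutine scale_columns_noptr
end module malloc_test_workaround
```

Note that the explicit loop is not equivalent to the array assignment if Rx points into Ry with an offset or stride: the array assignment evaluates the whole right-hand side before storing, whereas the DO loop may read values it has already overwritten. That possible overlap between the pointer arguments is the likely reason the compiler generates a temporary for the original subroutine.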