Raw Magick DOT COM wrote:
> Hi All,
>
> My name is Peter Dove, I am new to FPC and Lazarus. I come from a
> mainly Delphi background but I use C, C++ and assembler as needed to
> improve performance on the imagining app we are working on.
>
> Like Delphi, FPC has a poor floating point optimisation situation in
> comparison to similar compiles in C. For instance the following code
> in Pascal
>
> A := 0;
> B := 0.9;
> For X := 0 to 10000000 do
> begin
>   A := A + X;
>   A := A * B;
> end;
>
> Takes some 220ms to perform. The major problem with the performance is
> the poor loop optimisation and register usage, also with wasted push
> and pulls from memory. Below is the result from the assembler output
> from FPC - all optimisations were enabled..

The problem with such optimizations is that the compiler usually knows
too little about a program, so they apply too seldom and aren't worth
the effort of implementing. E.g. your assembler assumes that a:=a*b
can't throw an exception, but it can, and the compiler isn't allowed to
assume that it doesn't.

>
> # Var A located at ebp-4
> # Var B located at ebp-8
> # Var X located at ebp-12
>
> //A + B are set up before here - it's the loop that's interesting
>
> # [44] For X := 0 to 10000000 do
>     movl $0,-12(%ebp)
>     decl -12(%ebp)
>     .balign 4
> .L31:
>     incl -12(%ebp)
> # [46] A := A + X;
>     flds -4(%ebp)
>     fildl -12(%ebp)
>     faddp %st,%st(1)
>     fstps -4(%ebp)
> # [47] A := A * B;
>     flds -8(%ebp)
>     fmuls -4(%ebp)
>     fstps -4(%ebp)
>     cmpl $10000000,-12(%ebp)
>     jl .L31
>
> My comments on this are that
>
> a) The loop counter is basically a comparison against a memory area =
> slow

Well, the counter has to be written to memory as well, so this
shouldn't count for much.

> b) There are some unnecessary loads from memory occurring = slow
>
> The above code takes about 210ms to perform on my machine. Below is my
> own assembler which takes about 100ms ( apologies it is in a slightly
> different format )
>
> asm
>   mov eax, 0; //Set up loop counter
> @StartOfLoop:
>   mov dword ptr[x], eax; // Move its value into X ( on stack )
>   FILD dword ptr[x]; //Load into floating point
>   FADD dword ptr[A]; // Add A ( on Stack ) to it
>   FMUL dword ptr[B]; //Multiply by B ( on Stack )
>   FSTP dword ptr[A]; // Pop into A
>   add eax, 1; //Inc loop counter
>   cmp eax, 10000000; // Test Jump condition
>   jl @StartOfLoop;
> end;
>
> My question is, what needs to be done to the compiler to make it

The compiler needs a proper lifetime analysis of expressions.

> optimise as well as C compilers,

See above; this is often not possible in Pascal.

> or perhaps I am missing some compiler
> switches.

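To illustrate the point above that a:=a*b can throw an exception, here
is a minimal sketch. The helper MulRaises and the test values are
hypothetical, picked only so that the product no longer fits in a
Single. Whether the exception is actually raised, and at which
instruction, depends on the target CPU, on the FPU exception mask in
effect, and on the compiler version, so treat this as an experiment to
try rather than a statement about what FPC does on your machine.

```pascal
program MulException;
{$mode objfpc}

uses
  SysUtils;

{ Hypothetical helper: multiplies a by b, stores the result in product,
  and reports whether the operation raised a floating-point exception. }
function MulRaises(a, b: Single; out product: Single): Boolean;
begin
  Result := False;
  product := 0;
  try
    { The store into a Single is what can overflow here. }
    product := a * b;
  except
    { EMathError is the common ancestor of EOverflow, EInvalidOp etc. }
    on EMathError do
      Result := True;
  end;
end;

var
  p: Single;
begin
  { 2 * 3 is exactly representable, so nothing should be raised. }
  WriteLn('2 * 3 raises: ', MulRaises(2.0, 3.0, p));
  { 3e38 * 10 exceeds the range of Single (max is about 3.4e38). }
  WriteLn('3e38 * 10 raises: ', MulRaises(3.0e38, 10.0, p));
end.
```

If the second call reports an exception on your setup, that is exactly
the side effect the compiler has to preserve: it can't freely reorder
or merge the multiplication and the store to A the way the hand-written
assembler does.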
_______________________________________________ fpc-pascal maillist - fpc-pascal@lists.freepascal.org http://lists.freepascal.org/mailman/listinfo/fpc-pascal