On 05/23/2017 01:20 AM, nore...@z505.com wrote:
On 2017-05-18 19:54, Ryan Joseph wrote:
On May 18, 2017, at 10:40 PM, Jon Foster <jon-li...@jfpossibilities.com> wrote:

62.44      1.33     1.33 fpc_frac_real
26.76      1.90     0.57 MATH_$$_FLOOR$EXTENDED$$LONGINT
10.33      2.12     0.22 FPC_DIV_INT64

Thanks for profiling this.

Floor is there as I expected and 26% is pretty extreme but the others
are floating point division? How does Java handle this so much better
than FPC and what are the work arounds? Just curious. As it stands I
can only reason that I need to avoid dividing floats in FPC like the
plague.


Isn't java just a wrapper around C?

No. Java compilers generate code for a virtual machine, called the JVM (Java Virtual Machine). They do not generate code for x86 CPUs or any other real CPU. The JVM is like a fictional CPU that does not exist in silicon anywhere, but is implemented in software only.

C compilers usually generate native code for real CPUs, just like FPC does.

Why does it matter? The x86 instruction set architecture has gone through quite a long evolution and many instruction set extensions were added along the way: 32-bit extensions (x86 originally started as 16-bit), the x87 FPU instructions (this was a separate coprocessor in the beginning, but became integrated into the main CPU from the 486DX onwards), MMX, SSE, SSE2, the 64-bit extensions (x86_64), SSE3, AVX, etc.

There are generally two ways to do floating point on the x86:
- the x87 FPU - this is used by default by the FPC compiler on 32-bit (and 16-bit) x86
- the SSE2 instruction set extension - this can replace the FPU and generally works faster on modern CPUs. This is used by default by the 64-bit FPC compiler. That's because all 64-bit x86 CPUs support this extension.

There is one disadvantage to using SSE2 instead of the x87 FPU - the SSE2 instructions don't support the 80-bit extended precision float type. There's no support for it in any of the later x86 instruction set extensions either. If you need the 80-bit precision, the x87 FPU is the only way to go, even on x86_64.
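
As a rough illustration of that precision difference, the sketch below adds 1e-18 to 1.0 in a Double and in an Extended. It assumes a target where Extended really is the 80-bit x87 type (i386, for example); on targets where FPC maps Extended to Double (Win64, as far as I know) both lines should report the same thing. The program and variable names are made up for the example.

```pascal
program ExtendedSketch;
{$mode objfpc}

var
  D: Double;
  E: Extended;
  Tiny: Extended;
begin
  { The sizes differ between targets, so print them rather than assume. }
  WriteLn('SizeOf(Double)   = ', SizeOf(Double));
  WriteLn('SizeOf(Extended) = ', SizeOf(Extended));

  Tiny := 1e-18;

  { A Double has a 53-bit mantissa: 1e-18 is far below its resolution
    near 1.0, so the stored result rounds back to exactly 1.0. }
  D := 1.0;
  D := D + Tiny;

  { An 80-bit Extended has a 64-bit mantissa, which is enough to keep
    a change of 1e-18 near 1.0 (only where Extended is really 80-bit). }
  E := 1.0;
  E := E + Tiny;

  WriteLn('Double   keeps the 1e-18: ', D <> 1.0);
  WriteLn('Extended keeps the 1e-18: ', E <> 1.0);
end.
```

If the second line matters for your results, that is a sign you depend on the x87 FPU and should not switch the FPU type blindly.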

There's another disadvantage to using SSE2 by default on 32-bit x86 - programs compiled for SSE2 will not run on older CPUs, which don't support SSE2. There's simply no way around that. Therefore, we cannot make use of SSE2 by default without sacrificing backwards compatibility. The only exceptions are certain RTL routines, like Move() or FillChar(), which take advantage of the SSE2 extensions, because they check the CPU capabilities at runtime and internally dispatch to several different implementations, for different CPU types, which are all compiled and linked in. But you simply cannot take this approach for every FPU operation, because if you do a CPU check on every floating point calculation, the overhead of all the checks will make your program slower than it would be if you simply used the x87 FPU instructions.
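
To make the dispatch idea concrete, here is a minimal sketch. It is not the actual RTL code: TFillProc, FillGeneric, FillWide, HasSSE2 and FillImpl are hypothetical names, FillWide is only a stand-in for a hand-tuned SSE2 routine, and HasSSE2 is a stub where a real implementation would query CPUID. The point is only that the check is done once and every later call goes through a procedure variable.

```pascal
program DispatchSketch;
{$mode objfpc}

type
  TFillProc = procedure(var Buf; Count: SizeInt; Value: Byte);

{ Plain byte-by-byte loop: works on every CPU. }
procedure FillGeneric(var Buf; Count: SizeInt; Value: Byte);
var
  P: PByte;
  I: SizeInt;
begin
  P := @Buf;
  for I := 0 to Count - 1 do
    P[I] := Value;
end;

{ Stand-in for an SSE2-optimized version. For the sketch it simply
  forwards to the RTL's FillChar. }
procedure FillWide(var Buf; Count: SizeInt; Value: Byte);
begin
  FillChar(Buf, Count, Value);
end;

{ Hypothetical capability check (an assumption, not an RTL function).
  A real one would execute CPUID at runtime; this stub only relies on
  the fact that every x86_64 CPU has SSE2. }
function HasSSE2: Boolean;
begin
  {$ifdef CPUX86_64}
  Result := True;
  {$else}
  Result := False;
  {$endif}
end;

var
  FillImpl: TFillProc;
  Data: array[0..15] of Byte;

begin
  { The capability check runs once, at startup... }
  if HasSSE2 then
    FillImpl := @FillWide
  else
    FillImpl := @FillGeneric;

  { ...and every call afterwards is just an indirect call. }
  FillImpl(Data, SizeOf(Data), $AA);
  WriteLn(Data[0], ' ', Data[15]);
end.
```

An indirect call is cheap next to filling a large buffer, but it would be expensive next to a single floating point addition, which is why this only makes sense for routines that do a lot of work per call.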

Virtual machines like the JVM don't have this problem and they can always take advantage of newer instruction set extensions, without sacrificing backward compatibility with older CPUs. Why? Because the JVM bytecode has nothing to do with any processor at all. When you run your program, the JVM bytecode is converted ("Just-In-Time" compiled) to native code for the CPU the user has. So, if the user is running your Java program on a CPU that has SSE3, the JIT compiler will know it can use SSE2 and SSE3 instructions. If another person runs it on an older CPU, which doesn't have SSE2, the JIT compiler will compile it to use x87 FPU instructions.

This sounds so great that you're going to ask if there are any disadvantages to this approach. And, of course, there are - since the program is essentially recompiled every time the user runs it, starting Java programs takes a long time. There's also limited time that the JIT compiler can spend on optimization (otherwise programs will start even slower). There are ways to combat that, by using some sort of cache (.NET has the global assembly cache), but they are far from perfect either - these caches eat a lot of disk space and then either program installation or the first run (when the JIT compiled assembly hasn't been added to the cache yet) becomes slow.

In general native programs (FPC and C programs) feel a lot snappier to most users, because they start fast. But in the highly specific case of heavy floating point code (where SSE2 vs x87 FPU instruction sets matter), a native program (C or Pascal) compiled for the x87 FPU will be slower than the JVM, because the JVM will use SSE2 and SSE3 on modern CPUs.

Does this mean that it's always better to use the JVM? No. I mean, if it suits you, go ahead and use it, there's nothing wrong with it (even FPC supports it as a target: http://wiki.freepascal.org/FPC_JVM ), but there are a lot of options for using native code as well:
- if SSE2 and SSE3 make a huge performance difference for your program, and you don't need to support old CPUs (e.g. your users are happy about it or your program would be too slow to be usable on these CPUs anyway, since you need a lot of CPU performance), then enable {$fputype sse3} and probably recompile the RTL with it, to take full advantage of it.
- if SSE2 and SSE3 (or AVX or whatever new extension) make a huge performance difference, but old CPU support is still valuable for your users, then compile and provide two .exe files - one for old CPUs and one for new ones.
- if SSE2 and SSE3 don't make a difference, then you're not writing floating point heavy code and you're happy with the default settings :) The compatibility with older CPUs is only a bonus in this case and isn't hurting your performance on new CPUs.
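
The first two options could look roughly like the sketch below. The program name, the Average function and the file names in the comment are placeholders; the {$fputype} directive and the -Cf command line switch are the real compiler features. The directive is wrapped in an ifdef here on the assumption that you only want to override the default on 32-bit x86.

```pascal
program FpuTypeSketch;
{$mode objfpc}

{ Option 1: ask for SSE3 floating point code on 32-bit x86.
  On x86_64 the compiler already uses SSE by default. }
{$ifdef CPUI386}
  {$fputype sse3}
{$endif}

{ Option 2: leave the source alone and build two executables with
  different -Cf settings (file names are placeholders):
    fpc -CfX87  -oprog_compat prog.pas
    fpc -CfSSE3 -oprog_sse3   prog.pas }

{ Some floating point work, so the setting has code to act on. }
function Average(const Values: array of Double): Double;
var
  I: Integer;
  Sum: Double;
begin
  Sum := 0.0;
  for I := Low(Values) to High(Values) do
    Sum := Sum + Values[I];
  if Length(Values) > 0 then
    Result := Sum / Length(Values)
  else
    Result := 0.0;
end;

begin
  WriteLn(Average([1.0, 2.0, 3.0, 4.0]):0:2);
end.
```

Note that this only affects the code you compile yourself. Floating point routines inside the RTL keep using whatever the RTL was built with, which is why recompiling the RTL is suggested above.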

And, of course, it is easy to give examples where a Java program would be a lot slower than an FPC program. I know comparing different IDEs is a bit of an apples-to-oranges comparison (because they may have different features and vastly different implementation details), but compare the speed of e.g. Lazarus to any IDE written in Java, even the fastest one. :)

Anyhow, enough ranting, already. Just remember the golden rule of optimization: never assume.

Always measure and try to understand why something is slow. In 99% of the cases it's not what people initially think.
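
A minimal sketch of what "measure" can mean in practice, using the Floor call from the profile above as the suspect. SumWithFloor, SumWithTrunc and the iteration count are made up for the example, the Trunc variant is only equivalent because the inputs here are never negative, and GetTickCount64 has a coarse resolution on some platforms, so treat the numbers as a rough indication only.

```pascal
program MeasureSketch;
{$mode objfpc}

uses
  SysUtils, Math;

const
  Iterations = 10000000;

{ Variant 1: the Math.Floor call that showed up in the profile. }
function SumWithFloor(N: Integer): Int64;
var
  I: Integer;
  X: Double;
begin
  Result := 0;
  X := 0.5;
  for I := 1 to N do
  begin
    Result := Result + Floor(X);
    X := X + 0.25;
  end;
end;

{ Variant 2: Trunc, which gives the same answer as Floor
  only for non-negative inputs. }
function SumWithTrunc(N: Integer): Int64;
var
  I: Integer;
  X: Double;
begin
  Result := 0;
  X := 0.5;
  for I := 1 to N do
  begin
    Result := Result + Trunc(X);
    X := X + 0.25;
  end;
end;

var
  T0: QWord;
  R: Int64;

begin
  T0 := GetTickCount64;
  R := SumWithFloor(Iterations);
  WriteLn('Floor: ', GetTickCount64 - T0, ' ms, checksum ', R);

  T0 := GetTickCount64;
  R := SumWithTrunc(Iterations);
  WriteLn('Trunc: ', GetTickCount64 - T0, ' ms, checksum ', R);
end.
```

Printing the checksum keeps the result in use and lets you confirm that both variants compute the same thing. Run it with the compiler settings you actually ship with, because the FPU type and optimization level can change the outcome.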

Nikolay
_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
