Re: [fpc-pascal] FPC Graphics options?

Nikolay Nikolov Mon, 22 May 2017 17:05:00 -0700


On 05/23/2017 01:20 AM, nore...@z505.com wrote:

> On 2017-05-18 19:54, Ryan Joseph wrote:
>
>> On May 18, 2017, at 10:40 PM, Jon Foster <jon-li...@jfpossibilities.com> wrote:
>>> 62.44      1.33     1.33 fpc_frac_real
>>> 26.76      1.90     0.57 MATH_$$_FLOOR$EXTENDED$$LONGINT
>>> 10.33      2.12     0.22 FPC_DIV_INT64
>>
>> Thanks for profiling this.
>>
>> Floor is there as I expected and 26% is pretty extreme but the others
>> are floating point division? How does Java handle this so much better
>> than FPC and what are the work arounds? Just curious. As it stands I
>> can only reason that I need to avoid dividing floats in FPC like the
>> plague.
>
> Isn't java just a wrapper around C?

No. Java compilers generate code for a virtual machine, called the JVM (Java Virtual Machine). They do not generate code for x86 CPUs or any other real CPU. The JVM is like a fictional CPU that, for all practical purposes, does not exist as a silicon implementation, but is implemented in software only.


C compilers usually generate native code for real CPUs, just like FPC does.

Why does it matter? The x86 instruction set architecture has gone through quite a long evolution, and many instruction set extensions were added along the way: 32-bit extensions (x86 originally started as 16-bit), the x87 FPU instructions (this was a separate coprocessor in the beginning, but became integrated into the main CPU from the 486DX onwards), MMX, SSE, SSE2, the 64-bit extensions (x86_64), SSE3, AVX, etc.


There are generally two ways to do floating point on the x86:

- the x87 FPU - this is used by default by the FPC compiler on 32-bit (and 16-bit) x86
- the SSE2 instruction set extension - this can replace the FPU and generally works faster on modern CPUs. This is used by default by the 64-bit FPC compiler. That's because all 64-bit x86 CPUs support this extension.

There is one disadvantage to using SSE2 instead of the x87 FPU - the SSE2 instructions don't support the 80-bit extended precision float type. There's no support for it in any of the later x86 instruction set extensions either. If you need the 80-bit precision, the x87 FPU is the only way to go, even on x86_64.

There's another disadvantage to using SSE2 by default on 32-bit x86 - programs compiled for SSE2 will not run on older CPUs that don't support SSE2. There's simply no way around that. Therefore, we cannot make use of SSE2 by default without sacrificing backwards compatibility. The only exceptions to that are certain RTL routines, like Move() or FillChar(), which take advantage of the SSE2 extensions because they check the CPU capabilities at runtime and internally dispatch to several different implementations for different CPU types, which are all compiled and linked in. But you simply cannot take this approach for every FPU operation, because if you do a CPU check on every floating point calculation, the overhead of all the checks will make your program slower than it would be if you simply used the x87 FPU instructions, for example.

Virtual machines like the JVM don't have this problem, and they can always take advantage of newer instruction set extensions without sacrificing backward compatibility with older CPUs. Why? Because the JVM bytecode has nothing to do with any processor at all. When you run your program, the JVM bytecode is converted ("Just-In-Time" compiled) to native code for the CPU the user has. So, if the user is running your Java program on a CPU that has SSE3, the JIT compiler will know it can use SSE2 and SSE3 instructions. If another person runs it on an older CPU, which doesn't have SSE2, the JIT compiler will compile it to use x87 FPU instructions.

Sounds great, so you're going to ask whether there are any disadvantages to this approach? And, of course, there are. Since the program is essentially recompiled every time the user runs it, starting Java programs takes a long time. There's also limited time that the JIT compiler can spend on optimization (otherwise programs will start even slower). There are ways to combat that by using some sort of cache (.NET has the global assembly cache), but they are far from perfect either: these caches eat a lot of disk space, and then either program installation or the first run (when the JIT compiled assembly hasn't been added to the cache yet) becomes slow. In general, native programs (FPC and C programs) feel a lot snappier to most users, because they start fast. But in the highly specific case of heavy floating point code (where SSE2 vs x87 FPU instruction sets matter), a native program (C or Pascal) compiled for the x87 FPU will typically be slower than the JVM, because the JVM will use SSE2 and SSE3 on modern CPUs.

Does this mean that it's always better to use the JVM? No. I mean, if it suits you, go ahead and use it; there's nothing wrong with it (even FPC supports it as a target: http://wiki.freepascal.org/FPC_JVM ). But there are a lot of options for using native code as well:

- if SSE2 and SSE3 make a huge performance difference for your program, and you don't need to support old CPUs (e.g. your users are happy about it, or your program would be too slow to be usable on these CPUs anyway, since you need a lot of CPU performance), then enable {$fputype sse3} and probably recompile the RTL with it, to take full advantage of it.
- if SSE2 and SSE3 (or AVX or whatever new extension) make a huge performance difference, but old CPU support is still valuable for your users, then compile and provide two .exe files - one for old CPUs and one for new ones.
- if SSE2 and SSE3 don't make a difference, then you're not writing floating point heavy code and you're happy with the default settings :) The compatibility with older CPUs is only a bonus in this case and isn't hurting your performance on new CPUs.

And, of course, it is easy to give examples where a Java program would be a lot slower than an FPC program. I know comparing different IDEs is a bit of an apples-to-oranges comparison (because they may have different features and vastly different implementation details), but compare the speed of e.g. Lazarus to any IDE written in Java, even the fastest one. :)

Anyhow, enough ranting already. Just remember the golden rule of optimization: never assume.

Always measure and try to understand why something is slow. In 99% of the cases it's not what people initially think.


Nikolay
_______________________________________________
fpc-pascal maillist  -  fpc-pascal@lists.freepascal.org
http://lists.freepascal.org/cgi-bin/mailman/listinfo/fpc-pascal
