To be fair, I was running on a 1 GHz Cortex-A8 that only had VFPv3
(https://en.wikipedia.org/wiki/CHIP_(computer)#CHIP). I fully expect that
you're going to need to spend some time going through all the different
compiler flags (-Ofast, -march, -mcpu, -mtune, -ffast-math,
-fthis-that-and-the-other)
In my experience VOLK runs about 10% faster on Raspberry Pi 4 with
-ffast-math; depending on the flowgraph you might get more. Three
kernels had some minor problems with precision, but I think they can
be fixed so all will pass... The fun thing about these embedded
platforms is that the possibiliti
Yep, on some processors with some workloads it really makes a huge
difference. As others have commented, there may be significant trade-offs
with regard to numerical behavior.
Some of my observations:
dump1090 loses accuracy below the 7th digit when calculating precision, but
consumes 10% less pow
Interesting! I'm experimenting with this right now. NEON on armv7 is
not always IEEE754 compliant by the way.
--Albin
On Mon, Sep 2, 2019 at 5:28 PM Müller, Marcus (CEL) wrote:
>
> -ffast-math is allowed to reorder calculations, e.g. it can do
> a*f + b*f + c*f as (a+b+c)*f (or vice versa)
>
-ffast-math is allowed to reorder calculations, e.g. it can do
a*f + b*f + c*f as (a+b+c)*f (or vice versa)
which of course isn't necessarily the same result, and can lead to
numerical instability.
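
As a rough illustration of the reordering point, here is a minimal C sketch.
The function names (`sum_distributed`, `sum_factored`, `show_reassociation`)
and the input values are made up for this example; the values are chosen so
that a*f overflows single precision, which makes the difference obvious. It
assumes the file is built without -ffast-math, so each expression is
evaluated as written.

```c
#include <math.h>
#include <stdio.h>

/* Evaluates a*f + b*f + c*f in the order it is written. */
float sum_distributed(float a, float b, float c, float f)
{
    return a * f + b * f + c * f;
}

/* Evaluates the algebraically equivalent factored form (a + b + c) * f. */
float sum_factored(float a, float b, float c, float f)
{
    return (a + b + c) * f;
}

/* Prints both forms for inputs where a*f and b*f exceed the float range
 * but a + b cancels exactly. The inputs are illustrative assumptions. */
void show_reassociation(void)
{
    float a = 3.0e38f, b = -3.0e38f, c = 1.0f, f = 2.0f;

    printf("distributed: %g\n", sum_distributed(a, b, c, f));
    printf("factored:    %g\n", sum_factored(a, b, c, f));
}
```

Calling `show_reassociation()` from `main` should print a non-finite value
for the distributed form and 2 for the factored form. With -ffast-math the
compiler is free to pick either form for both functions, so the two lines
may then agree.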
The problem is really that for some kernels, things like zero-signage
and NaN / Inf handling / r
-ffast-math disables signed zero by design I think...
I'm particularly interested in problems of numeric stability and loss of
dynamic range.
--Albin
On Mon, Sep 2, 2019, 15:27 Johannes Demel wrote:
> Hi Albin,
>
> one of my students reported a little oddity about `-ffast-math`. Assume
> you w
Hi Albin,
one of my students reported a little oddity about `-ffast-math`. Assume
you want to set a float to -0, i.e. set the sign bit only. In this case
`-ffast-math` seems to remove the sign bit.
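
To make the sign-bit issue concrete, here is a small C sketch that inspects
and flips the sign bit through an integer mask instead of a `-0.0f` literal.
The helpers `float_bits` and `flip_sign` are hypothetical names for this
example and are not part of VOLK; the code assumes `float` is a 32-bit
IEEE 754 value. Whether this pattern is enough to sidestep the behavior
described above under `-ffast-math` is an assumption that would need to be
checked per compiler.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: float is 32 bits wide (IEEE 754 binary32). */
_Static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits");

/* Returns the raw bit pattern of x without violating aliasing rules. */
uint32_t float_bits(float x)
{
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Flips only the sign bit of x, using an integer constant as the mask
 * so that no -0.0f floating-point literal is involved. */
float flip_sign(float x)
{
    uint32_t u = float_bits(x) ^ UINT32_C(0x80000000);
    float y;
    memcpy(&y, &u, sizeof y);
    return y;
}
```

Comparing `float_bits(-0.0f)` against `0x80000000` with and without
`-ffast-math` is one way to check whether a given compiler keeps the sign
bit of the literal.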
In VOLK this might be an issue with `_mm256_conjugate_ps` in
`include/volk/volk_avx_intrinsics.h