Hi Jeff,
There's a good chance that your compiler outsmarted you, i.e. parts of
your test were optimized out. I suggest using something like "benchmark"
for tests. Also, make sure that the variables in your test cannot be
optimized out.
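A minimal sketch of what I mean, assuming plain C++11 with std::chrono
instead of a benchmarking library. The helper names (best_time_seconds,
sqrt_loop, consume) are made up for illustration and are not part of
VOLK. The idea is to do a warm-up call first, since the first call to a
kernel may include one-time setup cost, then repeat the measurement and
keep the fastest run, and finally read the output through a volatile so
the compiler cannot drop the computation.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of VOLK): call `kernel` once as a warm-up,
// then `repeats` more times, and return the fastest run in seconds.
template <typename Kernel>
double best_time_seconds(Kernel&& kernel, int repeats)
{
    assert(repeats > 0);
    using clock = std::chrono::steady_clock; // monotonic, unlike gettimeofday
    kernel(); // warm-up: keeps one-time setup cost out of the measurement
    double best = 1e300;
    for (int r = 0; r < repeats; ++r) {
        const auto start = clock::now();
        kernel();
        const auto stop = clock::now();
        best = std::min(best,
                        std::chrono::duration<double>(stop - start).count());
    }
    return best;
}

// Reference kernel: the plain square-root loop from the test program.
void sqrt_loop(float* out, const float* in, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = std::sqrt(in[i]);
    }
}

// Sum the results and store the sum to a volatile, so the compiler has to
// keep the computation that produced `data`.
float consume(const float* data, std::size_t n)
{
    float sum = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        sum += data[i];
    }
    volatile float sink = sum;
    return sink;
}
```

To time the VOLK kernel the same way, pass a lambda that calls
volk_32f_sqrt_32f_a(out, in, N) instead of sqrt_loop, and call
consume(out, N) after each measurement.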
Cheers
Johannes
On 08.10.23 00:22, Jeff R wrote:
I modified a simple Volk sqrt program for an ARM1176JZ-S processor to
test performance, and the results are puzzling. The following program
prints:
dur_VolkSqrt=(0.000000)0.001721 dur_CRTLSqrt=(0.000000)0.000318
The following processor information is displayed. It appears as though
NEON is supported.
~/volk-3.0.0/build# cpu_features/list_cpu_features
arch : aarch64
implementer : 65 (0x41)
variant : 0 (0x00)
part : 3336 (0xD08)
revision : 3 (0x03)
flags : asimd,cpuid,crc32,fp
Why are the numbers so slow for Volk versus the CRTL? I may be missing
something obvious. Thank you in advance.
Here’s the test program:
// g++ -I /usr/local/include/volk volk_sqrt.cpp -o volk_sqrt -L /usr/local/lib64/ -lvolk
// export LD_LIBRARY_PATH=/usr/local/lib64; ./volk_sqrt
#include <stdio.h>
#include <math.h>
#include <volk.h>
#include <limits.h>
#include <time.h>
#include <sys/time.h>
double get_wall_time()
{
    struct timeval time;
    if (gettimeofday(&time,NULL))
    {
        // Handle error
        return 0;
    }
    return (double)time.tv_sec + (double)time.tv_usec * .000001;
}
int main(int argc, char* args[])
{
    double walStop;
    double walStart;
    double dur_VolkSqrt;
    double dur_CRTLSqrt;
    int N = 1024*16;
    unsigned int alignment = volk_get_alignment();
    float* in = (float*)volk_malloc(sizeof(float)*N, alignment);
    float* out = (float*)volk_malloc(sizeof(float)*N, alignment);
    for(unsigned int ii = 0; ii < N; ++ii)
    {
        in[ii] = (float)(ii*ii);
    }
    walStart = get_wall_time();
    volk_32f_sqrt_32f_a(out, in, N);
    //volk_32f_sqrt_32f(out, in, N);
    walStop = get_wall_time();
    dur_VolkSqrt = walStop - walStart;
    walStart = get_wall_time();
    for(unsigned int ii = 0; ii < N; ++ii)
    {
        out[ii] = sqrt(in[ii]);
    }
    walStop = get_wall_time();
    dur_CRTLSqrt = walStop - walStart;
    printf("dur_VolkSqrt=(%f)%f dur_CRTLSqrt=(%f)%f\n", dur_VolkSqrt/N,
           dur_VolkSqrt, dur_CRTLSqrt/N, dur_CRTLSqrt);
    volk_free(in);
    volk_free(out);
    return 0;
}