Hi Fons,
sorry for the abbreviation: PR == Pull Request, in this case a request to
merge your contribution into the VOLK repository. It comes with a
forum to discuss the specifics and possible improvements before we merge
your code.
Did your Boost issue get solved? Boost was only required for older VOLK
versions and environments that did not yet support `std::filesystem`.
Your kernel versions imply you use a more recent system that supports
this C++17 feature.
For everyone else, libvolk is still a C library. We only use C++ for
tests etc.
For reference, these are the CPUs:
Intel Core i5-3470
https://www.intel.com/content/www/us/en/products/sku/68316/intel-core-i53470-processor-6m-cache-up-to-3-60-ghz/specifications.html
With SSE and AVX
Intel Core i5-4300U
https://www.intel.com/content/www/us/en/products/sku/76308/intel-core-i54300u-processor-3m-cache-up-to-2-90-ghz/specifications.html
With SSE, AVX, and AVX2
I can't tell from a distance why VOLK would not select AVX kernels on
an AVX-capable CPU.
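One way to narrow this down independently of VOLK is to ask the compiler
runtime what the CPU reports. The following is a minimal sketch, assuming
GCC or Clang on x86. `detected_simd_features` is a hypothetical helper name,
not part of VOLK, and it says nothing about how VOLK itself picks kernels.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not a VOLK function): lists which of a few SIMD
// extensions the running CPU reports, using the GCC/Clang x86 builtins.
// On other compilers or architectures it returns an empty list.
std::vector<std::string> detected_simd_features()
{
    std::vector<std::string> found;
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init(); // harmless if the runtime already initialised it
    if (__builtin_cpu_supports("sse2"))   found.push_back("sse2");
    if (__builtin_cpu_supports("sse4.1")) found.push_back("sse4.1");
    if (__builtin_cpu_supports("avx"))    found.push_back("avx");
    if (__builtin_cpu_supports("avx2"))   found.push_back("avx2");
#endif
    return found;
}
```

If `avx` shows up in that list but the AVX kernels are still not selected,
the CPU detection itself is probably not the culprit.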
For your benchmarking needs:
https://github.com/google/benchmark
This might be a very important section of the docs:
https://github.com/google/benchmark/blob/main/docs/user_guide.md#preventing-optimization
`DoNotOptimize` should be especially interesting.
It could very well be that your code was optimized out.
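To illustrate why this matters, here is a minimal, self-contained sketch of
the idea behind `DoNotOptimize`, without depending on Google Benchmark
itself. `keep_alive`, `sum_of_squares` and `time_kernel_ns` are hypothetical
names, and the empty inline-asm barrier assumes GCC or Clang. The real
library is more thorough, so treat this as an illustration only.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical stand-in for the idea behind benchmark::DoNotOptimize:
// tell the compiler that `value` is read and possibly modified, so the
// computation that produced it cannot simply be dropped as dead code.
template <typename T>
inline void keep_alive(T& value)
{
#if defined(__GNUC__) || defined(__clang__)
    asm volatile("" : "+m"(value) : : "memory");
#else
    volatile T sink = value; // weaker fallback: force a store of the result
    (void)sink;
#endif
}

// Example kernel under test (an assumption for illustration only).
float sum_of_squares(const std::vector<float>& x)
{
    float acc = 0.0f;
    for (float v : x) {
        acc += v * v;
    }
    return acc;
}

// Average wall-clock time per call in nanoseconds.
double time_kernel_ns(const std::vector<float>& input, int repetitions)
{
    assert(repetitions > 0);
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < repetitions; ++i) {
        float result = sum_of_squares(input);
        keep_alive(result); // without this, the loop body may be removed
    }
    const auto stop = clock::now();
    return std::chrono::duration<double, std::nano>(stop - start).count() /
           repetitions;
}
```

If the measured time per call barely changes with the input size, that is a
hint that the compiler removed the work you meant to measure.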
Cheers
Johannes
On 14.10.23 11:02, Fons Adriaensen wrote:
Hi Johannes,
Thanks for your response!
First off, we'd need to know a bit more about your setup. Could you share
the versions of VOLK and your host system, e.g. OS, version, etc.?
Furthermore, do you use a VM, a container, or something like this?
VOLK was 2.5.0, now upgraded to 3.0.0, same results.
No VM, container, etc used.
Machine info:
zita1 (desktop)
fons@zita1:~> lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 36 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i5-3470 CPU @ 3.20GHz
CPU family: 6
Model: 58
Thread(s) per core: 1
Core(s) per socket: 4
Socket(s): 1
Stepping: 9
CPU(s) scaling MHz: 45%
CPU max MHz: 3600.
CPU min MHz: 1600.
BogoMIPS: 6387.26
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat
pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp
lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology
nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx
smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt
tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb
pti tpr_shadow flexpriority ept vpid fsgsbase smep erms xsaveopt
dtherm ida arat pln pts vnmi
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 128 KiB (4 instances)
L1i: 128 KiB (4 instances)
L2: 1 MiB (4 instances)
L3: 6 MiB (1 instance)
NUMA:
NUMA node(s): 1
NUMA node0 CPU(s): 0-3
Vulnerabilities:
Gather data sampling: Not affected
Itlb multihit: KVM: Mitigation: VMX disabled
L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT disabled
Mds: Vulnerable: Clear CPU buffers attempted, no microcode; SMT disabled
Meltdown: Mitigation; PTI
Mmio stale data: Unknown: No mitigations
Retbleed: Not affected
Spec rstack overflow: Not affected
Spec store bypass: Vulnerable
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; Retpolines, STIBP disabled, RSB filling, PBRSB-eIBRS Not affected
Srbds: Vulnerable: No microcode
Tsx async abort: Not affected
fons@zita1:~> uname -a
Linux zita1 6.5.5-arch1-1 #1 SMP PREEMPT_DYNAMIC Sat, 23 Sep 2023 22:55:13
+ x86_64 GNU/Linux
zita4 (laptop)
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 39 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 4
On-line CPU(s) list: 0-3
Vendor ID: GenuineIntel
Model name: Intel(R) Core(TM) i5-4300U CPU @ 1.90GHz
CPU family: 6
Model: 69
Thread(s) per core: 2
Core(s) per socket: 2
Socket(s): 1
Stepping: 1
CPU(s) scaling MHz: 46%
CPU max MHz: