Hi Konrad,
>> CPU detection is a bottomless can of worms.
>
> That sounds very credible. But what can we do about this?
>
> There is obviously a trade-off between reproducibility and performance
> here. Can we support both, in a way that users can understand and
> manage?

So far our default approach has been to use the lowest common set of
CPU instructions, which generally leads to poorly performing code. Some
packages are smarter and provide different code paths for different
CPUs. The resulting binary is built the same, but at runtime different
parts of the code run depending on the features the CPU reports.

The case of OpenBLAS is an anomaly in that this mechanism seems to
produce different binaries depending on where it is built. When I first
encountered this problem I guessed that perhaps it can only build these
different code paths up to the feature set of the CPU on the build
machine, so if you’re building with an older CPU your binary will lack
components that would be used on newer CPUs. This is just a guess,
though.

Your problem is that the OpenBLAS build system doesn’t recognize your
modern CPU. Ideally, it wouldn’t need to know anything about the
build-time CPU to build all the different code paths for different CPU
features. The only way around this, retroactively, is to pretend to
have an older CPU, e.g. by using qemu.

In the long term it would be great if we could patch OpenBLAS to not
attempt to detect CPU features at build time. I’m not sure this will
work if it does indeed use the currently available CPU features to
determine “how far up” to build modules in support of certain CPU
features / instruction sets.

> There is of course the issue that we can never be sure if a build will
> be reproducible in the future. But we can at least take care of the
> cases where the packager is aware of non-reproducibility issues, and
> make them transparent and manageable.

The new “--tune” feature is supposed to take care of cases like this.
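Coming back to my OpenBLAS guess for a moment, here is a toy model in Python of why it would matter for reproducibility. It is not OpenBLAS code: the kernel names, the feature sets and the functions `build_all`, `build_up_to_host` and `select_kernel` are all made up for illustration. Both kinds of build use the same runtime dispatch (pick the most capable code path whose features the running CPU reports); the only difference is which code paths end up in the build.

```python
from typing import Callable, FrozenSet, List, Tuple

# Toy model only: all names here are hypothetical, not OpenBLAS internals.
Vector = List[float]
Kernel = Tuple[str, FrozenSet[str], Callable[[Vector, Vector], float]]


def dot_generic(x: Vector, y: Vector) -> float:
    """Portable code path that runs on any CPU."""
    return sum(a * b for a, b in zip(x, y))


def dot_sse2(x: Vector, y: Vector) -> float:
    # Stand-in for a hand-written SSE2 kernel: same result, different speed.
    return dot_generic(x, y)


def dot_avx2(x: Vector, y: Vector) -> float:
    # Stand-in for a hand-written AVX2 kernel.
    return dot_generic(x, y)


# Every code path the source tree knows about, most capable first,
# together with the CPU features each one requires.
ALL_KERNELS: List[Kernel] = [
    ("avx2", frozenset({"sse2", "avx2"}), dot_avx2),
    ("sse2", frozenset({"sse2"}), dot_sse2),
    ("generic", frozenset(), dot_generic),
]


def build_all() -> List[Kernel]:
    """Reproducible build: include every code path, whatever the build host."""
    return list(ALL_KERNELS)


def build_up_to_host(build_host_features: FrozenSet[str]) -> List[Kernel]:
    """The guessed behaviour: only include code paths the build host itself
    supports, so the resulting build depends on where it was made."""
    return [k for k in ALL_KERNELS if k[1] <= build_host_features]


def select_kernel(table: List[Kernel], runtime_features: FrozenSet[str]) -> str:
    """Runtime dispatch: return the name of the first (most capable) code
    path whose required features are all reported by the running CPU."""
    for name, required, _kernel in table:
        if required <= runtime_features:
            return name
    raise RuntimeError("no usable code path in this build")


if __name__ == "__main__":
    old_build_host = frozenset({"sse2"})
    modern_cpu = frozenset({"sse2", "avx2"})
    print(select_kernel(build_all(), modern_cpu))                       # avx2
    print(select_kernel(build_up_to_host(old_build_host), modern_cpu))  # sse2
```

If the guess is right, a build made on a host that only reports SSE2 silently drops the AVX2 path, so the same source yields a different (and slower) library than a build made on a newer machine. Building every path unconditionally, as `build_all` does, keeps the output identical no matter where it is built.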
We would still patch the code so that by default you’d get a package
that is reproducible (= you get the same exact binary no matter when or
where you build it) but that may not have optimal performance. With
“--tune” you could opt to replace that generic build with one that uses
features of your current CPU, using grafts to swap the generic library
for the more performant library.

-- 
Ricardo