I spent a bit more time researching the portability implications of this patch. I think that we should check __ARM_NEON before #including <arm_neon.h>; there is authoritative documentation out there telling you to, e.g. [1], and I can see no upside at all to not checking. We cannot check *only* __ARM_NEON, though. I found that clang 8.0.0 in my Fedora 30 32-bit image defines it, although that installation does not provide all the instructions we want (I see "undefined function" complaints for vmaxvq_u8 etc. if I try to make it use the patch). Looking into that installation's <arm_neon.h>, those functions are defined conditionally if "__ARM_FP & 2", which is kind of interesting --- per [1], that bit indicates support for 16-bit floating point, which seems a mite unrelated.
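For anyone who wants to repeat this check on another toolchain, here is a minimal sketch that reports how the compiler sets the macros discussed above. The helper names (have_arm_neon_macro, have_aarch64_macro, arm_fp_bits, report_neon_macros) are made up for illustration and are not part of the patch; only the predefined macros themselves come from the compiler.

```c
#include <stdio.h>

/* Hypothetical helper: 1 if the compiler predefines __ARM_NEON, else 0. */
static int
have_arm_neon_macro(void)
{
#if defined(__ARM_NEON)
	return 1;
#else
	return 0;
#endif
}

/* Hypothetical helper: 1 if we are compiling for aarch64, else 0. */
static int
have_aarch64_macro(void)
{
#if defined(__aarch64__)
	return 1;
#else
	return 0;
#endif
}

/* Hypothetical helper: value of __ARM_FP, or 0 if it is not defined. */
static int
arm_fp_bits(void)
{
#if defined(__ARM_FP)
	return __ARM_FP;
#else
	return 0;
#endif
}

/* Print one line per macro test so results can be compared across boxes. */
static void
report_neon_macros(FILE *out)
{
	fprintf(out, "__aarch64__ defined: %d\n", have_aarch64_macro());
	fprintf(out, "__ARM_NEON defined:  %d\n", have_arm_neon_macro());
	fprintf(out, "__ARM_FP value:      0x%x\n", (unsigned int) arm_fp_bits());
	fprintf(out, "__ARM_FP & 2:        %d\n", (arm_fp_bits() & 2) != 0);
}
```

Calling report_neon_macros(stdout) from a trivial main() and building with the compiler under test should show whether __ARM_NEON is set without the "__ARM_FP & 2" bit, which is the combination I saw with the 32-bit clang.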

It appears from the info at [2] that there are at least some 32-bit ARM platforms that set that bit, implying (if the clang authors are well informed) that they have the instructions we want. But we could not realistically make 32-bit builds that try to use those instructions without a run-time test; such a build would fail for too many people. I doubt that a run-time test is worth the trouble, so I concur with the idea of selecting NEON on aarch64 only and hoping to thereby avoid a runtime test.

In short, I think the critical part of 0002 needs to look more like this:

+#elif defined(__aarch64__) && defined(__ARM_NEON)
+/*
+ * We use the Neon instructions if the compiler provides access to them
+ * (as indicated by __ARM_NEON) and we are on aarch64. While Neon support is
+ * technically optional for aarch64, it appears that all available 64-bit
+ * hardware does have it. Neon exists in some 32-bit hardware too, but
+ * we could not realistically use it there without a run-time check,
+ * which seems not worth the trouble for now.
+ */
+#include <arm_neon.h>
+#define USE_NEON
...

Coding like this appears to work on both my Apple M1 and my Raspberry Pi, with several different OSes checked on the latter.

			regards, tom lane

[1] https://developer.arm.com/documentation/101754/0618/armclang-Reference/Other-Compiler-specific-Features/Predefined-macros
[2] http://micro-os-plus.github.io/develop/predefined-macros/