On Thu, Jul 18, 2024 at 10:31 AM Uros Bizjak <ubiz...@gmail.com> wrote: > > On Thu, Jul 18, 2024 at 10:21 AM Jakub Jelinek <ja...@redhat.com> wrote: > > > > On Thu, Jul 18, 2024 at 10:12:46AM +0200, Uros Bizjak wrote: > > > On Thu, Jul 18, 2024 at 9:50 AM Jakub Jelinek <ja...@redhat.com> wrote: > > > > > > > > On Thu, Jul 18, 2024 at 09:34:14AM +0200, Uros Bizjak wrote: > > > > > > > + unsigned int ecx2 = 0, family = 0; > > > > > > > > > > No need to initialize these two variables. > > > > > > > > The function ignores __get_cpuid result, so at least the > > > > FEAT1_REGISTER = 0; is needed before the first __get_cpuid. > > > > Do you mean the ecx2 = 0 initialization is useless because > > > > __get_cpuid (0, ...) on x86_64 will always succeed (especially > > > > when __get_cpuid (1, ...) had to succeed otherwise FEAT1_REGISTER > > > > would be zero)? > > > > I guess that is true, but won't that cause -Wmaybe-uninitialized > > > > warnings? > > > > > > Yes, if the __get_cpuid (1, ...) works OK, then we are sure that > > > __get_cpuid (0, ...) will also work. > > > > > > > I agree initializing family to 0 is not needed, but I don't understand > > > > why it isn't just > > > > unsigned family = (eax >> 8) & 0x0f; > > > > Though, guess even that might fail with -Wmaybe-uninitialized too, as > > > > eax isn't unconditionally initialized. > > > > > > Perhaps we should check the result of __get_cpuid (1, ...) and use eax > > > only if the function returns 1? IMO, this would solve the > > > uninitialized issue, and we could use __cpuid in the second case (we > > > would know that leaf 0 is supported, because leaf 1 support was > > > checked with __get_cpuid (1, ...)). > > > > We know the code is ok if FEAT1_REGISTER = 0; is done before __get_cpuid (1, > > ...). > > Everything else is implied from it, all we need to ensure is that > > -Wmaybe-uninitialized is happy about it. > > Whatever doesn't report the warning and ideally doesn't increase the size of > > the function. > > I think the reason it is written the way it is before the AVX hacks in it > > is that we need to handle even the case when __get_cpuid (1, ...) returns 0, > > and we want in that case FEAT1_REGISTER = 0. > > So it could be > > Yes, I think this is better, see below. > > > FEAT1_REGISTER = 0; > > #ifdef __x86_64__ > > if (__get_cpuid (1, &eax, &ebx, &ecx, &edx) > > && (FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B)) > > == (bit_AVX | bit_CMPXCHG16B)) > > { > > Here we can simply use
Attached patch illustrates the proposed improvement with nested cpuid calls. Bootstrapped and teased with libatomic testsuite. Jakub, WDYT? Uros.
diff --git a/libatomic/config/x86/init.c b/libatomic/config/x86/init.c index a75be3f175c..94d45683567 100644 --- a/libatomic/config/x86/init.c +++ b/libatomic/config/x86/init.c @@ -32,22 +32,25 @@ unsigned int __libat_feat1_init (void) { unsigned int eax, ebx, ecx, edx; - FEAT1_REGISTER = 0; - __get_cpuid (1, &eax, &ebx, &ecx, &edx); -#ifdef __x86_64__ - if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B)) - == (bit_AVX | bit_CMPXCHG16B)) + if (__get_cpuid (1, &eax, &ebx, &ecx, &edx)) { - /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned address - is atomic, and AMD is going to do something similar soon. - We don't have a guarantee from vendors of other CPUs with AVX, - like Zhaoxin and VIA. */ - unsigned int ecx2 = 0; - __get_cpuid (0, &eax, &ebx, &ecx2, &edx); - if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx) - FEAT1_REGISTER &= ~bit_AVX; - } +#ifdef __x86_64__ + if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B)) + == (bit_AVX | bit_CMPXCHG16B)) + { + /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned + address is atomic, and AMD is going to do something similar soon. + We don't have a guarantee from vendors of other CPUs with AVX, + like Zhaoxin and VIA. */ + unsigned int ecx2; + __cpuid (0, eax, ebx, ecx2, edx); + if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx) + FEAT1_REGISTER &= ~bit_AVX; + } #endif + } + else + FEAT1_REGISTER = 0; /* See the load in load_feat1. */ __atomic_store_n (&__libat_feat1, FEAT1_REGISTER, __ATOMIC_RELAXED); return FEAT1_REGISTER;