On Thu, Jul 18, 2024 at 10:31 AM Uros Bizjak <ubiz...@gmail.com> wrote:
>
> On Thu, Jul 18, 2024 at 10:21 AM Jakub Jelinek <ja...@redhat.com> wrote:
> >
> > On Thu, Jul 18, 2024 at 10:12:46AM +0200, Uros Bizjak wrote:
> > > On Thu, Jul 18, 2024 at 9:50 AM Jakub Jelinek <ja...@redhat.com> wrote:
> > > >
> > > > On Thu, Jul 18, 2024 at 09:34:14AM +0200, Uros Bizjak wrote:
> > > > > > > +      unsigned int ecx2 = 0, family = 0;
> > > > >
> > > > > No need to initialize these two variables.
> > > >
> > > > The function ignores __get_cpuid result, so at least the
> > > > FEAT1_REGISTER = 0; is needed before the first __get_cpuid.
> > > > Do you mean the ecx2 = 0 initialization is useless because
> > > > __get_cpuid (0, ...) on x86_64 will always succeed (especially
> > > > when __get_cpuid (1, ...) had to succeed otherwise FEAT1_REGISTER
> > > > would be zero)?
> > > > I guess that is true, but won't that cause -Wmaybe-uninitialized 
> > > > warnings?
> > >
> > > Yes, if the __get_cpuid (1, ...) works OK, then we are sure that
> > > __get_cpuid (0, ...) will also work.
> > >
> > > > I agree initializing family to 0 is not needed, but I don't understand
> > > > why it isn't just
> > > >       unsigned family = (eax >> 8) & 0x0f;
> > > > Though, guess even that might fail with -Wmaybe-uninitialized too, as
> > > > eax isn't unconditionally initialized.
> > >
> > > Perhaps we should check the result of __get_cpuid (1, ...) and use eax
> > > only if the function returns 1? IMO, this would solve the
> > > uninitialized issue, and we could use __cpuid in the second case (we
> > > would know that leaf 0 is supported, because leaf 1 support was
> > > checked with __get_cpuid (1, ...)).
> >
> > We know the code is ok if FEAT1_REGISTER = 0; is done before __get_cpuid (1,
> > ...).
> > Everything else is implied from it, all we need to ensure is that
> > -Wmaybe-uninitialized is happy about it.
> > Whatever doesn't report the warning and ideally doesn't increase the size of
> > the function.
> > I think the reason it is written the way it is before the AVX hacks in it
> > is that we need to handle even the case when __get_cpuid (1, ...) returns 0,
> > and we want in that case FEAT1_REGISTER = 0.
> > So it could be
>
> Yes, I think this is better, see below.
>
> >   FEAT1_REGISTER = 0;
> > #ifdef __x86_64__
> >   if (__get_cpuid (1, &eax, &ebx, &ecx, &edx)
> >       && (FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
> >           == (bit_AVX | bit_CMPXCHG16B))
> >     {
>
> Here we can simply use

Attached patch illustrates the proposed improvement with nested cpuid
calls. Bootstrapped and teased with libatomic testsuite.

Jakub, WDYT?

Uros.
diff --git a/libatomic/config/x86/init.c b/libatomic/config/x86/init.c
index a75be3f175c..94d45683567 100644
--- a/libatomic/config/x86/init.c
+++ b/libatomic/config/x86/init.c
@@ -32,22 +32,25 @@ unsigned int
 __libat_feat1_init (void)
 {
   unsigned int eax, ebx, ecx, edx;
-  FEAT1_REGISTER = 0;
-  __get_cpuid (1, &eax, &ebx, &ecx, &edx);
-#ifdef __x86_64__
-  if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
-      == (bit_AVX | bit_CMPXCHG16B))
+  if (__get_cpuid (1, &eax, &ebx, &ecx, &edx))
     {
-      /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned address
-        is atomic, and AMD is going to do something similar soon.
-        We don't have a guarantee from vendors of other CPUs with AVX,
-        like Zhaoxin and VIA.  */
-      unsigned int ecx2 = 0;
-      __get_cpuid (0, &eax, &ebx, &ecx2, &edx);
-      if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx)
-       FEAT1_REGISTER &= ~bit_AVX;
-    }
+#ifdef __x86_64__
+      if ((FEAT1_REGISTER & (bit_AVX | bit_CMPXCHG16B))
+         == (bit_AVX | bit_CMPXCHG16B))
+       {
+         /* Intel SDM guarantees that 16-byte VMOVDQA on 16-byte aligned
+            address is atomic, and AMD is going to do something similar soon.
+            We don't have a guarantee from vendors of other CPUs with AVX,
+            like Zhaoxin and VIA.  */
+         unsigned int ecx2;
+         __cpuid (0, eax, ebx, ecx2, edx);
+         if (ecx2 != signature_INTEL_ecx && ecx2 != signature_AMD_ecx)
+           FEAT1_REGISTER &= ~bit_AVX;
+       }
 #endif
+    }
+  else
+    FEAT1_REGISTER = 0;
   /* See the load in load_feat1.  */
   __atomic_store_n (&__libat_feat1, FEAT1_REGISTER, __ATOMIC_RELAXED);
   return FEAT1_REGISTER;

Reply via email to