On Fri, Aug 4, 2023 at 10:52 AM Jan Hubicka <hubi...@ucw.cz> wrote: > > Hi, > so I found the problem. We duplicate multiple paths and end up with: > > ;; basic block 6, loop depth 0, count 365072224 (estimated locally, freq > 0.3400) > ;; prev block 12, next block 7, flags: (NEW, REACHABLE, VISITED) > ;; pred: 4 [never (guessed)] count:0 (estimated locally, freq 0.0000) > (TRUE_VALUE,EXECUTABLE) > ;; 10 [always] count:365072224 (estimated locally, freq 0.3400) > (FALLTHRU,EXECUTABLE) > # _18 = PHI <0(4), 0(10)> > # d_39 = PHI <d_40(D)(4), d_40(D)(10)> > if (_18 == 0) > goto <bb 8>; [97.06%] > else > goto <bb 7>; [2.94%] > ;; succ: 8 [97.1% (guessed)] count:354334801 (estimated locally, freq > 0.3300) (TRUE_VALUE,EXECUTABLE) > ;; 7 [2.9% (guessed)] count:10737423 (estimated locally, freq > 0.0100) (FALSE_VALUE,EXECUTABLE) > > Here goto bb 7 is never taken but profile is wrong. > > Before threading we have chain of conditionals: > > __asm__("pushf{l|d} > pushf{l|d} > pop{l} %0 > mov{l} {%0, %1|%1, %0} > xor{l} {%2, %0|%0, %2} > push{l} %0 > popf{l|d} > pushf{l|d} > pop{l} %0 > popf{l|d} > " : "=&r" __eax_19, "=&r" __ebx_20 : "i" 2097152); > _21 = __eax_19 ^ __ebx_20; > _22 = _21 & 2097152; > if (_22 == 0) > goto <bb 4>; [34.00%] > else > goto <bb 3>; [66.00%] > > <bb 3> [local count: 708669602 freq: 0.660000]: > __asm__ __volatile__("cpuid > " : "=a" __eax_24, "=b" __ebx_25, "=c" __ecx_26, "=d" __edx_27 : "0" > 0); > > <bb 4> [local count: 1073741826 freq: 1.000000]: > # _33 = PHI <0(2), __eax_24(3)> > _16 = _33 == 0; > if (_33 == 0) > goto <bb 6>; [34.00%] > else > goto <bb 5>; [66.00%] > > <bb 5> [local count: 708669600 freq: 0.660000]: > __asm__ __volatile__("cpuid > " : "=a" a_44, "=b" b_45, "=c" c_46, "=d" d_47 : "0" 1, "2" 0); > > <bb 6> [local count: 1073741824 freq: 1.000000]: > # _18 = PHI <0(4), 1(5)> > # d_39 = PHI <d_40(D)(4), d_47(5)> > if (_18 == 0) > goto <bb 8>; [33.00%] > else > goto <bb 7>; [67.00%] > > > If first _22 == 0 then also _33 == 0 and _18 == 0 but the last case has > probability 33% while the first 34%, so the profile guess is not > consistent with the threaded path. So threading is right to end up with > profile inconsistency, but it should print reason for doing it. > > One option is to disable optimization for the check. Other option is to > get the first conditional predicted right. > Would this be OK?
Yeah, that looks OK. > gcc/ChangeLog: > > * config/i386/cpuid.h: Mark CPUs not supporting cpuid as unlikely. > > diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h > index 03fd6fc9478..9c768ac0b6d 100644 > --- a/gcc/config/i386/cpuid.h > +++ b/gcc/config/i386/cpuid.h > @@ -295,7 +295,7 @@ __get_cpuid_max (unsigned int __ext, unsigned int *__sig) > : "i" (0x00200000)); > #endif > > - if (!((__eax ^ __ebx) & 0x00200000)) > + if (__builtin_expect (!((__eax ^ __ebx) & 0x00200000), 0)) > return 0; > #endif >