On Fri, 15 Aug 2025 at 01:03, Colin Watson <[email protected]> wrote:
>
> On Fri, Aug 15, 2025 at 12:47:35AM +1200, [email protected] wrote:
> >On Wed, 13 Aug 2025 at 23:06, Colin Watson <[email protected]> wrote:
> >...
> >> I'm downgrading this for the moment as I can't currently find evidence
> >> that it's a baseline violation.  I've tried this in various ancient qemu
> >> CPU models ("-cpu Conroe", "-cpu qemu64", "-cpu core2duo"), and it seems
> >> fine there.  I'm prepared to believe that I've missed something, but
> >> figuring it out seems like a bit of a fishing expedition.
> >
> >Hi Colin! It's a baseline violation. Your analysis of the build files
> >was helpful but ultimately I just had to check the dmesg log for the
> >segfault and look up the offset in the shared library:
> >
> >traps: mtxrun[62011] trap invalid opcode ip:7fe6e4f64988
> >sp:7ffe42301c80 error:0 in libmimalloc.so.3.0[c988,7fe6e4f5e000+15000]
> >
> >c988:       f3 48 0f b8 c2          popcnt rax,rdx
>
> OK.  I think this is coming from supposedly CPUID-guarded code:
>
>    $ git grep -i popcnt
>    include/mimalloc/bits.h:extern bool _mi_cpu_has_popcnt;
>    include/mimalloc/bits.h:    if mi_unlikely(!_mi_cpu_has_popcnt) { return 
> _mi_popcount_generic(x); }
>    include/mimalloc/bits.h:    __asm ("popcnt\t%1,%0" : "=r"(r) : "r"(x) : 
> "cc");
>    include/mimalloc/bits.h:    if mi_unlikely(!_mi_cpu_has_popcnt) { return 
> _mi_popcount_generic(x); }
>    include/mimalloc/bits.h:    return (size_t)mi_msc_builtinz(__popcnt)(x);
>    include/mimalloc/bits.h:    return (size_t)mi_msc_builtinz(__popcnt)(x);
>    src/init.c:mi_decl_cache_align bool _mi_cpu_has_popcnt = false;
>    src/init.c:    _mi_cpu_has_popcnt = ((cpu_info[2] & (1 << 23)) != 0); // 
> bit 23 of ECX : see 
> <https://en.wikipedia.org/wiki/CPUID#EAX=1:_Processor_Info_and_Feature_Bits>
>    src/init.c:  _mi_cpu_has_popcnt = true;
>
> Here's the relevant code (for GCC/amd64):
>
>      #if !defined(__BMI1__)
>      if mi_unlikely(!_mi_cpu_has_popcnt) { return _mi_popcount_generic(x); }
>      #endif
>      size_t r;
>      __asm ("popcnt\t%1,%0" : "=r"(r) : "r"(x) : "cc");
>      return r;
>
> And:
>
>    static void mi_detect_cpu_features(void) {
>      // FSRM for fast short rep movsb/stosb support (AMD Zen3+ (~2020) or 
> Intel Ice Lake+ (~2017))
>      // EMRS for fast enhanced rep movsb/stosb support
>      uint32_t cpu_info[4];
>      if (mi_cpuid(cpu_info, 7)) {
>        _mi_cpu_has_fsrm = ((cpu_info[3] & (1 << 4)) != 0); // bit 4 of EDX : 
> see <https://en.wikipedia.org/wiki/CPUID#EAX=7,_ECX=0:_Extended_Features>
>        _mi_cpu_has_erms = ((cpu_info[1] & (1 << 9)) != 0); // bit 9 of EBX : 
> see <https://en.wikipedia.org/wiki/CPUID#EAX=7,_ECX=0:_Extended_Features>
>      }
>      if (mi_cpuid(cpu_info, 1)) {
>        _mi_cpu_has_popcnt = ((cpu_info[2] & (1 << 23)) != 0); // bit 23 of 
> ECX : see 
> <https://en.wikipedia.org/wiki/CPUID#EAX=1:_Processor_Info_and_Feature_Bits>
>      }
>    }
>
> In principle this sort of thing should be OK.  But maybe __BMI1__ is
> defined and so the generic fallback isn't present, or maybe the CPUID
> check is incorrect for your CPU?  I'll check the former when I have a
> little more time, but if you could check the latter that would be
> helpful.

This observation may help. The relevant code (for GCC/amd64) should
include a runtime check before executing the popcnt instruction if
__BMI1__ is not defined.

If we look at the surrounding assembly before popcnt is executed there
should be a simple compare for the boolean value _mi_cpu_has_popcnt
and a jump skipping the popcnt instruction if the result is false:

    c976:       f0 49 0f b1 0c 24       lock cmpxchg QWORD PTR [r12],rcx
    c97c:       75 f2                   jne    c970
<mi_arena_reload@@Base+0x7a0>
    c97e:       48 8d 0d bb ce 01 00    lea    rcx,[rip+0x1cebb]
 # 29840 <mi_stats_get_json@@Base+0x10a70>
    c985:       48 21 c2                and    rdx,rax
    c988:       f3 48 0f b8 c2          popcnt rax,rdx

Here's another snippet from the binary (no check is apparent):

    ded9:       b8 01 00 00 00          mov    eax,0x1
    dede:       48 d3 e0                shl    rax,cl
    dee1:       48 83 e8 01             sub    rax,0x1
    dee5:       89 d1                   mov    ecx,edx
    dee7:       48 d3 e0                shl    rax,cl
    deea:       49 89 c7                mov    r15,rax
    deed:       48 8d 05 4c b9 01 00    lea    rax,[rip+0x1b94c]
 # 29840 <mi_stats_get_json@@Base+0x10a70>
    def4:       f3 4c 0f b8 df          popcnt r11,rdi

Locally there is no indication these simple checks are being performed.

This indicates __BMI1__ is incorrectly defined.

(Or perhaps there is no runtime detection going on and what we are
seeing is output from intrinsics compiled for a platform that supports
the popcnt instruction.)

Regards,
Adam

Reply via email to