Re: SMP on Intel SE7500CW2

Terry Lambert Tue, 27 Aug 2002 01:50:31 -0700

Craig Hawco wrote:
> I've been looking into PR i386/40564 as I'm the owner of an Intel
> SE7500CW2. I managed to track it down to start_ap in mp_machdep.c.
> 
> snippet from start_ap():
> 
>          while (read_apic_timer())
>                  if (mp_ncpus > cpus)
>                          return 1;       /* return SUCCESS */
> 
> After a bit of poking around I found mpboot.s as the location of where
> mp_ncpus gets increased (mp_begin) after the AP has been started. The
> startup code for the AP is also in mpboot.s. Not being a kernel hacker I'm
> kind of stuck at this point. Windows and Linux work with this board, so
> it's probably not a hardware problem. Start_ap also appears to follow the
> Intel MP Spec very closely, after a quick glance, so I'm at a loss. Is an
> interrupt being lost somewhere? Is the problem occurring before the AP even
> executes its startup code and thus never executing mp_begin? I'm not an
> assembly programmer, and only have a very loose understanding of assembly,
> so actually understanding anything going on in mpboot.s is not very likely.
> Any help would be greatly appreciated.


Assuming you are running the FreeBSD version 4.6 referenced by
that PR, what it's telling you when it panics is that the BSP
has waited over 5 seconds for the incl of mp_ncpus to occur,
and it never happened.

Basically, according to the code, the only way this can happen
is for the second CPU to crash outright, or for the increment to
occur in cache and never reach the location that the other CPU
is looping on.  Unless you have cooked your chip by overclocking
or static, let's assume that it's the latter: the notification
is taking place, but is having no effect.
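
As a rough user-space model of that bounded wait (not the kernel
code itself): the countdown struct, the hook callback and the
function names below are all hypothetical stand-ins.  The real loop
polls read_apic_timer() and the increment comes from the AP, not
from a callback.

```c
#include <stddef.h>

/* Hypothetical stand-in for the APIC timer: counts down to zero. */
struct countdown {
	unsigned remaining;
};

static unsigned
countdown_read(struct countdown *t)
{
	if (t->remaining > 0)
		t->remaining--;
	return t->remaining;
}

/* Hypothetical hook that plays the role of the AP during the wait. */
typedef void (*ap_hook_fn)(volatile int *ncpus, unsigned remaining);

/*
 * Spin until the counter exceeds cpus_before or the countdown hits
 * zero.  Returns 1 if the "AP" checked in, 0 on timeout (the case
 * in which the real kernel gives up on the AP).
 */
static int
wait_for_ap(volatile int *ncpus, int cpus_before,
    struct countdown *t, ap_hook_fn hook)
{
	unsigned left;

	while ((left = countdown_read(t)) != 0) {
		if (hook != NULL)
			hook(ncpus, left);	/* simulated AP activity */
		if (*ncpus > cpus_before)
			return 1;		/* increment was observed */
	}
	return 0;				/* timer ran out first */
}

/* Example hook: the simulated AP checks in with 3 ticks left. */
static void
ap_checks_in(volatile int *ncpus, unsigned remaining)
{
	if (remaining == 3)
		(*ncpus)++;
}
```

Calling wait_for_ap() with ap_checks_in as the hook models the
working case; passing NULL models an AP whose increment is never
seen, which is the situation the PR describes.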

Neither you nor the PR indicates which compiler you used, or
whether cranking the number of seconds up helps at all.  There
are a couple of things you can try; which (if any) will work
depends on information you aren't providing:

1)      Change:

        int     mp_ncpus;               /* # of CPUs, including BSP */

        To:

        volatile int     mp_ncpus;      /* # of CPUs, including BSP */

        in mp_machdep.c.  It could be that the compiler options
        you are using cause the value to be cached in a register
        in the loop (you would have to examine the generated
        assembly code to confirm this).

2)      In locore.s, change:

begin:
        /* set up bootstrap stack */
        movl    _proc0paddr,%esp        /* location of in-kernel pages */
        addl    $UPAGES*PAGE_SIZE,%esp  /* bootstrap stack end location */
        xorl    %eax,%eax                       /* mark end of frames */
        movl    %eax,%ebp
        movl    _proc0paddr,%eax
        movl    _IdlePTD, %esi
        movl    %esi,PCB_CR3(%eax)
 
        testl   $CPUID_PGE, R(_cpu_feature)
        jz      1f
        movl    %cr4, %eax
        orl     $CR4_PGE, %eax
        movl    %eax, %cr4
1:

        To:

begin:
        /* set up bootstrap stack */
        movl    _proc0paddr,%esp        /* location of in-kernel pages */
        addl    $UPAGES*PAGE_SIZE,%esp  /* bootstrap stack end location */
        xorl    %eax,%eax                       /* mark end of frames */
        movl    %eax,%ebp
        movl    _proc0paddr,%eax
        movl    _IdlePTD, %esi
        movl    %esi,PCB_CR3(%eax)
 
        testl   $CPUID_PGE, R(_cpu_feature)
        jz      1f
/*
 * DISABLE PGE/GPE on Intel/AMD; eat performance hit on CR3 reload
 * in exchange for non-stale TLB contents on bogus motherboard with
 * bad MMU hardware and/or wiring and/or undocumented hardware bug.
 */
/*        movl    %cr4, %eax     */
/*        orl     $CR4_PGE, %eax */
/*        movl    %eax, %cr4     */
1:

3)      Add "options DISABLE_PSE" to the config file, and rebuild the
        kernel.
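
For readers who don't read assembly, the effect of suggestion 2 can
be modelled in C roughly as follows.  This is only a sketch: the
function name and the enable_pge parameter are hypothetical, and
the two bit values are the ones I know from the Intel
documentation (CPUID.1:EDX bit 13 and CR4 bit 7), not copied from
the FreeBSD headers.

```c
#include <stdint.h>

#define CPUID_PGE	0x2000u	/* CPUID feature bit 13: PGE supported */
#define CR4_PGE		0x0080u	/* CR4 bit 7: global pages enabled */

/*
 * Hypothetical model of the tail of "begin": return the CR4 value
 * the code would leave behind.  enable_pge == 1 models the
 * original code, enable_pge == 0 models the version with the three
 * instructions commented out.
 */
static uint32_t
cr4_after_begin(uint32_t cpu_feature, uint32_t cr4, int enable_pge)
{
	if ((cpu_feature & CPUID_PGE) == 0)
		return cr4;		/* "jz 1f": CPU has no PGE */
	if (enable_pge)
		cr4 |= CR4_PGE;		/* movl/orl/movl on %cr4 */
	return cr4;			/* label "1:" */
}
```

With the change applied, CR4 comes out the same as it went in, so
no TLB entries are marked global and every CR3 reload flushes all
of them.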

YMMV (of course).

-- Terry
