Craig Hawco wrote:
> I've been looking into PR i386/40564 as I'm the owner of an Intel
> SE7500CW2. I managed to track it down to start_ap in mp_machdep.c.
>
> snippet from start_ap():
>
>         while (read_apic_timer())
>                 if (mp_ncpus > cpus)
>                         return 1;       /* return SUCCESS */
>
> After a bit of poking around I found mpboot.s as the location of where
> mp_ncpus gets increased (mp_begin) after the AP has been started. The
> startup code for the AP is also in mpboot.s. Not being a kernel hacker I'm
> kind of stuck at this point. Windows and Linux work with this board, so
> it's probably not a hardware problem. Start_ap also appears to follow the
> Intel MP Spec very closely, after a quick glance, so I'm at a loss. Is an
> interrupt being lost somewhere? Is the problem occuring before the AP even
> executes its startup code and thus never executing mp_begin? I'm not an
> assembly programmer, and only have a very loose understanding of assembly,
> so actually understanding anything going on in mpboot.s is not very likely.
> Any help would be greatly appreciated.

Assuming you are running the FreeBSD version 4.6 referenced by that PR,
what it's telling you when it panics is that the BSP has waited over 5
seconds for the incl of mp_ncpus to occur, and it never happened.

Basically, the only way this can happen, according to the code, is for
the second CPU to crash outright, or for the increment to occur in
cache without the new value reaching the location that the other CPU
is looping on.

Unless you have cooked your chip by overclocking or static, let's
assume that it's the latter: the notification is taking place, but is
having no effect.

Neither you nor the PR indicates which compiler you used, or whether
cranking the number of seconds up helps at all.

There are a couple of things you can try; which (if any) will work
depends on information you aren't providing:

1)      Change:

                int     mp_ncpus;       /* # of CPUs, including BSP */

        To:

                volatile int    mp_ncpus;       /* # of CPUs, including BSP */

        In mp_machdep.c; it could be that the compiler options you are
        using are causing the value to end up being cached in a register
        in the loop (you would have to examine the assembly code to see
        if this were the case).

2)      In locore.s, change:

        begin:
                /* set up bootstrap stack */
                movl    _proc0paddr,%esp        /* location of in-kernel pages */
                addl    $UPAGES*PAGE_SIZE,%esp  /* bootstrap stack end location */
                xorl    %eax,%eax               /* mark end of frames */
                movl    %eax,%ebp
                movl    _proc0paddr,%eax
                movl    _IdlePTD, %esi
                movl    %esi,PCB_CR3(%eax)

                testl   $CPUID_PGE, R(_cpu_feature)
                jz      1f
                movl    %cr4, %eax
                orl     $CR4_PGE, %eax
                movl    %eax, %cr4
        1:

        To:

        begin:
                /* set up bootstrap stack */
                movl    _proc0paddr,%esp        /* location of in-kernel pages */
                addl    $UPAGES*PAGE_SIZE,%esp  /* bootstrap stack end location */
                xorl    %eax,%eax               /* mark end of frames */
                movl    %eax,%ebp
                movl    _proc0paddr,%eax
                movl    _IdlePTD, %esi
                movl    %esi,PCB_CR3(%eax)

                testl   $CPUID_PGE, R(_cpu_feature)
                jz      1f
                /*
                 * DISABLE PGE/GPE on Intel/AMD; eat performance hit on CR3 reload
                 * in exchange for non-stale TLB contents on bogus motherboard with
                 * bad MMU hardware and/or wiring and/or undocumented hardware bug.
                 */
                /* movl %cr4, %eax */
                /* orl  $CR4_PGE, %eax */
                /* movl %eax, %cr4 */
        1:

3)      Add "options DISABLE_PSE" to the config file, and rebuild
        the kernel.

YMMV (of course).

-- Terry
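
PS: To make suggestion 1 concrete, here is a minimal user-space sketch of the kind of polling loop start_ap() runs. It is not the kernel code: ncpus_model, wait_for_ap and the max_spins budget are hypothetical names invented for illustration, and the spin budget merely stands in for the APIC timer countdown. The point is only that reading the counter through a volatile-qualified pointer obliges the compiler to reload it from memory on every pass instead of keeping a copy in a register.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's mp_ncpus counter. */
volatile int ncpus_model = 1;

/*
 * Spin until *counter exceeds cpus or the spin budget runs out.
 * Because counter points at a volatile int, every iteration performs
 * a fresh load from memory.
 * Returns 1 on success, 0 on timeout.
 */
int
wait_for_ap(const volatile int *counter, int cpus, long max_spins)
{
	assert(counter != NULL);

	while (max_spins-- > 0) {
		if (*counter > cpus)
			return 1;	/* the other CPU checked in */
	}
	return 0;			/* budget exhausted, nobody checked in */
}
```

Note that volatile only constrains what the compiler emits; it does nothing about hardware cache coherency or TLB state, which is what suggestions 2 and 3 are aimed at.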