On Thu, 2 Jun 2005, Eric Faurot wrote:

> On 5/31/05, Otto Moerbeek <[EMAIL PROTECTED]> wrote:
>
> > Well, after some really deep digging, this commit by Dale Rahn fixes
> > the problem.
> >
> > http://www.openbsd.org/cgi-bin/cvsweb/src/sys/arch/powerpc/powerpc/trap.c.diff?r1=1.67&r2=1.68&f=h
> >
> > Thanks for the report.
>
> I'm glad this can make openbsd better. Tracking the bug was probably much
> more difficult. Thanks for that.
>
> What puzzles me is that although it looks like a really "fundamental" flaw,
> it does not seem to have much impact on the overall system stability.
> Could you possibly post your canonical code exposing the bug?

The scenario happens if a process uses floating point between fork and
exec. ksh(1) does that. What happens is that the new process' floating
point context did not get initialized properly, and after a context
switch, it would get the wrong context restored. Depending on whether
other processes were using floating point, you were sometimes lucky and
sometimes not.

Why we did not spot this earlier, I do not know. My guess is that not
many programs use floating point for a long stretch in which the values
are kept in registers rather than loaded from memory. If they use
floating point, it's mostly for some quick short calculation, not some
number "crunching" in a tight loop like jot(1) does with -r.

The minimal test program was very simple:

	#include <stdio.h>
	#include <stdlib.h>

	int
	main(void)
	{
		double x = 1;
		int i = 0;

		while (1) {
			if (x != 1) {
				printf("%d %f\n", i, x);
				abort();
			}
			i++;
		}
		return 0;
	}

Note: it only fails when compiled with -O2 or -O1, since without
optimization x is stored on the stack and reloaded into a register all
the time. The chances of the first context switch falling between the
load and the use are very tiny, and all the following context switches
would be fine, if I see things correctly.

	-Otto
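
For illustration only, the "floating point between fork and exec"
pattern described above could be sketched roughly as follows. This is
not the test program from this thread and not what ksh(1) actually
does. The function name fp_intact_after_fork() is made up, the child
exits instead of calling exec, and whether the sum really stays in a
floating point register depends on the compiler and its flags.

```c
#include <assert.h>
#include <sched.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper (name and interface are assumptions).
 * Forks, lets the child do floating point work right after fork,
 * and reports whether the child's result came out as expected.
 *
 * Returns 0 if the child's sum was correct, 1 if it was not,
 * and -1 if fork/waitpid failed or the child died abnormally.
 */
int
fp_intact_after_fork(long iterations)
{
	pid_t pid;
	int status;

	assert(iterations > 0);

	pid = fork();
	if (pid == -1)
		return -1;

	if (pid == 0) {
		/* Child: floating point use after fork, with no exec. */
		double sum = 0.0;
		long i;

		for (i = 0; i < iterations; i++) {
			sum += 0.5;	/* exact in binary floating point */
			if (i % 1000 == 0)
				sched_yield();	/* invite a context switch */
		}
		/* A clobbered FP context would make the sum come out wrong. */
		_exit(sum == 0.5 * (double)iterations ? 0 : 1);
	}

	/* Parent: collect the child's verdict. */
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	if (!WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

Calling fp_intact_after_fork(100000) from a small main() should return
0 on a kernel that sets up the child's floating point context
correctly. A return value of 0 does not prove the kernel is fine, since
hitting the bug also depended on what other processes were doing.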