Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Michal Jaegermann
On Tue, Apr 24, 2001 at 06:56:32PM +0200, Christian Ehrhardt wrote: > On Tue, Apr 24, 2001 at 09:10:07AM -0700, Linus Torvalds wrote: > > ptrace only operates on processes that are stopped. So there are no > > locking issues - we've synchronized on a much higher level than a > > spinlock or semaph

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Alan Cox
> Alan Cox <[EMAIL PROTECTED]> writes: > > The preferable one for performance is certainly to backport the 2.4 changes > > Is it any more substantial than changing all uses of the ptrace flags > to the new variable? It affects asm blocks and offsets on some ports. It's not too bad though

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Victor Zandy
Alan Cox <[EMAIL PROTECTED]> writes: > The preferable one for performance is certainly to backport the 2.4 changes Is it any more substantial than changing all uses of the ptrace flags to the new variable?

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Alan Cox
> > child->flags |= PF_PTRACED; > > > > without waiting for the child to have stopped. > > I can see how this could cause PF_USEDFPU to be cleared inadvertently, > but I do not have any ideas for testing this. Is it clear that this > is the source of the problem? There is no guarantee

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Victor Zandy
Linus Torvalds writes: > Ahh.. This actually _does_ look like a race on "current->flags": > PTRACE_ATTACH will do a > > child->flags |= PF_PTRACED; > > without waiting for the child to have stopped. I can see how this could cause PF_USEDFPU to be cleared inadvertently, but I do not
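
The race described above is a lost update: a plain `flags |= bit` is a read-modify-write, and two CPUs doing it on the same word can overwrite each other. The sketch below replays one such interleaving deterministically in user space. It is not kernel code; the helper names (`rmw_load`, `rmw_store`, `racy_attach`, `serialized_attach`) are hypothetical, and the bit values are placeholders chosen for illustration.

```c
#include <assert.h>

/* Illustrative bit values; treat them as placeholders, not the 2.2 headers. */
#define PF_PTRACED 0x00000010UL
#define PF_USEDFPU 0x00100000UL

/* "flags |= bit" is a read-modify-write that is not atomic across CPUs.
 * The two halves are split here so one interleaving can be replayed. */
static unsigned long rmw_load(const unsigned long *flags)
{
    return *flags;
}

static void rmw_store(unsigned long *flags, unsigned long snapshot,
                      unsigned long bit)
{
    *flags = snapshot | bit;   /* writes back a possibly stale snapshot */
}

/* Tracer loads, child sets PF_USEDFPU, tracer stores: the update is lost. */
unsigned long racy_attach(unsigned long initial)
{
    unsigned long flags = initial;
    unsigned long snapshot = rmw_load(&flags);  /* tracer CPU: load        */
    flags |= PF_USEDFPU;                        /* child CPU: first FP op  */
    rmw_store(&flags, snapshot, PF_PTRACED);    /* tracer CPU: stale store */
    return flags;
}

/* Same two updates with no overlap: both bits survive. */
unsigned long serialized_attach(unsigned long initial)
{
    unsigned long flags = initial;
    flags |= PF_USEDFPU;
    flags |= PF_PTRACED;
    assert(flags & PF_USEDFPU);
    return flags;
}
```

In this replay `racy_attach` ends with PF_PTRACED set and PF_USEDFPU cleared, which is the inadvertent clearing discussed in the message; `serialized_attach` keeps both bits.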

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Victor Zandy
"Christian Ehrhardt" <[EMAIL PROTECTED]> writes: > Victor: Could you try to reproduce the system wide corruption if you > add an explicit call to stts(); at the very end of __switch_to? > This should prevent the FPU corruption from spreading. After adding this call, I cannot reproduce the global

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Christian Ehrhardt
On Tue, Apr 24, 2001 at 09:10:07AM -0700, Linus Torvalds wrote: > ptrace only operates on processes that are stopped. So there are no > locking issues - we've synchronized on a much higher level than a > spinlock or semaphore. This is only true for requests other than PTRACE_ATTACH and PTRACE_ATT

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Christian Ehrhardt
On Tue, Apr 24, 2001 at 08:05:15AM -0500, Victor Zandy wrote: > > He found that PF_USEDFPU was always set before the machine was broken. > After he found that it was set about 70% of the time. If I'm not mistaken this actually can cause GLOBAL FPU corruption. Here's why: Assume for a moment that
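
The excerpt is cut off before the explanation, so the following is only one plausible reading, stated as an assumption: if the context switch saves FPU state and re-arms the FP trap only when PF_USEDFPU is set, then a task whose flag was lost leaves the trap disarmed, and the next task can touch the FPU without the kernel noticing. The toy model below illustrates that assumption and why an unconditional stts() at the end of __switch_to, as suggested elsewhere in this thread, would contain it. All names (`struct task`, `struct cpu`, `switch_conditional`, `switch_with_stts`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

struct task { bool used_fpu; };   /* stands in for PF_USEDFPU            */
struct cpu  { bool ts; };         /* true: the next FP instruction traps */

/* Assumed 2.2-style switch: save and re-arm the trap only if the flag is set. */
void switch_conditional(struct cpu *c, struct task *prev)
{
    if (prev->used_fpu) {
        prev->used_fpu = false;   /* FPU state would be saved here */
        c->ts = true;             /* models stts() */
    }
}

/* Variant with an unconditional stts() at the very end of the switch. */
void switch_with_stts(struct cpu *c, struct task *prev)
{
    switch_conditional(c, prev);
    c->ts = true;
}
```

With a task whose flag has been lost (`used_fpu == false`) and the trap currently disarmed, `switch_conditional` leaves `ts` false, while `switch_with_stts` always leaves it true.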

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Linus Torvalds
[ Alan, I'm lazy and only have 2.2.14 sources on-line. Maybe this has been fixed already and there's something else going on. Worth a look ] In article <[EMAIL PROTECTED]>, Victor Zandy <[EMAIL PROTECTED]> wrote: > >Someone else here traced the process flags of a FP-intensive program >on a mac

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Alan Cox
> >1.) If I'm not mistaken switch_to changes current->flags without > >atomic operations and without any locks and sys_ptrace changes > >child->flags only protected by the big kernel lock. > > ptrace only operates on processes that are stopped. So there are no > locking issues - we've synchronize

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Linus Torvalds
In article <[EMAIL PROTECTED]>, Christian Ehrhardt <[EMAIL PROTECTED]> wrote: > >1.) If I'm not mistaken switch_to changes current->flags without >atomic operations and without any locks and sys_ptrace changes >child->flags only protected by the big kernel lock. ptrace only operates on processes

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread Victor Zandy
Someone else here traced the process flags of a FP-intensive program on a machine before and after it is put in the faulty FPU state. He periodically sampled /proc/pid/stat while the program was running. He found that PF_USEDFPU was always set before the machine was broken. After he found that

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread alad
oating point state with fnsave i.e. current->tss.i387 is 'invalid' after fnsave current->tss.i387 fwait; Thanks Amol David Konerding <[EMAIL PROTECTED]> on 04/23/2001 01:09:27 AM To: Ulrich Drepper <[EMAIL PROTECTED]> cc: [EMAIL PROTECTED], [EMAIL

Re: BUG: Global FPU corruption in 2.2

2001-04-24 Thread alad
int state with fnsave i.e. current->tss.i387 is 'invalid' after fnsave current->tss.i387 fwait; Thanks Amol David Konerding <[EMAIL PROTECTED]> on 04/23/2001 01:09:27 AM To: Ulrich Drepper <[EMAIL PROTECTED]> cc: [EMAIL PROTECTED], [EMAIL

Re: BUG: Global FPU corruption in 2.2

2001-04-23 Thread alad
Erik Paulson <[EMAIL PROTECTED]> on 04/24/2001 01:14:27 AM To: Christian Ehrhardt <[EMAIL PROTECTED]> cc: [EMAIL PROTECTED], [EMAIL PROTECTED] (bcc: Amol Lad/HSS) Subject: Re: BUG: Global FPU corruption in 2.2 On 23 Apr 2001 18:11:48 +0200, Christian Ehrhardt wro

Re: BUG: Global FPU corruption in 2.2

2001-04-23 Thread Erik Paulson
On 23 Apr 2001 18:11:48 +0200, Christian Ehrhardt wrote: > On Thu, Apr 19, 2001 at 11:05:03AM -0500, Victor Zandy wrote: > > > > We have found that one of our programs can cause system-wide > > corruption of the x86 FPU under 2.2.16 and 2.2.17. That is, after we > > run this program, the FPU giv

Re: BUG: Global FPU corruption in 2.2

2001-04-23 Thread Christian Ehrhardt
On Thu, Apr 19, 2001 at 11:05:03AM -0500, Victor Zandy wrote: > > We have found that one of our programs can cause system-wide > corruption of the x86 FPU under 2.2.16 and 2.2.17. That is, after we > run this program, the FPU gives bad results to all subsequent > processes. A few comments, not

Re: BUG: Global FPU corruption in 2.2

2001-04-22 Thread kees
Hello, Linux 2.2.19 SMP, confirm report. Even games are going weird after running this test, (my wife is complaining :-)) Have to reboot. Kees - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to [EMAIL PROTECTED] More majordomo info at http:/

Re: BUG: Global FPU corruption in 2.2

2001-04-22 Thread Alan Cox
> OK, regardless of how the linux kernel actually manages the FPU for user-space > > programs, does anybody have any comments on the original bugreport? Complete mystification. > >of pi begins to look wrong. Then kill everything and run pi by itself > >again. It will no longer produce good re

Re: BUG: Global FPU corruption in 2.2

2001-04-22 Thread David Konerding
Ulrich Drepper wrote: > "Richard B. Johnson" <[EMAIL PROTECTED]> writes: > > > The kernel doesn't know if a process is going to use the FPU when > > a new process is created. Only the user's code, i.e., the 'C' runtime > > library knows. > > Maybe you should try to understand the kernel code and

Re: BUG: Global FPU corruption in 2.2

2001-04-20 Thread Ulrich Drepper
"Richard B. Johnson" <[EMAIL PROTECTED]> writes: > The kernel doesn't know if a process is going to use the FPU when > a new process is created. Only the user's code, i.e., the 'C' runtime > library knows. Maybe you should try to understand the kernel code and the features of the processor first

Re: BUG: Global FPU corruption in 2.2

2001-04-20 Thread Victor Zandy
It looks to me like the kernel sets a trap for FP operations when a process is switched in. Then when the process executes an FP op, the kernel clears the trap and either loads the FP context or initializes it, depending on whether it is the process' first FP operation. So no help is needed from
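
The lazy scheme described in this message can be sketched as a small user-space model. It is a simplification under stated assumptions, not the kernel's implementation: the FPU is reduced to a single accumulator, and every name (`struct task`, `struct cpu`, `context_switch`, `fp_trap`, `fp_add`) is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct task {
    bool   used_math;   /* has this task ever executed an FP op? */
    double saved_acc;   /* saved FP context (one register here)  */
};

struct cpu {
    bool         trap_armed;  /* models the "trap on next FP op" state */
    double       acc;         /* models the live FPU register          */
    struct task *owner;       /* task whose context is live, or NULL   */
};

/* Switch away from prev: save its context if live, then arm the trap. */
void context_switch(struct cpu *c, struct task *prev)
{
    if (c->owner == prev) {
        prev->saved_acc = c->acc;
        c->owner = NULL;
    }
    c->trap_armed = true;
}

/* Trap handler: clear the trap, then initialize or restore the context. */
static void fp_trap(struct cpu *c, struct task *t)
{
    c->trap_armed = false;
    if (!t->used_math) {
        c->acc = 0.0;           /* first FP op: fresh context        */
        t->used_math = true;
    } else {
        c->acc = t->saved_acc;  /* later FP op: reload saved context */
    }
    c->owner = t;
}

/* One "FP instruction": add x to the accumulator, trapping first if armed. */
double fp_add(struct cpu *c, struct task *t, double x)
{
    if (c->trap_armed)
        fp_trap(c, t);
    c->acc += x;
    return c->acc;
}
```

Two tasks that alternate on one modeled CPU each see only their own accumulator, because every switch arms the trap and the first FP operation afterwards restores the right context. No cooperation from user code is involved in this model.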

Re: BUG: Global FPU corruption in 2.2

2001-04-20 Thread Richard B. Johnson
On 20 Apr 2001, Victor Zandy wrote: > > No dice. Your program does not fix the problem. > > If it were a hardware problem, I would expect the problem to occur > under 2.4.2 as well as 2.2.*, and I would be surprised that we can > consistently produce the behavior across our 64 node cluster. B

Re: BUG: Global FPU corruption in 2.2

2001-04-20 Thread Richard B. Johnson
On 20 Apr 2001, Ulrich Drepper wrote: > "Richard B. Johnson" <[EMAIL PROTECTED]> writes: > > > If it "fixes" it, there is no problem with the FPU, but with the > > 'C' runtime library which doesn't initialize the FPU to a known > > state before it uses it. > > It's the kernel which initializes

Re: BUG: Global FPU corruption in 2.2

2001-04-20 Thread Ulrich Drepper
"Richard B. Johnson" <[EMAIL PROTECTED]> writes: > If it "fixes" it, there is no problem with the FPU, but with the > 'C' runtime library which doesn't initialize the FPU to a known > state before it uses it. It's the kernel which initializes the FPU. This was always the case and necessary to i

Re: BUG: Global FPU corruption in 2.2

2001-04-20 Thread Victor Zandy
No dice. Your program does not fix the problem. If it were a hardware problem, I would expect the problem to occur under 2.4.2 as well as 2.2.*, and I would be surprised that we can consistently produce the behavior across our 64 node cluster. But we are keeping the possibility in mind. Thank

Re: BUG: Global FPU corruption in 2.2

2001-04-20 Thread Richard B. Johnson
On 20 Apr 2001, Victor Zandy wrote: > > Victor Zandy <[EMAIL PROTECTED]> writes: > > We have found that one of our programs can cause system-wide > > corruption of the x86 FPU under 2.2.16 and 2.2.17. That is, after we > > run this program, the FPU gives bad results to all subsequent > > proces

Re: BUG: Global FPU corruption in 2.2

2001-04-20 Thread Victor Zandy
Victor Zandy <[EMAIL PROTECTED]> writes: > We have found that one of our programs can cause system-wide > corruption of the x86 FPU under 2.2.16 and 2.2.17. That is, after we > run this program, the FPU gives bad results to all subsequent > processes. We have now tested 2.4.2 and 2.2.19. 2.2.1

Re: BUG: Global FPU corruption in 2.2

2001-04-19 Thread Michal Jaegermann
On Thu, Apr 19, 2001 at 11:05:03AM -0500, Victor Zandy wrote: > > We have found that one of our programs can cause system-wide > corruption of the x86 FPU under 2.2.16 and 2.2.17. > > We see this problem on dual 550MHz Xeons with 1GB RAM. Hm, I started to wonder if this is not somewhat rel

BUG: Global FPU corruption in 2.2

2001-04-19 Thread Victor Zandy
We have found that one of our programs can cause system-wide corruption of the x86 FPU under 2.2.16 and 2.2.17. That is, after we run this program, the FPU gives bad results to all subsequent processes. We see this problem on dual 550MHz Xeons with 1GB RAM. We have 64 of these things, and we s