RE: Silent corruption on AMD64

2007-04-02 Thread Stuart MacDonald
From: On Behalf Of Aaron Lehmann
> I've been able to narrow it down to the Realtek Ethernet card. I can't
> reproduce the problem using onboard Ethernet, whereas the Realtek card
> causes trouble in any slot. However, I still don't know whether it's a
> hardware or software issue, or whether it's c

Re: Silent corruption on AMD64

2007-04-01 Thread Andi Kleen
Aaron Lehmann <[EMAIL PROTECTED]> writes: [adding netdev] [meta-comment: I wish people wouldn't use such unnecessarily broad subjects -- how is it the x86-64 port's or AMD's fault when you have broken hardware? Would anybody write "Silent corruption on i386" or "Silent corruption on Intel" or "

Re: Silent corruption on AMD64

2007-03-31 Thread Aaron Lehmann
On Sat, Mar 31, 2007 at 08:03:16PM -0700, Jim Paris wrote:
> Since it shows up under heavy load that includes unrelated devices, I
> think ruling out hardware problems is important. Some suggestions:
I've been able to narrow it down to the Realtek Ethernet card. I can't reproduce the problem usin

Re: Silent corruption on AMD64

2007-03-31 Thread Jim Paris
Aaron Lehmann wrote:
> I discovered a reproducible way of causing silent file corruption. ...
> 1. Heavy Ethernet load (nc remotehost < /dev/zero)
> 2. Heavy disk write load on any non-sata_sil drive (cat /dev/zero > /path)
> 3. Heavy disk read load on any other drive (tar c /path | cat > /dev/null

Re: Silent corruption on AMD64

2007-03-31 Thread Aaron Lehmann
On Sat, Mar 31, 2007 at 07:52:36PM -0700, Andrew Morton wrote:
> Are you able to provide us with some before-and-after data so we
> can see this corruption.
>
> See, if it's dropped-bits or shifted-data or eight-byte-aligned
> kernel addresses or whatever, that helps us generate theories..
Sure.
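
[Editor's sketch, not part of the thread.] One way to produce the kind of before-and-after summary requested above is to compare a known-good copy of the data with the corrupted copy and report each differing run. The helper names below (diff_runs, flipped_bits, describe) are hypothetical and were not posted by anyone in this thread; the sketch assumes both copies fit in memory and compares only their common prefix.

```python
from typing import List, Tuple


def diff_runs(good: bytes, bad: bytes) -> List[Tuple[int, int]]:
    """Return (offset, length) for each contiguous run of differing bytes.

    Only the common prefix of the two inputs is compared.
    """
    runs: List[Tuple[int, int]] = []
    start = None
    n = min(len(good), len(bad))
    for i in range(n):
        if good[i] != bad[i]:
            if start is None:
                start = i  # a new differing run begins here
        elif start is not None:
            runs.append((start, i - start))
            start = None
    if start is not None:
        runs.append((start, n - start))  # run extends to the end
    return runs


def flipped_bits(good: bytes, bad: bytes, offset: int, length: int) -> int:
    """Count the bits that differ inside one run (XOR, then popcount)."""
    g = good[offset:offset + length]
    b = bad[offset:offset + length]
    return sum(bin(x ^ y).count("1") for x, y in zip(g, b))


def describe(good: bytes, bad: bytes) -> List[str]:
    """One summary line per differing run."""
    lines = []
    for offset, length in diff_runs(good, bad):
        bits = flipped_bits(good, bad, offset, length)
        aligned = offset % 8 == 0  # hints at eight-byte-aligned damage
        lines.append(
            f"offset=0x{offset:x} len={length} "
            f"bits_flipped={bits} aligned8={aligned}"
        )
    return lines


if __name__ == "__main__":
    # Synthetic data only: zeros with two injected differences.
    good = bytes(64)
    bad = bytearray(good)
    bad[16:24] = b"\xff" * 8  # an eight-byte overwrite
    bad[40] ^= 0x01           # a single flipped bit
    for line in describe(good, bytes(bad)):
        print(line)
```

A single flipped bit, a short aligned overwrite, and a long shifted region look quite different in this output, which is the distinction the question above is after.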

Re: Silent corruption on AMD64

2007-03-31 Thread Andrew Morton
> On Sat, 31 Mar 2007 18:27:36 -0700 Aaron Lehmann <[EMAIL PROTECTED]> wrote:
> I have spent a lot of time trying to find a simpler test case. So far,
> as far as I can tell, there are three conditions that must be
> satisfied for corruption to occur:
>
> 1. Heavy Ethernet load (nc remotehost < /d