David Gwynne wrote:
On 29/11/2007, at 4:51 AM, Daniel Ouellet wrote:
David Gwynne wrote:
this diff cannot affect the behavior of your system. the code below
deals with domain validation on SPI mpi variants while the x4100 uses
SAS mpi. the code you patched isn't run on your machine.
Not sure I understand your statement, but as a test, I did exactly
that and I have no more crash at all.
the code in mpi.c you are modifying does not get run on the x4100s.
I am sure you know better than I do.
in my opinion the only way it can affect your systems is by causing
something to be moved around in memory, perhaps out of the way of
something else that is borked.
So how would you suggest I try to trace it, then?
I know for sure that if I keep the write rate under 425KB/sec it doesn't
crash, but above that it crashes every time.
Even something simple will do it: while a transfer at 425KB/sec is going
on, I can ssh in, for example, but as soon as I press return and the login
gets logged to /var/log/authlog, it crashes right away.
That is how close to the drive's maximum write speed it is.
Even running echo 'test' >/tmp/test
while that transfer is going on triggers it.
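
A minimal sketch of a throttled writer that could help bisect that
threshold more precisely, assuming Python is available on the machine.
The function name, the path, and the rates are illustrative only and
have not been run on the x4100; the fsync after each chunk is there so
the data reaches the disk instead of sitting in the buffer cache.

```python
import os
import time


def throttled_write(path, rate_bytes_per_sec, total_bytes, chunk_size=4096):
    """Write total_bytes of zeros to path at roughly rate_bytes_per_sec.

    Returns the number of bytes written.
    """
    chunk = b"\0" * chunk_size
    written = 0
    start = time.monotonic()
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            f.write(chunk[:n])
            f.flush()
            os.fsync(f.fileno())  # force the chunk out to the drive
            written += n
            # Sleep until elapsed time catches up with the target rate.
            target_elapsed = written / rate_bytes_per_sec
            delay = target_elapsed - (time.monotonic() - start)
            if delay > 0:
                time.sleep(delay)
    return written


if __name__ == "__main__":
    # Hypothetical run: 4MB at 400KB/sec, just under the observed limit.
    throttled_write("/tmp/throttle.test", 400 * 1024, 4 * 1024 * 1024)
```

Raising the rate in small steps between runs should show whether the
crash really tracks the write rate or something else that happens to
change with it.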
So, if things being moved around in memory is what causes the problem,
then where could I possibly look?
To me, it sure looks like the drive somehow fills a buffer, or something
like that; the threshold is far too precise in terms of speed for the
cause not to be findable.
After removing everything I possibly could for testing, I was left with
only the mpi driver as a logical place to look.
It may well be elsewhere, but then where?
I only know that it happens only on the amd64.mp kernel, not on amd64,
i386, or i386.mp, yet they all use the same mpi driver.
So it's causing me pain trying to think of possible places to look.
Enabling or disabling acpi doesn't make a difference either, so it can't
be that code, can it?
I really don't mind digging, but I am running out of ideas where to dig.
If you have a suggestion, I would be more than happy to try it.
Best,
Daniel