David Gwynne wrote:
> this diff cannot affect the behavior of your system. the code below
> deals with domain validation on SPI mpi variants while the x4100 uses
> SAS mpi. the code you patched isn't run on your machine.
I'm not sure I understand your statement, but as a test I did exactly
that, and I have no more crashes at all.
> do you have these crashes on all x4100s running amd64 mp, or only on
> this one machine?
All of them. I have a few threads on misc@ on the subject, have sent
lots of stats, and opened a bug as well that doesn't show up yet, but
yes, it's 100% reproducible. As soon as I push the data rate above
420KB/sec, using scp as a way to limit the transfer to the drive for
lack of a better way, I crash it.
It's totally impossible to do something as simple as
dd if=/dev/zero of=/var/test bs=1m count=100
without crashing. I can do maybe a count of 5 at a time as long as I
have nothing else writing to the drive, but that's about it. There is a
buffer someplace that fills up or something, and then the server
crashes/reboots right away.
But now, if I force the mpi.c driver to work in U80 mode, I have no more
of these crashes and I can do
dd if=/dev/zero of=/var/test bs=1m count=10000
without a problem. I can access the drive, write multiple times, do
multiple reads/writes, etc., and it simply does not crash at all anymore.
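
For anyone wanting to reproduce this, the two dd runs above can be
stepped through in a loop to see roughly where the writes start to fail.
This is only a sketch: write_test is a made-up helper name, the target
path and the counts are whatever you pass in, and bs is spelled out in
bytes (1048576, the same as 1m) so it behaves the same with any dd.

```shell
#!/bin/sh
# Hypothetical helper, for illustration only: write_test TARGET COUNT...
# For each COUNT, writes COUNT blocks of 1 MB of zeros to TARGET,
# then syncs so the data is pushed out to the drive before the next run.
write_test() {
    target=$1
    shift
    for count in "$@"; do
        echo "writing ${count} MB to ${target}"
        # Same as "bs=1m", written in bytes for portability.
        dd if=/dev/zero of="$target" bs=1048576 count="$count" \
            2>/dev/null || return 1
        sync
        # Only printed if the write and the sync both came back.
        echo "ok ${count}"
    done
    # Remove the test file once every run has completed.
    rm -f "$target"
}
```

Something like `write_test /var/test 5 10 50 100` would then step through
growing sizes; the last "ok" line seen before the machine goes down gives
a rough idea of how much can be written.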
And I have 4 of them, all doing the exact same thing, and I am not the
only one either: last week someone from Rutgers also posted to tech@,
which I replied to, and in his case it was working as long as he was
writing to a remote NFS server.
The bottom line is that you can read, but as soon as you try to write to
the drives, either with a payload that is a little big and fills up a
buffer somewhere in the driver, I guess, or by writing constantly to the
drive at high speed, you crash it right away.
I try to dig as much as I can in the kernel, and honestly I would love
to find the problem, but I am reaching the limit of my understanding of
the various parts of the kernel and how they deal with one another right
now. I can spend a few weeks learning more and maybe find it over time,
for sure, but I was hoping that someone who understands the driver much
better than me might be able to shed some light on this, or, if time
doesn't allow, explain it a bit more so I can try to continue further.
I would be more than happy to fix it, believe me. I am putting the time
and the effort into it, and I definitely have a few threads on this and
keep making progress week after week, but I need some help to go
further, or to do it in a more timely fashion. I gave up using it in
production months ago, but never gave up trying to fix it and find the
problem. It's just a very slow process, as I need to learn a lot of
stuff along the way, which I am happy to do and it's fun, but some help
would be very welcome right now. I am getting very close, I think, but
again, I could be wrong.
Best,
Daniel