David Gwynne wrote:
> this diff cannot affect the behavior of your system. the code below deals with domain validation on SPI mpi variants while the x4100 uses SAS mpi. the code you patched isnt run on your machine.

I'm not sure I understand your statement, but as a test I did exactly that, and I no longer have any crashes at all.

> do you have these crashes on all x4100s running amd64 mp, or only on this one machine?

All of them. I have a few threads on misc@ on the subject, I have sent lots of stats, and I opened a bug report as well that hasn't shown up yet, but yes, it is 100% reproducible. As soon as I push data to the drive at more than 420KB/sec, using scp as a way to limit the transfer rate for lack of a better way, I crash it.


It's totally impossible to do something as simple as

dd if=/dev/zero of=/var/test bs=1m count=100

without crashing. I can do maybe a count of 5 at a time, as long as nothing else is writing to the drive, but that's about it. There is a buffer someplace that fills up, or something like that, and then the server crashes and reboots right away.
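A minimal sketch of how the failing write size could be narrowed down, stepping the count up from the 5 that works towards the 100 that crashes. The function names write_mib and step_writes are hypothetical, and the block size is given in bytes only so that the same command is accepted by both BSD and GNU dd:

```shell
#!/bin/sh
# Hypothetical helper: write COUNT blocks of 1 MiB of zeros to TARGET,
# then sync so the data is actually pushed out to the drive.
write_mib() {
    target=$1
    count=$2
    dd if=/dev/zero of="$target" bs=1048576 count="$count" 2>/dev/null && sync
}

# Hypothetical helper: try each size in turn, printing the size BEFORE
# the write starts, so the last line printed is the size being written
# when the machine went down.
step_writes() {
    target=$1
    shift
    for count in "$@"; do
        echo "writing ${count} MiB to ${target}"
        write_mib "$target" "$count" || return 1
    done
}
```

For example, `step_writes /var/test 5 10 20 50 100`, run from an ssh session or a serial console so that the last line printed survives the reboot.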

But now, if I force the mpi.c driver to work in U80 mode, I have no more of these crashes and I can do

dd if=/dev/zero of=/var/test bs=1m count=10000

without a problem. I can access the file, write multiple times, do multiple reads and writes, etc., and it simply does not crash at all anymore.

And I have 4 of them, all doing the exact same thing, and I am not the only one either: last week someone from Rutgers also posted to tech@, which I replied to, and in his case it worked as long as he was writing to a remote NFS server.

The bottom line is that you can read, but as soon as you try to write to the drives with a payload that is a little big, one that I guess somehow fills up a buffer in the driver, or as soon as you write constantly to the drive at high speed, you crash it right away.

I try to dig as much as I can in the kernel, and honestly I would love to find the problem, but I am reaching the limit of my understanding of the various parts of the kernel and how they deal with one another right now.

I can spend a few weeks learning more and maybe find it over time, for sure. I was hoping that someone who understands the driver much better than me might be able to shed some light on this or, if time doesn't allow, explain it a bit more so I can try to continue further.

I would be more than happy to fix it, believe me. I am putting the time and effort into it; I have a few threads on this and keep making progress week after week, but I need some help to go further, or to do it in a more timely fashion. I gave up using it in production months ago, but never gave up trying to fix it and find the problem. It's just a very slow process, as I need to learn a lot of stuff along the way, which I am happy to do and it's fun, but some help would be very welcome right now. I think I am getting very close, but again, I could be wrong.

Best,

Daniel
