[ Email attempt #3 and counting... ]
Alexander Motin wrote:
Warner Losh wrote:
I don't suppose that your driver could cause the hardware to
interrupt after a little time? That would be more resource friendly...
Otherwise, 1ms is long enough that a msleep or tsleep would likely
work quite nicely.
It's not his driver, it's mine. Actually, unlike AHCI, this hardware
even has an interrupt for the ready transition (the second and longest
of the sleeps). But it is not used in the present situation.
On Apr 11, 2011, at 1:43 PM, dieter...@engineer.com wrote:
FreeBSD 8.2 amd64 uniprocessor
kernel: siisch1: DISCONNECT requested
kernel: siisch1: SIIS reset...
kernel: siisch1: siis_sata_connect() calling DELAY(1000)
last message repeated 59 times
kernel: siisch1: SATA connect time=60ms status=00000123
kernel: siisch1: SIIS reset done: devices=00000001
kernel: siisch1: DISCONNECT requested
kernel: siisch1: SIIS reset...
kernel: siisch1: siis_sata_connect() calling DELAY(1000)
last message repeated 58 times
kernel: siisch1: SATA connect time=59ms status=00000123
...
kernel: siisch0: siis_wait_ready() calling DELAY(1000)
last message repeated 1300 times
kernel: siisch0: port is not ready (timeout 10000ms) status = 001f2000
Meanwhile, *everything* comes to a screeching halt. Device
drivers are locked out, and thus incoming data is lost.
Losing incoming data is unacceptable.
Need an alternative to DELAY() that does not lock out
other device drivers. There must be a way to reset one
bit of hardware without locking down the entire machine.
Hans Petter Selasky writes:
An alternative to DELAY() is the simplest solution. You probably need
to do some redesign in the SCSI layer to find a better solution.
I keep coming back to the idea that a device driver for one
controller should not have to lock out *all* the hardware.
RS-232 locks out Ethernet. Disk drivers lock out Ethernet.
And so on. Why? Is there some fundamental reason that this
*has* to be? I thought the conversion from spl() to mutex()
was supposed to fix this?
I'm making progress on my project converting printf(9) calls
to log(9), and fixing some bugs along the way. Eventually I'll
have patches to submit. But this is really a workaround, not
a fix to the underlying problem.
Redesigning the SCSI layer sounds like a job for someone who took
a lot more CS classes than I did. /dev/brain returns ENOCLUE. :-(
CAM is indeed not completely innocent in this situation. CAM defines
the XPT_RESET_BUS request as synchronous. It is not queued, and it is
called under the SIM mutex lock. I don't think the lock can be safely
dropped in the middle there.
Now I think that I could try to move readiness waiting out of the
siis_reset() to do it asynchronously. I'll think about it.
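A rough userland sketch of what "waiting asynchronously" could mean
here, under stated assumptions. This is not siis(4) code: struct
port_sim, port_is_ready(), ready_poll_step() and run_until_done() are
all hypothetical names invented for illustration, and the "hardware"
is simulated. The point is only that each step checks the status once
and returns immediately instead of spinning in DELAY(). In a kernel
driver something like callout(9) would presumably re-arm the step
after each POLL_AGAIN; here a plain loop stands in for the timer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a controller port; NOT the real siis(4) softc. */
struct port_sim {
    int ticks_until_ready;  /* simulated hardware: ready after this many polls */
    int elapsed_ms;         /* time accounted to waiting so far */
    int timeout_ms;         /* give up once this much time has elapsed */
    int poll_interval_ms;   /* spacing between polls */
};

enum poll_result { POLL_AGAIN, POLL_READY, POLL_TIMEOUT };

/* Simulated status register read (assumption: one read per poll). */
bool port_is_ready(struct port_sim *p)
{
    if (p->ticks_until_ready > 0) {
        p->ticks_until_ready--;
        return false;
    }
    return true;
}

/*
 * One step of the wait. It never blocks: it checks readiness once,
 * accounts for one poll interval, and tells the caller whether to
 * schedule another step.
 */
enum poll_result ready_poll_step(struct port_sim *p)
{
    assert(p->poll_interval_ms > 0);
    if (port_is_ready(p))
        return POLL_READY;
    p->elapsed_ms += p->poll_interval_ms;
    if (p->elapsed_ms >= p->timeout_ms)
        return POLL_TIMEOUT;
    return POLL_AGAIN;
}

/*
 * Userland stand-in for the timer: runs the steps back to back.
 * Between two steps a real system would be free to service other
 * devices. *steps receives the number of steps executed.
 */
enum poll_result run_until_done(struct port_sim *p, int *steps)
{
    enum poll_result r;

    *steps = 0;
    do {
        r = ready_poll_step(p);
        (*steps)++;
    } while (r == POLL_AGAIN);
    return r;
}
```

With a 1ms poll interval and a 1000ms timeout, a port that never
becomes ready is given up on after 1000 steps, but no single step
holds anything for longer than one status read.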
I've fixed this problem for ahci(4) in HEAD; there should be no sleeps
longer than 100ms now (typically 1-2ms).
With siis(4) the situation is different. By default there should be no
sleeps longer than 100ms (typically 1-2ms). A longer sleep means that
either the controller is not responding, or it can't establish a link
to a device it sees. I've reduced the waiting timeout from 10s to 1s.
It should improve the situation a bit, but I would look for the
original cause of the problem. Have you done something specific to
trigger it? Are your drive/cables OK?
Thank you for your prompt attention to this problem, it is very much
appreciated. (losing data sucks)
However, 100 ms is still way too long. (assuming ms = milliseconds)
1 millisecond is dangerous: if Ethernet is locked out for approx 4
milliseconds there is guaranteed data loss. I'd like to see
something more like 100 microseconds worst case (for TCP). The data
comes from a closed-source, closed-hardware black box that has a very
small output buffer and cannot be changed. In some cases it insists on
using UDP rather than TCP, so dropping even a single packet screws up
the data. I have cranked the TCP and UDP receive buffer sizes way up,
I'm reading the ports at rtprio into a large buffer locked into main
memory, etc. etc. Most of the time it works.
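For reference, the userland side of a setup like the one described
above might look roughly like this. It is a minimal sketch, not the
actual program: open_udp_receiver(), effective_rcvbuf() and
alloc_locked_buffer() are hypothetical helper names, and the sizes
passed in are up to the caller. The rtprio part is left out to keep
the sketch portable. Note that SO_RCVBUF is only a request; the
kernel may clamp it (on FreeBSD, kern.ipc.maxsockbuf limits it), so
the sketch reads the value back.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Open a UDP socket bound to `port` on any local address and ask for
 * a receive buffer of rcvbuf_bytes. Returns the descriptor, or -1.
 */
int open_udp_receiver(uint16_t port, int rcvbuf_bytes)
{
    struct sockaddr_in sin;
    int fd = socket(AF_INET, SOCK_DGRAM, 0);

    if (fd < 0)
        return -1;
    /* Best effort: the kernel may grant less than was requested. */
    (void)setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
        &rcvbuf_bytes, sizeof(rcvbuf_bytes));

    memset(&sin, 0, sizeof(sin));
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(INADDR_ANY);
    sin.sin_port = htons(port);
    if (bind(fd, (struct sockaddr *)&sin, sizeof(sin)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Read back the receive buffer size the kernel actually granted. */
int effective_rcvbuf(int fd)
{
    int val = 0;
    socklen_t len = sizeof(val);

    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &val, &len) < 0)
        return -1;
    return val;
}

/*
 * Allocate a page-aligned buffer and try to lock it into RAM.
 * *locked is set to 1 if mlock() succeeded and 0 otherwise (for
 * example when RLIMIT_MEMLOCK is too low); the buffer is usable
 * either way. Returns NULL if the allocation itself fails.
 */
void *alloc_locked_buffer(size_t len, int *locked)
{
    void *buf = NULL;
    long pagesz = sysconf(_SC_PAGESIZE);

    if (pagesz <= 0 || posix_memalign(&buf, (size_t)pagesz, len) != 0)
        return NULL;
    memset(buf, 0, len);    /* touch every page before locking */
    *locked = (mlock(buf, len) == 0);
    return buf;
}
```

A reader loop would then call recv(2) on the descriptor and copy into
the locked buffer. None of this helps if the packets are dropped
before they reach the socket buffer, which is the problem at hand.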
But if a device driver takes too long, incoming Ethernet packets do
not get serviced in time, and I lose data. A device driver doing
printf(9) to the RS-232 console is too slow. Changing printf to
log(9) works around this. If a disk controller, port multiplier,
or disk has a hiccup, I lose data. The current problem is siis(4),
but IIRC I've had problems from ahci(4) and ata(4) in the past.
I'm currently using all three drivers.
Is there any way I can keep the Ethernet from being locked out
by other drivers?