Paul Gortmaker wrote:
>
> Jorge Nerin wrote:
>
> >
> > Ok, I reported it several times, but it gets ignored. I have a Realtek
> > 8029 (ne2k-pci), and with both drivers ne and ne2k-pci I can easily get
> > it stuck by doing a ping -f to a host in the local net, and sometimes it
> > happens doing copies to/from nfs shared resources.
> >
> > rmmod & insmod don't cure the problem, it seems that no interrupts are
> > delivered from the card, and there are no log messages, so a reboot is
> > needed to restore net access.
> >
> > System is dual 2x200mmx 96Mb ide discs no interrupts shared, and as far
> > as I can remember all kernel from 2.2.x, 2.3.x up to 2.4.0-testx exhibit
> > this problem.
>
> Any messages from the driver whatsoever? Does running a non-SMP
> kernel make the problem go away?
>
> Paul.
>
Well, I have tried it with 2.4.0-test10, both SMP and non-SMP, and the
result is a little confusing.
Under SMP a ping -s 50000 -f other_host takes down the network access
with no messages (ne2k-pci), and no possibility of being restored
without a reboot.
Under UP the same command works ok, but after a while the dots stop for
30sec, then ping prints an 'E' and the dots continue. strace revealed
this:
sendmsg(4, {msg_name(16)={sin_family=AF_INET, sin_port=htons(0),
sin_addr=inet_addr("192.168.1.20")}},
msg_iov(1)=[{"\10\0\305~|\23\231\0\v\317\2:\177\236\r\0\10\t\n\v\f\r"...,
50008}], msg_controllen=0, msg_flags=0}, 0x800) = 50008 <30.016855>
ping has been waiting for sendmsg to end for 30 seconds! I don't know if
this could be caused by filling the network buffers, and then waiting a
while while the nic sends it out. As the packet size increases (the -s
option) the stops are more frequent, there is still activity on the
network, but I don't know if they are my packets or the replies.
When ping is stopped it's stuck in sock_wait_for_wmem, and when it's
running it's (usually) in wait_for_packet.
So I think that it could be a little window near sock_wait_for_wmem that
could be SMP insecure wich is affecting me.
The code of sock_wait_for_wmem in 2.4.0-test10 is this:
static long sock_wait_for_wmem(struct sock * sk, long timeo)
{
DECLARE_WAITQUEUE(wait, current);
clear_bit(SOCK_ASYNC_NOSPACE, &sk->socket->flags);
add_wait_queue(sk->sleep, &wait);
for (;;) {
if (signal_pending(current))
break;
set_bit(SOCK_NOSPACE, &sk->socket->flags);
set_current_state(TASK_INTERRUPTIBLE);
if (atomic_read(&sk->wmem_alloc) < sk->sndbuf)
break;
if (sk->shutdown & SEND_SHUTDOWN)
break;
if (sk->err)
break;
timeo = schedule_timeout(timeo);
}
__set_current_state(TASK_RUNNING);
remove_wait_queue(sk->sleep, &wait);
return timeo;
}
Does someone see something SMP insecure? Perhaps I'm totally wrong, this
could also be somewhere in the interrupt handling, don't know.
--
Jorge Nerin
<[EMAIL PROTECTED]>
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Please read the FAQ at http://www.tux.org/lkml/