Is this possibly related to http://lkml.org/lkml/2010/2/12/41 ?
If so, there seems to be a patch available.
--
Stephen Mulcahy, DI2, Digital Enterprise Research Institute,
NUI Galway, IDA Business Park, Lower Dangan, Galway, Ireland
http://di2.deri.ie    http://webstar.deri.ie    http
Package: linux-2.6
Version: 2.6.32-21
Severity: important
I've seen this problem on a number of machines with different processor arch
(intel and amd). I guess it can be triggered by more than just dpkg but this is
the case I've seen most.
dmesg contains the following
[60600.580047] INFO: tas
Ayaz Abdulla wrote:
Attached fix has been submitted to netdev.
I've run my reproducer with this patch applied to the Debian 2.6.32
kernel and so far the problem with nodes becoming unresponsive hasn't
occurred.
NIC settings were left the default so this looks positive
r...@node23:~# ethtool
Ayaz Abdulla wrote:
This patch fixes the TX_LIMIT feature flag. The previous logic check for
TX_LIMIT2 also took into account a device that only had TX_LIMIT set.
Signed-off-by: Ayaz Abdulla
This is a fix for bug 572201 @ bugs.debian.org
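The patch description above reads like a classic mask-test problem. A minimal sketch of that kind of check follows, assuming (this thread does not confirm it) that the TX_LIMIT2 mask contains the TX_LIMIT bit. The flag values and the helper names has_any and has_all are hypothetical and are not taken from the forcedeth source.

```shell
# Hypothetical flag values, chosen only so that TX_LIMIT2 contains the TX_LIMIT bit.
TX_LIMIT=$(( 0x1 ))
TX_LIMIT2=$(( 0x3 ))

# Loose check: succeeds if ANY bit of the mask ($2) is set in the flags ($1).
has_any() { [ $(( $1 & $2 )) -ne 0 ]; }

# Strict check: succeeds only if ALL bits of the mask ($2) are set in the flags ($1).
has_all() { [ $(( $1 & $2 )) -eq "$2" ]; }
```

With these values, a device that only sets TX_LIMIT passes has_any against TX_LIMIT2 but fails has_all, which is the distinction the description draws.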
Hi,
Thanks! I'll rebuild my Debian kernel with this
Eric Dumazet wrote:
OK, thanks for clarification.
Last question, did you try a vanilla kernel, such as 2.6.33.2,
for example ?
I built a Debian package from the vanilla 2.6.33.2 and installed that on
all nodes and tried my reproducer with the same results - nodes becoming
unresponsive.
I didn
Eric Dumazet wrote:
I am scratching my head, but I thought you told me that
ethtool -K eth0 tso off
ethtool -K eth0 tx on
was working ?
No, sorry for the confusion.
ethtool -K eth0 tx off
fixes the problem.
Setting only
ethtool -K eth0 tso off
ethtool -K eth0 tx on
still results in f
Hi Martin,
Just came across a similar bug you logged a while back - thought you
might be interested.
-stephen
--
Stephen Mulcahy Atlantic Linux http://www.atlanticlinux.ie
Registered in Ireland, no. 376591 (144 Ros Caoin, Roscam, Galway)
--- Begin Message ---
Eric Dumazet wrote
stephen mulcahy wrote:
Now some brave fools to check the 6410 lines of this driver ? ;)
Question of the day : Why is TSO broken in forcedeth ?
Is it generically broken or is it broken for specific NICs ?
Actually, it is only when tx-checksumming is turned off that the problem
doesn't
Eric Dumazet wrote:
On Tuesday 13 April 2010 at 15:27 +0100, stephen mulcahy wrote:
Ok, I've tried both of the following with my reproducer
1. ethtool -K eth0 tso off
RESULT: reproducer causes multiple hosts to become unresponsive on
first run.
2. ethtool -K eth0 tx off
R
Ok, I've tried both of the following with my reproducer
1. ethtool -K eth0 tso off
RESULT: reproducer causes multiple hosts to become unresponsive on
first run.
2. ethtool -K eth0 tx off
RESULT: reproducer runs three times without any hosts becoming unresponsive.
-stephen
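The two results above suggest a simple check-then-workaround sequence. A minimal sketch, assuming that ethtool -k prints a line of the form "tx-checksumming: on" (the exact output may differ between ethtool versions). The function names tx_csum_state and disable_tx_csum are made up for illustration and do not appear in this thread.

```shell
# Print the tx-checksumming state ("on" or "off") found in
# `ethtool -k <iface>` output supplied on stdin.
tx_csum_state() {
    awk -F': *' '
        { key = $1; gsub(/^[ \t]+/, "", key) }   # strip leading indentation
        key == "tx-checksumming" { split($2, w, " "); print w[1]; exit }'
}

# Apply the workaround reported above: turn TX checksum offload off.
# Needs root; $1 is the interface name, e.g. eth0.
disable_tx_csum() {
    ethtool -K "$1" tx off
}
```

For example, run "ethtool -k eth0 | tx_csum_state" to see the current state, then "disable_tx_csum eth0" as root. In this thread the setting is a workaround for the hang, not a fix for the driver.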
Eric Dumazet wrote:
On Tuesday 13 April 2010 at 11:03 +0100, stephen mulcahy wrote:
Eric Dumazet wrote:
OK, it seems forcedeth has a problem with checksums ?
Try changing the "ethtool -k eth0" settings ?
ethtool -K eth0 tso off tx off
Yes, that makes an unresponsive system respon
Eric Dumazet wrote:
OK, it seems forcedeth has a problem with checksums ?
Try changing the "ethtool -k eth0" settings ?
ethtool -K eth0 tso off tx off
Yes, that makes an unresponsive system responsive again immediately, nice!
Should the driver default to disabling this until the problem is correcte
Eric Dumazet wrote:
On Monday 12 April 2010 at 14:19 +0100, stephen mulcahy wrote:
Do you have any netfilter rules ?
Hi Eric,
I don't have any netfilter rules:
r...@node34:~# for table in filter nat mangle raw; do iptables -t $table -L; done
Chain INPUT (policy ACCEPT)
t
stephen mulcahy wrote:
Are both directions non-functional (RX and TX), or only one side ?
What's the best way of testing this? (tcpdump listening on both hosts and
then running pings between the systems?)
stephen mulcahy wrote:
>> Are both directions non-functional (RX and TX), or only on
Eric Dumazet wrote:
On Monday 12 April 2010 at 13:39 +0100, stephen mulcahy wrote:
I am not sure I understand. Are you saying that using the 2.6.30-2-amd64
kernel also makes your forcedeth adapter non-functional ?
Hi Eric,
If I run my tests with the 2.6.30-2-amd64 kernel the network
stephen mulcahy wrote:
It doesn't - further testing over the weekend saw 6 of 45 machines drop
off the network with this problem. Nothing in dmesg or system logs.
Happy to run more tests if someone can advise on what should be run.
I also just tried using the 2.6.30-2-amd64 (Debian) forc
Ben Hutchings wrote:
Stephen Mulcahy reported a regression in forcedeth at
<http://bugs.debian.org/572201>. The system information and some
diagnostic information can be found there. Anyone able to help?
Incidentally, I also tried the 2.6.33.2 kernel with
CONFIG_FORCEDETH_NAPI set to
patch? Will there be any further updates to 2.6.30?
-stephen
Ben Hutchings wrote:
On Tue, 2010-03-16 at 10:33 +, stephen mulcahy wrote:
[...]
We will shortly update the official kernel packages to incorporate this
release, so you could just wait a day or two and update. However I'm not
aware of any changes in 2.6.32.10 that would fix this so
Forgot to add - this seems related to the closed #516374
--
To UNSUBSCRIBE, email to debian-kernel-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/4bb31ac2.30...@atlanticlinux.ie
Ben Hutchings wrote:
On Mon, Mar 15, 2010 at 05:20:32PM +, stephen mulcahy wrote:
All pause frames should be dropped, either by the hardware or the driver.
So it's not unexpected that these are equal.
Ok, thanks for the clarification.
It might be interesting to see what happens i
bers to be the same?
As I said, I'm not seeing the behaviour with the 2.6.30 kernel - so
wondering what has changed.
I see Linux 2.6.32.10 was just released, is it worth my while building
that and seeing if I can reproduce the problem?
-stephen
: 0
rx_pause: 46798
rx_drop_frame: 46798
tx_unicast: 2284
tx_multicast: 3008
tx_broadcast: 16510200339
If I ifdown eth0 and then ifup eth0, I can again connect to the system
without problems.
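The rx_pause and rx_drop_frame counters above can be compared mechanically when collecting statistics from many nodes. A minimal sketch, assuming the "name: value" layout shown in the statistics above. The function name pause_vs_drop is invented for illustration.

```shell
# Read `ethtool -S <iface>` output on stdin and report whether the
# rx_pause and rx_drop_frame counters are equal.
# Prints "equal", "differ", or "missing" (exit status 2) if either is absent.
pause_vs_drop() {
    awk -F': *' '
        { key = $1; gsub(/^[ \t]+/, "", key) }   # strip leading indentation
        key == "rx_pause"      { pause = $2 }
        key == "rx_drop_frame" { drop = $2 }
        END {
            if (pause == "" || drop == "") { print "missing"; exit 2 }
            if (pause + 0 == drop + 0) print "equal"; else print "differ"
        }'
}
```

For example: "ethtool -S eth0 | pause_vs_drop". Equal counters match the case where every received pause frame is counted as dropped.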
Thanks,
-stephen
o see the kernel log (output from dmesg) after this happens,
even if you can't spot anything in it.
The device statistics (output from ethtool -S eth0) might also be
informative.
Ok, will post both of those when I manage to reproduce the problem again.
-stephen
image-2.6.32-trunk-amd64/postinst/depmod-error-initrd-2.6.32-trunk-amd64:
false
linux-image-2.6.32-trunk-amd64/prerm/removing-running-kernel-2.6.32-trunk-amd64:
true
linux-image-2.6.32-trunk-amd64/postinst/bootloader-test-error-2.6.32-trunk-amd64:
linux-image-2.6.32-trunk-amd64/postinst/