On Oct 24, 2008, at 12:10 PM, V. Ram wrote:
Resuscitating this thread...
Well, we spent some time testing the various options, and Leonardo's
suggestion seems to work!
We disabled TCP segmentation offload (TSO) on the e1000 NICs using "ethtool -K
eth tso off" and this type of crash no longer happens.
I hope this message can help anyone else exper
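
For readers who want to apply the same workaround on several interfaces, here is a minimal sketch that only builds the ethtool command lines; it does not run them. The interface names "eth0" and "eth1" and the helper name tso_off_commands are assumptions for illustration, not something taken from this thread.

```python
import shlex


def tso_off_commands(interfaces):
    """Build (but do not run) one `ethtool -K <iface> tso off` command per interface."""
    return [["ethtool", "-K", iface, "tso", "off"] for iface in interfaces]


# Hypothetical interface names; substitute the e1000 devices on your own nodes.
for cmd in tso_off_commands(["eth0", "eth1"]):
    print(shlex.join(cmd))
```

The printed commands need root privileges to run, and "ethtool -k eth0" (lowercase k) lists the current offload settings so the change can be checked. A setting made this way normally does not survive a reboot, so it would have to be reapplied at boot time.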
On Oct 10, 2008, at 12:42 PM, V. Ram wrote:
Can anyone else suggest why the code might be crashing when running over
Ethernet and not over shared memory? Any suggestions on how to debug
this or interpret the error message issued from btl_tcp_frag.c?
Unfortunately this is a standard error
Leonardo,
These nodes are all using Intel e1000 chips. As the nodes are AMD
K7-based, these are the older chips, not the newer ones that have all the
EEPROM issues with the newer kernel.
The kernel in use is from the 2.6.22 family, and the e1000 driver is the
one shipped with the kernel. I am running
Sorry for replying to this so late, but I have been away. Reply
below...
On Wed, 1 Oct 2008 11:58:30 -0400, "Aurélien Bouteiller"
said:
> If you have several network cards in your system, it can sometimes get
> the endpoints confused. Especially if you don't have the same number
> of cards or
Ram,
What is the name and version of the kernel module for your NIC? I have
experienced something similar with my tg3 module. The error that appeared
for me was different:
[btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv
failed: No route to host (113)
I solved it changi
If you have several network cards in your system, it can sometimes get
the endpoints confused. Especially if you don't have the same number
of cards or don't use the same subnet for all "eth0, eth1". You should
try to restrict Open MPI to use only one of the available networks by
using the -