Greetings! I'll be doing channel bonding on our cluster soon as well, and was thinking of packaging ifenslave. I've read somewhere that on Red Hat boxes this tool is not necessary. Is ifenslave still the most 'modern' way of getting this done? In any case, I hope to have more information to help you with soon.
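Since the question is whether ifenslave is still needed: the bring-up Anders describes below comes down to three commands. The following is a minimal bash sketch that wraps them. It assumes the example interface names and address from his message, an `ifenslave` binary on `$PATH` (he ran a locally compiled `./ifenslave`), and a bonding driver that is already loaded. The helper names `run` and `bond_up` are hypothetical. By default it only prints the commands; it executes them only when `RUN=1` is set.

```shell
#!/usr/bin/env bash
# Sketch: bring up a bonded interface with ifconfig + ifenslave.
# ASSUMPTIONS: the bonding driver is already loaded (e.g. modprobe bonding),
# ifenslave is on $PATH, and the defaults below are the example values from
# the quoted message. Dry run unless RUN=1: commands are echoed, not run.
set -eu

BOND=${BOND:-bond0}
ADDR=${ADDR:-192.168.1.101}
MASK=${MASK:-255.255.255.0}
SLAVES=${SLAVES:-"eth0 eth1"}

# run: execute the command when RUN=1, otherwise just print it
run() {
    if [ "${RUN:-0}" = 1 ]; then
        "$@"
    else
        echo "$*"
    fi
}

# bond_up: configure the master, then enslave each NIC in order
bond_up() {
    run ifconfig "$BOND" "$ADDR" netmask "$MASK" up
    local s
    for s in $SLAVES; do
        # per the quoted message, bond0 takes its MAC address from eth0,
        # the first slave enslaved
        run ifenslave "$BOND" "$s"
    done
}

bond_up
```

With `RUN` unset it prints `ifconfig bond0 192.168.1.101 netmask 255.255.255.0 up`, `ifenslave bond0 eth0` and `ifenslave bond0 eth1`, one per line. This makes it easy to review what would run before doing it as root.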
Take care,

Anders Lennartsson <[EMAIL PROTECTED]> writes:

> Hi
>
> I'm setting up a Debian GNU/Linux based cluster, currently with 4 nodes,
> each a PPro 200 :( but there may be more/other stuff coming :).
> Considering the costs, we settled for Netgear 311 ethernet cards, which
> are supported in 2.4.x kernels. There are patches for 2.2.x,
> but since 2.4 is here... By the way, I'm running unstable on these.
>
> Initially we have put two ethernet cards in each node, and today was
> spent getting bonding to work.
> This is supported in late 2.2.x kernels and 2.4.x of course.
> But it was a bit tricky to find the correct ifenslave.c to compile and
> use.
> Once that was done (http://pdsf.nersc.gov/linux/), everything seemed to
> work as planned after doing
>
>   ifconfig bond0 192.168.1.x netmask 255.255.255.0 up
>   ./ifenslave bond0 eth0
>   (bond0 gets the MAC address from eth0)
>   ./ifenslave bond0 eth1
>
> But when testing the setup by ftping a large file between two nodes,
> each configured as above (x=101,103 respectively),
> messages of the following type were output repeatedly on the console:
>
>   ethX ... Something wicked happened! 0YYY
>
> X was 0 or 1
> YYY was one of 500, 700, 740, 749, 749 as far as I can tell
>
> The same thing happened when running NPtcp once the packet size went
> above a few kbytes, at speeds of approx. 50 Mbit/s.
>
>
> I also tested the network cards eth0 to eth0 and eth1 to eth1 in normal
> mode (no bonding) with NPtcp, and both links asymptotically went up to
> some 89.7 Mbit/s.
> By the way, where are the last 10 Mbit/s?
>
> Anyone got ideas as to the nature/solution of this problem?
> I did locate the error string in drivers/net/natsemi.c in the function
> netdev_error, but I don't know what to make of it.
> Does anyone have experience of this with, for instance, the 3c905, which
> in my opinion is very stable etc?
> It is also about three times more expensive, which isn't that much for
> one or two, although I could imagine substantial savings
> for a large cluster. But if my hours are included ...
>
>
> Regards,
> Anders
>
>
> PS Some detailed info:
>
> From syslog, identifying the network cards (eth2 is for access from
> outside the dedicated networks):
>
> Mar 1 21:30:53 beo101 kernel: http://www.scyld.com/network/natsemi.html
> Mar 1 21:30:53 beo101 kernel: (unofficial 2.4.x kernel port, version 1.0.3, January 21, 2001 Jeff Garzik, Tjeerd Mulder)
> Mar 1 21:30:53 beo101 kernel: eth0: NatSemi DP83815 at 0xc4800000, 00:02:e3:03:da:87, IRQ 12.
> Mar 1 21:30:53 beo101 kernel: eth0: Transceiver status 0x7869 advertising 05e1.
> Mar 1 21:30:53 beo101 kernel: eth1: NatSemi DP83815 at 0xc4802000, 00:02:e3:03:de:43, IRQ 10.
> Mar 1 21:30:53 beo101 kernel: eth1: Transceiver status 0x7869 advertising 05e1.
> Mar 1 21:30:53 beo101 kernel: eth2: NatSemi DP83815 at 0xc4804000, 00:02:e3:03:dc:2c, IRQ 11.
> Mar 1 21:30:53 beo101 kernel: eth2: Transceiver status 0x7869 advertising 05e1.
>
> Some lines of the wicked message (above them are the two lines where
> eth0 and eth1 are reported when ifenslave is run):
>
> Mar 1 21:30:56 beo101 /usr/sbin/cron[189]: (CRON) STARTUP (fork ok)
> Mar 1 21:35:26 beo101 kernel: eth0: Setting full-duplex based on negotiated link capability.
> Mar 1 21:35:32 beo101 ntpd[182]: time reset -0.474569 s
> Mar 1 21:35:32 beo101 ntpd[182]: kernel pll status change 41
> Mar 1 21:35:32 beo101 ntpd[182]: synchronisation lost
> Mar 1 21:35:37 beo101 kernel: eth1: Setting full-duplex based on negotiated link capability.
> Mar 1 21:38:01 beo101 /USR/SBIN/CRON[211]: (mail) CMD ( if [ -x /usr/sbin/exim -a -f /etc/exim.conf ]; then /usr/sbin/exim -q >/dev/null 2>&1; fi)
> Mar 1 21:39:49 beo101 kernel: eth1: Something Wicked happened! 0700.
> Mar 1 21:40:04 beo101 kernel: eth0: Something Wicked happened! 0700.
> Mar 1 21:40:08 beo101 kernel: eth1: Something Wicked happened! 0700.
> Mar 1 21:40:08 beo101 kernel: eth0: Something Wicked happened! 0700.
> Mar 1 21:40:12 beo101 last message repeated 2 times
> Mar 1 21:40:12 beo101 kernel: eth1: Something Wicked happened! 0700.
> Mar 1 21:40:13 beo101 last message repeated 2 times
> Mar 1 21:40:15 beo101 kernel: eth0: Something Wicked happened! 0700.
> Mar 1 21:40:16 beo101 kernel: eth0: Something Wicked happened! 0700.
> Mar 1 21:40:18 beo101 kernel: eth1: Something Wicked happened! 0700.
> Mar 1 21:40:19 beo101 kernel: eth1: Something Wicked happened! 0700.
> Mar 1 21:40:19 beo101 kernel: eth0: Something Wicked happened! 0700.
> Mar 1 21:40:20 beo101 kernel: eth1: Something Wicked happened! 0700.
> Mar 1 21:40:20 beo101 kernel: eth1: Something Wicked happened! 0700.
> Mar 1 21:40:21 beo101 kernel: eth0: Something Wicked happened! 0700.
> Mar 1 21:40:22 beo101 last message repeated 3 times
> Mar 1 21:40:22 beo101 kernel: eth1: Something Wicked happened! 0700.
> Mar 1 21:40:22 beo101 kernel: eth1: Something Wicked happened! 0700.
> Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0700.
> Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0700.
> Mar 1 21:40:22 beo101 kernel: eth1: Something Wicked happened! 0700.
> Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0500.
> Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0740.
> Mar 1 21:40:22 beo101 kernel: eth0: Something Wicked happened! 0740.
> Mar 1 21:40:23 beo101 kernel: eth0: Something Wicked happened! 0700.
> Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700.
> Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700.
> Mar 1 21:40:23 beo101 kernel: eth0: Something Wicked happened! 0740.
> Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0740.
> Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700.
> Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700.
> Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0500.
> Mar 1 21:40:23 beo101 kernel: eth0: Something Wicked happened! 0500.
> Mar 1 21:40:23 beo101 kernel: eth0: Something Wicked happened! 0700.
> Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700.
> Mar 1 21:40:23 beo101 kernel: eth1: Something Wicked happened! 0700.
>
>
> The result of ifconfig:
>
> bond0     Link encap:Ethernet  HWaddr 00:02:E3:03:DA:87
>           inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
>           UP BROADCAST RUNNING MASTER MULTICAST  MTU:1500  Metric:1
>           RX packets:0 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:1834429 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:0 (0.0 b)  TX bytes:986886789 (941.1 Mb)
>
> eth0      Link encap:Ethernet  HWaddr 00:02:E3:03:DA:87
>           inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
>           UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
>           RX packets:907798 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:915439 errors:1776 dropped:0 overruns:1776 carrier:1776
>           collisions:0 txqueuelen:100
>           RX bytes:435552233 (415.3 Mb)  TX bytes:491795214 (469.0 Mb)
>           Interrupt:12
>
> eth1      Link encap:Ethernet  HWaddr 00:02:E3:03:DA:87
>           inet addr:192.168.1.101  Bcast:192.168.1.255  Mask:255.255.255.0
>           UP BROADCAST RUNNING SLAVE MULTICAST  MTU:1500  Metric:1
>           RX packets:907768 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:915466 errors:1748 dropped:0 overruns:1748 carrier:1748
>           collisions:0 txqueuelen:100
>           RX bytes:434992308 (414.8 Mb)  TX bytes:489766183 (467.0 Mb)
>           Interrupt:10 Base address:0x2000
>
> eth2      Link encap:Ethernet  HWaddr 00:02:E3:03:DC:2C
>           inet addr:150.227.64.210  Bcast:150.227.64.255  Mask:255.255.255.0
>           UP BROADCAST RUNNING  MTU:1500  Metric:1
>           RX packets:13122 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:1182 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:100
>           RX bytes:1032660 (1008.4 Kb)  TX bytes:943713 (921.5 Kb)
>           Interrupt:11 Base address:0x4000
>
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           UP LOOPBACK RUNNING  MTU:3904  Metric:1
>           RX packets:8 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:8 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:552 (552.0 b)  TX bytes:552 (552.0 b)

-- 
Camm Maguire                                            [EMAIL PROTECTED]
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah
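For anyone chasing the same error: the hex value after "Something Wicked happened!" appears to be the interrupt-status word that `netdev_error()` in natsemi.c prints. Below is a minimal bash sketch that decodes it. It assumes the bit layout of the DP83815 interrupt status register, using the names from the `IntrStatus` enum in natsemi.c (bit 0 RxDone through bit 15 HighBits). The function name `decode_natsemi_isr` is hypothetical. Check the names against the driver source you actually run.

```shell
#!/usr/bin/env bash
# decode_natsemi_isr: list the bits set in a natsemi interrupt-status value,
# given as the bare hex string printed in the log (e.g. "0700").
# ASSUMPTION: bit names/positions follow the DP83815 ISR layout as named in
# natsemi.c's IntrStatus enum; only the low 16 bits are decoded here.
decode_natsemi_isr() {
    local status=$(( 16#$1 ))   # log values are hex without a 0x prefix
    local names=(RxDone RxIntr RxErr RxEarly RxIdle RxOverrun
                 TxDone TxIntr TxErr TxIdle TxUnderrun StatsMax
                 SWInt WOLPkt LinkChange HighBits)
    local bit out=()
    for bit in "${!names[@]}"; do
        # collect the name of every bit that is set, lowest bit first
        if (( status & (1 << bit) )); then
            out+=("${names[bit]}")
        fi
    done
    echo "$1: ${out[*]}"
}

# decode the values reported in the quoted message
for code in 0500 0700 0740 0749; do
    decode_natsemi_isr "$code"
done
```

Under this assumed layout, `0700` decodes to `TxErr TxIdle TxUnderrun` and `0500` to `TxErr TxUnderrun`. That would point at transmit FIFO underruns, which would fit the nonzero TX errors/overruns counters on eth0 and eth1 in the ifconfig output above.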