> Hi all,
>
> please excuse (and ignore) this mail if you think it is not appropriate for
> this list or a FAQ.
>
> We had our servers all connected via one gigabit switch and used bonds to
> have 2 Gbit links for each of them (using DRBD and Pacemaker/Corosync to keep
> our data distributed and services/machines up and running).
> As the switch constitutes a SPOF, we wanted to eliminate it and put a
> second gigabit switch into the rack.
> Now we can't use the real bonding modes anymore, only fail-over, tlb and
> alb. We don't really like the idea of fail-over because that means going
> back to 1 Gbit data rates. Using tlb we get nearly 2 Gbit/s in total, with
> 1 Gbit/s per connection, so that looks nice throughput-wise. But for simple
> ICMP pings, 50-90% of pings are lost, probably because the switches keep
> re-learning the MAC addresses. Some TCP connections also seem to stall
> because of this. Not really a nice situation when desktop virtualization
> and terminal servers are used in this scenario.
>
> My questions:
> Is there something obvious I missed in the above configuration? (*)
> Would it improve the situation stability- and performance-wise if I used
> bridges instead of bonds to connect to the switches and let STP do its job?
> Would that work with clusters and DRBD?
> Obviously the cleanest solution would be to use two stackable switches and
> make sure that they still do their job when one fails. But that is out of
> the question due to the prices attached to such switches.
>
> Thanks for your input on this, and have a nice remaining weekend,
>
> Arnold
>
> (*) I haven't yet looked into the switches' configuration to see if they
> have special options for such a scenario...
Hi,

please check whether your switches are 802.3ad compatible. If the switches do not support 802.3ad across a stack, you have to stay with active-backup. But even load balancing does not provide a real 50-50 traffic distribution: traffic is balanced according to MAC addresses or to layer 3/4 parameters (IP address and TCP port). So be sure what that means for your setup in a mostly switched environment, or where the traffic comes in via one router. See:

http://www.linuxfoundation.org/collaborate/workgroups/networking/bonding

Be sure that the switch speaks 802.3ad across switch stacks. Cisco Nexus does virtual port channel, but that is perhaps not the cheapest option. Be sure which mode of traffic distribution the switch supports.

Please check modes 5 (balance-tlb) and 6 (balance-alb) carefully. There may be problems when using two different switches. tcpdump is your friend. Please also check whether link errors are recognized reliably by the bonding module.

Considering all of the above, think about sticking with plain active-backup! Do you really need more than 1 Gbit/s?

The Linux bridging module only speaks Spanning Tree, not Rapid Spanning Tree, so you get outages of about 30 seconds. That is too long for most cluster applications like DRBD, Corosync, etc. You would have to tune the timeouts there.

Greetings,

--
Dr. Michael Schwartzkopff
Guardinistr. 63
81375 München

Tel: (0163) 172 50 98
Fax: (089) 620 304 13
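For reference, a minimal sketch of the active-backup bond recommended above, configured with iproute2. The interface names (eth0/eth1) and the IP address are placeholders; your distribution may prefer its own network configuration files instead of manual commands:

```shell
# Create the bond in active-backup mode; miimon=100 polls link state
# every 100 ms so a dead link is detected and traffic fails over.
ip link add bond0 type bond mode active-backup miimon 100

# Enslave the two NICs, one cabled to each switch.
ip link set eth0 down
ip link set eth0 master bond0
ip link set eth1 down
ip link set eth1 master bond0

# Bring the bond up and assign an address (placeholder subnet).
ip link set bond0 up
ip addr add 192.168.1.10/24 dev bond0

# Verify which slave is currently active and that MII monitoring works.
cat /proc/net/bonding/bond0
```

Because only one slave carries traffic at a time, both switches see a single, stable MAC address, which avoids the constant MAC re-learning (and the ping loss) described in the original mail.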
_______________________________________________
Pacemaker mailing list: Pacemaker@oss.clusterlabs.org
http://oss.clusterlabs.org/mailman/listinfo/pacemaker

Project Home: http://www.clusterlabs.org
Getting started: http://www.clusterlabs.org/doc/Cluster_from_Scratch.pdf
Bugs: http://bugs.clusterlabs.org