On Sun, 2006-01-01 at 18:06 +0000, Jason George wrote:

> First, define the context of "great instability".  Within the Cisco context?
> The Linux LVS context?  The CARP context?  Overall?

The Cisco and CARP context.  The main thing we noticed was that one of
the three Catalysts appeared to reboot or disappear temporarily.
Additionally, one of the original fws had a kernel panic.  Memory was
tested and came up clean, and this is one of two identical machines that
had previously been running CARP for more than a year and a half.  The
Cisco "ring" or "cluster" is set up to log to a remote server, and yet
during these periods of great instability the switch reports nothing and
appears to be oblivious to what is going on around or inside it.

Additionally, on a separate occasion during the "great instability",
when the CARP/DNS pair was reintroduced, one of them managed to kernel
panic.  Again, both units are new, with good hardware and tested memory.

The existing fws use separate vhids and behave well.  The introduced
CARP/DNS pair also has unique vhids to keep things clear.  All boxes
employing CARP are running OpenBSD 3.8.

The LVS boxes have never misbehaved; I only mentioned their existence
because I felt it was worth raising as an informative fact about the
network setup.

The CARP misbehaviour we noticed was the existing fws becoming confused
and both becoming master.  This was before ifstated was employed, so now
at least something can be done in the event this happens; it's just that
this behaviour obviously shouldn't be happening.

> Does this old thread make sense?
> http://www.monkey.org/openbsd/archive/misc/0410/msg00867.html

Yeah, it does.  Our upstream provider employs HSRP, and advertisements
from it bleed down into our network.

> Anecdote:  A few years ago, a large clustered Solaris environment I worked on
> started crashing when additional independent clusters were added.  The cluster
> nodes talked via multicast.  For a month, the network guys kept claiming "but
> the VLANs are private!" as an excuse and the server group retorted "but
> clearly the VLANs are not multicast-private!"  When the mishandling of
> multicast on the 6513s was finally determined to be due to pilot error, the
> server team was able to stop using 2924s as cluster interconnects.

Interesting.  Thank you for sharing your experience.  It's in the general
area of the problem(s) I'm seeing, but I'm still interested in hearing
from someone who has a similar setup.

Thanks again.

Cheers,

James

--
James Couzens,
Programmer

-----------------------------------------------------------------
PGP: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0x7A7C7DCF

"This is not quite as crazy as it sounds, since people knew how
 to write small, efficient programs in those days, a skill that
 has subsequently been lost." -- Andrew S. Tanenbaum
