Re: Cluster fragility

Dave Gardner Fri, 12 Nov 2010 00:19:11 -0800

We never have to reboot our production cluster. However, we're running
a release version (0.6.6), not a beta. If your aim is to avoid
fragility, a release version would seem a sensible starting point.


dave

On Friday, November 12, 2010, Reverend Chip <[email protected]> wrote:
> I've been running tests with first a four-node, then an eight-node
> cluster.  I started with 0.7.0 beta3, but have since updated to a more
> recent Hudson build.  I've been happy with a lot of things, but I've had
> some surprisingly unpleasant experiences with operational fragility.
>
> For example, when adding four nodes to a four-node cluster (at 2x
> replication), I had two nodes that insisted they were streaming data,
> but no progress was made in the stream for over a day (this was with
> beta3).  I had to reboot the cluster to clear that condition.  For the
> purpose of making progress on other tests I decided just to reload the
> data at eight-wide (with the more recent build), but if I had data I
> couldn't reload or the cluster were serving in production, that would
> have been a very inconvenient failure.
>
> I also had a node that initially refused to bootstrap, but after I
> waited a day, it finally got its act together.
>
> I write this not to complain per se, but to ask whether these failures
> are known and expected, and rebooting a cluster is just a Thing You Have
> To Do once in a while, or, if not, what techniques can be used to clear
> such cluster topology and streaming/replication problems without rebooting.
>
>
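For the expansion described in the quoted message (four nodes grown to eight at 2x replication), a frequently suggested practice with Cassandra releases of this era was to give each new node an explicit initial_token that bisects an existing range, rather than letting bootstrap pick one. Below is a minimal Python sketch, assuming RandomPartitioner (token space 0 to 2**127) and that the original nodes already hold evenly spaced tokens. The helper names are hypothetical, and this does not address the stalled streaming reported above.

```python
# Hypothetical helpers, assuming RandomPartitioner, whose tokens
# fall in the range 0 .. 2**127.
TOKEN_SPACE = 2 ** 127


def balanced_tokens(node_count):
    """Return node_count tokens that split the ring into equal ranges."""
    return [i * TOKEN_SPACE // node_count for i in range(node_count)]


def expansion_tokens(old_count, new_count):
    """Tokens for the added nodes when growing old_count -> new_count.

    Assumes the existing nodes already hold balanced tokens and that
    new_count is a multiple of old_count, so existing tokens stay put.
    """
    if new_count % old_count != 0:
        raise ValueError("new_count must be a multiple of old_count")
    old = set(balanced_tokens(old_count))
    return [t for t in balanced_tokens(new_count) if t not in old]


if __name__ == "__main__":
    # Each token sits at the midpoint of one existing node's range.
    for token in expansion_tokens(4, 8):
        print(token)
```

Each printed value would go in one new node's initial_token setting before it bootstraps. Doubling the node count keeps the old tokens valid, so none of the existing nodes has to move.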

-- 
*Dave Gardner*
Technical Architect


*Imagini Europe Limited*
7 Moor Street, London W1D 5NB

Tel: +44 20 7734 7033
Skype: daveg79
Email: [email protected]
Web: http://www.visualdna.com

Imagini Europe Limited, Company number 5565112 (England
and Wales), Registered address: c/o Bird & Bird,
90 Fetter Lane, London, EC4A 1EQ, United Kingdom
