Hi, I've added some comments and questions inline. Cheers, Dan
On 8 February 2011 10:00, Jonathan Ellis <jbel...@gmail.com> wrote:

> On Mon, Feb 7, 2011 at 1:51 AM, TSANG Yiu Wing <ywts...@gmail.com> wrote:
> > cassandra version: 0.7
> >
> > client library: scale7-pelops / 1.0-RC1-0.7.0-SNAPSHOT
> >
> > cluster: 3 machines (A, B, C)
> >
> > details:
> > it works perfectly when all 3 machines are up and running
> >
> > but if the seed machine is down, the problems happen:
> >
> > 1) new client connection cannot be established
>
> sounds like pelops relies on the seed node to introduce it to the
> cluster. you should configure it either with a hardcoded list of
> nodes or use something like RRDNS instead. I don't use pelops so I
> can't help other than that. (I believe there is a mailing list for
> Pelops though.)
>

When dynamic node discovery is turned on (it is off by default), Pelops
doesn't (or at least shouldn't) rely on the initial seed node once it is
past initialization. So either make sure you have dynamic node discovery
turned on, or seed Pelops with all of the nodes in your cluster.

It would be helpful if you provided more information about the errors
you're seeing, preferably with debug level logging turned on.

>
> > 2) if a client keeps connecting to and operating at (issue get and
> > update) the cluster, when the seed is down, the working client will
> > throw exception upon the next operation
>
> I know Hector supports transparent failover to another Cassandra node.
> Perhaps Pelops does not.
>

Pelops will validate connections at a configurable interval (60 seconds
by default) and remove those that fail validation from the pool. Pelops
will also retry the operation three times (configurable), against a
different node in the pool each time. If you want Pelops to take more
aggressive action when it detects downed nodes, then check out
org.scale7.cassandra.pelops.pool.CommonsBackedPool.INodeSuspensionStrategy.
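
To make the retry behaviour described above a bit more concrete, here is
a rough, language-neutral sketch in Python. It is not Pelops code and
does not use the Pelops API: RetryingPool, NodeDownError,
AllNodesFailedError and fake_get are all hypothetical names made up for
this illustration, and the 60 second suspension is just an assumed
policy standing in for whatever a node suspension strategy would decide.
The idea it shows is only this: seed the pool with every node, try an
operation against one node, and on failure move on to a different node,
up to three attempts.

```python
import time


class NodeDownError(Exception):
    """Raised by an operation when the node it targeted is unreachable."""


class AllNodesFailedError(Exception):
    """Raised when no node answered within the allowed attempts."""


class RetryingPool:
    """Hypothetical pool: retries an operation against a different node each time."""

    def __init__(self, nodes, max_attempts=3, suspend_seconds=60.0,
                 clock=time.monotonic):
        self.nodes = list(nodes)            # seeded with ALL nodes, not just one
        self.max_attempts = max_attempts    # assumed default of three attempts
        self.suspend_seconds = suspend_seconds
        self.clock = clock
        self.suspended_until = {}           # node -> time it may be tried again

    def available_nodes(self):
        """Return the nodes that are not currently suspended."""
        now = self.clock()
        return [n for n in self.nodes
                if self.suspended_until.get(n, float("-inf")) <= now]

    def execute(self, operation):
        """Run operation(node), moving to another node after each failure."""
        tried = set()
        last_error = None
        for _ in range(self.max_attempts):
            candidates = [n for n in self.available_nodes() if n not in tried]
            if not candidates:
                break
            node = candidates[0]
            tried.add(node)
            try:
                return operation(node)
            except NodeDownError as exc:
                last_error = exc
                # Assumed policy: keep a failed node out of rotation for a while.
                self.suspended_until[node] = self.clock() + self.suspend_seconds
        raise AllNodesFailedError(
            f"no node answered after trying {sorted(tried)}") from last_error


def fake_get(node):
    """Stand-in for a real read; pretends the seed node A is down."""
    if node == "A":
        raise NodeDownError("A is down")
    return f"value from {node}"


pool = RetryingPool(["A", "B", "C"])
print(pool.execute(fake_get))  # A fails, so the call is retried on B
```

With only the seed node in the list, the same call would have nothing
left to fail over to, which is why seeding the client with all nodes (or
enabling discovery) matters.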
>
> > 3) using cassandra-cli to connect the remaining nodes in the cluster,
> > "Internal error processing get_range_slices" will happen when querying
> > column family
> >> list <cf>;
>
> Cassandra always logs the cause of internal errors in system.log, so
> you should look there.
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of DataStax, the source for professional Cassandra support
> http://www.datastax.com