Re: Restarting cluster

Jonathan Ellis Fri, 24 Jun 2011 06:43:45 -0700

Did you try netcat to verify that you can get to the internal port on
machine X from machine Y?
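
If netcat isn't installed, the same check can be sketched in Python. This
is only a rough sketch under assumptions: the helper name can_connect is
made up for illustration, and 7000 is the default storage_port in
cassandra.yaml, so substitute whatever your config actually uses. Note that
the cli talks to the Thrift rpc_port rather than the storage port, so a
working cli connection does not by itself show that the internal port is
reachable.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; it only tests TCP reachability
    and says nothing about whether gossip itself is healthy.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable or
        # unresolvable hosts.
        return False
```

Run it on machine Y as can_connect("<internal address of X>", 7000), then
repeat in the other direction from X to Y. A False result in either
direction points at a firewall or listen_address problem rather than at
Cassandra's ring state.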


On Fri, Jun 24, 2011 at 8:20 AM, David McNelis
<dmcne...@agentisenergy.com> wrote:
> Running on CentOS.
> We had a massive power failure and our UPS wasn't up to 48 hours without
> power...
> In this situation the IP addresses have all stayed the same.  I can still
> connect to the "other" node from the cli, so I don't think it's an issue
> where the iptables settings weren't saved and started blocking traffic.
> In terms of the log files, the only related lines are:
>  INFO [main] 2011-06-24 07:48:44,750 StorageService.java (line 382) Loading persisted ring state
>  INFO [main] 2011-06-24 07:48:44,757 StorageService.java (line 418) Starting up server gossip
> When I turn on debugging and restart the non-seed node I get this line:
> DEBUG [WRITE-/192.168.80.XXX] 2011-06-24 08:04:48,798 OutboundTcpConnection.java (line 161) attempting to connect to /192.168.80.XXX
> But no errors after it.
>
> On Fri, Jun 24, 2011 at 7:58 AM, Sasha Dolgy <sdo...@gmail.com> wrote:
>>
>> Normally, no.  What you've done is fine.  What is the environment?
>>
>> On Amazon EC2, for example, the instance could have crashed and a new
>> one been brought online with a different internal IP ...
>>
>> In cassandra/logs/system.log on the 2nd node, are there any messages
>> about how it relates to the seed node?
>>
>> On Fri, Jun 24, 2011 at 2:49 PM, David McNelis
>> <dmcne...@agentisenergy.com> wrote:
>> > I am running 0.8.0 on CentOS.  I have 2 nodes in my cluster; one is a
>> > seed, the other is autobootstrapped.
>> > After an unexpected shutdown of both of the physical machines I am
>> > trying to restart the cluster.  I first started the seed node; it went
>> > through the normal startup process and finished without error.  Once
>> > that was complete I started the second node; again there were no errors
>> > in the log as it was starting, it started the gossip server, etc.
>> > However, when I look at the ring using nodetool, both machines show
>> > their own status as Up, then show the other machine as Down with a
>> > state of Normal and a load of ?.  I have tried restarting the
>> > individual nodes in different orders and waiting a while after
>> > restarting a node, but the 'other' node always has a status of "Down".
>> > nodetool repair [keyspace] did not make any difference either, and
>> > nodetool join just told me that the nodes were already a part of the
>> > ring.
>> > I can't imagine this is how it *should* be behaving... is there a piece
>> > I'm missing in terms of getting one node to recognize the other as
>> > being Up?
>
>
>
> --
> David McNelis
> Lead Software Engineer
> Agentis Energy
> www.agentisenergy.com
> o: 630.359.6395
> c: 219.384.5143
> A Smart Grid technology company focused on helping consumers of energy
> control an often under-managed resource.
>
>



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of DataStax, the source for professional Cassandra support
http://www.datastax.com
