Node decommission failed

2012-06-06 Thread Marc Canaleta
Hi,

We are testing Cassandra and tried to remove a node from the cluster using
nodetool decommission. The node transferred its data, then "died" for about
20 minutes without responding, then came back to life with a load average of
50-100, stayed under heavy load for about an hour and then returned to
normal. It seems to have stopped receiving new data, but it is still in the
cluster.

The node we tried to remove is the third one:

root@dc-cassandra-03:~# nodetool ring
Note: Ownership information does not include topology, please specify a
keyspace.
Address         DC           Rack   Status  State   Load     Owns     Token
                                                                      113427455640312821154458202477256070484
10.70.147.62    datacenter1  rack1  Up      Normal  7.14 GB  33.33%   0
10.208.51.64    datacenter1  rack1  Up      Normal  3.68 GB  33.33%   56713727820156410577229101238628035242
10.190.207.185  datacenter1  rack1  Up      Normal  3.54 GB  33.33%   113427455640312821154458202477256070484


It seems it is still part of the cluster. What should we do? Run
decommission again?

How can we know the current state of the cluster?

Thanks!
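To answer "how can we know the current state of the cluster": besides nodetool ring, nodetool netstats shows in-flight streams. If you want to watch the ring from a script, here is a minimal sketch in Python that parses the ring output format shown above; the field positions are an assumption about this nodetool version and may need adjusting:

```python
import re

def parse_ring(output):
    """Extract (address, status, state, owns) tuples from `nodetool ring` output."""
    nodes = []
    for line in output.splitlines():
        # Node rows start with an IPv4 address; header and token-wrap rows do not.
        if re.match(r"^\d+\.\d+\.\d+\.\d+\s", line):
            f = line.split()
            # Assumed columns: addr, DC, rack, status, state, load, unit, owns, token
            nodes.append((f[0], f[3], f[4], f[7]))
    return nodes
```

A node that decommissioned cleanly should simply disappear from this list; one still shown `Up Normal` is still a full ring member.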


0.7 live schema updates

2010-09-16 Thread Marc Canaleta
Hi!

I like the new feature of live schema updates. You can add, drop and
rename column families and keyspaces via thrift, but how do you modify
column family attributes like key_cache_size or rows_cached?

Thank you.
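For what it's worth, the 0.7 cassandra-cli grew an `update column family` statement for exactly this; a sketch, where the column family name and attribute values are placeholders, and the exact attribute names may differ in your build:

```
update column family Standard1 with rows_cached = 10000;
```

Under the hood this should map to the thrift `system_update_column_family` call, the same mechanism as the other live schema updates.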


Re: Best strategy for adding new nodes to the cluster

2010-09-27 Thread Marc Canaleta
What do you mean by "running live"? I am also planning to use Cassandra on
EC2 using small nodes. Small nodes have 1/4 the CPU of the large ones and
1/4 the cost, but their I/O is more than 1/4 (Amazon does not publish
explicit I/O numbers...), so I think 4 small instances should perform
better than 1 large one at the same cost. Am I wrong?
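The arithmetic behind "the cost is the same" can be checked quickly; the prices below are 2010-era us-east on-demand rates and are an assumption here, so verify against current pricing:

```python
# Assumed 2010 us-east on-demand prices ($/hour) and relative CPU (EC2 compute units).
SMALL_COST, LARGE_COST = 0.085, 0.34   # m1.small vs m1.large
SMALL_ECU, LARGE_ECU = 1, 4

n_small = LARGE_COST / SMALL_COST       # small instances you get per large, at equal spend
print(round(n_small))                   # 4 smalls cost the same as 1 large
print(round(n_small) * SMALL_ECU >= LARGE_ECU)  # and match it on CPU
```

So at equal cost and equal aggregate CPU, any per-dollar I/O advantage of the smalls would tip the balance, which is the crux of the question.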

On 27 September 2010 at 18:09 UTC+2, Jonathan Ellis <
jbel...@gmail.com> wrote:

> I strongly recommend not running live on Small nodes.  So in your case
> I would recommend starting up Large instances with raid0'd disks,
> shutting down cassandra on the Small ones, rsyncing to the Large, and
> starting up on the Large.
>
> On Mon, Sep 27, 2010 at 6:46 AM, Utku Can TopƧu  wrote:
> > Hi All,
> >
> > We're currently running a cassandra cluster with Replication Factor 3,
> > consisting of 4 nodes.
> >
> > The current situation is:
> >
> > - The nodes are all identical (AWS small instances)
> > - The data directory is on the /mnt partition, which has 150 GB
> > capacity, and each node has around 90 GB of load, so about 60 GB of free
> > space is left per node.
> >
> > So adding a new node to the cluster seems likely to cause problems for
> > us. I think the node that streams data to the new bootstrapping node
> > will not have enough disk space to anticompact its data.
> >
> > What should be the best practice for such scenarios?
> >
> > Regards,
> >
> > Utku
> >
>
>
>
> --
> Jonathan Ellis
> Project Chair, Apache Cassandra
> co-founder of Riptano, the source for professional Cassandra support
> http://riptano.com
>
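A back-of-the-envelope check of Utku's numbers shows why the disk headroom is the worry. The worst-case assumption below, that 0.6-style anticompaction can temporarily need extra space on the order of the node's current load, is an illustration, not a measured figure:

```python
capacity_gb, load_gb = 150, 90          # per node, from the description above
free_gb = capacity_gb - load_gb         # 60 GB of headroom per node

# Illustrative worst case: anticompaction rewrites the data it splits out,
# so the temporary extra space can approach the node's current load.
worst_case_extra_gb = load_gb
print(worst_case_extra_gb > free_gb)    # a ~90 GB temporary copy does not
                                        # fit in 60 GB of free space
```

That gap is exactly why the rsync-to-bigger-disks route Jonathan suggests sidesteps the problem: it never needs the anticompaction scratch space on the full nodes.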


Re: New nodes won't bootstrap on .66

2010-11-07 Thread Marc Canaleta
Hi,

Did you solve this problem? I'm having the same problem. I'm trying to
bootstrap a third node on a 0.6.6 cluster. It has two keyspaces: Keyspace1
and KeyspaceLogs, both with replication factor 2.

It starts bootstrapping, receives some streams but then keeps waiting for
streams. I enabled debug mode. These lines may be useful:

DEBUG [main] 2010-11-07 17:39:50,052 BootStrapper.java (line 70) Beginning
bootstrap process
DEBUG [main] 2010-11-07 17:39:50,082 StorageService.java (line 160) Added /
10.204.93.16/Keyspace1 as a bootstrap source
...
DEBUG [main] 2010-11-07 17:39:50,090 StorageService.java (line 160) Added /
10.204.93.16/KeyspaceLogs as a bootstrap source
... (streaming messages)
DEBUG [Thread-56] 2010-11-07 17:45:51,706 StorageService.java (line 171)
Removed /10.204.93.16/Keyspace1 as a bootstrap source; remaining is [/
10.204.93.16]
...
(and never ends).

It seems it is waiting for [/10.204.93.16] when it should be waiting for
/10.204.93.16/KeyspaceLogs.

The third node is 64-bit, while the two existing nodes are 32-bit. Could
this be a problem?

Thank you.


2010/10/28 Dimitry Lvovsky 

> Maybe your port 7000 (the storage port) is being blocked by iptables
> or some firewall, or maybe you have it bound to localhost instead of
> an IP address.
>
> Hope this helps,
> Dimitry.
>
>
>
> On Thu, Oct 28, 2010 at 5:35 PM, Thibaut Britz <
> thibaut.br...@trendiction.com> wrote:
>
>> Hi,
>>
>> I have the same problem with 0.6.5
>>
>> New nodes will hang forever in bootstrap mode (no streams are being
>> opened) and the receiver thread just waits for data forever:
>>
>>
>>  INFO [Thread-53] 2010-10-27 20:33:37,399 SSTableReader.java (line 120)
>> Sampling index for /hd2/cassandra/data/table_xyz/
>> table_xyz-3-Data.db
>>  INFO [Thread-53] 2010-10-27 20:33:37,444 StreamCompletionHandler.java
>> (line 64) Streaming added /hd2/cassandra/data/table_xyz/table_xyz-3-Data.db
>>
>> Stack trace:
>>
>> "pool-1-thread-53" prio=10 tid=0x412f2800 nid=0x215c runnable
>> [0x7fd7cf217000]
>>java.lang.Thread.State: RUNNABLE
>> at java.net.SocketInputStream.socketRead0(Native Method)
>> at java.net.SocketInputStream.read(SocketInputStream.java:129)
>> at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>> at java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
>> at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>> - locked <0x7fd7e77e0520> (a java.io.BufferedInputStream)
>> at
>> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:126)
>> at
>> org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>> at
>> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:314)
>> at
>> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:262)
>> at
>> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:192)
>> at
>> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1154)
>> at
>> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>> at
>> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>> at java.lang.Thread.run(Thread.java:662)
>>
>> On Thu, Oct 28, 2010 at 12:44 PM, aaron morton 
>> wrote:
>>
>>> The best approach is to manually select the tokens; see the Load
>>> Balancing section at http://wiki.apache.org/cassandra/Operations. Also:
>>>
>>> Are there any log messages in the existing nodes or the new one which
>>> mention each other?
>>>
>>> Is this a production system? Is it still running?
>>>
>>> Sorry there is not a lot to go on; it sounds like you've done the right
>>> thing. I'm assuming things like the cluster name, seed list and port
>>> numbers are set correctly, since the new node got some data.
>>>
>>> You'll need to dig through the logs a bit more to see whether the
>>> bootstrap started and what the last message it logged was.
>>>
>>> Good Luck.
>>> Aaron
>>>
>>> On 27 Oct 2010, at 22:40, Dimitry Lvovsky wrote:
>>>
>>> Hi Aaron,
>>> Thanks for your reply.
>>>
>>> We still haven't solved this unfortunately.
>>>
>>> How did you start the bootstrap for the .18 node?
>>>
>>>
>>> Standard way: we set "AutoBootstrap" to true and added all the servers
>>> from the working ring as seeds.
>>>
>>>
>>> Was it the .18 or the .17 node you tried to add?
>>>
>>>
>>> We first tried adding .17: it streamed for a while, took on 50 GB of
>>> load, stopped streaming, but then didn't enter the ring.  We left it for
>>> a few days to see if it would come in, but no luck.  After that we ran
>>> decommission and removeToken (in that order).
>>> Since we couldn't get .17 in, we tried again with .18.  Before doing so we
>>> increas

Re: New nodes won't bootstrap on .66

2010-11-08 Thread Marc Canaleta
I have just solved the problem by removing the second keyspace (manually
moving its column families to the first). So it seems the problem appears
when having multiple keyspaces.
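As an aside, the manual token assignment mentioned in the quoted reply below boils down to evenly spacing tokens around the RandomPartitioner's 2^127 ring. A short Python sketch, which reproduces the tokens visible in the nodetool ring output of the decommission thread above:

```python
def balanced_tokens(num_nodes):
    """Evenly spaced initial tokens for the RandomPartitioner (2**127 ring)."""
    step = 2 ** 127 // num_nodes
    return [i * step for i in range(num_nodes)]

# For 3 nodes this yields 0, 567137...035242, 1134274...070484,
# matching the ring output shown earlier in this digest.
print(balanced_tokens(3))
```

Each node then gets one of these values as its initial token before it first joins, so the ring starts out balanced instead of relying on bootstrap to pick tokens.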

2010/11/8 Thibaut Britz 

> Hi,
>
> No, I didn't solve the problem. I reinitialized the cluster and manually
> gave each node a token before adding data. There are a few messages in
> multiple threads related to this, so I suspect it's quite common, and I
> hope it's gone in 0.7.
>
> Thibaut
>
> On Sun, Nov 7, 2010 at 6:57 PM, Marc Canaleta  wrote:
>
>> Hi,
>>
>> Did you solve this problem? I'm having the same problem. I'm trying to
>> bootstrap a third node on a 0.6.6 cluster. It has two keyspaces: Keyspace1
>> and KeyspaceLogs, both with replication factor 2.
>>
>> It starts bootstrapping, receives some streams but then keeps waiting for
>> streams. I enabled debug mode. These lines may be useful:
>>
>> DEBUG [main] 2010-11-07 17:39:50,052 BootStrapper.java (line 70) Beginning
>> bootstrap process
>> DEBUG [main] 2010-11-07 17:39:50,082 StorageService.java (line 160) Added
>> /10.204.93.16/Keyspace1 as a bootstrap source
>> ...
>> DEBUG [main] 2010-11-07 17:39:50,090 StorageService.java (line 160) Added
>> /10.204.93.16/KeyspaceLogs as a bootstrap source
>> ... (streaming messages)
>> DEBUG [Thread-56] 2010-11-07 17:45:51,706 StorageService.java (line 171)
>> Removed /10.204.93.16/Keyspace1 as a bootstrap source; remaining is [/
>> 10.204.93.16]
>> ...
>> (and never ends).
>>
>> It seems it is waiting for [/10.204.93.16] when it should be waiting for
>> /10.204.93.16/KeyspaceLogs.
>>
>> The third node is 64-bit, while the two existing nodes are 32-bit. Could
>> this be a problem?
>>
>> Thank you.
>>
>>
>> 2010/10/28 Dimitry Lvovsky 
>>
>>> Maybe your port 7000 (the storage port) is being blocked by iptables
>>> or some firewall, or maybe you have it bound to localhost instead of
>>> an IP address.
>>>
>>> Hope this helps,
>>> Dimitry.
>>>
>>>
>>>
>>> On Thu, Oct 28, 2010 at 5:35 PM, Thibaut Britz <
>>> thibaut.br...@trendiction.com> wrote:
>>>
>>>> Hi,
>>>>
>>>> I have the same problem with 0.6.5
>>>>
>>>> New nodes will hang forever in bootstrap mode (no streams are being
>>>> opened) and the receiver thread just waits for data forever:
>>>>
>>>>
>>>>  INFO [Thread-53] 2010-10-27 20:33:37,399 SSTableReader.java (line 120)
>>>> Sampling index for /hd2/cassandra/data/table_xyz/
>>>> table_xyz-3-Data.db
>>>>  INFO [Thread-53] 2010-10-27 20:33:37,444 StreamCompletionHandler.java
>>>> (line 64) Streaming added /hd2/cassandra/data/table_xyz/table_xyz-3-Data.db
>>>>
>>>> Stack trace:
>>>>
>>>> "pool-1-thread-53" prio=10 tid=0x412f2800 nid=0x215c runnable
>>>> [0x7fd7cf217000]
>>>>java.lang.Thread.State: RUNNABLE
>>>> at java.net.SocketInputStream.socketRead0(Native Method)
>>>> at java.net.SocketInputStream.read(SocketInputStream.java:129)
>>>> at
>>>> java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
>>>> at
>>>> java.io.BufferedInputStream.read1(BufferedInputStream.java:258)
>>>> at
>>>> java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>>>> - locked <0x7fd7e77e0520> (a java.io.BufferedInputStream)
>>>> at
>>>> org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:126)
>>>> at
>>>> org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
>>>> at
>>>> org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:314)
>>>> at
>>>> org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:262)
>>>> at
>>>> org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:192)
>>>> at
>>>> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:1154)
>>>> at
>>>> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:167)
>>>> at
>>>> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>>>> at
>&g