RE: Schematool

2011-12-11 Thread Michael Vaknine
Hi,

I have a keyspace called Index.

 

I am trying to create it when I create a new cluster from the script that
was created on the old cluster.

 

create keyspace Index

with placement_strategy = 'SimpleStrategy'

and strategy_options = {replication_factor : 3}

and durable_writes = true;

 

I get an error

 

[default@City] create keyspace Index

...   with placement_strategy = 'SimpleStrategy'

...   and strategy_options = {replication_factor : 3}

...   and durable_writes = true;

Syntax error at position 16: mismatched input 'Index' expecting set null

 

My questions are:

Is Index a reserved word?

How can I create this keyspace? I tried 'Index' and "Index" but I still get
the error and I am not able to create it.

 

Thanks

Michael

 

From: Alain RODRIGUEZ [mailto:arodr...@gmail.com] 
Sent: Thursday, December 08, 2011 11:52 AM
To: user@cassandra.apache.org
Subject: Re: Schematool

 

You should be able to use the CLI command "show schema yourkeyspace" if your
Cassandra is recent enough (>= 0.8 if I remember well; it is better if you
are on 0.8.7, because this command was fixed a couple of times in 0.8.6 and
0.8.7).

 

You can put the "show schema" command into a file and call it with
"cassandra-cli -h yourhost -f yourfile > 20111208schema", then open your
output file and remove the 2 or 3 useless lines that are written before
"create yourkeyspace".

 

Hope this will be helpful.

 

Alain

 

2011/12/8 Michael Vaknine 

Hi,

Since schematool has been removed from Cassandra is there a way to extract
the schema from a working cluster in order to create a new empty cluster?

Thanks
Michael



 



Cannot Start Cassandra 1.0.5 with JNA on the CLASSPATH

2011-12-11 Thread Caleb Rackliffe
Hi All,

I'm trying to start up Cassandra 1.0.5 on a CentOS 6 machine.  I installed JNA 
through yum and made a symbolic link to jna.jar in my Cassandra lib directory.  
When I run "bin/cassandra -f", I get the following:

 INFO 09:14:31,552 Logging initialized
 INFO 09:14:31,555 JVM vendor/version: Java HotSpot(TM) 64-Bit Server 
VM/1.6.0_29
 INFO 09:14:31,555 Heap size: 3405774848/3405774848
 INFO 09:14:31,555 Classpath: 
bin/../conf:bin/../build/classes/main:bin/../build/classes/thrift:bin/../lib/antlr-3.2.jar:bin/../lib/apache-cassandra-1.0.5.jar:bin/../lib/apache-cassandra-clientutil-1.0.5.jar:bin/../lib/apache-cassandra-thrift-1.0.5.jar:bin/../lib/avro-1.4.0-fixes.jar:bin/../lib/avro-1.4.0-sources-fixes.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/compress-lzf-0.8.4.jar:bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:bin/../lib/guava-r08.jar:bin/../lib/high-scale-lib-1.1.2.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jamm-0.2.5.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-0.6.jar:bin/../lib/log4j-1.2.16.jar:bin/../lib/servlet-api-2.5-20081211.jar:bin/../lib/slf4j-api-1.6.1.jar:bin/../lib/slf4j-log4j12-1.6.1.jar:bin/../lib/snakeyaml-1.6.jar:bin/../lib/snappy-java-1.0.4.1.jar:bin/../lib/jamm-0.2.5.jar
Killed

If I remove the symlink to JNA, it starts up just fine.

Also, I do have entries in my limits.conf for JNA:

root soft memlock unlimited
root hard memlock unlimited

Has anyone else seen this behavior?

Thanks,

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com

Re: Meaning of values in tpstats

2011-12-11 Thread Philippe
Wasn't that on the 1.0 branch? I'm still running 0.8.x.

@Peter: investigating a little more before answering. Thanks

2011/12/10 Edward Capriolo 

> There was a recent patch that fixed an issue where counters were hitting
> the same natural endpoint rather than being randomized across all of them.
>
>
> On Saturday, December 10, 2011, Peter Schuller <
> peter.schul...@infidyne.com> wrote:
> >> Pool Name    Active   Pending    Completed    Blocked   All time blocked
> >> ReadStage        27      2166   3565927301         0
> >
> > In general, "active" refers to work that is being executed right now,
> > "pending" refers to work that is waiting to be executed (go into
> > "active"), and completed is the cumulative all-time (since node start)
> > count of the number of tasks executed.
> >
> > With the slicing, I'm not sure off the top of my head. I'm sure
> > someone else can chime in. For e.g. a multi-get, they end up as
> > independent tasks.
> >
> > Typically having pending persistently above 0 for ReadStage or
> > MutationStage, especially if more than a handful, means that you are
> > having a performance issue - either a capacity problem or something
> > else, as incoming requests will have to wait to be serviced. Typically
> > the most common effect is that you are bottlenecking on I/O and
> > ReadStage pending shoots through the roof.
> >
> > There are exceptions. If you e.g. submit a really large multi-get of
> > 5000, that will naturally lead to a spike (and if all 5000 of them
> > need to go down to disk, the spike will survive for a bit). If you are
> > ONLY doing these queries, that's not a problem per se. But if you are
> > also expecting other requests to have low latency, then you want to
> > avoid it.
> >
> > In general, batching is good - but don't overdo it, especially for
> > reads, and especially if you're going to disk for the workload.
> >
> > --
> > / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
> >
>


RE: Cannot Start Cassandra 1.0.5 with JNA on the CLASSPATH

2011-12-11 Thread Michael Vaknine
Try 

root   -   memlock  14155776

on /etc/security/limits.conf
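
A quick way to verify what a fresh root session actually gets (a sketch; the
su round-trip is just to force limits.conf to be re-applied):

grep memlock /etc/security/limits.conf
su - root -c 'ulimit -l'   # prints max locked memory, in KB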

 

Michael

 

From: Caleb Rackliffe [mailto:ca...@steelhouse.com] 
Sent: Sunday, December 11, 2011 11:24 AM
To: user@cassandra.apache.org
Subject: Cannot Start Cassandra 1.0.5 with JNA on the CLASSPATH

 

Hi All,

 

I'm trying to start up Cassandra 1.0.5 on a CentOS 6 machine.  I installed
JNA through yum and made a symbolic link to jna.jar in my Cassandra lib
directory.  When I run "bin/cassandra -f", I get the following:

 

 INFO 09:14:31,552 Logging initialized

 INFO 09:14:31,555 JVM vendor/version: Java HotSpot(TM) 64-Bit Server
VM/1.6.0_29

 INFO 09:14:31,555 Heap size: 3405774848/3405774848

 INFO 09:14:31,555 Classpath:
bin/../conf:bin/../build/classes/main:bin/../build/classes/thrift:bin/../lib/antlr-3.2.jar:bin/../lib/apache-cassandra-1.0.5.jar:bin/../lib/apache-cassandra-clientutil-1.0.5.jar:bin/../lib/apache-cassandra-thrift-1.0.5.jar:bin/../lib/avro-1.4.0-fixes.jar:bin/../lib/avro-1.4.0-sources-fixes.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/compress-lzf-0.8.4.jar:bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:bin/../lib/guava-r08.jar:bin/../lib/high-scale-lib-1.1.2.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jamm-0.2.5.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-0.6.jar:bin/../lib/log4j-1.2.16.jar:bin/../lib/servlet-api-2.5-20081211.jar:bin/../lib/slf4j-api-1.6.1.jar:bin/../lib/slf4j-log4j12-1.6.1.jar:bin/../lib/snakeyaml-1.6.jar:bin/../lib/snappy-java-1.0.4.1.jar:bin/../lib/jamm-0.2.5.jar

Killed

 

If I remove the symlink to JNA, it starts up just fine.

 

Also, I do have entries in my limits.conf for JNA:

 

root soft memlock unlimited

root hard memlock unlimited

 

Has anyone else seen this behavior?

 

Thanks,

 

Caleb Rackliffe | Software Developer 

M 949.981.0159 | ca...@steelhouse.com




Re: Schematool

2011-12-11 Thread Alain RODRIGUEZ
This is quite a different subject, and the question was already asked a
few days ago (December 1):
http://www.mail-archive.com/user@cassandra.apache.org/msg19083.html

Anyway, you can test it yourself by changing the name of the keyspace. I
don't know more about it myself.

Alain

2011/12/11 Michael Vaknine 

> Hi,
>
> I have a keyspace called Index.
>
>
> I am trying to create it when I create a new cluster from the script that
> was created on the old cluster.
>
>
> create keyspace Index
>
> with placement_strategy = 'SimpleStrategy'
>
> and strategy_options = {replication_factor : 3}
>
> and durable_writes = true;
>
>
> I get an error
>
>
> [default@City] create keyspace Index
>
> ...   with placement_strategy = 'SimpleStrategy'
>
> ...   and strategy_options = {replication_factor : 3}
>
> ...   and durable_writes = true;
>
> Syntax error at position 16: mismatched input 'Index' expecting set null
>
>
> My question is
>
> Is Index a reserved word?
>
> How can I create this keyspace? I tried 'Index' and "Index" but I still
> get the error and I am not able to create it.
>
>
> Thanks
>
> Michael
>
>
> From: Alain RODRIGUEZ [mailto:arodr...@gmail.com]
> Sent: Thursday, December 08, 2011 11:52 AM
> To: user@cassandra.apache.org
> Subject: Re: Schematool
>
>
> You should be able to use the CLI command "show schema yourkeyspace" if your
> Cassandra is recent enough (>= 0.8 if I remember well; it is better if you
> are on 0.8.7, because this command was fixed a couple of times in 0.8.6 and
> 0.8.7).
>
>
> You can put the "show schema" command into a file and call it with
> "cassandra-cli -h yourhost -f yourfile > 20111208schema", then open your
> output file and remove the 2 or 3 useless lines that are written before
> "create yourkeyspace".
>
>
> Hope this will be helpful.
>
>
> Alain
>
>
> 2011/12/8 Michael Vaknine 
>
> Hi,
>
> Since schematool has been removed from Cassandra is there a way to extract
> the schema from a working cluster in order to create a new empty cluster?
>
> Thanks
> Michael
>
> 
>
>


Re: memory leaks in 1.0.5

2011-12-11 Thread Radim Kolar



Possible, but unlikely.  See
https://issues.apache.org/jira/browse/CASSANDRA-3537 for an example of
a  "memory leak" that wasn't.
I didn't get the point. I have a slowly increasing memory load on the node,
and no flushable memtables. How could it not be a memory leak? Also, running
nodetool upgradesstables fails with OOM.
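
If it is a real leak, a heap dump would show what is holding the memory; a
sketch using standard HotSpot flags in conf/cassandra-env.sh (the dump path
is an assumption):

JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"
JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=/var/tmp"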


https://rapidshare.com/files/2558346759/system.log.4.bz2


Re: Atomic Operations in Cassandra

2011-12-11 Thread Boris Yen
Hi Sylvain,

"Writes under the same row key are atomic (*even across column families*)
in the
sense that they are either all persisted or none are."

Is this a new feature for 1.x, or does it also apply to previous versions of
Cassandra?

Boris

On Thu, Dec 8, 2011 at 6:40 PM, Sylvain Lebresne wrote:

> On Thu, Dec 8, 2011 at 12:57 AM, Christof Bornhoevd
>  wrote:
> > Hi All,
> >
> > I'm using Cassandra 1.0.3 (with Hector 0.7). What is the granularity of
> > atomic read and write operations with Cassandra. I.e. is the insert or
> > update of an individual column an atomic operation (in the sense that it
> > either fails or persists completely), or is the insert or update of an
> > entire row in a ColumnFamily atomic?
> >
> > Similarly, if I read multiple columns of the same row, could the read
> > operation interfere with a concurrent write operation on these same
> columns
> > in a way that I might see some old and some new column values?
>
> Writes under the same row key are atomic (even across column families) in
> the
> sense that they are either all persisted or none are. Note however that it
> is
> possible for an insertion to fail for the client (say you get a
> TimeoutException) but
> for the insertion to still be persisted.
> There is however no isolation currently. It is possible for a read to
> see a state
> where only part of an insertion (even within the same row key) has been
> applied.
> (CASSANDRA-2893 is open to try to add isolation).
>
> --
> Sylvain
>
> >
> > Cheers and thanks a lot for any kind help on this!
> > Chris
>


Moving existing nodes to a different network.

2011-12-11 Thread Henrik Schröder
I have an existing cluster of four Cassandra nodes. The machines have both
an internal and an external IP, and originally I set them up to use the
external network. A little while later I moved them to the internal network
by bringing all machines down, changing the config, and bringing them up
again. In the logs I found they all said "Changing ownership of token XXX",
and nodetool ring reported that the cluster consisted of those four
machines on their internal ips. After that, as part of a cleanup process, I
moved the tokens on all machines to make sure the cluster was balanced, and
it also worked perfectly.

However, now I have to temporarily move the cluster back to the external
network for a little while. I tried doing the same thing as last time,
bringing all nodes down, changing the config (rpc address, gossip address,
list of seeds) and bringing them up again, but this resulted in a very
confused cluster. When I ran nodetool ring, it reported eight nodes, the
four internal ips were marked as down, and the four external were marked as
up, but with the token they had when they previously used that ip. Checking
the logs, there was no token ownership change, all nodes picked the saved
token they had when they last used the external ip, and not the token they
should have, the one I moved each server to when on the internal ip.

I immediately moved all servers back to the internal IP, and then nodetool
reported the same as before, a cluster of four machines, all up, and all on
the token they're supposed to have. No mention of the external ips or the
old tokens they had there.

How do I reset this data? Where is it stored? Why does it store all of this
when nodetool doesn't report it? Why does a node store several saved
tokens? How do I change their ip without losing any data and without having
to do removetoken or similar?

One thought I have is to bring down one node, delete the system keyspace,
and bring it back up, at which point it would only use what's in the
config, but fetch the schema from the other nodes. Or would it also fetch
the old information of what token it had when it was on the external ip? Or
would something else go wrong?
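
For concreteness, that idea as commands (paths assume a default install;
whether the node would still re-announce its old saved token is exactly the
part I'm unsure about):

# stop cassandra on the node, then:
mv /var/lib/cassandra/data/system /var/lib/cassandra/data/system.bak
# fix listen_address/rpc_address/seeds in cassandra.yaml, then restart:
bin/cassandra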


/Henrik


Re: Moving existing nodes to a different network.

2011-12-11 Thread Henrik Schröder
I'm running Cassandra 1.0.1 if that makes any difference.


/Henrik

On Sun, Dec 11, 2011 at 13:16, Henrik Schröder  wrote:

> I have an existing cluster of four Cassandra nodes. The machines have both
> an internal and an external IP, and originally I set them up to use the
> external network. A little while later I moved them to the internal network
> by bringing all machines down, changing the config, and bringing them up
> again. In the logs I found they all said "Changing ownership of token XXX",
> and nodetool ring reported that the cluster consisted of those four
> machines on their internal ips. After that, as part of a cleanup process, I
> moved the tokens on all machines to make sure the cluster was balanced, and
> it also worked perfectly.
>
> However, now I have to temporarily move the cluster back to the external
> network for a little while. I tried doing the same thing as last time,
> bringing all nodes down, changing the config (rpc address, gossip address,
> list of seeds) and bringing them up again, but this resulted in a very
> confused cluster. When I ran nodetool ring, it reported eight nodes, the
> four internal ips were marked as down, and the four external were marked as
> up, but with the token they had when they previously used that ip. Checking
> the logs, there was no token ownership change, all nodes picked the saved
> token they had when they last used the external ip, and not the token they
> should have, the one I moved each server to when on the internal ip.
>
> I immediately moved all servers back to the internal IP, and then nodetool
> reported the same as before, a cluster of four machines, all up, and all on
> the token they're supposed to have. No mention of the external ips or the
> old tokens they had there.
>
> How do I reset this data? Where is it stored? Why does it store all of
> this when nodetool doesn't report it? Why does a node store several saved
> tokens? How do I change their ip without losing any data and without having
> to do removetoken or similar?
>
> One thought I have is to bring down one node, delete the system keyspace,
> and bring it back up, at which point it would only use what's in the
> config, but fetch the schema from the other nodes. Or would it also fetch
> the old information of what token it had when it was on the external ip? Or
> would something else go wrong?
>
>
> /Henrik
>


Re: Meaning of values in tpstats

2011-12-11 Thread Philippe
Answer below


> > Pool Name    Active   Pending    Completed    Blocked   All time blocked
> > ReadStage        27      2166   3565927301         0
> With the slicing, I'm not sure off the top of my head. I'm sure
> someone else can chime in. For e.g. a multi-get, they end up as
> independent tasks.
>
So if I multiget 10 keys, they are fetched in //, consolidated by the
coordinator and then sent back?
Can anyone confirm for multigetslice? I want to know whether batching is
really counterproductive.

> Typically having pending persistently above 0 for ReadStage or
> MutationStage, especially if more than a handful, means that you are
> having a performance issue - either a capacity problem or something
> else, as incoming requests will have to wait to be serviced. Typically
> the most common effect is that you are bottlenecking on I/O and
> ReadStage pending shoots through the roof.

In general, batching is good - but don't overdo it, especially for
> reads, and especially if you're going to disk for the workload.
>
 Agreed, I followed someone's suggestion some time ago to reduce my batch
sizes and it has helped tremendously. I'm now doing multigetslices in
batches of 512 instead of 5000 and I find I no longer have Pendings up so
high. The most I see now is a couple hundred.


Re: CPU bound workload

2011-12-11 Thread Philippe
Hi Peter,
I'm going to mix the response to your email along with my other email from
yesterday since they pertain to the same issue.
Sorry this is a little long, but I'm stumped and I'm trying to describe
what I've investigated.

In a nutshell, in case someone has encountered this and won't read it to
the end: a write-heavy process is causing the ring to appear to "freeze" (=>
utilization = 0%). Its Hector speed4j logs indicate failures and successes at
max=38s while other read/write processes are all indicating max=28s. It
looks like I've got a magic number I can't figure out.

You do say "nodes handling the requests". Two things to always keep in
> mind is to (1) spread the requests evenly across all members of the
> cluster, and (2) if you are doing a lot of work per row key, spread it
> around and be concurrent so that you're not hitting a single row at a
> time, which will be under the responsibility of a single set of RF
> nodes (you want to put load on the entire cluster evently if you want
> to maximize throughput).
>
I'm using Hector to connect to the cluster along with autoDiscover=true.
Furthermore, I see in my logs that updates do get sent to multiple nodes so
1) is ok.
Regarding 2), I may be running into this since data updates are very
localized by design. I've distributed the keys per storage load but I'm
going to have to distribute them by read/write load since the workload is
all but random and I'm using BOP. However, I never see an I/O bottleneck
when using iostat, see below.


> For starters, what *is* the throughput? How many counter mutations are
> you submitting per second?
>
I've got two processes doing writes in parallel. The one we are currently
discussing ("Process A") only writes while the other one ("Process B")
reads 2 to 4x more data than it writes.

Process A typically looks like this (numbers come from Hector). Each line
below is one cassandra batch ie one Hector Mutator.execute():
15:15:53 Wrote 86 cassandra mutations using host X.Y.Z.97(X.Y.Z.97):9160
(153 usecs)
15:15:53 Wrote 90 cassandra mutations using host X.Y.Z.96(X.Y.Z.96):9160
(97 usecs)
15:15:54 Wrote 85 cassandra mutations using host X.Y.Z.95(X.Y.Z.95):9160
(754 usecs)
15:15:54 Wrote 81 cassandra mutations using host X.Y.Z.109(X.Y.Z.109):9160
(561 usecs)
15:15:54 Wrote 86 cassandra mutations using host
176.31.226.128(176.31.226.128):9160 (130 usecs)
15:15:54 Wrote 73 cassandra mutations using host X.Y.Z.97(X.Y.Z.97):9160
(97 usecs)
15:15:54 Wrote 82 cassandra mutations using host X.Y.Z.96(X.Y.Z.96):9160
(48 usecs)
15:15:56 Wrote 108 cassandra mutations using host X.Y.Z.95(X.Y.Z.95):9160
(1653 usecs)
15:15:56 Wrote 84 cassandra mutations using host X.Y.Z.109(X.Y.Z.109):9160
(23 usecs)
I'm pretty sure those are milliseconds, not microseconds as the Hector docs
say (see the last two lines & timestamps), which would amount to 500 to 1000
mutations per second, with a min at 65 and a max at 3652.
Clusterwide, opscenter is reporting 10 write requests per second in the
20mn graph, but that can't be right.

The exact number is somewhere in the thousands of keys read per second, but
my problem with writes is so big that the actual number doesn't matter; see
below.


What's really puzzling is this, found in the logs created by Hector for
Process B:
Tag             Avg(ms)    Min       Max        Std Dev    95th       Count
WRITE.success_  1709.64    0.83      28982.61   6100.55    21052.93   267
READ.success_   262.64     17.25     1343.53    191.79     610.99     637
(+hardly ever any failures)

At the same time, for process A, I see this
15:29:07 Tag             Avg(ms)    Min       Max        Std Dev   95th      Count
15:29:07 WRITE.success_  584.76     13.23     38042.17   4242.24   334.84    79
15:29:07 WRITE.fail_     38008.16   38008.16  38008.16   0.00      38008.16  1
(failures every minute)

So there is at least one WRITE which is very, very long: 28s for Process B
and 38s for Process A. In fact, it looks like a magic timeout number,
because I see those two numbers all the time in the logs.
WRITE.success_  1603.61    1.11      28829.06   6069.97   21987.07   152
WRITE.success_  307.56     0.81      29958.18   2879.91   39.98      918
WRITE.success_  1664.64    0.88      29953.52   6127.34   20023.88   276
However, I can't link it to anything. My Hector failover timeout is 2s and
everything else is just default install values. Even if Hector was
backing off multiple times until it worked, why would I always get the same
28s/38s values...

When I get a log like these, there always is a "cluster-freeze" during the
preceding minute. By "cluster-freeze", I mean that a couple of nodes go to
0% utilization (no cpu, no system, no io)

Once I noticed this, I shut down Process A and watched Process B's
performance logs. It's all back to normal now:
Tag

Re: Atomic Operations in Cassandra

2011-12-11 Thread Sylvain Lebresne
On Sun, Dec 11, 2011 at 12:01 PM, Boris Yen  wrote:
> Hi Sylvain,
>
> "Writes under the same row key are atomic (even across column families) in
> the
> sense that they are either all persisted or none are."
>
> Is this a new feature for 1.x, or does it also apply to previous versions of
> Cassandra?

It applies to previous versions of Cassandra as well.

--
Sylvain

>
> Boris
>
>
> On Thu, Dec 8, 2011 at 6:40 PM, Sylvain Lebresne 
> wrote:
>>
>> On Thu, Dec 8, 2011 at 12:57 AM, Christof Bornhoevd
>>  wrote:
>> > Hi All,
>> >
>> > I'm using Cassandra 1.0.3 (with Hector 0.7). What is the granularity of
>> > atomic read and write operations with Cassandra. I.e. is the insert or
>> > update of an individual column an atomic operation (in the sense that it
>> > either fails or persists completely), or is the insert or update of an
>> > entire row in a ColumnFamily atomic?
>> >
>> > Similarly, if I read multiple columns of the same row, could the read
>> > operation interfere with a concurrent write operation on these same
>> > columns
>> > in a way that I might see some old and some new column values?
>>
>> Writes under the same row key are atomic (even across column families) in
>> the
>> sense that they are either all persisted or none are. Note however that it
>> is
> >> possible for an insertion to fail for the client (say you get a
>> TimeoutException) but
>> for the insertion to still be persisted.
>> There is however no isolation currently. It is possible for a read to
>> see a state
>> where only part of an insertion (even within the same row key) has been
>> applied.
>> (CASSANDRA-2893 is open to try to add isolation).
>>
>> --
>> Sylvain
>>
>> >
>> > Cheers and thanks a lot for any kind help on this!
>> > Chris
>
>


Re: Cannot Start Cassandra 1.0.5 with JNA on the CLASSPATH

2011-12-11 Thread Michael Vaknine
I tried my configuration, which is working, but I am on Ubuntu with
Cassandra 1.0.3, running Cassandra under the user 'cassandra'.
I did not try it with 1.0.5, because I was not able to work with that
version and I am waiting for 1.0.6.


On Sun, Dec 11, 2011 at 7:50 PM, Caleb Rackliffe wrote:

> I changed the value in limits.conf as you suggested, and that seems to have
> no effect.  Were you thinking that the OS wasn't respecting the "unlimited"?
>
>
> Caleb Rackliffe | Software Developer
> M 949.981.0159 | ca...@steelhouse.com
>
> From: Michael Vaknine 
> Reply-To: "user@cassandra.apache.org" 
> Date: Sun, 11 Dec 2011 05:15:32 -0500
> To: "user@cassandra.apache.org" 
> Subject: RE: Cannot Start Cassandra 1.0.5 with JNA on the CLASSPATH
>
> Try 
>
> root   -   memlock  14155776
>
> on /etc/security/limits.conf
>
>
> Michael
>
>
> From: Caleb Rackliffe [mailto:ca...@steelhouse.com]
>
> Sent: Sunday, December 11, 2011 11:24 AM
> To: user@cassandra.apache.org
> Subject: Cannot Start Cassandra 1.0.5 with JNA on the CLASSPATH
>
>
> Hi All,
>
>
> I'm trying to start up Cassandra 1.0.5 on a CentOS 6 machine.  I
> installed JNA through yum and made a symbolic link to jna.jar in my
> Cassandra lib directory.  When I run "bin/cassandra -f", I get the
> following:
>
>
>  INFO 09:14:31,552 Logging initialized
>
>  INFO 09:14:31,555 JVM vendor/version: Java HotSpot(TM) 64-Bit Server
> VM/1.6.0_29
>
>  INFO 09:14:31,555 Heap size: 3405774848/3405774848
>
>  INFO 09:14:31,555 Classpath:
> bin/../conf:bin/../build/classes/main:bin/../build/classes/thrift:bin/../lib/antlr-3.2.jar:bin/../lib/apache-cassandra-1.0.5.jar:bin/../lib/apache-cassandra-clientutil-1.0.5.jar:bin/../lib/apache-cassandra-thrift-1.0.5.jar:bin/../lib/avro-1.4.0-fixes.jar:bin/../lib/avro-1.4.0-sources-fixes.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/compress-lzf-0.8.4.jar:bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:bin/../lib/guava-r08.jar:bin/../lib/high-scale-lib-1.1.2.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jamm-0.2.5.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-0.6.jar:bin/../lib/log4j-1.2.16.jar:bin/../lib/servlet-api-2.5-20081211.jar:bin/../lib/slf4j-api-1.6.1.jar:bin/../lib/slf4j-log4j12-1.6.1.jar:bin/../lib/snakeyaml-1.6.jar:bin/../lib/snappy-java-1.0.4.1.jar:bin/../lib/jamm-0.2.5.jar
> 
>
> Killed
>
>
> If I remove the symlink to JNA, it starts up just fine.
>
>
> Also, I do have entries in my limits.conf for JNA:
>
>
> root soft memlock unlimited
>
> root hard memlock unlimited
>
>
> Has anyone else seen this behavior?
>
>
> Thanks,
>
>
> Caleb Rackliffe | Software Developer
>
> M 949.981.0159 | ca...@steelhouse.com
>
> 
>


Re: Cannot Start Cassandra 1.0.5 with JNA on the CLASSPATH

2011-12-11 Thread Brandon Williams
On Sun, Dec 11, 2011 at 3:23 AM, Caleb Rackliffe wrote:

> Hi All,
>
> I'm trying to start up Cassandra 1.0.5 on a CentOS 6 machine.  I
> installed JNA through yum and made a symbolic link to jna.jar in my
> Cassandra lib directory.  When I run "bin/cassandra -f", I get the
> following:
>
>  INFO 09:14:31,552 Logging initialized
>  INFO 09:14:31,555 JVM vendor/version: Java HotSpot(TM) 64-Bit Server
> VM/1.6.0_29
>  INFO 09:14:31,555 Heap size: 3405774848/3405774848
>  INFO 09:14:31,555 Classpath:
> bin/../conf:bin/../build/classes/main:bin/../build/classes/thrift:bin/../lib/antlr-3.2.jar:bin/../lib/apache-cassandra-1.0.5.jar:bin/../lib/apache-cassandra-clientutil-1.0.5.jar:bin/../lib/apache-cassandra-thrift-1.0.5.jar:bin/../lib/avro-1.4.0-fixes.jar:bin/../lib/avro-1.4.0-sources-fixes.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/compress-lzf-0.8.4.jar:bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:bin/../lib/guava-r08.jar:bin/../lib/high-scale-lib-1.1.2.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jamm-0.2.5.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-0.6.jar:bin/../lib/log4j-1.2.16.jar:bin/../lib/servlet-api-2.5-20081211.jar:bin/../lib/slf4j-api-1.6.1.jar:bin/../lib/slf4j-log4j12-1.6.1.jar:bin/../lib/snakeyaml-1.6.jar:bin/../lib/snappy-java-1.0.4.1.jar:bin/../lib/jamm-0.2.5.jar
> Killed
>

The 'Killed' line is your problem: the OOM killer decided to kill java.
You can confirm this in dmesg.  You either need more memory or less heap.
The reason it's happening instantly with JNA is that all the memory is
being allocated up front, but without it you still have a timebomb waiting
to go off.
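
For example (a sketch; the heap values are illustrative and belong in
conf/cassandra-env.sh):

dmesg | grep -i 'killed process'   # the OOM killer logs which pid it killed
# then either add RAM or shrink the heap, e.g.:
#   MAX_HEAP_SIZE="2G"
#   HEAP_NEWSIZE="400M"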

-Brandon


Re: CPU bound workload

2011-12-11 Thread Philippe
Interesting development: I changed the maximum size of the batches in
"Process A" to get them to go from about 90 per execute() to about 35. All
the weird 28s/38s maximum execution times are gone, all timeouts are gone,
and everything is zipping along just fine. So the moral of the story for me
is: only batch if you gain something, because it might break stuff.

Given this workaround, can anyone explain to me why this was happening?

2011/12/11 Philippe 

> Hi Peter,
> I'm going to mix the response to your email along with my other email from
> yesterday since they pertain to the same issue.
> Sorry this is a little long, but I'm stumped and I'm trying to describe
> what I've investigated.
>
> In a nutshell, in case someone has encountered this and won't read it to
> the end: a write-heavy process is causing the ring to appear to "freeze" (=>
> utilization = 0%). Its Hector speed4j logs indicate failures and successes at
> max=38s while other read/write processes are all indicating max=28s. It
> looks like I've got a magic number I can't figure out.
>
> You do say "nodes handling the requests". Two things to always keep in
>> mind is to (1) spread the requests evenly across all members of the
>> cluster, and (2) if you are doing a lot of work per row key, spread it
>> around and be concurrent so that you're not hitting a single row at a
>> time, which will be under the responsibility of a single set of RF
>> nodes (you want to put load on the entire cluster evently if you want
>> to maximize throughput).
>>
> I'm using Hector to connect to the cluster along with autoDiscover=true.
> Furthermore, I see in my logs that updates do get sent to multiple nodes so
> 1) is ok.
> Regarding 2), I may be running into this since data updates are very
> localized by design. I've distributed the keys per storage load but I'm
> going to have to distribute them by read/write load since the workload is
> all but random and I'm using BOP. However, I never see an I/O bottleneck
> when using iostat, see below.
>
>
>> For starters, what *is* the throughput? How many counter mutations are
>> you submitting per second?
>>
> I've got two processes doing writes in parallel. The one we are currently
> discussing ("Process A") only writes while the other one ("Process B")
> reads 2 to 4x more data than it writes.
>
> Process A typically looks like this (numbers come from Hector). Each line
> below is one cassandra batch ie one Hector Mutator.execute():
> 15:15:53 Wrote 86 cassandra mutations using host X.Y.Z.97(X.Y.Z.97):9160
> (153 usecs)
> 15:15:53 Wrote 90 cassandra mutations using host X.Y.Z.96(X.Y.Z.96):9160
> (97 usecs)
> 15:15:54 Wrote 85 cassandra mutations using host X.Y.Z.95(X.Y.Z.95):9160
> (754 usecs)
> 15:15:54 Wrote 81 cassandra mutations using host X.Y.Z.109(X.Y.Z.109):9160
> (561 usecs)
> 15:15:54 Wrote 86 cassandra mutations using host
> 176.31.226.128(176.31.226.128):9160 (130 usecs)
> 15:15:54 Wrote 73 cassandra mutations using host X.Y.Z.97(X.Y.Z.97):9160
> (97 usecs)
> 15:15:54 Wrote 82 cassandra mutations using host X.Y.Z.96(X.Y.Z.96):9160
> (48 usecs)
> 15:15:56 Wrote 108 cassandra mutations using host X.Y.Z.95(X.Y.Z.95):9160
> (1653 usecs)
> 15:15:56 Wrote 84 cassandra mutations using host X.Y.Z.109(X.Y.Z.109):9160
> (23 usecs)
> I'm pretty sure those are milliseconds, not microseconds as the Hector
> docs say (see the last two lines & timestamps), which would amount to 500 to
> 1000 mutations per second, with a min at 65 and a max at 3652.
> Clusterwide, opscenter is reporting 10 write requests per second in the
> 20mn graph, but that can't be right.
>
> The exact number is somewhere in the thousands of keys read per second, but
> my problem with writes is so big that the actual number doesn't matter; see
> below.
>
>
> What's really puzzling is this, found in the logs created by Hector for
> Process B:
> Tag             Avg(ms)    Min       Max        Std Dev    95th       Count
> WRITE.success_  1709.64    0.83      28982.61   6100.55    21052.93   267
> READ.success_   262.64     17.25     1343.53    191.79     610.99     637
> (+hardly ever any failures)
>
> At the same time, for process A, I see this
> 15:29:07 Tag             Avg(ms)    Min       Max        Std Dev   95th      Count
> 15:29:07 WRITE.success_  584.76     13.23     38042.17   4242.24   334.84    79
> 15:29:07 WRITE.fail_     38008.16   38008.16  38008.16   0.00      38008.16  1
> (failures every minute)
>
> So there is at least one WRITE which is very, very long: 28s for Process B
> and 38s for Process A. In fact, it looks like a magic timeout number,
> because I see those two numbers all the time in the logs.
> WRITE.success_  1603.61    1.11      28829.06   6069.97   21987.07   152
> WRITE.success_  307.56     0.81      29958.18   2879.91   39.98      918
> WRITE.success_ 

Re: 1.0.3 CLI oddities

2011-12-11 Thread Chris Burroughs
Sounds like https://issues.apache.org/jira/browse/CASSANDRA-3558 and the
other tickets referenced there.

On 11/28/2011 05:05 AM, Janne Jalkanen wrote:
> Hi!
> 
> (Asked this on IRC too, but didn't get anyone to respond, so here goes...)
> 
> Is it just me, or are these real bugs? 
> 
> On 1.0.3, from CLI: "update column family XXX with gc_grace = 36000;" just 
> says "null" with nothing logged.  Previous value is the default.
> 
> Also, on 1.0.3, "update column family XXX with 
> compression_options={sstable_compression:SnappyCompressor,chunk_length_kb:64};"
>  returns "Internal error processing system_update_column_family" and log says 
> "Invalid negative or null chunk_length_kb" (stack trace below)
> 
> Setting the compression options worked on 1.0.0 when I was testing (though my 
> 64 kB became 64 MB, but I believe this was fixed in 1.0.3.)
> 
> Did the syntax change between 1.0.0 and 1.0.3? Or am I doing something wrong? 
> 
> The database was upgraded from 0.6.13 to 1.0.0, then scrubbed, then 
> compression options set to some CFs, then upgraded to 1.0.3 and trying to set 
> compression on other CFs.
> 
> Stack trace:
> 
> ERROR [pool-2-thread-68] 2011-11-28 09:59:26,434 Cassandra.java (line 4038) 
> Internal error processing system_update_column_family
> java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
> java.io.IOException: org.apache.cassandra.config.ConfigurationException: 
> Invalid negative or null chunk_length_kb
>   at 
> org.apache.cassandra.thrift.CassandraServer.applyMigrationOnStage(CassandraServer.java:898)
>   at 
> org.apache.cassandra.thrift.CassandraServer.system_update_column_family(CassandraServer.java:1089)
>   at 
> org.apache.cassandra.thrift.Cassandra$Processor$system_update_column_family.process(Cassandra.java:4032)
>   at 
> org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
>   at 
> org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:680)
> Caused by: java.util.concurrent.ExecutionException: java.io.IOException: 
> org.apache.cassandra.config.ConfigurationException: Invalid negative or null 
> chunk_length_kb
>   at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>   at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>   at 
> org.apache.cassandra.thrift.CassandraServer.applyMigrationOnStage(CassandraServer.java:890)
>   ... 7 more
> Caused by: java.io.IOException: 
> org.apache.cassandra.config.ConfigurationException: Invalid negative or null 
> chunk_length_kb
>   at 
> org.apache.cassandra.db.migration.UpdateColumnFamily.applyModels(UpdateColumnFamily.java:78)
>   at org.apache.cassandra.db.migration.Migration.apply(Migration.java:156)
>   at 
> org.apache.cassandra.thrift.CassandraServer$2.call(CassandraServer.java:883)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   ... 3 more
> Caused by: org.apache.cassandra.config.ConfigurationException: Invalid 
> negative or null chunk_length_kb
>   at 
> org.apache.cassandra.io.compress.CompressionParameters.validateChunkLength(CompressionParameters.java:167)
>   at 
> org.apache.cassandra.io.compress.CompressionParameters.create(CompressionParameters.java:52)
>   at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:796)
>   at 
> org.apache.cassandra.db.migration.UpdateColumnFamily.applyModels(UpdateColumnFamily.java:74)
>   ... 7 more
> ERROR [MigrationStage:1] 2011-11-28 09:59:26,434 AbstractCassandraDaemon.java 
> (line 133) Fatal exception in thread Thread[MigrationStage:1,5,main]
> java.io.IOException: org.apache.cassandra.config.ConfigurationException: 
> Invalid negative or null chunk_length_kb
>   at 
> org.apache.cassandra.db.migration.UpdateColumnFamily.applyModels(UpdateColumnFamily.java:78)
>   at org.apache.cassandra.db.migration.Migration.apply(Migration.java:156)
>   at 
> org.apache.cassandra.thrift.CassandraServer$2.call(CassandraServer.java:883)
>   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>   at java.lang.Thread.run(Thread.java:680)
> Caused by: org.apache.cassandra.config.ConfigurationException: Invalid 
> negative or null chunk_length_kb
>   at 
> org.apache.cassandra.io.compress.CompressionParameters.validateChunkLength(CompressionParameters.java:167)
>   at 

cassandra in production environment

2011-12-11 Thread Ramesh Natarajan
Hi,

 We are currently testing Cassandra in a RHEL 6.1 64-bit environment
running on ESXi 5.0, and are experiencing issues with data file
corruption. If you are using Linux for your production environment, can you
please share which OS/version you are using?

thanks
Ramesh


Re: cassandra in production environment

2011-12-11 Thread Peter Schuller
>  We are currently testing Cassandra in a RHEL 6.1 64-bit environment
> running on ESXi 5.0, and are experiencing issues with data file
> corruption. If you are using Linux for your production environment, can you
> please share which OS/version you are using?

It would probably be a good idea if you could be a bit more specific
about the nature of the corruption and the observations made, and the
version of Cassandra you are using.

As for production envs; lots of people are bound to use various
environments in production; I suppose the only interesting bit would
be if someone uses RHEL 6.1 specifically? I mean I can say that I've
run Cassandra on Debian Squeeze in production, but that doesn't really
help you ;)

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


read/write counts

2011-12-11 Thread Feng Qu
Hello,

When I use nodetool cfstats, I see read/write counts for both the keyspace and 
the column family. I assume both numbers are counted across the ring, but I saw 
different read/write counts on one node compared to the other 7 nodes.

node 1,2,4-8:
Keyspace: ks
        Read Count: 44285565
        Read Latency: 1.4984792287509485 ms.
        Write Count: 161096052
        Write Latency: 0.006321300412750028 ms.
        Pending Tasks: 0
                Column Family: Events
                Read Count: 44219534

                Read Latency: NaN ms.
                Write Count: 43245679
                Write Latency: 0.012 ms.

Node 3:
Keyspace: ks

        Read Count: 44190641
        Read Latency: 1.738313580810018 ms.
        Write Count: 136735281
        Write Latency: 0.007389342133285994 ms.
                Column Family: Events

                Read Count: 44125278

                Read Latency: NaN ms.
                Write Count: 36991005
                Write Latency: 0.010 ms.

So my questions are:
1) Are KS level counts and CF level counts for whole cluster or just for an 
individual node?
2) Why do I see different counts from different nodes if the counts are at the KS level?
 
Feng

Consistency for node shutdown and startup

2011-12-11 Thread Jason Tang
Hi

   Here is the case: we have only two nodes, which share the data (write
one, read one).

   time   node One         node Two
    |     stopped          continues working and updates the data
    |     stopped          stopped
    |     starts working   stopped
    |     updates data     stopped
    v     started          starts working

 What happens to the conflicting data written while the two nodes were
online separately? How is it synchronized between the two nodes when they
are both online again?

BRs
//Tang Weiqiang


Re: Consistency for node shutdown and startup

2011-12-11 Thread Peter Schuller
> What happens to the conflicting data written while the two nodes were online
> separately? How is it synchronized between the two nodes when they are both
> online again?

Briefly, it's based on timestamp conflict resolution. This may be a
good resource:

   http://www.datastax.com/docs/1.0/dml/about_writes#about-transactions

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: read/write counts

2011-12-11 Thread Peter Schuller
> 1) Are KS level counts and CF level counts for whole cluster or just for an
> individual node?

Individual node.

Also note that the CF level counts will refer to local reads/writes
submitted to the node, while the statistics you get from StorageProxy
(in JMX) are for requests routed. In general, you will see a
magnification by a factor of RF on the local statistics (in aggregate)
relative to the StorageProxy stats.
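
A quick way to compare the per-node CF-level counts across the ring (a
sketch; host names are illustrative):

for h in node1 node2 node3; do
  echo "== $h =="
  nodetool -h "$h" cfstats | grep -E 'Read Count|Write Count'
done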

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: Meaning of values in tpstats

2011-12-11 Thread Peter Schuller
>> With the slicing, I'm not sure off the top of my head. I'm sure
>> someone else can chime in. For e.g. a multi-get, they end up as
>> independent tasks.
>
> So if I multiget 10 keys, they are fetched in //, consolidated by the
> coodinator and then sent back ?

Took me a while to figure out that // == "parallel" :)

I'm pretty sure (but not entirely, I'd have to check the code) that
the request is forwarded as one request to the necessary node(s); what
I was saying rather was that the individual gets get queued up as
individual tasks to be executed internally in the different stages.
That does lead to parallelism locally on the node (subject to the
concurrent reader setting).


>  Agreed, I followed someone's suggestion some time ago to reduce my batch
> sizes and it has helped tremendously. I'm now doing multigetslices in
> batches of 512 instead of 5000 and I find I no longer have Pendings up so
> high. The most I see now is a couple hundred.

In general, the best balance will depend on the situation. For example
the benefit of batching increases as the latency to the cluster (and
within it) increases, and the negative effects increase as you have
higher demands of low latency on other traffic to the cluster.
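
One way to watch the effect while tuning batch sizes (a sketch; localhost
and the 5-second interval are arbitrary):

while true; do
  nodetool -h localhost tpstats | grep -E 'Pool Name|ReadStage|MutationStage'
  sleep 5
done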


-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


Re: CPU bound workload

2011-12-11 Thread Peter Schuller
> Regarding 2), I may be running into this since data updates are very
> localized by design. I've distributed the keys per storage load but I'm
> going to have to distribute them by read/write load since the workload is
> all but random and I'm using BOP. However, I never see an I/O bottleneck
> when using iostat, see below.

Ah, I always keep assuming the random partitioner since it is a very common
case (just to be sure: unless you specifically want the ordering
despite the downsides, you generally want to default to the random
partitioner).

> I've got two processes doing writes in parallel. The one we are currently
> discussing ("Process A") only writes while the other one ("Process B") reads
> 2 to 4x more data than it writes.
>
> Process A typically looks like this (numbers come from Hector). Each line
> below is one cassandra batch ie one Hector Mutator.execute():
> 15:15:53 Wrote 86 cassandra mutations using host X.Y.Z.97(X.Y.Z.97):9160
> (153 usecs)
> 15:15:53 Wrote 90 cassandra mutations using host X.Y.Z.96(X.Y.Z.96):9160 (97
> usecs)
> 15:15:54 Wrote 85 cassandra mutations using host X.Y.Z.95(X.Y.Z.95):9160
> (754 usecs)
> 15:15:54 Wrote 81 cassandra mutations using host X.Y.Z.109(X.Y.Z.109):9160
> (561 usecs)
> 15:15:54 Wrote 86 cassandra mutations using host
> 176.31.226.128(176.31.226.128):9160 (130 usecs)
> 15:15:54 Wrote 73 cassandra mutations using host X.Y.Z.97(X.Y.Z.97):9160 (97
> usecs)
> 15:15:54 Wrote 82 cassandra mutations using host X.Y.Z.96(X.Y.Z.96):9160 (48
> usecs)
> 15:15:56 Wrote 108 cassandra mutations using host X.Y.Z.95(X.Y.Z.95):9160
> (1653 usecs)
> 15:15:56 Wrote 84 cassandra mutations using host X.Y.Z.109(X.Y.Z.109):9160
> (23 usecs)
> I'm pretty sure those are milliseconds, not microseconds as the Hector docs
> say (see the last two lines & timestamps), which would amount to 500 to 1000
> mutations per second with a min at 65 and a max at 3652.
> Clusterwide, opscenter is reporting 10 writes requests per second in the
> 20mn graph but that can't be right.

I'm not familiar with OpsCenter, but if they seem low I suspect it's
because it's counting requests to the StorageProxy. A batch of
multiple reads is still a single requests to the StorageProxy, so that
stat won't be a reflection of the number of columns (nor rows)
affected. (Again to clarify: I do not know if opscenter is using the
StorageProxy stat; that is my speculation).


> When I get a log like these, there always is a "cluster-freeze" during the
> preceding minute. By "cluster-freeze", I mean that a couple of nodes go to
> 0% utilization (no cpu, no system, no io)

A hypothesis here is that your workload is causing problems for a node
(for example, sudden spikes in memory allocation causing full GC
fallbacks that take time), and both the readers and the writers get
"stuck" on requests to those nodes (once a sufficient number of
requests happen to be destined to those). The result would be that all
other nodes are no longer seeing traffic because the clients aren't
making progress.

> I may be overloading the cluster when Process A runs but I would like to
> understand why so I can do something about it. What I'm trying to figure out
> is:
>  - why would counter writes timeout on 28 or 38s (5 node cluster)
>  - what could cause the cluster to "freeze" during those timeouts
>
> If you have any answers or ideas on how I could find an answer, that would
> be great.

I would first eliminate or confirm any GC hypothesis by running all
nodes with -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps
-XX:+PrintGCDateStamps.
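
A sketch of where those flags could go (conf/cassandra-env.sh per the
standard layout; the log path is an assumption):

JVM_OPTS="$JVM_OPTS -XX:+PrintGC -XX:+PrintGCDetails"
JVM_OPTS="$JVM_OPTS -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps"
JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"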

If you can see this happen sufficiently often to
manually/interactively "wait for it", I suggest something as simple as
firing up top + iostat for each host and having them on the screen
at the same time, and looking for what happens when you see this again.
If the problem is fallback to full GC for example, the affected nodes
should be churning 100% CPU (one core) for several seconds (assuming a
large heap). If there is a sudden burst of disk I/O that is causing a
hiccup (e.g. dirty buffer flushing by linux) this should be visibly
correlated with 'iostat -x -k 1'.

-- 
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)


plan to switch from SimpleStrategy to NetworkTopologyStrategy

2011-12-11 Thread Igor

Hi,

This is my first post, so first of all - thanks to the Cassandra authors and 
community for their excellent work!


Now to my question... I need a plan for the transition from SimpleStrategy 
to NetworkTopologyStrategy (as I have to add two servers from a remote 
datacenter, with RTT up to 120ms, to my cluster).


The cluster consists of 10 nodes in 5 datacenters (2 nodes per DC); each 
node carries about 4G of data in the keyspace


Keyspace: meter:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
Options: [replication_factor:3]

Originally I planned (and tested) simply doing a rolling restart of the nodes 
with NetworkTopologyStrategy enabled in cassandra.yaml (and a proper 
cassandra-topology.properties file in place) and, after the nodes restarted, 
updating my keyspace with the option {DC1:1,DC2:1,...}. But my concern is: 
will Cassandra begin to move data right after the restart with NTS 
enabled (and should I wait for that during the rolling restart?), or only 
after the keyspace is updated with the new options?


Or there is some other way?
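
For reference, a sketch of the keyspace update I would run from the CLI
after the rolling restart (the DC names must match
cassandra-topology.properties; these are illustrative):

update keyspace meter
  with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
  and strategy_options = {DC1:1, DC2:1, DC3:1, DC4:1, DC5:1};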

Thanks!


node stuck "leaving" on 1.0.5

2011-12-11 Thread Bryce Godfrey
I have a dead node I need to remove from the cluster so that I can rebalance 
among the existing servers (can't replace it for a while).

I used nodetool removetoken and it's been stuck in the "leaving" state for over 
a day now.  I've tried a rolling restart, which kicks off some streaming for a 
while under netstats, but now even that lists nothing going on.

I'm stuck on what to do next to get this node to finally leave so I can move 
the tokens around.
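
One possible next step, if your nodetool supports it (a sketch; worth
confirming against "nodetool help" on 1.0.5):

nodetool -h localhost removetoken status   # shows the removal in progress
nodetool -h localhost removetoken force    # forces the removal to complete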

Only error I see in the system log:

ERROR [Thread-209] 2011-12-11 01:40:34,347 AbstractCassandraDaemon.java (line 
133) Fatal exception in thread Thread[Thread-209,5,main]
java.lang.AssertionError
at 
org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:178)
at 
org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:141)
at 
org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:481)
at org.apache.cassandra.db.DataTracker.replace(DataTracker.java:275)
at org.apache.cassandra.db.DataTracker.addSSTables(DataTracker.java:237)
at 
org.apache.cassandra.db.DataTracker.addStreamedSSTable(DataTracker.java:242)
at 
org.apache.cassandra.db.ColumnFamilyStore.addSSTable(ColumnFamilyStore.java:920)
at 
org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:141)
at 
org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:103)
at 
org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:184)
at 
org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)