RE: Schematool
Hi,

I have a keyspace called Index. I am trying to create it on a new cluster from the script that was created on the old cluster:

create keyspace Index
with placement_strategy = 'SimpleStrategy'
and strategy_options = {replication_factor : 3}
and durable_writes = true;

I get an error:

[default@City] create keyspace Index
...     with placement_strategy = 'SimpleStrategy'
...     and strategy_options = {replication_factor : 3}
...     and durable_writes = true;
Syntax error at position 16: mismatched input 'Index' expecting set null

My questions are: is Index a reserved word, and how can I create this keyspace? I tried 'Index' and "Index" but I still get the error and am not able to create it.

Thanks
Michael

From: Alain RODRIGUEZ [mailto:arodr...@gmail.com]
Sent: Thursday, December 08, 2011 11:52 AM
To: user@cassandra.apache.org
Subject: Re: Schematool

You should be able to use the CLI command "show schema yourkeyspace" if your Cassandra is recent enough (>= 0.8 if I remember well; I think it is better if you are on 0.8.7, because this command was fixed a couple of times in 0.8.6 and 0.8.7).

You can put the "show schema" command into a file and call it with "cassandra-cli -h yourhost -f yourfile > 20111208schema", then open your output file and remove the 2 or 3 useless lines that are written before "create yourkeyspace".

Hope this will be helpful.

Alain

2011/12/8 Michael Vaknine

Hi,

Since schematool has been removed from Cassandra, is there a way to extract the schema from a working cluster in order to create a new empty cluster?

Thanks
Michael
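A minimal shell sketch of the export/replay workflow Alain describes (hostnames, file names, and the keyspace name are placeholders):

    echo "show schema yourkeyspace;" > showschema.cli
    cassandra-cli -h oldhost -f showschema.cli > 20111208schema
    # delete the 2-3 connection banner lines before "create keyspace ..."
    cassandra-cli -h newhost -f 20111208schema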
Cannot Start Cassandra 1.0.5 with JNA on the CLASSPATH
Hi All,

I'm trying to start up Cassandra 1.0.5 on a CentOS 6 machine. I installed JNA through yum and made a symbolic link to jna.jar in my Cassandra lib directory. When I run "bin/cassandra -f", I get the following:

INFO 09:14:31,552 Logging initialized
INFO 09:14:31,555 JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_29
INFO 09:14:31,555 Heap size: 3405774848/3405774848
INFO 09:14:31,555 Classpath: bin/../conf:bin/../build/classes/main:bin/../build/classes/thrift:bin/../lib/antlr-3.2.jar:bin/../lib/apache-cassandra-1.0.5.jar:bin/../lib/apache-cassandra-clientutil-1.0.5.jar:bin/../lib/apache-cassandra-thrift-1.0.5.jar:bin/../lib/avro-1.4.0-fixes.jar:bin/../lib/avro-1.4.0-sources-fixes.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/compress-lzf-0.8.4.jar:bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:bin/../lib/guava-r08.jar:bin/../lib/high-scale-lib-1.1.2.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jamm-0.2.5.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-0.6.jar:bin/../lib/log4j-1.2.16.jar:bin/../lib/servlet-api-2.5-20081211.jar:bin/../lib/slf4j-api-1.6.1.jar:bin/../lib/slf4j-log4j12-1.6.1.jar:bin/../lib/snakeyaml-1.6.jar:bin/../lib/snappy-java-1.0.4.1.jar:bin/../lib/jamm-0.2.5.jar
Killed

If I remove the symlink to JNA, it starts up just fine.

Also, I do have entries in my limits.conf for JNA:

root soft memlock unlimited
root hard memlock unlimited

Has anyone else seen this behavior?

Thanks,

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com
Re: Meaning of values in tpstats
Wasn't that on the 1.0 branch? I'm still running 0.8.x.

@Peter: investigating a little more before answering.

Thanks

2011/12/10 Edward Capriolo

> There was a recent patch that fixed an issue where counters were hitting
> the same natural endpoint rather than being randomized across all of them.
>
> On Saturday, December 10, 2011, Peter Schuller
> <peter.schul...@infidyne.com> wrote:
> >> Pool Name      Active   Pending      Completed   Blocked   All time blocked
> >> ReadStage          27      2166     3565927301         0
> >
> > In general, "active" refers to work that is being executed right now,
> > "pending" refers to work that is waiting to be executed (go into
> > "active"), and "completed" is the cumulative all-time (since node start)
> > count of the number of tasks executed.
> >
> > With the slicing, I'm not sure off the top of my head. I'm sure
> > someone else can chime in. For e.g. a multi-get, they end up as
> > independent tasks.
> >
> > Typically, having pending persistently above 0 for ReadStage or
> > MutationStage, especially if more than a handful, means that you are
> > having a performance issue - either a capacity problem or something
> > else, as incoming requests will have to wait to be serviced. Typically
> > the most common effect is that you are bottlenecking on I/O and
> > ReadStage pending shoots through the roof.
> >
> > There are exceptions. If you e.g. submit a really large multi-get of
> > 5000, that will naturally lead to a spike (and if all 5000 of them
> > need to go down to disk, the spike will survive for a bit). If you are
> > ONLY doing these queries, that's not a problem per se. But if you are
> > also expecting other requests to have low latency, then you want to
> > avoid it.
> >
> > In general, batching is good - but don't overdo it, especially for
> > reads, and especially if you're going to disk for the workload.
> >
> > --
> > / Peter Schuller (@scode, http://worldmodscode.wordpress.com)
RE: Cannot Start Cassandra 1.0.5 with JNA on the CLASSPATH
Try

root - MEMLOCK 14155776

in /etc/security/limits.conf.

Michael

From: Caleb Rackliffe [mailto:ca...@steelhouse.com]
Sent: Sunday, December 11, 2011 11:24 AM
To: user@cassandra.apache.org
Subject: Cannot Start Cassandra 1.0.5 with JNA on the CLASSPATH

Hi All,

I'm trying to start up Cassandra 1.0.5 on a CentOS 6 machine. I installed JNA through yum and made a symbolic link to jna.jar in my Cassandra lib directory. When I run "bin/cassandra -f", I get the following:

INFO 09:14:31,552 Logging initialized
INFO 09:14:31,555 JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_29
INFO 09:14:31,555 Heap size: 3405774848/3405774848
INFO 09:14:31,555 Classpath: bin/../conf:bin/../build/classes/main:bin/../build/classes/thrift:bin/../lib/antlr-3.2.jar:bin/../lib/apache-cassandra-1.0.5.jar:bin/../lib/apache-cassandra-clientutil-1.0.5.jar:bin/../lib/apache-cassandra-thrift-1.0.5.jar:bin/../lib/avro-1.4.0-fixes.jar:bin/../lib/avro-1.4.0-sources-fixes.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/compress-lzf-0.8.4.jar:bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:bin/../lib/guava-r08.jar:bin/../lib/high-scale-lib-1.1.2.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jamm-0.2.5.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-0.6.jar:bin/../lib/log4j-1.2.16.jar:bin/../lib/servlet-api-2.5-20081211.jar:bin/../lib/slf4j-api-1.6.1.jar:bin/../lib/slf4j-log4j12-1.6.1.jar:bin/../lib/snakeyaml-1.6.jar:bin/../lib/snappy-java-1.0.4.1.jar:bin/../lib/jamm-0.2.5.jar
Killed

If I remove the symlink to JNA, it starts up just fine.

Also, I do have entries in my limits.conf for JNA:

root soft memlock unlimited
root hard memlock unlimited

Has anyone else seen this behavior?

Thanks,

Caleb Rackliffe | Software Developer
M 949.981.0159 | ca...@steelhouse.com
Re: Schematool
This is quite a different subject, and the question was already asked a few days ago (December 1):
http://www.mail-archive.com/user@cassandra.apache.org/msg19083.html

Anyways, you can test it by yourself by changing the name of the column family. I don't know more about it myself.

Alain

2011/12/11 Michael Vaknine

> Hi,
>
> I have a keyspace called Index. I am trying to create it on a new cluster
> from the script that was created on the old cluster:
>
> create keyspace Index
> with placement_strategy = 'SimpleStrategy'
> and strategy_options = {replication_factor : 3}
> and durable_writes = true;
>
> I get an error:
>
> [default@City] create keyspace Index
> ...     with placement_strategy = 'SimpleStrategy'
> ...     and strategy_options = {replication_factor : 3}
> ...     and durable_writes = true;
> Syntax error at position 16: mismatched input 'Index' expecting set null
>
> My questions are: is Index a reserved word, and how can I create this
> keyspace? I tried 'Index' and "Index" but I still get the error and am
> not able to create it.
>
> Thanks
> Michael
>
> From: Alain RODRIGUEZ [mailto:arodr...@gmail.com]
> Sent: Thursday, December 08, 2011 11:52 AM
> To: user@cassandra.apache.org
> Subject: Re: Schematool
>
> You should be able to use the CLI command "show schema yourkeyspace" if
> your Cassandra is recent enough (>= 0.8 if I remember well; I think it is
> better if you are on 0.8.7, because this command was fixed a couple of
> times in 0.8.6 and 0.8.7).
>
> You can put the "show schema" command into a file and call it with
> "cassandra-cli -h yourhost -f yourfile > 20111208schema", then open your
> output file and remove the 2 or 3 useless lines that are written before
> "create yourkeyspace".
>
> Hope this will be helpful.
>
> Alain
>
> 2011/12/8 Michael Vaknine
>
> Hi,
>
> Since schematool has been removed from Cassandra, is there a way to
> extract the schema from a working cluster in order to create a new empty
> cluster?
>
> Thanks
> Michael
Re: memory leaks in 1.0.5
> Possible, but unlikely. See https://issues.apache.org/jira/browse/CASSANDRA-3537
> for an example of a "memory leak" that wasn't.

I didn't get the point. I have a slowly increasing memory load on the node and no flushable memtables. How could it not be a memory leak? Also, running nodetool upgradesstables fails with OOM.

https://rapidshare.com/files/2558346759/system.log.4.bz2
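One hedged way to separate a real heap leak from the off-heap growth that CASSANDRA-3537-style reports turned out to be, using standard JDK/Linux tools (the pid is a placeholder):

    jstat -gcutil <pid> 5000   # if old gen (O) stays near 100% even right after CMS cycles, the heap itself is filling
    ps -o rss= -p <pid>        # RSS growth without heap growth often just reflects mmap'd sstables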
Re: Atomic Operations in Cassandra
Hi Sylvain,

"Writes under the same row key are atomic (*even across column families*) in the sense that they are either all persisted or none are."

Is this a new feature in 1.x, or does it also apply to previous versions of Cassandra?

Boris

On Thu, Dec 8, 2011 at 6:40 PM, Sylvain Lebresne wrote:
> On Thu, Dec 8, 2011 at 12:57 AM, Christof Bornhoevd wrote:
> > Hi All,
> >
> > I'm using Cassandra 1.0.3 (with Hector 0.7). What is the granularity of
> > atomic read and write operations with Cassandra? I.e., is the insert or
> > update of an individual column an atomic operation (in the sense that it
> > either fails or persists completely), or is the insert or update of an
> > entire row in a ColumnFamily atomic?
> >
> > Similarly, if I read multiple columns of the same row, could the read
> > operation interfere with a concurrent write operation on these same
> > columns in a way that I might see some old and some new column values?
>
> Writes under the same row key are atomic (even across column families) in
> the sense that they are either all persisted or none are. Note however
> that it is possible for an insertion to fail for the client (say you get a
> TimeoutException) but for the insertion to still be persisted.
> There is however no isolation currently. It is possible for a read to
> see a state where only part of an insertion (even within the same row key)
> has been applied. (CASSANDRA-2893 is open to try to add isolation.)
>
> --
> Sylvain
>
> > Cheers and thanks a lot for any kind help on this!
> > Chris
Moving existing nodes to a different network.
I have an existing cluster of four Cassandra nodes. The machines have both an internal and an external IP, and originally I set them up to use the external network. A little while later I moved them to the internal network by bringing all machines down, changing the config, and bringing them up again. In the logs I found they all said "Changing ownership of token XXX", and nodetool ring reported that the cluster consisted of those four machines on their internal IPs. After that, as part of a cleanup process, I moved the tokens on all machines to make sure the cluster was balanced, and that also worked perfectly.

However, now I have to temporarily move the cluster back to the external network for a little while. I tried doing the same thing as last time, bringing all nodes down, changing the config (rpc address, gossip address, list of seeds) and bringing them up again, but this resulted in a very confused cluster. When I ran nodetool ring, it reported eight nodes: the four internal IPs were marked as down, and the four external were marked as up, but with the token they had when they previously used that IP. Checking the logs, there was no token ownership change; all nodes picked the saved token they had when they last used the external IP, and not the token they should have, the one I moved each server to when on the internal IP.

I immediately moved all servers back to the internal IP, and then nodetool reported the same as before: a cluster of four machines, all up, and all on the token they're supposed to have. No mention of the external IPs or the old tokens they had there.

How do I reset this data? Where is it stored? Why does it store all of this when nodetool doesn't report it? Why does a node store several saved tokens? How do I change their IP without losing any data and without having to do removetoken or similar?

One thought I have is to bring down one node, delete the system keyspace, and bring it back up, at which point it would only use what's in the config, but fetch the schema from the other nodes. Or would it also fetch the old information of what token it had when it was on the external IP? Or would something else go wrong?

/Henrik
Re: Moving existing nodes to a different network.
I'm running Cassandra 1.0.1 if that makes any difference. /Henrik On Sun, Dec 11, 2011 at 13:16, Henrik Schröder wrote: > I have an existing cluster of four Cassandra nodes. The machines have both > an internal and an external IP, and originally I set them up to use the > external network. A little while later I moved them to the internal network > by bringing all machines down, changing the config, and bringing them up > again. In the logs I found they all said "Changing ownership of token XXX", > and nodetool ring reported that the cluster consisted of those four > machines on their internal ips. After that, as part of a cleanup process, I > moved the tokens on all machines to make sure the cluster was balanced, and > it also worked perfectly. > > However, now I have to temporarily move the cluster back to the external > network for a little while. I tried doing the same thing as last time, > bringing all nodes down, changing the config (rpc address, gossip address, > list of seeds) and bringing them up again, but this resulted in a very > confused cluster. When I ran nodetool ring, it reported eight nodes, the > four internal ips were marked as down, and the four external were marked as > up, but with the token they had when they previously used that ip. Checking > the logs, there was no token ownership change, all nodes picked the saved > token they had when they last used the external ip, and not the token they > should have, the one I moved each server to when on the internal ip. > > I immediately moved all servers back to the internal IP, and then nodetool > reported the same as before, a cluster of four machines, all up, and all on > the token they're supposed to have. No mention of the external ips or the > old tokens they had there. > > How do I reset this data? Where is it stored? Why does it store all of > this when nodetool doesn't report it? Why does a node store several saved > tokens? How do I change their ip without losing any data and without having > to do removetoken or similar? > > One thought I have is to bring down one node, delete the system keyspace, > and bring it back up, at which point it would only use what's in the > config, but fetch the schema from the other nodes. Or would it also fetch > the old information of what token it had when it was on the external ip? Or > would something else go wrong? > > > /Henrik >
Re: Meaning of values in tpstats
Answers below.

> >> Pool Name      Active   Pending      Completed   Blocked   All time blocked
> >> ReadStage          27      2166     3565927301         0

> With the slicing, I'm not sure off the top of my head. I'm sure
> someone else can chime in. For e.g. a multi-get, they end up as
> independent tasks.

So if I multiget 10 keys, they are fetched in //, consolidated by the coordinator, and then sent back? Can anyone confirm for multigetslice? I want to know if batching is really counterproductive.

> Typically having pending persistently above 0 for ReadStage or
> MutationStage, especially if more than a handful, means that you are
> having a performance issue - either a capacity problem or something
> else, as incoming requests will have to wait to be serviced. Typically
> the most common effect is that you are bottlenecking on I/O and
> ReadStage pending shoots through the roof.

> In general, batching is good - but don't overdo it, especially for
> reads, and especially if you're going to disk for the workload.

Agreed. I followed someone's suggestion some time ago to reduce my batch sizes and it has helped tremendously. I'm now doing multigetslices in batches of 512 instead of 5000, and I find I no longer have pendings up so high. The most I see now is a couple hundred.
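A sketch of the bounded-batch approach described above, using Hector's multiget slice query (keyspace and column family names are placeholders, and the API shape assumes Hector's MultigetSliceQuery):

    import java.util.List;
    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.beans.Rows;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.query.MultigetSliceQuery;

    public class ChunkedMultiget {
        private static final int BATCH = 512;  // well below the 5000 that kept ReadStage pending high

        static void fetchAll(Keyspace ks, List<String> keys) {
            StringSerializer s = StringSerializer.get();
            for (int i = 0; i < keys.size(); i += BATCH) {
                List<String> chunk = keys.subList(i, Math.min(i + BATCH, keys.size()));
                MultigetSliceQuery<String, String, String> q =
                        HFactory.createMultigetSliceQuery(ks, s, s, s);
                q.setColumnFamily("Events");                        // placeholder CF
                q.setKeys(chunk.toArray(new String[chunk.size()]));
                q.setRange("", "", false, 100);                     // up to 100 columns per row
                Rows<String, String, String> rows = q.execute().get();
                // consume rows here before issuing the next chunk
            }
        }
    }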
Re: CPU bound workload
Hi Peter,

I'm going to mix the response to your email with my other email from yesterday, since they pertain to the same issue. Sorry this is a little long, but I'm stumped and I'm trying to describe what I've investigated.

In a nutshell, in case someone has encountered this and won't read it to the end: a write-heavy process is causing the ring to appear to "freeze" (=> utilization = 0%). Its Hector speed4j logs indicate failures and successes at max=38s, while other read/write processes are all indicating max=28s. It looks like I've got a magic number I can't figure out.

> You do say "nodes handling the requests". Two things to always keep in
> mind is to (1) spread the requests evenly across all members of the
> cluster, and (2) if you are doing a lot of work per row key, spread it
> around and be concurrent so that you're not hitting a single row at a
> time, which will be under the responsibility of a single set of RF
> nodes (you want to put load on the entire cluster evenly if you want
> to maximize throughput).

I'm using Hector to connect to the cluster along with autoDiscover=true. Furthermore, I see in my logs that updates do get sent to multiple nodes, so 1) is ok.

Regarding 2), I may be running into this, since data updates are very localized by design. I've distributed the keys per storage load, but I'm going to have to distribute them by read/write load since the workload is anything but random and I'm using BOP. However, I never see an IO bottleneck when using iostat; see below.

> For starters, what *is* the throughput? How many counter mutations are
> you submitting per second?

I've got two processes doing writes in parallel. The one we are currently discussing ("Process A") only writes, while the other one ("Process B") reads 2 to 4x more data than it writes.

Process A typically looks like this (numbers come from Hector). Each line below is one Cassandra batch, i.e. one Hector Mutator.execute():

15:15:53 Wrote 86 cassandra mutations using host X.Y.Z.97(X.Y.Z.97):9160 (153 usecs)
15:15:53 Wrote 90 cassandra mutations using host X.Y.Z.96(X.Y.Z.96):9160 (97 usecs)
15:15:54 Wrote 85 cassandra mutations using host X.Y.Z.95(X.Y.Z.95):9160 (754 usecs)
15:15:54 Wrote 81 cassandra mutations using host X.Y.Z.109(X.Y.Z.109):9160 (561 usecs)
15:15:54 Wrote 86 cassandra mutations using host 176.31.226.128(176.31.226.128):9160 (130 usecs)
15:15:54 Wrote 73 cassandra mutations using host X.Y.Z.97(X.Y.Z.97):9160 (97 usecs)
15:15:54 Wrote 82 cassandra mutations using host X.Y.Z.96(X.Y.Z.96):9160 (48 usecs)
15:15:56 Wrote 108 cassandra mutations using host X.Y.Z.95(X.Y.Z.95):9160 (1653 usecs)
15:15:56 Wrote 84 cassandra mutations using host X.Y.Z.109(X.Y.Z.109):9160 (23 usecs)

I'm pretty sure those are milliseconds and not microseconds as per the Hector docs (see the last two lines & timestamps), which would amount to 500 to 1000 mutations per second, with a min at 65 and a max at 3652.

Clusterwide, opscenter is reporting 10 write requests per second in the 20mn graph, but that can't be right. The exact number is somewhere in the thousands of keys read per second, but my problem with writes is really so big it doesn't matter what the actual number is; see below.
What's really puzzling is this, found in the logs created by Hector for Process B:

Tag              Avg(ms)      Min        Max    Std Dev      95th   Count
WRITE.success_   1709.64     0.83   28982.61    6100.55  21052.93     267
READ.success_     262.64    17.25    1343.53     191.79    610.99     637

(+hardly ever any failures)

At the same time, for Process A, I see this:

15:29:07 Tag              Avg(ms)       Min        Max   Std Dev      95th   Count
15:29:07 WRITE.success_    584.76     13.23   38042.17   4242.24    334.84      79
15:29:07 WRITE.fail_     38008.16  38008.16   38008.16      0.00  38008.16       1

(failures every minute)

So there is at least one WRITE which is very, very long: 28s for Process B and 38s for Process A. In fact, it looks like a magic timeout number, because I see those two numbers all the time in the logs:

WRITE.success_   1603.61     1.11   28829.06   6069.97  21987.07     152
WRITE.success_    307.56     0.81   29958.18   2879.91     39.98     918
WRITE.success_   1664.64     0.88   29953.52   6127.34  20023.88     276

However, I can't link it to anything. My Hector failover timeout is 2s and everything else is just default install values. Even if Hector was backing off multiple times until it worked, why would I always get the same 28/38 value?

When I get a log like these, there is always a "cluster-freeze" during the preceding minute. By "cluster-freeze", I mean that a couple of nodes go to 0% utilization (no cpu, no system, no io). Once I noticed this, I shut down Process A and watched Process B's performance logs. It's all back to normal now:

Tag
Re: Atomic Operations in Cassandra
On Sun, Dec 11, 2011 at 12:01 PM, Boris Yen wrote:
> Hi Sylvain,
>
> "Writes under the same row key are atomic (even across column families) in
> the sense that they are either all persisted or none are."
>
> Is this a new feature in 1.x, or does it also apply to previous versions
> of Cassandra?

It applies to previous versions of Cassandra as well.

--
Sylvain

> Boris
>
> On Thu, Dec 8, 2011 at 6:40 PM, Sylvain Lebresne wrote:
>> On Thu, Dec 8, 2011 at 12:57 AM, Christof Bornhoevd wrote:
>> > Hi All,
>> >
>> > I'm using Cassandra 1.0.3 (with Hector 0.7). What is the granularity of
>> > atomic read and write operations with Cassandra? I.e., is the insert or
>> > update of an individual column an atomic operation (in the sense that
>> > it either fails or persists completely), or is the insert or update of
>> > an entire row in a ColumnFamily atomic?
>> >
>> > Similarly, if I read multiple columns of the same row, could the read
>> > operation interfere with a concurrent write operation on these same
>> > columns in a way that I might see some old and some new column values?
>>
>> Writes under the same row key are atomic (even across column families) in
>> the sense that they are either all persisted or none are. Note however
>> that it is possible for an insertion to fail for the client (say you get
>> a TimeoutException) but for the insertion to still be persisted.
>> There is however no isolation currently. It is possible for a read to
>> see a state where only part of an insertion (even within the same row
>> key) has been applied. (CASSANDRA-2893 is open to try to add isolation.)
>>
>> --
>> Sylvain
>>
>> > Cheers and thanks a lot for any kind help on this!
>> > Chris
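To make the guarantee (and the lack of isolation) concrete, a minimal Hector sketch; the cluster, keyspace, and column family names are made-up placeholders, and this assumes a Hector-0.7-era API:

    import me.prettyprint.cassandra.serializers.StringSerializer;
    import me.prettyprint.hector.api.Cluster;
    import me.prettyprint.hector.api.Keyspace;
    import me.prettyprint.hector.api.factory.HFactory;
    import me.prettyprint.hector.api.mutation.Mutator;

    public class AtomicRowWrite {
        public static void main(String[] args) {
            Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "localhost:9160");
            Keyspace ks = HFactory.createKeyspace("MyKeyspace", cluster);
            Mutator<String> m = HFactory.createMutator(ks, StringSerializer.get());

            String rowKey = "user42";
            // Both insertions share the row key, so per the thread they are
            // persisted all-or-nothing, even across two column families...
            m.addInsertion(rowKey, "Profiles", HFactory.createStringColumn("name", "Bob"));
            m.addInsertion(rowKey, "AuditLog", HFactory.createStringColumn("updated", "2011-12-11"));
            m.execute();
            // ...but there is no isolation: a concurrent reader may still
            // observe one column family updated and not the other.
        }
    }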
Re: Cannot Start Cassandra 1.0.5 with JNA on the CLASSPATH
I tried my configuration, which is working, but I am on Ubuntu with Cassandra 1.0.3, and I am running Cassandra under user cassandra. I did not try it with 1.0.5 because I was not able to work with that version, and I am waiting for 1.0.6.

On Sun, Dec 11, 2011 at 7:50 PM, Caleb Rackliffe wrote:
> I changed the value in limits.conf as you suggested, and that seems to
> have no effect. Were you thinking that the OS wasn't respecting the
> "unlimited"?
>
> Caleb Rackliffe | Software Developer
> M 949.981.0159 | ca...@steelhouse.com
>
> From: Michael Vaknine
> Reply-To: "user@cassandra.apache.org"
> Date: Sun, 11 Dec 2011 05:15:32 -0500
> To: "user@cassandra.apache.org"
> Subject: RE: Cannot Start Cassandra 1.0.5 with JNA on the CLASSPATH
>
> Try
>
> root - MEMLOCK 14155776
>
> in /etc/security/limits.conf.
>
> Michael
>
> From: Caleb Rackliffe [mailto:ca...@steelhouse.com]
> Sent: Sunday, December 11, 2011 11:24 AM
> To: user@cassandra.apache.org
> Subject: Cannot Start Cassandra 1.0.5 with JNA on the CLASSPATH
>
> Hi All,
>
> I'm trying to start up Cassandra 1.0.5 on a CentOS 6 machine. I installed
> JNA through yum and made a symbolic link to jna.jar in my Cassandra lib
> directory. When I run "bin/cassandra -f", I get the following:
>
> INFO 09:14:31,552 Logging initialized
> INFO 09:14:31,555 JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_29
> INFO 09:14:31,555 Heap size: 3405774848/3405774848
> INFO 09:14:31,555 Classpath: bin/../conf:bin/../build/classes/main:bin/../build/classes/thrift:bin/../lib/antlr-3.2.jar:bin/../lib/apache-cassandra-1.0.5.jar:bin/../lib/apache-cassandra-clientutil-1.0.5.jar:bin/../lib/apache-cassandra-thrift-1.0.5.jar:bin/../lib/avro-1.4.0-fixes.jar:bin/../lib/avro-1.4.0-sources-fixes.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/compress-lzf-0.8.4.jar:bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:bin/../lib/guava-r08.jar:bin/../lib/high-scale-lib-1.1.2.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jamm-0.2.5.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-0.6.jar:bin/../lib/log4j-1.2.16.jar:bin/../lib/servlet-api-2.5-20081211.jar:bin/../lib/slf4j-api-1.6.1.jar:bin/../lib/slf4j-log4j12-1.6.1.jar:bin/../lib/snakeyaml-1.6.jar:bin/../lib/snappy-java-1.0.4.1.jar:bin/../lib/jamm-0.2.5.jar
> Killed
>
> If I remove the symlink to JNA, it starts up just fine.
>
> Also, I do have entries in my limits.conf for JNA:
>
> root soft memlock unlimited
> root hard memlock unlimited
>
> Has anyone else seen this behavior?
>
> Thanks,
>
> Caleb Rackliffe | Software Developer
> M 949.981.0159 | ca...@steelhouse.com
Re: Cannot Start Cassandra 1.0.5 with JNA on the CLASSPATH
On Sun, Dec 11, 2011 at 3:23 AM, Caleb Rackliffe wrote:
> Hi All,
>
> I'm trying to start up Cassandra 1.0.5 on a CentOS 6 machine. I installed
> JNA through yum and made a symbolic link to jna.jar in my Cassandra lib
> directory. When I run "bin/cassandra -f", I get the following:
>
> INFO 09:14:31,552 Logging initialized
> INFO 09:14:31,555 JVM vendor/version: Java HotSpot(TM) 64-Bit Server VM/1.6.0_29
> INFO 09:14:31,555 Heap size: 3405774848/3405774848
> INFO 09:14:31,555 Classpath: bin/../conf:bin/../build/classes/main:bin/../build/classes/thrift:bin/../lib/antlr-3.2.jar:bin/../lib/apache-cassandra-1.0.5.jar:bin/../lib/apache-cassandra-clientutil-1.0.5.jar:bin/../lib/apache-cassandra-thrift-1.0.5.jar:bin/../lib/avro-1.4.0-fixes.jar:bin/../lib/avro-1.4.0-sources-fixes.jar:bin/../lib/commons-cli-1.1.jar:bin/../lib/commons-codec-1.2.jar:bin/../lib/commons-lang-2.4.jar:bin/../lib/compress-lzf-0.8.4.jar:bin/../lib/concurrentlinkedhashmap-lru-1.2.jar:bin/../lib/guava-r08.jar:bin/../lib/high-scale-lib-1.1.2.jar:bin/../lib/jackson-core-asl-1.4.0.jar:bin/../lib/jackson-mapper-asl-1.4.0.jar:bin/../lib/jamm-0.2.5.jar:bin/../lib/jline-0.9.94.jar:bin/../lib/jna.jar:bin/../lib/json-simple-1.1.jar:bin/../lib/libthrift-0.6.jar:bin/../lib/log4j-1.2.16.jar:bin/../lib/servlet-api-2.5-20081211.jar:bin/../lib/slf4j-api-1.6.1.jar:bin/../lib/slf4j-log4j12-1.6.1.jar:bin/../lib/snakeyaml-1.6.jar:bin/../lib/snappy-java-1.0.4.1.jar:bin/../lib/jamm-0.2.5.jar
> Killed

The 'Killed' line is your problem: the OOM killer decided to kill java. You can confirm this in dmesg. You either need more memory or less heap. The reason it happens instantly with JNA is that all the memory is being allocated up front, but without JNA you still have a timebomb waiting to go off.

-Brandon
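A quick sketch of how to confirm the diagnosis and act on it (the heap numbers are illustrative placeholders, not a recommendation):

    dmesg | grep -i kill   # the OOM killer logs which process it killed (look for the java pid)
    # then size the heap down in conf/cassandra-env.sh so heap + mlocked memory fits in RAM:
    MAX_HEAP_SIZE="2G"
    HEAP_NEWSIZE="400M"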
Re: CPU bound workload
Interesting development: I changed the maximum size of the batches in "Process A" to bring them from about 90 per execute() down to about 35. All the weird 28s/38s maximum execution times are gone, all timeouts are gone, and everything is zipping along just fine.

So the moral of the story for me is: only batch if you gain something, because it might break stuff.

Given this workaround, can anyone explain to me why this was happening?

2011/12/11 Philippe

> Hi Peter,
>
> I'm going to mix the response to your email with my other email from
> yesterday, since they pertain to the same issue. Sorry this is a little
> long, but I'm stumped and I'm trying to describe what I've investigated.
>
> In a nutshell, in case someone has encountered this and won't read it to
> the end: a write-heavy process is causing the ring to appear to "freeze"
> (=> utilization = 0%). Its Hector speed4j logs indicate failures and
> successes at max=38s, while other read/write processes are all indicating
> max=28s. It looks like I've got a magic number I can't figure out.
>
>> You do say "nodes handling the requests". Two things to always keep in
>> mind is to (1) spread the requests evenly across all members of the
>> cluster, and (2) if you are doing a lot of work per row key, spread it
>> around and be concurrent so that you're not hitting a single row at a
>> time, which will be under the responsibility of a single set of RF
>> nodes (you want to put load on the entire cluster evenly if you want
>> to maximize throughput).
>
> I'm using Hector to connect to the cluster along with autoDiscover=true.
> Furthermore, I see in my logs that updates do get sent to multiple nodes,
> so 1) is ok.
>
> Regarding 2), I may be running into this, since data updates are very
> localized by design. I've distributed the keys per storage load, but I'm
> going to have to distribute them by read/write load since the workload is
> anything but random and I'm using BOP. However, I never see an IO
> bottleneck when using iostat; see below.
>
>> For starters, what *is* the throughput? How many counter mutations are
>> you submitting per second?
>
> I've got two processes doing writes in parallel. The one we are currently
> discussing ("Process A") only writes, while the other one ("Process B")
> reads 2 to 4x more data than it writes.
>
> Process A typically looks like this (numbers come from Hector). Each line
> below is one Cassandra batch, i.e. one Hector Mutator.execute():
>
> 15:15:53 Wrote 86 cassandra mutations using host X.Y.Z.97(X.Y.Z.97):9160 (153 usecs)
> 15:15:53 Wrote 90 cassandra mutations using host X.Y.Z.96(X.Y.Z.96):9160 (97 usecs)
> 15:15:54 Wrote 85 cassandra mutations using host X.Y.Z.95(X.Y.Z.95):9160 (754 usecs)
> 15:15:54 Wrote 81 cassandra mutations using host X.Y.Z.109(X.Y.Z.109):9160 (561 usecs)
> 15:15:54 Wrote 86 cassandra mutations using host 176.31.226.128(176.31.226.128):9160 (130 usecs)
> 15:15:54 Wrote 73 cassandra mutations using host X.Y.Z.97(X.Y.Z.97):9160 (97 usecs)
> 15:15:54 Wrote 82 cassandra mutations using host X.Y.Z.96(X.Y.Z.96):9160 (48 usecs)
> 15:15:56 Wrote 108 cassandra mutations using host X.Y.Z.95(X.Y.Z.95):9160 (1653 usecs)
> 15:15:56 Wrote 84 cassandra mutations using host X.Y.Z.109(X.Y.Z.109):9160 (23 usecs)
>
> I'm pretty sure those are milliseconds and not microseconds as per the
> Hector docs (see the last two lines & timestamps), which would amount to
> 500 to 1000 mutations per second, with a min at 65 and a max at 3652.
> Clusterwide, opscenter is reporting 10 write requests per second in the
> 20mn graph, but that can't be right. The exact number is somewhere in the
> thousands of keys read per second, but my problem with writes is really so
> big it doesn't matter what the actual number is; see below.
>
> What's really puzzling is this, found in the logs created by Hector for
> Process B:
>
> Tag              Avg(ms)      Min        Max    Std Dev      95th   Count
> WRITE.success_   1709.64     0.83   28982.61    6100.55  21052.93     267
> READ.success_     262.64    17.25    1343.53     191.79    610.99     637
>
> (+hardly ever any failures)
>
> At the same time, for Process A, I see this:
>
> 15:29:07 Tag              Avg(ms)       Min        Max   Std Dev      95th   Count
> 15:29:07 WRITE.success_    584.76     13.23   38042.17   4242.24    334.84      79
> 15:29:07 WRITE.fail_     38008.16  38008.16   38008.16      0.00  38008.16       1
>
> (failures every minute)
>
> So there is at least one WRITE which is very, very long: 28s for Process B
> and 38s for Process A. In fact, it looks like a magic timeout number,
> because I see those two numbers all the time in the logs:
>
> WRITE.success_   1603.61     1.11   28829.06   6069.97  21987.07     152
> WRITE.success_    307.56     0.81   29958.18   2879.91     39.98     918
> WRITE.success_
Re: 1.0.3 CLI oddities
Sounds like https://issues.apache.org/jira/browse/CASSANDRA-3558 and the other tickets referenced there.

On 11/28/2011 05:05 AM, Janne Jalkanen wrote:
> Hi!
>
> (Asked this on IRC too, but didn't get anyone to respond, so here goes...)
>
> Is it just me, or are these real bugs?
>
> On 1.0.3, from the CLI: "update column family XXX with gc_grace = 36000;"
> just says "null" with nothing logged. The previous value is the default.
>
> Also, on 1.0.3, "update column family XXX with
> compression_options={sstable_compression:SnappyCompressor,chunk_length_kb:64};"
> returns "Internal error processing system_update_column_family" and the
> log says "Invalid negative or null chunk_length_kb" (stack trace below).
>
> Setting the compression options worked on 1.0.0 when I was testing (though
> my 64 kB became 64 MB, but I believe this was fixed in 1.0.3).
>
> Did the syntax change between 1.0.0 and 1.0.3? Or am I doing something wrong?
>
> The database was upgraded from 0.6.13 to 1.0.0, then scrubbed, then
> compression options were set on some CFs, then it was upgraded to 1.0.3
> and I am trying to set compression on other CFs.
>
> Stack trace:
>
> ERROR [pool-2-thread-68] 2011-11-28 09:59:26,434 Cassandra.java (line 4038) Internal error processing system_update_column_family
> java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.io.IOException: org.apache.cassandra.config.ConfigurationException: Invalid negative or null chunk_length_kb
>     at org.apache.cassandra.thrift.CassandraServer.applyMigrationOnStage(CassandraServer.java:898)
>     at org.apache.cassandra.thrift.CassandraServer.system_update_column_family(CassandraServer.java:1089)
>     at org.apache.cassandra.thrift.Cassandra$Processor$system_update_column_family.process(Cassandra.java:4032)
>     at org.apache.cassandra.thrift.Cassandra$Processor.process(Cassandra.java:2889)
>     at org.apache.cassandra.thrift.CustomTThreadPoolServer$WorkerProcess.run(CustomTThreadPoolServer.java:187)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:680)
> Caused by: java.util.concurrent.ExecutionException: java.io.IOException: org.apache.cassandra.config.ConfigurationException: Invalid negative or null chunk_length_kb
>     at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
>     at java.util.concurrent.FutureTask.get(FutureTask.java:83)
>     at org.apache.cassandra.thrift.CassandraServer.applyMigrationOnStage(CassandraServer.java:890)
>     ... 7 more
> Caused by: java.io.IOException: org.apache.cassandra.config.ConfigurationException: Invalid negative or null chunk_length_kb
>     at org.apache.cassandra.db.migration.UpdateColumnFamily.applyModels(UpdateColumnFamily.java:78)
>     at org.apache.cassandra.db.migration.Migration.apply(Migration.java:156)
>     at org.apache.cassandra.thrift.CassandraServer$2.call(CassandraServer.java:883)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     ... 3 more
> Caused by: org.apache.cassandra.config.ConfigurationException: Invalid negative or null chunk_length_kb
>     at org.apache.cassandra.io.compress.CompressionParameters.validateChunkLength(CompressionParameters.java:167)
>     at org.apache.cassandra.io.compress.CompressionParameters.create(CompressionParameters.java:52)
>     at org.apache.cassandra.config.CFMetaData.apply(CFMetaData.java:796)
>     at org.apache.cassandra.db.migration.UpdateColumnFamily.applyModels(UpdateColumnFamily.java:74)
>     ... 7 more
> ERROR [MigrationStage:1] 2011-11-28 09:59:26,434 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[MigrationStage:1,5,main]
> java.io.IOException: org.apache.cassandra.config.ConfigurationException: Invalid negative or null chunk_length_kb
>     at org.apache.cassandra.db.migration.UpdateColumnFamily.applyModels(UpdateColumnFamily.java:78)
>     at org.apache.cassandra.db.migration.Migration.apply(Migration.java:156)
>     at org.apache.cassandra.thrift.CassandraServer$2.call(CassandraServer.java:883)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:680)
> Caused by: org.apache.cassandra.config.ConfigurationException: Invalid negative or null chunk_length_kb
>     at org.apache.cassandra.io.compress.CompressionParameters.validateChunkLength(CompressionParameters.java:167)
>     at
cassandra in production environment
Hi,

We are currently testing Cassandra in a RHEL 6.1 64-bit environment running on ESXi 5.0 and are experiencing data file corruption. If you are using Linux for your production environment, can you please share which OS/version you are using?

thanks
Ramesh
Re: cassandra in production environment
> We are currently testing Cassandra in a RHEL 6.1 64-bit environment
> running on ESXi 5.0 and are experiencing data file corruption. If you
> are using Linux for your production environment, can you please share
> which OS/version you are using?

It would probably be a good idea if you could be a bit more specific about the nature of the corruption, the observations made, and the version of Cassandra you are using.

As for production environments: lots of people are bound to use various environments in production; I suppose the only interesting bit would be if someone uses RHEL 6.1 specifically? I mean, I can say that I've run Cassandra on Debian Squeeze in production, but that doesn't really help you ;)

--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
read/write counts
Hello,

When I use nodetool cfstats, I see read/write counts for both the keyspace and the column family. I assume both numbers are counted across the ring, but I saw different read/write counts shown on one node compared to the other 7 nodes.

Nodes 1, 2, 4-8:

Keyspace: ks
        Read Count: 44285565
        Read Latency: 1.4984792287509485 ms.
        Write Count: 161096052
        Write Latency: 0.006321300412750028 ms.
        Pending Tasks: 0
                Column Family: Events
                Read Count: 44219534
                Read Latency: NaN ms.
                Write Count: 43245679
                Write Latency: 0.012 ms.

Node 3:

Keyspace: ks
        Read Count: 44190641
        Read Latency: 1.738313580810018 ms.
        Write Count: 136735281
        Write Latency: 0.007389342133285994 ms.
                Column Family: Events
                Read Count: 44125278
                Read Latency: NaN ms.
                Write Count: 36991005
                Write Latency: 0.010 ms.

So my questions are:
1) Are KS-level counts and CF-level counts for the whole cluster or just for an individual node?
2) Why do I see different counts from different nodes if counts are at the KS level?

Feng
Consistency for node shutdown and startup
Hi,

Here is the case: we have only two nodes, which share the data (write one, read one). Time flows downward:

node One          node Two
---------------   ----------------------------------------
stopped           continues working and updates the data
stopped           stopped
starts working    stopped
updates data      stopped
started           starts working

What happens to the conflicting data when the two nodes come online separately? How is it synchronized between the two nodes when they are both finally online?

BRs
//Tang Weiqiang
Re: Consistency for node shutdown and startup
> What happens to the conflicting data when the two nodes come online
> separately? How is it synchronized between the two nodes when they are
> both finally online?

Briefly, it's based on timestamp conflict resolution. This may be a good resource:

http://www.datastax.com/docs/1.0/dml/about_writes#about-transactions

--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
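For illustration only, a minimal sketch of the last-write-wins rule that page describes; this is not Cassandra's actual code, and the tie-breaking detail is an assumption:

    import java.nio.ByteBuffer;

    public class LastWriteWins {
        // Illustrative only -- not Cassandra source.
        static ByteBuffer reconcile(long tsA, ByteBuffer valA, long tsB, ByteBuffer valB) {
            if (tsA != tsB) {
                return tsA > tsB ? valA : valB;  // the newest client-supplied timestamp wins
            }
            // Equal timestamps: break the tie deterministically so every
            // replica converges on the same value (value comparison is an
            // assumption here).
            return valA.compareTo(valB) >= 0 ? valA : valB;
        }
    }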
Re: read/write counts
> 1) Are KS level counts and CF level counts for whole cluster or just for
> an individual node?

Individual node. Also note that the CF-level counts refer to local reads/writes submitted to the node, while the statistics you get from StorageProxy (in JMX) are for the requests the node routed as coordinator. In general, you will see a magnification by a factor of RF on the local statistics (in aggregate) relative to the StorageProxy stats.

--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
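A small worked example of that magnification, under the assumption of no dropped or hinted writes: with RF = 3, a single client insert is counted once in the coordinator's StorageProxy write stats but produces one local CF-level write on each of the three replicas, so

    sum of cfstats write counts over all nodes  ~  RF x sum of StorageProxy write counts

which is why per-node cfstats numbers are local and need not match from node to node.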
Re: Meaning of values in tpstats
>> With the slicing, I'm not sure off the top of my head. I'm sure
>> someone else can chime in. For e.g. a multi-get, they end up as
>> independent tasks.
>
> So if I multiget 10 keys, they are fetched in //, consolidated by the
> coordinator, and then sent back?

Took me a while to figure out that // == "parallel" :)

I'm pretty sure (but not entirely; I'd have to check the code) that the request is forwarded as one request to the necessary node(s); what I was saying rather was that the individual gets get queued up as individual tasks to be executed internally in the different stages. That does lead to parallelism locally on the node (subject to the concurrent reader setting).

> Agreed. I followed someone's suggestion some time ago to reduce my batch
> sizes and it has helped tremendously. I'm now doing multigetslices in
> batches of 512 instead of 5000, and I find I no longer have pendings up
> so high. The most I see now is a couple hundred.

In general, the best balance will depend on the situation. For example, the benefit of batching increases as the latency to the cluster (and within it) increases, and the negative effects increase as you have higher demands of low latency on other traffic to the cluster.

--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
Re: CPU bound workload
> Regarding 2), I may be running into this, since data updates are very
> localized by design. I've distributed the keys per storage load, but I'm
> going to have to distribute them by read/write load since the workload is
> anything but random and I'm using BOP. However, I never see an IO
> bottleneck when using iostat; see below.

Ah, I keep assuming the random partitioner since it is a very common case (just to be sure: unless you specifically want the ordering despite the downsides, you generally want to default to the random partitioner).

> I've got two processes doing writes in parallel. The one we are currently
> discussing ("Process A") only writes, while the other one ("Process B")
> reads 2 to 4x more data than it writes.
>
> Process A typically looks like this (numbers come from Hector). Each line
> below is one Cassandra batch, i.e. one Hector Mutator.execute():
>
> 15:15:53 Wrote 86 cassandra mutations using host X.Y.Z.97(X.Y.Z.97):9160 (153 usecs)
> 15:15:53 Wrote 90 cassandra mutations using host X.Y.Z.96(X.Y.Z.96):9160 (97 usecs)
> 15:15:54 Wrote 85 cassandra mutations using host X.Y.Z.95(X.Y.Z.95):9160 (754 usecs)
> 15:15:54 Wrote 81 cassandra mutations using host X.Y.Z.109(X.Y.Z.109):9160 (561 usecs)
> 15:15:54 Wrote 86 cassandra mutations using host 176.31.226.128(176.31.226.128):9160 (130 usecs)
> 15:15:54 Wrote 73 cassandra mutations using host X.Y.Z.97(X.Y.Z.97):9160 (97 usecs)
> 15:15:54 Wrote 82 cassandra mutations using host X.Y.Z.96(X.Y.Z.96):9160 (48 usecs)
> 15:15:56 Wrote 108 cassandra mutations using host X.Y.Z.95(X.Y.Z.95):9160 (1653 usecs)
> 15:15:56 Wrote 84 cassandra mutations using host X.Y.Z.109(X.Y.Z.109):9160 (23 usecs)
>
> I'm pretty sure those are milliseconds and not microseconds as per the
> Hector docs (see the last two lines & timestamps), which would amount to
> 500 to 1000 mutations per second, with a min at 65 and a max at 3652.
>
> Clusterwide, opscenter is reporting 10 write requests per second in the
> 20mn graph, but that can't be right.

I'm not familiar with OpsCenter, but if the numbers seem low I suspect it's because it's counting requests to the StorageProxy. A batch of multiple reads is still a single request to the StorageProxy, so that stat won't be a reflection of the number of columns (nor rows) affected. (Again to clarify: I do not know if OpsCenter is using the StorageProxy stat; that is my speculation.)

> When I get a log like these, there is always a "cluster-freeze" during the
> preceding minute. By "cluster-freeze", I mean that a couple of nodes go to
> 0% utilization (no cpu, no system, no io)

One hypothesis here is that your workload is causing problems for a node (for example, sudden spikes in memory allocation causing full GC fallbacks that take time), and both the readers and the writers get "stuck" on requests to those nodes (once a sufficient number of requests happen to be destined to those). The result would be that the other nodes no longer see traffic because the clients aren't making progress.

> I may be overloading the cluster when Process A runs, but I would like to
> understand why so I can do something about it. What I'm trying to figure
> out is:
> - why would counter writes time out at 28 or 38s (5-node cluster)?
> - what could cause the cluster to "freeze" during those timeouts?
>
> If you have any answers or ideas on how I could find an answer, that
> would be great.
I would first eliminate or confirm any GC hypothesis by running all nodes with -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps.

If you can see this happen sufficiently often to manually/interactively "wait for it", I suggest something as simple as firing up top + iostat for each host, keeping them on the screen at the same time, and looking at what happens when you see this again. If the problem is a fallback to full GC, for example, the affected nodes should be churning 100% CPU (one core) for several seconds (assuming a large heap). If there is a sudden burst of disk I/O causing a hiccup (e.g. dirty buffer flushing by Linux), it should be visibly correlated with 'iostat -x -k 1'.

--
/ Peter Schuller (@scode, http://worldmodscode.wordpress.com)
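For reference, the flags Peter lists drop straight into conf/cassandra-env.sh; a sketch (the -Xloggc path is a made-up convenience, not required):

    JVM_OPTS="$JVM_OPTS -XX:+PrintGC -XX:+PrintGCDetails"
    JVM_OPTS="$JVM_OPTS -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"   # keep GC output out of system.log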
plan to switch fro SimpleStrategy to NetworkTopologyStrategy
Hi,

This is my first post, so first of all, thanks to the Cassandra authors and community for their excellent work! Now to my question...

I need a plan for a transition from SimpleStrategy to NetworkTopologyStrategy, as I have to add two servers from a remote datacenter (with RTT up to 120ms) to my cluster. The cluster consists of 10 nodes in 5 datacenters (2 nodes per DC), and each node carries about 4G of data in keyspace meter:

Keyspace: meter:
  Replication Strategy: org.apache.cassandra.locator.SimpleStrategy
  Durable Writes: true
    Options: [replication_factor:3]

Originally I planned (and tested) simply rolling-restarting the nodes with NetworkTopologyStrategy enabled in cassandra.yaml (and the proper cassandra-topology.properties file in place), and after the nodes restarted, updating my keyspace with the option {DC1:1,DC2:1,...}.

But my concern is: will Cassandra begin to move data to new locations right after the restart with NTS enabled (and should I wait for that during the rolling restart?), or only after the keyspace is updated with the new options? Or is there some other way?

Thanks!
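For what it's worth, a sketch of the two pieces such a transition involves; the addresses and DC/rack names are made-up placeholders, and this does not answer the ordering question above (whether data moves at restart or at keyspace update):

    # cassandra.yaml on every node (picked up via the rolling restart):
    endpoint_snitch: PropertyFileSnitch

    # conf/cassandra-topology.properties (same file on every node):
    10.0.1.1=DC1:RAC1
    10.0.1.2=DC1:RAC1
    10.0.2.1=DC2:RAC1
    default=DC1:RAC1

    # then, from cassandra-cli, one option per DC:
    update keyspace meter
      with placement_strategy = 'org.apache.cassandra.locator.NetworkTopologyStrategy'
      and strategy_options = {DC1:1, DC2:1};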
node stuck "leaving" on 1.0.5
I have a dead node I need to remove from the cluster so that I can rebalance among the existing servers (I can't replace it for a while). I used nodetool removetoken, and it's been stuck in the "leaving" state for over a day now. I've tried a rolling restart, which kicks off some streaming for a while under netstats, but now even that lists nothing going on. I'm stuck on what to do next to get this node to finally leave so I can move the tokens around.

The only error I see in the system log:

ERROR [Thread-209] 2011-12-11 01:40:34,347 AbstractCassandraDaemon.java (line 133) Fatal exception in thread Thread[Thread-209,5,main]
java.lang.AssertionError
    at org.apache.cassandra.db.compaction.LeveledManifest.promote(LeveledManifest.java:178)
    at org.apache.cassandra.db.compaction.LeveledCompactionStrategy.handleNotification(LeveledCompactionStrategy.java:141)
    at org.apache.cassandra.db.DataTracker.notifySSTablesChanged(DataTracker.java:481)
    at org.apache.cassandra.db.DataTracker.replace(DataTracker.java:275)
    at org.apache.cassandra.db.DataTracker.addSSTables(DataTracker.java:237)
    at org.apache.cassandra.db.DataTracker.addStreamedSSTable(DataTracker.java:242)
    at org.apache.cassandra.db.ColumnFamilyStore.addSSTable(ColumnFamilyStore.java:920)
    at org.apache.cassandra.streaming.StreamInSession.closeIfFinished(StreamInSession.java:141)
    at org.apache.cassandra.streaming.IncomingStreamReader.read(IncomingStreamReader.java:103)
    at org.apache.cassandra.net.IncomingTcpConnection.stream(IncomingTcpConnection.java:184)
    at org.apache.cassandra.net.IncomingTcpConnection.run(IncomingTcpConnection.java:81)
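If the removal itself is wedged (as the streaming error above suggests), 1.0-era nodetool can inspect and force-complete it; a hedged sketch (the host is a placeholder, and force should be issued against the node where the original removetoken was run):

    nodetool -h node1 removetoken status   # reports the token currently being removed, if any
    nodetool -h node1 removetoken force    # gives up on pending streams and completes the removal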