Re: Combining two clusters/keyspaces into single cluster
Hi,

Your objective is to add keyspace2 to cluster1. The documentation link you referred to is for adding a new datacenter, which is not applicable here. You need to:

a. take a snapshot of keyspace2 on cluster2
b. use sstableloader to copy keyspace2 onto cluster1
c. run a 'nodetool repair' on cluster1
d. decommission cluster2

You are then ready to use cluster1, with both keyspaces in it.

Hope this helps,
Jan

On Thu, 4/21/16, Arlington Albertson wrote:

Subject: Combining two clusters/keyspaces into single cluster
To: user@cassandra.apache.org
Date: Thursday, April 21, 2016, 6:15 PM

Hey Folks,

I've been looking through various documentations, but I'm either overlooking something obvious or not wording it correctly, but the gist of my problem is this:

I have two cassandra clusters, with two separate keyspaces, on EC2. We'll call them as follows:

cluster1 (DC name, cluster name, etc...): keyspace1 (only exists on cluster1)
cluster2 (DC name, cluster name, etc...): keyspace2 (only exists on cluster2)

I need to perform the following:
- take keyspace2, and add it to cluster1 so that all nodes can serve the traffic
- needs to happen "live" so that I can repoint new instances to the cluster1 endpoints and they'll just start working, and no longer directly use cluster2
- eventually, tear down cluster2 (easy with a `nodetool decommission` after verifying all seeds have been changed, etc...)

This doc seems to be the closest I've found thus far:
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html

Is that the appropriate guide for this and I'm just over-thinking it? Or is there something else I should be looking at?

Also, this is DSC C* 2.1.13.

TIA!

-AA
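The steps above can be sketched as a shell script. This is a minimal dry-run sketch under stated assumptions: host names (cluster2-node1, cluster1-node1) and the table name are placeholders, and the keyspace2 schema is assumed to already exist on cluster1. The run() wrapper only prints each command, so the sequence can be reviewed before running it for real.

```shell
# Dry-run sketch of the keyspace migration; swap run() for direct
# execution on real nodes. Host and table names below are placeholders.
run() { echo "+ $*"; }

# a. snapshot keyspace2 on every cluster2 node
run nodetool -h cluster2-node1 snapshot -t ks2_migration keyspace2

# b. stream the snapshot SSTables into cluster1 (repeat per table and per node)
run sstableloader -d cluster1-node1 /var/lib/cassandra/data/keyspace2/mytable

# c. repair keyspace2 on cluster1 once loading is done
run nodetool -h cluster1-node1 repair keyspace2

# d. decommission cluster2 nodes after clients are repointed
run nodetool -h cluster2-node1 decommission
```

Note that sstableloader expects a directory laid out as keyspace/table, and the snapshot has to be taken on (and loaded from) every node that owns data for keyspace2.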
RE: Problem Replacing a Dead Node
Mir,

You can take a node out of the cluster with 'nodetool decommission' (run on the live node itself), or with 'nodetool removetoken' (run from any other machine) to remove a dead one. Either will assign the ranges the old node was responsible for to other nodes, and replicate the appropriate data there. If decommission is used, the data will stream from the decommissioned node. If removetoken is used, the data will stream from the remaining replicas.

Hope this helps,
Jan/

On Thu, 4/21/16, Anubhav Kale wrote:

Subject: RE: Problem Replacing a Dead Node
To: "user@cassandra.apache.org"
Date: Thursday, April 21, 2016, 6:34 PM

Reusing the bootstrapping node could have caused this, but it is hard to tell. Since you have only 7 nodes, have you tried doing a few rolling restarts of all nodes to let gossip settle? Also, the node is pingable from other nodes even though it shows as UNREACHABLE below, correct? Based on nodetool status, it appears the node has streamed all the data it needs, but it doesn't think it has joined the ring yet. Does cqlsh work on that node?
From: Mir Tanvir Hossain [mailto:mir.tanvir.hoss...@gmail.com]
Sent: Thursday, April 21, 2016 11:51 AM
To: user@cassandra.apache.org
Subject: Re: Problem Replacing a Dead Node

Here is a bit more detail of the whole situation. I am hoping someone can help me out here.

We have a seven node cluster. One of the nodes started to have issues but it was still running. We decided to add a new node, and remove the problematic node after the new node joined. However, the new node did not join the cluster even after three days. Hence, we decided to go with the replacement option. We shut down the problematic node. After that, we stopped cassandra on the bootstrapping node, deleted all the data, and restarted that node as the replacement node for the problematic node. Since we reused the bootstrapping node as the replacement node, I am wondering whether that is causing any issue. Any insights are appreciated.

This is the output of nodetool describecluster from the replacement node, and two other nodes:

mhossain@cassandra-24:~$ nodetool describecluster
Cluster Information:
    Name: App
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.4, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]

mhossain@cassandra-13:~$ nodetool describecluster
Cluster Information:
    Name: App
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]
        UNREACHABLE: [10.0.7.91, 10.0.7.4]

mhossain@cassandra-09:~$ nodetool describecluster
Cluster Information:
    Name: App
    Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
    Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
    Schema versions:
        80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]
        UNREACHABLE: [10.0.7.91, 10.0.7.4]

cassandra-24 (10.0.7.4) is the replacement node. 10.0.7.91 is the IP address of the dead node.

-Mir

On Thu, Apr 21, 2016 at 10:02 AM, Mir Tanvir Hossain wrote:

Hi, I am trying to replace a dead node by following https://docs.datastax.com/en/cassandra/2.0
Re: When are hints written?
Hi Bo,

You raised 2 questions: 20% system utilization, and hints.

20% system utilization: for a node or a cluster, 20% utilization is normal during peak write operation.

Hints: hints are written when a node is unreachable. C* 3.0 has a complete overhaul in the way hints are implemented. I recommend reading this blog article: http://www.datastax.com/dev/blog/whats-coming-to-cassandra-in-3-0-improved-hint-storage-and-delivery

Hope this helps,
Jan/

On Thu, 4/21/16, Jens Rantil wrote:

Subject: Re: When are hints written?
To: user@cassandra.apache.org
Date: Thursday, April 21, 2016, 8:57 AM

Hi again Bo,

I assume this is the piece of documentation you are referring to? http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_about_hh_c.html?scroll=concept_ds_ifg_jqx_zj__performance

> If a replica node is overloaded or unavailable, and the failure detector has not yet marked it down, then expect most or all writes to that node to fail after the timeout triggered by write_request_timeout_in_ms, which defaults to 10 seconds. During that time, Cassandra writes the hint when the timeout is reached.

I'm not an expert on this, but the way I've seen it, hints are stored as soon as there is _any_ issue writing a mutation (insert/update/delete) to a node. By "issue", that essentially means that a node hasn't acknowledged back to the coordinator that the write succeeded within write_request_timeout_in_ms. This includes TCP/socket timeouts, connection issues, or that the node is down. The hints are stored for a maximum timespan defaulting to 3 hours.

Cheers,
Jens

On Thu, Apr 21, 2016 at 8:06 AM Bo Finnerup Madsen wrote:

Hi Jens,

Thank you for the tip! ALL would definitely cure our hints issue, but as you note, it is not optimal as we are unable to take down nodes without clients failing. I am most probably overlooking something in the documentation, but I cannot see any description of when hints are written other than when a node is marked as being down.
And since none of our nodes have been marked as being down (at least according to the logs), I suspect that there is some timeout that governs when hints are written?

Regarding your other post: yes, 3.0.3 is pretty new. But we are new to this cassandra game, and our schema-fu is not strong enough for us to create a schema without using materialized views :)

On Wed, 20 Apr 2016 at 17:09, Jens Rantil wrote:

Hi Bo,

> In our case, I would like for the cluster to wait for the write to be persisted on the relevant nodes before returning an ok to the client. But I don't know which knobs to turn to accomplish this? Or if it is even possible :)

This is what the write consistency option is for. Have a look at https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html. Note, however, that if you use ALL, your clients will fail (throw an exception, depending on language) as soon as a single partition can't be written. This means you can't do online maintenance of a Cassandra node (such as upgrading it) without experiencing write issues.

Cheers,
Jens

On Wed, Apr 20, 2016 at 3:39 PM Bo Finnerup Madsen wrote:

Hi,

We have a small 5 node cluster of m4.xlarge machines that receives writes from ~20 clients. The clients will write as fast as they can, and the whole process is limited by the write performance of the cassandra cluster. After we tweaked our schema to avoid large partitions, the load is going ok and we don't see any warnings or errors in the cassandra logs. But we do see quite a lot of hint handoff activity. During the load, the cassandra nodes are quite loaded, with linux reporting a load as high as 20. I have read the available documentation on how hints work, and to my understanding hints should only be written if a node is down. But as far as I can see, none of the nodes are marked as down during the load. So I suspect I am missing something :)

We have configured the servers with write_request_timeout_in_ms: 12 and the clients with a timeout of 13, but still get hints stored. In our case, I would like for the cluster to wait for the write to be persisted on the relevant nodes before returning an ok to the client. But I don't know which knobs to turn to accomplish this? Or if it is even possible :)

We are running cassandra 3.0.3, with 8Gb heap and a replication factor of 3.

Thank you in advance!

Yours sincerely,
Bo Madsen

--
Jens Rantil
Backend Developer @ Tink
Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
For urgent matters you can reach me at +46-708-84 18 32.
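On Jens's point about write consistency: it is a client/session setting, not a server knob in cassandra.yaml. A minimal sketch, assuming a hypothetical ks.t table; the run() helper prints the invocation rather than executing it, so this is safe to review anywhere.

```shell
# Dry-run sketch: consistency is chosen per cqlsh session (or per
# statement in a driver), here ALL so every replica must acknowledge.
# 'ks.t' is a hypothetical table.
run() { echo "+ $*"; }

run cqlsh -e "CONSISTENCY ALL; INSERT INTO ks.t (id, val) VALUES (1, 'x');"
```

With a replication factor of 3, QUORUM (two of three replicas) is the usual compromise: writes are durably acknowledged by a majority while one node can still be taken down for maintenance.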
enabling Solr on a DSE C* node
Hi Folks,

I am trying to have one of my DSE 4.7 C* nodes also function as a Solr node within the cluster. I have followed the docs in vain: https://docs.datastax.com/en/datastax_enterprise/4.0/datastax_enterprise/srch/srchInstall.html

Any pointers would help.

Thanks,
Jan
AW: Java GC pauses, reality check
https://www.azul.com/products/zing/order-zing/

At least I found a list price for Zing there: $3k per year.

----- Original message -----
From: "Work"
Sent: 26.11.2016 07:53
To: "user@cassandra.apache.org"
Subject: Re: Java GC pauses, reality check

I'm not affiliated with them, I've just been impressed by them. They have done amazing work in performance measurement. They discovered a major flaw in most performance testing ... I've never seen their pricing. But, recently, they made their product available for testing by developers. And they assured me that pricing is on a sliding scale depending upon utilization, and not ridiculous.

- James

Sent from my iPhone

On Nov 25, 2016, at 10:40 PM, Benjamin Roth wrote:

This sounds amazing but also expensive - I don't see pricing on their page. Are you able and allowed to tell a rough pricing range?

On 26.11.2016 at 04:33, "Harikrishnan Pillai" wrote:

We are running Azul Zing in prod with 1 million reads/s and 100K writes/s. We never had a major GC above 10 ms.

Sent from my iPhone

> On Nov 25, 2016, at 3:49 PM, Martin Schröder wrote:
>
> 2016-11-25 23:38 GMT+01:00 Kant Kodali:
>> I would also restate the following sentence "java GC pauses are pretty much
>> a fact of life" to "Any GC based system pauses are pretty much a fact of
>> life".
>>
>> I would be more than happy to see if someone can counter-prove.
>
> Azul disagrees.
> https://www.azul.com/products/zing/pgc/
>
> Best
> Martin
AbstractQueryPager in debug.log
Hi,

I was looking through our logs today, and something that caught my eye is many debug logs like this one:

DEBUG [SharedPool-Worker-8] 2017-02-14 12:05:39,330 AbstractQueryPager.java:112 - Got result (1) smaller than page size (5000), considering pager exhausted

Those lines get logged very often (about 2500 times a minute) and I was wondering if this is just "ok", or if there is something misusing paged results for requests fetching a single record and we should have a look at it. Maybe paging results could be a performance issue?

Thanks for any hints,
Jan
Re: Count(*) is not working
Hi,

Could you post the output of nodetool cfstats for the table?

Cheers,
Jan

On 16.02.2017 at 17:00, Selvam Raman wrote:
> I am not getting the count as a result. Instead I keep on getting n number of
> messages like those below.
>
> Read 100 live rows and 1423 tombstone cells for query SELECT * FROM
> keysace.table WHERE token(id) >
> token(test:ODP0144-0883E-022R-002/047-052) LIMIT 100 (see
> tombstone_warn_threshold)
>
> On Thu, Feb 16, 2017 at 12:37 PM, Jan Kesten <mailto:j...@dafuer.de> wrote:
>
> Hi,
>
> did you get a result finally?
>
> Those messages are simply warnings telling you that C* had to read
> many tombstones while processing your query - rows that are
> deleted but not yet garbage collected/compacted. This warning gives
> you some explanation why things might be much slower than expected,
> because per 100 rows that count, C* had to read about 15 times as many
> rows that were already deleted.
>
> Apart from that, count(*) is almost always slow - and there is a
> default limit of 10,000 rows in a result.
>
> Do you really need the actual live count? To get an idea you can
> always look at nodetool cfstats (but those numbers also contain
> deleted rows).
>
> On 16.02.2017 at 13:18, Selvam Raman wrote:
>> Hi,
>>
>> I want to know the total record count in the table. I fired the below query:
>>
>> select count(*) from tablename;
>>
>> and I got the below output:
>>
>> Read 100 live rows and 1423 tombstone cells for query SELECT *
>> FROM keysace.table WHERE token(id) >
>> token(test:ODP0144-0883E-022R-002/047-052) LIMIT 100 (see
>> tombstone_warn_threshold)
>>
>> Read 100 live rows and 1435 tombstone cells for query SELECT *
>> FROM keysace.table WHERE token(id) > token(test:2565-AMK-2) LIMIT
>> 100 (see tombstone_warn_threshold)
>>
>> Read 96 live rows and 1385 tombstone cells for query SELECT *
>> FROM keysace.table WHERE token(id) > token(test:-2220-UV033/04)
>> LIMIT 100 (see tombstone_warn_threshold)
>>
>> Can you please help me to get the total count of the table.
>>
>> --
>> Selvam Raman
>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
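For getting an actual total without count(*) choking on the coordinator, one commonly used workaround is to export the table with cqlsh's COPY command, which reports the number of rows it processed when it finishes. A dry-run sketch with placeholder names (ks.mytable); the run() helper only prints the command, so nothing touches a live cluster here.

```shell
# Dry-run sketch: COPY pages through the whole table client-side and
# prints a row count at the end; writing to /dev/null discards the data.
# 'ks.mytable' is a placeholder.
run() { echo "+ $*"; }

run cqlsh -e "COPY ks.mytable TO '/dev/null';"
```

This is still a full scan, so it can take a long time on a big table, but it pages client-side instead of aggregating on one coordinator, and it counts live rows only.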
Re: How to measure disk space used by a keyspace?
nodetool cfstats would be your best bet. Sum all the column families' info within a keyspace to get to the number you are looking for.

Jan/

On Wednesday, July 1, 2015 9:05 AM, graham sanderson wrote:

If you are pushing metric data to graphite, there is org.apache.cassandra.metrics.keyspace..LiveDiskSpaceUsed.value for each node; easy enough to graph the sum across machines. Metrics/JMX are tied together in C*, so there is an equivalent value exposed via JMX. I don't know what it is called off the top of my head, but it would be something similar to the above.

On Jul 1, 2015, at 9:28 AM, sean_r_dur...@homedepot.com wrote:

That's ok for a single node, but to answer the question, "how big is my table across the cluster?", it would be much better if the cluster could provide an answer.

Sean Durity

From: Jonathan Haddad [mailto:j...@jonhaddad.com]
Sent: Monday, June 29, 2015 8:15 AM
To: user
Subject: Re: How to measure disk space used by a keyspace?

If you're looking to measure actual disk space, I'd use the du command, assuming you're on linux: http://linuxconfig.org/du-1-manual-page

On Mon, Jun 29, 2015 at 2:26 AM shahab wrote:

Hi,

Probably this question has already been asked on the mailing list, but I couldn't find it. The question is how to measure disk space used by a keyspace, column family wise, excluding snapshots?

best,
/Shahab
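Summing the per-table numbers from nodetool cfstats can be scripted. A minimal sketch, assuming the "Space used (live)" lines that cfstats prints (the exact label wording varies a little between Cassandra versions, so check your output first); the parsing is demonstrated here on a captured sample rather than a live node.

```shell
# Sum the "Space used (live)" byte counts from cfstats-style output.
# On a real node you would pipe: nodetool cfstats my_keyspace | sum_live
sum_live() { awk -F': ' '/Space used \(live\)/ {sum += $2} END {print sum+0}'; }

# Demonstrate on a captured sample (two tables in one keyspace):
sum_live <<'EOF'
		Space used (live): 1048576
		Space used (live): 2097152
EOF
# prints 3145728
```

This answers the "across the cluster" question too: run it against each node's cfstats output and add the results (remembering that replicated data is counted once per replica).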
Re: Stream failure while adding a new node
David,

Bring down all the nodes with the exception of the 'seed' node. Now bring up the 10th node. Run 'nodetool status' and wait until this 10th node is UP. Bring up the rest of the nodes after that. Run 'nodetool status' again and check that all the nodes are UP.

Alternatively: decommission the 10th node completely and drop it from the cluster. Build a new node with the same IP and hostname and have it join the running cluster.

Hope this helps,
Jan

On Wednesday, July 1, 2015 7:56 AM, David CHARBONNIER wrote:

Hi Alain,

We still have the timeout problem in OpsCenter and we still didn't solve this problem, so no, we didn't run an entire repair with the repair service. And yes, during this try, we set auto_bootstrap to true and ran a repair on the 9th node after it finished streaming.
Thank you for your help.

Best regards,

David CHARBONNIER
Sysadmin
T : +33 411 934 200
david.charbonn...@rgsystem.com
ZAC Aéroport
125 Impasse Adam Smith
34470 Pérols - France
www.rgsystem.com

From: Alain RODRIGUEZ [mailto:arodr...@gmail.com]
Sent: Tuesday, June 30, 2015 15:18
To: user@cassandra.apache.org
Subject: Re: Stream failure while adding a new node

Hi David,

Are you sure you ran the repair entirely (9 days + repair logs ok on opscenter server) before adding the 10th node? This is important to avoid potential data loss! Did you set auto_bootstrap to true on this 10th node?

C*heers,

Alain

2015-06-29 14:54 GMT+02:00 David CHARBONNIER:

Hi,

We're using Cassandra 2.0.8.39 through Datastax Enterprise 4.5.1 with a 9 node cluster. We need to add a few new nodes to the cluster but we're experiencing an issue we don't know how to solve. Here is exactly what we did:

- We had 8 nodes and needed to add a few ones
- We tried to add a 9th node but the stream got stuck for a very long time and bootstrap never finished (related to the streaming_socket_timeout_in_ms default value in cassandra.yaml)
- We applied a solution given by a Datastax architect: restart the node with auto_bootstrap set to false and run a repair
- After this issue, we went on to patch the default configuration on all our nodes to avoid this problem, and made a rolling restart of the cluster
- Then, we tried adding a 10th node, but it receives streams from only one node (node2).
Here are the logs on this problematic node (node10):

INFO [main] 2015-06-26 15:25:59,490 StreamResultFuture.java (line 87) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Executing streaming plan for Bootstrap
INFO [main] 2015-06-26 15:25:59,490 StreamResultFuture.java (line 91) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Beginning stream session with /node6
INFO [main] 2015-06-26 15:25:59,491 StreamResultFuture.java (line 91) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Beginning stream session with /node5
INFO [main] 2015-06-26 15:25:59,492 StreamResultFuture.java (line 91) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Beginning stream session with /node4
INFO [main] 2015-06-26 15:25:59,493 StreamResultFuture.java (line 91) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Beginning stream session with /node3
INFO [main] 2015-06-26 15:25:59,493 StreamResultFuture.java (line 91) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Beginning stream session with /node9
INFO [main] 2015-06-26 15:25:59,493 StreamResultFuture.java (line 91) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Beginning stream session with /node8
INFO [main] 2015-06-26 15:25:59,493 StreamResultFuture.java (line 91) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Beginning stream session with /node7
INFO [main] 2015-06-26 15:25:59,494
Cassandra 2015 Summit videos
Hi Folks,

Could you please point me to the videos of the 2015 Cassandra Summit held in California? I do see the ones posted for the 2014 & 2013 conferences.

Thanks,
Jan
flipping ordering of returned query results
Folks,

Need some advice. We have a time-series application that needs the data being returned from C* to be flipped from the typical column-based layout to be row-based.

Example: C* data: A B C D E F
Need returned data to be: A D B E C F

Any input would be much appreciated.

Thanks,
Jan
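If the six values are two result rows of three columns each (A B C / D E F, which would make the requested order A D B E C F a column-major read), the flip can be done client-side. A minimal awk sketch under that assumption; in practice, re-modeling the table with a different clustering order is usually the better fix than post-processing.

```shell
# Transpose whitespace-separated rows: row-major in, column-major out.
transpose() {
  awk '{ for (i = 1; i <= NF; i++) col[i] = col[i] (NR > 1 ? " " : "") $i }
       END { for (i = 1; i <= NF; i++) print col[i] }'
}

printf 'A B C\nD E F\n' | transpose
# prints:
# A D
# B E
# C F
```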
Re: Alternative approach to setting up new DC
Jens,

I am unsure that you need to both enable replication and also use the sstableloader. You could load the data into the new DC and subsequently alter the keyspace to replicate from the older DC.

Cheers,
Jan

On Thu, 4/21/16, Jens Rantil wrote:

Subject: Re: Alternative approach to setting up new DC
To: user@cassandra.apache.org
Date: Thursday, April 21, 2016, 9:00 AM

Hi,

I never got any response here, but just wanted to share that I went to a Cassandra meet-up in Stockholm yesterday where I talked to two knowledgeable Cassandra people who verified that the approach below should work. The most important thing is that the backup must be fully imported within gc_grace_seconds of when the backup was taken. As for me, I managed to get a more stable VPN setup and did not have to go down this path.

Cheers,
Jens

On Mon, Apr 18, 2016 at 10:15 AM Jens Rantil wrote:

Hi,

I am provisioning a new datacenter for an existing cluster. A rather shaky VPN connection is hindering me from making a "nodetool rebuild" bootstrap on the new DC. Interestingly, I have a full fresh database snapshot/backup at the same location as the new DC (transferred outside of the VPN). I am now considering the following approach:

1. Make sure my clients are using the old DC.
2. Provision the new nodes in the new DC.
3. ALTER the keyspace to enable replicas on the new DC. This will start replicating all writes from the old DC to the new DC.
4. Before gc_grace_seconds after operation 3) above, use sstableloader to stream my backup to the new nodes.
5. As a safety precaution, do a full repair.

Could you see any issues with doing this?

Cheers,
Jens

--
Jens Rantil
Backend Developer @ Tink
Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
For urgent matters you can reach me at +46-708-84 18 32.
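The keyspace change and the backup load discussed above can be sketched as commands. A dry-run sketch with placeholder names (DC1/DC2, keyspace ks, a snapshot path, and a new-DC host); the run() wrapper prints each command instead of executing it.

```shell
# Dry-run sketch of the ALTER + load + repair sequence; all names and
# paths are placeholders.
run() { echo "+ $*"; }

# start replicating writes to the new DC
run cqlsh -e "ALTER KEYSPACE ks WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};"

# stream the backup into the new DC before gc_grace_seconds elapses
run sstableloader -d new-dc-node1 /backups/snapshot/ks/mytable

# full repair as a safety net
run nodetool -h new-dc-node1 repair ks
```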
Re: Does nodetool repair stop the node to answer requests ?
Running a 'nodetool repair' will 'not' bring the node down.

Your question: does a nodetool repair make the server stop serving requests, or does it just use a lot of resources but still serve requests?

Answer: no, the server will not stop serving requests. It will use some resources, but not enough to stop it from serving requests.

Hope this helps,
Jan
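That said, repair does add CPU, disk, and network load while it runs; a common way to spread that load out is to repair each node's primary range in turn rather than repairing everything at once. A dry-run sketch with placeholder host names; run() only prints the commands.

```shell
# Rolling primary-range repair across placeholder hosts (dry run).
# -pr repairs only the ranges each node is the primary replica for,
# so each range is repaired once across the whole loop.
run() { echo "+ $*"; }

for host in node1 node2 node3; do
  run nodetool -h "$host" repair -pr
done
```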
Re: How to store weather station Details along with monitoring data efficiently?
The model you are using seems OK.

Your question: this forces me to enter the wea_name and wea_add for each new row, so how do I identify that a new row has been created?

Answer: you do 'not' need to add the wea_name or wea_add during inserts for every new row. Your insert could include only the partition & clustering keys and it would be fine. You identify a new row via the partition & clustering keys.

Errata: you could also add longitude & latitude to the model to add a level of detail, especially since they are widely prevalent in weather station data.

Hope this helps,
jan/

On Friday, January 23, 2015 3:14 AM, Srinivasa T N wrote:

I forgot, my task at hand is to generate a report of all the weather stations along with the sum of temperatures measured each day.

Regards,
Seenu.

On Fri, Jan 23, 2015 at 2:14 PM, Srinivasa T N wrote:

Hi All,

I was following the TimeSeries data modelling on PlanetCassandra by Patrick McFadin. Regarding that, I had one query: if I need to store the weather station name also, should it be in the same table, say:

create table test (
  wea_id int,
  wea_name text,
  wea_add text,
  eventday timeuuid,
  eventtime timeuuid,
  temp int,
  PRIMARY KEY ((wea_id, eventday), eventtime)
);

This forces me to enter the wea_name and wea_add for each new row, so how to identify a new row has been created? Or is there any better mechanism for modelling the above data?

Regards,
Seenu.
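The advice above can be made concrete with the two insert shapes. A dry-run sketch using the test table from the thread (the <day-uuid> literals are placeholders for a real timeuuid value); run() just prints the statements.

```shell
# Dry-run sketch: rows in the same partition share wea_id/eventday, so
# station name/address only need to be written once per station, not once
# per measurement. <day-uuid> is a placeholder.
run() { echo "+ $*"; }

# first row for a station: include the descriptive columns once
run cqlsh -e "INSERT INTO test (wea_id, eventday, eventtime, wea_name, wea_add, temp) VALUES (1, <day-uuid>, now(), 'Station A', 'Some address', 21);"

# subsequent measurements: keys plus the measured value are enough
run cqlsh -e "INSERT INTO test (wea_id, eventday, eventtime, temp) VALUES (1, <day-uuid>, now(), 22);"
```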
Re: Controlling the MAX SIZE of sstables after compaction
Parth et al,

The folks at Netflix seem to have built a solution for your problem:

The Netflix Tech Blog: Aegisthus - A Bulk Data Pipeline out of Cassandra, by Charles Smith and Jeff Magnusson (techblog.netflix.com)

You may want to chase Jeff Magnusson and check whether the solution is open sourced. Please report back to this forum if you get an answer to the problem.

Hope this helps,
Jan
C* Architect

On Monday, January 26, 2015 11:25 AM, Robert Coli wrote:

On Sun, Jan 25, 2015 at 10:40 PM, Parth Setya wrote:

1. Is there a way to configure the size of sstables created after compaction?

No, won't-fix: https://issues.apache.org/jira/browse/CASSANDRA-4897. You could use the "sstablesplit" utility on your One Big SSTable to split it into files of your preferred size.

2. Is there a better approach to generate the report?

The major compaction isn't too bad, but something that understands SSTables as an input format would be preferable to sstable2json.

3. What are the flaws with this approach?

sstable2json is slow and transforms your data to JSON.

=Rob
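The sstablesplit route Rob mentions runs offline against SSTable files, so Cassandra has to be stopped on that node first. A dry-run sketch with a placeholder data path and target size; run() prints the command only.

```shell
# Dry-run sketch: split one large SSTable into ~50 MB files.
# The data file path is a placeholder; sstablesplit must be run with
# Cassandra stopped on the node that owns the file.
run() { echo "+ $*"; }

run sstablesplit --size 50 /var/lib/cassandra/data/ks/mytable/ks-mytable-jb-1-Data.db
```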
Re: Fixtures / CI docker
Hi Alain,

The requirements as stated are impossible to meet, since you expect predictable and deterministic tests while you also need "recent data" (max 1 week old data). Reason: you cannot have a replicable result set when the data varies on a weekly basis.

To obtain a replicable test result, I recommend the following:

a) Keep the 'data' expectation to a point in time which is a known quantity.
b) Load some data into your cluster & take a snapshot. Reload this snapshot before every test for consistent results.

Hope this helps,
Jan / C* Architect

On Monday, January 26, 2015 10:43 AM, Eric Stevens wrote:

I don't have directly relevant advice, especially WRT getting a meaningful and coherent subset of your production data - that's probably too closely coupled with your business logic. Perhaps you can run a testing cluster with a default TTL on all your tables of ~2 weeks, feeding it with real production data so that you have a rolling current snapshot of production.

We do this basic strategy to support integration tests with the rest of our platform. We have a data access service with other internal teams acting as customers of that data. But it's hard to write strong tests against this, because it becomes challenging to predict the values which you should expect to get back without rewriting the business logic directly into your tests (and then what exactly are you testing? Are you testing your tests?). But our data interaction layer tests all focus around inserting the data under test immediately before the assertions portion of the given test. We use Specs2 as a testing framework, and that gives us access to a very nice "eventually { ... }" syntax which will retry the assertions portion several times with a backoff (so that we can account for the eventually consistent nature of Cassandra, and reduce the number of false failures without having to do test-speed-impacting operations like sleeping before asserting).
Basically our data access layer unit tests are strong and rely only on synthetic data (assert that the response is exact for every value), while integration tests from other systems use much softer tests against real data (more like: is there data, and does that data seem to be the right format and for the right time range?).

On Mon, Jan 26, 2015 at 3:26 AM, Alain RODRIGUEZ wrote:

Hi guys,

We currently use a CI with tests based on docker containers. We have a C* service "dockerized". Yet we have an issue since we would like 2 things that are hard to achieve:

- A fixed data set, to have predictable and deterministic tests (that we can repeat at any time with the same result)
- A recent data set, to perform smoke testing on services that need "recent data" (max 1 week old data)

As our dataset is very big and data is not sorted by date in SSTables, it is hard to have a coherent extract of the production data. Has any one of you achieved something like this? For "static" data we could write queries by hand, but I find it more relevant to have a real production extract. Regarding dynamic data, we need a process that we could repeat every day / week to update data, and something light enough to keep containers starting fast.

How do you guys do this kind of thing? FWIW we are migrating to 2.0.11 very soon, so solutions might use 2.0 features. Any idea is welcome, and if you need more info, please ask.

C*heers,

Alain
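Jan's snapshot-based suggestion can be sketched as a pair of steps around each test run. A dry-run sketch with placeholder keyspace, table, and data paths; run() prints the commands, so the flow can be reviewed without a live node.

```shell
# Dry-run sketch: take one golden snapshot, then restore it before each
# test run. Keyspace, table, and paths are placeholders.
run() { echo "+ $*"; }

# once: load fixture data into the test cluster, then freeze it
run nodetool snapshot -t golden ks

# before every test run: restore the frozen SSTables and reload them
run rm -rf /var/lib/cassandra/data/ks/mytable/*.db
run cp -r /var/lib/cassandra/data/ks/mytable/snapshots/golden/. /var/lib/cassandra/data/ks/mytable/
run nodetool refresh ks mytable
```

In a dockerized setup, the same effect is often simpler to get by baking the golden data directory into the image (or a volume) so each container starts from the frozen state.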
Syntax for using JMX term to connect to Cassandra
Hi Folks,

I am trying to use JMXterm, a command line based tool, to script & monitor a C* cluster. Would anyone on this forum know the exact syntax to connect to the Cassandra domain using JMXterm? Please give me an example. I do 'not' intend to use OpsCenter or any other UI based tool.

Thanks,
Jan
Re: Syntax for using JMX term to connect to Cassandra
Thanks Rob; here is what I am looking for:

java -jar /home/user/jmxterm-1.0-alpha-4-uber.jar 10.30.41.52:7199 -O org.apache.cassandra.internal:type=FlushWriter -A CurrentlyBlockedTask

It does not work, since there is something wrong with my syntax. However, once working, it would be scripted to connect to a large cluster from a single host that would store the results into logs. Any help with a single working example would greatly help. I have been running circles around this tool for a couple of hours now.

Thanks,
Jan

On Thursday, January 29, 2015 4:45 PM, Robert Coli wrote:

On Thu, Jan 29, 2015 at 3:27 PM, Jan wrote:

I am trying to use JMXterm, a command line based tool to script & monitor a C* cluster. Would anyone on this forum know the exact syntax to connect to the Cassandra domain using JMXterm?

Here's an example from an old JIRA at my shop:

1. Download the jmxterm-1.0-alpha-4-uber.jar from http://wiki.cyclopsgroup.org/jmxterm
2. sudo java -jar jmxterm-1.0-alpha-4-uber.jar # then within the tool:
3. open
4. bean org.apache.cassandra.db:type=StorageService # or whichever bean you're looking for
5. run setLog4jLevel org.apache.cassandra.db.index.keys.KeySearcher.java DEBUG # example of how to set log level

=Rob
Re: Opscenter served reads / second
MBean domain: org.apache.cassandra.request MBean: org.apache.cassandra.request:type=ReadStage Hope this helps Jan/ On Thursday, January 29, 2015 9:13 AM, Batranut Bogdan wrote: Hello, Is there a metric that will show how many reads per second C* serves? Read requests shows how many requests are issued to Cassandra, but I want to know how many the cluster can actually serve.
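To follow up: that MBean does not expose a rate directly. ReadStage's CompletedTasks attribute is a cumulative counter, so the served-read rate is the delta between two samples divided by the sampling interval. A minimal sketch of the arithmetic (the counter values are made up, and CompletedTasks as the attribute name is an assumption to verify against your version's MBeans):

```shell
# Reads/second from two samples of a monotonically increasing counter.
t1=120000      # CompletedTasks sampled at time T (made-up value)
t2=126000      # CompletedTasks sampled at time T + 60s (made-up value)
interval=60    # seconds between the two samples
rate=$(( (t2 - t1) / interval ))
echo "reads/sec: $rate"
```

In practice you would fill t1 and t2 from two jmxterm `get` calls spaced $interval seconds apart.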
Re: Syntax for using JMX term to connect to Cassandra
Here is the answer: put the following into a shell script and it will yield the results:

JMXTERM_CMD="get -b org.apache.cassandra.db:type=StorageService -s Load"
echo $JMXTERM_CMD | java -jar /home/xyz/jmxterm-1.0-alpha-4-uber.jar -l 10.32.22.45:7199 -v silent -n

Variables are:
-b bean; would change based on what you are looking to monitor
-s Attribute being monitored
/home/xyz/ location where the jmxterm jar file is located
-l IP address of the C* node & the port configured for JMX monitoring

Thanks Robert, your example got me going in the right direction. Hope this helps Jan/ On Thursday, January 29, 2015 5:01 PM, Jan wrote: Thanks Rob; here is what I am looking for: java -jar /home/user/jmxterm-1.0-alpha-4-uber.jar 10.30.41.52:7199 -O org.apache.cassandra.internal:type=FlushWriter -A CurrentlyBlockedTask It does not work since there is something wrong with my syntax. However, once working, it would be scripted to connect to a large cluster from a single host that would store the results into logs. Any help with a single working example would greatly help. I have been running circles around this tool for a couple of hours now. Thanks Jan On Thursday, January 29, 2015 4:45 PM, Robert Coli wrote: On Thu, Jan 29, 2015 at 3:27 PM, Jan wrote: I am trying to use JMXterm, a command-line based tool, to script & monitor a C* cluster. Would anyone on this forum know the exact syntax to connect to the Cassandra domain using JMXterm? Here's an example from an old JIRA at my shop:
1. Download the jmxterm-1.0-alpha-4-uber.jar from http://wiki.cyclopsgroup.org/jmxterm
2. sudo java -jar jmxterm-1.0-alpha-4-uber.jar # then within the tool:
3. open
4. bean org.apache.cassandra.db:type=StorageService # or whichever bean you're looking for
5. run setLog4jLevel org.apache.cassandra.db.index.keys.KeySearcher.java DEBUG # example of how to set log level
=Rob
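That one-liner generalizes to the "scripted across a large cluster from a single host" case mentioned earlier. A hedged sketch (the jar path and node IPs are placeholders to substitute for your own):

```shell
#!/bin/bash
# Build the jmxterm command once, then run it against each node in turn.
JAR=/home/xyz/jmxterm-1.0-alpha-4-uber.jar        # placeholder path
BEAN="org.apache.cassandra.db:type=StorageService"
ATTR="Load"
JMXTERM_CMD="get -b $BEAN -s $ATTR"
for NODE in 10.32.22.45 10.32.22.46 10.32.22.47; do   # placeholder IPs
  echo "$JMXTERM_CMD" | java -jar "$JAR" -l "$NODE:7199" -v silent -n
done
```

Redirect the loop's output into a dated log file if you want history for trending.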
Re: Timeouts but returned consistency level is invalid
HI Michal; The consistency level defaults to ONE for all read and write operations, and it is set per request by the client driver rather than per keyspace (replication factor, not consistency, is the keyspace-level property). Could different parts of your application be overriding the session default? The varying 'consistency' values in the error info (ALL, LOCAL_QUORUM, ONE) suggest not every request is actually being issued at LOCAL_QUORUM. cheers Jan C* Architect On Friday, January 30, 2015 1:36 AM, Michał Łowicki wrote: Hi, We're using C* 2.1.2, django-cassandra-engine which in turn uses cqlengine. LOCAL_QUORUM is set as the default consistency level. From time to time we get timeouts while talking to the database, but what is strange is that the returned consistency level is not LOCAL_QUORUM: code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 3 responses." info={'received_responses': 3, 'required_responses': 4, 'consistency': 'ALL'} code=1200 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 1 responses." info={'received_responses': 1, 'required_responses': 2, 'consistency': 'LOCAL_QUORUM'} code=1100 [Coordinator node timed out waiting for replica nodes' responses] message="Operation timed out - received only 0 responses." info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'} Any idea why it might happen? -- BR, Michał Łowicki
Re: Unable to create a keyspace
Saurabh;
a) How exactly are the three nodes hosted?
b) Can you take down node 2 and create the keyspace from node 1?
c) Can you take down node 1 and create the keyspace from node 2?
d) Do the nodes see each other with 'nodetool status'?
cheers Jan/ C* Architect On Saturday, January 31, 2015 5:40 AM, Carlos Rolo wrote: Something that can cause weird behavior is the machine clocks not being properly synced. I didn't read the thread in full detail, so disregard this if it is not the case. --
Re: Cassandra 2.0.11 with stargate-core read writes are slow
HI Asit; Question 1) Am I using the right hardware, given that as of now I am testing say 10 record reads? Answer: Recommend looking at the 'sar' output logs, watching nodetool cfstats & watching your system.log files to track hardware usage & JVM pressure. As a rule of thumb, it's recommended to have 8 GB for the C* JVM itself on production systems. Question 3) is unclear; please rephrase it. hope this helps Jan C* Architect On Saturday, January 31, 2015 5:33 AM, Carlos Rolo wrote: HI Asit, The only help I'm going to give is on point 3), as I have little experience with 2), and 1) depends on a lot of factors. For testing the workload use this: http://www.datastax.com/documentation/cassandra/2.1/cassandra/tools/toolsCStress_t.html It probably covers all your testing needs. Regards, Carlos Juzarte Rolo Cassandra Consultant Pythian - Love your data rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo Tel: 1649 www.pythian.com On Sat, Jan 31, 2015 at 2:49 AM, Asit KAUSHIK wrote: Hi all, We are testing our logging application on a 3-node cluster; each system is a virtual machine with 4 cores and 8GB RAM running RedHat Enterprise. Now my question is in 3 parts: 1) Am I using the right hardware, given that as of now I am testing say 10 record reads? 2) I am using Stargate-core for full-text search; is there any slowness observed because of that? 3) How can I simulate the write load? I created an application which spawns say 20 threads; in each thread I open a cluster connection and session, insert 1000 records, and close the connection. This takes a lot of time; please suggest if I am missing something. --
Re: Cassandra on Ceph
Colin; Ceph is a block-based storage architecture built on RADOS. It comes with its own replication & rebalancing along with a map of the storage layer. Some more details & similarities:
a) Ceph stores a client’s data as objects within storage pools (think of C* partitions)
b) Using the CRUSH algorithm, Ceph calculates which placement group should contain the object (C* primary keys & vnode data distribution)
c) and further calculates which Ceph OSD Daemon should store the placement group (C* node locality)
d) The CRUSH algorithm enables the Ceph Storage Cluster to scale, rebalance, and recover dynamically (C* big table storage architecture).
Summary: C* comes with everything that Ceph provides (with the exception of block storage). There is no value-add that Ceph brings to the table that C* does not already provide. I seriously doubt that C* could even work out of the box with yet another level of replication & rebalancing underneath it. Hope this helps Jan/ C* Architect On Saturday, January 31, 2015 7:28 PM, Colin Taylor wrote: I may be forced to run Cassandra on top of Ceph. Does anyone have experience / tips with this? Or alternatively, strong reasons why this won't work? cheers Colin
Re: Any problem mounting a keyspace directory in ram memory?
HI Gabriel; I don't think Apache Cassandra supports in-memory keyspaces. However, DataStax Enterprise does support it. Quoting from DataStax: DataStax Enterprise includes the in-memory option for storing data to and accessing data from memory exclusively. No disk I/O occurs. Consider using the in-memory option for storing a modest amount of data, mostly composed of overwrites, such as an application for mirroring stock exchange data. Only the prices fluctuate greatly while the keys for the data remain relatively constant. Generally, the table you design for use in-memory should have the following characteristics: - Store a small amount of data - Experience a workload that is mostly overwrites - Be heavily trafficked. See "Using the in-memory option" in the DataStax Enterprise 4.0 documentation on www.datastax.com. hope this helps Jan C* Architect On Sunday, February 1, 2015 1:32 PM, Gabriel Menegatti wrote: Hi guys, Please, has anyone here already mounted a specific keyspace directory in RAM using tmpfs? Do you see any problem doing so, except for the fact that the data can be lost? Thanks in advance. Regards, Gabriel.
Re: Help on modeling a table
HI Asit; The partition key is only one part of query performance. Recommend reading the article "Advanced Time Series with Cassandra" on www.datastax.com. hope this helps Jan/ On Monday, February 2, 2015 8:33 AM, Asit KAUSHIK wrote: HI All, We are working on an application logging project and this is one of the search tables, as below:

CREATE TABLE logentries (
logentrytimestamputcguid timeuuid PRIMARY KEY,
context text,
date_to_hour bigint,
durationinseconds float,
eventtimestamputc timestamp,
ipaddress inet,
logentrytimestamputc timestamp,
loglevel int,
logmessagestring text,
logsequence int,
message text,
modulename text,
productname text,
searchitems map,
servername text,
sessionname text,
stacktrace text,
threadname text,
timefinishutc timestamp,
timestartutc timestamp,
urihostname text,
uripathvalue text,
uriquerystring text,
useragentstring text,
username text);

I have some queries on the design of this table: 1) Is a timeuuid a good candidate for the partition key, given that we would be querying other fields with the stargate-core full-text project? This table is actually used for searches like username LIKE '*john', and using the present model the performance is very slow. Please advise. Regards Asit
Re: OOM and high SSTables count
HI Roni; You mentioned: DC1 servers have 32GB of RAM and 10GB of HEAP. DC2 machines have 16GB of RAM and 5GB HEAP. Best practice would be to:
a) have a consistent type of node across both DCs (CPUs, memory, heap & disk)
b) increase the heap on the DC2 servers to 8GB for the C* heap
The leveled compaction issue is not addressed by this. hope this helps Jan/ On Wednesday, March 4, 2015 8:41 AM, Roni Balthazar wrote: Hi there, We are running a C* 2.1.3 cluster with 2 datacenters: DC1: 30 servers / DC2: 10 servers. DC1 servers have 32GB of RAM and 10GB of HEAP. DC2 machines have 16GB of RAM and 5GB HEAP. DC1 nodes have about 1.4TB of data and DC2 nodes 2.3TB. DC2 is used only for backup purposes. There are no reads on DC2. All writes and reads are on DC1 using LOCAL_ONE, and the RF is DC1: 2 and DC2: 1. All keyspaces use STCS (average 20~30 SSTable count for each table on both DCs) except one that is using LCS (DC1: avg 4K~7K SSTables / DC2: avg 3K~14K SSTables). Basically we are running into 2 problems: 1) High SSTable count on the keyspace using LCS (this KS has 500GB~600GB of data on each DC1 node). 2) There are 2 servers on DC1 and 4 servers in DC2 that went down with the OOM error message below: ERROR [SharedPool-Worker-111] 2015-03-04 05:03:26,394 JVMStabilityInspector.java:94 - JVM state determined to be unstable.
Exiting forcefully due to: java.lang.OutOfMemoryError: Java heap space
at org.apache.cassandra.db.composites.CompoundSparseCellNameType.copyAndMakeWith(CompoundSparseCellNameType.java:186) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.db.composites.AbstractCompoundCellNameType$CompositeDeserializer.readNext(AbstractCompoundCellNameType.java:286) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.db.AtomDeserializer.readNext(AtomDeserializer.java:104) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:426) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.fetchMoreData(IndexedSliceReader.java:350) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:142) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:44) ~[apache-cassandra-2.1.3.jar:2.1.3]
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) ~[guava-16.0.jar:na]
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) ~[guava-16.0.jar:na]
at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:172) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:155) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:146) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:125) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99) ~[apache-cassandra-2.1.3.jar:2.1.3]
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) ~[guava-16.0.jar:na]
at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) ~[guava-16.0.jar:na]
at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:203) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:107) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:81) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:69) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:320) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1915) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1748) ~[apache-cassandra-2.1.3.jar:2.1.3]
at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:342) ~[apache-cassandra-2.1.3.jar:2.1.3]
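On the heap-sizing advice in the reply above: when MAX_HEAP_SIZE is not set explicitly, the stock cassandra-env.sh in the 2.x line derives it as max(min(1/2 RAM, 1024 MB), min(1/4 RAM, 8192 MB)). Treat that formula as an assumption and check your own cassandra-env.sh; a sketch of the arithmetic for the 16 GB DC2 machines:

```shell
# Heap heuristic sketch: max(min(RAM/2, 1024), min(RAM/4, 8192)), in MB.
ram_mb=16384                                        # the DC2 machines: 16 GB RAM
half=$(( ram_mb / 2 ))
if [ "$half" -gt 1024 ]; then half=1024; fi         # cap half-RAM at 1 GB
quarter=$(( ram_mb / 4 ))
if [ "$quarter" -gt 8192 ]; then quarter=8192; fi   # cap quarter-RAM at 8 GB
heap=$half
if [ "$quarter" -gt "$heap" ]; then heap=$quarter; fi
echo "MAX_HEAP_SIZE: ${heap}M"
```

For 16 GB of RAM this gives 4096 MB, so the observed 5 GB heap on DC2 was likely set by hand; moving to 8 GB as suggested means overriding the heuristic in cassandra-env.sh.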
Re: Write timeout under load but Read is fine
HI Jaydeep;
- look at the i/o on all three nodes
- increase the write_request_timeout_in_ms: 1
- check the time-outs, if any, on the client inserting the writes
- check the network for dropped/lost packets
hope this helps Jan/ On Wednesday, March 4, 2015 12:26 PM, Jaydeep Chovatia wrote: Hi, In my test program, when I increase load I keep getting a few "write timeout" errors from Cassandra, say every 10~15 mins. My read:write ratio is 50:50. My reads are fine but only writes time out. Here are my Cassandra details: Version: 2.0.11 Ring of 3 nodes with RF=3 Node configuration: 24 cores + 64GB RAM + 2TB "write_request_timeout_in_ms: 5000", rest of the cassandra.yaml configuration is default. I've also checked IO on the Cassandra nodes and it looks very low (around 5%). I've also checked the Cassandra log file and do not see any GC happening. Also CPU on Cassandra is low (around 20%). I have 20GB of data on each node. My test program creates connections to all three Cassandra nodes and sends read+write requests randomly. Any idea what I should look for? Jaydeep
Re: cassandra node jvm stall intermittently
HI Jason; Whats in the log files at the moment jstat shows 100%. What is the activity on the cluster & the node at the specific point in time (reads/ writes/ joins etc) Jan/ On Wednesday, March 4, 2015 5:59 AM, Jason Wee wrote: Hi, our cassandra node using java 7 update 72 and we ran jstat on one of the node, and notice some strange behaviour as indicated by output below. any idea why when eden space stay the same for few seconds like 100% and 18.02% for few seconds? we suspect such "stalling" cause timeout to our cluster. any idea what happened, what went wrong and what could cause this? $ jstat -gcutil 32276 1s 0.00 5.78 91.21 70.94 60.07 2657 73.437 4 0.056 73.493 0.00 5.78 100.00 70.94 60.07 2657 73.437 4 0.056 73.493 0.00 5.78 100.00 70.94 60.07 2657 73.437 4 0.056 73.493 0.00 5.78 100.00 70.94 60.07 2657 73.437 4 0.056 73.493 0.00 5.78 100.00 70.94 60.07 2657 73.437 4 0.056 73.493 0.00 5.78 100.00 70.94 60.07 2657 73.437 4 0.056 73.493 0.00 5.78 100.00 70.94 60.07 2657 73.437 4 0.056 73.493 0.00 5.78 100.00 70.94 60.07 2657 73.437 4 0.056 73.493 0.00 5.78 100.00 70.94 60.07 2657 73.437 4 0.056 73.493 0.00 5.78 100.00 70.94 60.07 2657 73.437 4 0.056 73.493 0.00 5.78 100.00 70.94 60.07 2657 73.437 4 0.056 73.493 0.00 5.78 100.00 70.94 60.07 2657 73.437 4 0.056 73.493 0.00 5.78 100.00 70.94 60.07 2657 73.437 4 0.056 73.493 0.00 5.78 100.00 70.94 60.07 2657 73.437 4 0.056 73.493 0.00 5.78 100.00 70.94 60.07 2657 73.437 4 0.056 73.493 0.00 5.78 100.00 70.94 60.07 2657 73.437 4 0.056 73.493 0.00 5.78 100.00 70.94 60.07 2657 73.437 4 0.056 73.493 0.00 4.65 29.66 71.00 60.07 2659 73.488 4 0.056 73.544 0.00 4.65 70.88 71.00 60.07 2659 73.488 4 0.056 73.544 0.00 4.65 71.58 71.00 60.07 2659 73.488 4 0.056 73.544 0.00 4.65 72.15 71.00 60.07 2659 73.488 4 0.056 73.544 0.00 4.65 72.33 71.00 60.07 2659 73.488 4 0.056 73.544 0.00 4.65 72.73 71.00 60.07 2659 73.488 4 0.056 73.544 0.00 4.65 73.20 71.00 60.07 2659 73.488 4 0.056 73.544 0.00 4.65 73.71 71.00 60.07 2659 73.488 4 
0.056 73.544 0.00 4.65 73.84 71.00 60.07 2659 73.488 4 0.056 73.544 0.00 4.65 73.91 71.00 60.07 2659 73.488 4 0.056 73.544 0.00 4.65 74.18 71.00 60.07 2659 73.488 4 0.056 73.544 0.00 4.65 74.29 71.00 60.07 2659 73.488 4 0.056 73.544 0.00 4.65 74.29 71.00 60.07 2659 73.488 4 0.056 73.544 0.00 4.65 74.29 71.00 60.07 2659 73.488 4 0.056 73.544 0.00 4.65 74.29 71.00 60.07 2659 73.488 4 0.056 73.544 0.00 4.65 74.29 71.00 60.07 2659 73.488 4 0.056 73.544 0.00 4.65 74.29 71.00 60.07 2659 73.488 4 0.056 73.544 0.00 4.65 74.29 71.00 60.07 2659 73.488 4 0.056 73.544 0.00 5.43 12.64 71.09 60.07 2661 73.534 4 0.056 73.590 0.00 5.43 18.02 71.09 60.07 2661 73.534 4 0.056 73.590 0.00 5.43 18.02 71.09 60.07 2661 73.534 4 0.056 73.590 0.00 5.43 18.02 71.09 60.07 2661 73.534 4 0.056 73.590 0.00 5.43 18.02 71.09 60.07 2661 73.534 4 0.056 73.590 0.00 5.43 18.02 71.09 60.07 2661 73.534 4 0.056 73.590 0.00 5.43 18.02 71.09 60.07 2661 73.534 4 0.056 73.590 0.00 5.43 18.02 71.09 60.07 2661 73.534 4 0.056 73.590 0.00 5.43 18.02 71.09 60.07 2661 73.534 4 0.056 73.590 0.00 5.43 18.02 71.09 60.07 2661 73.534 4 0.056 73.590 0.00 5.43 18.02 71.09 60.07 2661 73.534 4 0.056 73.590 0.00 5.43 18.02 71.09 60.07 2661 73.534 4 0.056 73.590 0.00 5.43 18.02 71.09 60.07 2661 73.534 4 0.056 73.590 0.00 5.43 18.02 71.09 60.07 2661 73.534 4 0.056 73.590 0.00 5.43 18.02 71.09 60.07 2661 73.534 4 0.056 73.590 0.00 5.43 18.02 71.09 60.07 2661 73.534 4 0.056 73.590 0.00 5.43 18.02 71.09 60.07 2661 73.534 4 0.056 73.590 0.00 5.43 18.02 71.09 60.07 2661 73.534 4 0.056 73.590 0.00 5.43 69.24 71.09 60.07 2661 73.534 4 0.056 73.590 0.00 5.43 78.05 71.09 60.07 2661 73.534 4 0.056 73.590 0.00 5.43 78.97 71.09 60.07 2661 73.534 4 0.056 73.590 0.00 5.43 79.07 71.09 60.07 2661 73.534 4 0.056 73.590 0.00 5.43 79.18 71.09 60.07 2661 73.534 4 0.056
Re: Write timeout under load but Read is fine
Hello Jaydeep; Run cassandra-stress with R/W options enabled for about the same duration and check if you have dropped packets. It would eliminate the client as the source of the error & also give you a reproducible tool on which to base subsequent tests/findings. Jan/ On Thursday, March 5, 2015 12:19 PM, Jaydeep Chovatia wrote: I have tried increasing the timeout to 1 but no help. Also verified that there are no lost network packets. Jaydeep On Wed, Mar 4, 2015 at 12:19 PM, Jan wrote: HI Jaydeep; - look at the i/o on all three nodes - increase the write_request_timeout_in_ms: 1 - check the time-outs, if any, on the client inserting the writes - check the network for dropped/lost packets hope this helps Jan/ On Wednesday, March 4, 2015 12:26 PM, Jaydeep Chovatia wrote: Hi, In my test program, when I increase load I keep getting a few "write timeout" errors from Cassandra, say every 10~15 mins. My read:write ratio is 50:50. My reads are fine but only writes time out. Here are my Cassandra details: Version: 2.0.11 Ring of 3 nodes with RF=3 Node configuration: 24 cores + 64GB RAM + 2TB "write_request_timeout_in_ms: 5000", rest of the cassandra.yaml configuration is default. I've also checked IO on the Cassandra nodes and it looks very low (around 5%). I've also checked the Cassandra log file and do not see any GC happening. Also CPU on Cassandra is low (around 20%). I have 20GB of data on each node. My test program creates connections to all three Cassandra nodes and sends read+write requests randomly. Any idea what I should look for? Jaydeep -- Jaydeep
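For reference, a mixed 50:50 invocation might look like the following. This is the 2.1-era cassandra-stress syntax, so treat the exact flags as an assumption and confirm against `cassandra-stress help` on a 2.0.11 install (the node IPs and duration are placeholders):

```shell
# Sketch only: compose the stress command; run it by hand once the flags
# are verified against the stress tool shipped with your Cassandra version.
STRESS_CMD='cassandra-stress mixed ratio(write=1,read=1) duration=30m -node 10.0.0.1,10.0.0.2,10.0.0.3'
echo "$STRESS_CMD"
```

Matching the test program's 50:50 ratio here is what makes the comparison with the client-driven load meaningful.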
Re: cassandra node jvm stall intermittently
HI Jason; The single node showing the anomaly is a hint that the problem is probably local to the node (as you suspected).
- How many nodes do you have in the ring?
- What is the activity when this occurs - reads / writes / compactions?
- Is there anything unique about this node that makes it different from the other nodes?
- Is this a periodic occurrence or a single occurrence? I am trying to determine a pattern about when this shows up.
- What is the load distribution on the ring (i.e. is this node carrying more load than the others)?
The system.log should have more info about it. hope this helps Jan/ On Friday, March 6, 2015 4:50 AM, Jason Wee wrote: well, StatusLogger.java entries started showing in the cassandra system.log, and MessagingService.java also showed some stages (e.g. read, mutation) dropped. It's strange that it only happens on this node; this type of message does not show in the other nodes' log files at the same time... Jason On Thu, Mar 5, 2015 at 4:26 AM, Jan wrote: HI Jason; What's in the log files at the moment jstat shows 100%? What is the activity on the cluster & the node at that specific point in time (reads / writes / joins etc.)? Jan/ On Wednesday, March 4, 2015 5:59 AM, Jason Wee wrote: Hi, our cassandra node is using java 7 update 72 and we ran jstat on one of the nodes, and noticed some strange behaviour as indicated by the output below. Any idea why the eden space stays the same for a few seconds, like 100% and 18.02%? We suspect such "stalling" causes timeouts to our cluster. Any idea what happened, what went wrong and what could cause this?
[jstat -gcutil output snipped; identical to the dump earlier in this thread]
Pointers on deploying snitch for Multi region cluster
HI Folks; We are planning to deploy a multi-region C* cluster with nodes on both US coasts. Need some advice:
a) As I do not have public IP address access, is there an alternative way to deploy the EC2MultiRegionSnitch using private IP addresses?
b) Has anyone used the EC2Snitch with nodes on either coast & connected multiple VPCs with EC2 instances using IPSec tunnels? Did this work?
c) Has anyone used the GossipingPropertyFileSnitch & got it working successfully in a multi-region deployment?
Advice/ gotchas/ input/ do's/ don'ts much appreciated. Thanks Jan
Re: Best way to alert/monitor "nodetool status” down.
You could set up an alert for Node Down within OpsCenter. OpsCenter also offers you the option to send an email to a paging system with reminders. Jan/ On Sunday, March 8, 2015 6:10 AM, Vasileios Vlachos wrote: We use Nagios for monitoring, and we call the following through NRPE:

#!/bin/bash
# Just for reference:
# Nodetool's output represents "Status" and "State" in this order.
# Status values: U (up), D (down)
# State values: N (normal), L (leaving), J (joining), M (moving)
NODETOOL=$(which nodetool);
NODES_DOWN=$(${NODETOOL} --host localhost status | grep --count -E '^D[A-Z]');
if [[ ${NODES_DOWN} -gt 0 ]]; then
    output="CRITICAL - Nodes down: ${NODES_DOWN}";
    return_code=2;
elif [[ ${NODES_DOWN} -eq 0 ]]; then
    output="OK - Nodes down: ${NODES_DOWN}";
    return_code=0;
else
    output="UNKNOWN - Couldn't retrieve cluster information.";
    return_code=3;
fi
echo "${output}";
exit "${return_code}";

I've not used zabbix so I'm not sure the exit codes etc. are the same for you. Also, you may need to modify the regex slightly depending on the Cassandra version you are using. There must be a way to get this via the JMX console as well, which might be easier for you to monitor. On 07/03/15 00:37, Kevin Burton wrote: What's the best way to monitor nodetool status being down? I.e. if a specific server thinks a node is down (DN). Does this just use JMX? Is there an API we can call? We want to tie it into our zabbix server so we can detect if there is a failure. -- Founder/CEO Spinn3r.com Location: San Francisco, CA blog: http://burtonator.wordpress.com … or check out my Google+ profile -- Kind Regards, Vasileios Vlachos IT Infrastructure Engineer MSc Internet & Wireless Computing BEng Electronics Engineering Cisco Certified Network Associate (CCNA)
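The grep test in that NRPE script can be exercised offline against canned `nodetool status` output. A sketch (the status rows below are fabricated; `LC_ALL=C` keeps the `[A-Z]` range strictly uppercase so 'Datacenter' lines never match):

```shell
#!/bin/bash
# Count down nodes (lines whose Status prefix is "D?") in sample output.
status='Datacenter: dc1
==========
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load     Tokens  Owns   Rack
UN  10.0.0.1  1.2 GB   256     33.3%  r1
DN  10.0.0.2  1.1 GB   256     33.3%  r1
UN  10.0.0.3  1.3 GB   256     33.4%  r1'
NODES_DOWN=$(printf '%s\n' "$status" | LC_ALL=C grep -c -E '^D[A-Z]')
echo "Nodes down: ${NODES_DOWN}"
```

Swapping the canned string for `$(nodetool --host localhost status)` turns this back into the live check.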
Re: Deleted snapshot files filling up /var/lib/cassandra
David; all the packaged installations use the /var/lib/cassandra directory. Could you check your yaml config files and see if you are using this default directory for snapshots/backups? You may want to change it to a location with more disk space. hope this helps Jan/ On Monday, March 16, 2015 2:52 PM, David Wahler wrote: We have a 16-node, globally-distributed cluster running Cassandra 2.0.12. We're using the Datastax packages on CentOS 6.5. Even though the total amount of data on each server is only a few hundred MB (as measured by both du and the "load" metric), we're seeing a problem where the disk usage is steadily increasing and eventually filling up the 10GB /var/lib/cassandra partition. Running "lsof" on the Cassandra process shows that it has open file handles for thousands of deleted snapshot files:

$ sudo lsof -p 4753 | grep DEL -c
13314
$ sudo lsof -p 4753 | grep DEL | head
java 4753 cassandra DEL REG 253,6 538873 /var/lib/cassandra/data/keyspace/cf/snapshots/65bc8170-cc20-11e4-a355-0d37e54cc22e/keyspace-cf-jb-3979-Index.db
java 4753 cassandra DEL REG 253,6 538899 /var/lib/cassandra/data/keyspace/cf/snapshots/8cb41770-cc20-11e4-a355-0d37e54cc22e/keyspace-cf-jb-3983-Index.db
...etc...

We're not manually creating these snapshots; they're being generated by periodic runs of "nodetool repair -pr".
There are some errors in system.log that seem to be related:

ERROR [RepairJobTask:10] 2015-03-16 02:02:12,485 RepairJob.java (line 143) Error occurred during snapshot phase
java.lang.RuntimeException: Could not create snapshot at /10.1.1.188
at org.apache.cassandra.repair.SnapshotTask$SnapshotCallback.onFailure(SnapshotTask.java:81)
at org.apache.cassandra.net.MessagingService$5$1.run(MessagingService.java:344)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
ERROR [AntiEntropySessions:4] 2015-03-16 02:02:12,486 RepairSession.java (line 288) [repair #55a8eb50-cbaa-11e4-9af9-27d7677e5965] session completed with the following error
java.io.IOException: Failed during snapshot creation.
at org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:323)
at org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:144)
at com.google.common.util.concurrent.Futures$4.run(Futures.java:1160)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
ERROR [AntiEntropySessions:4] 2015-03-16 02:02:12,488 CassandraDaemon.java (line 199) Exception in thread Thread[AntiEntropySessions:4,5,RMI Runtime]
java.lang.RuntimeException: java.io.IOException: Failed during snapshot creation.
at com.google.common.base.Throwables.propagate(Throwables.java:160)
at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed during snapshot creation.
at org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:323)
at org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:144)
at com.google.common.util.concurrent.Futures$4.run(Futures.java:1160)
... 3 more

Has anyone encountered this problem before? The same stack trace shows up in CASSANDRA-8020, but that bug was supposedly introduced in 2.1.0 and fixed in 2.1.1. In any case, we don't want to upgrade to 2.1.x, since the consensus on this list seems to be that it's not yet production-ready. I'm fairly new to Cassandra, so general troubleshooting tips would also be much appreciated. Thanks, -- David
Re: Problems after trying a migration
Hi David; some input to get back to where you were:
a) Start with the French cluster only and get it working with DSE 4.5.1
b) The OpsCenter keyspace is RF 1 by default; alter the keyspace to RF 3
c) Take a full snapshot of all your nodes & copy the files to a safe location on all the nodes
To migrate the data into the new cluster:
a) Use the same version, DSE 4.5.1, in Luxembourg & bring up 1 node at a time. Check that the node has come up in the new datacenter.
b) Bring up new nodes in the new datacenter one at a time
c) After all your new nodes are UP in Luxembourg, run a 'nodetool repair -parallel'
d) Check in OpsCenter that you have all your nodes showing up (new and old)
e) Start taking down your nodes in France, one at a time
f) After all the nodes in France are down, run a 'nodetool repair -parallel' again
g) Upgrade the nodes in Luxembourg to DSE 4.6.1
h) Run a 'nodetool repair -parallel' again
i) Upgrade to OpsCenter 5.1
Best of luck, hope this helps. Jan/ On Wednesday, March 18, 2015 1:01 PM, Robert Coli wrote: On Wed, Mar 18, 2015 at 9:05 AM, David CHARBONNIER wrote: - New nodes in the other country have been installed like the French nodes except for the Datastax Enterprise version (4.5.1 in France and 4.6.1 in the other country, which means Cassandra version 2.0.8.39 in France and 2.0.12.200 in the other country) This is officially unsupported, and might cause problems during this process. =Rob
Re: best way to measure repair times?
Ian; to respond to your specific question: you could pipe the output of your repair into a file and subsequently determine the time taken. Example:
nodetool repair -dc DC1
[2014-07-24 21:59:55,326] Nothing to repair for keyspace 'system'
[2014-07-24 21:59:55,617] Starting repair command #2, repairing 490 ranges for keyspace system_traces (seq=true, full=true)
[2014-07-24 22:23:14,299] Repair session 323b9490-137e-11e4-88e3-c972e09793ca for range (820981369067266915,822627736366088177] finished
[2014-07-24 22:23:14,320] Repair session 38496a61-137e-11e4-88e3-c972e09793ca for range (2506042417712465541,2515941262699962473] finished
What to look for:
a) Look for the specific name of the keyspace & the words 'Starting repair'.
b) Look for the word 'finished'.
c) Compute the average time per keyspace and you would be able to have a rough idea of how long your repairs would take on a regular basis. This is only for continual operational repair, not the first time it's done.
Hope this helps, Jan/
On Thursday, March 19, 2015 12:55 PM, Paulo Motta wrote: From: http://www.datastax.com/dev/blog/modern-hinted-handoff Repair and the fine print At first glance, it may appear that Hinted Handoff lets you safely get away without needing repair. This is only true if you never have hardware failure. Hardware failure means that - We lose “historical” data for which the write has already finished, so there is nothing to tell the rest of the cluster exactly what data has gone missing - We can also lose hints-not-yet-replayed from requests the failed node coordinated With sufficient dedication, you can get by with “only run repair after hardware failure and rely on hinted handoff the rest of the time,” but as your clusters grow (and hardware failure becomes more common) performing repair as a one-off special case will become increasingly difficult to do perfectly. Thus, we continue to recommend running a full repair weekly.
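The timing approach above can be sketched in a few lines of Python. This is a rough sketch, not an official tool: it assumes repair output with the `[YYYY-MM-DD HH:MM:SS,ms]` prefix shown in the example, and the function name `repair_duration` is mine.

```python
import re
from datetime import datetime

# Matches the timestamp prefix of nodetool repair output lines.
TS_RE = re.compile(r"^\[(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")

def repair_duration(log_lines):
    """Seconds from the first 'Starting repair' line to the last
    'finished' line in captured `nodetool repair` output, or None
    if either marker is missing."""
    start = end = None
    for line in log_lines:
        m = TS_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m.group(1), "%Y-%m-%d %H:%M:%S")
        if start is None and "Starting repair" in line:
            start = ts
        elif "finished" in line:
            end = ts  # keep overwriting so we end with the last one
    if start is None or end is None:
        return None
    return (end - start).total_seconds()
```

Run over the sample lines above, this would report about 23 minutes for the system_traces repair.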
2015-03-19 16:42 GMT-03:00 Robert Coli: On Thu, Mar 19, 2015 at 12:13 PM, Ali Akhtar wrote: Cassandra doesn't guarantee eventual consistency? If you run regularly scheduled repair, it does. If you do not run repair, it does not. Hinted handoff, for example, is considered an optimization for repair, and does not assert that it provides a consistency guarantee. =Rob http://twitter.com/rcolidba -- Paulo Ricardo -- European Master in Distributed Computing, Royal Institute of Technology - KTH, Instituto Superior Técnico - IST, http://paulormg.com
Re: active queries
HI Rahul; your question: Can we see active queries on cassandra cluster. Is there any tool? Answer: nodetool tpstats & nodetool cfstats. The nodetool tpstats command provides statistics about the number of active, pending, and completed tasks for each stage of Cassandra operations by thread pool. You should be looking for the very first row: ReadStage. The nodetool cfstats command displays statistics for each table and keyspace. You should be looking for the first row: Read Count. Cheers, Jan/
On Thursday, March 19, 2015 12:13 AM, Rahul Bhardwaj wrote: Hi, Can we see active queries on cassandra cluster. Is there any tool? Please help. Regards, Rahul Bhardwaj
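As a rough illustration (not an official tool), the ReadStage row can be pulled out of captured `nodetool tpstats` output like this. The column order is assumed to be the usual one (Pool Name, Active, Pending, Completed, ...); the helper name `active_reads` is mine.

```python
def active_reads(tpstats_output):
    """Return the Active count from the ReadStage row of captured
    `nodetool tpstats` output, or None if the row is not present.
    Assumes columns: Pool Name, Active, Pending, Completed, ..."""
    for line in tpstats_output.splitlines():
        fields = line.split()
        if fields and fields[0] == "ReadStage":
            return int(fields[1])
    return None
```

The same pattern works for any other thread-pool row (MutationStage, etc.) by changing the name compared against.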
Re: Delete columns
Benyi; have you considered using the TTL option in case your columns are meant to be deleted after a predetermined amount of time? It's probably the easiest way to get the task accomplished. Cheers, Jan
On Friday, February 27, 2015 10:38 AM, Benyi Wang wrote: In C* 2.1.2, is there a way you can delete without specifying the row key? create table a_table ( guid text, key1 text, key2 text, data int, primary key (guid, key1, key2)); delete from a_table where key1='' and key2=''; I'm trying to avoid doing it like this: * query the table to get guids (32 bytes long) * send back delete queries like: delete from a_table where guid in (...) and key1='' and key2=''. key1 and key2 only have 3~4 values; if I try to create multiple tables like table_kvi_kvj, it will be easy to delete, but it results in a large dataset because of the duplicated guids. Because the CQL model will create a cassandra column family like guid, kv1-kv2, .., kvi-kvj, ..., kvn-kvm, ... Is there an API that can drop columns in a column family?
Re: FileNotFoundException
HI Batranut; In both errors you described above, the files seem to be missing while compaction is running. Without knowing what else is going on in your system, I would presume that this error occurs on this single node only and not your entire cluster. Some guesses:
a) You may have a disk corruption problem. Take the node offline and run a disk check.
b) Take the node offline, wipe it clean of everything and have it rejoin the cluster. Check if the problem recurs.
Hope this helps, Jan
On Tuesday, February 24, 2015 2:56 AM, Batranut Bogdan wrote: Also I must add that grepping the logs for a particular file I see this: INFO [CompactionExecutor:19] 2015-02-24 10:44:35,618 CompactionTask.java (line 120) Compacting [SSTableReader(path='/data/ranks/positions/ranks-positions-jb-339-Data.db'), SSTableReader(path='/data/ranks/positions/ranks-positions-jb-354-Data.db'), SSTableReader(path='/data/ranks/positions/ranks-positions-jb-408-Data.db'), SSTableReader(path='/data/ranks/positions/ranks-positions-jb-286-Data.db'), SSTableReader(path='/data/ranks/positions/ranks-positions-jb-20-Data.db'), SSTableReader(path='/data/ranks/positions/ranks-positions-jb-127-Data.db'), SSTableReader(path='/data/ranks/positions/ranks-positions-jb-357-Data.db'), SSTableReader(path='/data/ranks/positions/ranks-positions-jb-257-Data.db'), SSTableReader(path='/data/ranks/positions/ranks-positions-jb-316-Data.db'), SSTableReader(path='/data/ranks/positions/ranks-positions-jb-41-Data.db'), SSTableReader(path='/data/ranks/positions/ranks-positions-jb-285-Data.db'), SSTableReader(path='/data/ranks/positions/ranks-positions-jb-338-Data.db'), SSTableReader(path='/data/ranks/positions/ranks-positions-jb-180-Data.db'), SSTableReader(path='/data/ranks/positions/ranks-positions-jb-398-Data.db'), SSTableReader(path='/data/ranks/positions/ranks-positions-jb-249-Data.db'), SSTableReader(path='/data/ranks/positions/ranks-positions-jb-284-Data.db'),
SSTableReader(path='/data/ranks/positions/ranks-positions-jb-294-Data.db'), SSTableReader(path='/data/ranks/positions/ranks-positions-jb-248-Data.db'), SSTableReader(path='/data/ranks/positions/ranks-positions-jb-377-Data.db'), SSTableReader(path='/data/ranks/positions/ranks-positions-jb-395-Data.d ...
Also several entries like this in the log after grep:
java.lang.RuntimeException: java.io.FileNotFoundException: /data/ranks/positions/ranks-positions-jb-41-Data.db (No such file or directory)
Caused by: java.io.FileNotFoundException: /data/ranks/positions/ranks-positions-jb-41-Data.db (No such file or directory)
I was grepping for jb-41-Data.db ... it seems that this file does not exist for some reason. I must say that when I first added the node I included its IP in the seeds list. Then I decommissioned it, removed its IP from the seed list, deleted all data / commit log / saved caches and started it. Since then I have not manually deleted any files. Any ideas?
On Tuesday, February 24, 2015 11:46 AM, Batranut Bogdan wrote: Hello all, One of my C* nodes throws a big amount of exceptions like this:
ERROR [ReadStage:792] 2015-02-24 10:43:54,183 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:792,5,main]
java.lang.RuntimeException: java.lang.RuntimeException: java.io.FileNotFoundException: /data/ranks/positions/ranks-positions-jb-174-Data.db (No such file or directory)
at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2008)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /data/ranks/positions/ranks-positions-jb-174-Data.db (No such file or directory)
at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:47)
at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile.createReader(CompressedPoolingSegmentedFile.java:48)
at org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:39)
at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:1239)
at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:417)
at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.fetchMoreData(IndexedSliceReader.java:387)
at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:145)
at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:45)
at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
at com.google.common.collect.AbstractIterator.hasNext(A
Re: Cassandra Read Timeout
Yulian; Quote: "Raw size is aroung 190MB. There are bigger raws with similar structure (its index raws, which actually stores keys) and everything is working fine on them; everything is also working fine on this cf but on other raw. Tables data from CFStats (first table has bigger raws but works fine, where second has timeout):" ---
You said: "There are bigger raws with similar structure." Question: do you mean bigger rows? What is the structure of the statuspindexes keyspace & which table are you querying within it?
You said: "its index raws, which actually stores keys." Question: do you mean index rows? How are you creating indexes, and what type of indexes?
You said: "Tables data from CFStats, where second has timeout." Question: what is the timeout value set at & what's different about these two tables? What are you querying from the second table?
Unfortunately, I have more questions than answers; however, despite the sacrilege of using super columns (lol), there has got to be a logical answer to the performance problem you are having. Hopefully we can dig in and find an answer. Jan/
On Tuesday, February 24, 2015 12:00 PM, Robert Coli wrote: On Tue, Feb 24, 2015 at 8:50 AM, Yulian Oifa wrote: The structure is the same; the CFs are super column CFs, where the key is long (a timestamp to partition the index, so every 11 days a new row is created), the super column is int32 and columns/values are timeuuids. I am running the same queries, getting a reversed slice by raw key and super column. Obligatory notice that Super Columns are not really recommended for use. I have no idea if the performance problem you are seeing is related to the use of Super Columns. =Rob
Re: Out of Memory Error While Opening SSTables on Startup
Paul Nickerson; curious, did you get a solution to your problem ? Regards,Jan/ On Tuesday, February 10, 2015 5:48 PM, Flavien Charlon wrote: I already experienced the same problem (hundreds of thousands of SSTables) with Cassandra 2.1.2. It seems to appear when running an incremental repair while there is a medium to high insert load on the cluster. The repair goes in a bad state and starts creating way more SSTables than it should (even when there should be nothing to repair). On 10 February 2015 at 15:46, Eric Stevens wrote: This kind of recovery is definitely not my strong point, so feedback on this approach would certainly be welcome. As I understand it, if you really want to keep that data, you ought to be able to mv it out of the way to get your node online, then move those files in a several thousand at a time, nodetool refresh OpsCenter rollups60 && nodetool compact OpsCenter rollups60; rinse and repeat. This should let you incrementally restore the data in that keyspace without putting so many sstables in there that it ooms your cluster again. On Tue, Feb 10, 2015 at 3:38 PM, Chris Lohfink wrote: yeah... probably just 2.1.2 things and not compactions. Still probably want to do something about the 1.6 million files though. It may be worth just mv/rm'ing to 60 sec rollup data though unless really attached to it. Chris On Tue, Feb 10, 2015 at 4:04 PM, Paul Nickerson wrote: I was having trouble with snapshots failing while trying to repair that table (http://www.mail-archive.com/user@cassandra.apache.org/msg40686.html). I have a repair running on it now, and it seems to be going successfully this time. I am going to wait for that to finish, then try a manual nodetool compact. If that goes successfully, then would it be safe to chalk the lack of compaction on this table in the past up to 2.1.2 problems? 
~ Paul Nickerson On Tue, Feb 10, 2015 at 3:34 PM, Chris Lohfink wrote: Your cluster is probably having issues with compactions (with STCS you should never have this many). I would probably punt with OpsCenter/rollups60. Turn the node off and move all of the sstables off to a different directory for backup (or just rm if you really don't care about 1 minute metrics), then turn the server back on. Once you get your cluster running again, go back and investigate why compactions stopped; my guess is you hit an exception in the past that killed your CompactionExecutor and things just built up slowly until you got to this point. Chris On Tue, Feb 10, 2015 at 2:15 PM, Paul Nickerson wrote: Thank you Rob. I tried a 12 GiB heap size, and still crashed out. There are 1,617,289 files under OpsCenter/rollups60. Once I downgraded Cassandra to 2.1.1 (apt-get install cassandra=2.1.1), I was able to start up Cassandra OK with the default heap size formula. Now my cluster is running multiple versions of Cassandra. I think I will downgrade the rest to 2.1.1. ~ Paul Nickerson On Tue, Feb 10, 2015 at 2:05 PM, Robert Coli wrote: On Tue, Feb 10, 2015 at 11:02 AM, Paul Nickerson wrote: I am getting an out of memory error when I try to start Cassandra on one of my nodes. Cassandra will run for a minute, and then exit without outputting any error in the log file. It is happening while SSTableReader is opening a couple hundred thousand things. ... Does anyone know how I might get Cassandra on this node running again? I'm not very familiar with correctly tuning Java memory parameters, and I'm not sure if that's the right solution in this case anyway. Try running 2.1.1, and/or increasing heap size beyond 8gb. Are there actually that many SSTables on disk? =Rob
Re: Logging client ID for YCSB workloads on Cassandra?
HI Jatin; besides enabling tracing, is there any other way to get the task done (to log the client ID for every operation)? Please share the solution with the community, so that we can collectively learn from your experience. Cheers, Jan/
On Friday, February 20, 2015 12:48 PM, Jatin Ganhotra wrote: Never mind, got it working. Thanks :) — Jatin Ganhotra, Graduate Student, Computer Science, University of Illinois at Urbana-Champaign, http://jatinganhotra.com http://linkedin.com/in/jatinganhotra
On Wed, Feb 18, 2015 at 7:09 PM, Jatin Ganhotra wrote: Hi, I'd like to log the client ID for every operation performed by YCSB on my Cassandra cluster. The purpose is to identify & analyze various other consistency measures other than eventual consistency. I wanted to know if people have done something similar in the past. Or am I missing something really basic here? Please let me know if you need more information. Thanks — Jatin Ganhotra
Re: Cluster status instability
Marcin; are all your nodes within the same region? If not in the same region, what is the snitch type that you are using? Jan/
On Thursday, April 2, 2015 3:28 AM, Michal Michalski wrote: Hey Marcin, Are they actually going up and down repeatedly (flapping), or just down and they never come back? There might be different reasons for flapping nodes, but to list what I have at the top of my head right now:
1. Network issues. I don't think it's your case, but you can read about the issues some people are having when deploying C* on AWS EC2 (keyword to look for: phi_convict_threshold).
2. Heavy load. The node is under heavy load because of a massive number of reads / writes / bulkloads or e.g. unthrottled compaction etc., which may result in extensive GC. Could any of these be a problem in your case? I'd start by investigating GC logs, e.g. to see how long the "stop the world" full GC takes (GC logs should be on by default from what I can see [1]).
[1] https://issues.apache.org/jira/browse/CASSANDRA-5319
Michał Kind regards, Michał Michalski, michal.michal...@boxever.com
On 2 April 2015 at 11:05, Marcin Pietraszek wrote: Hi! We have a 56 node cluster with C* 2.0.13 + the CASSANDRA-9036 patch installed. Assume we have nodes A, B, C, D, E. On some irregular basis one of those nodes starts to report that a subset of the other nodes is in DN state, although the C* daemon on all nodes is running:
A$ nodetool status
UN B
DN C
DN D
UN E
B$ nodetool status
UN A
UN C
UN D
UN E
C$ nodetool status
DN A
UN B
UN D
UN E
After a restart of node A, C and D report that A is in UN, and A also claims that the whole cluster is in UN state. Right now I don't have any clear steps to reproduce that situation; do you guys have any idea what could be causing such behaviour? How could this be prevented? It seems like when node A is a coordinator and gets a request for some data being replicated on C and D, it responds with an Unavailable exception; after restarting A that problem disappears. -- mp
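As an illustration of spotting this kind of disagreement automatically, one could collect each node's `nodetool status` view and compare them. This is a sketch with an invented helper (`disagreeing_nodes`); it assumes the views have already been parsed into per-observer dicts of node -> state.

```python
def disagreeing_nodes(views):
    """views maps each observer node to the states it reports for
    its peers, e.g. {'A': {'B': 'UN', 'C': 'DN', ...}, ...}.
    Returns the nodes whose reported state differs between
    observers - candidates for the flapping described above."""
    states = {}
    for seen in views.values():
        for node, state in seen.items():
            states.setdefault(node, set()).add(state)
    return sorted(node for node, s in states.items() if len(s) > 1)
```

Fed the A/B/C views from the message above, it would flag A, C and D as nodes the cluster disagrees about.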
Re: Combining two clusters/keyspaces into single cluster
Hi, one way I think might work (but not tested in any way by me, and there will be some lag / stale data):
- create keyspace2 on cluster1
- use nodetool flush and snapshot on cluster2, and remember the timestamp
- use sstableloader to write all sstables from the cluster2 snapshot to cluster1
- you can repeat the last two steps and use sstableloader only on tables with mtime > timestamp to add the differences to cluster1
- shut down cluster2 when done
Of course, data written by old clients to cluster2 won't be available in cluster1 until that data is loaded into it. Just my 2 cents :) Jan
On 22.04.2016 at 01:15, Arlington Albertson wrote: Hey Folks, I've been looking through various documentations, but I'm either overlooking something obvious or not wording it correctly, but the gist of my problem is this: I have two cassandra clusters, with two separate keyspaces on EC2. We'll call them as follows: *cluster1* (DC name, cluster name, etc...) *keyspace1* (only exists on cluster1) *cluster2* (DC name, cluster name, etc...) *keyspace2* (only exists on cluster2) I need to perform the following: - take keyspace2, and add it to cluster1 so that all nodes can serve the traffic - needs to happen "live" so that I can repoint new instances to the cluster1 endpoints and they'll just start working, and no longer directly use cluster2 - eventually, tear down cluster2 (easy with a `nodetool decommission` after verifying all seeds have been changed, etc...) This doc seems to be the closest I've found thus far: https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html Is that the appropriate guide for this and I'm just over thinking it? Or is there something else I should be looking at? Also, this is DSC C* 2.1.13. TIA! -AA
Nodetool Cleanup Problem
Hi All, I use Cassandra 3.4. When running the 'nodetool cleanup' command, I see this error:
error: Expecting URI in variable: [cassandra.config]. Found[cassandra.yaml]. Please prefix the file with [file:///] for local files and [file:///] for remote files. If you are executing this from an external tool, it needs to set Config.setClientMode(true) to avoid loading configuration.
-- StackTrace --
org.apache.cassandra.exceptions.ConfigurationException: Expecting URI in variable: [cassandra.config]. Found[cassandra.yaml]. Please prefix the file with [file:///] for local files and [file:///] for remote files. If you are executing this from an external tool, it needs to set Config.setClientMode(true) to avoid loading configuration.
at org.apache.cassandra.config.YamlConfigurationLoader.getStorageConfigURL(YamlConfigurationLoader.java:78)
at org.apache.cassandra.config.YamlConfigurationLoader.(YamlConfigurationLoader.java:92)
at org.apache.cassandra.config.DatabaseDescriptor.loadConfig(DatabaseDescriptor.java:134)
at org.apache.cassandra.config.DatabaseDescriptor.(DatabaseDescriptor.java:121)
at org.apache.cassandra.config.CFMetaData$Builder.(CFMetaData.java:1160)
at org.apache.cassandra.config.CFMetaData$Builder.create(CFMetaData.java:1175)
at org.apache.cassandra.config.CFMetaData$Builder.create(CFMetaData.java:1170)
at org.apache.cassandra.cql3.statements.CreateTableStatement.metadataBuilder(CreateTableStatement.java:118)
at org.apache.cassandra.config.CFMetaData.compile(CFMetaData.java:413)
at org.apache.cassandra.schema.SchemaKeyspace.compile(SchemaKeyspace.java:238)
at org.apache.cassandra.schema.SchemaKeyspace.(SchemaKeyspace.java:88)
at org.apache.cassandra.config.Schema.(Schema.java:96)
at org.apache.cassandra.config.Schema.(Schema.java:50)
at org.apache.cassandra.tools.nodetool.Cleanup.execute(Cleanup.java:45)
at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:248)
at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:162)
Can anyone help me?
Best regards, Jan Ali
Are updates on map columns causing tombstones?
Hi, when I replace the content of a map-valued column (when I replace the complete map), will this create tombstones for those map entries that are not present in the new map? My expectation is 'yes', because the map is laid out as normal columns internally so keys not in the new map should lead to a delete. Is that correct? Jan
Re: Thousands of SSTables generated in only one node
Hi Lahiru, maybe your node was running out of memory before. I have seen this behaviour when available heap is low, forcing memtables to be flushed out to sstables quite often. If this is what is hitting you, you should see that the sstables are really small. To clean up, nodetool compact would do the job - but if you do not need the data from one of the keyspaces at all, just drop and recreate it (but look into your data directory in case there are snapshots left). To prevent this in future, keep a close eye on heap consumption and maybe give it more memory. HTH, Jan
Re: Thousands of SSTables generated in only one node
Hi Lahiru, 2.1.0 is also quite old (Sep 2014) - and just from my memory I remembered that there was an issue we had with cold_reads_to_omit: http://grokbase.com/t/cassandra/user/1523sm4y0r/how-to-deal-with-too-many-sstables https://www.mail-archive.com/search?l=user@cassandra.apache.org&q=subject:%22Re%3A+Compaction+failing+to+trigger%22&o=newest&f=1 Those are just random Google hits, but maybe they also help. I ended up with a few thousand sstables smaller than 1MB in size. However, I would suggest upgrading to a newer version of Cassandra first before diving too deep into this - maybe 2.1.16 or 2.2.8 - as chances are really good your problems will be gone after that. Regards, Jan
Re: Hotspots / Load on Cassandra node
Hi, can you check the size of your data directories on that machine and compare them against the other nodes? Have a look for snapshot directories which could still be there from a former table or keyspace. Regards, Jan
On 26 October 2016 06:53:03 MESZ, Harikrishnan A wrote:
>Hello,
>When I am issuing nodetool status, I see the load (in GB) on one of
>the nodes is high compared to the other nodes in my ring.
>I do not see any issues with the Data Modeling, and it looks like the
>Partition sizes are almost evenly sized and distributed across the
>nodes. Repairs are running properly.
>How do I approach and fix this issue?
>
>Thanks & Regards, Hari
-- This message was sent from my Android mobile phone with K-9 Mail.
Rust Cassandra Driver?
Hi, I am looking for a driver for the Rust language. I found some projects which seem quite abandoned. Can someone point me to the driver that makes the most sense to look at or help working on? Cheers, Jan
Re: Cluster scaling
Hi Branislav, what is it you would expect? Some thoughts: Batches are often misunderstood; they work well only if they contain only one partition key - think of a batch of different sensor data for one key. If you group batches with many partition keys and/or do large batches, this puts high load on the coordinator node, which then itself needs to talk to the nodes holding the partitions. This could explain the scaling you see in your second try without batches. Keep in mind that the driver supports executeAsync and ResultSetFutures. Second, put commitlog and data directories on separate disks when using spindles. Third, have you monitored iostat and CPU stats while running your tests? Cheers, Jan
On 08.02.2017 at 16:39, Branislav Janosik -T (bjanosik - AAP3 INC at Cisco) wrote: Hi all, I have a cluster of three nodes and would like to ask some questions about the performance. I wrote a small benchmarking tool in java that mirrors (read, write) operations that we do in the real project. The problem is that it is not scaling like it should. The program runs two tests: one using batch statements and one without using the batch. The operation sequence is: optional select, insert, update, insert. I run the tool on my server with 128 threads (the # of threads has no influence on the performance), creating usually 100K resources for testing purposes.
The average results (operations per second) with the use of batch statements are:

Replication Factor = 1
                 with reading   without reading
1-node cluster   37K            46K
2-node cluster   37K            47K
3-node cluster   39K            70K

Replication Factor = 2
                 with reading   without reading
2-node cluster   21K            40K
3-node cluster   30K            48K

The average results (operations per second) without the use of batch statements are:

Replication Factor = 1
                 with reading   without reading
1-node cluster   31K            20K
2-node cluster   38K            39K
3-node cluster   45K            87K

Replication Factor = 2
                 with reading   without reading
2-node cluster   19K            22K
3-node cluster   26K            36K

The Cassandra VMs specs are: 16 CPUs, 16GB and two 32GB of RAM, at least 30GB of disk space for each node. Non-SSD, each VM is on a separate physical server. The code is available here: https://github.com/bjanosik/CassandraBenchTool.git . It can be built with Maven and then you can use the jar in the target directory with java -jar target/cassandra-test-1.0-SNAPSHOT-jar-with-dependencies.jar . Thank you for any help.
-- Jan Kesten, mailto:j.kes...@enercast.de Tel.: +49 561/4739664-0 FAX: -9 Mobil: +49 160 / 90 98 41 68 enercast GmbH Universitätsplatz 12 D-34127 Kassel HRB15471 http://www.enercast.de Online-Prognosen für erneuerbare Energien Geschäftsführung: Thomas Landgraf (CEO), Bernd Kratz (CTO), Philipp Rinder (CSO) Diese E-Mail und etwaige Anhänge können vertrauliche und/oder rechtlich geschützte Informationen enthalten. Falls Sie nicht der angegebene Empfänger sind oder falls diese E-Mail irrtümlich an Sie adressiert wurde, benachrichtigen Sie uns bitte sofort durch Antwort-E-Mail und löschen Sie diese E-Mail nebst etwaigen Anlagen von Ihrem System. Ebenso dürfen Sie diese E-Mail oder ihre Anlagen nicht kopieren oder an Dritte weitergeben. Vielen Dank. This e-mail and any attachment may contain confidential and/or privileged information.
If you are not the named addressee or if this transmission has been addressed to you in error, please notify us immediately by reply e-mail and then delete this e-mail and any attachment from your system. Please understand that you must not copy this e-mail or any attachment or disclose the contents to any other person. Thank you for your cooperation.
Re: Count(*) is not working
Hi, did you get a result finally? Those messages are simply warnings telling you that C* had to read many tombstones while processing your query - rows that are deleted but not yet garbage collected/compacted. This warning gives you some explanation of why things might be much slower than expected: for every 100 live rows counted, C* had to read roughly 14 times as many rows that were already deleted. Apart from that, count(*) is almost always slow - and there is a default limit of 10,000 rows in a result. Do you really need the actual live count? To get an idea you can always look at nodetool cfstats (but those numbers also include deleted rows).
On 16.02.2017 at 13:18, Selvam Raman wrote: Hi, I want to know the total record count in a table. I fired the below query: select count(*) from tablename; and I have got the below output: Read 100 live rows and 1423 tombstone cells for query SELECT * FROM keysace.table WHERE token(id) > token(test:ODP0144-0883E-022R-002/047-052) LIMIT 100 (see tombstone_warn_threshold) Read 100 live rows and 1435 tombstone cells for query SELECT * FROM keysace.table WHERE token(id) > token(test:2565-AMK-2) LIMIT 100 (see tombstone_warn_threshold) Read 96 live rows and 1385 tombstone cells for query SELECT * FROM keysace.table WHERE token(id) > token(test:-2220-UV033/04) LIMIT 100 (see tombstone_warn_threshold). Can you please help me to get the total count of the table. -- Selvam Raman "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
Re: Read after Write inconsistent at times
Hi, are your nodes at high load? Are there any dropped messages (nodetool tpstats) on any node? Also have a look at your system clocks. C* needs them in tight sync - via ntp, for example. Side hint: if you use ntp, use the same set of upstreams on all of your nodes - ideally your own. Using pool.ntp.org might lead to minimal drifts in time across your cluster. Another thing that could help you out is using client-side timestamps: https://docs.datastax.com/en/developer/java-driver/3.1/manual/query_timestamps/ (of course only when you are using a single client or all clients are in sync via ntp).
On 24.02.2017 at 07:29, Charulata Sharma (charshar) wrote: Hi All, In my application sometimes I cannot read data that just got inserted. This happens very intermittently. Both write and read use LOCAL_QUORUM. We have a cluster of 12 nodes which spans 2 data centers, with an RF of 3. Has anyone encountered this problem, and if yes, what steps have you taken to solve it? Thanks, Charu
LOCAL_SERIAL
Hi, suppose I have two data centers and want to coordinate a bunch of services in each data center (for example to load data into a per-DC system that is not DC-aware (Solr)). Does it make sense to use CAS functionality with explicit LOCAL_SERIAL to 'elect' a leader per data center to do the work? So instead of saying 'for this query, LOCAL_SERIAL is enough for me' this would be like saying 'I want XYZ to happen exactly once, per data center'. - All services would try to do XYZ, but only one instance *per datacenter* will actually become the leader and succeed. Makes sense? Jan
ClosedChannelException while nodetool repair
Hi, I have had some problems on my cassandra cluster recently. I am running 12 nodes with 2.2.4, and the errors occur while repairing with a plain "nodetool repair". In system.log on one node I can find: ERROR [STREAM-IN-/172.17.2.233] 2016-01-08 08:32:38,327 StreamSession.java:524 - [Stream #5f96e8b0-b5e2-11e5-b4da-4321ac9959ef] Streaming error occurred java.nio.channels.ClosedChannelException: null and at the same time, on the node mentioned in the log: INFO [STREAM-IN-/172.17.2.223] 2016-01-08 08:32:38,073 StreamResultFuture.java:168 - [Stream #5f96e8b0-b5e2-11e5-b4da-4321ac9959ef ID#0] Prepare completed. Receiving 2 files (46708049 bytes), sending 2 files (1856721742 bytes) ERROR [STREAM-OUT-/172.17.2.223] 2016-01-08 08:32:38,325 StreamSession.java:524 - [Stream #5f96e8b0-b5e2-11e5-b4da-4321ac9959ef] Streaming error occurred org.apache.cassandra.io.FSReadError: java.io.IOException: Datenübergabe unterbrochen (broken pipe) at org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:144) ~[apache-cassandra-2.2.4.jar:2.2.4] More complete logs can be found here: http://pastebin.com/n6DjCCed http://pastebin.com/6rD5XNwU I already did a nodetool scrub. Any suggestions what is causing this? Thanks in advance, Jan
Re: Cassandra is consuming a lot of disk space
Hi Rahul, just an idea: did you have a look at the data directories on disk (/var/lib/cassandra/data)? It could be that there are some from old keyspaces that have been deleted and snapshotted before. Try something like "du -sh /var/lib/cassandra/data/*" to verify which keyspace is consuming your space. Jan Sent from my iPhone > On 14.01.2016 at 07:25, Rahul Ramesh wrote: > > Thanks for your suggestion. > > Compaction was happening on one of the large tables. The disk space did not > decrease much after the compaction. So I ran an external compaction. The disk > space decreased by around 10%. However it is still consuming close to 750Gb > for a load of 250Gb. > > I even restarted cassandra thinking there may be some open files. However it > didn't help much. > > Is there any way to find out why so much data is being consumed? > > I checked if there are any open files using lsof. There are not any open > files. > > Recovery: > Just a wild thought: > I am using a replication factor of 2 and I have two nodes. If I delete the complete > data on one of the nodes, will I be able to recover all the data from the > active node? > I don't want to pursue this path as I want to find out the root cause of the > issue! > > Any help will be greatly appreciated. > > Thank you, > > Rahul > >> On Wed, Jan 13, 2016 at 3:37 PM, Carlos Rolo wrote: >> You can check if the snapshot exists in the snapshot folder. >> Repairs stream sstables over, which can temporarily increase disk space. But I >> think Carlos Alonso might be correct. Running compactions might be the issue. >> >> Regards, >> >> Carlos Juzarte Rolo >> Cassandra Consultant >> >> Pythian - Love your data >> >> rolo@pythian | Twitter: @cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo >> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649 >> www.pythian.com >>> On Wed, Jan 13, 2016 at 9:24 AM, Carlos Alonso wrote: >>> I'd have a look also at possible running compactions.
>>> >>> If you have big column families with STCS then large compactions may be >>> happening. >>> >>> Check it with nodetool compactionstats >>> >>> Carlos Alonso | Software Engineer | @calonso >>> >>>> On 13 January 2016 at 05:22, Kevin O'Connor wrote: >>>> Have you tried restarting? It's possible there are open file handles to >>>> sstables that have been compacted away. You can verify by doing lsof and >>>> grepping for DEL or deleted. >>>> >>>> If it's not that, you can run nodetool cleanup on each node to scan all of >>>> the sstables on disk and remove anything that the node is not responsible for. >>>> Generally this would only work if you added nodes recently. >>>> >>>>> On Tuesday, January 12, 2016, Rahul Ramesh wrote: >>>>> We have a 2 node Cassandra cluster with a replication factor of 2. >>>>> >>>>> The load factor on the nodes is around 350Gb >>>>> >>>>> Datacenter: Cassandra >>>>> ========== >>>>> Address Rack Status State Load Owns Token >>>>> -5072018636360415943 >>>>> 172.31.7.91 rack1 Up Normal 328.5 GB 100.00% -7068746880841807701 >>>>> 172.31.7.92 rack1 Up Normal 351.7 GB 100.00% -5072018636360415943 >>>>> >>>>> However, if I use df -h, >>>>> >>>>> /dev/xvdf 252G 223G 17G 94% /HDD1 >>>>> /dev/xvdg 493G 456G 12G 98% /HDD2 >>>>> /dev/xvdh 197G 167G 21G 90% /HDD3 >>>>> >>>>> HDD1,2,3 contain only cassandra data. It amounts to close to 1Tb on one >>>>> of the machines, and on the other machine it is close to 650Gb. >>>>> >>>>> I started a repair 2 days ago; after running the repair, the amount of disk >>>>> space consumed has actually increased. >>>>> I also checked if this is because of snapshots. nodetool listsnapshots >>>>> intermittently lists a snapshot, but it goes away after some time. >>>>> >>>>> Can somebody please help me understand: >>>>> 1. Why is so much disk space consumed? >>>>> 2. Why did it increase after repair? >>>>> 3. Is there any way to recover from this state? >>>>> >>>>> Thanks, >>>>> Rahul
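The "du -sh" suggestion from this thread can be automated. The sketch below (illustrative Python, assuming the standard `<data_dir>/<keyspace>/<table>/snapshots` layout) reports total bytes per keyspace and how much of that is hiding in snapshots directories, which `nodetool listsnapshots` did not always surface in this thread:

```python
import os

# Report bytes used per keyspace directory and, separately, bytes in
# 'snapshots' subdirectories. Assumes the standard layout:
#   <data_dir>/<keyspace>/<table>/[snapshots/<name>/]*.db

def dir_bytes(path):
    """Total size of all files below path, like 'du -sb'."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for f in files:
            total += os.path.getsize(os.path.join(root, f))
    return total

def usage_report(data_dir):
    """Map keyspace -> {'total': bytes, 'snapshots': bytes}."""
    report = {}
    for ks in sorted(os.listdir(data_dir)):
        ks_path = os.path.join(data_dir, ks)
        if not os.path.isdir(ks_path):
            continue
        snap = 0
        for table in os.listdir(ks_path):
            snap_dir = os.path.join(ks_path, table, "snapshots")
            if os.path.isdir(snap_dir):
                snap += dir_bytes(snap_dir)
        report[ks] = {"total": dir_bytes(ks_path), "snapshots": snap}
    return report
```

A keyspace whose `snapshots` number is a large share of its `total` is a candidate for `nodetool clearsnapshot`, as Rahul's follow-up later confirms.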
Re: Cassandra is consuming a lot of disk space
Hi Rahul, it should work as you would expect - simply copy over the sstables from your extra disk to the original one. To minimize downtime of the node you can do something like this: - rsync the files while the node is still running (sstables are immutable), to copy most of the data - edit cassandra.yaml to remove the additional data dir - shut down the node - rsync again (just in case a new sstable got written while the first sync was running) - restart HTH Jan On 14.01.2016 at 08:38, Rahul Ramesh wrote: > One update. I cleared the snapshots using the nodetool clearsnapshot command. > Disk space is recovered now. > > Because of this issue, I have mounted one more drive to the server and > there are some data files there. How can I migrate the data so that I > can decommission the drive? > Will it work if I just copy all the contents in the table directory to > one of the drives? > > Thanks for all the help. > > Regards, > Rahul > > On Thursday 14 January 2016, Rahul Ramesh wrote: > > Hi Jan, > I checked it. There are no old keyspaces or tables. > Thanks for your pointer, I started looking inside the directories. I > see a lot of snapshots directories inside the table directories. These > directories are consuming space. > > However these snapshots are not shown when I issue listsnapshots: > ./bin/nodetool listsnapshots > Snapshot Details: > There are no snapshots > > Can I safely delete those snapshots? Why is listsnapshots not > showing the snapshots? Also, in future, how can we find out if there > are snapshots? > > Thanks, > Rahul > > On Thu, Jan 14, 2016 at 12:50 PM, Jan Kesten wrote: > > Hi Rahul, > > just an idea: did you have a look at the data directories on disk > (/var/lib/cassandra/data)? It could be that there are some from > old keyspaces that have been deleted and snapshotted before. Try > something like "du -sh /var/lib/cassandra/data/*" to verify > which keyspace is consuming your space.
Re: compaction throughput
Keep in mind that compaction in LCS can only run 1 compaction per level. Even if it wants to run more compactions in L0, it might be blocked because it is already running a compaction in L0. BR Jan On 01/16/2016 01:26 AM, Sebastian Estevez wrote: LCS is IO intensive but CPU is also relevant. On slower disks compaction may not be cpu bound. If you aren't seeing more than one compaction thread at a time, I suspect your system is not compaction bound. all the best, Sebastián On Jan 15, 2016 7:20 PM, "Kai Wang" wrote: Sebastian, Because I have the impression that LCS is IO intensive and it's recommended only on SSDs, I am curious to see how far it can stress those SSDs. But it turns out the most expensive part of LCS is not IO bound but CPU bound, or more precisely single-core speed bound. This is a little surprising. Of course LCS is still superior in other aspects. On Jan 15, 2016 6:34 PM, "Sebastian Estevez" wrote: Correct. Why are you concerned with the raw throughput - are you accumulating pending compactions? Are you seeing high sstables-per-read statistics? all the best, Sebastián On Jan 15, 2016 6:18 PM, "Kai Wang" wrote: Jeff & Sebastian, Thanks for the reply. There are 12 cores, but in my case C* only uses one core most of the time. *nodetool compactionstats* shows there's only one compactor running. I can see the C* process only uses one core. So I guess I should've asked the question more clearly: 1. Is ~25 MB/s a reasonable compaction throughput for one core? 2. Is there any configuration that affects single-core compaction throughput? 3. Is concurrent_compactors the only option to parallelize compaction? If so, I guess it's the compaction strategy itself that decides when to parallelize and when to block on one core. Then there's not much we can do here. Thanks.
On Fri, Jan 15, 2016 at 5:23 PM, Jeff Jirsa wrote: With SSDs, the typical recommendation is up to 0.8-1 compactor per core (depending on other load). How many CPU cores do you have? From: Kai Wang Reply-To: "user@cassandra.apache.org" Date: Friday, January 15, 2016 at 12:53 PM To: "user@cassandra.apache.org" Subject: compaction throughput Hi, I am trying to figure out the bottleneck of compaction on my node. The node is CentOS 7 and has SSDs installed. The table is configured to use LCS. Here are my compaction related configs in cassandra.yaml: compaction_throughput_mb_per_sec: 160 concurrent_compactors: 4 I insert about 10G of data and start observing compaction. *nodetool compactionstats* shows that most of the time there is one compaction. Sometimes there are 3-4 (I suppose this is controlled by concurrent_compactors). During the compaction, I see one CPU core at 100%. At that point, disk IO is about 20-25 MB/s write, which is much lower than the disk is capable of. Even when there are 4 compactions running, I see CPU go to +400% but disk IO is still at 20-25 MB/s write. I used *nodetool setcompactionthroughput 0* to disable the compaction throttle but don't see any difference. Does this mean compaction is CPU bound? If so, 20 MB/s is kinda low. Is there any way to improve the throughput? Thanks.
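Jeff's quoted rule of thumb ("up to 0.8-1 compactor per core") as quick arithmetic - a rough sizing sketch, not an official formula:

```python
# Rough sizing for concurrent_compactors on SSD-backed nodes, based on
# the "0.8-1 compactor per core" rule of thumb quoted above. This is
# an illustration, not an official tuning formula.

def suggested_compactors(cores, per_core=0.8):
    """Conservative compactor count for a node with `cores` CPU cores."""
    return max(1, int(cores * per_core))

# 12 cores -> 9 with the conservative 0.8 factor, 12 at the upper bound.
```

For the 12-core node in this thread, that suggests raising concurrent_compactors well above the configured 4 - though as the thread notes, LCS may still serialize work within a level.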
Re: Sudden disk usage
Hi, what kind of compaction strategy do you use? What you are seeing is most likely a compaction - think of 4 sstables of 50gb each: compacting those can take up an extra 200g while the new sstable is being rewritten. After that the old ones are deleted and the space is freed again. If using SizeTieredCompaction you can end up with very large sstables, as I do (>250gb each). In the worst case you could need twice the space - a reason why I set my disk monitoring threshold to 45% usage. Just my 2 cents. Jan Sent from my iPhone > On 13.02.2016 at 08:48, Branton Davis wrote: > > One of our clusters had a strange thing happen tonight. It's a 3 node > cluster, running 2.1.10. The primary keyspace has RF 3, vnodes with 256 > tokens. > > This evening, over the course of about 6 hours, disk usage increased from > around 700GB to around 900GB on only one node. I was at a loss as to what > was happening and, on a whim, decided to run nodetool cleanup on the > instance. I had no reason to believe that it was necessary, as no nodes were > added or tokens moved (not intentionally, anyhow). But it immediately > cleared up that extra space. > > I'm pretty lost as to what would have happened here. Any ideas where to look? > > Thanks!
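The worst-case arithmetic behind the 45% threshold above, as a small sketch (assuming little data is actually discarded during the rewrite):

```python
# Headroom needed while a compaction rewrites its inputs into a new
# SSTable, before the old ones are deleted. A rough sketch assuming
# little data is discarded during the rewrite.

def compaction_headroom_gb(sstable_sizes_gb):
    """Temporary extra space (GB) a single compaction may need."""
    return sum(sstable_sizes_gb)

def disk_alert(used_gb, capacity_gb, threshold=0.45):
    """Alert well below 50% so even a worst-case rewrite of everything
    into one table still fits on disk."""
    return used_gb / capacity_gb > threshold

# Four 50 GB tables -> up to 200 GB of extra space while compacting.
```

With STCS and a few very large tables, staying under ~45-50% used is the only way to guarantee the "roughly twice the space" worst case always fits.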
Re: Forming a cluster of embedded Cassandra instances
Hi, an embedded cassandra to speed up getting into the project may well work for developers; we used it for JUnit tests. But a simple clone and maven build will, I guess, end up in a single-node cassandra cluster. Remember cassandra is a distributed database: one will need more than one node to get performance and fault tolerance. Also, I would not recommend adding and removing cluster nodes at high frequency with application start-stop cycles. To help in getting things up and running, provide a small readme for downloading and starting cassandra. For Mac and Linux, unpacking the tar.gz and running cassandra.sh is not too complicated. Or add a hint to the DataStax Community Edition installers. Apart from installing Java, that is a five-minute stop to a single-node "TestCluster". Configuring a distributed setup is a bit more, or a lot more, difficult and definitely needs more understanding and planning. Just as a hint, and offtopic: I have seen people use cassandra as application glue for interprocess communication, where every app server started a node (for communication, sessions, queues and so on). If that is eventually a use case - have a look at Hazelcast. Jan Sent from my iPhone > On 14.02.2016 at 23:26, John Sanda wrote: > > The motivation was to make it easy for someone to get up and running quickly > with the project. Clone the git repo, run the maven build, and then you are > all set. It definitely does lower the learning curve for someone just getting > started with a project and who is not really thinking about Cassandra. It > also is convenient for non-devs who need to quickly get the project up and > running. For development, we have people working on Linux, Mac OS X, and > Windows. I am not a Windows user and am not even sure if ccm works on Windows, > so ccm can't be the de facto standard for development.
> >> On Sun, Feb 14, 2016 at 2:52 PM, Jack Krupansky >> wrote: >> What motivated the use of an embedded instance for development - as opposed >> to simply spawning a process for Cassandra? >> >> -- Jack Krupansky >> >>> On Sun, Feb 14, 2016 at 2:05 PM, John Sanda wrote: >>> The project I work on day to day uses an embedded instance of Cassandra, >>> but it is intended primarily for development. We embed Cassandra in a >>> WildFly (i.e., JBoss) server. It is packaged and deployed as an EAR. I >>> personally do not do this. I use and recommend ccm for development. If you >>> do use WildFly, there is also wildfly-cassandra, which deploys Cassandra as >>> a custom WildFly extension. In other words, it is deployed in WildFly like >>> other subsystems such as EJB, web, etc., not like an application. There isn't a >>> whole lot of active development on this, but it could be another option. >>> >>> For production, we have to support single node clusters (not embedded >>> though), and it has been challenging for pretty much all the reasons you >>> find people saying not to do so. >>> >>> As for failure detection and cluster membership changes, are you using the >>> DataStax driver? You can register an event listener with the driver to >>> receive notifications for those things. >>> >>>> On Sat, Feb 13, 2016 at 6:33 PM, Jonathan Haddad >>>> wrote: >>>> +1 to what Jack said. Don't mess with embedded till you understand the >>>> basics of the db. You're not making your system any less complex; I'd say >>>> you're most likely going to shoot yourself in the foot. >>>>> On Sat, Feb 13, 2016 at 2:22 PM Jack Krupansky >>>>> wrote: >>>>> HA requires an odd number of replicas - 3, 5, 7 - so that split-brain can >>>>> be avoided. Two nodes would not support HA. You need to be able to reach >>>>> a quorum, which is defined as n/2+1 where n is the number of replicas. >>>>> IOW, you cannot update the data if a quorum cannot be reached.
The data >>>>> on any given node needs to be replicated on at least two other nodes. >>>>> >>>>> Embedded Cassandra is only for extremely sophisticated developers - not >>>>> those who are new to Cassandra, with a "superficial understanding". >>>>> >>>>> As a general proposition, you should not be running application code on >>>>> Cassandra nodes. >>>>> >>>>> That said, if any of the senior Cassandra developers wish to personally >>>>> support your efforts towards embedded clusters, they are
Re: Cassandra nodes reduce disks per node
Hi Branton, two cents from me - I didn't look through the script, but for the rsyncs I do pretty much the same when moving sstables. Since they are immutable, I do a first sync to the new location while everything is up and running, which runs really long. Meanwhile new ones are created, and I sync them again online - many fewer files to copy now. After that I shut down the node, and my last rsync now has to copy only a few files, which is quite fast, so the downtime for that node is within minutes. Jan Sent from my iPhone > On 18.02.2016 at 22:12, Branton Davis wrote: > > Alain, thanks for sharing! I'm confused why you do so many repetitive > rsyncs. Just being cautious or is there another reason? Also, why do you > have --delete-before when you're copying data to a temp (assumed empty) > directory? > >> On Thu, Feb 18, 2016 at 4:12 AM, Alain RODRIGUEZ wrote: >> I did the process a few weeks ago and ended up writing a runbook and a >> script. I have anonymised it and shared it, fwiw: >> >> https://github.com/arodrime/cassandra-tools/tree/master/remove_disk >> >> It is basic bash. I tried to have the shortest downtime possible, making >> this a bit more complex, but it allows you to do a lot in parallel and just >> do a fast operation sequentially, reducing overall operation time. >> >> This worked fine for me, yet I might have made some errors while making it >> configurable through variables. Be sure to be around if you decide to run >> this. Also, I automated this more by using knife (Chef); I hate to repeat >> ops, this is something you might want to consider. >> >> Hope this is useful, >> >> C*heers, >> - >> Alain Rodriguez >> France >> >> The Last Pickle >> http://www.thelastpickle.com >> >> 2016-02-18 8:28 GMT+01:00 Anishek Agarwal : >>> Hey Branton, >>> >>> Please do let us know if you face any problems doing this. >>> >>> Thanks >>> anishek >>> >>>> On Thu, Feb 18, 2016 at 3:33 AM, Branton Davis >>>> wrote: >>>> We're about to do the same thing.
It shouldn't be necessary to shut down >>>> the entire cluster, right? >>>> >>>>> On Wed, Feb 17, 2016 at 12:45 PM, Robert Coli >>>>> wrote: >>>>> >>>>>> On Tue, Feb 16, 2016 at 11:29 PM, Anishek Agarwal >>>>>> wrote: >>>>>> To accomplish this, can I just copy the data from disk1 to disk2 within >>>>>> the relevant cassandra home location folders, change the cassandra.yaml >>>>>> configuration and restart the node? Before starting, I will shut down the >>>>>> cluster. >>>>> >>>>> Yes. >>>>> >>>>> =Rob
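The two-pass rsync approach described in this thread relies on SSTables being immutable: a first copy runs while the node is live, and the post-shutdown pass only moves the delta. A minimal Python model of that idea (shutil standing in for rsync, purely for illustration):

```python
import os
import shutil

# Minimal model of the two-pass rsync: because SSTable files never
# change once written, copying only files absent from the destination
# is safe, and the final (offline) pass moves just the newcomers.
# shutil replaces rsync purely for illustration.

def sync_new_files(src, dst):
    """Copy files from src that do not yet exist in dst; return count."""
    os.makedirs(dst, exist_ok=True)
    copied = 0
    for name in os.listdir(src):
        target = os.path.join(dst, name)
        if not os.path.exists(target):
            shutil.copy2(os.path.join(src, name), target)
            copied += 1
    return copied
```

The first call (node still live) moves the bulk of the data; after a new sstable appears, a second call only moves the delta - that short second pass is the only part that needs the node down.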
Thrift composite partition key to CQL migration
Hi, while migrating the remainder of the thrift operations in my application I came across a point where I can't find a good hint. In our old code we used a composite with two strings as the row / partition key and a similar composite as the column key, like this: public Composite rowKey() { final Composite composite = new Composite(); composite.addComponent(key1, StringSerializer.get()); composite.addComponent(key2, StringSerializer.get()); return composite; } public Composite columnKey() { final Composite composite = new Composite(); composite.addComponent(key3, StringSerializer.get()); composite.addComponent(key4, StringSerializer.get()); return composite; } In CQL this column family looks like this: CREATE TABLE foo.bar ( key blob, column1 text, column2 text, value blob, PRIMARY KEY (key, column1, column2) ) The columns key3 and key4 became column1 and column2 - but the old row key is presented as a blob (I can put it into a hex editor and see that the key1 and key2 values are in there). Any pointers to handle this, or is this a known issue? I am now using the DataStax Java driver for CQL; the old connector used thrift. Is there any way to get key1 and key2 back, apart from completely rewriting the table? This is what I had expected it to be: CREATE TABLE foo.bar ( key1 text, key2 text, column1 text, column2 text, value blob, PRIMARY KEY ((key1, key2), column1, column2) ) Cheers, Jan
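As far as I understand the legacy CompositeType serialization, each component in the blob key is a 2-byte big-endian length, the value bytes, and a 0x00 end-of-component byte. If that holds for your data, key1 and key2 can be recovered client-side without rewriting the table - a hedged sketch (verify against a few real blobs before relying on it):

```python
import struct

# Sketch of the legacy CompositeType encoding believed to produce the
# blob partition key: per component, a 2-byte big-endian length, the
# value bytes, and a 0x00 end-of-component byte. Verify against real
# blobs from your table before trusting the decoder.

def decode_composite(blob):
    """Split a CompositeType-encoded blob into its text components."""
    parts, i = [], 0
    while i < len(blob):
        (length,) = struct.unpack_from(">H", blob, i)
        i += 2
        parts.append(blob[i:i + length].decode("utf-8"))
        i += length + 1  # skip the end-of-component byte
    return parts

def encode_composite(parts):
    """Inverse operation, useful for querying by the old blob key."""
    out = b""
    for p in parts:
        raw = p.encode("utf-8")
        out += struct.pack(">H", len(raw)) + raw + b"\x00"
    return out
```

With an encoder as well, the application can keep querying the existing table by blob key while presenting key1/key2 separately, postponing the full table rewrite.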
Re: NTP Synchronization Setup Changes
Hi Micky, I would strongly suggest setting up an NTP server on your site - this is not a big deal and, with some tutorials on the net, done quickly. Then configure your cassandra nodes (and all the rest, if you like) to use your NTP server instead of public ones. As I have learned the hard way, cassandra is not really happy when nodes have different times, in some cases. The benefit of this is that your nodes will keep time in sync even without a connection to the internet. Of course "your time" may drift without a proper time source or connection, but all nodes will have the same drift, and so no problems with consistency. When your NTP server syncs, your nodes will be adjusted smoothly. Pro(?) solution (what I did before): attach a GPS mouse to your NTP server and use that as the time source. That way you can have synchronized _and_ accurate time without any connection to public NTP servers, as the GPS satellites are flying atomic clocks :) Just my 2 cents, Jan Sent from my iPhone > On 31.03.2016 at 03:07, Mukil Kesavan wrote: > > Hi, > > We run a 3 server cassandra cluster that is initially NTP-synced to a single > physical server over the LAN. This server does not have connectivity to the > internet for a few hours, sometimes even days. In this state we perform > some schema operations and reads/writes with QUORUM consistency. > > Later on, the physical server has connectivity to the internet and we > synchronize its time to an external NTP server on the internet. > > Are there any issues if this causes a huge time correction on the cassandra > cluster? I know that NTP gradually corrects the time on all the servers. I > just wanted to understand if there were any corner cases that will cause us > to lose data/schema updates when this happens. In particular, we seem to be > having some issues around missing secondary indices at the moment (not all > but some).
> > Also, for our situation where we have to work with cassandra for a while > without internet connectivity, what is the preferred NTP architecture/steps > that have worked for you in the field? > > Thanks, > Micky
Re: Large primary keys
Hi Robert, why do you need the actual text as a key? It sounds a bit unnatural, at least to me. Keep in mind that you cannot do "like" queries on keys in cassandra. For performance, and to keep things more readable, I would prefer hashing your text and using the hash as the key. You should also consider storing the keys (hashes) in a separate table per day / hour or something like that, so you can quickly get all keys for a time range. A query without the partition key may be very slow. Jan On 11.04.2016 at 23:43, Robert Wille wrote: I have a need to be able to use the text of a document as the primary key in a table. These texts are usually less than 1K, but can sometimes be tens of KB in size. Would it be better to use a digest of the text as the key? I have a background process that will occasionally need to do a full table scan and retrieve all of the texts, so using the digest doesn’t eliminate the need to store the text. Anyway, is it better to keep primary keys small, or is C* okay with large primary keys? Robert
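The hashing suggestion above in code - a SHA-256 digest keeps every key at 64 hex characters regardless of document size (illustrative sketch; the original text would still be stored in a regular column):

```python
import hashlib

# Derive a fixed-size partition key from document text. SHA-256 yields
# 64 hex characters whether the text is 1K or tens of KB; collisions
# are not a practical concern at these scales. The text itself stays
# in a regular (non-key) column.

def doc_key(text):
    return hashlib.sha256(text.encode("utf-8")).hexdigest()
```

The per-day/hour index table mentioned above would then store these digests under a time-bucket partition, so all keys for a time range can be fetched without a full scan.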
Re: Fwd: Cassandra Load spike
Hi, you should check the "snapshots" directories on your nodes - it is very likely there are some old ones from failed operations taking up space. On 15.04.2016 at 01:28, kavya wrote: Hi, We are running a 6 node cassandra 2.2.4 cluster and we are seeing a spike in the disk Load as per the 'nodetool status' command that does not correspond to the actual disk usage. The Load reported by nodetool was as high as 3 times the actual disk usage on certain nodes. We noticed that the periodic repair failed with the below error on running the command 'nodetool repair -pr': ERROR [RepairJobTask:2] 2016-04-12 15:46:29,902 RepairRunnable.java:243 - Repair session 64b54d50-0100-11e6-b46e-a511fd37b526 for range (-3814318684016904396,-3810689996127667017] failed with error [….] Validation failed in / org.apache.cassandra.exceptions.RepairException: [….] Validation failed in at org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64) ~[apache-cassandra-2.2.4.jar:2.2.4] at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183) ~[apache-cassandra-2.2.4.jar:2.2.4] at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:410) ~[apache-cassandra-2.2.4.jar:2.2.4] at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:163) ~[apache-cassandra-2.2.4.jar:2.2.4] at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-2.2.4.jar:2.2.4] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_40] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40] at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40] We restarted all nodes in the cluster and ran a full repair, which completed successfully without any validation errors; however we still see the Load spike on the same nodes after a while. Please advise. Thanks!
Replacing dead node when num_tokens is used
Hello, while trying out cassandra I read about the steps necessary to replace a dead node. In my test cluster I used a setup with num_tokens instead of initial_token. How do I replace a dead node in this scenario? Thanks, Jan
Re: Replacing dead node when num_tokens is used
Hello Aaron, thanks for your reply. I found it just an hour ago on my own; yesterday I accidentally looked at the 1.0 docs. Right now my replacement node is streaming from the others - then more testing can follow. Thanks again, Jan
sstablesplit - status
Hi all, I have some problems with really large sstables which don't get compacted anymore, and I know there are many duplicated rows in them. I thought splitting the tables into smaller ones to get them compacted again would help, so I tried sstablesplit, but: cassandra@cassandra01 /tmp/cassandra $ ./apache-cassandra-3.10/tools/bin/sstablesplit lb-388151-big-Data.db Skipping non sstable file lb-388151-big-Data.db No valid sstables to split cassandra@cassandra01 /tmp/cassandra $ sstablesplit lb-388151-big-Data.db Skipping non sstable file lb-388151-big-Data.db No valid sstables to split It seems that sstablesplit can't handle the "new" filename pattern anymore (actually running 2.2.8 on those nodes). Any hints or other suggestions to split those sstables or get rid of them? Thanks in advance, Jan - To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org For additional commands, e-mail: user-h...@cassandra.apache.org
Re: sstablesplit - status
Hi again, and thanks for the input. It's not tombstoned data, I think, but over a really long time very many rows are inserted over and over again - with some significant pauses between the inserts. I found some examples where a specific row (for example pk=xyz, value=123) exists in more than one or two tables, with exactly the same content but different timestamps. The largest sstables, compacted a while ago, are now 300-400G in size on some nodes, and it's very unlikely they will be compacted any time soon, as there are only one or two sstables of that size on a single node. I think I will try re-bootstrapping a node to see if that helps. sstablesplit exists in 2.x - but as far as I know it is deprecated, and in my 3.6 test cluster it was gone. I was trying sstabledump to have a deeper look - but that says "pre-3.0 SSTable is not supported" (fair, I am on a 2.2.8 cluster). Jan
Effect of frequent mutations / memtable
Hi, I am using updates to a column with a TTL to represent a lock. The owning process keeps updating the lock's TTL as long as it is running. If the process crashes, the lock will time out and be deleted. Then another process can take over. I have used this pattern very successfully over years with TTLs on the order of tens of seconds. Now I have a use case in mind that would require much smaller TTLs, e.g. one or two seconds, and I am worried about the increased number of mutations and the possible effect on SSTables. However, I'd assume these frequent updates on a cell mostly happen in the memtable, resulting in only occasional manifestation in SSTables. Is that assumption correct, and if so, what config parameters should I tweak to keep the memtable from being flushed for longer periods of time? Jan
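An in-memory sketch of the TTL-lock pattern described above. In Cassandra the heartbeat would be an `UPDATE ... USING TTL <n>` re-issued by the owner; here a monotonic-clock expiry stands in for the TTL, and all names are illustrative:

```python
import time

# Sketch of the TTL-lock pattern: the owner heartbeats to extend the
# lease; if it crashes, the lease expires and another process can take
# over. In Cassandra the heartbeat is an UPDATE ... USING TTL <n>;
# here an expiry timestamp stands in for the TTL.

class TtlLock:
    def __init__(self, ttl):
        self.ttl = ttl
        self.owner = None
        self.expires = 0.0

    def acquire(self, owner):
        """Take the lock if it is free or its TTL has run out."""
        now = time.monotonic()
        if self.owner is None or now >= self.expires:
            self.owner, self.expires = owner, now + self.ttl
            return True
        return False

    def heartbeat(self, owner):
        """Refresh the TTL; only the current owner may extend it."""
        if self.owner == owner and time.monotonic() < self.expires:
            self.expires = time.monotonic() + self.ttl
            return True
        return False
```

With one- or two-second TTLs, the heartbeat interval must be a comfortable fraction of the TTL (and tolerant of GC pauses), which is what makes the write rate - and hence the memtable question above - matter.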
Re: Effect of frequent mutations / memtable
Hi Jayesh, On 25 May 2017, at 18:31, Thakrar, Jayesh wrote: Hi Jan, I would suggest looking at using Zookeeper for such a usecase. thanks - yes, it is an alternative. Out of curiosity: since both ZooKeeper and C* implement a consensus protocol to enable this kind of thing, why do you think Zookeeper would be a better fit? Jan See http://zookeeper.apache.org/doc/trunk/recipes.html for some examples. Zookeeper is used for such purposes in Apache HBase (active master), Apache Kafka (active controller), Apache Hadoop, etc. Look for the "Leader Election" usecase. Examples: http://techblog.outbrain.com/2011/07/leader-election-with-zookeeper/ https://www.tutorialspoint.com/zookeeper/zookeeper_leader_election.htm It's more/new work, but should be an elegant solution. Hope that helps. Jayesh
Re: Effect of frequent mutations / memtable
Jonathan, On 26 May 2017, at 17:00, Jonathan Haddad wrote: If you have a small amount of hot data, enable the row cache. The memtable is not designed to be a cache. You will not see a massive performance impact of writing one to disk. SSTables will be in your page cache, meaning you won't be hitting disk very often. What I (and AFAIU Max, too) am concerned with is very frequent updates of certain cells and their impact on the number of SSTables created. Suppose I have a row that sees tens of thousands of mutations during the first minutes of its lifetime but isn't changed afterwards. The hope/assumption is that tuning C* can help make all those mutations take place in the memtable, so we end up with only a single SSTable in the end (roughly speaking). Besides such an exceptional case I'd consider high-frequency mutations an anti-pattern due to the SSTable bloat. Does that make sense? Jan On Fri, May 26, 2017 at 7:41 AM Max C wrote: In my case, we're using Cassandra to store QA test data — so the pattern is that we may do a bunch of updates within a few minutes / hours, and then the data will essentially be read-only for the rest of its lifetime (years). My question is the same — do we need to worry about the performance impact of having N mutations written to the SSTable — or will these mutations generally be constrained to the memtable? - Max Hi, I am using updates to a column with a TTL to represent a lock. The owning process keeps updating the lock's TTL as long as it is running. If the process crashes, the lock will time out and be deleted, and another process can take over. I have used this pattern very successfully over the years with TTLs on the order of tens of seconds. Now I have a use case in mind that would require much smaller TTLs, e.g. one or two seconds, and I am worried about the increased number of mutations and the possible effect on SSTables.
However: I'd assume these frequent updates to a cell mostly happen in the memtable, resulting in only occasional manifestation in SSTables. Is that assumption correct, and if so, what config parameters should I tweak to keep the memtable from being flushed for longer periods of time?
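The assumption in this thread can be turned into back-of-envelope arithmetic. This is a simplification, not Cassandra internals — real flush timing depends on memtable size thresholds and commit log pressure — but it shows why repeated overwrites of the same cell collapse before reaching disk:

```python
# Repeated overwrites of one cell are absorbed in the memtable; only the
# version present at each flush reaches an SSTable. Rough model assuming a
# fixed flush interval (a simplifying assumption).

def sstable_cell_versions(updates_per_sec, duration_sec, flush_interval_sec):
    """Return (mutations issued, cell versions that ever hit an SSTable)."""
    total = updates_per_sec * duration_sec
    flushes = max(1, duration_sec // flush_interval_sec)
    return total, flushes

total, flushed = sstable_cell_versions(updates_per_sec=1, duration_sec=3600,
                                       flush_interval_sec=600)
# 3600 mutations issued, but only ~6 cell versions ever written to disk:
assert (total, flushed) == (3600, 6)
```

So the fewer flushes happen while the cell is hot, the closer you get to the "single SSTable in the end" hope expressed above.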
How to know when repair repaired something?
Hi, is it possible to extract from the repair logs the writetime of the writes that needed to be repaired? I have some processes I would like to re-trigger from a point in time if repair found problems. Is that useful? Possible? Jan
Re: How to know when repair repaired something?
On 30 May 2017, at 21:11, Varun Gupta wrote: I am missing the point: why do you want to re-trigger the process post repair? Repair will sync the data correctly. Sorry - I mis-represented that. I want to trigger something else, not repair. I am investigating a CQRS/event-sourced pattern with C* as a distributed event log and a process reading from that log, changing state in other databases (Solr, graph DB, other C* tables, etc.). Since I do not want to write to/read from the event log with EACH_QUORUM or LOCAL_QUORUM, it could happen that the process reading the event log misses an event that only later pops up during repair. When that happens, I'd like to re-process the log (my processing is idempotent, so it can just go again). This is why I was looking for a way to learn that a repair has actually repaired something. Jan On Mon, May 29, 2017 at 8:07 AM, Jan Algermissen wrote: Hi, is it possible to extract from the repair logs the writetime of the writes that needed to be repaired? I have some processes I would like to re-trigger from a point in time if repair found problems. Is that useful? Possible? Jan
Write / read cost of *QUORUM
Hi, my understanding is that: - for writes, using any of the quorum CLs will not put more overall load on the cluster, because writes are sent to all nodes responsible for a partition anyhow. So quorum only increases the response time of the coordinator, not cluster load. Correct? - for reads, all quorum CLs yield more requests sent by the coordinator to other nodes, hence *QUORUM reads definitely increase cluster load (and of course the response time of the coordinator, too). Correct? Jan
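The two claims can be written down as a toy model. This is a simplification that ignores digest reads, speculative retry, and the background read-repair chance, but it captures the asymmetry being asked about:

```python
# Writes fan out to all RF replicas regardless of CL (the CL only controls
# how many acks the coordinator waits for); reads contact at least as many
# replicas as the CL requires. Simplified model.

def quorum(rf):
    return rf // 2 + 1

def replicas_written(rf, cl):
    return rf  # same fan-out for ONE, QUORUM, ALL

def replicas_read(rf, cl):
    if cl == "ONE":
        return 1
    if cl == "QUORUM":
        return quorum(rf)
    return rf  # ALL

# Write load is CL-independent:
assert replicas_written(3, "ONE") == replicas_written(3, "QUORUM") == 3
# Read load doubles going from ONE to QUORUM at RF=3:
assert replicas_read(3, "ONE") == 1 and replicas_read(3, "QUORUM") == 2
```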
Tolerable amount of CAS queries?
Hi, I just read [1], which describes a lease implementation using CAS queries. It applies a TTL to the lease, which needs to be refreshed periodically by the lease holder. I have used such a pattern myself for a couple of years, so no surprise there. However, the article uses CAS queries not only for acquiring the lease, but also for TTL updates and active lease cancelling. Given the default TTL of three minutes described there, the amount of CAS queries might be ok. But what if I go for much shorter TTLs, e.g. 5 seconds, to minimise the time another peer needs to take over if the current lease owner crashes or is stopped? Using some safety margin for updating the TTL, we'd end up with a CAS query every 3 seconds or so. If we have a bunch of such leases, we'd likely see 10 or more such CAS queries a second. I am looking for advice on whether such a high number of CAS queries could be tolerable at all. Assuming there is not much contention on the same lease, is the overhead of a CAS query basically that it leads to 4, or sometimes significantly more, 'queries' in the C* cluster? IOW, suppose I - have a cluster spanning geographic regions - restrict the CAS queries to keyspaces that are only replicated in a single region, and I use LOCAL_SERIAL CL: would 100 CAS queries per second that in the normal case do not conflict (== work on different partition keys) be sort of 'ok'? Or should it rather be in the range of 10/s? Jan [1] https://www.datastax.com/dev/blog/consensus-on-cassandra
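Taking the figure of roughly four round trips per uncontended Paxos operation mentioned above, a rough estimate of the internal message load is straightforward arithmetic. The four-round-trip count and the "every round trip touches all local replicas" factor are assumptions made for a pessimistic bound; contention multiplies the number of rounds:

```python
# Pessimistic lower bound on intra-cluster messages generated by
# uncontended CAS operations (assumed: 4 Paxos round trips, each touching
# all local replicas rather than just a quorum).

def cas_internal_messages(cas_per_sec, replicas, round_trips=4):
    return cas_per_sec * round_trips * replicas

# 100 uncontended LOCAL_SERIAL CAS/s against RF=3 in one region:
assert cas_internal_messages(100, 3) == 1200
```

~1200 small internal messages per second is the kind of number to weigh against the cluster's normal traffic when deciding whether 100 CAS/s is "ok".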
Reducing tombstones impact in queue access patterns through rolling shards?
Hi, I just came across this recipe by Netflix that addresses the impact of tombstones in queue access patterns with time-based rolling shards, allowing compaction to happen in one shard while the other is 'busy' (at least this is what I understand from the intro): https://github.com/Netflix/astyanax/wiki/Message-Queue Has anyone adopted such a pattern and can share experience? Jan
Re: Scala driver
Hi Gary, On 31 Aug 2014, at 07:19, Gary Zhao wrote: > Hi > > Could you recommend a Scala driver and share your experiences of using it? I'm thinking of using the Java driver in Scala directly. > > I am using Martin's approach without any problems: https://github.com/magro/play2-scala-cassandra-sample The actual mapping from Java to Scala futures for the async case is in https://github.com/magro/play2-scala-cassandra-sample/blob/master/app/models/Utils.scala HTH, Jan > Thanks
Re: Concurrents deletes and updates
On 17 Sep 2014, at 20:55, Sávio S. Teles de Oliveira wrote: > I'm using Cassandra 2.0.9 with the Java DataStax driver. > I'm running the tests in a cluster with 3 nodes, RF=3 and CL=ALL for each > operation. > > I have a column family filled with some keys (for example 'a' and 'b'). > When these keys are deleted and inserted again afterwards, sporadically the keys > disappear. Could it be that the delete and insert have the same timestamp? Are you using batched queries maybe? In my current project a team experienced similar behavior during automated tests. If you delete with T1 and insert with T1, the delete wins, which was the reason in our case. You might want to test this with client-provided timestamps and make sure the insert has T_insert > T_delete. Jan > > Is it a bug in Cassandra or in the DataStax driver? > Any suggestions? > > Tks
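The tie-break described above can be illustrated with a toy reconciliation function. The value comparison for two live cells is only an approximation of Cassandra's actual rule; the point is that a tombstone wins a timestamp tie:

```python
# Toy cell reconciliation: a cell is (timestamp, value_or_None), where
# None marks a tombstone. At equal timestamps the delete wins, which is
# exactly why "delete at T1, insert at T1" makes the key disappear.

def reconcile(cell_a, cell_b):
    ta, va = cell_a
    tb, vb = cell_b
    if ta != tb:
        return cell_a if ta > tb else cell_b
    # equal timestamps: the tombstone wins
    if va is None or vb is None:
        return (ta, None)
    return cell_a if va >= vb else cell_b  # deterministic value tie-break

# delete and insert at the same timestamp -> the row "disappears":
assert reconcile((100, None), (100, "b")) == (100, None)
# a client-supplied T_insert > T_delete fixes it:
assert reconcile((100, None), (101, "b")) == (101, "b")
```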
Exploring Simply Queueing
Hi, I have put together some thoughts on realizing simple queues with Cassandra: https://github.com/algermissen/cassandra-ruby-queue The design is inspired by (the much more sophisticated) Netflix approach[1] but very reduced. Given that I am still a C* newbie, I’d be very glad to hear some thoughts on the design path I took. Jan [1] https://github.com/Netflix/astyanax/wiki/Message-Queue
Re: Exploring Simply Queueing
Chris, thanks for taking a look. On 06 Oct 2014, at 04:44, Chris Lohfink wrote: > It appears you are aware of the tombstone effect that leads people to label > this an anti-pattern. Without "due" or any time-based value being part of > the partition key you will still get a lot of buildup. You only have 1 > partition per shard, which just linearly decreases the tombstones. That isn't > likely to be enough to really help in a situation of high queue throughput, > especially with the default of 4 shards. Yes, dealing with the tombstone effect is the whole point. The workloads I have to deal with are not really high throughput; it is unlikely we’ll ever reach multiple messages per second. The emphasis is also more on coordinating producer and consumer than on high-volume capacity problems. Your comment seems to suggest including larger time frames (e.g. the due-hour) in the partition keys and using the current time to select the active partitions (e.g. the shards of the hour). Once an hour has passed, the corresponding shards will never be touched again. Am I understanding this correctly? > > You may want to consider switching to LCS from the default STCS since you are > re-writing to the same partitions a lot. It will still use STCS in L0, so in high > write/delete scenarios, with low enough gc_grace, when it never gets higher > than L1 it will be same-ish write throughput. In scenarios where you get more, > LCS will shine, I suspect, by reducing the number of obsolete tombstones. It would be > hard to identify the difference in small tests, I think. Thanks, I’ll try to explore the various effects. > > What's the plan to prevent two consumers from reading the same message off of a > queue? You mention in the docs that you will address it at a later point in time, but > it's kinda a biggy. Big lock & batch reads like the astyanax recipe? I have included a static column per shard to act as a lock (the ’lock’ column in the examples) in combination with conditional updates.
I must admit, I have not quite understood what Netflix is doing in terms of coordination - but since performance isn’t our concern, CAS should do fine, I guess(?) Thanks again, Jan > > --- > Chris Lohfink > > > On Oct 5, 2014, at 6:03 PM, Jan Algermissen > wrote: > >> Hi, >> >> I have put together some thoughts on realizing simple queues with Cassandra. >> >> https://github.com/algermissen/cassandra-ruby-queue >> >> The design is inspired by (the much more sophisticated) Netflix approach[1] >> but very reduced. >> >> Given that I am still a C* newbie, I’d be very glad to hear some thoughts on >> the design path I took. >> >> Jan >> >> [1] https://github.com/Netflix/astyanax/wiki/Message-Queue >
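The time-bucketing idea discussed in this thread can be sketched as a key-building function. The key layout and the shard-spreading function here are hypothetical, not the Netflix recipe's actual scheme; the point is that once an hour passes, its partitions are never written again, so their tombstones never burden new reads:

```python
# Embed the hour bucket in the partition key so old shards go cold
# (hypothetical key layout; any spreading function would do for the shard).

def shard_key(epoch_seconds, num_shards=4):
    hour_bucket = epoch_seconds // 3600
    shard = epoch_seconds % num_shards
    return f"{hour_bucket}:{shard}"

# Messages due in the same hour land in that hour's shard set...
assert shard_key(7200) == "2:0"
# ...and the next hour uses fresh partitions:
assert shard_key(10800).startswith("3:")
```

A consumer then only ever queries the shards of the current (and perhaps previous) hour bucket.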
Re: Exploring Simply Queueing
Shane, On 06 Oct 2014, at 16:34, Shane Hansen wrote: > Sorry if I'm hijacking the conversation, but why in the world would you want > to implement a queue on top of Cassandra? It seems like using a proper > queuing service > would make your life a lot easier. Agreed - however, the use case simply does not justify the additional operations. > > That being said, there might be a better way to play to the strengths of C*. > Ideally everything you do > is append only with few deletes or updates. So an interesting way to > implement a queue might be > to do one insert to put the job in the queue and another insert to mark the > job as done or in process > or whatever. This would also give you the benefit of being able to replay the > state of the queue. Thanks, I’ll try that, too. Jan > > > On Mon, Oct 6, 2014 at 12:57 AM, Jan Algermissen > wrote: > Chris, > > thanks for taking a look. > > On 06 Oct 2014, at 04:44, Chris Lohfink wrote: > > > It appears you are aware of the tombstones affect that leads people to > > label this an anti-pattern. Without "due" or any time based value being > > part of the partition key means you will still get a lot of buildup. You > > only have 1 partition per shard which just linearly decreases the > > tombstones. That isn't likely to be enough to really help in a situation > > of high queue throughput, especially with the default of 4 shards. > > Yes, dealing with the tombstones effect is the whole point. The work loads I > have to deal with are not really high throughput, it is unlikely we’ll ever > reach multiple messages per second.The emphasis is also more on coordinating > producer and consumer than on high volume capacity problems. > > Your comment seems to suggest to include larger time frames (e.g. the > due-hour) in the partition keys and use the current time to select the active > partitions (e.g. the shards of the hour). Once an hour has passed, the > corresponding shards will never be touched again. 
> > Am I understanding this correctly? > > > > > You may want to consider switching to LCS from the default STCS since > > re-writing to same partitions a lot. It will still use STCS in L0 so in > > high write/delete scenarios, with low enough gc_grace, when it never gets > > higher then L1 it will be sameish write throughput. In scenarios where you > > get more LCS will shine I suspect by reducing number of obsolete > > tombstones. Would be hard to identify difference in small tests I think. > > Thanks, I’ll try to explore the various effects > > > > > Whats the plan to prevent two consumers from reading same message off of a > > queue? You mention in docs you will address it at a later point in time > > but its kinda a biggy. Big lock & batch reads like astyanax recipe? > > I have included a static column per shard to act as a lock (the ’lock’ column > in the examples) in combination with conditional updates. > > I must admit, I have not quite understood what Netfix is doing in terms of > coordination - but since performance isn’t our concern, CAS should do fine, I > guess(?) > > Thanks again, > > Jan > > > > > > --- > > Chris Lohfink > > > > > > On Oct 5, 2014, at 6:03 PM, Jan Algermissen > > wrote: > > > >> Hi, > >> > >> I have put together some thoughts on realizing simple queues with > >> Cassandra. > >> > >> https://github.com/algermissen/cassandra-ruby-queue > >> > >> The design is inspired by (the much more sophisticated) Netfilx > >> approach[1] but very reduced. > >> > >> Given that I am still a C* newbie, I’d be very glad to hear some thoughts > >> on the design path I took. > >> > >> Jan > >> > >> [1] https://github.com/Netflix/astyanax/wiki/Message-Queue > > > >
Re: Exploring Simply Queueing
Robert, On 06 Oct 2014, at 17:50, Robert Coli wrote: > In theory they can also be designed such that history is not infinite, which > mitigates the buildup of old queue state. > Hmm, I was under the impression that issues with old queue state disappear after gc_grace_seconds, and that the goal primarily is to keep the rows ‘short’ enough to achieve a tombstone read-performance impact that one can live with in a given use case. Is that understanding wrong? Jan
Re: Exploring Queueing
Hi all, thanks again for the comments. I have created an (improved?) design, this time using dedicated consumers per shard and time-based row expiry, hence without immediate deletes: https://github.com/algermissen/cassandra-ruby-sharded-workers As before, comments are welcome. Jan On 06 Oct 2014, at 22:50, Robert Coli wrote: > On Mon, Oct 6, 2014 at 1:40 PM, Jan Algermissen > wrote: > Hmm, I was under the impression that issues with old queue state disappear > after gc_grace_seconds and that the goal primarily is to keep the rows > ‘short’ enough to achieve a tombstones read performance impact that one can > live with in a given use case. > > The design I pasted a link to does not include specifics regarding > pruning old history. Yes, you can just delete it, if your system design > doesn't require replay from the start. > > =Rob >
high context switches
Hello, We are running a 3-node cluster with RF=3 and 5 clients in a test environment. The C* settings are mostly default. We noticed quite high context switching during our tests. With 100,000,000 keys/partitions we averaged around 260,000 context switches per second (with a max of 530,000). We were running ~12,000 transactions per second: 10,000 reads and 2,000 updates. Nothing really wrong with that, however I would like to understand why these numbers are so high. Have others noticed this behavior? How much context switching is expected, and why? What are the variables that affect this? /J
RE: high context switches
We use CQL with 1 session per client and default connection settings. I do not think that we are using too many client threads. The number of native transport threads is set to the default (max 128). From: Robert Coli [mailto:rc...@eventbrite.com] Sent: den 21 november 2014 19:30 To: user@cassandra.apache.org Subject: Re: high context switches On Fri, Nov 21, 2014 at 1:21 AM, Jan Karlsson <jan.karls...@ericsson.com> wrote: Nothing really wrong with that however I would like to understand why these numbers are so high. Have others noticed this behavior? How much context switching is expected and why? What are the variables that affect this? I +1 Nikolai's conjecture that you are probably using a very high number of client threads. However, as a general statement, Cassandra is highly multi-threaded. Threads are assigned within thread pools, and these thread pools can be thought of as a type of processing pipeline, such that one is often the input to another. When pushing Cassandra near its maximum capacity, you will therefore spend a lot of time switching between threads. =Rob http://twitter.com/rcolidba
Re: Cassandra schema migrator
Hi Jens, maybe you should have a look at mutagen for Cassandra: https://github.com/toddfast/mutagen-cassandra It has been a little quiet around it for some months, but it may still be worth it. Cheers, Jan On 25.11.2014 at 10:22, Jens Rantil wrote: Hi, Anyone who is using, or could recommend, a tool for versioning schemas/migrating in Cassandra? My list of requirements is: * Support for adding tables. * Support for versioning of table properties. All our tables are to be defaulted to LeveledCompactionStrategy. * Support for adding non-existing columns. * Optional: Support for removing columns. * Optional: Support for removing tables. We are preferably a Java shop, but could potentially integrate something non-Java. I understand I could write a tool that would make these decisions using system.schema_columnfamilies and system.schema_columns, but as always reusing a proven tool would be preferable. So far I only know of Spring Data Cassandra, which handles creating tables and adding columns. However, it does not handle table properties in any way. Thanks, Jens ——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se Phone: +46 708 84 18 32 Web: www.tink.se
sstablemetadata and sstablerepairedset not working with DSC on Debian
Hi, while curious about the new incremental repairs I updated our cluster to C* version 2.1.2 via the Debian apt repository. Everything went quite well, but trying to start the tools sstablemetadata and sstablerepairedset leads to the following error:

root@a01:/home/ifjke# sstablerepairedset
Error: Could not find or load main class org.apache.cassandra.tools.SSTableRepairedAtSetter
root@a01:/home/ifjke#

Looking at the scripts starting these tools I found that the java classpath is built via

for jar in `dirname $0`/../../lib/*.jar; do
    CLASSPATH=$CLASSPATH:$jar
done

Because the scripts are located in /usr/bin/, this leads to a search for libs in /lib. Obviously there are no java or cassandra libraries there - nodetool instead uses a different way:

if [ "x$CASSANDRA_INCLUDE" = "x" ]; then
    for include in "`dirname "$0"`/cassandra.in.sh" \
                   "$HOME/.cassandra.in.sh" \
                   /usr/share/cassandra/cassandra.in.sh \
                   /usr/local/share/cassandra/cassandra.in.sh \
                   /opt/cassandra/cassandra.in.sh; do
        if [ -r "$include" ]; then
            . "$include"
            break
        fi
    done
elif [ -r "$CASSANDRA_INCLUDE" ]; then
    . "$CASSANDRA_INCLUDE"
fi

I created a simple patch which works for both sstablemetadata and sstablerepairedset for me; maybe it is worth sharing:

---SNIP---
--- sstablerepairedset	2014-11-11 15:50:02.0 +
+++ sstablerepairedset_new	2014-12-18 07:52:26.967368891 +
@@ -16,22 +16,19 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
-if [ "x$CLASSPATH" = "x" ]; then
-
-# execute from the build dir.
-if [ -d `dirname $0`/../../build/classes ]; then
-for directory in `dirname $0`/../../build/classes/*; do
-CLASSPATH=$CLASSPATH:$directory
-done
-else
-if [ -f `dirname $0`/../lib/stress.jar ]; then
-CLASSPATH=`dirname $0`/../lib/stress.jar
+if [ "x$CASSANDRA_INCLUDE" = "x" ]; then
+for include in "`dirname "$0"`/cassandra.in.sh" \
+ "$HOME/.cassandra.in.sh" \
+ /usr/share/cassandra/cassandra.in.sh \
+ /usr/local/share/cassandra/cassandra.in.sh \
+ /opt/cassandra/cassandra.in.sh; do
+if [ -r "$include" ]; then
+. "$include"
+break
 fi
-fi
-
-for jar in `dirname $0`/../../lib/*.jar; do
-CLASSPATH=$CLASSPATH:$jar
 done
+elif [ -r "$CASSANDRA_INCLUDE" ]; then
+. "$CASSANDRA_INCLUDE"
 fi

 # Use JAVA_HOME if set, otherwise look for java in PATH
---SNIP---

Worked for me on both tools. Jan
Re: Replacing nodes disks
Hi Or, I did some sort of this a while ago. If your machines have a free disk slot - just put another disk there and use it as another data_file_directory. If not - as in my case: - grab a USB dock for disks - put the new one in there, plug it in, format, mount to /mnt etc. - I did an online rsync from /var/lib/cassandra/data to /mnt - after that, bring Cassandra down - do another rsync from /var/lib/cassandra/data to /mnt (should be faster, as sstables do not change; this minimizes downtime) - adjust /etc/fstab if needed - shut down the node - swap disks - power on the node - everything should be fine ;-) Of course you will need a replication factor > 1 for this to work ;-) Just my 2 cents, Jan On 18.12.2014 at 16:17, Or Sher wrote: Hi all, We have a situation where some of our nodes have smaller disks and we would like to align all nodes by replacing the smaller disks with bigger ones without replacing nodes. We don't have enough space to put the data on the / disk and copy it back to the bigger disks, so we would like to rebuild the nodes' data from other replicas. What do you think the procedure should be here? I'm guessing it should be something like this, but I'm pretty sure it's not enough: 1. shut down the C* node and server. 2. replace disks + create the same vg, lv, etc. 3. start C* (normally?) 4. nodetool repair/rebuild? *I think I might get some consistency issues for use cases relying on quorum reads and writes for strong consistency. What do you say? Another question is (and I know it depends on many factors, but I'd like to hear an experienced estimation): How much time would it take to rebuild a 250G data node? Thanks in advance, Or. -- Or Sher
Re: Replacing nodes disks
Hi, even if recovery like a dead node would work - backup and restore (like my way with a USB docking station) will be much faster and produce less IO and CPU impact on your cluster. Keep that in mind :-) Cheers, Jan On 22.12.2014 at 10:58, Or Sher wrote: Great. replace_address works great. For some reason I thought it wouldn't work with the same IP. On Sun, Dec 21, 2014 at 5:14 PM, Ryan Svihla <rsvi...@datastax.com> wrote: Cassandra is designed to rebuild a node from other nodes; whether a node is dead by your hand because you killed it or by fate is irrelevant, the process is the same. A "new node" can have the same hostname and IP or totally different ones. On Sun, Dec 21, 2014 at 6:01 AM, Or Sher <or.sh...@gmail.com> wrote: If I use the replace_address parameter with the same IP address, would that do the job? On Sun, Dec 21, 2014 at 11:20 AM, Or Sher <or.sh...@gmail.com> wrote: What I want to do is kind of replacing a dead node - http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_node_t.html But replacing it with a clean node with the same IP and hostname. On Sun, Dec 21, 2014 at 9:53 AM, Or Sher <or.sh...@gmail.com> wrote: Thanks guys. I have to replace all data disks, so I don't have another large enough local disk to move the data to. If I have no choice, I will back up the data on some other node or something, but I'd like to avoid it. I would really love letting Cassandra do its thing and rebuild itself. Has anybody handled such cases that way (letting Cassandra rebuild its data)? Although there is no documented procedure for it, it should be possible, right? On Fri, Dec 19, 2014 at 8:41 AM, Jan Kesten <j.kes...@enercast.de> wrote: Hi Or, I did some sort of this a while ago. If your machines do have a free disk slot - just put another disk there and use it as another data_file_directory.
If not - as in my case: - grab a USB dock for disks - put the new one in there, plug it in, format, mount to /mnt etc. - I did an online rsync from /var/lib/cassandra/data to /mnt - after that, bring Cassandra down - do another rsync from /var/lib/cassandra/data to /mnt (should be faster, as sstables do not change; this minimizes downtime) - adjust /etc/fstab if needed - shut down the node - swap disks - power on the node - everything should be fine ;-) Of course you will need a replication factor > 1 for this to work ;-) Just my 2 cents, Jan On 18.12.2014 at 16:17, Or Sher wrote: Hi all, We have a situation where some of our nodes have smaller disks and we would like to align all nodes by replacing the smaller disks with bigger ones without replacing nodes. We don't have enough space to put the data on the / disk and copy it back to the bigger disks, so we would like to rebuild the nodes' data from other replicas. What do you think the procedure should be here? I'm guessing it should be something like this, but I'm pretty sure it's not enough: 1. shut down the C* node and server. 2. replace disks + create the same vg, lv, etc. 3. start C* (normally?) 4. nodetool repair/rebuild? *I think I might get some consistency issues for use cases relying on quorum reads and writes for strong consistency. What do you say? Another question is (and I know it depends on many factors, but I'd like to hear an experienced estimation): How much time would it take to rebuild a 250G data node?
Repair producing validation failed errors regularly
at org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:930) ~[apache-cassandra-2.1.1.jar:2.1.1]
at org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:97) ~[apache-cassandra-2.1.1.jar:2.1.1]
at org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:557) ~[apache-cassandra-2.1.1.jar:2.1.1]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) ~[na:1.7.0_51]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) ~[na:1.7.0_51]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) [na:1.7.0_51]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]
BR Jan
Re: Nodetool clearsnapshot
Hi, I have read that snapshots are basically symlinks and that they do not take much space. Why, if I run nodetool clearsnapshot, does it free a lot of space? I am seeing GBs freed... Both together make sense. Creating a snapshot just creates links for all files under the snapshot directory. This is very fast and takes no space. But those links are hard links, not symbolic ones. After a while your running cluster will compact some of its sstables, writing a new one and deleting the old ones. Say you had SSTable1..4 and a snapshot with links to those four: after compaction you will have one active SSTable5, which is newly written and consumes space. The snapshot-linked ones are still there, still consuming their space. Only when the snapshot is cleared do you get your disk space back. HTH, Jan
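The hard-link behaviour is easy to demonstrate outside Cassandra. A plain-filesystem sketch (not Cassandra code) showing why the space is only freed once the last link is removed:

```python
# Hard links keep the data blocks alive as long as ANY link exists,
# which is why 'nodetool clearsnapshot' is what actually frees the space.
import os
import tempfile

with tempfile.TemporaryDirectory() as d:
    sstable = os.path.join(d, "data.db")
    snap = os.path.join(d, "snapshot.db")
    with open(sstable, "w") as f:
        f.write("x" * 1024)
    os.link(sstable, snap)   # roughly what taking a snapshot does
    os.remove(sstable)       # compaction deletes the live SSTable...
    # ...but the 1024 bytes are still referenced by the snapshot link:
    assert os.path.getsize(snap) == 1024
    os.remove(snap)          # clearsnapshot -> last link gone, space freed
```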
Re: Many really small SSTables
Hi Eric and all, I almost expected this kind of answer. I did a nodetool compactionstats already to see if those sstables are being compacted, but on all nodes there are 0 outstanding compactions (right now in the morning, not running any tests on this cluster). The reported read latency is about 1-3 ms, even on nodes which have many sstables (the new high score is ~18k sstables). The 99th percentile is about 30-40 micros, with a cell count of about 80-90 (if I got the docs right, this is the number of sstables accessed; that changed from 2.0 to 2.1, I think, as I see this only on the testing cluster). It looks to me like compactions were not triggered. I tried a nodetool compact on one node overnight - but that crashed the entire node. Roland On 15.01.2015 at 19:14, Eric Stevens wrote: Yes, many sstables can have a huge negative impact on read performance, and will also create memory pressure on that node. There are a lot of things which can produce this effect, and it also strongly suggests you're falling behind on compaction in general (check nodetool compactionstats; you should have <5 outstanding/pending, preferably 0-1). To see whether and how much it is impacting your read performance, check nodetool cfstats and nodetool cfhistograms . On Thu, Jan 15, 2015 at 2:11 AM, Roland Etzenhammer <r.etzenham...@t-online.de> wrote: Hi, I'm testing around with Cassandra a fair bit, using 2.1.2, which I know has some major issues, but it is a test environment. After some bulk loading, testing with incremental repairs and running out of heap once, I found that I now have quite a large number of sstables which are really small:

<1k       0    0.0%
<10k   2780   76.8%
<100k  3392   93.7%
<1000k 3461   95.6%
<1M    3471   95.9%
<10M   3517   97.1%
<100M  3596   99.3%
all    3621  100.0%

76.8% of all sstables in this particular column family are smaller than 10 kB, 93.7% are smaller than 100 kB. Just for my understanding - does that impact performance? And is there any way to reduce the number of sstables?
A full run of nodetool compact is running for a really long time (more than 1 day). Thanks for any input, Roland -- i.A. Jan Kesten Systemadministration enercast GmbH Friedrich-Ebert-Str. 104 D-34119 Kassel Tel.: +49 561 / 4739664-0 Fax: +49 561 / 4739664-9 mailto: j.kes...@enercast.de http://www.enercast.de AG Kassel HRB 15471 Geschäftsführung: Thomas Landgraf (CEO), Bernd Kratz (CTO), Philipp Rinder (CSO) This e-mail and any attachment may contain confidential and/or privileged information. If you are not the named addressee or if this transmission has been addressed to you in error, please notify us immediately by reply e-mail and then delete this e-mail and any attachment from your system. Please understand that you must not copy this e-mail or any attachment or disclose the contents to any other person. Thank you for your cooperation.
Re: Node joining takes a long time
Hi, a short hint for those upgrading: If you upgrade to 2.1.3 - there is a bug in the config builder when rpc_interface is used. If you use rpc_address in your cassandra.yaml you will be fine - I ran into it this morning and filed an issue for it. https://issues.apache.org/jira/browse/CASSANDRA-8839 Jan
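As a sketch of the workaround: until CASSANDRA-8839 is fixed, bind the node with an explicit `rpc_address` instead of `rpc_interface` in cassandra.yaml (the interface name and IP below are placeholders):

```yaml
# cassandra.yaml -- workaround for CASSANDRA-8839 on 2.1.3:
# comment out the interface-based setting ...
# rpc_interface: eth1
# ... and use an explicit address instead (placeholder IP):
rpc_address: 192.0.2.10
```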
Re: Node stuck in joining the ring
Hi Batranut, apart from the other suggestions - do you have ntp running on all your cluster nodes, and are the clocks in sync? Jan
Strange Sizes after 2.1.3 upgrade
Hi, I found something strange this morning on our secondary cluster. I recently upgraded to 2.1.3 - hoping for incremental repairs to work - and this morning OpsCenter showed me that disk usage is very unequal. Most irritating is that some nodes show data sizes of > 3TB, but they have only 3TB drives. I made a screenshot: https://www.dropbox.com/s/0qhbpm1znwd07rj/strange_sizes.png?dl=0 Has this occurred anywhere else? Maybe it is totally unrelated to the 2.1.3 upgrade. Thanks for any pointers, Jan
RE: Read Repair in cassandra
The request would return the latest data. The read request would fire against node1 and node3; the coordinator would get answers from both, merge them, and return the latest. Then read repair might run to update node3. QUORUM does not take into consideration whether an answer is the latest or not - it just makes sure a quorum of nodes reply.

From: ankit tyagi [mailto:ankittyagi.mn...@gmail.com]
Sent: April 08, 2015 6:37 AM
To: user@cassandra.apache.org
Subject: Read Repair in cassandra

Hi All, I have a doubt regarding read repair while reading data. I am using QUORUM for both read and write operations, with RF 3 for strong consistency. Suppose that during a write, node1 and node2 replicate the data but it doesn't get replicated to node3 because of various factors; the coordinator node will save a hinted handoff for node3. Now a read request comes in, and at that moment node2 is down, so data will be served from node1 and node3. node3 may return older data, as the hinted handoff may not have been replayed from the coordinator yet. In that case, will the read request fail because only 1 node has the latest data, or will the latest data be returned from node1 and a read repair request be fired for node3?
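To make the merge step concrete, here is a toy shell sketch - not real coordinator code, and all names and values are invented. Each replica reply is modeled as `timestamp:value`; the coordinator returns the reply with the highest timestamp and the stale replica becomes the read-repair target:

```shell
# Toy model of a QUORUM read with RF=3: node2 is down, node1 saw the
# write, node3 missed it (its hint has not been replayed yet).
node1="20:new-value"   # newer write timestamp
node3="10:old-value"   # stale replica
# Coordinator: sort replies by timestamp, newest first, take the winner.
latest=$(printf '%s\n%s\n' "$node1" "$node3" | sort -t: -k1,1nr | head -n1)
value=${latest#*:}
echo "client receives: $value"               # the request does not fail
echo "read repair target: node3 gets $value"
```

The point of the sketch: QUORUM only requires two replies; timestamp reconciliation, not the consistency level, is what ensures the newest value wins.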
Re: java.io.FileNotFoundException when setting up internode_compression
I had this error as well some time ago. It was due to the noexec mount flag of the tmp directory. It worked again when I removed that flag from the tmp directory. Cheers

-- Jan Schmidle, Founder & CEO, cospired GmbH

On 14.11.2013 at 03:39, srmore wrote:
> Yes it does, the stack trace is in the first thread. I did not try to create a
> CF (I was trying to enable it in cassandra.yaml); I have an existing CF and
> wanted to use compression for inter-node communication. When I enable snappy
> compression (in the yaml) I get the error and cassandra quits. I figured this
> might be a snappy issue and nothing to do with cassandra, will log a bug
> there.
>
> On Wed, Nov 13, 2013 at 8:01 PM, Aaron Morton wrote:
> IIRC there is a test for snappy when the node starts - does that log an error?
>
> And / or can you create a CF that uses snappy compression (it was the default
> for a while in 1.2).
>
> Cheers
>
> -
> Aaron Morton, New Zealand, @aaronmorton
> Co-Founder & Principal Consultant, Apache Cassandra Consulting
> http://www.thelastpickle.com
>
> On 13/11/2013, at 3:09 am, srmore wrote:
>
>> Thanks Christopher!
>> I don't think glibc is an issue (as it did get that far).
>> /usr/tmp/snappy-1.0.5-libsnappyjava.so is not there and permissions look ok -
>> are there any special settings (like JVM args) that I should be using? I can
>> see libsnappyjava.so in the jar though
>> (snappy-java-1.0.5.jar\org\xerial\snappy\native\Linux\i386\). One other thing:
>> I am using RedHat 6. I will try updating glibc and see what happens.
>>
>> Thanks!
>>
>> On Mon, Nov 11, 2013 at 5:01 PM, Christopher Wirt wrote:
>> I had this the other day when we were accidentally provisioned a centos5
>> machine (instead of 6). I think it relates to the version of glibc.
>> Notice it wants the native binary .so, not the .jar.
>>
>> So maybe update to a newer version of glibc? Or possibly make sure the .so
>> exists at /usr/tmp/snappy-1.0.5-libsnappyjava.so?
>>
>> I was lucky and just did an OS reload to centos6.
>>
>> Here is someone having a similar issue:
>> http://mail-archives.apache.org/mod_mbox/cassandra-commits/201307.mbox/%3CJIRA.12616012.1352862646995.6820.1373083550278@arcas%3E
>>
>> From: srmore [mailto:comom...@gmail.com]
>> Sent: 11 November 2013 21:32
>> To: user@cassandra.apache.org
>> Subject: java.io.FileNotFoundException when setting up internode_compression
>>
>> I might be missing something obvious here; for some reason I cannot seem to
>> get internode_compression = all to work. I am getting the following
>> exception. I am using cassandra 1.2.9 and have snappy-java-1.0.5.jar in my
>> classpath. A Google search did not return any useful result - has anyone
>> seen this before?
>>
>> java.io.FileNotFoundException: /usr/tmp/snappy-1.0.5-libsnappyjava.so (No such file or directory)
>>     at java.io.FileOutputStream.open(Native Method)
>>     at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
>>     at java.io.FileOutputStream.<init>(FileOutputStream.java:145)
>>     at org.xerial.snappy.SnappyLoader.extractLibraryFile(SnappyLoader.java:394)
>>     at org.xerial.snappy.SnappyLoader.findNativeLibrary(SnappyLoader.java:468)
>>     at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:318)
>>     at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229)
>>     at org.xerial.snappy.Snappy.<clinit>(Snappy.java:48)
>>     at org.apache.cassandra.io.compress.SnappyCompressor.create(SnappyCompressor.java:45)
>>     at org.apache.cassandra.io.compress.SnappyCompressor.isAvailable(SnappyCompressor.java:55)
>>     at org.apache.cassandra.io.compress.SnappyCompressor.<clinit>(SnappyCompressor.java:37)
>>     at org.apache.cassandra.config.CFMetaData.<clinit>(CFMetaData.java:82)
>>     at org.apache.cassandra.config.KSMetaData.systemKeyspace(KSMetaData.java:81)
>>     at org.apache.cassandra.config.DatabaseDescriptor.loadYaml(DatabaseDescriptor.java:471)
>>     at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:123)
>> Caused by: java.lang.UnsatisfiedLinkError: no snappyjava in java.library.path
>>     at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1738)
>>     at java.lang.Runtime.loadLibrary0(Runtime.java:823)
>>     at java.lang.System.loadLibrary(System.java:1028)
>>     at org.xerial.snappy.SnappyNativeLoader.loadLibrary(SnappyNativeLoader.java:52)
>>     ... 18 more
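Both fixes mentioned in this thread can be sketched as follows. The tempdir path is a placeholder; `org.xerial.snappy.tempdir` is the system property snappy-java consults when extracting its native library, and remounting requires root, so it appears only as a comment:

```shell
# Fix 1 (the noexec case above): remount the temp dir with exec, e.g.
#   sudo mount -o remount,exec /tmp
# Fix 2: in cassandra-env.sh, point snappy-java's native-library
# extraction at a directory that allows execution (placeholder path):
JVM_OPTS="$JVM_OPTS -Dorg.xerial.snappy.tempdir=/var/lib/cassandra/snappy-tmp"
echo "$JVM_OPTS"
```

Either way, the goal is the same: the extracted libsnappyjava.so must land on a filesystem from which shared libraries may be loaded.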
Paging error after upgrade from C* 2.0.1 to 2.0.3 , Driver from 2.0.0-rc1 to 2.0.0-rc2
Hi all, after upgrading C* and the java-driver I am running into problems with paging. Maybe someone can provide a quick clue.

The upgrade was: C* from 2.0.1 to 2.0.3, Java Driver from 2.0.0-rc1 to 2.0.0-rc2.

Client side, I get the following messages (apparently during a call to resultSet.one()):

com.datastax.driver.core.exceptions.DriverInternalError: An unexpected error occured server side on /37.139.24.133: java.lang.AssertionError
    at com.datastax.driver.core.exceptions.DriverInternalError.copy(DriverInternalError.java:42)
    at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:271)
    at com.datastax.driver.core.ResultSet.fetchMoreResultsBlocking(ResultSet.java:252)
    at com.datastax.driver.core.ResultSet.one(ResultSet.java:166)

Server side:

INFO [HANDSHAKE-/37.139.3.70] 2013-12-19 09:55:11,277 OutboundTcpConnection.java (line 386) Handshaking version with /37.139.3.70
INFO [HANDSHAKE-/37.139.24.133] 2013-12-19 09:55:11,284 OutboundTcpConnection.java (line 386) Handshaking version with /37.139.24.133
INFO [HANDSHAKE-/37.139.24.133] 2013-12-19 09:55:11,309 OutboundTcpConnection.java (line 386) Handshaking version with /37.139.24.133
INFO [HANDSHAKE-/146.185.135.226] 2013-12-19 10:00:10,077 OutboundTcpConnection.java (line 386) Handshaking version with /146.185.135.226
WARN [ReadStage:87] 2013-12-19 10:00:16,490 SliceQueryFilter.java (line 209) Read 111 live and 1776 tombstoned cells (see tombstone_warn_threshold)
WARN [ReadStage:87] 2013-12-19 10:00:16,976 SliceQueryFilter.java (line 209) Read 48 live and 1056 tombstoned cells (see tombstone_warn_threshold)
WARN [ReadStage:87] 2013-12-19 10:00:18,588 SliceQueryFilter.java (line 209) Read 80 live and 1160 tombstoned cells (see tombstone_warn_threshold)
WARN [ReadStage:88] 2013-12-19 10:00:24,675 SliceQueryFilter.java (line 209) Read 48 live and 1056 tombstoned cells (see tombstone_warn_threshold)
WARN [ReadStage:88] 2013-12-19 10:00:25,715 SliceQueryFilter.java (line 209) Read 80 live and 1160 tombstoned cells (see tombstone_warn_threshold)
WARN [ReadStage:89] 2013-12-19 10:00:31,406 SliceQueryFilter.java (line 209) Read 300 live and 6300 tombstoned cells (see tombstone_warn_threshold)
WARN [ReadStage:89] 2013-12-19 10:00:32,075 SliceQueryFilter.java (line 209) Read 65 live and 1040 tombstoned cells (see tombstone_warn_threshold)
WARN [ReadStage:89] 2013-12-19 10:00:33,207 SliceQueryFilter.java (line 209) Read 72 live and 1224 tombstoned cells (see tombstone_warn_threshold)
WARN [ReadStage:90] 2013-12-19 10:00:37,183 SliceQueryFilter.java (line 209) Read 135 live and 1782 tombstoned cells (see tombstone_warn_threshold)
INFO [ScheduledTasks:1] 2013-12-19 10:00:58,523 GCInspector.java (line 116) GC for ParNew: 213 ms for 1 collections, 720697792 used; max is 2057306112
ERROR [Native-Transport-Requests:216] 2013-12-19 10:00:58,913 ErrorMessage.java (line 222) Unexpected exception during request
java.lang.AssertionError
    at org.apache.cassandra.service.pager.AbstractQueryPager.discardFirst(AbstractQueryPager.java:183)
    at org.apache.cassandra.service.pager.AbstractQueryPager.fetchPage(AbstractQueryPager.java:102)
    at org.apache.cassandra.service.pager.RangeSliceQueryPager.fetchPage(RangeSliceQueryPager.java:36)
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:171)
    at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:58)
    at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:188)
    at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:222)
    at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:119)
    at org.apache.cassandra.transport.Message$Dispatcher.messageReceived(Message.java:304)
    at org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:43)
    at org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:67)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)

Jan
Re: Paging error after upgrade from C* 2.0.1 to 2.0.3 , Driver from 2.0.0-rc1 to 2.0.0-rc2
Sylvain, thanks. Is there anything I can do except waiting for a fix? Could I do something to my data? Or data model? I moved to 2.0.3 because I think I experienced missing rows in 2.0.1 paging - is this related to the 2.0.3 bug? Meaning: going back to 2.0.1 will fix the exception, but leave me with the faulty situation the assertion is there to detect? Jan

On 19.12.2013, at 11:39, Sylvain Lebresne wrote:
> https://issues.apache.org/jira/browse/CASSANDRA-6447
>
> On Thu, Dec 19, 2013 at 11:16 AM, Jan Algermissen wrote:
> Hi all,
>
> after upgrading C* and the java-driver I am running into problems with
> paging. Maybe someone can provide a quick clue.
>
> [original message and server logs quoted in full above]