Re: Combining two clusters/keyspaces into single cluster

2016-04-21 Thread Jan
Hi; 

Your objective is to add keyspace2 to cluster1. 
The documentation link being referred to is for adding a new datacenter [not 
applicable to you].

You need to: 
a. take a snapshot of keyspace2 on cluster2
b. use sstableloader to copy keyspace2 onto cluster1
c. run a 'nodetool repair' on cluster1
d. decommission cluster2.

You are then ready to use cluster1 [with both keyspaces within it].
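A minimal sketch of steps a-c, assuming the default data directory, a snapshot 
tag of 'ks2_migration', and that the schema for keyspace2 has already been 
created on cluster1 (sstableloader streams data only, not schema):

# on each cluster2 node
nodetool snapshot -t ks2_migration keyspace2

# per table: point sstableloader at a directory laid out as
# keyspace2/<table> containing the snapshotted SSTables
sstableloader -d <cluster1_node_ip> /tmp/load/keyspace2/<table>/

# on cluster1, once all tables are loaded
nodetool repair keyspace2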

Hope this helps
Jan



On Thu, 4/21/16, Arlington Albertson  wrote:

 Subject: Combining two clusters/keyspaces into single cluster
 To: user@cassandra.apache.org
 Date: Thursday, April 21, 2016, 6:15 PM
 
 Hey Folks,
 I've been looking through various documentations, but I'm either overlooking
 something obvious or not wording it correctly. The gist of my problem is
 this:
 
 I have two cassandra clusters, with two separate keyspaces, on EC2. We'll
 call them as follows:
 
 cluster1 (DC name, cluster name, etc...)
   keyspace1 (only exists on cluster1)
 cluster2 (DC name, cluster name, etc...)
   keyspace2 (only exists on cluster2)
 
 I need to perform the following:
 - take keyspace2 and add it to cluster1 so that all nodes can serve the
   traffic
 - needs to happen "live" so that I can repoint new instances to the cluster1
   endpoints and they'll just start working, and no longer directly use
   cluster2
 - eventually, tear down cluster2 (easy with a `nodetool decommission` after
   verifying all seeds have been changed, etc...)
 
 This doc seems to be the closest I've found thus far:
 https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html
 
 Is that the appropriate guide for this and I'm just over-thinking it? Or is
 there something else I should be looking at?
 Also, this is DSC C* 2.1.13. 
 TIA!
 -AA


RE: Problem Replacing a Dead Node

2016-04-21 Thread Jan
Mir; 

You can take a node out of the cluster with nodetool decommission to a live 
node, or nodetool removetoken (to any other machine) to remove a dead one. 
This will assign the ranges the old node was responsible for to other nodes, 
and replicate the appropriate data there. If decommission is used, the data 
will stream from the decommissioned node. If removetoken is used, the data will 
stream from the remaining replicas.
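(A sketch of the two paths; note that removetoken is the older name - releases 
from 1.2 onwards call it removenode and take the Host ID shown by nodetool 
status:)

# retire a live node (run on the node itself)
nodetool decommission

# remove a node that is already dead (run from any live node)
nodetool status                 # note the dead node's Host ID
nodetool removenode <host-id>   # 'nodetool removetoken <token>' on old versions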


Hope this helps
Jan/


On Thu, 4/21/16, Anubhav Kale  wrote:

 Subject: RE: Problem Replacing a Dead Node
 To: "user@cassandra.apache.org" 
 Date: Thursday, April 21, 2016, 6:34 PM
 
 
 Reusing the bootstrapping node could have caused this, but hard to tell.
 Since you have only 7 nodes, have you tried doing a few rolling restarts of
 all nodes to let gossip settle? Also, the node is pingable from other nodes
 even though it says Unreachable below. Correct?
 
 Based on nodetool status, it appears the node has streamed all the data it
 needs, but it doesn't think it has joined the ring yet. Does cqlsh work on
 that node?
 From: Mir Tanvir Hossain [mailto:mir.tanvir.hoss...@gmail.com]
 Sent: Thursday, April 21, 2016 11:51 AM
 To: user@cassandra.apache.org
 Subject: Re: Problem Replacing a Dead Node
 
    
 
 Here is a bit more detail of the whole situation. I am hoping someone can
 help me out here.
 
 We have a seven node cluster. One of the nodes started to have issues but it
 was still running. We decided to add a new node, and remove the problematic
 node after the new node joined. However, the new node did not join the
 cluster even after three days. Hence, we decided to go with the replacement
 option. We shut down the problematic node. After that, we stopped cassandra
 on the bootstrapping node, deleted all the data, and restarted that node as
 the replacement node for the problematic node.
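 (For reference, a dead-node replacement is normally signalled with the
 replace_address system property - a sketch, assuming cassandra-env.sh is
 edited on the replacement node and the flag is removed again after it joins:)
 
 # cassandra-env.sh, with empty data/commitlog/saved_caches directories
 JVM_OPTS="$JVM_OPTS -Dcassandra.replace_address=10.0.7.91"  # dead node's IP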
 
 
    
 
 
 Since we reused the bootstrapping node as the replacement node, I am
 wondering whether that is causing any issue. Any insights are appreciated.
 
 This is the output of nodetool describecluster from the replacement node,
 and two other nodes:
 
 
 mhossain@cassandra-24:~$ nodetool describecluster
 Cluster Information:
     Name: App
     Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
     Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
     Schema versions:
         80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.4, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]
 
 mhossain@cassandra-13:~$ nodetool describecluster
 Cluster Information:
     Name: App
     Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
     Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
     Schema versions:
         80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]
         UNREACHABLE: [10.0.7.91, 10.0.7.4]
 
 mhossain@cassandra-09:~$ nodetool describecluster
 Cluster Information:
     Name: App
     Snitch: org.apache.cassandra.locator.DynamicEndpointSnitch
     Partitioner: org.apache.cassandra.dht.Murmur3Partitioner
     Schema versions:
         80649e67-8ed9-38a4-8afa-560be7c694f4: [10.0.7.80, 10.0.7.190, 10.0.7.100, 10.0.7.195, 10.0.7.160, 10.0.7.176]
         UNREACHABLE: [10.0.7.91, 10.0.7.4]
 
 cassandra-24 (10.0.7.4) is the replacement node. 10.0.7.91 is the ip address
 of the dead node.
 
 -Mir
 
 On Thu, Apr 21, 2016 at 10:02 AM, Mir Tanvir Hossain  wrote:
 
 Hi, I am trying to replace a dead node by following
 https://docs.datastax.com/en/cassandra/2.0

Re: When are hints written?

2016-04-21 Thread Jan
Hi Bo; 

You raised two questions: 20% system utilization, and hints.

20% system utilization: for a node or a cluster to be at 20% utilization is 
normal during peak write operation.
Hints: hints are written when a node is unreachable; C* 3.0 has a complete 
overhaul in the way hints have been implemented.

Recommend reading this blog article: 
http://www.datastax.com/dev/blog/whats-coming-to-cassandra-in-3-0-improved-hint-storage-and-delivery

hope this helps
Jan/



On Thu, 4/21/16, Jens Rantil  wrote:

 Subject: Re: When are hints written?
 To: user@cassandra.apache.org
 Date: Thursday, April 21, 2016, 8:57 AM
 
 Hi again Bo,
 I assume this is the piece of documentation you are referring to?
 http://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_about_hh_c.html?scroll=concept_ds_ifg_jqx_zj__performance
 
 > If a replica node is overloaded or unavailable, and the failure detector
 > has not yet marked it down, then expect most or all writes to that node to
 > fail after the timeout triggered by write_request_timeout_in_ms, which
 > defaults to 10 seconds. During that time, Cassandra writes the hint when
 > the timeout is reached.
 I'm not an expert on this, but the way I've seen it is that hints are stored
 as soon as there is _any_ issue writing a mutation (insert/update/delete) to
 a node. By "issue", that essentially means that a node hasn't acknowledged
 back to the coordinator that the write succeeded within
 write_request_timeout_in_ms. This includes TCP/socket timeouts, connection
 issues or that the node is down. The hints are stored for a maximum timespan
 defaulting to 3 hours.
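 (For reference, the cassandra.yaml knobs involved, with their stock defaults
 on the 2.x/3.x line - a sketch, not tuning advice:)
 
 hinted_handoff_enabled: true
 max_hint_window_in_ms: 10800000       # 3 hours; no new hints after this
 write_request_timeout_in_ms: 2000     # coordinator-side write timeout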
 
 Cheers,
 Jens
 On Thu, Apr 21, 2016 at 8:06 AM Bo Finnerup Madsen  wrote:
 Hi Jens,
 Thank you for the tip! ALL would definitely cure our hints issue, but as you
 note, it is not optimal as we are unable to take down nodes without clients
 failing.
 I am most probably overlooking something in the documentation, but I cannot
 see any description of when hints are written other than when a node is
 marked as being down. And since none of our nodes have been marked as being
 down (at least according to the logs), I suspect that there is some timeout
 that governs when hints are written?
 Regarding your other post: Yes, 3.0.3 is pretty new. But we are new to this
 cassandra game, and our schema-fu is not strong enough for us to create a
 schema without using materialized views :)
 
 On Wed, 20 Apr 2016 at 17.09, Jens Rantil  wrote:
 Hi Bo,
 > In our case, I would like for the cluster to wait for the write to be
 > persisted on the relevant nodes before returning an ok to the client. But
 > I don't know which knobs to turn to accomplish this? or if it is even
 > possible :)
 This is what the write consistency option is for. Have a look at
 https://docs.datastax.com/en/cassandra/2.0/cassandra/dml/dml_config_consistency_c.html.
 Note, however, that if you use ALL, your clients will fail (throw an
 exception, depending on language) as soon as a single partition can't be
 written. This means you can't do online maintenance of a Cassandra node
 (such as upgrading it etc.) without experiencing write issues.
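 (The effect is easy to see in cqlsh, which can set the level per session; the
 keyspace/table here are placeholders:)
 
 cqlsh> CONSISTENCY LOCAL_QUORUM;
 Consistency level set to LOCAL_QUORUM.
 cqlsh> INSERT INTO my_ks.my_table (id, val) VALUES (1, 'x');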
 Cheers,
 Jens
 On Wed, Apr 20, 2016 at 3:39 PM Bo Finnerup Madsen  wrote:
 Hi,
 We have a small 5 node cluster of m4.xlarge instances that receives writes
 from ~20 clients. The clients will write as fast as they can, and the whole
 process is limited by the write performance of the cassandra cluster. After
 we have tweaked our schema to avoid large partitions, the load is going ok
 and we don't see any warnings or errors in the cassandra logs. But we do see
 quite a lot of hint handoff activity. During the load, the cassandra nodes
 are quite loaded, with linux reporting a load as high as 20.
 I have read the available documentation on how hints work, and to my
 understanding hints should only be written if a node is down. But as far as
 I can see, none of the nodes are marked as down during the load. So I
 suspect I am missing something :)
 We have configured the servers with write_request_timeout_in_ms: 12 and
 the clients with a timeout of 13, but still get hints stored.
 In our case, I would like for the cluster to wait for the write to be
 persisted on the relevant nodes before returning an ok to the client. But I
 don't know which knobs to turn to accomplish this? or if it is even
 possible :)
 We are running cassandra 3.0.3, with 8Gb heap and a replication factor of 3.
 Thank you in advance!
 Yours sincerely,
 Bo Madsen
 -- 
 Jens Rantil
 Backend Developer @ Tink
 Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
 For urgent matters you can reach me at +46-708-84 18 32.
 
 -- 
 Jens Rantil
 Backend Developer @ Tink
 Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
 For urgent matters you can reach me at +46-708-84 18 32.


enabling Solr on a DSE C* node

2016-05-06 Thread Jan
Hi folks; 
I am trying to have one of my DSE 4.7 C* nodes also function as a Solr node 
within the cluster.

I have followed the docs in vain : 
https://docs.datastax.com/en/datastax_enterprise/4.0/datastax_enterprise/srch/srchInstall.html


Any pointers would help. 

Thanks
Jan


AW: Java GC pauses, reality check

2016-11-25 Thread Jan
https://www.azul.com/products/zing/order-zing/

At least a list price for Zing I found there: $3k per year.

- Original message -
From: "Work" 
Sent: 26.11.2016 07:53
To: "user@cassandra.apache.org" 
Subject: Re: Java GC pauses, reality check

I'm not affiliated with them, I've just been impressed by them. They have done 
amazing work in performance measurement. They discovered a major flaw in most 
performance testing ... I've never seen their pricing. But, recently, they made 
their product available for testing by developers. And they assured me that 
pricing is on a sliding scale depending upon utilization, and not ridiculous. 


- James 

Sent from my iPhone

On Nov 25, 2016, at 10:40 PM, Benjamin Roth  wrote:


This sounds amazing but also expensive - I don't see pricing on their page. Are 
you able and allowed to tell a rough pricing range?


On 26.11.2016 04:33, "Harikrishnan Pillai"  wrote:

We are running Azul Zing in prod with 1 million reads/s and 100K writes/s. We 
never had a major GC above 10 ms.

Sent from my iPhone

> On Nov 25, 2016, at 3:49 PM, Martin Schröder  wrote:
>
> 2016-11-25 23:38 GMT+01:00 Kant Kodali :
>> I would also restate the following sentence "java GC pauses are pretty much
>> a fact of life" to "Any GC based system pauses are pretty much a fact of
>> life".
>>
>> I would be more than happy to see if someone can counter prove.
>
> Azul disagrees.
> https://www.azul.com/products/zing/pgc/
>
> Best
>   Martin

AbstractQueryPager in debug.log

2017-02-14 Thread Jan
Hi,

I was looking through our logs today, and something that caught my eye
was many debug logs like this one:

DEBUG [SharedPool-Worker-8] 2017-02-14 12:05:39,330
AbstractQueryPager.java:112 - Got result (1) smaller than page size
(5000), considering pager exhausted

Those lines get logged very often (about 2500 times a minute) and I was
wondering if this is just "ok", or if something is misusing paged
results for requests fetching a single record and we should have a look
at it. Maybe paging results could be a performance issue?

Thanks for any hints,
Jan



Re: Count(*) is not working

2017-02-16 Thread Jan
Hi,

could you post the output of nodetool cfstats for the table?
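(A sketch of the command; keyspace and table names are placeholders. The
"Number of keys (estimate)" line - "Number of partitions (estimate)" on newer
versions - gives a rough count without a full scan:)

nodetool cfstats <keyspace>.<table>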

Cheers,

Jan


On 16.02.2017 at 17:00, Selvam Raman wrote:
> I am not getting a count as the result. Instead I keep getting n number of
> results like the below.
>
> Read 100 live rows and 1423 tombstone cells for query SELECT * FROM
> keysace.table WHERE token(id) >
> token(test:ODP0144-0883E-022R-002/047-052) LIMIT 100 (see
> tombstone_warn_threshold)
>
> On Thu, Feb 16, 2017 at 12:37 PM, Jan Kesten  wrote:
>
> Hi,
>
> do you got a result finally?
>
> Those messages are simply warnings telling you that c* had to read
> many tombstones while processing your query - rows that are
> deleted but not yet garbage collected/compacted. This warning gives
> you some explanation why things might be much slower than expected:
> per 100 rows that count, c* had to read about 15 times as many rows
> that were deleted already.
>
> Apart from that, count(*) is almost always slow - and there is a
> default limit of 10,000 rows in a result.
>
> Do you really need the actual live count? To get an idea you can
> always look at nodetool cfstats (but those numbers also contain
> deleted rows).
>
>
> On 16.02.2017 at 13:18, Selvam Raman wrote:
>> Hi,
>>
>> I want to know the total records count in table.
>>
>> I fired the below query:
>>select count(*) from tablename;
>>
>> and i have got the below output
>>
>> Read 100 live rows and 1423 tombstone cells for query SELECT *
>> FROM keysace.table WHERE token(id) >
>> token(test:ODP0144-0883E-022R-002/047-052) LIMIT 100 (see
>> tombstone_warn_threshold)
>>
>> Read 100 live rows and 1435 tombstone cells for query SELECT *
>> FROM keysace.table WHERE token(id) > token(test:2565-AMK-2) LIMIT
>> 100 (see tombstone_warn_threshold)
>>
>> Read 96 live rows and 1385 tombstone cells for query SELECT *
>> FROM keysace.table WHERE token(id) > token(test:-2220-UV033/04)
>> LIMIT 100 (see tombstone_warn_threshold).
>>
>>
>>
>>
>> Can you please help me to get the total count of the table.
>>
>> -- 
>> Selvam Raman
>> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"
>
>
>
>
> -- 
> Selvam Raman
> "லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"



Re: How to measure disk space used by a keyspace?

2015-07-01 Thread Jan
nodetool cfstats
would be your best bet. Sum all the column families' info within a keyspace 
to get to the number you are looking for; a sketch is below. 
Jan/ 
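(A sketch - the keyspace name is a placeholder; sum the "Space used (live)" 
values across the keyspace's tables:)

nodetool cfstats <keyspace> | grep 'Space used (live)'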


 On Wednesday, July 1, 2015 9:05 AM, graham sanderson  
wrote:
   

 If you are pushing metric data to graphite, there is
org.apache.cassandra.metrics.keyspace..LiveDiskSpaceUsed.value
… for each node; easy enough to graph the sum across machines.
Metrics/JMX are tied together in C*, so there is an equivalent value exposed 
via JMX… I don't know what it is called off the top of my head, but it would 
be something similar to the above.

On Jul 1, 2015, at 9:28 AM, sean_r_dur...@homedepot.com wrote:

That's ok for a single node, but to answer the question, "how big is my table 
across the cluster?" it would be much better if the cluster could provide an 
answer. 

Sean Durity 

From: Jonathan Haddad [mailto:j...@jonhaddad.com] 
Sent: Monday, June 29, 2015 8:15 AM
To: user
Subject: Re: How to measure disk space used by a keyspace?  

If you're looking to measure actual disk space, I'd use the du command, 
assuming you're on linux: http://linuxconfig.org/du-1-manual-page  

On Mon, Jun 29, 2015 at 2:26 AM shahab  wrote:

Hi,  Probably this question has been already asked in the mailing list, but I 
couldn't find it.  The question is how to measure disk-space used by a 
keyspace, column family wise, excluding snapshots?  

best,
/Shahab
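(Along the lines Jonathan suggests - a sketch, assuming the default data 
directory and GNU du:)

# per-table disk usage for one keyspace on this node, snapshots excluded
du -sh --exclude='*snapshots*' /var/lib/cassandra/data/<keyspace>/*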



Re: Stream failure while adding a new node

2015-07-01 Thread Jan
David;

Bring down all the nodes with the exception of the 'seed' node. Now bring up 
the 10th node. Run 'nodetool status' and wait until this 10th node is UP. 
Bring up the rest of the nodes after that. Run 'nodetool status' again and 
check that all the nodes are UP.

Alternatively:
- decommission the 10th node completely
- drop it from the cluster
- build a new node with the same IP and hostname and have it join the running 
  cluster

hope this helps
Jan


 


 On Wednesday, July 1, 2015 7:56 AM, David CHARBONNIER 
 wrote:
   

Hi Alain,

We still have the timeout problem in OPSCenter and we still didn't solve this 
problem, so no, we didn't run an entire repair with the repair service. And 
yes, during this try, we've set auto_bootstrap to true and ran a repair on the 
9th node after it finished streaming.

Thank you for your help.

Best regards,

David CHARBONNIER
Sysadmin
T : +33 411 934 200
david.charbonn...@rgsystem.com
ZAC Aéroport
125 Impasse Adam Smith
34470 Pérols - France
www.rgsystem.com

From: Alain RODRIGUEZ [mailto:arodr...@gmail.com]
Sent: Tuesday, June 30, 2015 15:18
To: user@cassandra.apache.org
Subject: Re: Stream failure while adding a new node

Hi David,

Are you sure you ran the repair entirely (9 days + repair logs ok on opscenter 
server) before adding the 10th node? This is important to avoid potential data 
loss! Did you set auto_bootstrap to true on this 10th node?

C*heers,

Alain
2015-06-29 14:54 GMT+02:00 David CHARBONNIER:

Hi,

We're using Cassandra 2.0.8.39 through Datastax Enterprise 4.5.1 with a 9 node 
cluster. We need to add a few new nodes to the cluster but we're experiencing 
an issue we don't know how to solve. Here is exactly what we did:
- We had 8 nodes and needed to add a few ones
- We tried to add a 9th node but the stream got stuck for a very long time and 
  bootstrap never finished (related to the streaming_socket_timeout_in_ms 
  default value in cassandra.yaml)
- We ran a solution given by a Datastax architect: restart the node with 
  auto_bootstrap set to false and run a repair
- After this issue, we went about patching the default configuration on all 
  our nodes to avoid this problem and made a rolling restart of the cluster
- Then, we tried adding a 10th node but it receives streams from only one 
  node (node2).

Here are the logs on this problematic node (node10):

INFO [main] 2015-06-26 15:25:59,490 StreamResultFuture.java (line 87) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Executing streaming plan for Bootstrap
INFO [main] 2015-06-26 15:25:59,490 StreamResultFuture.java (line 91) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Beginning stream session with /node6
INFO [main] 2015-06-26 15:25:59,491 StreamResultFuture.java (line 91) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Beginning stream session with /node5
INFO [main] 2015-06-26 15:25:59,492 StreamResultFuture.java (line 91) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Beginning stream session with /node4
INFO [main] 2015-06-26 15:25:59,493 StreamResultFuture.java (line 91) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Beginning stream session with /node3
INFO [main] 2015-06-26 15:25:59,493 StreamResultFuture.java (line 91) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Beginning stream session with /node9
INFO [main] 2015-06-26 15:25:59,493 StreamResultFuture.java (line 91) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Beginning stream session with /node8
INFO [main] 2015-06-26 15:25:59,493 StreamResultFuture.java (line 91) [Stream #a5226b30-1c17-11e5-a58b-e35f08264ca1] Beginning stream session with /node7
INFO [main] 2015-06-26 15:25:59,494 
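(For reference, the configuration patch discussed above is a one-liner in 
cassandra.yaml; a sketch - the exact value is a judgment call:)

# default is 0 (no timeout), which can leave a stuck stream hanging forever;
# a non-zero value lets stalled streams fail and be retried
streaming_socket_timeout_in_ms: 3600000   # e.g. 1 hour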

Cassandra 2015 Summit videos

2016-01-23 Thread Jan
Hi folks,
Could you please point me to the videos of the 2015 Cassandra Summit held in 
California? I do see the ones posted for the 2014 & 2013 conferences.

Thanks
Jan

flipping ordering of returned query results

2016-01-30 Thread Jan
Folks; 

Need some advice. We have a time-series application that needs the data being 
returned from C* to be flipped from the typical column-based layout to be 
row-based (a transpose). 

Example: C* data:

    A  B  C
    D  E  F

Need the returned data to be:

    A  D
    B  E
    C  F

Any input would be much appreciated. 

Thanks,
Jan

Re: Alternative approach to setting up new DC

2016-04-21 Thread Jan
Jens;

I am unsure that you need to both enable replication & also use the 
sstableloader. You could load the data into the new DC and subsequently alter 
the keyspace to replicate from the older DC.
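(A sketch of the keyspace change; keyspace, DC names and replication factors 
are placeholders:)

ALTER KEYSPACE my_ks WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'DC_old': 3,
    'DC_new': 3
};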

Cheers
Jan



On Thu, 4/21/16, Jens Rantil  wrote:

 Subject: Re: Alternative approach to setting up new DC
 To: user@cassandra.apache.org
 Date: Thursday, April 21, 2016, 9:00 AM
 
 Hi,
 I never got
 any response here, but just wanted to share that I went to a
 Cassandra meet-up in Stockholm yesterday where I talked to
 two knowledgable Cassandra people that verified that the
 approach below should work. The most important thing is that
 the backup must be fully imported before gc_grace_seconds
 after when the backup is taken.
 As of me, I managed to a get a more
 stable VPN setup and did not have to go down this
 path.
 Cheers,Jens
 
 On Mon, Apr 18, 2016 at 10:15 AM Jens Rantil  wrote:
 Hi,
 I am provisioning a new datacenter for an existing cluster. A rather shaky
 VPN connection is hindering me from making a "nodetool rebuild" bootstrap on
 the new DC. Interestingly, I have a full fresh database snapshot/backup at
 the same location as the new DC (transferred outside of the VPN). I am now
 considering the following approach:
 1. Make sure my clients are using the old DC.
 2. Provision the new nodes in the new DC.
 3. ALTER the keyspace to enable replicas on the new DC. This will start
    replicating all writes from the old DC to the new DC.
 4. Before gc_grace_seconds after operation 3) above, use sstableloader to
    stream my backup to the new nodes.
 5. As a safety precaution, do a full repair.
 Could you see any issues with doing this?
 Cheers,
 Jens
 
 -- 
 Jens Rantil
 Backend Developer @ Tink
 Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
 For urgent matters you can reach me at +46-708-84 18 32.
 
 -- 
 Jens Rantil
 Backend Developer @ Tink
 Tink AB, Wallingatan 5, 111 60 Stockholm, Sweden
 For urgent matters you can reach me at +46-708-84 18 32.


Re: Does nodetool repair stop the node to answer requests ?

2015-01-22 Thread Jan
Running a 'nodetool repair' will 'not' bring the node down. 

Your question: does a nodetool repair make the server stop serving requests, 
or does it just use a lot of resources but still serve requests? 

Answer: No, the server will not stop serving requests. It will use some 
resources, but not enough to stop the server from serving requests.

hope this helps
Jan



Re: How to store weather station Details along with monitoring data efficiently?

2015-01-23 Thread Jan
The model you are using seems OK. 

Your question: "This forces me to enter the wea_name and wea_add for each new 
row, so how to identify a new row has been created?"

Answer: You do 'not' need to add the wea_name or wea_address during inserts 
for every new row. Your insert can include just the primary & clustering keys 
(plus the measurement) and it will be fine; a sketch is below. 
You identify the new row via the primary & clustering keys.

Aside: You could also add longitude & latitude to the model for an extra level 
of detail, especially since they are widely prevalent in weather station data. 
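(A sketch against the table from the thread; the key values are placeholders, 
and wea_name/wea_add are simply omitted:)

INSERT INTO test (wea_id, eventday, eventtime, temp)
VALUES (42, 99f530d0-a2e5-11e4-89d3-123b93f75cba, now(), 21);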
hope this helps. 
Jan/ 

 On Friday, January 23, 2015 3:14 AM, Srinivasa T N  
wrote:
   

 I forgot, my task at hand is to generate a report of all the weather stations 
along with the sum of temperatures measured each day.

Regards,
Seenu.

On Fri, Jan 23, 2015 at 2:14 PM, Srinivasa T N  wrote:

Hi All,
   I was following the TimeSeries data modelling in PlanetCassandra by Patrick 
McFadin.  Regarding that, I had one query:

If I need to store the weather station name also, should it be in the same 
table, say:

create table test (
    wea_id int,
    wea_name text,
    wea_add text,
    eventday timeuuid,
    eventtime timeuuid,
    temp int,
    PRIMARY KEY ((wea_id, eventday), eventtime)
);

This forces me to enter the wea_name and wea_add for each new row, so how to 
identify a new row has been created?  Or is there any better mechanism for 
modeling the above data?

Regards,
Seenu.




   

Re: Controlling the MAX SIZE of sstables after compaction

2015-01-26 Thread Jan
Parth et al; 

The folks at Netflix seem to have built a solution for your problem: 
The Netflix Tech Blog: Aegisthus - A Bulk Data Pipeline out of Cassandra, 
by Charles Smith and Jeff Magnusson (techblog.netflix.com). 

You may want to chase Jeff Magnusson & check if the solution is open sourced. 
Please report back to this forum if you get an answer to the problem. 

hope this helps. 
Jan 
C* Architect 

 On Monday, January 26, 2015 11:25 AM, Robert Coli  
wrote:
   

 On Sun, Jan 25, 2015 at 10:40 PM, Parth Setya  wrote:

1. Is there a way to configure the size of sstables created after compaction?


No, won'tfix : https://issues.apache.org/jira/browse/CASSANDRA-4897.
You could use the "sstablesplit" utility on your One Big SSTable to split it 
into files of your preferred size. 
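(A sketch - cassandra must be stopped on the node while sstablesplit runs; the 
size and file name here are placeholders:)

sstablesplit --size 256 /var/lib/cassandra/data/<ks>/<cf>/<ks>-<cf>-jb-1-Data.db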
2. Is there a better approach to generate the report?


The major compaction isn't too bad, but something that understands SSTables as 
an input format would be preferable to sstable2json. 
3. What are the flaws with this approach?

sstable2json is slow and transforms your data to JSON.
=Rob

   

Re: Fixtures / CI docker

2015-01-26 Thread Jan
Hi Alain; 

The requirements are impossible to meet as stated, since you are expected to 
have predictable and deterministic tests while you need "recent data" (max 1 
week old data). Reason: you cannot have a replicable result set when the data 
is variable on a weekly basis.

To obtain a replicable test result, I recommend the following: 
a) Keep the 'data' expectation to a point in time which is a known quantum. 
b) Load some data into your cluster & take a snapshot. Reload this snapshot 
   before every test for consistent results. 
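(A sketch of b), assuming the snapshotted SSTables are copied back into the 
table directories between runs:)

# once: load the fixture data, then
nodetool snapshot -t ci_baseline <keyspace>

# before each test run: restore the snapshot files into
# /var/lib/cassandra/data/<keyspace>/<table>/ and load them without a restart
nodetool refresh <keyspace> <table>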
hope this helps. 
Jan/
C* Architect 

 On Monday, January 26, 2015 10:43 AM, Eric Stevens  
wrote:
   

 I don't have directly relevant advice, especially WRT getting a meaningful and 
coherent subset of your production data - that's probably too closely coupled 
with your business logic.  Perhaps you can run a testing cluster with a default 
TTL on all your tables of ~2 weeks, feeding it with real production data so 
that you have a rolling current snapshot of production.
We do this basic strategy to support integration tests with the rest of our 
platform.  We have a data access service with other internal teams acting as 
customers of that data.  But it's hard to write strong tests against this, 
because it becomes challenging to predict the values which you should expect to 
get back without rewriting the business logic directly into your tests (and 
then what exactly are you testing, are you testing your tests?)
But our data interaction layer tests all focus around inserting the data under 
test immediately before the assertions portion of the given test.  We use 
Specs2 as a testing framework, and that gives us access to a very nice 
"eventually { ... }" syntax which will retry the assertions portion several 
times with a backoff (so that we can account for the eventually consistent 
nature of Cassandra, and reduce the number of false failures without having to 
do test execution speed impacting operations like sleep before assert).
Basically our data access layer unit tests are strong and rely only on 
synthetic data (assert that the response is exact for every value), while 
integration tests from other systems use much softer tests against real data 
(more like is there data, and does that data seem to be the right format and 
for the right time range).
On Mon, Jan 26, 2015 at 3:26 AM, Alain RODRIGUEZ  wrote:

Hi guys,

We currently use a CI with tests based on docker containers.
We have a C* service "dockerized". Yet we have an issue, since we would like 2 
things that are hard to achieve:

- A fixed data set to have predictable and deterministic tests (that we can 
  repeat at any time with the same result)
- A recent data set to perform smoke testing on services that need "recent 
  data" (max 1 week old data)

As our dataset is very big and data is not sorted by dates in SSTables, it is 
hard to have a coherent extract of the production data. Has anyone of you 
achieved something like this?

For "static" data, we could write queries by hand, but I find it more relevant 
to have a real production extract. Regarding dynamic data, we need a process 
that we could repeat every day / week to update the data, and something light 
enough to keep container starts fast.

How do you guys do this kind of thing?

FWIW we are migrating to 2.0.11 very soon, so solutions might use 2.0 features.

Any idea is welcome, and if you need more info, please ask.

C*heers,

Alain



   

Syntax for using JMX term to connect to Cassandra

2015-01-29 Thread Jan
Hi folks; 

I am trying to use JMXterm, a command line based tool, to script & monitor a 
C* cluster. Would anyone on this forum know the exact syntax to connect to the 
Cassandra domain using JMXterm? Please give me an example. 

I do 'not' intend to use OpsCenter or any other UI based tool.

Thanks
Jan 

Re: Syntax for using JMX term to connect to Cassandra

2015-01-29 Thread Jan
Thanks Rob; 

Here is what I am looking for: 

java -jar /home/user/jmxterm-1.0-alpha-4-uber.jar 10.30.41.52:7199 -O 
org.apache.cassandra.internal:type=FlushWriter -A CurrentlyBlockedTask

It does not work since there is something wrong with my syntax. However, once 
working, it would be scripted to connect to a large cluster from a single host 
that would store the results into logs.  

Any help with a single working example would greatly help. I have been running 
circles around this tool for a couple of hours now.    

Thanks
Jan
 

 On Thursday, January 29, 2015 4:45 PM, Robert Coli  
wrote:
   

 On Thu, Jan 29, 2015 at 3:27 PM, Jan  wrote:

I am trying to use JMXterm,  a command line based tool to script & monitor C* 
cluster. 
Would anyone on this forum know the exact syntax to connect to Cassandra Domain 
using JMXterm  ?

Here's an example from an old JIRA at my shop:

1. Download the jmxterm-1.0-alpha-4-uber.jar from 
   http://wiki.cyclopsgroup.org/jmxterm
2. sudo java -jar jmxterm-1.0-alpha-4-uber.jar
   # then within the tool:
3. open <host:port>
4. bean org.apache.cassandra.db:type=StorageService # or whichever bean 
   you're looking for
5. run setLog4jLevel org.apache.cassandra.db.index.keys.KeySearcher.java DEBUG 
   # example of how to set log level

=Rob




   

Re: Opscenter served reads / second

2015-01-29 Thread Jan

MBean domain: org.apache.cassandra.request 
Bean: org.apache.cassandra.request:type=ReadStage
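(A sketch following the jmxterm pattern from the other thread - sample 
CompletedTasks twice and divide the delta by the interval to get reads/second; 
the IP and jar path are placeholders:)

echo "get -b org.apache.cassandra.request:type=ReadStage -s CompletedTasks" | \
  java -jar /home/xyz/jmxterm-1.0-alpha-4-uber.jar -l 10.32.22.45:7199 -v silent -n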

Hope this helps
Jan/
 

 On Thursday, January 29, 2015 9:13 AM, Batranut Bogdan 
 wrote:
   

 Hello,

Is there a metric that will show how many reads per second C* serves? Read 
requests show how many requests are issued to cassandra, but I want to know 
how many the cluster can actually serve.

   

Re: Syntax for using JMX term to connect to Cassandra

2015-01-29 Thread Jan
Here is the answer: 

Put the following into a shell script & it will yield the results: 

JMXTERM_CMD="get -b org.apache.cassandra.db:type=StorageService -s Load"
echo $JMXTERM_CMD | java -jar /home/xyz/jmxterm-1.0-alpha-4-uber.jar -l 
10.32.22.45:7199 -v silent -n 

The variables are: 
-b    bean; changes based on what you are looking to monitor 
-s    attribute being monitored 
/home/xyz/    location where the jmxterm jar file is located 
-l    IP address of the C* node & the port configured for JMX monitoring 

Thanks Robert, your example got me going in the right direction.

Hope this helps
Jan/

 



   

Re: Timeouts but returned consistency level is invalid

2015-01-30 Thread Jan
Hi Michal; 

The consistency level defaults to ONE for all write and read operations.
However, the consistency level can also be set per keyspace model. 
Could it be possible that your queries are spanning multiple keyspaces which 
bear different levels of consistency?  

Cheers
Jan
C* Architect 
C* Architect 

 On Friday, January 30, 2015 1:36 AM, Michał Łowicki  
wrote:
   

 Hi,
 We're using C* 2.1.2 and django-cassandra-engine, which in turn uses
 cqlengine. LOCAL_QUORUM is set as the default consistency level. From time
 to time we get timeouts while talking to the database, but what is strange
 is that the returned consistency level is not LOCAL_QUORUM:
 
 code=1200 [Coordinator node timed out waiting for replica nodes' responses]
 message="Operation timed out - received only 3 responses."
 info={'received_responses': 3, 'required_responses': 4, 'consistency': 'ALL'}
 
 code=1200 [Coordinator node timed out waiting for replica nodes' responses]
 message="Operation timed out - received only 1 responses."
 info={'received_responses': 1, 'required_responses': 2, 'consistency': 'LOCAL_QUORUM'}
 
 code=1100 [Coordinator node timed out waiting for replica nodes' responses]
 message="Operation timed out - received only 0 responses."
 info={'received_responses': 0, 'required_responses': 1, 'consistency': 'ONE'}
 
 Any idea why it might happen?
 -- 
 BR,
 Michał Łowicki


   

Re: Unable to create a keyspace

2015-01-31 Thread Jan
Saurabh; 

a) How exactly are the three nodes hosted?
b) Can you take down node 2 and create the keyspace from node 1?
c) Can you take down node 1 and create the keyspace from node 2?
d) Do the nodes see each other with 'nodetool status'?  

Cheers
Jan/
C* Architect 
C* Architect 

 On Saturday, January 31, 2015 5:40 AM, Carlos Rolo  
wrote:
   

 Something that can cause weird behavior is the machine clocks not being 
properly synced.  I didn't read the thread in full detail, so disregard this if 
it is not the case.


Re: Cassandra 2.0.11 with stargate-core read writes are slow

2015-01-31 Thread Jan
Hi Asit; 

Question 1) "Am I using the right hardware, as of now I am testing say 10 
record reads."
Answer: Recommend looking at the 'sar' output logs, watching nodetool cfstats 
& watching your system.log files to track hardware usage & JVM pressure. As a 
rule of thumb, it's recommended to have 8 GB for the C* JVM itself on 
production systems.  

Question 3) is unclear; please rephrase the question.    

hope this helps
Jan  
C* Architect 
C* Architect 

 On Saturday, January 31, 2015 5:33 AM, Carlos Rolo  
wrote:
   

 Hi Asit,

The only help I'm going to give is on point 3), as I have little experience 
with 2), and 1) depends on a lot of factors.

For testing the workload use this: 
http://www.datastax.com/documentation/cassandra/2.1/cassandra/tools/toolsCStress_t.html
It probably covers all your testing needs; a sketch is below.

Regards,

Carlos Juzarte Rolo
Cassandra Consultant
Pythian - Love your data
rolo@pythian | Twitter: cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
Tel: 1649
www.pythian.com
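(A sketch of the tool, 2.1 syntax - the node address and counts are 
placeholders:)

# mixed read/write load, 3 reads per write, one million operations
cassandra-stress mixed ratio\(write=1,read=3\) n=1000000 -node 10.0.0.1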
On Sat, Jan 31, 2015 at 2:49 AM, Asit KAUSHIK  
wrote:

Hi all,

We are testing our logging application on a 3 node cluster; each system is a 
virtual machine with 4 cores and 8GB RAM with RedHat Enterprise. Now my 
question is in 3 parts:

1) Am I using the right hardware? As of now I am testing say 10 record reads.
2) I am using stargate-core for full text search; is there any slowness 
observed because of that?
3) How can I simulate the write load? I created an application which creates 
say 20 threads, and in each thread I insert 1000 records; on each thread I 
open a cluster connection/session, execute the 1000 inserts and close the 
connection. This takes a lot of time. Please suggest if I am missing 
something.



Re: Cassandra on Ceph

2015-01-31 Thread Jan
Colin; 

Ceph is a block based storage architecture based on RADOS. It comes with its 
own replication & rebalancing along with a map of the storage layer.    

Some more details & similarities: 
a) Ceph stores a client's data as objects within storage pools (think of C* 
   partitions).
b) Using the CRUSH algorithm, Ceph calculates which placement group should 
   contain the object (C* primary keys & vnode data distribution). 
c) It further calculates which Ceph OSD Daemon should store the placement 
   group (C* node locality). 
d) The CRUSH algorithm enables the Ceph Storage Cluster to scale, rebalance, 
   and recover dynamically (C* big table storage architecture). 

Summary: 
C* comes with everything that Ceph provides (with the exception of block 
storage). There is no value add that Ceph brings to the table that C* does 
not already provide. I seriously doubt if C* could even work out of the box 
with yet another level of replication & rebalancing.

Hope this helps
Jan/
C* Architect


  

 On Saturday, January 31, 2015 7:28 PM, Colin Taylor 
 wrote:
   

 I may be forced to run Cassandra on top of Ceph. Does anyone have experience 
/ tips with this? Or alternatively, strong reasons why this won't work? 

Cheers
Colin

   

Re: Any problem mounting a keyspace directory in ram memory?

2015-02-01 Thread Jan
Hi Gabriel; 

I don't think Apache Cassandra supports in-memory keyspaces. However, DataStax 
Enterprise does support it. 

Quoting from DataStax: "DataStax Enterprise includes the in-memory option for 
storing data to and accessing data from memory exclusively. No disk I/O occurs. 
Consider using the in-memory option for storing a modest amount of data, mostly 
composed of overwrites, such as an application for mirroring stock exchange 
data. Only the prices fluctuate greatly while the keys for the data remain 
relatively constant. Generally, the table you design for use in-memory should 
have the following characteristics:   
   - Store a small amount of data
   - Experience a workload that is mostly overwrites
   - Be heavily trafficked"

See: Using the in-memory option | DataStax Enterprise 4.0 Documentation 
(www.datastax.com)

hope this helps
Jan
C* Architect
 

 On Sunday, February 1, 2015 1:32 PM, Gabriel Menegatti 
 wrote:
   

 Hi guys,

Please, has anyone here already mounted a specific keyspace directory in RAM 
memory using tmpfs?

Do you see any problem doing so, except for the fact that the data can be 
lost?

Thanks in advance.

Regards,
Gabriel.
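(i.e. something along these lines - a sketch, assuming the default data path; 
the mount must be in place with the right ownership before cassandra starts, 
and its contents vanish on reboot:)

sudo mount -t tmpfs -o size=8g,uid=cassandra,gid=cassandra tmpfs \
    /var/lib/cassandra/data/my_keyspace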

   

Re: Help on modeling a table

2015-02-02 Thread Jan
Hi Asit; 

The partition key is only a part of the performance. Recommend reading this 
article: Advanced Time Series with Cassandra (www.datastax.com).
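(A sketch of the bucketing idea from that article, reusing column names from 
the thread - the names and bucket granularity are illustrative:)

CREATE TABLE logentries_by_hour (
    date_to_hour bigint,                  -- time bucket, e.g. yyyyMMddHH
    logentrytimestamputcguid timeuuid,
    message text,
    PRIMARY KEY ((date_to_hour), logentrytimestamputcguid)
);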



hope this helps
Jan/ 
 

 On Monday, February 2, 2015 8:33 AM, Asit KAUSHIK 
 wrote:
   

 Hi All,

We are working on an application logging project, and this is one of the 
search tables, as below:

CREATE TABLE logentries (
    logentrytimestamputcguid timeuuid PRIMARY KEY,
    context text,
    date_to_hour bigint,
    durationinseconds float,
    eventtimestamputc timestamp,
    ipaddress inet,
    logentrytimestamputc timestamp,
    loglevel int,
    logmessagestring text,
    logsequence int,
    message text,
    modulename text,
    productname text,
    searchitems map<text, text>,
    servername text,
    sessionname text,
    stacktrace text,
    threadname text,
    timefinishutc timestamp,
    timestartutc timestamp,
    urihostname text,
    uripathvalue text,
    uriquerystring text,
    useragentstring text,
    username text
);

I have some queries on the design of this table:

1) Is a timeuuid a good candidate for the partition key, as we would be 
querying other fields with the stargate-core full text project?

This table is actually used for searches like username LIKE '*john', and using 
the present model the performance is very slow.

Please advise.

Regards
Asit






   

Re: OOM and high SSTables count

2015-03-04 Thread Jan
Hi Roni; 

You mentioned: DC1 servers have 32GB of RAM and 10GB of HEAP. DC2 machines have 
16GB of RAM and 5GB HEAP.

Best practices would be to:
a) have a consistent type of node across both DCs (CPUs, memory, heap & disk)
b) increase the heap on DC2 servers to 8GB for the C* heap
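(A sketch of b) in cassandra-env.sh; the HEAP_NEWSIZE value follows the common 
~100MB-per-core rule of thumb and is an assumption here:)

MAX_HEAP_SIZE="8G"
HEAP_NEWSIZE="800M"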
The leveled compaction issue is not addressed by this. 

hope this helps
Jan/

 

 On Wednesday, March 4, 2015 8:41 AM, Roni Balthazar 
 wrote:
   

 Hi there,

We are running C* 2.1.3 cluster with 2 DataCenters: DC1: 30 Servers /
DC2 - 10 Servers.
DC1 servers have 32GB of RAM and 10GB of HEAP. DC2 machines have 16GB
of RAM and 5GB HEAP.
DC1 nodes have about 1.4TB of data and DC2 nodes 2.3TB.
DC2 is used only for backup purposes. There are no reads on DC2.
All writes and reads are on DC1 using LOCAL_ONE, and the RF is DC1: 2 and DC2: 1.
All keyspaces have STCS (Average 20~30 SSTables count each table on
both DCs) except one that is using LCS (DC1: Avg 4K~7K SSTables / DC2:
Avg 3K~14K SSTables).

Basically we are running into 2 problems:

1) High SSTables count on keyspace using LCS (This KS has 500GB~600GB
of data on each DC1 node).
2) There are 2 servers on DC1 and 4 servers in DC2 that went down with
the OOM error message below:

ERROR [SharedPool-Worker-111] 2015-03-04 05:03:26,394 JVMStabilityInspector.java:94 - JVM state determined to be unstable. Exiting forcefully due to:
java.lang.OutOfMemoryError: Java heap space
        at org.apache.cassandra.db.composites.CompoundSparseCellNameType.copyAndMakeWith(CompoundSparseCellNameType.java:186) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.db.composites.AbstractCompoundCellNameType$CompositeDeserializer.readNext(AbstractCompoundCellNameType.java:286) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.db.AtomDeserializer.readNext(AtomDeserializer.java:104) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:426) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.fetchMoreData(IndexedSliceReader.java:350) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:142) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:44) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) ~[guava-16.0.jar:na]
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) ~[guava-16.0.jar:na]
        at org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:172) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:155) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:146) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:125) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) ~[guava-16.0.jar:na]
        at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) ~[guava-16.0.jar:na]
        at org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:203) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:107) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:81) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:69) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:320) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:62) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:1915) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1748) ~[apache-cassandra-2.1.3.jar:2.1.3]
        at org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:342) ~[apache-cassandra-2.1.3.jar:2.1.3]
 

Re: Write timeout under load but Read is fine

2015-03-04 Thread Jan
Hi Jaydeep; 

   - look at the i/o on all three nodes   
   - increase the write_request_timeout_in_ms: 1   
   - check the time-outs, if any, on the client inserting the writes   
   - check the network for dropped/lost packets   
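(A sketch of the first and last checks; tool availability varies by distro:)

iostat -x 5                      # per-device utilisation/await on each node
netstat -s | grep -i retrans     # TCP retransmissions hint at packet loss
nodetool tpstats                 # dropped MUTATION messages on the C* side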


hope this helps
Jan/
 

 On Wednesday, March 4, 2015 12:26 PM, Jaydeep Chovatia 
 wrote:
   

 Hi,
In my test program, when I increase load I keep getting a few "write timeout" 
errors from Cassandra, say every 10~15 mins. My read:write ratio is 50:50. My 
reads are fine but only writes time out.

Here are my Cassandra details:
Version: 2.0.11
Ring of 3 nodes with RF=3
Node configuration: 24 core + 64GB RAM + 2TB
"write_request_timeout_in_ms: 5000", rest of the cassandra.yaml configuration 
is default

I've also checked IO on the Cassandra nodes and it looks very low (around 5%). 
I've also checked the Cassandra log file and do not see any GC happening. Also 
CPU on Cassandra is low (around 20%). I have 20GB data on each node.

My test program creates connections to all three Cassandra nodes and sends 
read+write requests randomly. 

Any idea what should I look for?

Jaydeep


   

Re: cassandra node jvm stall intermittently

2015-03-04 Thread Jan
Hi Jason; 

What's in the log files at the moment jstat shows 100%? What is the activity 
on the cluster & the node at that specific point in time (reads / writes / 
joins etc.)?

Jan/ 

 On Wednesday, March 4, 2015 5:59 AM, Jason Wee  wrote:
   

 Hi, our cassandra nodes use java 7 update 72 and we ran jstat on one of the 
nodes, and noticed some strange behaviour as indicated by the output below. 
Any idea why the eden space stays the same for a few seconds at a time, like 
100% and 18.02%? We suspect such "stalling" causes timeouts in our cluster.
Any idea what happened, what went wrong and what could cause this?

$ jstat -gcutil 32276 1s
  S0     S1     E      O      P     YGC     YGCT    FGC    FGCT     GCT
  0.00   5.78  91.21  70.94  60.07   2657   73.437     4    0.056   73.493
  0.00   5.78 100.00  70.94  60.07   2657   73.437     4    0.056   73.493
  (previous line repeated 15 more times)
  0.00   4.65  29.66  71.00  60.07   2659   73.488     4    0.056   73.544
  0.00   4.65  70.88  71.00  60.07   2659   73.488     4    0.056   73.544
  0.00   4.65  71.58  71.00  60.07   2659   73.488     4    0.056   73.544
  0.00   4.65  72.15  71.00  60.07   2659   73.488     4    0.056   73.544
  0.00   4.65  72.33  71.00  60.07   2659   73.488     4    0.056   73.544
  0.00   4.65  72.73  71.00  60.07   2659   73.488     4    0.056   73.544
  0.00   4.65  73.20  71.00  60.07   2659   73.488     4    0.056   73.544
  0.00   4.65  73.71  71.00  60.07   2659   73.488     4    0.056   73.544
  0.00   4.65  73.84  71.00  60.07   2659   73.488     4    0.056   73.544
  0.00   4.65  73.91  71.00  60.07   2659   73.488     4    0.056   73.544
  0.00   4.65  74.18  71.00  60.07   2659   73.488     4    0.056   73.544
  0.00   4.65  74.29  71.00  60.07   2659   73.488     4    0.056   73.544
  (previous line repeated 7 more times)
  0.00   5.43  12.64  71.09  60.07   2661   73.534     4    0.056   73.590
  0.00   5.43  18.02  71.09  60.07   2661   73.534     4    0.056   73.590
  (previous line repeated 16 more times)
  0.00   5.43  69.24  71.09  60.07   2661   73.534     4    0.056   73.590
  0.00   5.43  78.05  71.09  60.07   2661   73.534     4    0.056   73.590
  0.00   5.43  78.97  71.09  60.07   2661   73.534     4    0.056   73.590
  0.00   5.43  79.07  71.09  60.07   2661   73.534     4    0.056   73.590
  0.00   5.43  79.18  71.09  60.07   2661   73.534     4    0.056   73.590
  0.00   5.43 

Re: Write timeout under load but Read is fine

2015-03-05 Thread Jan
Hello Jaydeep;
Run cassandra-stress with R/W options enabled for about the same duration and 
check whether you see dropped packets. It would eliminate the client as the source 
of the error & also give you a reproducible tool on which to base subsequent tests 
and findings.
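For example, a minimal sketch using the stress tool's mixed workload (this syntax is from the newer 2.1-era cassandra-stress; older versions use different flags, and the node list is a placeholder):

    cassandra-stress mixed ratio\(write=1,read=1\) duration=15m -rate threads=50 -node 10.0.0.1,10.0.0.2,10.0.0.3

Watching 'nodetool tpstats' on each node while it runs will show whether mutations get dropped.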
Jan/ 

 

 On Thursday, March 5, 2015 12:19 PM, Jaydeep Chovatia 
 wrote:
   

 I have tried increasing the timeout to 1 but it did not help. I also verified that 
there are no lost network packets.
Jaydeep
On Wed, Mar 4, 2015 at 12:19 PM, Jan  wrote:

HI Jaydeep; 
   
   - Look at the I/O on all three nodes
   - Increase the write_request_timeout_in_ms: 1
   - Check the time-outs, if any, on the client inserting the writes
   - Check the network for dropped/lost packets (a sketch of quick checks follows below)
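A sketch of quick per-node checks for the last two items (standard Linux tools assumed):

    iostat -x 5 3                   # per-device utilization and await
    nodetool tpstats | tail -n 15   # dropped message counters are printed at the end
    netstat -s | grep -i retrans    # TCP retransmissions hint at packet loss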


Hope this helps
Jan/
 

 On Wednesday, March 4, 2015 12:26 PM, Jaydeep Chovatia 
 wrote:
   

 Hi,
In my test program, when I increase the load I keep getting a few "write timeout" 
errors from Cassandra, say every 10-15 minutes. My read:write ratio is 50:50. My 
reads are fine; only my writes time out.

Here are my Cassandra details:
Version: 2.0.11
Ring of 3 nodes with RF=3
Node configuration: 24 cores + 64GB RAM + 2TB
write_request_timeout_in_ms: 5000; the rest of the cassandra.yaml configuration is 
default
I've also checked I/O on the Cassandra nodes and it looks very low (around 5%). I've 
also checked the Cassandra log file and do not see any GC happening. CPU on 
Cassandra is also low (around 20%). I have 20GB of data on each node.
My test program creates connections to all three Cassandra nodes and sends read 
and write requests randomly. 
Any idea what I should look for?
Jaydeep






-- 
Jaydeep

   

Re: cassandra node jvm stall intermittently

2015-03-06 Thread Jan
HI Jason; 
The single node showing the anomaly is a hint that the problem is probably 
local to that node (as you suspected).
   - How many nodes do you have on the ring?
   - What is the activity when this occurs - reads/writes/compactions?
   - Is there anything unique about this node that makes it different from the other nodes?
   - Is this a periodic occurrence or a single occurrence? I am trying to determine a pattern for when this shows up.
   - What is the load distribution on the ring (i.e., is this node carrying more load than the others)?

The system.log should have more information about it.
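If GC logging is not already on, a sketch for cassandra-env.sh (these are standard JDK 7 flags; the log path is an assumption for your install):

    JVM_OPTS="$JVM_OPTS -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime"
    JVM_OPTS="$JVM_OPTS -Xloggc:/var/log/cassandra/gc.log"

That makes it easy to correlate the jstat stalls with stop-the-world pauses.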
Hope this helps
Jan/


 

 On Friday, March 6, 2015 4:50 AM, Jason Wee  wrote:
   

 Well, StatusLogger.java output started showing up in the Cassandra system.log, and 
MessagingService.java also showed some stages (e.g. read, mutation) being dropped. 
It's strange that this happens only on this node; this type of message does not 
show up in the other nodes' log files at the same time... 
Jason
On Thu, Mar 5, 2015 at 4:26 AM, Jan  wrote:

HI Jason; 
What's in the log files at the moment jstat shows 100%? What is the activity on 
the cluster and the node at that specific point in time (reads/writes/joins, etc.)?
Jan/ 

 On Wednesday, March 4, 2015 5:59 AM, Jason Wee  wrote:
   

 Hi, our Cassandra nodes use Java 7 update 72, and we ran jstat on one of the 
nodes and noticed some strange behaviour, as indicated by the output below. Any idea 
why the Eden space stays the same for several seconds, e.g. at 100% and then at 
18.02%? We suspect such "stalling" causes timeouts in our cluster.
Any idea what happened, what went wrong, and what could cause this?

$ jstat -gcutil 32276 1s
  S0     S1      E      O      P     YGC     YGCT  FGC    FGCT     GCT
 0.00   5.78   91.21  70.94  60.07   2657   73.437    4   0.056   73.493
 0.00   5.78  100.00  70.94  60.07   2657   73.437    4   0.056   73.493
 [the 100.00 sample above repeated for ~16 consecutive seconds]
 0.00   4.65   29.66  71.00  60.07   2659   73.488    4   0.056   73.544
 0.00   4.65   70.88  71.00  60.07   2659   73.488    4   0.056   73.544
 0.00   4.65   71.58  71.00  60.07   2659   73.488    4   0.056   73.544
 0.00   4.65   72.15  71.00  60.07   2659   73.488    4   0.056   73.544
 0.00   4.65   72.33  71.00  60.07   2659   73.488    4   0.056   73.544
 0.00   4.65   72.73  71.00  60.07   2659   73.488    4   0.056   73.544
 0.00   4.65   73.20  71.00  60.07   2659   73.488    4   0.056   73.544
 0.00   4.65   73.71  71.00  60.07   2659   73.488    4   0.056   73.544
 0.00   4.65   73.84  71.00  60.07   2659   73.488    4   0.056   73.544
 0.00   4.65   73.91  71.00  60.07   2659   73.488    4   0.056   73.544
 0.00   4.65   74.18  71.00  60.07   2659   73.488    4   0.056   73.544
 0.00   4.65   74.29  71.00  60.07   2659   73.488    4   0.056   73.544
 [the 74.29 sample above repeated for ~7 consecutive seconds]
 0.00   5.43   12.64  71.09  60.07   2661   73.534    4   0.056   73.590
 0.00   5.43   18.02  71.09  60.07   2661   73.534    4   0.056   73.590
 [the 18.02 sample above repeated for ~8 consecutive seconds; listing truncated in the archive]

Pointers on deploying snitch for Multi region cluster

2015-03-09 Thread Jan
 HI Folks; 
We are planning to deploy a multi-region C* cluster with nodes on both US coasts. I need some advice:
a) As I do not have public IP address access, is there an alternative way to deploy the EC2MultiRegionSnitch using private IP addresses?
b) Has anyone used the EC2Snitch with nodes on either coast, connecting multiple VPCs with EC2 instances using IPsec tunnels? Did this work?
c) Has anyone used the GossipingPropertyFileSnitch and got it working successfully in a multi-region deployment?
Advice/gotchas/input/do's/don'ts much appreciated.
Thanks
Jan

Re: Best way to alert/monitor "nodetool status” down.

2015-03-09 Thread Jan
You could set up an alert for "node down" within OpsCenter. OpsCenter also 
offers you the option to send an email to a paging system, with reminders. 

Jan/ 

 On Sunday, March 8, 2015 6:10 AM, Vasileios Vlachos 
 wrote:
   

  We use Nagios for monitoring, and we call the following through NRPE:
 
 #!/bin/bash
 
 # Just for reference:
 # Nodetool's output represents "Status" and "State" in this order.
 # Status values: U (up), D (down)
 # State values: N (normal), L (leaving), J (joining), M (moving)
 
 NODETOOL=$(which nodetool);
 NODES_DOWN=$(${NODETOOL} --host localhost status | grep --count -E '^D[A-Z]');
 
 if [[ ${NODES_DOWN} -gt 0 ]]; then
     output="CRITICAL - Nodes down: ${NODES_DOWN}";
     return_code=2;
 elif [[ ${NODES_DOWN} -eq 0 ]]; then
     output="OK - Nodes down: ${NODES_DOWN}";
     return_code=0;
 else
     output="UNKNOWN - Couldn't retrieve cluster information.";
     return_code=3;
 fi
 
 echo "${output}";
 exit "${return_code}";
 
 I've not used zabbix so I'm not sure the exit codes etc are the same for you. 
Also, you may need to modify the REGEX slightly depending on the Cassandra 
version you are using. There must be a way to get this via the JMX console as 
well, which might be easier for you to monitor.
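If you go the Zabbix route instead of Nagios, a sketch of an agent item wrapping the same check (the key name is my invention, and it assumes nodetool is on the agent's PATH):

    UserParameter=cassandra.nodes_down,nodetool --host localhost status | grep --count -E '^D[A-Z]'

Zabbix can then fire a trigger whenever cassandra.nodes_down returns a value greater than zero.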
 
 On 07/03/15 00:37, Kevin Burton wrote:
  
  What's the best way to monitor nodetool status being down? I.e., if a specific 
server thinks a node is down (DN). 
   Does this just use JMX? Is there an API we can call? 
   We want to tie it into our Zabbix server so we can detect if there is a failure. 
  -- 
 Founder/CEO Spinn3r.com
  Location: San Francisco, CA
  blog: http://burtonator.wordpress.com … or check out my Google+ profile   
 
 -- 
Kind Regards,

Vasileios Vlachos

IT Infrastructure Engineer
MSc Internet & Wireless Computing
BEng Electronics Engineering
Cisco Certified Network Associate (CCNA) 

   

Re: Deleted snapshot files filling up /var/lib/cassandra

2015-03-16 Thread Jan
David; 
All the packaged installations use the /var/lib/cassandra directory. Could you 
check your yaml config files and see if you are using this default directory 
for backups?
You may want to change it to a location with more disk space. 
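A sketch for checking where the space is going and whether snapshots are involved (paths assume the packaged layout):

    du -sh /var/lib/cassandra/data/*/*/snapshots 2>/dev/null | sort -h
    nodetool clearsnapshot    # drops all snapshots; make sure none are still needed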


Hope this helps
Jan/


 On Monday, March 16, 2015 2:52 PM, David Wahler  wrote:
   

 We have a 16-node, globally-distributed cluster. running Cassandra
2.0.12. We're using the Datastax packages on CentOS 6.5.

Even though the total amount of data on each server is only a few
hundred MB (as measured by both du and the "load" metric), we're
seeing a problem where the disk usage is steadily increasing and
eventually filling up the 10GB /var/lib/cassandra partition. Running
"lsof" on the Cassandra process shows that it has open file handles
for thousands of deleted snapshot files:

$ sudo lsof -p 4753 | grep DEL -c
13314
$ sudo lsof -p 4753 | grep DEL | head
java 4753 cassandra DEL REG 253,6 538873
/var/lib/cassandra/data/keyspace/cf/snapshots/65bc8170-cc20-11e4-a355-0d37e54cc22e/keyspace-cf-jb-3979-Index.db
java 4753 cassandra DEL REG 253,6 538899
/var/lib/cassandra/data/keyspace/cf/snapshots/8cb41770-cc20-11e4-a355-0d37e54cc22e/keyspace-cf-jb-3983-Index.db
...etc...

We're not manually creating these snapshots; they're being generated
by periodic runs of "nodetool repair -pr". There are some errors in
system.log that seem to be related:

ERROR [RepairJobTask:10] 2015-03-16 02:02:12,485 RepairJob.java (line
143) Error occurred during snapshot phase
java.lang.RuntimeException: Could not create snapshot at /10.1.1.188
        at 
org.apache.cassandra.repair.SnapshotTask$SnapshotCallback.onFailure(SnapshotTask.java:81)
        at 
org.apache.cassandra.net.MessagingService$5$1.run(MessagingService.java:344)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
ERROR [AntiEntropySessions:4] 2015-03-16 02:02:12,486
RepairSession.java (line 288) [repair
#55a8eb50-cbaa-11e4-9af9-27d7677e5965] session completed with the
following error
java.io.IOException: Failed during snapshot creation.
        at 
org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:323)
        at org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:144)
        at com.google.common.util.concurrent.Futures$4.run(Futures.java:1160)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
ERROR [AntiEntropySessions:4] 2015-03-16 02:02:12,488
CassandraDaemon.java (line 199) Exception in thread
Thread[AntiEntropySessions:4,5,RMI Runtime]
java.lang.RuntimeException: java.io.IOException: Failed during
snapshot creation.
        at com.google.common.base.Throwables.propagate(Throwables.java:160)
        at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:32)
        at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed during snapshot creation.
        at 
org.apache.cassandra.repair.RepairSession.failedSnapshot(RepairSession.java:323)
        at org.apache.cassandra.repair.RepairJob$2.onFailure(RepairJob.java:144)
        at com.google.common.util.concurrent.Futures$4.run(Futures.java:1160)
        ... 3 more

Has anyone encountered this problem before? The same stack trace shows
up in CASSANDRA-8020, but that bug was supposedly introduced in 2.1.0
and fixed in 2.1.1. In any case, we don't want to upgrade to 2.1.x,
since the consensus on this list seems to be that it's not yet
production-ready.

I'm fairly new to Cassandra, so general troubleshooting tips would
also be much appreciated.

Thanks,
-- David


  

Re: Problems after trying a migration

2015-03-18 Thread Jan

Hi David; 
Some input to get back to where you were:
a) Start with the French cluster only and get it working with DSE 4.5.1.
b) The OpsCenter keyspace is RF=1 by default; alter the keyspace to RF=3 (a CQL sketch follows below).
c) Take a full snapshot of all your nodes & copy the files to a safe location on all the nodes.

To migrate the data into the new cluster:
a) Use the same version, DSE 4.5.1, in Luxembourg & bring up one node at a time. Check that the node has come up in the new datacenter.
b) Bring up new nodes in the new datacenter one at a time.
c) After all your new nodes are UP in Luxembourg, run a 'nodetool repair -par'.
d) Check in OpsCenter that all your nodes are showing up (new and old).
e) Start taking down your nodes in France, one at a time.
f) After all the nodes in France are down, run a 'nodetool repair -par' again.
g) Upgrade the nodes in Luxembourg to DSE 4.6.1.
h) Run a 'nodetool repair -par' again.
i) Upgrade to OpsCenter 5.1.
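A CQL sketch for step (b); the datacenter name 'France' is an assumption - use the names shown by nodetool status:

    ALTER KEYSPACE "OpsCenter"
      WITH replication = {'class': 'NetworkTopologyStrategy', 'France': 3};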
Best of luck,  hope this helps. 
Jan/
 



 On Wednesday, March 18, 2015 1:01 PM, Robert Coli  
wrote:
   

 On Wed, Mar 18, 2015 at 9:05 AM, David CHARBONNIER 
 wrote:

- New nodes in the other country have been installed like French nodes 
except for Datastax Enterprise version (4.5.1 in France and 4.6.1 in the other 
country which means Cassandra version 2.0.8.39 in France and 2.0.12.200 in the 
other country)

This is officially unsupported, and might cause problems during this process.
=Rob 

  

Re: best way to measure repair times?

2015-03-19 Thread Jan
Ian; 
To respond to your specific question: you could pipe the output of your repair into 
a file and subsequently determine the time taken. Example: nodetool repair -dc DC1
[2014-07-24 21:59:55,326] Nothing to repair for keyspace 'system'
[2014-07-24 21:59:55,617] Starting repair command #2, repairing 490 ranges for keyspace system_traces (seq=true, full=true)
[2014-07-24 22:23:14,299] Repair session 323b9490-137e-11e4-88e3-c972e09793ca for range (820981369067266915,822627736366088177] finished
[2014-07-24 22:23:14,320] Repair session 38496a61-137e-11e4-88e3-c972e09793ca for range (2506042417712465541,2515941262699962473] finished
What to look for:
a) Look for the specific name of the keyspace & the phrase 'Starting repair'.
b) Look for the word 'finished'.
c) Compute the average time per keyspace and you will have a rough idea of how long your repairs would take on a regular basis.
This is only for continual operational repair, not the first time it is done.
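A minimal shell sketch that captures the output and the elapsed wall-clock time in one go:

    start=$(date +%s)
    nodetool repair -dc DC1 | tee "repair-$(date +%F).log"
    echo "repair took $(( $(date +%s) - start )) seconds"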
Hope this helps
Jan/
 



 On Thursday, March 19, 2015 12:55 PM, Paulo Motta 
 wrote:
   

 From: http://www.datastax.com/dev/blog/modern-hinted-handoff

Repair and the fine print
At first glance, it may appear that Hinted Handoff lets you safely get away 
without needing repair. This is only true if you never have hardware failure. 
Hardware failure means that   
   - We lose “historical” data for which the write has already finished, so 
there is nothing to tell the rest of the cluster exactly what data has gone 
missing
   - We can also lose hints-not-yet-replayed from requests the failed node 
coordinated
With sufficient dedication, you can get by with “only run repair after hardware 
failure and rely on hinted handoff the rest of the time,” but as your clusters 
grow (and hardware failure becomes more common) performing repair as a one-off 
special case will become increasingly difficult to do perfectly. Thus, we 
continue to recommend running a full repair weekly.

2015-03-19 16:42 GMT-03:00 Robert Coli :

On Thu, Mar 19, 2015 at 12:13 PM, Ali Akhtar  wrote:

Cassandra doesn't guarantee eventual consistency? 

If you run regularly scheduled repair, it does. If you do not run repair, it 
does not.
Hinted handoff, for example, is considered an optimization for repair, and does 
not assert that it provides a consistency guarantee.
=Rob http://twitter.com/rcolidba



-- 
Paulo Ricardo
-- 
European Master in Distributed Computing
Royal Institute of Technology - KTH
Instituto Superior Técnico - IST
http://paulormg.com

  

Re: active queries

2015-03-19 Thread Jan
HI Rahul; 
Your question: can we see active queries on a Cassandra cluster? Is there any tool?
Answer: nodetool tpstats & nodetool cfstats.
The nodetool tpstats command provides statistics about the number of active, pending, and completed 
tasks for each stage of Cassandra operations by thread pool. You should be 
looking at the very first row: ReadStage.

The nodetool cfstats command displays statistics for each table and keyspace. 
You should be looking at the first row: Read Count.
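For example (the table name is a placeholder):

    nodetool tpstats | head -n 5                      # ReadStage: active / pending / completed
    nodetool cfstats my_keyspace.my_table | grep 'Read Count'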

Cheers
Jan/
 



 On Thursday, March 19, 2015 12:13 AM, Rahul Bhardwaj 
 wrote:
   

 Hi ,
Can we see active queries on cassandra cluster. Is there any tool?
Please help.

Regards:Rahul Bhardwaj


  

Re: Delete columns

2015-03-19 Thread Jan
Benyi; 
Have you considered using the TTL option, in case your columns are meant to be 
deleted after a predetermined amount of time? It's probably the easiest way to 
get the task accomplished.
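A CQL sketch reusing the table from your mail (2592000 seconds = 30 days):

    INSERT INTO a_table (guid, key1, key2, data)
    VALUES ('some-guid', 'k1', 'k2', 42)
    USING TTL 2592000;

After the TTL expires the cells are treated as expired and are purged during compaction, so no explicit deletes are needed.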
Cheers
Jan


 On Friday, February 27, 2015 10:38 AM, Benyi Wang  
wrote:
   

 In C* 2.1.2, is there a way you can delete without specifying the row key?
create table a_table (
  guid text,
  key1 text,
  key2 text,
  data int,
  primary key (guid, key1, key2)
);
delete from a_table where key1='' and key2='';
I'm trying to avoid doing it like this:
* query the table to get the guids (32 bytes long)
* send back delete queries like this:
delete from a_table where guid in (...) and key1='' and key2=''.
key1 and key2 only have 3~4 values; if I try to create multiple tables like 
table_kvi_kvj, it will be easy to delete, but it results in a large dataset 
because of the duplicated guids.
Because the CQL model will create a Cassandra column family like:
guid, kv1-kv2, .., kvi-kvj, ..., kvn-kvm, ...
Is there an API that can drop columns in a column family?

  

Re: FileNotFoundException

2015-03-19 Thread Jan
HI Batranut;
In both errors you described above, the files seem to be missing while 
compaction is running. Without knowing what else is going on in your system, I 
would presume that this error occurs on this single node only and not on your 
entire cluster. 
Some guesses:
a) You may have a disk corruption problem. Take the node offline and run a disk check (a sketch follows below).
b) Take the node offline, wipe it clean of everything and have it rejoin the cluster. Check if the problem recurs.
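A sketch for (a); the device and service names are assumptions for your environment:

    nodetool drain            # flush memtables and stop accepting writes
    service cassandra stop
    fsck -n /dev/sdb1         # read-only check of the data volume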
Hope this helps
Jan

 


 On Tuesday, February 24, 2015 2:56 AM, Batranut Bogdan 
 wrote:
   

 Also I must add that grepping the logs for a particular file I see this:
 INFO [CompactionExecutor:19] 2015-02-24 10:44:35,618 CompactionTask.java (line 
120) Compacting 
[SSTableReader(path='/data/ranks/positions/ranks-positions-jb-339-Data.db'), 
SSTableReader(path='/data/ranks/positions/ranks-positions-jb-354-Data.db'), 
SSTableReader(path='/data/ranks/positions/ranks-positions-jb-408-Data.db'), 
SSTableReader(path='/data/ranks/positions/ranks-positions-jb-286-Data.db'), 
SSTableReader(path='/data/ranks/positions/ranks-positions-jb-20-Data.db'), 
SSTableReader(path='/data/ranks/positions/ranks-positions-jb-127-Data.db'), 
SSTableReader(path='/data/ranks/positions/ranks-positions-jb-357-Data.db'), 
SSTableReader(path='/data/ranks/positions/ranks-positions-jb-257-Data.db'), 
SSTableReader(path='/data/ranks/positions/ranks-positions-jb-316-Data.db'), 
SSTableReader(path='/data/ranks/positions/ranks-positions-jb-41-Data.db'), 
SSTableReader(path='/data/ranks/positions/ranks-positions-jb-285-Data.db'), 
SSTableReader(path='/data/ranks/positions/ranks-positions-jb-338-Data.db'), 
SSTableReader(path='/data/ranks/positions/ranks-positions-jb-180-Data.db'), 
SSTableReader(path='/data/ranks/positions/ranks-positions-jb-398-Data.db'), 
SSTableReader(path='/data/ranks/positions/ranks-positions-jb-249-Data.db'), 
SSTableReader(path='/data/ranks/positions/ranks-positions-jb-284-Data.db'), 
SSTableReader(path='/data/ranks/positions/ranks-positions-jb-294-Data.db'), 
SSTableReader(path='/data/ranks/positions/ranks-positions-jb-248-Data.db'), 
SSTableReader(path='/data/ranks/positions/ranks-positions-jb-377-Data.db'), 
SSTableReader(path='/data/ranks/positions/ranks-positions-jb-395-Data.d ...

Also, several entries like this appear in the log after grepping:
java.lang.RuntimeException: java.io.FileNotFoundException: 
/data/ranks/positions/ranks-positions-jb-41-Data.db (No such file or directory)
Caused by: java.io.FileNotFoundException: 
/data/ranks/positions/ranks-positions-jb-41-Data.db (No such file or directory)


I was grepping for jb-41-Data.db... it seems that this file does not exist for 
some reason. I must say that when I first added the node I included its IP in 
the seeds list. Then I decommissioned it, removed its IP from the seed 
list, deleted all data / commit log / saved caches and started it. Since then I 
have not manually deleted any files. 
Any ideas?

  On Tuesday, February 24, 2015 11:46 AM, Batranut Bogdan 
 wrote:
   

 Hello all,
One of my C* nodes throws a large number of exceptions like this:

ERROR [ReadStage:792] 2015-02-24 10:43:54,183 CassandraDaemon.java (line 199) Exception in thread Thread[ReadStage:792,5,main]
java.lang.RuntimeException: java.lang.RuntimeException: java.io.FileNotFoundException: /data/ranks/positions/ranks-positions-jb-174-Data.db (No such file or directory)
        at org.apache.cassandra.service.StorageProxy$DroppableRunnable.run(StorageProxy.java:2008)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)
Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: /data/ranks/positions/ranks-positions-jb-174-Data.db (No such file or directory)
        at org.apache.cassandra.io.compress.CompressedRandomAccessReader.open(CompressedRandomAccessReader.java:47)
        at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile.createReader(CompressedPoolingSegmentedFile.java:48)
        at org.apache.cassandra.io.util.PoolingSegmentedFile.getSegment(PoolingSegmentedFile.java:39)
        at org.apache.cassandra.io.sstable.SSTableReader.getFileDataInput(SSTableReader.java:1239)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.getNextBlock(IndexedSliceReader.java:417)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader$IndexedBlockFetcher.fetchMoreData(IndexedSliceReader.java:387)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:145)
        at org.apache.cassandra.db.columniterator.IndexedSliceReader.computeNext(IndexedSliceReader.java:45)
        at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143)
        at com.google.common.collect.AbstractIterator.hasNext(A

Re: Cassandra Read Timeout

2015-03-19 Thread Jan
Yulian; 
Quote: "Raw size is around 190MB. There are bigger raws with similar structure 
(its index raws, which actually store keys) and everything is working fine on 
them; everything is also working fine on this CF but not on the other raw. 
Tables data from CFStats (the first table has bigger raws but works fine, whereas 
the second has timeouts):"
---
You said: "There are bigger raws with similar structure". Question: do you mean 
bigger rows? What is the structure of the statuspindexes keyspace, and which 
table are you querying within it?
You said: "its index raws, which actually stores keys". Question: do you mean 
index rows? How are you creating the indexes, and what type of indexes are they?
You said: "Tables data from CFStats, where the second has timeouts". Question: 
what is the timeout value set to, and what's different about these two tables? 
What are you querying from the second table?

Unfortunately, I have more questions than answers; however, despite the 
sacrilege of using super columns (lol), there has got to be a logical answer to 
the performance problem you are having. Hopefully we can dig in and find an 
answer.

Jan/  



 



 On Tuesday, February 24, 2015 12:00 PM, Robert Coli  
wrote:
   

 On Tue, Feb 24, 2015 at 8:50 AM, Yulian Oifa  wrote:

The structure is the same; the CFs are super column CFs, where the key is a long 
(a timestamp to partition the index, so every 11 days a new row is created), the 
super column is an int32, and the columns/values are timeuuids. I am running the 
same queries, getting a reversed slice by row key and super column.

Obligatory notice that Super Columns are not really recommended for use. I have 
no idea if the performance problem you are seeing is related to the use of 
Super Columns.
=Rob 

  

Re: Out of Memory Error While Opening SSTables on Startup

2015-03-19 Thread Jan
Paul Nickerson; 
curious, did you get a solution to your problem ? 
Regards,
Jan/



 On Tuesday, February 10, 2015 5:48 PM, Flavien Charlon 
 wrote:
   

 I already experienced the same problem (hundreds of thousands of SSTables) 
with Cassandra 2.1.2. It seems to appear when running an incremental repair 
while there is a medium to high insert load on the cluster. The repair goes in 
a bad state and starts creating way more SSTables than it should (even when 
there should be nothing to repair).
On 10 February 2015 at 15:46, Eric Stevens  wrote:

This kind of recovery is definitely not my strong point, so feedback on this 
approach would certainly be welcome.
 As I understand it, if you really want to keep that data, you ought to be able 
to mv it out of the way to get your node online, then move those files back in, 
several thousand at a time: nodetool refresh OpsCenter rollups60 && nodetool 
compact OpsCenter rollups60; rinse and repeat.  This should let you 
incrementally restore the data in that keyspace without putting so many 
sstables in there that it OOMs your cluster again.
On Tue, Feb 10, 2015 at 3:38 PM, Chris Lohfink  wrote:

yeah... probably just 2.1.2 things and not compactions.  Still probably want to 
do something about the 1.6 million files though.  It may be worth just 
mv/rm'ing the 60 sec rollup data though, unless you're really attached to it.
Chris
On Tue, Feb 10, 2015 at 4:04 PM, Paul Nickerson  wrote:

I was having trouble with snapshots failing while trying to repair that table 
(http://www.mail-archive.com/user@cassandra.apache.org/msg40686.html). I have a 
repair running on it now, and it seems to be going successfully this time. I am 
going to wait for that to finish, then try a manual nodetool compact. If that 
goes successfully, then would it be safe to chalk the lack of compaction on 
this table in the past up to 2.1.2 problems?

 ~ Paul Nickerson
On Tue, Feb 10, 2015 at 3:34 PM, Chris Lohfink  wrote:

Your cluster is probably having issues with compactions (with STCS you should 
never have this many).  I would probably punt with OpsCenter/rollups60: turn 
the node off and move all of the sstables off to a different directory for 
backup (or just rm them if you really don't care about 1-minute metrics), then turn 
the server back on. 
Once you get your cluster running again go back and investigate why compactions 
stopped, my guess is you hit an exception in past that killed your 
CompactionExecutor and things just built up slowly until you got to this point.
Chris
On Tue, Feb 10, 2015 at 2:15 PM, Paul Nickerson  wrote:

Thank you Rob. I tried a 12 GiB heap size, and still crashed out. There are 
1,617,289 files under OpsCenter/rollups60.
Once I downgraded Cassandra to 2.1.1 (apt-get install cassandra=2.1.1), I was 
able to start up Cassandra OK with the default heap size formula.
Now my cluster is running multiple versions of Cassandra. I think I will 
downgrade the rest to 2.1.1.
 ~ Paul Nickerson
On Tue, Feb 10, 2015 at 2:05 PM, Robert Coli  wrote:

On Tue, Feb 10, 2015 at 11:02 AM, Paul Nickerson  wrote:

 I am getting an out of memory error when I try to start Cassandra on one of my 
nodes. Cassandra will run for a minute, and then exit without outputting any 
error in the log file. It is happening while SSTableReader is opening a couple 
hundred thousand things.
... 
Does anyone know how I might get Cassandra on this node running again? I'm not 
very familiar with correctly tuning Java memory parameters, and I'm not sure if 
that's the right solution in this case anyway.

Try running 2.1.1, and/or increasing heap size beyond 8gb.
Are there actually that many SSTables on disk?
=Rob 













  

Re: Logging client ID for YCSB workloads on Cassandra?

2015-03-20 Thread Jan
HI Jatin; 
Besides enabling tracing, is there any other way to get the task done (to log the 
client ID for every operation)?
Please share the solution with the community, so that we can collectively learn 
from your experience. 
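One lightweight option besides full tracing is probabilistic tracing (the rate here is just an example value):

    nodetool settraceprobability 0.001   # sample ~0.1% of requests into system_traces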
Cheers
Jan/


 On Friday, February 20, 2015 12:48 PM, Jatin Ganhotra 
 wrote:
   

 Never mind, got it working.
Thanks :)
— 
Jatin Ganhotra
Graduate Student, Computer Science
University of Illinois at Urbana-Champaign
http://jatinganhotra.com
http://linkedin.com/in/jatinganhotra

On Wed, Feb 18, 2015 at 7:09 PM, Jatin Ganhotra  
wrote:

Hi,
I'd like to log the client ID for every operation performed by the YCSB on my 
Cassandra cluster.
The purpose is to identify & analyze various other consistency measures other 
than eventual consistency.
I wanted to know if people have done something similar in the past. Or am I 
missing something really basic here?
Please let me know if you need more information. Thanks
— 
Jatin Ganhotra




  

Re: Cluster status instability

2015-04-02 Thread Jan
Marcin; 
Are all your nodes within the same region? If not in the same region, what is 
the snitch type that you are using? 
Jan/ 


 On Thursday, April 2, 2015 3:28 AM, Michal Michalski 
 wrote:
   

 Hey Marcin,
Are they actually going up and down repeatedly (flapping), or just down and they 
never come back? There might be different reasons for flapping nodes, but to 
list what I have at the top of my head right now:
1. Network issues. I don't think it's your case, but you can read about the 
issues some people are having when deploying C* on AWS EC2 (keyword to look 
for: phi_convict_threshold; a config sketch follows below)
2. Heavy load. The node is under heavy load because of a massive number of reads / 
writes / bulkloads, or e.g. unthrottled compaction, which may result in 
extensive GC.
Could any of these be a problem in your case? I'd start by investigating GC 
logs, e.g. to see how long the "stop the world" full GC takes (GC logs 
should be on by default from what I can see [1])
[1] https://issues.apache.org/jira/browse/CASSANDRA-5319
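Regarding point 1, a cassandra.yaml sketch (8 is the default; 12 is a commonly suggested value on EC2):

    phi_convict_threshold: 12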
Michał

Kind regards,
Michał Michalski
michal.michal...@boxever.com
On 2 April 2015 at 11:05, Marcin Pietraszek  wrote:

Hi!

We have 56 node cluster with C* 2.0.13 + CASSANDRA-9036 patch
installed. Assume we have nodes A, B, C, D, E. On some irregular basis
one of those nodes starts to report that subset of other nodes is in
DN state although C* deamon on all nodes is running:

A$ nodetool status
UN B
DN C
DN D
UN E

B$ nodetool status
UN A
UN C
UN D
UN E

C$ nodetool status
DN A
UN B
UN D
UN E

After a restart of node A, C and D report that A is in UN, and A also
claims that the whole cluster is in the UN state. Right now I don't have any
clear steps to reproduce that situation; do you guys have any idea
what could be causing such behaviour? How could this be prevented?

It seems like when node A is a coordinator and gets a request for some
data replicated on C and D, it responds with an Unavailable
exception; after restarting A that problem disappears.

--
mp




  

Re: Combining two clusters/keyspaces into single cluster

2016-04-25 Thread Jan Kesten

Hi,

one way I think might work (but not tested in any way by me and there 
will be some lag / stale data):


- create keyspace2 on cluster1
- use nodetool flush and snapshot on cluster2, and remember the timestamp
- use sstableloader to write all sstables from the cluster2 snapshot to cluster1 (a sketch follows below)
- you can repeat the last two steps, using sstableloader only on sstables with 
mtime > timestamp, to add the differences to cluster1

- shut down cluster2 when done
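A shell sketch of those steps for a single table; hosts and paths are assumptions, and the staging copy is there because sstableloader infers keyspace/table from the last two directory components:

    nodetool flush keyspace2
    nodetool snapshot -t migrate keyspace2
    mkdir -p /tmp/load/keyspace2/mytable
    cp /var/lib/cassandra/data/keyspace2/mytable/snapshots/migrate/* /tmp/load/keyspace2/mytable/
    sstableloader -d cluster1-node1 /tmp/load/keyspace2/mytable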

Of course, data written by old clients to cluster2 won't be available in 
cluster1 until you load that data into it.


Just my 2 cents :)

Jan


Am 22.04.2016 um 01:15 schrieb Arlington Albertson:

Hey Folks,

I've been looking through various documentations, but I'm either 
overlooking something obvious or not wording it correctly, but the 
gist of my problem is this:


I have two cassandra clusters, with two separate keyspaces on EC2. 
We'll call them as follows:


*cluster1* (DC name, cluster name, etc...)
*keyspace1* (only exists on cluster1)

*cluster2* (DC name, cluster name, etc...)
*keyspace2* (only exists on cluster2)

I need to perform the following:
- take keyspace2, and add it to cluster1 so that all nodes can serve 
the traffic
- needs to happen "live" so that I can repoint new instances to the 
cluster1 endpoints and they'll just start working, and no longer 
directly use cluster2
- eventually, tear down cluster2 (easy with a `nodetool decommission` 
after verifying all seeds have been changed, etc...)


This doc seems to be the closest I've found thus far:
https://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_add_dc_to_cluster_t.html

Is that the appropriate guide for this and I'm just over thinking it? 
Or is there something else I should be looking at?


Also, this is DSC C* 2.1.13.

TIA!

-AA




Nodetool Cleanup Problem

2016-05-08 Thread Jan Ali
Hi All, 

I use Cassandra 3.4. When running the 'nodetool cleanup' command, I see this error:
error: Expecting URI in variable: [cassandra.config]. Found[cassandra.yaml]. 
Please prefix the file with [file:///] for local files and [file:///] 
for remote files. If you are executing this from an external tool, it needs to 
set Config.setClientMode(true) to avoid loading configuration.
-- StackTrace --
org.apache.cassandra.exceptions.ConfigurationException: Expecting URI in 
variable: [cassandra.config]. Found[cassandra.yaml]. Please prefix the file 
with [file:///] for local files and [file:///] for remote files. If you 
are executing this from an external tool, it needs to set 
Config.setClientMode(true) to avoid loading configuration.
    at 
org.apache.cassandra.config.YamlConfigurationLoader.getStorageConfigURL(YamlConfigurationLoader.java:78)
    at 
org.apache.cassandra.config.YamlConfigurationLoader.(YamlConfigurationLoader.java:92)
    at 
org.apache.cassandra.config.DatabaseDescriptor.loadConfig(DatabaseDescriptor.java:134)
    at 
org.apache.cassandra.config.DatabaseDescriptor.(DatabaseDescriptor.java:121)
    at 
org.apache.cassandra.config.CFMetaData$Builder.(CFMetaData.java:1160)
    at 
org.apache.cassandra.config.CFMetaData$Builder.create(CFMetaData.java:1175)
    at 
org.apache.cassandra.config.CFMetaData$Builder.create(CFMetaData.java:1170)
    at 
org.apache.cassandra.cql3.statements.CreateTableStatement.metadataBuilder(CreateTableStatement.java:118)
    at org.apache.cassandra.config.CFMetaData.compile(CFMetaData.java:413)
    at 
org.apache.cassandra.schema.SchemaKeyspace.compile(SchemaKeyspace.java:238)
    at 
org.apache.cassandra.schema.SchemaKeyspace.(SchemaKeyspace.java:88)
    at org.apache.cassandra.config.Schema.(Schema.java:96)
    at org.apache.cassandra.config.Schema.(Schema.java:50)
    at org.apache.cassandra.tools.nodetool.Cleanup.execute(Cleanup.java:45)
    at org.apache.cassandra.tools.NodeTool$NodeToolCmd.run(NodeTool.java:248)
    at org.apache.cassandra.tools.NodeTool.main(NodeTool.java:162)

Can anyone help me?
Best regards,
Jan Ali


Are updates on map columns causing tombstones?

2016-07-11 Thread Jan Algermissen

Hi,

when I replace the content of a map-valued column (when I replace the complete 
map), will this create tombstones for those map entries that are not present in 
the new map?

My expectation is 'yes', because the map is laid out as normal columns 
internally so keys not in the new map should lead to a delete.

Is that correct?
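A short CQL sketch of the two write shapes (table and column names are assumptions):

    -- full replace: deletes the old map contents (a range tombstone), then writes the new entries
    UPDATE t SET m = {'a': 1, 'b': 2} WHERE id = 'x';
    -- per-key update: only touches the named key, no tombstones for the rest of the map
    UPDATE t SET m['a'] = 1 WHERE id = 'x';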

Jan


Re: Thousands of SSTables generated in only one node

2016-10-25 Thread Jan Kesten

Hi Lahiru,

maybe your node was running out of memory before. I saw this behaviour 
when available heap is low, forcing memtables to be flushed out to sstables 
quite often.


If this is what is hitting you, you should see that the sstables 
are really small.


To clean up, nodetool compact would do the job - but if you do not need the 
data from one of the keyspaces at all, just drop and recreate it (but 
look into your data directory in case there are snapshots left). To prevent this 
in the future, keep a close eye on heap consumption and maybe give the node more 
memory.


HTH,
Jan


Re: Thousands of SSTables generated in only one node

2016-10-25 Thread Jan Kesten

Hi Lahiru,

2.1.0 is also quite old (Sep 2014) - and just from memory I 
remembered that there was an issue we had with cold_reads_to_omit:


http://grokbase.com/t/cassandra/user/1523sm4y0r/how-to-deal-with-too-many-sstables
https://www.mail-archive.com/search?l=user@cassandra.apache.org&q=subject:%22Re%3A+Compaction+failing+to+trigger%22&o=newest&f=1

Those are just random Google hits, but maybe they also help.
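A CQL sketch of the workaround discussed in those threads (the table name is a placeholder; the option existed in the 2.0/2.1 era and was removed in later versions):

    ALTER TABLE my_keyspace.my_table
      WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'cold_reads_to_omit': 0.0};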

I ended up with a few thousand sstables smaller than 1MB in size. 
However, I would suggest upgrading to a newer version of Cassandra first 
before diving too deep into this - maybe 2.1.16 or 2.2.8 - as chances 
are really good your problems will be gone after that.


Regards.
Jan



Re: Hotspots / Load on Cassandra node

2016-10-25 Thread Jan Kesten
Hi,

can you check the size of your data directories on that machine to verify in 
comparison to the others?

Have a look for snapshot directories which could still be there from a former 
table or keyspace.

Regards,
Jan

On 26 October 2016 at 06:53:03 MESZ, Harikrishnan A wrote:
>Hello,
>When I issue nodetool status, I see that the load (in GB) on one of
>the nodes is high compared to the other nodes in my ring.
>I do not see any issues with the data modeling, and it looks like the
>partition sizes are almost evenly sized and distributed across the
>nodes.  Repairs are running properly.   
>How do I approach and fix this issue?. 
>
>Thanks & Regards,Hari

-- 
This message was sent from my Android mobile phone with K-9 Mail.

Rust Cassandra Driver?

2016-11-26 Thread Jan Algermissen

Hi,

I am looking for a driver for the Rust language. I found some projects 
which seem quite abandoned.


Can someone point me to the driver that makes the most sense to look at 
or help work on?


Cheers,

Jan


Re: Cluster scaling

2017-02-08 Thread Jan Kesten

Hi Branislav,

what is it you would expect?

Some thoughts:

Batches are often misunderstood: they work well only if they contain 
a single partition key - think of a batch of different sensor readings for 
one key. If you group batches with many partition keys and/or build large 
batches, this puts high load on the coordinator node, which then itself 
needs to talk to the nodes holding the partitions. This could explain 
the scaling you see in your second try without batches. Keep in mind 
that the driver supports executeAsync and ResultSetFutures.


Second, put the commitlog and data directories on separate disks when using 
spindles.


Third, have you monitored iostat and CPU stats while running your tests?

Cheers,

Jan

On 08.02.2017 at 16:39, Branislav Janosik -T (bjanosik - AAP3 INC 
at Cisco) wrote:


Hi all,

I have a cluster of three nodes and would like to ask some questions 
about the performance.


I wrote a small benchmarking tool in java that mirrors (read, write) 
operations that we do in the real project.


The problem is that it is not scaling like it should. The program runs two 
tests: one using a batch statement and one without the batch.


The operation sequence is: optional select, insert, update, insert. I 
run the tool on my server with 128 threads (# of threads has no 
influence on the performance),


creating usually 100K resources for testing purposes.

The average results (operations per second) with the use of batch 
statement are:


Replication Factor = 1    with reading    without reading

1-node cluster 37K 46K

2-node cluster 37K 47K

3-node cluster 39K 70K

Replication Factor = 2    with reading    without reading

2-node cluster 21K 40K

3-node cluster 30K 48K

The average results (operations per second) without the use of batch 
statement are:


Replication Factor = 1    with reading    without reading

1-node cluster 31K 20K

2-node cluster 38K 39K

3-node cluster 45K 87K

Replication Factor = 2    with reading    without reading

2-node cluster 19K 22K

3-node cluster 26K 36K

The Cassandra VM specs are: 16 CPUs, 16GB and two 32GB of RAM, and at 
least 30GB of disk space for each node. Non-SSD; each VM is on a 
separate physical server.


The code is available here: 
https://github.com/bjanosik/CassandraBenchTool.git . It can be built 
with Maven, and then you can use the jar in the target directory with java -jar 
target/cassandra-test-1.0-SNAPSHOT-jar-with-dependencies.jar .


Thank you for any help.



--
Jan Kesten, mailto:j.kes...@enercast.de
Tel.: +49 561/4739664-0 FAX: -9 Mobil: +49 160 / 90 98 41 68
enercast GmbH Universitätsplatz 12 D-34127 Kassel HRB15471
http://www.enercast.de Online-Prognosen für erneuerbare Energien
Geschäftsführung: Thomas Landgraf (CEO), Bernd Kratz (CTO), Philipp Rinder (CSO)




Re: Count(*) is not working

2017-02-16 Thread Jan Kesten

Hi,

Did you get a result finally?

Those messages are simply warnings telling you that C* had to read many 
tombstones while processing your query - rows that are deleted but not yet 
garbage collected/compacted. This warning gives you some explanation of why 
things might be much slower than expected: for every 100 live rows counted, 
C* had to read about 15 times as many rows that were already deleted.


Apart from that, count(*) is almost always slow - and there is a default 
limit of 10,000 rows in a result.


Do you really need the actual live count? To get an idea you can always 
look at nodetool cfstats (but those numbers also contain deleted rows).
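For example (names are placeholders); the key estimate avoids a full table scan:

    nodetool cfstats my_keyspace.my_table | grep -E 'Number of keys|Read Count'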



On 16.02.2017 at 13:18, Selvam Raman wrote:

Hi,

I want to know the total records count in table.

I fired the below query:
 select count(*) from tablename;

and I got the output below:

Read 100 live rows and 1423 tombstone cells for query SELECT * FROM 
keysace.table WHERE token(id) > 
token(test:ODP0144-0883E-022R-002/047-052) LIMIT 100 (see 
tombstone_warn_threshold)


Read 100 live rows and 1435 tombstone cells for query SELECT * FROM 
keysace.table WHERE token(id) > token(test:2565-AMK-2) LIMIT 100 (see 
tombstone_warn_threshold)


Read 96 live rows and 1385 tombstone cells for query SELECT * FROM 
keysace.table WHERE token(id) > token(test:-2220-UV033/04) LIMIT 100 
(see tombstone_warn_threshold).





Can you please help me to get the total count of the table.

--
Selvam Raman
"லஞ்சம் தவிர்த்து நெஞ்சம் நிமிர்த்து"


Re: Read after Write inconsistent at times

2017-02-24 Thread Jan Kesten

Hi,

Are your nodes under high load? Are there any dropped messages (nodetool 
tpstats) on any node?


Also have a look at your system clocks. C* needs them in tight sync - 
via ntp, for example. Side hint: if you use ntp, use the same set of 
upstreams on all of your nodes - ideally your own. Using pool.ntp.org 
might lead to small drifts in time across your cluster.
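A quick sketch to compare clocks across nodes (assumes ntpd's ntpq is installed):

    ntpq -p    # shows offset/jitter against the configured upstream servers; run on every node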


Another thing that could help you out is using client side timestamps: 
https://docs.datastax.com/en/developer/java-driver/3.1/manual/query_timestamps/ 
(of course only when you are using a single client or all clients are in 
sync via ntp).



On 24.02.2017 at 07:29, Charulata Sharma (charshar) wrote:


Hi All,

In my application, sometimes I cannot read data that just got inserted. 
This happens very intermittently. Both writes and reads use LOCAL_QUORUM.


We have a cluster of 12 nodes which spans 2 data centers, with an 
RF of 3.


Has anyone encountered this problem, and if yes, what steps have you 
taken to solve it?


Thanks,
Charu


--
Jan Kesten, mailto:j.kes...@enercast.de
Tel.: +49 561/4739664-0 FAX: -9 Mobil: +49 160 / 90 98 41 68
enercast GmbH Universitätsplatz 12 D-34127 Kassel HRB15471
http://www.enercast.de Online-Prognosen für erneuerbare Energien
Geschäftsführung: Thomas Landgraf (CEO), Bernd Kratz (CTO), Philipp Rinder (CSO)




LOCAL_SERIAL

2015-10-15 Thread Jan Algermissen

Hi,

suppose I have two data centers and want to coordinate a bunch of services in 
each data center (for example to load data into a per-DC system that is not 
DC-aware (Solr)).

Does it make sense to use CAS functionality with explicit LOCAL_SERIAL to 
'elect' a leader per data center to do the work?

So instead of saying 'for this query, LOCAL_SERIAL is enough for me', this would 
be like saying 'I want XYZ to happen exactly once per data center'. All 
services would try to do XYZ, but only one instance *per datacenter* would 
actually become the leader and succeed.
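A minimal CQL sketch of the idea; the table and values are assumptions, and the driver has to be configured to use LOCAL_SERIAL as its serial consistency level:

    INSERT INTO leases (task, dc, owner)
    VALUES ('solr-load', 'us-east', 'host-a')
    IF NOT EXISTS;

Only the first such insert for a given primary key succeeds; the others see [applied] = false and back off.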

Makes sense?

Jan


ClosedChannelExcption while nodetool repair

2016-01-12 Thread Jan Kesten
Hi,

I have had some problems recently on my Cassandra cluster. I am running 12
nodes with 2.2.4, and I see errors while repairing with a plain "nodetool repair". In
system.log I can find

ERROR [STREAM-IN-/172.17.2.233] 2016-01-08 08:32:38,327
StreamSession.java:524 - [Stream #5f96e8b0-b5e2-11e5-b4da-4321ac9959ef]
Streaming error occurred
java.nio.channels.ClosedChannelException: null

on one node and at the same time in the the node mentioned in the Log:

INFO  [STREAM-IN-/172.17.2.223] 2016-01-08 08:32:38,073
StreamResultFuture.java:168 - [Stream
#5f96e8b0-b5e2-11e5-b4da-4321ac9959ef ID#0] Prepare completed. Receiving
2 files(46708049 bytes), sending 2 files(1856721742 bytes)
ERROR [STREAM-OUT-/172.17.2.223] 2016-01-08 08:32:38,325
StreamSession.java:524 - [Stream #5f96e8b0-b5e2-11e5-b4da-4321ac9959ef]
Streaming error occurred
org.apache.cassandra.io.FSReadError: java.io.IOException: Datenübergabe
unterbrochen (broken pipe)
at
org.apache.cassandra.io.util.ChannelProxy.transferTo(ChannelProxy.java:144)
~[apache-cassandra-2.2.4.jar:2.2.4]



More complete log can be found here:

http://pastebin.com/n6DjCCed
http://pastebin.com/6rD5XNwU

I already did a nodetool scrub.

Any suggestions what is causing this?

Thanks in advance,
Jan


Re: Cassandra is consuming a lot of disk space

2016-01-13 Thread Jan Kesten
Hi Rahul,

just an idea: did you have a look at the data directories on disk 
(/var/lib/cassandra/data)? It could be that there are some left from old keyspaces 
that have been deleted and snapshotted before. Try something like "du -sh 
/var/lib/cassandra/data/*" to verify which keyspace is consuming your space.

Jan

Sent from my iPhone

> Am 14.01.2016 um 07:25 schrieb Rahul Ramesh :
> 
> Thanks for your suggestion. 
> 
> Compaction was happening on one of the large tables. The disk space did not 
> decrease much after the compaction. So I ran an external compaction. The disk 
> space decreased by around 10%. However it is still consuming close to 750Gb 
> for load of 250Gb. 
> 
> I even restarted cassandra thinking there may be some open files. However it 
> didnt help much. 
> 
> Is there any way to find out why so much of data is being consumed? 
> 
> I checked if there are any open files using lsof. There are not any open 
> files.
> 
> Recovery:
> Just a wild thought 
> I am using replication factor of 2 and I have two nodes. If I delete complete 
> data on one of the node, will I be able to recover all the data from the 
> active node? 
> I don't want to pursue this path as I want to find out the root cause of the 
> issue! 
> 
> 
> Any help will be greatly appreciated
> 
> Thank you,
> 
> Rahul
> 
> 
> 
> 
> 
> 
>> On Wed, Jan 13, 2016 at 3:37 PM, Carlos Rolo  wrote:
>> You can check if the snapshot exists in the snapshot folder.
>> Repairs stream sstables over, than can temporary increase disk space. But I 
>> think Carlos Alonso might be correct. Running compactions might be the issue.
>> 
>> Regards,
>> 
>> Carlos Juzarte Rolo
>> Cassandra Consultant
>>  
>> Pythian - Love your data
>> 
>> rolo@pythian | Twitter: @cjrolo | Linkedin: linkedin.com/in/carlosjuzarterolo
>> Mobile: +351 91 891 81 00 | Tel: +1 613 565 8696 x1649
>> www.pythian.com
>> 
>>> On Wed, Jan 13, 2016 at 9:24 AM, Carlos Alonso  wrote:
>>> I'd have a look also at possible running compactions.
>>> 
>>> If you have big column families with STCS then large compactions may be 
>>> happening.
>>> 
>>> Check it with nodetool compactionstats
>>> 
>>> Carlos Alonso | Software Engineer | @calonso
>>> 
>>>> On 13 January 2016 at 05:22, Kevin O'Connor  wrote:
>>>> Have you tried restarting? It's possible there's open file handles to 
>>>> sstables that have been compacted away. You can verify by doing lsof and 
>>>> grepping for DEL or deleted. 
>>>> 
>>>> If it's not that, you can run nodetool cleanup on each node to scan all of 
>>>> the sstables on disk and remove anything that it's not responsible for. 
>>>> Generally this would only work if you added nodes recently. 
>>>> 
>>>> 
>>>>> On Tuesday, January 12, 2016, Rahul Ramesh  wrote:
>>>>> We have a 2 node Cassandra cluster with a replication factor of 2. 
>>>>> 
>>>>> The load factor on the nodes is around 350Gb
>>>>> 
>>>>> Datacenter: Cassandra
>>>>> ==
>>>>> Address  RackStatus State   LoadOwns  
>>>>>   Token   
>>>>>   
>>>>>   -5072018636360415943
>>>>> 172.31.7.91  rack1   Up Normal  328.5 GB100.00%   
>>>>>   -7068746880841807701   
>>>>> 172.31.7.92  rack1   Up Normal  351.7 GB100.00%   
>>>>>   -5072018636360415943
>>>>> 
>>>>> However,if I use df -h, 
>>>>> 
>>>>> /dev/xvdf   252G  223G   17G  94% /HDD1
>>>>> /dev/xvdg   493G  456G   12G  98% /HDD2
>>>>> /dev/xvdh   197G  167G   21G  90% /HDD3
>>>>> 
>>>>> 
>>>>> HDD1,2,3 contains only cassandra data. It amounts to close to 1Tb in one 
>>>>> of the machine and in another machine it is close to 650Gb. 
>>>>> 
>>>>> I started repair 2 days ago, after running repair, the amount of disk 
>>>>> space consumption has actually increased. 
>>>>> I also checked if this is because of snapshots. nodetool listsnapshot 
>>>>> intermittently lists a snapshot but it goes away after sometime. 
>>>>> 
>>>>> Can somebody please help me understand, 
>>>>> 1. why so much disk space is consumed?
>>>>> 2. Why did it increase after repair?
>>>>> 3. Is there any way to recover from this state.
>>>>> 
>>>>> 
>>>>> Thanks,
>>>>> Rahul
>> 
>> 
>> --
>> 
> 


Re: Cassandra is consuming a lot of disk space

2016-01-14 Thread Jan Kesten
Hi Rahul,

it should work as you would expect - simply copy the sstables over from
your extra disk to the original one. To minimize downtime of the node
you can do something like this:

- rsync the files while the node is still running (sstables are
immutable) to copy most of the data (a sketch follows below)
- edit cassandra.yaml to remove the additional data dir
- shut down the node
- rsync again (just in case a new sstable got written while the
first rsync was running)
- restart
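A sketch of the copy itself (paths are assumptions):

    rsync -av /mnt/extra-disk/cassandra/data/ /var/lib/cassandra/data/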

HTH
Jan

Am 14.01.2016 um 08:38 schrieb Rahul Ramesh:
> One update. I cleared the snapshot using nodetool clearsnapshot command.
> Disk space is recovered now. 
> 
> Because of this issue, I have mounted one more drive to the server and
> there are some data files there. How can I migrate the data so that I
> can decommission the drive? 
> Will it work if I just copy all the contents in the table directory to
> one of the drives? 
> 
> Thanks for all the help.
> 
> Regards,
> Rahul
> 
> On Thursday 14 January 2016, Rahul Ramesh wrote:
> 
> Hi Jan,
> I checked it. There are no old Key Spaces or tables.
> Thanks for your pointer, I started looking inside the directories. I
> see lot of snapshots directory inside the table directory. These
> directories are consuming space.
> 
> However these snapshots are not shown  when I issue listsnapshots
> ./bin/nodetool listsnapshots
> Snapshot Details: 
> There are no snapshots
> 
> Can I safely delete those snapshots? why listsnapshots is not
> showing the snapshots? Also in future, how can we find out if there
>     are snapshots?
> 
> Thanks,
> Rahul
> 
> 
> 
> On Thu, Jan 14, 2016 at 12:50 PM, Jan Kesten wrote:
> 
> Hi Rahul,
> 
> just an idea, did you have a look at the data directories on disk
> (/var/lib/cassandra/data)? It could be that there are some from
> old keyspaces that have been deleted and snapshotted before. Try
> something like "du -sh /var/lib/cassandra/data/*" to verify
> which keyspace is consuming your space.
> 
> Jan
> 
> Sent from my iPhone
> 
> On 14.01.2016 at 07:25, Rahul Ramesh  > wrote:
> 
>> Thanks for your suggestion. 
>>
>> Compaction was happening on one of the large tables. The disk
>> space did not decrease much after the compaction. So I ran an
>> external compaction. The disk space decreased by around 10%.
>> However it is still consuming close to 750Gb for load of 250Gb. 
>>
>> I even restarted cassandra thinking there may be some open
>> files. However it didn't help much. 
>>
>> Is there any way to find out why so much of data is being
>> consumed? 
>>
>> I checked if there are any open files using lsof. There are
>> not any open files.
>>
>> *Recovery:*
>> Just a wild thought 
>> I am using replication factor of 2 and I have two nodes. If I
>> delete the complete data on one of the nodes, will I be able to
>> recover all the data from the active node? 
>> I don't want to pursue this path as I want to find out the
>> root cause of the issue! 
>>
>>
>> Any help will be greatly appreciated
>>
>> Thank you,
>>
>> Rahul
>>
>>
>>
>>
>>
>>
>> On Wed, Jan 13, 2016 at 3:37 PM, Carlos Rolo > > wrote:
>>
>> You can check if the snapshot exists in the snapshot folder.
>> Repairs stream sstables over, which can temporarily increase
>> disk space. But I think Carlos Alonso might be correct.
>> Running compactions might be the issue.
>>
>> Regards,
>>
>> Carlos Juzarte Rolo
>>
>> On Wed, Jan 13, 2016 at 9:24 AM, Carlos Alonso
>> > > wrote:
>>
>> I'd have a look also at possible running compactions.
>>
>>

Re: compaction throughput

2016-01-29 Thread Jan Karlsson
Keep in mind that compaction in LCS can only run 1 compaction per level. 
Even if it wants to run more compactions in L0 it might be blocked 
because it is already running a compaction in L0.


BR
Jan

On 01/16/2016 01:26 AM, Sebastian Estevez wrote:


LCS is IO intensive but CPU is also relevant.

On slower disks compaction may not be cpu bound.

If you aren't seeing more than one compaction thread at a time, I 
suspect your system is not compaction bound.


all the best,

Sebastián

On Jan 15, 2016 7:20 PM, "Kai Wang" <mailto:dep...@gmail.com>> wrote:


Sebastian,

Because I have this impression that LCS is IO intensive and it's
recommended only on SSDs. So I am curious to see how far it can
stress those SSDs. But it turns out the most expensive part about
LCS is not IO bound but CPU bound, or more precisely single core
speed bound. This is a little surprising.

Of course LCS is still superior in other aspects.

On Jan 15, 2016 6:34 PM, "Sebastian Estevez"
mailto:sebastian.este...@datastax.com>> wrote:

Correct.

Why are you concerned with the raw throughput, are you
accumulating pending compactions? Are you seeing high sstables
per read statistics?

all the best,

Sebastián

On Jan 15, 2016 6:18 PM, "Kai Wang" mailto:dep...@gmail.com>> wrote:

Jeff & Sebastian,

Thanks for the reply. There are 12 cores but in my case C*
only uses one core most of the time. *nodetool
compactionstats* shows there's only one compactor running.
I can see the C* process only uses one core. So I guess I
should've asked the question more clearly:

1. Is ~25 M/s a reasonable compaction throughput for one core?
2. Is there any configuration that affects single core
compaction throughput?
3. Is concurrent_compactors the only option to parallelize
compaction? If so, I guess it's the compaction strategy
itself that decides when to parallelize and when to block
on one core. Then there's not much we can do here.

Thanks.

On Fri, Jan 15, 2016 at 5:23 PM, Jeff Jirsa
mailto:jeff.ji...@crowdstrike.com>> wrote:

With SSDs, the typical recommendation is up to 0.8-1
compactor per core (depending on other load). How many
CPU cores do you have?


From: Kai Wang
Reply-To: "user@cassandra.apache.org
<mailto:user@cassandra.apache.org>"
Date: Friday, January 15, 2016 at 12:53 PM
To: "user@cassandra.apache.org
<mailto:user@cassandra.apache.org>"
Subject: compaction throughput

Hi,

I am trying to figure out the bottleneck of compaction
on my node. The node is CentOS 7 and has SSDs
installed. The table is configured to use LCS. Here is
my compaction related configs in cassandra.yaml:

compaction_throughput_mb_per_sec: 160
concurrent_compactors: 4

I insert about 10G of data and start observing compaction.

*nodetool compactionstats* shows most of the time there is one
compaction. Sometimes there are 3-4 (I suppose this is
controlled by concurrent_compactors). During the
compaction, I see one CPU core is 100%. At that point,
disk IO is about 20-25 M/s write which is much lower
than the disk is capable of. Even when there are 4
compactions running, I see CPU go to +400% but disk IO
is still at 20-25M/s write. I use *nodetool
setcompactionthroughput 0* to disable the compaction
throttle but don't see any difference.

Does this mean compaction is CPU bound? If so 20M/s is
kinda low. Is there anyway to improve the throughput?

Thanks.






Re: Sudden disk usage

2016-02-13 Thread Jan Kesten
Hi,

what kind of compaction strategy do you use? What you are seeing is very likely a 
compaction - think of 4 sstables of 50GB each: compacting those can take up an 
extra 200GB while the new sstable is being rewritten. After that the old ones are 
deleted and the space is freed again. 

If you use SizeTieredCompaction you can end up with very huge sstables, as I do 
(>250GB each). In the worst case you could possibly need twice the space - a 
reason why I set my disk monitoring to alert at 45% usage.
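
For reference, a quick way to spot oversized sstables (default data directory and 
GNU find assumed):

find /var/lib/cassandra/data -name '*-Data.db' -size +100G -exec ls -lh {} \;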

Just my 2 cents.
Jan

Sent from my iPhone

> On 13.02.2016 at 08:48, Branton Davis  wrote:
> 
> One of our clusters had a strange thing happen tonight.  It's a 3 node 
> cluster, running 2.1.10.  The primary keyspace has RF 3, vnodes with 256 
> tokens.
> 
> This evening, over the course of about 6 hours, disk usage increased from 
> around 700GB to around 900GB on only one node.  I was at a loss as to what 
> was happening and, on a whim, decided to run nodetool cleanup on the 
> instance.  I had no reason to believe that it was necessary, as no nodes were 
> added or tokens moved (not intentionally, anyhow).  But it immediately 
> cleared up that extra space.
> 
> I'm pretty lost as to what would have happened here.  Any ideas where to look?
> 
> Thanks!
> 


Re: Forming a cluster of embedded Cassandra instances

2016-02-14 Thread Jan Kesten
Hi,

an embedded cassandra to speed up getting into the project may well work for 
developers; we used it for junit. But a simple clone and maven build - I guess 
it will end up as a single-node cassandra cluster. Remember cassandra is a 
distributed database: one will need more than one node to get performance and 
fault tolerance. Also, I would not recommend adding and removing cluster 
nodes at high frequency through application start-stop cycles.

To help in getting things up and running, provide a small readme for 
downloading and starting cassandra. For mac and linux, unpacking the tar.gz and 
running bin/cassandra is not too complicated. Or point to the DataStax 
Community Edition installers. Apart from installing Java, that is a five-minute 
step to a single-node "TestCluster".
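
A sketch of what such a readme could boil down to (version number and mirror URL 
are placeholders - adjust to the release you target):

curl -LO http://archive.apache.org/dist/cassandra/2.1.13/apache-cassandra-2.1.13-bin.tar.gz
tar xzf apache-cassandra-2.1.13-bin.tar.gz
cd apache-cassandra-2.1.13
bin/cassandra -f    # -f keeps it in the foreground; then bin/cqlsh connects to localhost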

Configuring a distributed setup is a bit - or a lot - more difficult and 
definitely needs more understanding and planning. 

Just as a hint, and offtopic: I saw people using cassandra as application glue 
for interprocess communication, where every app server started a node (for 
communication, sessions, as a queue and so on). If that happens to be a use 
case - have a look at hazelcast. 

Jan

Sent from my iPhone

> On 14.02.2016 at 23:26, John Sanda  wrote:
> 
> The motivation was to make it easy for someone to get up and running quickly 
> with the project. Clone the git repo, run the maven build, and then you are 
> all set. It definitely does lower the learning curve for someone just getting 
> started with a project and who is not really thinking about Cassandra. It 
> also is convenient for non-devs who need to quickly get the project up and 
> running. For development, we have people working on Linux, Mac OS X, and 
> Windows. I am not a Windows user and not even sure if ccm works on Windows, 
> so ccm can't be the de facto standard for development.
> 
>> On Sun, Feb 14, 2016 at 2:52 PM, Jack Krupansky  
>> wrote:
>> What motivated the use of an embedded instance for development - as opposed 
>> to simply spawning a process for Cassandra?
>> 
>> 
>> 
>> -- Jack Krupansky
>> 
>>> On Sun, Feb 14, 2016 at 2:05 PM, John Sanda  wrote:
>>> The project I work on day to day uses an embedded instance of Cassandra, 
>>> but it is intended for primarily for development. We embed Cassandra in a 
>>> WildFly (i.e., JBoss) server. It is packaged and deployed as an EAR. I 
>>> personally do not do this. I use and recommend ccm for development. If you 
>>> do use WildFly, there is also wildfly-cassandra which deploys Cassandra as 
>>> a custom WildFly extension. In other words it is deployed in WildFly like 
>>> other subsystems like EJB, web, etc, not like an application. There isn't a 
>>> whole lot of active development on this, but it could be another option.
>>> 
>>> For production, we have to support single node clusters (not embedded 
>>> though), and it has been challenging for pretty much all the reasons you 
>>> find people saying not to do so.
>>> 
>>> As for failure detection and cluster membership changes, are you using the 
>>> Datastax driver? You can register an event listener with the driver to 
>>> receive notifications for those things.
>>> 
>>>> On Sat, Feb 13, 2016 at 6:33 PM, Jonathan Haddad  
>>>> wrote:
>>>> +1 to what jack said. Don't mess with embedded till you understand the 
>>>> basics of the db. You're not making your system any less complex, I'd say 
>>>> you're most likely going to shoot yourself in the foot. 
>>>>> On Sat, Feb 13, 2016 at 2:22 PM Jack Krupansky  
>>>>> wrote:
>>>>> HA requires an odd number of replicas - 3, 5, 7 - so that split-brain can 
>>>>> be avoided. Two nodes would not support HA. You need to be able to reach 
>>>>> a quorum, which is defined as n/2+1 where n is the number of replicas. 
>>>>> IOW, you cannot update the data if a quorum cannot be reached. The data 
>>>>> on any given node needs to be replicated on at least two other nodes.
>>>>> 
>>>>> Embedded Cassandra is only for extremely sophisticated developers - not 
>>>>> those who are new to Cassandra, with a "superficial understanding".
>>>>> 
>>>>> As a general proposition, you should not be running application code on 
>>>>> Cassandra nodes.
>>>>> 
>>>>> That said, if any of the senior Cassandra developers wish to personally 
>>>>> support your efforts towards embedded clusters, they are

Re: Cassandra nodes reduce disks per node

2016-02-18 Thread Jan Kesten
Hi Branton,

two cents from me - I didn't look through the script, but for the rsyncs I do 
pretty much the same when moving data. Since sstables are immutable, I do a first 
sync to the new location while everything is up and running, which runs really 
long. Meanwhile new sstables are created, so I sync them again online - much fewer 
files to copy now. After that I shut down the node, and my last rsync then has to 
copy only a few files, which is quite fast, so the downtime for that node is 
within minutes.

Jan



Sent from my iPhone

> On 18.02.2016 at 22:12, Branton Davis  wrote:
> 
> Alain, thanks for sharing!  I'm confused why you do so many repetitive 
> rsyncs.  Just being cautious or is there another reason?  Also, why do you 
> have --delete-before when you're copying data to a temp (assumed empty) 
> directory?
> 
>> On Thu, Feb 18, 2016 at 4:12 AM, Alain RODRIGUEZ  wrote:
>> I did the process a few weeks ago and ended up writing a runbook and a 
>> script. I have anonymised and share it fwiw.
>> 
>> https://github.com/arodrime/cassandra-tools/tree/master/remove_disk
>> 
>> It is basic bash. I tried to have the shortest downtime possible, making 
>> this a bit more complex, but it allows you to do a lot in parallel and only 
>> do a fast operation sequentially, reducing overall operation time.
>> 
>> This worked fine for me, yet I might have made some errors while making it 
>> configurable through variables. Be sure to be around if you decide to run 
>> this. Also I automated this more by using knife (Chef), I hate to repeat 
>> ops, this is something you might want to consider.
>> 
>> Hope this is useful,
>> 
>> C*heers,
>> -
>> Alain Rodriguez
>> France
>> 
>> The Last Pickle
>> http://www.thelastpickle.com
>> 
>> 2016-02-18 8:28 GMT+01:00 Anishek Agarwal :
>>> Hey Branton,
>>> 
>>> Please do let us know if you face any problems  doing this.
>>> 
>>> Thanks
>>> anishek
>>> 
>>>> On Thu, Feb 18, 2016 at 3:33 AM, Branton Davis 
>>>>  wrote:
>>>> We're about to do the same thing.  It shouldn't be necessary to shut down 
>>>> the entire cluster, right?
>>>> 
>>>>> On Wed, Feb 17, 2016 at 12:45 PM, Robert Coli  
>>>>> wrote:
>>>>> 
>>>>> 
>>>>>> On Tue, Feb 16, 2016 at 11:29 PM, Anishek Agarwal  
>>>>>> wrote:
>>>>>> To accomplish this can I just copy the data from disk1 to disk2 with in 
>>>>>> the relevant cassandra home location folders, change the cassanda.yaml 
>>>>>> configuration and restart the node. before starting i will shutdown the 
>>>>>> cluster.
>>>>> 
>>>>> Yes.
>>>>> 
>>>>> =Rob
> 


Thrift composite partition key to cql migration

2016-03-30 Thread Jan Kesten

Hi,

while migrating the remainder of the thrift operations in my application I 
came across a point where I can't find a good hint.


In our old code we used a composite with two strings as row / partition 
key and a similar composite as column key like this:


public Composite rowKey() {
final Composite composite = new Composite();
composite.addComponent(key1, StringSerializer.get());
composite.addComponent(key2, StringSerializer.get());
return composite;
}

public Composite columnKey() {
final Composite composite = new Composite();
composite.addComponent(key3, StringSerializer.get());
composite.addComponent(key4, StringSerializer.get());
return composite;
}

In cql this columnfamiliy looks like this:

CREATE TABLE foo.bar (
key blob,
column1 text,
column2 text,
value blob,
PRIMARY KEY (key, column1, column2)
)

The column keys key3 and key4 became column1 and column2 - but the old 
row key is presented as a blob (I can put it into a hex editor and see that 
the key1 and key2 values are in there).
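
For reference - this is from memory, so treat it as an assumption to verify: 
CompositeType encodes each component as a 2-byte big-endian length, the 
component's bytes, and a trailing end-of-component byte (0x00). A client can 
therefore split key1 and key2 back out of the blob, but CQL itself will keep 
presenting the partition key as a single blob unless the table is rewritten 
with the expected schema shown below.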


Any pointers on how to handle this, or is this a known issue? I am now using 
the DataStax Java driver for CQL; the old connector used thrift. Is there any 
way to get key1 and key2 back apart from completely rewriting the table? 
This is what I had expected it to be:


CREATE TABLE foo.bar (
key1 text,
key2 text,
column1 text,
column2 text,
value blob,
PRIMARY KEY ((key1, key2), column1, column2)
)

Cheers,
Jan


Re: NTP Synchronization Setup Changes

2016-03-30 Thread Jan Kesten
Hi Mickey,

I would strongly suggest setting up an NTP server on your site - this is not 
really a big deal and, with some tutorials from the net, quickly done. Then 
configure your cassandra nodes (and all the rest if you like) to use your own NTP 
server instead of public ones. As I have learned the hard way, cassandra is not 
really happy when nodes have different times.

The benefit of this is that your nodes will keep their time in sync even without a 
connection to the internet. Of course "your time" may drift without a proper 
time source or connection, but all nodes will have the same drift and so no 
problems with consistency. Once your NTP server syncs again, your nodes will be 
adjusted smoothly.
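
A minimal ntpd sketch for that setup (server names are placeholders; the 
local-clock fallback is standard ntpd idiom):

# /etc/ntp.conf on the site NTP server
server 0.pool.ntp.org iburst     # only reachable while the uplink is up
server 127.127.1.0               # local clock as fallback
fudge  127.127.1.0 stratum 10

# /etc/ntp.conf on each cassandra node
server <site-ntp-ip> iburst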

Pro(?) solution (what I did before): attach a GPS receiver ("GPS mouse") to your 
NTP server and use that as the time source. That way you have synchronized _and_ 
accurate time without any connection to public NTP servers, as the GPS satellites 
are flying atomic clocks :)

Just my 2 cents,
Jan

Sent from my iPhone

> On 31.03.2016 at 03:07, Mukil Kesavan  wrote:
> 
> Hi,
> 
> We run a 3 server cassandra cluster that is initially NTP synced to a single 
> physical server over LAN. This server does not have connectivity to the 
> internet for a few hours to sometimes even days. In this state we perform 
> some schema operations and reads/writes with QUORUM consistency.
> 
> Later on, the physical server has connectivity to the internet and we 
> synchronize its time to an external NTP server on the internet. 
> 
> Are there any issues if this causes a huge time correction on the cassandra 
> cluster? I know that NTP gradually corrects the time on all the servers. I 
> just wanted to understand if there were any corner cases that will cause us 
> to lose data/schema updates when this happens. In particular, we seem to be 
> having some issues around missing secondary indices at the moment (not all 
> but some).
> 
> Also, for our situation where we have to work with cassandra for a while 
> without internet connectivity, what is the preferred NTP architecture/steps 
> that have worked for you in the field?
> 
> Thanks,
> Micky



Re: Large primary keys

2016-04-11 Thread Jan Kesten

Hi Robert,

why do you need the actual text as a key? It sounds a bit unnatural, at 
least to me. Keep in mind that you cannot do "like" queries on keys in 
cassandra. For performance, and to keep things more readable, I would 
prefer hashing your text and using the hash as the key.


You should also consider storing the keys (hashes) in a 
separate table per day / hour or something like that, so you can quickly 
get all keys for a time range. A query without the partition key may be 
very slow.
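
A rough cqlsh sketch of the idea (keyspace, table and column names are invented):

cqlsh -e "
CREATE TABLE ks.docs (
  digest blob PRIMARY KEY,    -- e.g. the SHA-256 of the text
  body   text);
CREATE TABLE ks.docs_by_day (
  day    text,                -- e.g. '2016-04-11'
  digest blob,
  PRIMARY KEY (day, digest));"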


Jan

On 11.04.2016 at 23:43, Robert Wille wrote:

I have a need to be able to use the text of a document as the primary key in a 
table. These texts are usually less than 1K, but can sometimes be 10’s of K’s 
in size. Would it be better to use a digest of the text as the key? I have a 
background process that will occasionally need to do a full table scan and 
retrieve all of the texts, so using the digest doesn’t eliminate the need to 
store the text. Anyway, is it better to keep primary keys small, or is C* okay 
with large primary keys?

Robert





Re: Fwd: Cassandra Load spike

2016-04-15 Thread Jan Kesten

Hi,

you should check the "snapshots" directories on your nodes - it is very 
likely there are some old ones from failed operations taking up space.
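
Something along these lines shows what they occupy per table (default data path 
assumed):

du -sh /var/lib/cassandra/data/*/*/snapshots 2>/dev/null | sort -h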


On 15.04.2016 at 01:28, kavya wrote:

Hi,

We are running a 6 node cassandra 2.2.4 cluster and we are seeing a 
spike in the disk Load as per the ‘nodetool status’ command that does 
not correspond with the actual disk usage. Load reported by nodetool 
was as high as 3 times actual disk usage on certain nodes.
We noticed that the periodic repair failed with below error on running 
the command : ’nodetool repair -pr’


ERROR [RepairJobTask:2] 2016-04-12 15:46:29,902 
RepairRunnable.java:243 - Repair session 
64b54d50-0100-11e6-b46e-a511fd37b526 for range 
(-3814318684016904396,-3810689996127667017] failed with error [….] 
Validation failed in /
org.apache.cassandra.exceptions.RepairException: [….] Validation 
failed in 
at 
org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64) 
~[apache-cassandra-2.2.4.jar:2.2.4]
at 
org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183) 
~[apache-cassandra-2.2.4.jar:2.2.4]
at 
org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:410) 
~[apache-cassandra-2.2.4.jar:2.2.4]
at 
org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:163) 
~[apache-cassandra-2.2.4.jar:2.2.4]
at 
org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) 
~[apache-cassandra-2.2.4.jar:2.2.4]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_40]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_40]

at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40

We restarted all nodes in the cluster and ran a full repair which 
completed successfully without any validation errors, however we still 
see the Load spike on the same nodes after a while. Please advise.


Thanks!





Replacing dead node when num_tokens is used

2013-03-05 Thread Jan Kesten

Hello,

while trying out cassandra I read about the steps necessary to replace a 
dead node. In my test cluster I used a setup with num_tokens instead of 
initial_token. How do I replace a dead node in this scenario?


Thanks,
Jan


Re: Replacing dead node when num_tokens is used

2013-03-05 Thread Jan Kesten

Hello Aaron,

thanks for your reply.

Found it just an hour ago on my own; yesterday I accidentally looked at 
the 1.0 docs. Right now my replacement node is streaming from the others 
- then more testing can follow.


Thanks again,
Jan




sstablesplit - status

2017-05-17 Thread Jan Kesten

Hi all,

I have a problem with really large sstables which don't get compacted 
anymore, and I know there are many duplicated rows in them. Splitting the 
tables into smaller ones to get them compacted again would help, I 
thought, so I tried sstablesplit, but:


cassandra@cassandra01 /tmp/cassandra $ 
./apache-cassandra-3.10/tools/bin/sstablesplit lb-388151-big-Data.db

Skipping non sstable file lb-388151-big-Data.db
No valid sstables to split
cassandra@cassandra01 /tmp/cassandra $ sstablesplit lb-388151-big-Data.db
Skipping non sstable file lb-388151-big-Data.db
No valid sstables to split

It seems that sstablesplit can't handle the "new" filename pattern 
anymore (actually running 2.2.8 on those nodes).


Any hints or other suggestions to split those sstables or get rid of them?

Thanks in advance,
Jan

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: sstablesplit - status

2017-05-17 Thread Jan Kesten

Hi again,

and thanks for the input. It's not tombstoned data, I think, but over a 
really long time very many rows are inserted over and over again - with 
some significant pauses between the inserts. I found some examples 
where a specific row (for example pk=xyz, value=123) exists in more than 
one or two sstables, with exactly the same content but different timestamps.


The largest sstables compacted a while ago are now 300-400G in size over 
some nodes, and it's very unlikely they will be compacted some time soon 
as there are only one or two sstables of that size on a single node.


I think I will try re-bootstrapping a node to see if that helps. 
sstablesplit exists in 2.x - but as far as I know it is deprecated, and in 
my 3.6 test cluster it was gone.
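
Another option might be user-defined compaction over JMX - the CompactionManager 
MBean exposes a forceUserDefinedCompaction operation taking a comma-separated 
list of sstable file names. A sketch via jmxterm (jar name, port and the exact 
argument format are assumptions to verify; note this mainly rewrites the file and 
purges droppable tombstones, it does not merge duplicates across files):

echo "run -b org.apache.cassandra.db:type=CompactionManager \
  forceUserDefinedCompaction lb-388151-big-Data.db" | \
  java -jar jmxterm-uber.jar -l localhost:7199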


I was trying sstabledump to have a deeper look - but that says "pre-3.0 
SSTable is not supported" (fair, I am on a 2.2.8 cluster).


Jan


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Effect of frequent mutations / memtable

2017-05-25 Thread Jan Algermissen

Hi,

I am using updates to a column with a TTL to represent a lock. The 
owning process keeps updating the lock's TTL as long as it is running. 
If the process crashes, the lock will time out and be deleted. Then 
another process can take over.


I have used this pattern very successfully over years with TTLs in the 
order of tens of seconds.
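
For illustration, the pattern in cqlsh terms (keyspace, table and values are 
made up):

cqlsh -e "
CREATE TABLE ks.locks (name text PRIMARY KEY, owner text);
-- acquire: exactly one process wins
INSERT INTO ks.locks (name, owner) VALUES ('job-42', 'proc-a')
  IF NOT EXISTS USING TTL 30;
-- refresh while running: re-arms the TTL, but only if we still own the lock
UPDATE ks.locks USING TTL 30 SET owner = 'proc-a'
  WHERE name = 'job-42' IF owner = 'proc-a';"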


Now I have a use case in mind that would require much smaller TTLs, e.g. 
one or two seconds, and I am worried about the increased number of 
mutations and their possible effect on SSTables.


However: I'd assume these frequent updates on a cell mostly happen in 
the memtable, resulting in only occasional manifestation in SSTables.


Is that assumption correct and if so, what config parameters should I 
tweak to keep the memtable from being flushed for longer periods of 
time?



Jan

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Effect of frequent mutations / memtable

2017-05-25 Thread Jan Algermissen

Hi Jayesh,


On 25 May 2017, at 18:31, Thakrar, Jayesh wrote:


Hi Jan,

I would suggest looking at using Zookeeper for such a use case.


thanks - yes, it is an alternative.

Out of curiosity: since both Zk and C* implement Paxos to enable this 
kind of thing, why do you think Zookeeper would be a better fit?


Jan



See http://zookeeper.apache.org/doc/trunk/recipes.html for some 
examples.


Zookeeper is used for such purposes in Apache HBase (active master), 
Apache Kafka (active controller), Apache Hadoop, etc.


Look for the "Leader Election" usecase.
Examples
http://techblog.outbrain.com/2011/07/leader-election-with-zookeeper/
https://www.tutorialspoint.com/zookeeper/zookeeper_leader_election.htm

Its more/new work, but should be an elegant solution.

Hope that helps.
Jayesh

On 5/25/17, 9:19 AM, "Jan Algermissen"  
wrote:


Hi,

I am using updates to a column with a ttl to represent a lock. The
owning process keeps updating the lock's TTL as long as it is running.
If the process crashes, the lock will timeout and be deleted. Then
another process can take over.

I have used this pattern very successfully over years with TTLs in the
order of tens of seconds.

Now I have a use case in mind that would require much smaller TTLs, e.g.
one or two seconds, and I am worried about the increased number of
mutations and possible effect on SSTables.

However: I'd assume these frequent updates on a cell to mostly happen in
the memtable resulting in only occasional manifestation in SSTables.

Is that assumption correct and if so, what config parameters should I
tweak to keep the memtable from being flushed for longer periods of
time?

Jan


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Effect of frequent mutations / memtable

2017-05-26 Thread Jan Algermissen

Jonathan,

On 26 May 2017, at 17:00, Jonathan Haddad wrote:

If you have a small amount of hot data, enable the row cache. The memtable
is not designed to be a cache. You will not see a massive performance
impact of writing one to disk. Sstables will be in your page cache, meaning
you won't be hitting disk very often.


What I (and AFAIU Max, too) am concerned with is very frequent updates 
on certain cells and their impact on the number of SSTables created.


Suppose I have a row that sees tens of thousands of mutations during the 
first minutes of its lifetime but isn't changed afterwards. The 
hope/assumption is that tuning C* can help to have all those mutations 
take place in the memtable, so that we end up with only a single SSTable 
(roughly speaking).
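
The knobs I would look at for that (cassandra.yaml excerpt; the values are 
purely illustrative, not recommendations):

memtable_heap_space_in_mb: 2048     # more room before a flush is forced
memtable_cleanup_threshold: 0.50    # fraction of memtable space that triggers a flush
commitlog_total_space_in_mb: 8192   # a full commit log also forces flushes

plus memtable_flush_period_in_ms as a per-table option (0 disables periodic 
flushing).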


Apart from such exceptional cases, I'd consider high-frequency mutations an 
anti-pattern due to the SSTable bloat.


Makes sense?

Jan





On Fri, May 26, 2017 at 7:41 AM Max C  wrote:

In my case, we're using Cassandra to store QA test data — so the pattern
is that we may do a bunch of updates within a few minutes / hours, and then
the data will essentially be read-only for the rest of its lifetime
(years). My question is the same — do we need to worry about the
performance impact of having N mutations written to the SSTable — or will
these mutations generally be constrained to the mem table?

- Max


Hi,

I am using updates to a column with a ttl to represent a lock. The
owning process keeps updating the lock's TTL as long as it is running. If
the process crashes, the lock will timeout and be deleted. Then another
process can take over.

I have used this pattern very successfully over years with TTLs in the
order of tens of seconds.

Now I have a use case in mind that would require much smaller TTLs, e.g.
one or two seconds, and I am worried about the increased number of
mutations and possible effect on SSTables.

However: I'd assume these frequent updates on a cell to mostly happen in
the memtable resulting in only occasional manifestation in SSTables.

Is that assumption correct and if so, what config parameters should I
tweak to keep the memtable from being flushed for longer periods of time?


-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org






-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



How to know when repair repaired something?

2017-05-29 Thread Jan Algermissen

Hi,

is it possible to extract from repair logs the writetime of the writes 
that needed to be repaired?


I have some processes I would like to re-trigger from a time point if 
repair found problems.


Is that useful? Possible?

Jan

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: How to know when repair repaired something?

2017-05-30 Thread Jan Algermissen


On 30 May 2017, at 21:11, Varun Gupta wrote:


I am missing the point - why do you want to re-trigger the process 
post-repair? Repair will sync the data correctly.


Sorry - I mis-represented  that. I want to trigger something else, not 
repair.


I am investigating a CQRS/event-sourced pattern with C* as a 
distributed event log and a process reading from that log, changing 
state in other databases (Solr, graph DB, other C* tables, etc.).


Since I do not want to write to/read from the event log with 
EACH_QUORUM or LOCAL_QUORUM, it could happen that the process consuming 
the event log misses an event that only later pops up during repair.


When that happens, I'd like to re-process the log (my processing is 
idempotent, so it can just go again).


This is why I was looking for a way to learn that a repair has actually 
repaired something.



Jan



On Mon, May 29, 2017 at 8:07 AM, Jan Algermissen 

wrote:



Hi,

is it possible to extract from repair logs the writetime of the 
writes

that needed to be repaired?

I have some processes I would like to re-trigger from a time point if
repair found problems.

Is that useful? Possible?

Jan

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org






-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Write / read cost of *QUORUM

2017-06-18 Thread Jan Algermissen

Hi,

my understanding is that

- for writes, using any of the quorum CLs will not put more overall load 
on the cluster, because writes will be sent to all nodes responsible for 
a partition anyhow. So quorum only increases the response time of the 
coordinator, not the cluster load.


Correct?

- for reads, all quorum CLs will yield more requests sent by the 
coordinator to other nodes, and hence *QUORUM reads definitely increase 
cluster load (and of course the response time of the coordinator, too).


Correct?

Jan

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Tolerable amount of CAS queries?

2017-07-21 Thread Jan Algermissen

Hi,

I just read [1] which describes a lease implementation using CAS 
queries. It applies a TTL to the lease which needs to be refreshed 
periodically by the lease holder.


I use such a pattern myself since a couple of years, so no surprise 
there.


However, the article uses CAS queries not only for acquiring the lease, 
but also for TTL updates and active lease canceling. Given that the described 
default TTL is three minutes, the amount of CAS queries might be ok.


What if I go for much shorter TTLs, e.g. 5 seconds, to minimise the time 
until another peer takes over if the current lease owner crashes or is 
stopped? Using some safety margin for updating the TTL, we'd end up with 
a CAS query every 3 seconds or so. If we have a bunch of such leases, 
we'd likely see 10 or more such CAS queries a second.

I am looking for advice on whether such a high number of CAS queries could 
be tolerable at all. I'd assume there is not much contention on the same 
lease - is the overhead of a CAS query basically that it leads to 4 or 
sometimes significantly more 'queries' in the C* cluster?


IOW, suppose I

- have a cluster spanning geographic regions
- restrict the CAS queries to key spaces that are only replicated in a 
single region and I use LOCAL_SERIAL CL


would 100 CAS queries per second that in the normal case do not conflict 
(i.e. they work on different partition keys) be sort of 'ok'?


Or should it rather be in the range of 10/s?

Jan


[1] https://www.datastax.com/dev/blog/consensus-on-cassandra

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Reducing tombstones impact in queue access patterns through rolling shards?

2014-08-28 Thread Jan Algermissen
Hi,

I just came across this recipe by Netflix that addresses the impact of 
tombstones in queue access patterns with a time-based rolling shard, allowing 
compaction to happen in one shard while the other is ‘busy’. (At least this is 
what I understand from the intro.)

https://github.com/Netflix/astyanax/wiki/Message-Queue

Has anyone adopted such a pattern and can share experience?

Jan

Re: Scala driver

2014-08-31 Thread Jan Algermissen
Hi Gary,

On 31 Aug 2014, at 07:19, Gary Zhao  wrote:

> Hi
> 
> Could you recommend a Scala driver and share your experiences of using it? I'm 
> thinking of using the java driver in Scala directly.
> 
> 

I am using Martin’s approach without any problems:

https://github.com/magro/play2-scala-cassandra-sample

The actual mapping from Java to Scala futures for the async case is in 

https://github.com/magro/play2-scala-cassandra-sample/blob/master/app/models/Utils.scala

HTH,

Jan



> Thanks



Re: Concurrents deletes and updates

2014-09-17 Thread Jan Algermissen

On 17 Sep 2014, at 20:55, Sávio S. Teles de Oliveira  
wrote:

> I'm using the Cassandra 2.0.9 with JAVA datastax driver.
> I'm running the tests in a cluster with 3 nodes, RF=3 and CL=ALL for each 
> operation.
> 
> I have a Column family filled with some keys (for example 'a' and 'b').
> When these keys are deleted and inserted again afterwards, sporadically the 
> keys disappear. 

Could it be that the delete and insert have the same timestamp? Are you using 
batched queries, maybe? In my current project a team experienced similar 
behavior during automated tests.

If you delete with T1 and insert with T1 the delete wins, which was the reason 
in our case.

You might want to test this with client-provided timestamps and make sure the 
insert has a T_insert > T_delete.
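
For illustration (made-up keyspace/table; with T_insert > T_delete the insert 
survives, while at equal timestamps the delete wins):

cqlsh -e "
DELETE FROM ks.t USING TIMESTAMP 1000 WHERE k = 'a';
INSERT INTO ks.t (k, v) VALUES ('a', 'new') USING TIMESTAMP 1001;"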

Jan




> 
> Is it a bug on Cassandra or on Datastax driver?
> Any suggestions?
> 
> Tks



Exploring Simply Queueing

2014-10-05 Thread Jan Algermissen
Hi,

I have put together some thoughts on realizing simple queues with Cassandra.

https://github.com/algermissen/cassandra-ruby-queue

The design is inspired by (the much more sophisticated) Netflix approach[1] but 
very reduced.

Given that I am still a C* newbie, I’d be very glad to hear some thoughts on 
the design path I took.

Jan

[1] https://github.com/Netflix/astyanax/wiki/Message-Queue

Re: Exploring Simply Queueing

2014-10-05 Thread Jan Algermissen
Chris,

thanks for taking a look.

On 06 Oct 2014, at 04:44, Chris Lohfink  wrote:

> It appears you are aware of the tombstones effect that leads people to label 
> this an anti-pattern.  Without "due" or any time based value being part of 
> the partition key means you will still get a lot of buildup.  You only have 1 
> partition per shard which just linearly decreases the tombstones.  That isn't 
> likely to be enough to really help in a situation of high queue throughput, 
> especially with the default of 4 shards. 

Yes, dealing with the tombstones effect is the whole point. The workloads I 
have to deal with are not really high throughput; it is unlikely we’ll ever 
reach multiple messages per second. The emphasis is also more on coordinating 
producer and consumer than on high-volume capacity problems.

Your comment seems to suggest including larger time frames (e.g. the due-hour) 
in the partition keys and using the current time to select the active partitions 
(e.g. the shards of the hour). Once an hour has passed, the corresponding 
shards will never be touched again.

Am I understanding this correctly?
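
If so, a table along these lines (a sketch with invented names) would let whole 
hours age out without ever being read again:

cqlsh -e "
CREATE TABLE queues.messages (
  queue    text,
  due_hour text,       -- e.g. '2014-10-06T14'
  shard    int,
  msg_id   timeuuid,
  payload  blob,
  PRIMARY KEY ((queue, due_hour, shard), msg_id));"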

> 
> You may want to consider switching to LCS from the default STCS since you are 
> re-writing the same partitions a lot. It will still use STCS in L0, so in high 
> write/delete scenarios, with low enough gc_grace, when it never gets higher 
> than L1 it will be same-ish write throughput. In scenarios where you get more 
> LCS will shine I suspect by reducing number of obsolete tombstones.  Would be 
> hard to identify difference in small tests I think.

Thanks, I’ll try to explore the various effects

> 
> What's the plan to prevent two consumers from reading the same message off of a 
> queue?  You mention in the docs you will address it at a later point in time but 
> it's kind of a biggy.  Big lock & batch reads like the astyanax recipe?

I have included a static column per shard to act as a lock (the ’lock’ column 
in the examples) in combination with conditional updates.

I must admit, I have not quite understood what Netflix is doing in terms of 
coordination - but since performance isn’t our concern, CAS should do fine, I 
guess(?)

Thanks again,

Jan


> 
> ---
> Chris Lohfink
> 
> 
> On Oct 5, 2014, at 6:03 PM, Jan Algermissen  
> wrote:
> 
>> Hi,
>> 
>> I have put together some thoughts on realizing simple queues with Cassandra.
>> 
>> https://github.com/algermissen/cassandra-ruby-queue
>> 
>> The design is inspired by (the much more sophisticated) Netflix approach[1] 
>> but very reduced.
>> 
>> Given that I am still a C* newbie, I’d be very glad to hear some thoughts on 
>> the design path I took.
>> 
>> Jan
>> 
>> [1] https://github.com/Netflix/astyanax/wiki/Message-Queue
> 



Re: Exploring Simply Queueing

2014-10-06 Thread Jan Algermissen
Shane,

On 06 Oct 2014, at 16:34, Shane Hansen  wrote:

> Sorry if I'm hijacking the conversation, but why in the world would you want
> to implement a queue on top of Cassandra? It seems like using a proper 
> queuing service
> would make your life a lot easier.

Agreed - however, the use case simply does not justify the additional 
operational overhead.

> 
> That being said, there might be a better way to play to the strengths of C*. 
> Ideally everything you do
> is append only with few deletes or updates. So an interesting way to 
> implement a queue might be
> to do one insert to put the job in the queue and another insert to mark the 
> job as done or in process
> or whatever. This would also give you the benefit of being able to replay the 
> state of the queue.

Thanks, I’ll try that, too.

Jan


> 
> 
> On Mon, Oct 6, 2014 at 12:57 AM, Jan Algermissen  
> wrote:
> Chris,
> 
> thanks for taking a look.
> 
> On 06 Oct 2014, at 04:44, Chris Lohfink  wrote:
> 
> > It appears you are aware of the tombstones effect that leads people to 
> > label this an anti-pattern.  Without "due" or any time based value being 
> > part of the partition key means you will still get a lot of buildup.  You 
> > only have 1 partition per shard which just linearly decreases the 
> > tombstones.  That isn't likely to be enough to really help in a situation 
> > of high queue throughput, especially with the default of 4 shards.
> 
> Yes, dealing with the tombstones effect is the whole point. The work loads I 
> have to deal with are not really high throughput, it is unlikely we’ll ever 
> reach multiple messages per second.The emphasis is also more on coordinating 
> producer and consumer than on high volume capacity problems.
> 
> Your comment seems to suggest to include larger time frames (e.g. the 
> due-hour) in the partition keys and use the current time to select the active 
> partitions (e.g. the shards of the hour). Once an hour has passed, the 
> corresponding shards will never be touched again.
> 
> Am I understanding this correctly?
> 
> >
> > You may want to consider switching to LCS from the default STCS since 
> > re-writing to same partitions a lot. It will still use STCS in L0 so in 
> > high write/delete scenarios, with low enough gc_grace, when it never gets 
> > higher then L1 it will be sameish write throughput. In scenarios where you 
> > get more LCS will shine I suspect by reducing number of obsolete 
> > tombstones.  Would be hard to identify difference in small tests I think.
> 
> Thanks, I’ll try to explore the various effects
> 
> >
> > Whats the plan to prevent two consumers from reading same message off of a 
> > queue?  You mention in docs you will address it at a later point in time 
> > but its kinda a biggy.  Big lock & batch reads like astyanax recipe?
> 
> I have included a static column per shard to act as a lock (the ’lock’ column 
> in the examples) in combination with conditional updates.
> 
> I must admit, I have not quite understood what Netflix is doing in terms of 
> coordination - but since performance isn’t our concern, CAS should do fine, I 
> guess(?)
> 
> Thanks again,
> 
> Jan
> 
> 
> >
> > ---
> > Chris Lohfink
> >
> >
> > On Oct 5, 2014, at 6:03 PM, Jan Algermissen  
> > wrote:
> >
> >> Hi,
> >>
> >> I have put together some thoughts on realizing simple queues with 
> >> Cassandra.
> >>
> >> https://github.com/algermissen/cassandra-ruby-queue
> >>
> >> The design is inspired by (the much more sophisticated) Netflix 
> >> approach[1] but very reduced.
> >>
> >> Given that I am still a C* newbie, I’d be very glad to hear some thoughts 
> >> on the design path I took.
> >>
> >> Jan
> >>
> >> [1] https://github.com/Netflix/astyanax/wiki/Message-Queue
> >
> 
> 



Re: Exploring Simply Queueing

2014-10-06 Thread Jan Algermissen
Robert,

On 06 Oct 2014, at 17:50, Robert Coli  wrote:

> In theory they can also be designed such that history is not infinite, which 
> mitigates the buildup of old queue state.
> 

Hmm, I was under the impression that issues with old queue state disappear 
after gc_grace_seconds, and that the goal primarily is to keep the rows ‘short’ 
enough to achieve a tombstone read-performance impact that one can live with 
in a given use case.

Is that understanding wrong?

Jan




Re: Exploring Queueing

2014-10-12 Thread Jan Algermissen
Hi all,

thanks again for the comments.

I have created an (improved?) design, this time using dedicated consumers per 
shard and time-based row expiry, hence without immediate deletes.

https://github.com/algermissen/cassandra-ruby-sharded-workers

As before, comments are welcome.

Jan

On 06 Oct 2014, at 22:50, Robert Coli  wrote:

> On Mon, Oct 6, 2014 at 1:40 PM, Jan Algermissen  
> wrote:
> Hmm, I was under the impression that issues with old queue state disappear 
> after gc_grace_seconds and that the goal primarily is to keep the rows 
> ‘short’ enough to achieve a tombstones read performance impact that one can 
> live with in a given use case.
> 
> The design I pasted does a link to does not include specifics regarding 
> pruning old history. Yes, you can just delete it, if your system design 
> doesn't require replay from the start.
> 
> =Rob
> 



high context switches

2014-11-21 Thread Jan Karlsson
Hello,

We are running a 3 node cluster with RF=3 and 5 clients in a test environment. 
The C* settings are mostly default. We noticed quite high context switching 
during our tests. With 100 000 000 keys/partitions we averaged around 260 000 
context switches (with a max of 530 000).

We were running ~12 000 transactions per second: 10 000 reads and 2 000 updates.

Nothing is really wrong with that; however, I would like to understand why these 
numbers are so high. Have others noticed this behavior? How much context 
switching is expected, and why? What are the variables that affect this?
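
For anyone wanting to compare numbers, something like this is an easy way to 
sample them (standard procps/sysstat tools; the pgrep pattern is an assumption):

vmstat 1                                      # system-wide, "cs" column
pidstat -w -p $(pgrep -f CassandraDaemon) 1   # per-process voluntary/involuntary switches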

/J


RE: high context switches

2014-11-24 Thread Jan Karlsson
We use CQL with 1 session per client and default connection settings.

I do not think that we are using too many client threads. Number of native 
transport threads is set to default (max 128).


From: Robert Coli [mailto:rc...@eventbrite.com]
Sent: den 21 november 2014 19:30
To: user@cassandra.apache.org
Subject: Re: high context switches

On Fri, Nov 21, 2014 at 1:21 AM, Jan Karlsson 
mailto:jan.karls...@ericsson.com>> wrote:
Nothing really wrong with that however I would like to understand why these 
numbers are so high. Have others noticed this behavior? How much context 
switching is expected and why? What are the variables that affect this?

I +1 Nikolai's conjecture that you are probably using a very high number of 
client threads.

However as a general statement Cassandra is highly multi-threaded. Threads are 
assigned within thread pools and these thread pools can be thought of as a type 
of processing pipeline, such that one is often the input to another. When 
pushing Cassandra near its maximum capacity, you will therefore spend a lot of 
time switching between threads.

=Rob
http://twitter.com/rcolidba


Re: Cassandra schema migrator

2014-11-25 Thread Jan Kesten

Hi Jens,

maybe you should have a look at mutagen for cassandra:

https://github.com/toddfast/mutagen-cassandra

It has been a little quiet around this project for some months, but it may still be worth it.

Cheers,
Jan

On 25.11.2014 at 10:22, Jens Rantil wrote:

Hi,

Anyone who is using, or could recommend, a tool for versioning 
schemas/migrating in Cassandra? My list of requirements is:

 * Support for adding tables.
 * Support for versioning of table properties. All our tables are to 
be defaulted to LeveledCompactionStrategy.

 * Support for adding non-existing columns.
 * Optional: Support for removing columns.
 * Optional: Support for removing tables.

We are preferably a Java shop, but could potentially integrate 
something non-Java. I understand I could write a tool that would make 
these decisions using system.schema_columnfamilies and 
system.schema_columns, but as always reusing a proven tool would be 
preferable.


So far I only know of Spring Data Cassandra that handles creating 
tables and adding columns. However, it does not handle table 
properties in any way.


Thanks,
Jens

——— Jens Rantil Backend engineer Tink AB Email: jens.ran...@tink.se 
Phone: +46 708 84 18 32 Web: www.tink.se Facebook Linkedin Twitter




sstablemetadata and sstablerepairedset not working with DSC on Debian

2014-12-18 Thread Jan Kesten

Hi,

while being curious about the new incremental repairs I updated our cluster to C* 
version 2.1.2 via the Debian apt repository. Everything went quite well, 
but trying to start the tools sstablemetadata and sstablerepairedset 
led to the following error:


root@a01:/home/ifjke# sstablerepairedset
Error: Could not find or load main class 
org.apache.cassandra.tools.SSTableRepairedAtSetter

root@a01:/home/ifjke#

Looking at the scripts starting these tools I found that the java 
classpath is built via


for jar in `dirname $0`/../../lib/*.jar; do
CLASSPATH=$CLASSPATH:$jar
done

Because the scripts are located in /usr/bin/, this leads to searching 
for libs in /lib. Obviously there are no java or cassandra libraries 
there - nodetool instead uses a different approach:


if [ "x$CASSANDRA_INCLUDE" = "x" ]; then
for include in "`dirname "$0"`/cassandra.in.sh" \
   "$HOME/.cassandra.in.sh" \
   /usr/share/cassandra/cassandra.in.sh \
/usr/local/share/cassandra/cassandra.in.sh \
   /opt/cassandra/cassandra.in.sh; do
if [ -r "$include" ]; then
. "$include"
break
fi
done
elif [ -r "$CASSANDRA_INCLUDE" ]; then
. "$CASSANDRA_INCLUDE"
fi

I created a simple patch which makes both sstablemetadata and 
sstablerepairedset work for me - maybe it's worth sharing:


---SNIP---

--- sstablerepairedset2014-11-11 15:50:02.0 +
+++ sstablerepairedset_new2014-12-18 07:52:26.967368891 +
@@ -16,22 +16,19 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

-if [ "x$CLASSPATH" = "x" ]; then
-
-# execute from the build dir.
-if [ -d `dirname $0`/../../build/classes ]; then
-for directory in `dirname $0`/../../build/classes/*; do
-CLASSPATH=$CLASSPATH:$directory
-done
-else
-if [ -f `dirname $0`/../lib/stress.jar ]; then
-CLASSPATH=`dirname $0`/../lib/stress.jar
+if [ "x$CASSANDRA_INCLUDE" = "x" ]; then
+for include in "`dirname "$0"`/cassandra.in.sh" \
+   "$HOME/.cassandra.in.sh" \
+   /usr/share/cassandra/cassandra.in.sh \
+ /usr/local/share/cassandra/cassandra.in.sh \
+   /opt/cassandra/cassandra.in.sh; do
+if [ -r "$include" ]; then
+. "$include"
+break
 fi
-fi
-
-for jar in `dirname $0`/../../lib/*.jar; do
-CLASSPATH=$CLASSPATH:$jar
     done
+elif [ -r "$CASSANDRA_INCLUDE" ]; then
+. "$CASSANDRA_INCLUDE"
 fi

 # Use JAVA_HOME if set, otherwise look for java in PATH


---SNIP---

Worked for me on both tools.

Jan


Re: Replacing nodes disks

2014-12-18 Thread Jan Kesten

Hi Or,

I did some sort of this a while ago. If your machines have a free 
disk slot - just put another disk in there and use it as another 
data_file_directory.


If not - as in my case:

- grab a USB dock for disks
- put the new one in there, plug it in, format, mount to /mnt etc.
- I did an online rsync from /var/lib/cassandra/data to /mnt
- after that, bring cassandra down
- do another rsync from /var/lib/cassandra/data to /mnt (should be 
faster, as sstables do not change; this minimizes downtime)

- adjust /etc/fstab if needed
- shutdown the node
- swap disks
- power on the node
- everything should be fine ;-)

Of course you will need a replication factor > 1 for this to work ;-)

Just my 2 cents,
Jan


On 18.12.2014 at 16:17, Or Sher wrote:

Hi all,

We have a situation where some of our nodes have smaller disks and we 
would like to align all nodes by replacing the smaller disks with bigger 
ones, without replacing nodes.
We don't have enough space to put the data on the / disk and copy it back to 
the bigger disks, so we would like to rebuild the nodes' data from other 
replicas.


What do you think should be the procedure here?

I'm guessing it should be something like this but I'm pretty sure it's 
not enough.

1. shutdown C* node and server.
2. replace disks + create the same vg lv etc.
3. start C* (Normally?)
4. nodetool repair/rebuild?
*I think I might get some consistency issues for use cases relying on 
Quorum reads and writes for strong consistency.

What do you say?

Another question is (and I know it depends on many factors but I'd 
like to hear an experienced estimation): How much time would take to 
rebuild a 250G data node?


Thanks in advance,
Or.

--
Or Sher




Re: Replacing nodes disks

2014-12-22 Thread Jan Kesten

Hi,

even if recovering it like a dead node would work - backup and restore (like 
my approach with a USB docking station) will be much faster and produce less 
IO and CPU impact on your cluster.

Keep that in mind :-)

Cheers,
Jan

On 22.12.2014 at 10:58, Or Sher wrote:

Great, replace_address works great.
For some reason I thought it wouldn't work with the same IP.


On Sun, Dec 21, 2014 at 5:14 PM, Ryan Svihla <mailto:rsvi...@datastax.com>> wrote:


Cassandra is designed to rebuild a node from other nodes; whether
a node is dead by your hand because you killed it or by fate is
irrelevant, the process is the same. A "new node" can keep the same
hostname and IP or it can have totally different ones.

On Sun, Dec 21, 2014 at 6:01 AM, Or Sher mailto:or.sh...@gmail.com>> wrote:

If I'll use the replace_address parameter with the same IP
address, would that do the job?

On Sun, Dec 21, 2014 at 11:20 AM, Or Sher mailto:or.sh...@gmail.com>> wrote:

What I want to do is kind of replacing a dead node -

http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_replace_node_t.html

But replacing it with a clean node with the same IP and
hostname.

On Sun, Dec 21, 2014 at 9:53 AM, Or Sher
mailto:or.sh...@gmail.com>> wrote:

Thanks guys.
I have to replace all data disks, so I don't have
another large enough local disk to move the data to.
If I have no choice, I will back up the data beforehand
on some other node or something, but I'd like to avoid it.
I would really love letting Cassandra do its thing and
rebuild itself.
Did anybody handle such cases that way (letting
Cassandra rebuild its data)?
Although there is no documented procedure for it, it
should be possible, right?

On Fri, Dec 19, 2014 at 8:41 AM, Jan Kesten
mailto:j.kes...@enercast.de>>
wrote:

Hi Or,

I did some sort of this a while ago. If your
machines do have a free disk slot - just put
another disk there and use it as another
data_file_directory.

If not - as in my case:

- grab an usb dock for disks
- put the new one in there, plug in, format, mount
to /mnt etc.
- I did an online rsync from
/var/lib/cassandra/data to /mnt
- after that, bring cassandra down
- do another rsync from /var/lib/cassandra/data to
/mnt (should be faster, as sstables do not change,
minimizes downtime)
- if you need adjust /etc/fstab if needed
- shutdown the node
- swap disks
- power on the node
- everything should be fine ;-)

Of course you will need a replication factor > 1
for this to work ;-)

Just my 2 cents,
Jan

rsync the full contents there,

On 18.12.2014 at 16:17, Or Sher wrote:

Hi all,

We have a situation where some of our nodes
have smaller disks and we would like to align
all nodes by replacing the smaller disks to
bigger ones without replacing nodes.
We don't have enough space to put data on /
disk and copy it back to the bigger disks so
we would like to rebuild the nodes data from
other replicas.

What do you think should be the procedure here?

I'm guessing it should be something like this
but I'm pretty sure it's not enough.
1. shutdown C* node and server.
2. replace disks + create the same vg lv etc.
3. start C* (Normally?)
4. nodetool repair/rebuild?
*I think I might get some consistency issues
for use cases relying on Quorum reads and
writes for strong consistency.
What do you say?

Another question is (and I know it depends on
many factors but I'd like to hear an
experienced estimation): How much time would
tak

Repair producing validation failed errors regularly

2015-01-13 Thread Jan Karlsson
at 
org.apache.cassandra.db.compaction.CompactionManager.doValidationCompaction(CompactionManager.java:930)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.db.compaction.CompactionManager.access$600(CompactionManager.java:97)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at 
org.apache.cassandra.db.compaction.CompactionManager$9.call(CompactionManager.java:557)
 ~[apache-cassandra-2.1.1.jar:2.1.1]
at java.util.concurrent.FutureTask.run(FutureTask.java:262) 
~[na:1.7.0_51]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) 
~[na:1.7.0_51]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) 
[na:1.7.0_51]
at java.lang.Thread.run(Thread.java:744) [na:1.7.0_51]

BR
Jan


Re: Nodetool clearsnapshot

2015-01-13 Thread Jan Kesten

Hi,

I have read that snapshots are basically symlinks and that they do not take 
much space.
Why, if I run nodetool clearsnapshot, does it free a lot of space? I am 
seeing GBs freed...


both together make sense. Creating a snapshot just creates links for all 
files under the snapshot directory. This is very fast and takes no 
space. But those links are hard links, not symbolic ones.

After a while your running cluster will compact some of its sstables, 
writing the result to a new one and deleting the old ones. Now, for example, 
if you had SSTable1..4 and a snapshot with links to those four, after 
compaction you will have one active SSTable5 which is newly written and 
consumes space. The snapshot-linked ones are still there, still 
consuming their space. Only when this snapshot is cleared do you get your 
disk space back.
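
You can see this on disk: with GNU find, a link count of 1 on a file under a 
snapshots directory means that snapshot holds the last remaining copy (default 
data path assumed):

find /var/lib/cassandra/data -path '*/snapshots/*' -name '*-Data.db' -printf '%n %p\n'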


HTH,
Jan




Re: Many really small SSTables

2015-01-15 Thread Jan Kesten

Hi Eric and all,

I almost expected this kind of answer. I did a nodetool compactionstats 
already to see if those sstables are being compacted, but on all nodes 
there are 0 outstanding compactions (right now in the morning, not 
running any tests on this cluster).


The reported read latency is about 1-3ms on the nodes which have many 
sstables (the new high score is ~18k sstables). The 99th percentile is about 
30-40 micros with a cell count of about 80-90 (if I got the docs right, 
these are the numbers of sstables accessed; that changed from 2.0 to 2.1 
I think, as I see this only on the testing cluster).


I looks to me that compactions were not triggered. I tried a nodetool 
compact on one node overnight - but that crashed the entire node.


Roland

On 15.01.2015 at 19:14, Eric Stevens wrote:
Yes, many sstables can have a huge negative impact on read performance, 
and will also create memory pressure on that node.


There are a lot of things which can produce this effect, and it also 
strongly suggests you're falling behind on compaction in general 
(check nodetool compactionstats, you should have <5 
outstanding/pending, preferably 0-1).  To see whether and how much it 
is impacting your read performance, check nodetool cfstats <keyspace> 
and nodetool cfhistograms <keyspace> <table>.
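
A quick way to run those checks (default paths assumed; <keyspace> and
<table> are placeholders):

  nodetool compactionstats                    # pending tasks, ideally 0-1
  nodetool cfstats <keyspace>                 # per-table "SSTable count"
  nodetool cfhistograms <keyspace> <table>    # sstables touched per read
  # count suspiciously small sstables on disk:
  find /var/lib/cassandra/data/<keyspace> -name '*-Data.db' -size -10k | wc -l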



On Thu, Jan 15, 2015 at 2:11 AM, Roland Etzenhammer 
<r.etzenham...@t-online.de> wrote:


Hi,

I'm testing around with cassandra a fair bit, using 2.1.2 which I
know has some major issues, but it is a test environment. After
some bulk loading, testing with incremental repairs and running
out of heap once, I found that I now have a quite large number of
sstables which are really small:

<1k        0    0.0%
<10k    2780   76.8%
<100k   3392   93.7%
<1000k  3461   95.6%
<1M     3471   95.9%
<10M    3517   97.1%
<100M   3596   99.3%
all     3621  100.0%

76.8% of all sstables in this particular column family are
smaller than 10 kB, 93.7% are smaller than 100 kB.

Just for my understanding - does that impact performance? And is
there any way to reduce the number of sstables? A full run of
nodetool compact is running for a really long time (more than 1 day).

Thanks for any input,
Roland






Re: Node joining take a long time

2015-02-20 Thread Jan Kesten

Hi,

a short hint for those upgrading: if you upgrade to 2.1.3, there is a 
bug in the config builder when rpc_interface is used. If you use 
rpc_address in your cassandra.yaml you will be fine - I ran into it this 
morning and filed an issue for it.


https://issues.apache.org/jira/browse/CASSANDRA-8839

Jan
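
The workaround in cassandra.yaml would look roughly like this (the
address is just an example, use the node's own):

  # rpc_interface: eth0      <- triggers the 2.1.3 config builder bug
  rpc_address: 192.168.1.10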


Re: Node stuck in joining the ring

2015-02-26 Thread Jan Kesten

Hi Batranut,

apart from the other suggestions - do you have ntp running on all your 
cluster nodes and are times in sync?


Jan
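
A quick check on each node, assuming the classic ntpd tooling is
installed:

  ntpq -p     # peers, offsets and jitter
  ntpstat     # prints whether the clock is synchronised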


Strange Sizes after 2.1.3 upgrade

2015-03-03 Thread Jan Kesten

Hi,

I found something strange this morning on our secondary cluster. I 
recently upgraded to 2.1.3 - hoping for incremental repairs to work - 
and this morning OpsCenter showed me very unequal disk usage. Most 
irritating, some nodes show data sizes of > 3 TB although they only 
have 3 TB drives. I made a screenshot.


https://www.dropbox.com/s/0qhbpm1znwd07rj/strange_sizes.png?dl=0

Has this occurred anywhere else? Maybe it is totally unrelated to the 
2.1.3 upgrade.


Thanks for any pointers,
Jan
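
One way to check whether such numbers are a reporting artifact or real
(default data path assumed, adjust to your setup):

  nodetool status                    # load per node as Cassandra sees it
  du -sh /var/lib/cassandra/data     # what is actually on disk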






RE: Read Repair in cassandra

2015-04-07 Thread Jan Karlsson
The request would return with the latest data.

The read request would fire against node 1 and node 3. The coordinator would 
get answers from both and would merge the answers and return the latest.

Then read repair might run to update node 3.

QUORUM does not take into consideration whether an answer is the latest or not. 
It just makes sure a QUORUM of nodes reply.
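
As a sketch in cqlsh (keyspace, table and key are made up): with RF 3 and
QUORUM on both reads and writes, the coordinator waits for two replicas
and returns the cells with the newest timestamps:

  CONSISTENCY QUORUM;
  SELECT * FROM mykeyspace.mytable WHERE id = 42;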

From: ankit tyagi [mailto:ankittyagi.mn...@gmail.com]
Sent: April 08, 2015 6:37 AM
To: user@cassandra.apache.org
Subject: Read Repair in cassandra

Hi All,

I have a doubt regarding read repair while reading data. I am using QUORUM 
for both read and write operations, with RF 3, for strong consistency.

Suppose that during a write, node1 and node2 replicate the data but it doesn't 
get replicated on node3 because of various factors. The coordinator node will 
save a hinted handoff for node3.

Now a read request comes in, and at that time node2 is down, so data will be 
served from node1 and node3. node3 may return older data, as the hinted handoff 
may not have been replayed from the coordinator node yet.

In that case, will the read request fail because only one node has the latest 
data, or will the latest data be returned from node1 and a read repair request 
be fired for node3?



Re: java.io.FileNotFoundException when setting up internode_compression

2013-11-15 Thread Jan Schmidle
I had this error as well some time ago. It was due to the noexec mount flag 
on the tmp directory. It worked again when I removed that flag.

Cheers
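
A sketch of how to check for and remove the flag (mount point and paths
are examples, check where your JVM actually extracts the library):

  mount | grep ' /tmp '                # look for "noexec" in the options
  sudo mount -o remount,exec /tmp
  # alternatively, if your snappy-java version supports it, point the
  # extraction somewhere else, e.g. in cassandra-env.sh:
  # JVM_OPTS="$JVM_OPTS -Dorg.xerial.snappy.tempdir=/var/lib/cassandra/tmp"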

-- 
Jan Schmidle
Founder & CEO, cospired GmbH
http://cospired.com



On 14.11.2013 at 03:39, srmore wrote:

> Yes it does, the stack trace is in the first thread. I did not try to create 
> CF (was trying to enable it in cassandra.yaml), I have an existing CF and 
> wanted to use compression for inter-node communication. When I enable snappy 
> compression (in yaml) I get the error and cassandra quits. I figured this 
> might be a snappy issue and nothing to do with cassandra, will log a bug 
> there.
> 
> 
> On Wed, Nov 13, 2013 at 8:01 PM, Aaron Morton  wrote:
> IIRC there is a test for snappy when the node starts does that log an error ? 
> 
> And / or can you create a CF that uses snappy compression (it was the default 
> for a while in 1.2). 
> 
> Cheers
> 
> -
> Aaron Morton
> New Zealand
> @aaronmorton
> 
> Co-Founder & Principal Consultant
> Apache Cassandra Consulting
> http://www.thelastpickle.com
> 
> On 13/11/2013, at 3:09 am, srmore  wrote:
> 
>> Thanks Christopher!
>> I don't think glibc is an issue (as it did get that far). 
>> /usr/tmp/snappy-1.0.5-libsnappyjava.so is not there, permissions look ok; 
>> are there any special settings (like JVM args) that I should be using? I 
>> can see libsnappyjava.so in the jar though 
>> (snappy-java-1.0.5.jar\org\xerial\snappy\native\Linux\i386\). One other thing: 
>> I am using RedHat 6. I will try updating glibc and see what happens.
>> 
>> Thanks ! 
>> 
>> 
>> 
>> 
>> On Mon, Nov 11, 2013 at 5:01 PM, Christopher Wirt wrote:
>> I had this the other day when we were accidentally provisioned a CentOS 5 
>> machine (instead of 6). I think it relates to the version of glibc. Notice it 
>> wants the native binary .so, not the .jar.
>> 
>>  
>> 
>> So maybe update to a newer version of glibc? Or possibly make sure the .so 
>> exists at /usr/tmp/snappy-1.0.5-libsnappyjava.so?
>> 
>> I was lucky and just did an OS reload to centos6.
>> 
>>  
>> 
>> Here is someone having a similar issue.
>> 
>> http://mail-archives.apache.org/mod_mbox/cassandra-commits/201307.mbox/%3CJIRA.12616012.1352862646995.6820.1373083550278@arcas%3E
>> 
>>  
>> 
>>  
>> 
>> From: srmore [mailto:comom...@gmail.com] 
>> Sent: 11 November 2013 21:32
>> To: user@cassandra.apache.org
>> Subject: java.io.FileNotFoundException when setting up internode_compression
>> 
>>  
>> 
>> I might be missing something obvious here; for some reason I cannot seem to 
>> get internode_compression = all to work. I am getting the following 
>> exception. I am using cassandra 1.2.9 and have snappy-java-1.0.5.jar in my 
>> classpath. Google search did not return any useful result, has anyone seen 
>> this before ?
>> 
>> 
>> java.io.FileNotFoundException: /usr/tmp/snappy-1.0.5-libsnappyjava.so (No such file or directory)
>> at java.io.FileOutputStream.open(Native Method)
>> at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
>> at java.io.FileOutputStream.<init>(FileOutputStream.java:145)
>> at org.xerial.snappy.SnappyLoader.extractLibraryFile(SnappyLoader.java:394)
>> at org.xerial.snappy.SnappyLoader.findNativeLibrary(SnappyLoader.java:468)
>> at org.xerial.snappy.SnappyLoader.loadNativeLibrary(SnappyLoader.java:318)
>> at org.xerial.snappy.SnappyLoader.load(SnappyLoader.java:229)
>> at org.xerial.snappy.Snappy.<clinit>(Snappy.java:48)
>> at org.apache.cassandra.io.compress.SnappyCompressor.create(SnappyCompressor.java:45)
>> at org.apache.cassandra.io.compress.SnappyCompressor.isAvailable(SnappyCompressor.java:55)
>> at org.apache.cassandra.io.compress.SnappyCompressor.<clinit>(SnappyCompressor.java:37)
>> at org.apache.cassandra.config.CFMetaData.<clinit>(CFMetaData.java:82)
>> at org.apache.cassandra.config.KSMetaData.systemKeyspace(KSMetaData.java:81)
>> at org.apache.cassandra.config.DatabaseDescriptor.loadYaml(DatabaseDescriptor.java:471)
>> at org.apache.cassandra.config.DatabaseDescriptor.<clinit>(DatabaseDescriptor.java:123)
>> 
>> Caused by: java.lang.UnsatisfiedLinkError: no snappyjava in java.library.path
>> at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1738)
>> at java.lang.Runtime.loadLibrary0(Runtime.java:823)
>> at java.lang.System.loadLibrary(System.java:1028)
>> at org.xerial.snappy.SnappyNativeLoader.loadLibrary(SnappyNativeLoader.java:52)
>> ... 18 more
>> 
>> 
> 
> 



Paging error after upgrade from C* 2.0.1 to 2.0.3 , Driver from 2.0.0-rc1 to 2.0.0-rc2

2013-12-19 Thread Jan Algermissen
Hi all,

after upgrading C* and the java-driver I am running into problems with paging. 
Maybe someone can provide a quick clue.

Upgrading was 
  C* from 2.0.1 to 2.0.3
  Java Driver from 2.0.0-rc1 to 2.0.0-rc2



Client side, I get the following messages (apparently during a call to 
resultSet.one() ):


com.datastax.driver.core.exceptions.DriverInternalError: An unexpected error occured server side on /37.139.24.133: java.lang.AssertionError
at com.datastax.driver.core.exceptions.DriverInternalError.copy(DriverInternalError.java:42)
at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:271)
at com.datastax.driver.core.ResultSet.fetchMoreResultsBlocking(ResultSet.java:252)
at com.datastax.driver.core.ResultSet.one(ResultSet.java:166)
   


Server Side:

 INFO [HANDSHAKE-/37.139.3.70] 2013-12-19 09:55:11,277 OutboundTcpConnection.java (line 386) Handshaking version with /37.139.3.70
 INFO [HANDSHAKE-/37.139.24.133] 2013-12-19 09:55:11,284 OutboundTcpConnection.java (line 386) Handshaking version with /37.139.24.133
 INFO [HANDSHAKE-/37.139.24.133] 2013-12-19 09:55:11,309 OutboundTcpConnection.java (line 386) Handshaking version with /37.139.24.133
 INFO [HANDSHAKE-/146.185.135.226] 2013-12-19 10:00:10,077 OutboundTcpConnection.java (line 386) Handshaking version with /146.185.135.226
 WARN [ReadStage:87] 2013-12-19 10:00:16,490 SliceQueryFilter.java (line 209) Read 111 live and 1776 tombstoned cells (see tombstone_warn_threshold)
 WARN [ReadStage:87] 2013-12-19 10:00:16,976 SliceQueryFilter.java (line 209) Read 48 live and 1056 tombstoned cells (see tombstone_warn_threshold)
 WARN [ReadStage:87] 2013-12-19 10:00:18,588 SliceQueryFilter.java (line 209) Read 80 live and 1160 tombstoned cells (see tombstone_warn_threshold)
 WARN [ReadStage:88] 2013-12-19 10:00:24,675 SliceQueryFilter.java (line 209) Read 48 live and 1056 tombstoned cells (see tombstone_warn_threshold)
 WARN [ReadStage:88] 2013-12-19 10:00:25,715 SliceQueryFilter.java (line 209) Read 80 live and 1160 tombstoned cells (see tombstone_warn_threshold)
 WARN [ReadStage:89] 2013-12-19 10:00:31,406 SliceQueryFilter.java (line 209) Read 300 live and 6300 tombstoned cells (see tombstone_warn_threshold)
 WARN [ReadStage:89] 2013-12-19 10:00:32,075 SliceQueryFilter.java (line 209) Read 65 live and 1040 tombstoned cells (see tombstone_warn_threshold)
 WARN [ReadStage:89] 2013-12-19 10:00:33,207 SliceQueryFilter.java (line 209) Read 72 live and 1224 tombstoned cells (see tombstone_warn_threshold)
 WARN [ReadStage:90] 2013-12-19 10:00:37,183 SliceQueryFilter.java (line 209) Read 135 live and 1782 tombstoned cells (see tombstone_warn_threshold)
 INFO [ScheduledTasks:1] 2013-12-19 10:00:58,523 GCInspector.java (line 116) GC for ParNew: 213 ms for 1 collections, 720697792 used; max is 2057306112
ERROR [Native-Transport-Requests:216] 2013-12-19 10:00:58,913 ErrorMessage.java (line 222) Unexpected exception during request
java.lang.AssertionError
at org.apache.cassandra.service.pager.AbstractQueryPager.discardFirst(AbstractQueryPager.java:183)
at org.apache.cassandra.service.pager.AbstractQueryPager.fetchPage(AbstractQueryPager.java:102)
at org.apache.cassandra.service.pager.RangeSliceQueryPager.fetchPage(RangeSliceQueryPager.java:36)
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:171)
at org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:58)
at org.apache.cassandra.cql3.QueryProcessor.processStatement(QueryProcessor.java:188)
at org.apache.cassandra.cql3.QueryProcessor.process(QueryProcessor.java:222)
at org.apache.cassandra.transport.messages.QueryMessage.execute(QueryMessage.java:119)
at org.apache.cassandra.transport.Message$Dispatcher.messageReceived(Message.java:304)
at org.jboss.netty.handler.execution.ChannelUpstreamEventRunnable.doRun(ChannelUpstreamEventRunnable.java:43)
at org.jboss.netty.handler.execution.ChannelEventRunnable.run(ChannelEventRunnable.java:67)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)




Jan

Re: Paging error after upgrade from C* 2.0.1 to 2.0.3 , Driver from 2.0.0-rc1 to 2.0.0-rc2

2013-12-19 Thread Jan Algermissen
Sylvain,

thanks.

Is there anything I can do except waiting for a fix?

Could I do something to my data? Or data model?

I moved to 2.0.3 because I think I experienced missing rows in 2.0.1 paging - 
is this related to the 2.0.3 bug? Meaning: going back to 2.0.1 will fix the 
exception, but leave me with the faulty situation the assertion is there to 
detect?

Jan


On 19.12.2013, at 11:39, Sylvain Lebresne  wrote:

> https://issues.apache.org/jira/browse/CASSANDRA-6447
> 
> 
> On Thu, Dec 19, 2013 at 11:16 AM, Jan Algermissen wrote:
> Hi all,
>
> after upgrading C* and the java-driver I am running into problems with
> paging. Maybe someone can provide a quick clue.
> [...]