Binary Protocol Version and CQL version supported in 2.0.14

2015-04-13 Thread Anishek Agarwal
Hello,

I was trying to find which protocol versions are supported in Cassandra
2.0.14, and after reading multiple links I am quite confused.

Please correct me if my understanding is wrong:

   - The binary protocol version and the CQL spec version are different things?
   - Cassandra 2.0.x supports CQL 3?
   - Is the binary protocol version different between 2.0.x and 2.1.x?


Is there a link that states which version of Cassandra supports which
binary protocol version and CQL spec version? (Additionally, showing which
drivers support what would be great too.)

The link shows some info, but I am not sure whether the supported protocol
versions it lists refer to the binary protocol or the CQL spec.
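
One way to check what a given node reports is to ask it directly; a minimal
sketch, assuming cqlsh can reach the node and that its system.local table
includes the native_protocol_version column alongside cql_version and
release_version:

# Ask the node which release, CQL spec, and native (binary) protocol
# versions it advertises.
cqlsh <<'CQL'
SELECT release_version, cql_version, native_protocol_version FROM system.local;
CQL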

Thanks
Anishek


Re: Understanding Read after update

2015-04-13 Thread Graham Sanderson
Yes, it will look in each sstable that, according to the bloom filter, may have 
data for that partition key, and use timestamps to figure out the latest 
version (or none, in the case of a newer tombstone) to return for each clustering key.

Sent from my iPhone
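
For illustration, the per-cell write timestamps that drive this merge can be
inspected with writetime(); a minimal sketch, assuming a hypothetical keyspace
ks and table t:

# Update one column of an existing row, then read both columns with their
# write timestamps; the merged row carries a different timestamp per cell.
cqlsh <<'CQL'
CREATE KEYSPACE IF NOT EXISTS ks
  WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1};
CREATE TABLE IF NOT EXISTS ks.t (pk int PRIMARY KEY, a text, b text);
INSERT INTO ks.t (pk, a, b) VALUES (1, 'a0', 'b0');
UPDATE ks.t SET a = 'a1' WHERE pk = 1;
SELECT a, writetime(a), b, writetime(b) FROM ks.t WHERE pk = 1;
CQL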

> On Apr 12, 2015, at 11:18 PM, Anishek Agarwal  wrote:
> 
> Thanks Tyler for the validations, 
> 
> I have a follow up question. 
> 
> " One SSTable doesn't have precedence over another.  Instead, when the same 
> cell exists in both sstables, the one with the higher write timestamp wins."
> 
> If my table has 5 non-partition-key columns and I update only 1 of them, then 
> the new sstable should contain only that entry, which means that if I query 
> everything for that partition key, Cassandra has to match the timestamps 
> per column across sstables for that partition key to assemble the data?
> 
> 
>> On Fri, Apr 10, 2015 at 10:52 PM, Tyler Hobbs  wrote:
>> 
>>> 
>>> SSTable-level bloom filters have details as to which partition keys are in 
>>> that sstable. So to clear up my understanding: if I insert and then update the 
>>> same row after some time (assuming both go to different sstables), then during 
>>> a read Cassandra will read data from both sstables and merge them in time 
>>> order, with data in the second sstable for the row taking precedence over the 
>>> first sstable, and return the result?
>> 
>> That's approximately correct.  The only part that's incorrect is how merging 
>> works.  One SSTable doesn't have precedence over another.  Instead, when the 
>> same cell exists in both sstables, the one with the higher write timestamp 
>> wins.
>>  
>>> Does it mark the old column as a tombstone in the previous sstable, or wait 
>>> for compaction to remove the old data?
>> 
>> It just waits for compaction to remove the old data; there's no tombstone.
>> 
>> 
>>> when the data is in the memtable, it also keeps track of the unique keys in 
>>> that memtable, so when it writes to disk it can use that to derive the right 
>>> size of bloom filter for that sstable?
>> 
>> 
>> That's correct, it knows the number of keys before the bloom filter is 
>> created.
>> 
>> -- 
>> Tyler Hobbs
>> DataStax
> 


Delete-only workloads crash Cassandra

2015-04-13 Thread Robert Wille
Back in 2.0.4 or 2.0.5 I ran into a problem with delete-only workloads. If I 
did lots of deletes and no upserts, Cassandra would report that the memtable 
was 0 bytes because of an accounting error. The memtable would never flush and 
Cassandra would eventually die. Someone was kind enough to create a patch, 
which seemed to have fixed the problem, but last night it reared its ugly head.

I’m now running 2.0.14. I ran a cleanup process on my cluster (10 nodes, RF=3, 
CL=1). The workload was pretty light, because this cleanup process is 
single-threaded and does everything synchronously. It was performing 4 reads 
per second and about 3000 deletes per second. Over the course of many hours, 
heap slowly grew on all nodes. CPU utilization also increased as GC consumed an 
ever-increasing amount of time. Eventually a couple of nodes shed 3.5 GB of 
their 7.5 GB. Other nodes weren’t so fortunate and started flapping due to 30 
second GC pauses.

The workaround is pretty simple. This cleanup process can simply write a dummy 
record with a TTL periodically so that Cassandra can flush its memtables and 
function properly. However, I think this probably ought to be fixed. 
Delete-only workloads can’t be that rare. I can’t be the only one who needs to 
go through and clean up their tables.
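
A sketch of that workaround, assuming a hypothetical my_ks.heartbeat table
created just for this purpose (invoked from cron, for example); the row expires
on its own, so nothing needs cleaning up afterwards:

# Periodically write a short-lived dummy row so the memtable sees some
# non-delete traffic and flushes normally.
cqlsh <<'CQL'
INSERT INTO my_ks.heartbeat (id, ts) VALUES (1, dateof(now())) USING TTL 600;
CQL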

Robert



Re: Delete-only workloads crash Cassandra

2015-04-13 Thread Philip Thompson
Did the original patch make it into upstream? That's unclear. If so, what
was the JIRA #? Have you filed a JIRA for the new problem?

On Mon, Apr 13, 2015 at 12:21 PM, Robert Wille  wrote:

> Back in 2.0.4 or 2.0.5 I ran into a problem with delete-only workloads. If
> I did lots of deletes and no upserts, Cassandra would report that the
> memtable was 0 bytes because of an accounting error. The memtable would never
> flush and Cassandra would eventually die. Someone was kind enough to create
> a patch, which seemed to have fixed the problem, but last night it reared
> its ugly head.
>
> I’m now running 2.0.14. I ran a cleanup process on my cluster (10 nodes,
> RF=3, CL=1). The workload was pretty light, because this cleanup process is
> single-threaded and does everything synchronously. It was performing 4
> reads per second and about 3000 deletes per second. Over the course of many
> hours, heap slowly grew on all nodes. CPU utilization also increased as GC
> consumed an ever-increasing amount of time. Eventually a couple of nodes
> shed 3.5 GB of their 7.5 GB. Other nodes weren’t so fortunate and started
> flapping due to 30 second GC pauses.
>
> The workaround is pretty simple. This cleanup process can simply write a
> dummy record with a TTL periodically so that Cassandra can flush its
> memtables and function properly. However, I think this probably ought to be
> fixed. Delete-only workloads can’t be that rare. I can’t be the only one
> that needs to go through and clean up their tables.
>
> Robert
>
>


Keyspace Replication changes not synchronized after adding Datacenter

2015-04-13 Thread Thunder Stumpges
Hi guys,

We have recently added two datacenters to our existing 2.0.6 cluster. We
followed the process here pretty much exactly:
http://docs.datastax.com/en/cassandra/2.0/cassandra/operations/ops_add_dc_to_cluster_t.html

We are using GossipingPropertyFileSnitch and NetworkTopologyStrategy across
the board. All property files are identical in each of the three
datacenters, and we use two nodes from each DC in the seed list.

However, when we came to step 7.a we ran the ALTER KEYSPACE command on one
of the new datacenters (to add it as a replica). This change was reflected
on the new datacenter where it ran, as shown by DESCRIBE KEYSPACE.
However, the change was NOT propagated to either of the other two
datacenters. We effectively had to run the ALTER KEYSPACE command 3 times,
once in each datacenter. Is this expected? I could find no documentation
stating that this needed to be done, nor any documentation around how the
system keyspace is kept in sync across datacenters in general.

If this is indicative of a larger problem with our installation, how would
we go about troubleshooting it?
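
As a first check (a sketch, not a full diagnosis; my_keyspace stands in for the
real keyspace name): schema changes are normally propagated cluster-wide on
their own, so it is worth confirming whether the nodes actually agree on a
single schema version and what replication options each one has stored:

# Every node should report the same schema version.
nodetool describecluster

# The 2.0 schema table shows the replication settings each node has stored.
cqlsh <<'CQL'
SELECT keyspace_name, strategy_class, strategy_options FROM system.schema_keyspaces;
CQL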

Thanks in advance!
Thunder


Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

2015-04-13 Thread Anuj Wadehra
Any comments on the side effects of major compaction, especially when the 
generated sstable is 100+ GB?


Since Cassandra 1.2, automatic tombstone compaction occurs even on a single 
sstable if the tombstone percentage exceeds the tombstone_threshold sub-property 
specified in the compaction strategy. So even if the huge sstable is never compacted 
with any new sstable, tombstones will still be collected. Is there any other disadvantage 
of having a giant sstable of hundreds of GB? I understand that sstables have a 
summary and an index, which help locate the correct data blocks directly in a large 
data file. Still, are there any disadvantages?
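
For reference, these knobs live in the table's compaction options; a minimal
sketch, assuming SizeTieredCompactionStrategy and a hypothetical table
my_ks.my_table. The threshold and interval are shown at their usual defaults;
unchecked_tombstone_compaction, where your 2.0.x build has it, relaxes the
overlap check that can otherwise prevent single-sstable tombstone compaction:

# Tune single-sstable tombstone compaction on a hypothetical table.
cqlsh <<'CQL'
ALTER TABLE my_ks.my_table WITH compaction = {
  'class': 'SizeTieredCompactionStrategy',
  'tombstone_threshold': '0.2',
  'tombstone_compaction_interval': '86400',
  'unchecked_tombstone_compaction': 'true'
};
CQL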


Thanks

Anuj Wadehra


Sent from Yahoo Mail on Android

From: "Anuj Wadehra"
Date: Mon, 13 Apr 2015 at 12:33 am
Subject: Re: Drawbacks of Major Compaction now that Automatic Tombstone 
Compaction Exists

No.


Anuj Wadehra




On Monday, 13 April 2015 12:23 AM, Sebastian Estevez 
 wrote:



Have you tried user defined compactions via JMX?
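
For anyone unfamiliar with the suggestion: user-defined compaction is an
operation on the CompactionManager MBean that compacts only the sstables you
name. A rough sketch using the jmxterm CLI; the jar path, JMX port, and sstable
file names are placeholders, and the exact forceUserDefinedCompaction signature
varies between versions, so inspect the bean first:

# Check the operation's signature on your version before calling it:
#   info -b org.apache.cassandra.db:type=CompactionManager
echo "run -b org.apache.cassandra.db:type=CompactionManager forceUserDefinedCompaction my_ks-my_table-jb-1-Data.db,my_ks-my_table-jb-2-Data.db" | \
  java -jar jmxterm-uber.jar -l localhost:7199 -n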

On Apr 12, 2015 1:40 PM, "Anuj Wadehra"  wrote:

Recently we faced an issue where every repair operation caused the addition of 
hundreds of sstables (CASSANDRA-9146). In order to bring the situation under 
control and make sure reads were not impacted, we were left with no option but 
to run a major compaction to ensure that the thousands of tiny sstables were compacted.

Queries:
Does major compaction have any drawbacks now that automatic tombstone compaction 
is implemented (since 1.2, via the tombstone_threshold sub-property, CASSANDRA-3442)? 
I understand that the huge sstable created after major compaction won't be 
compacted with new data any time soon, but is that a problem if purged data is 
removed via automatic tombstone compaction? If major compaction results in a 
huge file, say 500 GB, what are its drawbacks?

If one big sstable is a problem, is there any way of solving it? We 
tried running sstablesplit after major compaction to split the big sstable, but 
as the new sstables were all the same size, they were compacted back into a single 
huge sstable once Cassandra was restarted after running sstablesplit.
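
A sketch of that procedure, assuming a package install and default data paths
(keyspace and table names are illustrative). Note that splitting into
equal-sized pieces is exactly what invites size-tiered compaction to merge them
straight back together, which matches the behaviour described above:

# Run sstablesplit only while the node is down; -s is the target output
# size in MB (the default is 50).
nodetool drain
sudo service cassandra stop
sstablesplit --no-snapshot -s 100 /var/lib/cassandra/data/my_ks/my_table/my_ks-my_table-jb-*-Data.db
sudo service cassandra start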



Thanks

Anuj Wadehra





Re: Impact of removing compactions_in_progress folder

2015-04-13 Thread Anuj Wadehra
Any comments on the exceptions related to unfinished compactions at Cassandra 
start-up? What is the best way to deal with them? Are there side effects of deleting 
the compactions_in_progress folder to resolve the issue?


Thanks

Anuj Wadehra

Sent from Yahoo Mail on Android

From: "Anuj Wadehra"
Date: Mon, 13 Apr 2015 at 12:32 am
Subject: Impact of removing compactions_in_progress folder

We often see errors on Cassandra start-up regarding unfinished compactions, 
particularly when Cassandra was abruptly shut down. The problem gets resolved when 
we delete the /var/lib/cassandra/data/system/compactions_in_progress folder. Does 
deletion of the folder have any impact on the integrity of the data or any other aspect?
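
A sketch of the workaround described above, assuming a package install and the
default data directory; this discards only the record of interrupted
compactions, not any data sstables:

# With the node stopped, remove the unfinished-compactions system table data;
# it is recreated on startup.
sudo service cassandra stop
rm -rf /var/lib/cassandra/data/system/compactions_in_progress
sudo service cassandra start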



Thanks

Anuj Wadehra



Re: Delete-only workloads crash Cassandra

2015-04-13 Thread Robert Wille
Unfortunately, I’ve switched email systems and don’t have my emails from that 
time period. I did not file a Jira, and I don’t remember who made the patch for 
me or if he filed a Jira on my behalf.

I vaguely recall seeing the fix in the Cassandra change logs, but I just went 
and read them and I don’t see it. I’m probably remembering wrong.

My suspicion is that the original patch did not make it into the main branch, 
and I have just always had enough concurrent writing to keep Cassandra happy.

Hopefully the author of the patch will read this and be able to chime in.

This issue is very reproducible. I’ll try to come up with some time to write a 
simple program that illustrates the problem and file a Jira.

Thanks

Robert

On Apr 13, 2015, at 10:39 AM, Philip Thompson 
<philip.thomp...@datastax.com> wrote:

Did the original patch make it into upstream? That's unclear. If so, what was 
the JIRA #? Have you filed a JIRA for the new problem?

On Mon, Apr 13, 2015 at 12:21 PM, Robert Wille <rwi...@fold3.com> wrote:
Back in 2.0.4 or 2.0.5 I ran into a problem with delete-only workloads. If I 
did lots of deletes and no upserts, Cassandra would report that the memtable 
was 0 bytes because of an accounting error. The memtable would never flush and 
Cassandra would eventually die. Someone was kind enough to create a patch, 
which seemed to have fixed the problem, but last night it reared its ugly head.

I’m now running 2.0.14. I ran a cleanup process on my cluster (10 nodes, RF=3, 
CL=1). The workload was pretty light, because this cleanup process is 
single-threaded and does everything synchronously. It was performing 4 reads 
per second and about 3000 deletes per second. Over the course of many hours, 
heap slowly grew on all nodes. CPU utilization also increased as GC consumed an 
ever-increasing amount of time. Eventually a couple of nodes shed 3.5 GB of 
their 7.5 GB. Other nodes weren’t so fortunate and started flapping due to 30 
second GC pauses.

The workaround is pretty simple. This cleanup process can simply write a dummy 
record with a TTL periodically so that Cassandra can flush its memtables and 
function properly. However, I think this probably ought to be fixed. 
Delete-only workloads can’t be that rare. I can’t be the only one that needs to 
go through and clean up their tables.

Robert





Re: Impact of removing compactions_in_progress folder

2015-04-13 Thread Robert Coli
On Sun, Apr 12, 2015 at 12:02 PM, Anuj Wadehra 
wrote:

> We often see errors on Cassandra start-up regarding unfinished compactions,
> particularly when Cassandra was abruptly shut down. The problem gets resolved
> when we delete the /var/lib/cassandra/data/system/compactions_in_progress
> folder. Does deletion of the folder have any impact on the integrity of the
> data or any other aspect?
>

While I have no specific knowledge about this case, it is difficult to
imagine how canceling a compaction could have any meaningful negative
effect other than the normal penalty one pays for uncompacted data.

"nodetool stop" can also cancel compactions, probably by approximately the
same mechanism as removing an in-progress-compactions file.

However, if you can reproduce this reliably, you should:

1) file a ticket on http://issues.apache.org
2) respond to this mail letting the list know the JIRA # of the ticket

=Rob


Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

2015-04-13 Thread Robert Coli
On Mon, Apr 13, 2015 at 10:52 AM, Anuj Wadehra 
wrote:

> Any comments on the side effects of major compaction, especially when the
> generated sstable is 100+ GB?
>

I have no idea how this interacts with the automatic compaction stuff; if
you find out, let us know?

But if you want to do a major and don't want to deal with One Big SSTable
afterwards, stop the node and then run the sstablesplit utility.

=Rob


Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

2015-04-13 Thread Rahul Neelakantan
Rob,
Does that mean that once you split it back into small ones, automatic compaction 
will continue to happen on a more frequent basis now that it's no longer a 
single large monolith?

Rahul

> On Apr 13, 2015, at 3:23 PM, Robert Coli  wrote:
> 
>> On Mon, Apr 13, 2015 at 10:52 AM, Anuj Wadehra  
>> wrote:
>> 
>> Any comments on the side effects of major compaction, especially when the 
>> generated sstable is 100+ GB? 
> 
> I have no idea how this interacts with the automatic compaction stuff; if you 
> find out, let us know?
> 
> But if you want to do a major and don't want to deal with One Big SSTable 
> afterwards, stop the node and then run the sstablesplit utility. 
> 
> =Rob
> 


Do I need to run repair and compaction on every node?

2015-04-13 Thread Benyi Wang
I have read the documentation several times, but I am still not quite sure how to
run repair and compaction.

To my understanding,

   - I need to run compaction on each node,
   - To repair a table (column family), I only need to run repair on any one
   of the nodes.

Am I right?

Thanks.


Re: Drawbacks of Major Compaction now that Automatic Tombstone Compaction Exists

2015-04-13 Thread Robert Coli
On Mon, Apr 13, 2015 at 12:26 PM, Rahul Neelakantan  wrote:

> Does that mean once you split it back into small ones, automatic
> compaction will continue to happen on a more frequent basis now that it's
> no longer a single large monolith?
>

That's what the word "size tiered" means in the phrase "size tiered
compaction," yes.

=Rob


Re: Do I need to run repair and compaction on every node?

2015-04-13 Thread Robert Coli
On Mon, Apr 13, 2015 at 1:36 PM, Benyi Wang  wrote:

>
>    - I need to run compaction on each node,
>
> In general, there is no requirement to manually run compaction. Minor
compaction occurs in the background, automatically.

>
>    - To repair a table (column family), I only need to run repair on any
>    one of the nodes.
>
> It depends on whether you are doing -pr or non -pr repair.

If you are doing -pr repair, you run repair on all nodes. If you do non -pr
repair, you have to figure out what set of nodes to run it on. That's why
-pr exists, to simplify this.
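
For example, a rolling -pr repair amounts to something like the following, run
against each node in turn (host names and the keyspace are placeholders):

# Repair each node's primary ranges, one node at a time.
for host in node1 node2 node3; do
  nodetool -h "$host" repair -pr my_keyspace
done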

=Rob


Re: Do I need to run repair and compaction on every node?

2015-04-13 Thread Benyi Wang
What about "incremental repair" and "sequential repair"?

I ran "nodetool repair -- keyspace table" on one node. I found the repair
sessions running on different nodes. Will this command repair the whole
table?

In this page:
http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_repair_nodes_c.html#concept_ds_ebj_d3q_gk__opsRepairPrtRng

*Using the nodetool repair -pr (–partitioner-range) option repairs only the
first range returned by the partitioner for a node. Other replicas for that
range still have to perform the Merkle tree calculation, causing a
validation compaction.*

Does it sound like -pr runs on just one node?
I still don't understand "the first range returned by the partitioner for
a node"?

On Mon, Apr 13, 2015 at 1:40 PM, Robert Coli  wrote:

> On Mon, Apr 13, 2015 at 1:36 PM, Benyi Wang  wrote:
>
>>
>>    - I need to run compaction on each node,
>>
>> In general, there is no requirement to manually run compaction. Minor
> compaction occurs in the background, automatically.
>
>>
>>    - To repair a table (column family), I only need to run repair on any
>>    one of the nodes.
>>
>> It depends on whether you are doing -pr or non -pr repair.
>
> If you are doing -pr repair, you run repair on all nodes. If you do non
> -pr repair, you have to figure out what set of nodes to run it on. That's
> why -pr exists, to simplify this.
>
> =Rob
>
>


Re: Do I need to run repair and compaction on every node?

2015-04-13 Thread Jeff Ferland
Nodetool repair: the basic default sequential repair covers all nodes and computes 
Merkle trees in sequence, one node at a time. You only need to run the command on 
one node.
Nodetool repair -par: covers all nodes, computes Merkle trees for each node at 
the same time. Much higher IO load, as every copy of a key range is scanned at 
once. Can be totally OK with SSDs and throughput limits. You only need to run the 
command on one node.
Nodetool repair -pr: only covers the ranges owned by the node's token(s). Must 
be run on each node, because each node owns a partial share of the ring.

Incremental repair: only considers changes since the last repair. It can probably 
be combined with the other flags.

> On Apr 13, 2015, at 2:37 PM, Benyi Wang  wrote:
> 
> What about "incremental repair" and "sequential repair"?
> 
> I ran "nodetool repair -- keyspace table" on one node. I found the repair 
> sessions running on different nodes. Will this command repair the whole table?
> 
> In this page: 
> http://docs.datastax.com/en/cassandra/2.1/cassandra/operations/ops_repair_nodes_c.html#concept_ds_ebj_d3q_gk__opsRepairPrtRng
>  
> 
> 
> Using the nodetool repair -pr (–partitioner-range) option repairs only the 
> first range returned by the partitioner for a node. Other replicas for that 
> range still have to perform the Merkle tree calculation, causing a validation 
> compaction.
> 
> Does it sound like -pr runs on one node?
> I still don't understand "the first range returned by the partitioner for a 
> node"? 
> 
> On Mon, Apr 13, 2015 at 1:40 PM, Robert Coli wrote:
> On Mon, Apr 13, 2015 at 1:36 PM, Benyi Wang wrote:
> I need to run compaction on each node, 
> In general, there is no requirement to manually run compaction. Minor 
> compaction occurs in the background, automatically. 
> To repair a table (column family), I only need to run repair on any one of the nodes.
> It depends on whether you are doing -pr or non -pr repair.
> 
> If you are doing -pr repair, you run repair on all nodes. If you do non -pr 
> repair, you have to figure out what set of nodes to run it on. That's why -pr 
> exists, to simplify this. 
> 
> =Rob
> 
> 



Re: Do I need to run repair and compaction on every node?

2015-04-13 Thread Robert Coli
On Mon, Apr 13, 2015 at 3:33 PM, Jeff Ferland  wrote:

> Nodetool repair -par: covers all nodes, computes merkle trees for each
> node at the same time. Much higher IO load as every copy of a key range is
> scanned at once. Can be totally OK with SSDs and throughput limits.  Only
> need to run the command one node.
>

No? -par is just a performance (of repair) de-optimization, intended to
improve service time during repair. Doing -par without -pr on a single node
doesn't repair your entire cluster.

Consider the following 7 node cluster, without vnodes :

A B C D E F G
RF=3

You run a repair on node D, without -pr.

D is repaired against B's tertiary replicas.
D is repaired against C's secondary replicas.
E is repaired against D's secondary replicas.
F is repaired against D's tertiary replicas.
Nodes A and G are completely unaffected and unrepaired, because D does not
share any ranges with them.

repair with or without -par only covers all *replica* nodes. Even with
vnodes, you still have to run it on almost all nodes in most cases. Which
is why most users should save themselves the complexity and just do a
rolling -par -pr on all nodes, one by one.

=Rob


Re: Do I need to run repair and compaction on every node?

2015-04-13 Thread Jon Haddad
Or use spotify’s reaper and forget about it 
https://github.com/spotify/cassandra-reaper 

> On Apr 13, 2015, at 3:45 PM, Robert Coli  wrote:
> 
> On Mon, Apr 13, 2015 at 3:33 PM, Jeff Ferland wrote:
> Nodetool repair -par: covers all nodes, computes merkle trees for each node 
> at the same time. Much higher IO load as every copy of a key range is scanned 
> at once. Can be totally OK with SSDs and throughput limits.  Only need to run 
> the command one node.
> 
> No? -par is just a performance (of repair) de-optimization, intended to 
> improve service time during repair. Doing -par without -pr on a single node 
> doesn't repair your entire cluster.
> 
> Consider the following 7 node cluster, without vnodes :
> 
> A B C D E F G
> RF=3
> 
> You run a repair on node D, without -pr.
> 
> D is repaired against B's tertiary replicas.
> D is repaired against C's secondary replicas.
> E is repaired against D's secondary replicas.
> F is repaired against D's tertiary replicas.
> Nodes A and G are completely unaffected and unrepaired, because D does not 
> share any ranges with them.
> 
> repair with or without -par only covers all *replica* nodes. Even with 
> vnodes, you still have to run it on almost all nodes in most cases. Which is 
> why most users should save themselves the complexity and just do a rolling 
> -par -pr on all nodes, one by one.
> 
> =Rob
> 



Re: Do I need to run repair and compaction on every node?

2015-04-13 Thread Jeff Ferland
Just read the source and, well… yup. I’m guessing now that the options are 
indeed only a rolling repair on each node (with -pr stopping the duplicate work) 
or -st -9223372036854775808 -et 9223372036854775807 to actually cover all 
ranges. I didn’t walk through to test that, though.
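
For reference, the full-ring invocation described above would look something
like this with the Murmur3 partitioner (the keyspace name is a placeholder,
and as noted it has not been verified here):

# Token bounds are the Murmur3 minimum and maximum quoted above.
nodetool repair -st -9223372036854775808 -et 9223372036854775807 my_keyspace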

Glad 3.0 is getting a little bit of love on improving repairs and 
communications / logging about them.

-Jeff

> On Apr 13, 2015, at 3:45 PM, Robert Coli  wrote:
> 
> On Mon, Apr 13, 2015 at 3:33 PM, Jeff Ferland wrote:
> Nodetool repair -par: covers all nodes, computes merkle trees for each node 
> at the same time. Much higher IO load as every copy of a key range is scanned 
> at once. Can be totally OK with SSDs and throughput limits.  Only need to run 
> the command one node.
> 
> No? -par is just a performance (of repair) de-optimization, intended to 
> improve service time during repair. Doing -par without -pr on a single node 
> doesn't repair your entire cluster.
> 
> Consider the following 7 node cluster, without vnodes :
> 
> A B C D E F G
> RF=3
> 
> You run a repair on node D, without -pr.
> 
> D is repaired against B's tertiary replicas.
> D is repaired against C's secondary replicas.
> E is repaired against D's secondary replicas.
> F is repaired against D's tertiary replicas.
> Nodes A and G are completely unaffected and unrepaired, because D does not 
> share any ranges with them.
> 
> repair with or without -par only covers all *replica* nodes. Even with 
> vnodes, you still have to run it on almost all nodes in most cases. Which is 
> why most users should save themselves the complexity and just do a rolling 
> -par -pr on all nodes, one by one.
> 
> =Rob
>