RE: C* 1.2.x vs Gossip marking DOWN/UP

2016-04-14 Thread Michael Fong
Hi, Alain,

Thanks for your reply.

Unfortunately, this is a rather old system that ships with Cassandra v1.2.15, 
and a database upgrade does not seem to be a viable option. We have also 
recently observed a situation where a Cassandra instance froze for around one 
minute while the other nodes eventually marked that node DOWN. Here are some 
logs from the scenario; there is a one-minute window with no sign of any 
operation running:

Gossip related :
TRACE [GossipStage:1] 2016-04-13 23:34:08,641 GossipDigestSynVerbHandler.java 
(line 40) Received a GossipDigestSynMessage from /156.1.1.1
TRACE [GossipStage:1] 2016-04-13 23:35:01,081 GossipDigestSynVerbHandler.java 
(line 71) Gossip syn digests are : /156.1.1.1:1460103192:520418 
/156.1.1.4:1460103190:522108 /156.1.1.2:1460103205:522912 
/156.1.1.3:1460551526:41979

GC related:
2016-04-13T23:34:02.675+: 487270.189: Total time for which application 
threads were stopped: 0.0677060 seconds
2016-04-13T23:35:01.019+: 487328.533: [GC2016-04-13T23:35:01.020+: 
487328.534: [ParNew
Desired survivor size 1474560 bytes, new threshold 1 (max 1)
- age   1:1637144 bytes,1637144 total
: 843200K->1600K(843200K), 0.0559840 secs] 5631683K->4814397K(8446400K), 
0.0567850 secs] [Times: user=0.67 sys=0.00, real=0.05 secs]

Regular Cassandra operation:
INFO [CompactionExecutor:70229] 2016-04-13 23:34:02,439 CompactionTask.java 
(line 266) Compacted 4 sstables to 
[/opt/ruckuswireless/wsg/db/data/wsg/indexHistoricalRuckusClient/wsg-indexHistoricalRuckusClient-ic-1464,].
  54,743,298 bytes to 53,661,608 (~98% of original) in 29,124ms = 1.757166MB/s. 
 417,517 total rows, 265,853 unique.  Row merge counts were {1:114862, 
2:150328, 3:653, 4:10, }
INFO [HANDSHAKE-/156.1.1.2] 2016-04-13 23:35:01,110 OutboundTcpConnection.java 
(line 418) Handshaking version with /156.1.1.2

The situation occurs randomly across all nodes. When this happens, the Hector 
client application also seems to have trouble connecting to that Cassandra 
node, for example:
04-13 23:34:54 [taskExecutor-167] ConcurrentHClientPool:273 ERROR - Transport 
exception in re-opening client in release on 
:{localhost(127.0.0.1):9160}

Has anyone had a similar experience? The operating system is Ubuntu with kernel 
version 2.6.32.24. Thanks in advance!

Sincerely,

Michael Fong

From: Alain RODRIGUEZ [mailto:arodr...@gmail.com]
Sent: Wednesday, April 13, 2016 9:30 PM
To: user@cassandra.apache.org
Subject: Re: C* 1.2.x vs Gossip marking DOWN/UP

Hi Michael,

I had critical issues using 1.2 (.11, I believe) around gossip (but it was like 
2 years ago...).

Are you using the latest C* 1.2 minor version, 1.2.19? If not, you should 
probably upgrade to it asap.

A lot of issues like this one 
https://issues.apache.org/jira/browse/CASSANDRA-6297 have been fixed since then 
on C* 1.2, 2.0, 2.1, 2.2, 3.0.X and 3.X. You have to go through the upgrade 
steps one version at a time, but it should be safe, and going to the latest 1.2 
minor should be enough to solve this issue.

For your information, even C* 2.0 is no longer supported. The minimum version 
you should use now is 2.1.last.

This technical debt might end up costing you more in time, money and quality of 
service than taking care of upgrades would. Most probably, your bug is already 
fixed in newer versions. Plus, it is not very practical for us to help you, as 
we would have to go through old code to find issues that are most likely 
already fixed. If you want support (from the community or a commercial 
provider), you really should upgrade this cluster. Make sure your clients are 
compatible too.

I did not know that some people were still using C* < 2.0 :-).

Cheers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-04-13 10:58 GMT+02:00 Michael Fong <michael.f...@ruckuswireless.com>:
Hi, all


We have been running a 4-node Cassandra cluster (C* 1.2.x) where one node 
marked all the other 3 nodes DOWN, and then marked them UP again a few seconds 
later. A compaction had kicked in a minute before, roughly ~10 MB in size, 
followed by the node marking all the others DOWN. In other words, in 
system.log we see
00:00:00 Compacting ….
00:00:03 Compacted 8 sstables … 10~ megabytes
00:01:06 InetAddress /x.x.x.4 is now DOWN
00:01:06 InetAddress /x.x.x.3 is now DOWN
00:01:06 InetAddress /x.x.x.1 is now DOWN

There was no significant GC activity in gc.log. We have heard that busy 
compaction activity could cause this behavior, but we cannot see how this could 
happen logically. How would a compaction operation stop the Gossip thread from 
performing its heartbeat check? Has anyone experienced this kind of behavior 
before?

Thanks in advance!

Sincerely,

Michael Fong



Cassandra 2.1.12 Node size

2016-04-14 Thread Aiman Parvaiz
Hi all,
I am running a 9-node C* 2.1.12 cluster and I am seeking advice on data size
per node. Each of my nodes has close to 1 TB of data. I am not seeing any
issues as of now, but I wanted to run it by you guys whether this data size is
pushing the limits in any manner and whether I should be working on reducing
the data size per node. I will be migrating to incremental repairs shortly, and
a full repair currently takes 20 hr/node. I am not seeing any issues with the
nodes for now.

Thanks


Re: Balancing tokens over 2 datacenter

2016-04-14 Thread Walsh, Stephen
Thanks Guys,

I tend to agree that it's a viable configuration (but I'm biased).
We use Datadog monitoring to view reads and writes per node.

We see that all writes are balanced (due to the replication factor) but all 
reads only go to DC1.
So I believe the configuration is confirmed :)

Any way to balance the primary tokens over the two DC’s? :)

Steve

From: Jeff Jirsa <jeff.ji...@crowdstrike.com>
Reply-To: <user@cassandra.apache.org>
Date: Thursday, 14 April 2016 at 03:05
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Balancing tokens over 2 datacenter

100% ownership on all nodes isn’t wrong with 3 nodes in each of 2 Dcs with RF=3 
in both of those Dcs. That’s exactly what you’d expect it to be, and a 
perfectly viable production config for many workloads.



From: Anuj Wadehra
Reply-To: "user@cassandra.apache.org"
Date: Wednesday, April 13, 2016 at 6:02 PM
To: "user@cassandra.apache.org"
Subject: Re: Balancing tokens over 2 datacenter

Hi Stephen Walsh,

As per the nodetool output, every node owns 100% of the range. This indicates a 
wrong configuration. It would be good if you could verify and share the 
following properties of the yaml on all nodes:

num_tokens, seeds, cluster_name, listen_address, initial_token.

Also, which snitch are you using? If you use PropertyFileSnitch, please share 
cassandra-topology.properties too.



Thanks
Anuj

Sent from Yahoo Mail on 
Android

On Wed, 13 Apr, 2016 at 9:46 PM, Walsh, Stephen
<stephen.wa...@aspect.com> wrote:
Right again, Alain.
We use the DCAwareRoundRobinPolicy in our Java DataStax driver in each DC 
application to point to that Cassandra DC.



From: Alain RODRIGUEZ
Reply-To: "user@cassandra.apache.org"
Date: Wednesday, 13 April 2016 at 15:52
To: "user@cassandra.apache.org"
Subject: Re: Balancing tokens over 2 datacenter

Steve,

This cluster looks just great.

Now, due to a misconfiguration in our application, we saw that our application 
in both DC's was pointing to DC1.

This is the only thing to solve, and it happens in the client side 
configuration.

What client do you use?

Are you using something like 'new DCAwareRoundRobinPolicy("DC1")' as pointed 
out in Bhuvan's link 
http://stackoverflow.com/questions/22813045/ability-to-write-to-a-particular-cassandra-node
 ? You can use some other policy as well.

Then make sure to deploy this on clients that need to use 'DC1', and 'new 
DCAwareRoundRobinPolicy("DC2")' on clients that should be using 'DC2'.

Make sure ports are open.

This should be it,

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com



2016-04-13 16:28 GMT+02:00 Walsh, Stephen:
Thanks for your help guys,

As you guessed our schema is

{'class': 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3'}  AND 
durable_writes = false;


Our reads and writes are at LOCAL_ONE, with each application (now) using its 
own DC as its preferred DC.

Here is the nodetool status for one of our keyspaces (all tables are created 
the same way):


Datacenter: DC1

===

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address    Load      Tokens  Owns (effective)  Host ID                               Rack
UN  X.0.0.149  14.6 MB   256     100.0%            0f497235-a0bb-4e47-9434-dd0e126aa432  RAC3
UN  X.0.0.251  12.33 MB  256     100.0%            a1307717-4b61-4d57-8658-50460d6d54a1  RAC1
UN  X.0.0.79   21.54 MB  256     100.0%            f353c8f3-6b7c-483b-ad9a-3d66d469079e  RAC2

Datacenter: DC2

===

Status=Up/Down

|/ State=Normal/Leaving/Joining/Moving

--  Address    Load      Tokens  Owns (effective)  Host ID                               Rack
UN  X.0.2.32   18.08 MB  256     100.0%            103a1cb3-6580-44bd-bf97-28ae160e1119  RAC6
UN  X.0.2.211  12.46 MB  256     100.0%            8c8dd5ba-806d-43eb-9ee5-af463e443f46  RAC5
UN  X.0.2.186  12.58 MB  256     100.0%            aef904ba-aaab-47f1-9bdc-cc1e0c676f61  RAC4


We ran nodetool repair and cleanup in case the nodes were balanced but needed 
cleaning up – this was not the case :(


Steve


From: Alain RODRIGUEZ
Reply-To: "user@cassandra.apache.org"
Date: Wednesday, 13 April 2016 at 14:48
To: "user@cassandra.apache.org"
Subject: Re: Balancing tokens over 2 datacenter

Hi Steve,

As such, all keyspaces and tables were created on DC1.
The effect of this is that all reads are now going to DC1 and ignoring DC2

I think this is not exactly true. When tables are created, they are created in 
a specific keyspace; no matter where you send the schema change command, the 
schema will propagate to all the datacenters the keyspace is replicated to.

So the question is: Is your keyspace using 'DC1: 3, DC2: 3' as replication 
factors? Could you show us the 

Re: Cassandra 2.1.12 Node size

2016-04-14 Thread Alain RODRIGUEZ
Hi,

I seek advice in data size per node. Each of my node has close to 1 TB of
> data. I am not seeing any issues as of now but wanted to run it by you guys
> if this data size is pushing the limits in any manner and if I should be
> working on reducing data size per node.


There is no real limit to the data size other than 50% of the machine disk
space when using STCS and 80% if you are using LCS. Those are 'soft' limits, as
they mainly depend on your biggest sstable's size and the number of concurrent
compactions, but to stay away from trouble it is better to keep things under
control, below the limits mentioned above.

I will me migrating to incremental repairs shortly and full repair as of
> now takes 20 hr/node. I am not seeing any issues with the nodes for now.
>

As you noticed, you need to keep in mind that the larger the dataset is, the
longer operations will take: repairs, but also bootstrapping or replacing a
node, removing a node, or any operation that requires streaming or reading the
data. Repair time can indeed be mitigated by using incremental repairs.

I am running a 9 node C* 2.1.12 cluster.
>

It should be quite safe to give incremental repair a try, as many bugs have
been fixed in this version:

FIX 2.1.12 - A lot of sstables using range repairs due to anticompaction
- incremental only

https://issues.apache.org/jira/browse/CASSANDRA-10422

FIX 2.1.12 - repair hang when replica is down - incremental only

https://issues.apache.org/jira/browse/CASSANDRA-10288

If you are using DTCS be aware of
https://issues.apache.org/jira/browse/CASSANDRA-3

If using LCS, watch the sstable and pending compaction counts closely.

As a general comment, I would say that Cassandra has evolved to be able to
handle huge datasets (memory structures off-heap + increase of heap size
using G1GC, JBOD, vnodes, ...). Today Cassandra works just fine with big
datasets. I have seen clusters with 4+ TB nodes and others using a few GB per
node. It all depends on your requirements and your machines' specs. If fast
operations are absolutely necessary, keep it small. If you want to use the
entire disk space (50/80% of total disk space max), go ahead as long as
other resources are fine (CPU, memory, disk throughput, ...).

C*heers,

---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com

2016-04-14 10:57 GMT+02:00 Aiman Parvaiz :

> Hi all,
> I am running a 9 node C* 2.1.12 cluster. I seek advice in data size per
> node. Each of my node has close to 1 TB of data. I am not seeing any issues
> as of now but wanted to run it by you guys if this data size is pushing the
> limits in any manner and if I should be working on reducing data size per
> node. I will me migrating to incremental repairs shortly and full repair as
> of now takes 20 hr/node. I am not seeing any issues with the nodes for now.
>
> Thanks
>
>
>
>


Re: Balancing tokens over 2 datacenter

2016-04-14 Thread Alain RODRIGUEZ
>
> 100% ownership on all nodes isn’t wrong with 3 nodes in each of 2 Dcs with
> RF=3 in both of those Dcs. That’s exactly what you’d expect it to be, and a
> perfectly viable production config for many workloads.


+1, no doubt about it. The only thing is that all the nodes own the exact same
data, meaning the data is replicated 6 times, once on each of the 6 machines.
Storage is expensive but the data is quite safe there; that's a tradeoff to
consider, but it is ok from a Cassandra point of view, nothing "wrong" there.


> We see all the writes are balanced (due to the replication factor) but all
> reads only go to DC1.
> So with the configuration I believed confirmed :)
>
> Any way to balance the primary tokens over the two DC’s? :)
>


Steve, I thought it was now ok.

Could you confirm this?

Are you using something like 'new DCAwareRoundRobinPolicy("DC1"));' as
> pointed in Bhuvan's link
> http://stackoverflow.com/questions/22813045/ability-to-write-to-a-particular-cassandra-node
>  ?
> You can use some other
>
> Then make sure to deploy this on clients on that need to use 'DC1' and
> 'new DCAwareRoundRobinPolicy("DC2")' on client that should be using 'DC2'.
>

Are your clients using the 'DCAwareRoundRobinPolicy', and are the clients
from the datacenter related to DC2 using 'new
DCAwareRoundRobinPolicy("DC2")'?

This is really the only thing I can think about right now...

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Balancing tokens over 2 datacenter

2016-04-14 Thread Walsh, Stephen
Hi Alain,

As I mentioned earlier in the chain (which is getting long, I know), we are 
indeed using DCAwareRoundRobinPolicy:

"We use the DCAwareRoundRobinPolicy in our Java DataStax driver in each DC 
application to point to that Cassandra DC."

Indeed it is a trade-off having all the data on all nodes, but this is to allow 
one DC, or two nodes within a single DC, to go down, just to ensure maximum 
uptime.

I'm afraid that all applications are reading from DC1, despite having DC2 as 
their preferred DC.
I believe this is because the primary tokens were created in DC1, due to an 
initial misconfiguration when our applications were first started and only 
used DC1 to create the keyspaces and tables.

Steve



Re: Balancing tokens over 2 datacenter

2016-04-14 Thread Alain RODRIGUEZ
>
> I believe this is because the primary tokens where created in DC1 - due to
> an initial miss-configuration when our application where first started and
> only used DC1 to create the keyspaces ad tables
>

What does 'nodetool describecluster' output? If all the nodes share the same
schema then you are fine. Otherwise, run 'nodetool resetlocalschema' (not sure
about the command name) locally on the nodes not sharing the correct schema.

Im afraid the that all application are all reading from DC1, despite having
> a preferred read of DC2.


How do you measure that? What metric are you checking?

FWIW, you can check whether a node in DC2 is acting as a coordinator at all by
using:

'watch -d nodetool tpstats'

If the value for 'RequestResponseStage' is increasing, then this node is
acting as a coordinator (receiving and answering client reads or writes).

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com




Re: Cassandra 2.1.12 Node size

2016-04-14 Thread Aiman Parvaiz
Thanks for the response, Alain. I am using STCS and would like to take some 
action, as we will be hitting 50% disk space pretty soon. Would adding nodes be 
the right way to start if I want to get the data per node down? Otherwise, can 
you or someone on the list please suggest the right way to go about it.

Thanks

Sent from my iPhone



Experience with Kubernetes

2016-04-14 Thread Jack Krupansky
Does anybody here have any experience, positive or negative, with deploying
Cassandra (or DSE) clusters using Kubernetes? I don't have any immediate
need (or experience), but I am curious about the pros and cons.

There is an example here:
https://github.com/kubernetes/kubernetes/tree/master/examples/cassandra

Is there a better approach to deploying a Cassandra/DSE cluster than
Kubernetes?

Thanks.

-- Jack Krupansky


Re: Cassandra 2.1.12 Node size

2016-04-14 Thread Alain RODRIGUEZ
>
> Would adding nodes be the right way to start if I want to get the data per
> node down


Yes, if everything else is fine, the last and always-available option to
reduce the disk size per node is to add new nodes. Sometimes it is the
first option considered, as it is relatively quick and quite straightforward.

Again, the 50% disk usage guideline is not a hard limit. To give you a rough
idea, if the biggest sstable is 100 GB and you still have 400 GB free, you
will probably be good to go, except if 4 compactions of 100 GB each trigger
at the same time, filling up the disk.

Now is a good time to think of a plan to handle the growth, but don't worry
if data reaches 60%; it will probably not be a big deal.

You can make sure that:

- There are no snapshots, heap dumps or data not related to C* taking up space
- The biggest sstables' tombstone ratios are not too high (are tombstones
correctly evicted?)
- You are using compression (if you want to)

Consider:

- Adding TTLs to data you don't want to keep forever, shortening TTLs as much
as allowed (see the sketch below).
- Migrating to C* 3.0+ to take advantage of the new storage engine.
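
To illustrate the TTL point above, here is a minimal sketch of a TTL'd write
through the DataStax Java driver (2.x API); the contact point, keyspace, table
and the 30-day TTL are placeholder values for the example, not something taken
from this thread:

    import com.datastax.driver.core.Cluster;
    import com.datastax.driver.core.Session;

    import java.util.UUID;

    public class TtlWriteExample {
        public static void main(String[] args) {
            try (Cluster cluster = Cluster.builder()
                    .addContactPoint("127.0.0.1")                        // placeholder contact point
                    .build();
                 Session session = cluster.connect("my_keyspace")) {     // placeholder keyspace
                // USING TTL tells Cassandra to expire the row automatically (here after
                // 30 days = 2592000 seconds), so old data is eventually purged by
                // compaction instead of piling up forever.
                session.execute(
                        "INSERT INTO events (id, payload) VALUES (?, ?) USING TTL 2592000",
                        UUID.randomUUID(), "example payload");
            }
        }
    }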

C*heers,
---
Alain Rodriguez - al...@thelastpickle.com
France

The Last Pickle - Apache Cassandra Consulting
http://www.thelastpickle.com




Re: Cassandra 2.1.12 Node size

2016-04-14 Thread Jack Krupansky
The four criteria I would suggest for evaluating node size:

1. Query latency.
2. Query throughput/load
3. Repair time - worst case, full repair, what you can least afford if it
happens at the worst time
4. Expected growth over the next six to 18 months - you don't want to be
scrambling with latency, throughput, and repair problems when you bump into
a wall on capacity. 20% to 30% is a fair number.

Alas, it is very difficult to determine how much spare capacity you have,
other than an artificial, synthetic load test: Try 30% more clients and
queries with 30% more (synthetic) data and see what happens to query
latency, total throughput, and repair time. Run such a test periodically
(monthly) to get a heads-up when load is getting closer to a wall.

Incremental repair is great to streamline and optimize your day-to-day
operations, but focus attention on replacement of down nodes during times
of stress.



-- Jack Krupansky


Re: Large primary keys

2016-04-14 Thread Robert Wille
That would be a nice solution, but 3.4 is way too bleeding edge. I’ll just go 
with the digest for now. Thanks for pointing it out. I’ll have to consider a 
migration in the future when production is on 3.x.

On Apr 11, 2016, at 10:19 PM, Jack Krupansky <jack.krupan...@gmail.com> wrote:

Check out the text indexing feature of the new SASI feature in Cassandra 3.4. 
You could write a custom tokenizer to extract entities and then be able to 
query for documents that contain those entities.

That said, using a SHA digest key for the primary key has merit for direct 
access to the document given the document text.

-- Jack Krupansky

On Mon, Apr 11, 2016 at 7:12 PM, James Carman <ja...@carmanconsulting.com> wrote:
S3 maybe?

On Mon, Apr 11, 2016 at 7:05 PM Robert Wille <rwi...@fold3.com> wrote:
I do realize it's kind of a weird use case, but it is legitimate. I have a 
collection of documents that I need to index, and I want to perform entity 
extraction on them and give the extracted entities special treatment in my 
full-text index. Because entity extraction costs money, and each document will 
end up being indexed multiple times, I want to cache them in Cassandra. The 
document text is the obvious key to retrieve entities from the cache. If I use 
the document ID, then I have to track timestamps. I know that sounds like a 
simple workaround, but I’m presenting a much-simplified view of my actual data 
model.

The reason for needing the text in the table, and not just a digest, is that 
sometimes entity extraction has to be deferred due to license limitations. In 
those cases, the entity extraction occurs on a background process, and the 
entities will be included in the index the next time the document is indexed.

I will use a digest as the key. I suspected that would be the answer, but it's 
good to get confirmation.
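
For what it's worth, a minimal sketch of the digest-as-key approach, assuming 
SHA-256 with hex encoding (the class and method names here are just 
illustrative):

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    public final class DocumentKeys {
        // Hex-encoded SHA-256 of the document text: a fixed-size (64-character)
        // partition key no matter how large the text is.
        public static String digest(String documentText) {
            try {
                MessageDigest md = MessageDigest.getInstance("SHA-256");
                byte[] hash = md.digest(documentText.getBytes(StandardCharsets.UTF_8));
                StringBuilder hex = new StringBuilder(hash.length * 2);
                for (byte b : hash) {
                    hex.append(String.format("%02x", b));
                }
                return hex.toString();
            } catch (NoSuchAlgorithmException e) {
                // Every standard JVM is required to provide SHA-256.
                throw new AssertionError(e);
            }
        }
    }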

Robert

On Apr 11, 2016, at 4:36 PM, Jan Kesten <j.kes...@enercast.de> wrote:

> Hi Robert,
>
> why do you need the actual text as a key? It sounds a bit unnatural, at least 
> to me. Keep in mind that you cannot do "like" queries on keys in Cassandra. 
> For performance and to keep things more readable, I would prefer hashing your 
> text and using the hash as the key.
>
> You should also consider storing the keys (hashes) in a separate 
> table per day / hour or something like that, so you can quickly get all keys 
> for a time range. A query without the partition key may be very slow.
>
> Jan
>
> Am 11.04.2016 um 23:43 schrieb Robert Wille:
>> I have a need to be able to use the text of a document as the primary key in 
>> a table. These texts are usually less than 1K, but can sometimes be 10’s of 
>> K’s in size. Would it be better to use a digest of the text as the key? I 
>> have a background process that will occasionally need to do a full table 
>> scan and retrieve all of the texts, so using the digest doesn’t eliminate 
>> the need to store the text. Anyway, is it better to keep primary keys small, 
>> or is C* okay with large primary keys?
>>
>> Robert
>>
>





Re: Experience with Kubernetes

2016-04-14 Thread Joe Stein
You can use Mesos https://github.com/elodina/datastax-enterprise-mesos

~ Joestein


Re: Experience with Kubernetes

2016-04-14 Thread Nate McCall
> Does anybody here have any experience, positive or negative, with
> deploying Cassandra (or DSE) clusters using Kubernetes? I don't have any
> immediate need (or experience), but I am curious about the pros and cons.
>
>

The last time I played around with kubernetes+cassandra, you could not
specify node allocations across failure boundaries (AZs, Regions, etc).

To me, that makes it not interesting outside of development or trivial
setups.

It does look like they are getting farther along on "ubernetes" which
should fix this:
https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/federation.md



-- 
-
Nate McCall
Austin, TX
@zznate

Co-Founder & Sr. Technical Consultant
Apache Cassandra Consulting
http://www.thelastpickle.com


Re: Experience with Kubernetes

2016-04-14 Thread Joe Stein
You can do that with the Mesos scheduler
https://github.com/elodina/datastax-enterprise-mesos and lay out clusters
and racks for datacenters based on attributes:
http://mesos.apache.org/documentation/latest/attributes-resources/

~ Joestein


Re: Cassandra 2.1.12 Node size

2016-04-14 Thread Aiman Parvaiz
Right now the biggest SSTable I have is 210 GB on a 3 TB disk; total disk
consumed is around 50% on all nodes, and I am using STCS. Read and write query
latency is under 15 ms. Full repair time is long, but I am sure that when I
switch to incremental repairs this will be taken care of. I am hitting the 50%
disk issue. I recently ran cleanup, and backups aren't taking much space.


RE: Compaction Error When upgrading from 2.1.9 to 3.0.2

2016-04-14 Thread Anthony Verslues
It was an older upgrade plan, so I went ahead and tried upgrading to 3.0.5, and 
I ran into the same error.

Do you know what would cause this error? Is it something to do with tombstones 
or deleted rows?



From: Tyler Hobbs [mailto:ty...@datastax.com]
Sent: Wednesday, April 13, 2016 6:33 PM
To: user@cassandra.apache.org
Subject: Re: Compaction Error When upgrading from 2.1.9 to 3.0.2

Can you open a ticket here with your schema and the stacktrace? 
https://issues.apache.org/jira/browse/CASSANDRA
I'm also curious why you're not upgrading to 3.0.5 instead of 3.0.2.

On Wed, Apr 13, 2016 at 4:37 PM, Anthony Verslues <anthony.versl...@mezocliq.com> wrote:
I got this compaction error when running 'nodetool upgradesstables -a' while 
upgrading from 2.1.9 to 3.0.2. According to the documentation this upgrade 
should work.

Would upgrading to another intermediate version help?


This is the line number: 
https://github.com/apache/cassandra/blob/cassandra-3.0.2/src/java/org/apache/cassandra/db/LegacyLayout.java#L1124


error: null
-- StackTrace --
java.lang.AssertionError
at 
org.apache.cassandra.db.LegacyLayout$CellGrouper.addCell(LegacyLayout.java:1124)
at 
org.apache.cassandra.db.LegacyLayout$CellGrouper.addAtom(LegacyLayout.java:1099)
at 
org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.readRow(UnfilteredDeserializer.java:444)
at 
org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer$UnfilteredIterator.hasNext(UnfilteredDeserializer.java:423)
at 
org.apache.cassandra.db.UnfilteredDeserializer$OldFormatDeserializer.hasNext(UnfilteredDeserializer.java:289)
at 
org.apache.cassandra.io.sstable.SSTableSimpleIterator$OldFormatIterator.readStaticRow(SSTableSimpleIterator.java:134)
at 
org.apache.cassandra.io.sstable.SSTableIdentityIterator.<init>(SSTableIdentityIterator.java:57)
at 
org.apache.cassandra.io.sstable.format.big.BigTableScanner$KeyScanningIterator$1.initializeIterator(BigTableScanner.java:329)
at 
org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.maybeInit(LazilyInitializedUnfilteredRowIterator.java:48)
at 
org.apache.cassandra.db.rows.LazilyInitializedUnfilteredRowIterator.isReverseOrder(LazilyInitializedUnfilteredRowIterator.java:65)
at 
org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$1.reduce(UnfilteredPartitionIterators.java:109)
at 
org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$1.reduce(UnfilteredPartitionIterators.java:100)
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:206)
at 
org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:159)
at 
org.apache.cassandra.utils.AbstractIterator.hasNext(AbstractIterator.java:47)
at 
org.apache.cassandra.db.partitions.UnfilteredPartitionIterators$2.hasNext(UnfilteredPartitionIterators.java:150)
at 
org.apache.cassandra.db.transform.BasePartitions.hasNext(BasePartitions.java:72)
at 
org.apache.cassandra.db.compaction.CompactionIterator.hasNext(CompactionIterator.java:226)
at 
org.apache.cassandra.db.compaction.CompactionTask.runMayThrow(CompactionTask.java:177)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at 
org.apache.cassandra.db.compaction.CompactionTask.executeInternal(CompactionTask.java:78)
at 
org.apache.cassandra.db.compaction.AbstractCompactionTask.execute(AbstractCompactionTask.java:60)
at 
org.apache.cassandra.db.compaction.CompactionManager$8.runMayThrow(CompactionManager.java:572)
at 
org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:28)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)



--
Tyler Hobbs
DataStax


Re: Most stable version?

2016-04-14 Thread Jean Tremblay
Hi,
Could someone give his opinion on this?
What should be considered more stable, Cassandra 3.0.5 or Cassandra 3.5?

Thank you
Jean

> On 12 Apr,2016, at 07:00, Jean Tremblay  
> wrote:
> 
> Hi,
> Which version of Cassandra should be considered most stable in the version 3 line?
> I see two main branches: the 3.0.* branch and the tick-tock one, 3.*.*.
> So basically my question is: which one is most stable, version 3.0.5 or 
> version 3.3?
> I know odd versions in tick-tock are bug-fix releases. 
> Thanks
> Jean


Re: Most stable version?

2016-04-14 Thread Jack Krupansky
Normally, since 3.5 just came out, it would be wise to see if people report
any problems over the next few weeks.

But... the new tick-tock release process is designed to assure that these
odd-numbered releases are only incremental bug fixes from the last
even-numbered feature release, which was 3.4. So, 3.5 should be reasonably
stable.

That said, a bug-fix release of 3.0 is probably going to be more stable
than a bug fix release of a more recent feature release (3.4).

Usually it comes down to whether you need any of the new features or
improvements in 3.x, or whether you might want to keep your chosen release
in production for longer than the older 3.0 releases will be in production.

Ultimately, this is a personality test: Are you adventuresome or
conservative?

To be clear, with the new tick-tock release scheme, 3.5 is designed to be a
stable release.
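
Concretely, if I have the numbering right: 3.4 was a feature release, 3.5 is
its bug-fix follow-up, 3.6 will be the next feature release with 3.7 carrying
its fixes, while the 3.0.x line continues separately as bug-fix-only releases.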

-- Jack Krupansky

On Thu, Apr 14, 2016 at 3:23 PM, Jean Tremblay <
jean.tremb...@zen-innovations.com> wrote:

> Hi,
> Could someone give his opinion on this?
> What should be considered more stable, Cassandra 3.0.5 or Cassandra 3.5?
>
> Thank you
> Jean
>
> > On 12 Apr,2016, at 07:00, Jean Tremblay <
> jean.tremb...@zen-innovations.com> wrote:
> >
> > Hi,
> > Which version of Cassandra should be considered most stable in version 3?
> > I see two main branches: the 3.0.* branch and the tick-tock one, 3.*.*.
> > So basically my question is: which one is most stable, version 3.0.5 or
> > version 3.3?
> > I know odd versions in tick-tock are bug-fix releases.
> > Thanks
> > Jean
>


Re: Compaction Error When upgrading from 2.1.9 to 3.0.2

2016-04-14 Thread Tyler Hobbs
On Thu, Apr 14, 2016 at 2:08 PM, Anthony Verslues <
anthony.versl...@mezocliq.com> wrote:

> It was an older upgrade plan so I went ahead and tried to upgrade to 3.0.5
> and I ran into the same error.
>

Okay, good to know.  Please include that info in the ticket when you open
it.


>
>
> Do you know what would cause this error? Is it something to do with
> tombstoned or deleted rows?
>
>
>

I'm not sure; I haven't looked into it too deeply yet. From the stack trace,
it looks like it's related to reading the static columns of a row.
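
If that is the case, it should only affect tables that define a static column,
i.e. a column declared with CQL's STATIC modifier and shared by every row in a
partition, as in:

CREATE TABLE t (pk int, ck int, s text STATIC, v int, PRIMARY KEY (pk, ck));

So it may be worth checking whether the affected table has one. That's just a
guess based on the readStaticRow frame in the trace, though, not something
I've verified.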


-- 
Tyler Hobbs
DataStax 


Re: Cassandra Golang Driver and Support

2016-04-14 Thread Yawei Li
Thanks for the info, Bryan!
We are generally assessing the support level of GoCQL vs. the Java driver. From
http://gocql.github.io/ it looks like it is a WIP (some TODO items, and the API
is subject to change)? And https://github.com/gocql/gocql suggests that
performance may degrade now and then, and that the supported versions only go
up to 2.2.x? For us, maintaining two stacks (Java and Go) may be expensive, so
I am checking what general strategy folks are using here.

On Wed, Apr 13, 2016 at 11:31 AM, Bryan Cheng  wrote:

> Hi Yawei,
>
> While you're right that there's no first-party driver, we've had good luck
> using gocql (https://github.com/gocql/gocql) in production at moderate
> scale. What features in particular are you looking for that are missing?
>
> --Bryan
>
> On Tue, Apr 12, 2016 at 10:06 PM, Yawei Li  wrote:
>
>> Hi,
>>
>> It looks to me like DataStax doesn't provide an official golang driver yet,
>> and the golang client libs are overall lagging behind the Java driver in
>> terms of feature set, supported versions and possibly production stability?
>>
>> We are going to support a large number of services in both Java and Go. If
>> the above impression is largely true, we are considering the option of
>> focusing on the Java client and having the GoLang programs talk to the Java
>> service via RPC for data access. Has anyone tried a similar approach?
>>
>> Thanks
>>
>
>


Re: Cassandra Golang Driver and Support

2016-04-14 Thread Dan Kinder
Just want to put a plug in for gocql and the guys who work on it. I use it
for production applications that sustain ~10,000 writes/sec on an 8-node
cluster, and in the few times I have seen problems they have been responsive
on issues and pull requests. Once or twice I have seen the API change, but
otherwise it has been stable. In general I have found it very intuitive to
use and easy to configure.
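
To give a feel for the API, a minimal connect-and-query sketch looks roughly
like the following (illustrative only and untested; the contact point,
keyspace, table and column names are placeholders, not from any real schema):

package main

import (
	"fmt"
	"log"

	"github.com/gocql/gocql"
)

func main() {
	// Cluster configuration: contact point(s), keyspace and consistency level.
	cluster := gocql.NewCluster("127.0.0.1")
	cluster.Keyspace = "example_ks"
	cluster.Consistency = gocql.Quorum

	session, err := cluster.CreateSession()
	if err != nil {
		log.Fatalf("connect: %v", err)
	}
	defer session.Close()

	// Simple parameterized read against a hypothetical users table.
	var name string
	if err := session.Query(`SELECT name FROM users WHERE id = ?`, 42).Scan(&name); err != nil {
		log.Fatalf("query: %v", err)
	}
	fmt.Println("name:", name)
}

Timeouts, retry policy and pool sizing are likewise just fields on the same
ClusterConfig struct, which is a big part of why I find it easy to configure.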

On Thu, Apr 14, 2016 at 2:30 PM, Yawei Li  wrote:

> Thanks for the info, Bryan!
> We are generally assessing the support level of GoCQL vs. the Java driver.
> From http://gocql.github.io/ it looks like it is a WIP (some TODO items, and
> the API is subject to change)? And https://github.com/gocql/gocql suggests
> that performance may degrade now and then, and that the supported versions
> only go up to 2.2.x? For us, maintaining two stacks (Java and Go) may be
> expensive, so I am checking what general strategy folks are using here.
>
> On Wed, Apr 13, 2016 at 11:31 AM, Bryan Cheng 
> wrote:
>
>> Hi Yawei,
>>
>> While you're right that there's no first-party driver, we've had good
>> luck using gocql (https://github.com/gocql/gocql) in production at
>> moderate scale. What features in particular are you looking for that are
>> missing?
>>
>> --Bryan
>>
>> On Tue, Apr 12, 2016 at 10:06 PM, Yawei Li  wrote:
>>
>>> Hi,
>>>
>>> It looks to me like DataStax doesn't provide an official golang driver yet,
>>> and the golang client libs are overall lagging behind the Java driver in
>>> terms of feature set, supported versions and possibly production stability?
>>>
>>> We are going to support a large number of services in both Java and Go. If
>>> the above impression is largely true, we are considering the option of
>>> focusing on the Java client and having the GoLang programs talk to the Java
>>> service via RPC for data access. Has anyone tried a similar approach?
>>>
>>> Thanks
>>>
>>
>>


Nodetool rebuild and bootstrap

2016-04-14 Thread Anubhav Kale
Hello,

Is it a correct statement that both rebuild and bootstrap resume streaming from 
where they left off (meaning they don't stream the entire data set again) if a 
node restarts during the rebuild / bootstrap process?

Thanks !


Re: Nodetool rebuild and bootstrap

2016-04-14 Thread Jeff Jirsa
https://issues.apache.org/jira/browse/CASSANDRA-8838

Bootstrap only resumes on 2.2.0 and newer. I’m unsure of rebuild, but I suspect 
it does not resume at all. 
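For what it's worth, 2.2 also added a 'nodetool bootstrap resume' command to 
pick a failed bootstrap's streaming back up from where it stopped, if memory 
serves.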


From:  Anubhav Kale
Reply-To:  "user@cassandra.apache.org"
Date:  Thursday, April 14, 2016 at 3:07 PM
To:  "user@cassandra.apache.org"
Subject:  Nodetool rebuild and bootstrap

Hello,

 

Is it a correct statement that both rebuild and bootstrap resume streaming from 
where they left off (meaning they don’t stream the entire data set again) if a 
node restarts during the rebuild / bootstrap process?

 

Thanks !





Fwd: Cassandra Load spike

2016-04-14 Thread kavya
Hi,

We are running a 6-node Cassandra 2.2.4 cluster and we are seeing a spike
in the Load reported by the ‘nodetool status’ command that does not
correspond to the actual disk usage. The Load reported by nodetool was as
high as 3 times the actual disk usage on certain nodes.
We also noticed that the periodic repair failed with the error below when
running ‘nodetool repair -pr’:

ERROR [RepairJobTask:2] 2016-04-12 15:46:29,902 RepairRunnable.java:243 -
Repair session 64b54d50-0100-11e6-b46e-a511fd37b526 for range
(-3814318684016904396,-3810689996127667017] failed with error [….]
Validation failed in /
org.apache.cassandra.exceptions.RepairException: [….] Validation failed in

at org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:410) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:163) ~[apache-cassandra-2.2.4.jar:2.2.4]
at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-2.2.4.jar:2.2.4]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_40]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40

We restarted all nodes in the cluster and ran a full repair, which completed
successfully without any validation errors; however, we still see the Load
spike on the same nodes after a while. Please advise.

Thanks!


RE: Nodetool rebuild and bootstrap

2016-04-14 Thread Anubhav Kale
I confirmed that rebuild doesn’t resume at all. I couldn’t find a JIRA on this. 
Should I open one, or can someone explain whether there is a design rationale?

From: Jeff Jirsa [mailto:jeff.ji...@crowdstrike.com]
Sent: Thursday, April 14, 2016 4:01 PM
To: user@cassandra.apache.org
Subject: Re: Nodetool rebuild and bootstrap

https://issues.apache.org/jira/browse/CASSANDRA-8838

Bootstrap only resumes on 2.2.0 and newer. I’m unsure of rebuild, but I suspect 
it does not resume at all.


From: Anubhav Kale
Reply-To: "user@cassandra.apache.org"
Date: Thursday, April 14, 2016 at 3:07 PM
To: "user@cassandra.apache.org"
Subject: Nodetool rebuild and bootstrap

Hello,

Is it a correct statement that both rebuild and bootstrap resume streaming from 
where they left off (meaning they don’t stream the entire data set again) if a 
node restarts during the rebuild / bootstrap process?

Thanks !


[ANNOUNCE] YCSB 0.8.0 Release

2016-04-14 Thread Chrisjan Matser
On behalf of the development community, I am pleased to announce the
release of YCSB 0.8.0.  With the help of other Cassandra community
developers, we are continuing to make enhancements to the Cassandra binding.
Help with testing against a Cassandra 3 instance would be greatly appreciated
for the next release.


Highlights:

* Amazon S3 improvements including proper closing of the S3Object

* Apache Cassandra improvements including update to DataStax driver 3.0.0,
tested with Cassandra 2.2.5

* Apache HBase10 improvements including synchronization for multi-threading

* Core improvements to address future enhancements

* Elasticsearch improvements including update to 2.3.1 (latest stable
version)

* Orientdb improvements including a readallfields fix

Full release notes, including links to source and convenience binaries:

https://github.com/brianfrankcooper/YCSB/releases/tag/0.8.0

This release covers changes from the last month.
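
For anyone who wants to help with the Cassandra 3 testing, the usual workflow
is a load phase followed by a run phase against the CQL binding, along the
lines of 'bin/ycsb load cassandra2-cql -P workloads/workloada -p hosts=<contact
point>' and then the matching 'bin/ycsb run cassandra2-cql ...' invocation. The
binding name and properties above are from memory, so please double-check them
against the binding's README in the release.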