Re: Fwd: Cassandra Load spike

2016-04-15 Thread Jan Kesten

Hi,

you should check the "snapshots" directories on your nodes; it is very 
likely that there are some old snapshots from failed operations taking up space.
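One way to check this on a node is to total the bytes stored under every snapshots directory. The sketch below is a hypothetical helper (SnapshotUsage is not part of Cassandra). It assumes the default package-install data directory /var/lib/cassandra/data and the usual <keyspace>/<table>/snapshots/<snapshot name>/ layout; pass your own data_file_directories entry otherwise. Snapshot files are hard links, so the total may include bytes still shared with live SSTables. Treat it as an upper bound on what clearing snapshots would free.

```java
import java.io.IOException;
import java.nio.file.FileVisitResult;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.SimpleFileVisitor;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical helper: sums the bytes of regular files that sit below any
// directory named "snapshots" inside a Cassandra data directory.
class SnapshotUsage {
    static long snapshotBytes(Path dataDir) throws IOException {
        AtomicLong total = new AtomicLong();
        Files.walkFileTree(dataDir, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) {
                if (attrs.isRegularFile() && isUnderSnapshots(dataDir, file)) {
                    total.addAndGet(attrs.size());
                }
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult visitFileFailed(Path file, IOException exc) {
                // Skip entries we cannot read instead of aborting the whole walk.
                return FileVisitResult.CONTINUE;
            }
        });
        return total.get();
    }

    // True if any path component between dataDir and file is "snapshots".
    private static boolean isUnderSnapshots(Path dataDir, Path file) {
        for (Path part : dataDir.relativize(file)) {
            if (part.toString().equals("snapshots")) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: default data directory of a package install.
        Path dataDir = Paths.get(args.length > 0 ? args[0] : "/var/lib/cassandra/data");
        System.out.printf("bytes under snapshots/ in %s: %d%n", dataDir, snapshotBytes(dataDir));
    }
}
```

Run it on each node with the data directory as its only argument and compare the result with the Load column of nodetool status and with du on the same directory.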


On 15.04.2016 at 01:28, kavya wrote:

Hi,

We are running a 6-node Cassandra 2.2.4 cluster and we are seeing a 
spike in the disk Load reported by the 'nodetool status' command that does 
not correspond to the actual disk usage. The Load reported by nodetool 
was as high as 3 times the actual disk usage on certain nodes.
We noticed that the periodic repair failed with the error below when running 
the command 'nodetool repair -pr':


ERROR [RepairJobTask:2] 2016-04-12 15:46:29,902 RepairRunnable.java:243 - Repair session 64b54d50-0100-11e6-b46e-a511fd37b526 for range (-3814318684016904396,-3810689996127667017] failed with error [….] Validation failed in /
org.apache.cassandra.exceptions.RepairException: [….] Validation failed in 
	at org.apache.cassandra.repair.ValidationTask.treeReceived(ValidationTask.java:64) ~[apache-cassandra-2.2.4.jar:2.2.4]
	at org.apache.cassandra.repair.RepairSession.validationComplete(RepairSession.java:183) ~[apache-cassandra-2.2.4.jar:2.2.4]
	at org.apache.cassandra.service.ActiveRepairService.handleMessage(ActiveRepairService.java:410) ~[apache-cassandra-2.2.4.jar:2.2.4]
	at org.apache.cassandra.repair.RepairMessageVerbHandler.doVerb(RepairMessageVerbHandler.java:163) ~[apache-cassandra-2.2.4.jar:2.2.4]
	at org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:67) ~[apache-cassandra-2.2.4.jar:2.2.4]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_40]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_40]
	at java.lang.Thread.run(Thread.java:745) [na:1.8.0_40]

We restarted all nodes in the cluster and ran a full repair, which 
completed successfully without any validation errors; however, we still 
see the Load spike on the same nodes after a while. Please advise.


Thanks!





Re: Most stable version?

2016-04-15 Thread Jean Tremblay
Thank you Jack.
Jean
On 14 Apr 2016, at 22:00, Jack Krupansky <jack.krupan...@gmail.com> wrote:

Normally, since 3.5 just came out, it would be wise to see if people report any 
problems over the next few weeks.

But... the new tick-tock release process is designed to ensure that these 
odd-numbered releases are only incremental bug fixes from the last 
even-numbered feature release, which was 3.4. So, 3.5 should be reasonably 
stable.

That said, a bug-fix release of 3.0 is probably going to be more stable than a 
bug-fix release of a more recent feature release (3.4).

Usually it comes down to whether you need any of the new features or 
improvements in 3.x, or whether you might want to keep your chosen release in 
production for longer than the older 3.0 releases will be in production.

Ultimately, this is a personality test: Are you adventuresome or conservative?

To be clear, with the new tick-tock release scheme, 3.5 is designed to be a 
stable release.
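To make the numbering rule concrete, here is a toy classifier that only encodes the rule as stated in this thread: even 3.x minor versions are feature releases, odd ones are bug fixes on top of the preceding even release, and 3.0.x is the separate bug-fix line. The class and method names are invented for illustration; this is not an official Apache Cassandra versioning API.

```java
// Illustrative classifier for Cassandra 3.x version strings under tick-tock,
// following the rule described in this thread (not an official policy API).
class TickTock {
    enum Kind { FEATURE, BUGFIX, BRANCH_3_0 }

    static Kind classify(String version) {
        String[] parts = version.split("\\.");
        if (parts.length < 2 || !parts[0].equals("3")) {
            throw new IllegalArgumentException("only 3.x versions are modeled: " + version);
        }
        int minor = Integer.parseInt(parts[1]);
        if (minor == 0) {
            return Kind.BRANCH_3_0; // the 3.0.x line, e.g. 3.0.5
        }
        // Even minor: feature release ("tick"); odd minor: bug-fix release ("tock").
        return minor % 2 == 0 ? Kind.FEATURE : Kind.BUGFIX;
    }

    // For an odd (bug-fix) release, the even feature release it builds on.
    static String baseFeatureRelease(String version) {
        if (classify(version) != Kind.BUGFIX) {
            throw new IllegalArgumentException(version + " is not a tick-tock bug-fix release");
        }
        int minor = Integer.parseInt(version.split("\\.")[1]);
        return "3." + (minor - 1);
    }

    public static void main(String[] args) {
        for (String v : new String[] {"3.0.5", "3.4", "3.5"}) {
            System.out.println(v + " -> " + classify(v));
        }
        System.out.println("3.5 builds on " + baseFeatureRelease("3.5"));
    }
}
```

For example, classify("3.5") yields BUGFIX and baseFeatureRelease("3.5") yields "3.4", matching the description above.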

-- Jack Krupansky

On Thu, Apr 14, 2016 at 3:23 PM, Jean Tremblay <jean.tremb...@zen-innovations.com> wrote:
Hi,
Could someone give their opinion on this?
What should be considered more stable, Cassandra 3.0.5 or Cassandra 3.5?

Thank you
Jean

> On 12 Apr 2016, at 07:00, Jean Tremblay <jean.tremb...@zen-innovations.com> wrote:
>
> Hi,
> Which version of Cassandra should be considered the most stable in version 3?
> I see two main branches: the 3.0.* branch and the tick-tock 
> 3.*.* one.
> So basically my question is: which one is more stable, version 3.0.5 or 
> version 3.3?
> I know the odd versions in tick-tock are bug-fix releases.
> Thanks
> Jean




Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-04-15 Thread horschi
Hi Jan,

were you able to resolve your problem?

We are trying the same and also see a lot of WriteTimeouts:
WriteTimeoutException: Cassandra timeout during write query at consistency
SERIAL (2 replica were required but only 1 acknowledged the write)
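The timeout message is easier to read with the majority arithmetic in mind: the Paxos round behind a SERIAL lightweight transaction needs acknowledgements from a quorum of replicas, floor(RF/2) + 1. The thread does not state the replication factor, so this assumes a single-DC keyspace; "2 replica were required" is then consistent with RF=2 or RF=3. A small sketch of that arithmetic (QuorumMath and its methods are illustrative names):

```java
// Majority (quorum) arithmetic behind the SERIAL write timeout quoted above.
// Assumption: a single-DC keyspace, where SERIAL needs floor(RF/2) + 1 replicas.
class QuorumMath {
    // Replicas that must answer for a majority-based consistency level.
    static int quorum(int replicationFactor) {
        if (replicationFactor < 1) {
            throw new IllegalArgumentException("replication factor must be >= 1");
        }
        return replicationFactor / 2 + 1;
    }

    // Replicas that can be down or too slow before such requests start failing.
    static int tolerableFailures(int replicationFactor) {
        return replicationFactor - quorum(replicationFactor);
    }

    public static void main(String[] args) {
        for (int rf = 1; rf <= 5; rf++) {
            System.out.printf("RF=%d quorum=%d tolerable failures=%d%n",
                    rf, quorum(rf), tolerableFailures(rf));
        }
    }
}
```

Under that assumption, with RF=3 one replica can be unavailable and SERIAL requests can still succeed, while with RF=2 a single slow or down replica can be enough to produce the timeout above.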

How many clients were competing for a lock in your case? In our case it's
only two :-(

cheers,
Christian


On Tue, Sep 24, 2013 at 12:18 AM, Robert Coli  wrote:

> On Mon, Sep 16, 2013 at 9:09 AM, Jan Algermissen <
> jan.algermis...@nordsc.com> wrote:
>
>> I am experimenting with C* 2.0 (and today's java-driver 2.0 snapshot)
>> for implementing distributed locks.
>>
>
> [ and I'm experiencing the problem described in the subject ... ]
>
>
>> Any idea how to approach this problem?
>>
>
> 1) Upgrade to 2.0.1 release.
> 2) Try to reproduce symptoms.
> 3) If able to, file a JIRA at
> https://issues.apache.org/jira/secure/Dashboard.jspa including repro steps
> 4) Reply to this thread with the JIRA ticket URL
>
> =Rob
>
>
>


Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-04-15 Thread Denise Rogers
Also, what type of data were you reading/writing?

Regards,
Denise



Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-04-15 Thread horschi
Hi Denise,

in my case it's a small blob that I am writing (it should be around 100 bytes):

 CREATE TABLE "Lock" (
   lockname varchar,
   id varchar,
   value blob,
   PRIMARY KEY (lockname, id)
 ) WITH COMPACT STORAGE
   AND COMPRESSION = { 'sstable_compression' : 'SnappyCompressor', 'chunk_length_kb' : '8' };

Do you ask because large values are known to cause issues? Do you have
anything specific in mind?

kind regards,
Christian






Re: Traffic inconsistent across nodes

2016-04-15 Thread Eric Stevens
Thanks for that; it helps a lot. The next thing to check might be
whether or not your application actually has access to the other nodes.
With that topology, and assuming all the nodes you included in your
original graph are in the 'WDC' data center, I'd be inclined to look for a
network issue of some kind.

Also, it probably doesn't matter, but what OS/distribution are you
running the servers and clients on?

Check with netcat or a similar tool that you can reach all the configured ports
from your application server. The driver itself also offers some
introspection into its view of individual connection health. This is a
little bit ugly, but this is how we include information about connection
status in an API for health monitoring from a Scala application using the
Java driver; hopefully you can use it to see how to access the driver's
view of host health from the application's perspective.
Most importantly, I'd suggest checking host.isUp and
LoadBalancingPolicy.distance(host) to confirm that the driver considers all the hosts
in your target datacenter to be LOCAL.

// Fragment of a Play JSON health-check object; `connection` is the
// application's driver Cluster. Imports assumed by this fragment:
//   scala.collection.JavaConverters._, scala.collection.mutable,
//   com.datastax.driver.core.Host,
//   com.datastax.driver.core.policies.LoadBalancingPolicy,
//   play.api.libs.json.{JsArray, JsObject, Json}
"hosts" -> {
  // Every host the driver knows about, grouped by data center, then by rack.
  val hosts: Map[String, Map[String, mutable.Set[Host]]] =
    connection.getMetadata
      .getAllHosts.asScala
      .groupBy(_.getDatacenter)
      .mapValues(_.groupBy(_.getRack))
  val lbp: LoadBalancingPolicy =
    connection.getConfiguration.getPolicies.getLoadBalancingPolicy
  JsObject(hosts.map { case (dc: String, rackAndHosts) =>
    dc -> JsObject(rackAndHosts.map { case (rack: String, rackHosts: mutable.Set[Host]) =>
      rack -> JsArray(rackHosts.toSeq.map { host =>
        Json.obj(
          "address"          -> host.getAddress.toString,
          "socketAddress"    -> host.getSocketAddress.toString,
          // getCassandraVersion can return null when the version is not known yet.
          "cassandraVersion" -> Option(host.getCassandraVersion).fold("unknown")(_.toString),
          "isUp"             -> host.isUp,
          // LOCAL, REMOTE or IGNORED, as decided by the configured load balancing policy.
          "hostDistance"     -> lbp.distance(host).toString
        )
      })
    }.toSeq)
  }.toSeq)
},
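For the netcat-style part of that check, a plain-JDK sketch (no driver involved) is to attempt one TCP connection per node from the application server. The host list comes from the 'WDC' section of the nodetool status output quoted below; the port 9042 (the default native-protocol port) and the 2-second timeout are assumptions, and PortCheck is a hypothetical name.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-JDK reachability check: one TCP connect attempt per host,
// roughly what a netcat port probe tells you.
class PortCheck {
    static boolean isReachable(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) { // refused, timed out, unknown host, ...
            return false;
        }
    }

    static Map<String, Boolean> checkAll(List<String> hosts, int port, int timeoutMillis) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String host : hosts) {
            result.put(host + ":" + port, isReachable(host, port, timeoutMillis));
        }
        return result;
    }

    public static void main(String[] args) {
        // 'WDC' addresses from the nodetool status output in this thread;
        // 9042 is assumed to be the native-protocol port (the default).
        List<String> wdc = Arrays.asList(
                "10.124.114.105", "10.124.114.110", "10.124.114.108", "10.124.114.98",
                "10.124.114.113", "10.125.138.59", "10.124.114.97", "10.124.114.118");
        checkAll(wdc, 9042, 2000).forEach((addr, ok) -> System.out.println(addr + " reachable=" + ok));
    }
}
```

A host that is reachable here but still never receives traffic points back at the driver's view (isUp, distance) rather than the network.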


On Wed, Apr 13, 2016 at 10:50 PM Anishek Agarwal  wrote:

> Here is the output; every node in a given DC is in the same rack.
>
> Datacenter: WDC5
> ================
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
> UN  10.125.138.33   299.22 GB  256     64.2%             8aaa6015-d444-4551-a3c5-3257536df476  RAC1
> UN  10.125.138.125  329.38 GB  256     70.3%             70be44a2-de17-41f1-9d3a-6a0be600eedf  RAC1
> UN  10.125.138.129  305.11 GB  256     65.5%             0fbc7f44-7062-4996-9eba-2a05ae1a7032  RAC1
>
> Datacenter: WDC
> ===============
> Status=Up/Down
> |/ State=Normal/Leaving/Joining/Moving
> --  Address         Load       Tokens  Owns (effective)  Host ID                               Rack
> UN  10.124.114.105  151.09 GB  256     38.0%             c432357d-bf81-4eef-98e1-664c178a3c23  RAC1
> UN  10.124.114.110  150.15 GB  256     36.9%             6f92d32e-1c64-4145-83d7-265c331ea408  RAC1
> UN  10.124.114.108  170.1 GB   256     41.3%             040ae7e5-3f1e-4874-8738-45edbf576e12  RAC1
> UN  10.124.114.98   165.34 GB  256     37.6%             cdc69c7d-b9d6-4abd-9388-1cdcd35d946c  RAC1
> UN  10.124.114.113  145.22 GB  256     35.7%             1557af04-e658-4751-b984-8e0cdc41376e  RAC1
> UN  10.125.138.59   162.65 GB  256     38.6%             9ba1b7b6-5655-456e-b1a1-6f429750fc96  RAC1
> UN  10.124.114.97   164.03 GB  256     36.9%             c918e497-498e-44c3-ab01-ab5cb4d48b09  RAC1
> UN  10.124.114.118  139.62 GB  256     35.1%             2bb0c265-a5d4-4cd4-8f50-13b5a9a891c9  RAC1
>
> On Thu, Apr 14, 2016 at 4:48 AM, Eric Stevens  wrote:
>
>> The output of nodetool status would really help answer some questions.  I
>> take it the 8 hosts in your graph are in the same DC.  Are the four serving
>> writes in the same logical or physical rack (as Cassandra sees it), while
>> the others are not?
>>
>> On Tue, Apr 12, 2016 at 10:48 PM Anishek Agarwal 
>> wrote:
>>
>>> We have two DCs: one with the above 8 nodes and the other with 3 nodes.
>>>
>>>
>>>
>>> On Tue, Apr 12, 2016 at 8:06 PM, Eric Stevens  wrote:
>>>
 Maybe include nodetool status here?  Are the four nodes serving reads
 in one DC (local to your driver's config) while the others are in another?

 On Tue, Apr 12, 2016, 1:01 AM Anishek Agarwal 
 wrote:

> Hello,
>
> we have 8 nodes in one cluster, and attached are the traffic patterns
> across the nodes.
>
> It's very surprising that only 4 nodes show transmitting (purple)
> packets.
>
> Our driver configuration on the clients has the following load balancing
> configuration:
>
> new TokenAwarePolicy(
>     new DCAwareRoundRobinPolicy(
>         configuration.get(Constants.LOCAL_DATA_CENTRE_NAME, "WDC")),
>     true)
>
>
> Any idea what we are missing that leads to these skewed
> data read patterns?
>
> Cassandra driver dependency, as below:
>
> <dependency>
>   <groupId>com.datastax.cassandra</groupId>
>   <artifactId>cassandra-driver-core</artifactId>
>   <version>2.1.6</version>
> </dependency>
>>>
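To reason about what distance(host) should report under the load-balancing configuration quoted above, here is a simplified model, not the driver's code: hosts in the local DC are LOCAL, and hosts in other DCs are IGNORED unless usedHostsPerRemoteDc is greater than zero, in which case the first N hosts of each remote DC are REMOTE. That rule, and the assumption that the single-argument DCAwareRoundRobinPolicy constructor uses 0 remote hosts, are assumptions about driver 2.1's behavior; check them against its Javadoc. The host-to-DC mapping is copied from the nodetool status output in this thread, and DcAwareDistanceModel is a hypothetical name.

```java
import java.util.EnumMap;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified model (not the driver's code) of how a DC-aware round-robin
// policy classifies hosts relative to its configured local DC.
class DcAwareDistanceModel {
    enum Distance { LOCAL, REMOTE, IGNORED }

    static Distance distance(String hostDc, String localDc, int usedHostsPerRemoteDc, int indexInRemoteDc) {
        if (hostDc.equals(localDc)) {
            return Distance.LOCAL;
        }
        // With usedHostsPerRemoteDc == 0 no remote host is ever used.
        return indexInRemoteDc < usedHostsPerRemoteDc ? Distance.REMOTE : Distance.IGNORED;
    }

    static Map<Distance, Integer> countByDistance(Map<String, String> hostToDc, String localDc,
                                                  int usedHostsPerRemoteDc) {
        Map<Distance, Integer> counts = new EnumMap<>(Distance.class);
        Map<String, Integer> seenPerDc = new HashMap<>();
        for (String dc : hostToDc.values()) {
            int index = seenPerDc.merge(dc, 1, Integer::sum) - 1; // this host's position within its DC
            counts.merge(distance(dc, localDc, usedHostsPerRemoteDc, index), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Host -> data center, as listed in the nodetool status output in this thread.
        Map<String, String> hostToDc = new LinkedHashMap<>();
        for (String h : new String[] {"10.125.138.33", "10.125.138.125", "10.125.138.129"}) {
            hostToDc.put(h, "WDC5");
        }
        for (String h : new String[] {"10.124.114.105", "10.124.114.110", "10.124.114.108",
                "10.124.114.98", "10.124.114.113", "10.125.138.59", "10.124.114.97", "10.124.114.118"}) {
            hostToDc.put(h, "WDC");
        }
        // Local DC "WDC" as in the quoted configuration; 0 remote hosts assumed.
        System.out.println(countByDistance(hostToDc, "WDC", 0));
    }
}
```

With localDc "WDC", main prints {LOCAL=8, IGNORED=3}: all eight 'WDC' nodes are expected to be LOCAL, so the runtime check of distance(host) is about confirming the driver really sees them that way.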

Re: All subsequent CAS requests time out after heavy use of new CAS feature

2016-04-15 Thread Denise Rogers
My thinking was that, due to the size of the data, there might be I/O issues. 
But it sounds more like you're competing for locks and hitting a deadlock. 

Regards,
Denise


Re: Experience with Kubernetes

2016-04-15 Thread David Aronchick
Hi--

It's trivial to do this in Kubernetes, even without Ubernetes. Please feel
free to send me a note and I'll walk you through it.

Disclosure: I work at Google on Kubernetes.

On Thu, Apr 14, 2016 at 9:10 AM Joe Stein  wrote:

> You can do that with the Mesos scheduler
> https://github.com/elodina/datastax-enterprise-mesos and lay out clusters
> and racks for datacenters based on attributes
> http://mesos.apache.org/documentation/latest/attributes-resources/
>
> ~ Joestein
> On Apr 14, 2016 12:05 PM, "Nate McCall"  wrote:
>
>>
>> Does anybody here have any experience, positive or negative, with
>>> deploying Cassandra (or DSE) clusters using Kubernetes? I don't have any
>>> immediate need (or experience), but I am curious about the pros and cons.
>>>
>>>
>>
>> The last time I played around with kubernetes+cassandra, you could not
>> specify node allocations across failure boundaries (AZs, Regions, etc).
>>
>> To me, that makes it not interesting outside of development or trivial
>> setups.
>>
>> It does look like they are getting farther along on "ubernetes" which
>> should fix this:
>>
>> https://github.com/kubernetes/kubernetes/blob/master/docs/proposals/federation.md
>>
>>
>>
>> --
>> -
>> Nate McCall
>> Austin, TX
>> @zznate
>>
>> Co-Founder & Sr. Technical Consultant
>> Apache Cassandra Consulting
>> http://www.thelastpickle.com
>>
>


Re: Experience with Kubernetes

2016-04-15 Thread Jack Krupansky
It sounds as if Kubernetes is oriented towards a single data center (DC in
Cassandra parlance), and maybe Ubernetes attempts to address that. If there
is indeed some simple, obvious way to trick Kubernetes into
Cassandra-style multi-DC operation, it sure would be helpful if it were
documented more prominently.

Also, it seems like a lot of these cluster tools are oriented towards
applications that in turn use networked services such as Cassandra for
storage of the vast bulk of their data. In other words, they persist their real data
in an external database such as Cassandra, and any other persisted
data or state is relatively inconsequential. How that model applies to a
database server that works best with fairly large amounts of ultra-fast
local data storage is not so obvious. Maybe that simply wasn't a design
goal?

-- Jack Krupansky
