Re: Issue in internode encryption in cassandra

2016-08-03 Thread Ashwini Mhatre (asmhatre)
Hi, does anyone have any hints regarding node-to-node encryption? Regards, Ashwini Mhatre

Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Ben Slater
Yes, it looks like you have (at least one) 100 MB partition, which is big enough to cause issues. When you do lots of writes to a large partition it is likely to end up getting compacted (as per the log), and compactions often use a lot of memory and cause a lot of GC when they hit large partitions.

Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread DuyHai Doan
On a side note, do you monitor your disk I/O to see whether the disk bandwidth can keep up with the huge spikes in writes? Use dstat during the insert storm to see if you have big values for CPU wait.

unreachable nodes mystery in describecluster output

2016-08-03 Thread Aleksandr Ivanov
Hello, I'm running v3.0.8 in a multi-data-center deployment (6 DCs, 6 nodes per DC, maximum latency between some nodes ~200ms). After a clean cluster start I ran into an issue where "nodetool describecluster" shows that some random nodes from the deployment are UNREACHABLE, even though "nodetool status" shows that all is OK.

RE: Issue in internode encryption in cassandra

2016-08-03 Thread Bastien DINE
Hi Ashwini, on all my nodes I install the additional JCE policy files: https://support.datastax.com/hc/en-us/articles/204226129-Receiving-error-Caused-by-java-lang-IllegalArgumentException-Cannot-support-TLS-RSA-WITH-AES-256-CBC-SHA-with-currently-installed-providers-on-DSE-startup-after-setting-up

Re: unreachable nodes mystery in describecluster output

2016-08-03 Thread Romain Hardouin
Hi, the latency is high... Regarding the ALTER, did you try to increase the timeout with "cqlsh --request-timeout=REQUEST_TIMEOUT"? The default is 10 seconds. Apart from the unreachable nodes, do you know if all nodes have the same schema version? Best, Romain

Re: unreachable nodes mystery in describecluster output

2016-08-03 Thread Aleksandr Ivanov
> The latency is high...
It is, but is it really causing the problem? Latency is high but constant, and not higher than ~200ms.
> Regarding the ALTER, did you try to increase the timeout with "cqlsh --request-timeout=REQUEST_TIMEOUT"? Because the default is 10 seconds.
I use a 25-second timeout (--request-timeout=25).

Re: unreachable nodes mystery in describecluster output

2016-08-03 Thread Romain Hardouin
That's good news if describecluster shows the same version on each node. Try with a high timeout like 120 seconds to see if it works. Is there a VPN between DCs? Is there room for improvement at the network level? TCP tuning, etc. I'm not saying you won't have unreachable nodes, but it's worth trying.
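Romain's suggestion translates to a one-liner; a hedged example (the host, keyspace, and table names are placeholders, not from this thread):

```
# Raise cqlsh's client-side timeout (default 10 s) to 120 s before
# re-running the slow ALTER; <host>, my_ks and my_table are placeholders.
cqlsh --request-timeout=120 <host> -e "ALTER TABLE my_ks.my_table WITH comment = 'timeout test';"
```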

Re: unreachable nodes mystery in describecluster output

2016-08-03 Thread Aleksandr Ivanov
No VPN involved, and no limitations at the network level that could affect internode communication. I'm curious why "nodetool status" shows that all is OK and there are no suspicious messages in the log files if such a problem exists. I'm looking for hints on how to troubleshoot such a problem, or whether anyone has seen such behavior before.

Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Kevin Burton
Curious why the 2.2 to 3.x upgrade path is risky at best. Do you mean that this is just for OUR use case, since we're having some issues, or that the upgrade path is risky in general?

Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Kevin Burton
DuyHai, yes. We're generally happy with our disk throughput. We're on all SSDs and have about 60 boxes. The amount of data written isn't THAT much, maybe 5 GB max... but it's spread over 60 boxes.

Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Romain Hardouin
> Curious why the 2.2 to 3.x upgrade path is risky at best.
I guess that an upgrade from 2.2 is less tested by DataStax QA because DSE 4 used C* 2.1, not 2.2. I would say the safest upgrade is 2.1 to 3.0.x. Best, Romain

Re: [Marketing Mail] Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Reynald Bourtembourg
Hi, Maybe Ben was referring to this issue which has been mentioned recently on this mailing list: https://issues.apache.org/jira/browse/CASSANDRA-11887 Cheers, Reynald

Re: [Marketing Mail] Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Jonathan Haddad
Kevin, "Our scheme uses large buckets of content where we write to a bucket/partition for 5 minutes, then move to a new one." Are you writing to a single partition and only that partition for 5 minutes? If so, you should really rethink your data model. This method does not scale as you add node

Re: [Marketing Mail] Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Kevin Burton
We usually use 100 per every 5 minutes... but you're right. We might actually move this use case over to using Elasticsearch in the next couple of weeks.

Re: [Marketing Mail] Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Henrik Schröder
Have you tried using the G1 garbage collector instead of CMS? We had the same issue: things were normally fine, but as soon as something extraordinary happened, a node could go into GC hell and never recover, and that could then spread to other nodes as they took up the slack, trapping them in turn.
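For reference, moving from CMS to G1 on 2.2 typically means removing the CMS flags in cassandra-env.sh and adding G1 options; the values below are common starting points, not tuned recommendations from this thread:

```
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
# Leave the young generation unset (-Xmn); fixed sizes fight G1's heuristics.
```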

Mutation of X bytes is too large for the maximum size of Y

2016-08-03 Thread Kevin Burton
It seems these are basically impossible to track down. https://support.datastax.com/hc/en-us/articles/207267063-Mutation-of-x-bytes-is-too-large-for-the-maxiumum-size-of-y- has some information, but their workaround is to increase the transaction log size. There's no way to find out WHAT client or what statement triggered it.
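One client-side mitigation is to estimate the serialized size of a write before sending it, since Cassandra rejects any single mutation larger than half the commitlog segment size. A rough sketch (the helper name is mine, not from the thread, and clients can only approximate the serialized size):

```python
def exceeds_mutation_limit(mutation_bytes, commitlog_segment_size_mb=32):
    """Client-side sanity check against Cassandra's mutation size cap.

    Cassandra rejects any single mutation larger than half of
    commitlog_segment_size_in_mb (default 32 MB, giving a 16 MB cap).
    `mutation_bytes` is an estimate of the serialized size, so treat
    this as an early warning, not an exact gate.
    """
    max_mutation_bytes = commitlog_segment_size_mb * 1024 * 1024 // 2
    return mutation_bytes > max_mutation_bytes

# Example: a 20 MB write against the default 16 MB cap
print(exceeds_mutation_limit(20 * 1024 * 1024))  # → True
```

Logging the statement alongside this check, before the write is ever issued, is what makes the offending client findable.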

Re: [Marketing Mail] Re: Memory leak and lockup on our 2.2.7 Cassandra cluster.

2016-08-03 Thread Ben Slater
Yep, that was what I was referring to.

Re: Mutation of X bytes is too large for the maximum size of Y

2016-08-03 Thread Ryan Svihla
I made a Jira about it already: https://issues.apache.org/jira/plugins/servlet/mobile#issue/CASSANDRA-12231 Regards, Ryan Svihla

Re: Mutation of X bytes is too large for the maximum size of Y

2016-08-03 Thread Jonathan Haddad
I haven't verified, so I'm not 100% certain, but I believe you'd get back an exception to the client. Yes, this belongs in the DB, but I don't think you're totally blind to what went wrong. My guess is this exception in the Python driver (but other drivers should have a similar exception).

Re: Mutation of X bytes is too large for the maximum size of Y

2016-08-03 Thread Ryan Svihla
Where I see this a lot is:
1. DBA notices it in logs
2. Everyone says the code works fine, no errors
3. Weeks of combing all apps finds out 3 teams are doing fire-and-forget futures...
4. Convince each team they really need to handle futures
5. A couple of months before you figure out who was the culprit
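Ryan's step 3, fire-and-forget futures, is the part that hides the error. A generic sketch of the fix, using the stdlib executor as a stand-in for a driver's async API (the names and the size check are illustrative, not from this thread):

```python
import concurrent.futures

# Collected failures; in real code this would be a logger, not a list.
failures = []

def write(record):
    # Stand-in for an async write (e.g. a driver's execute_async, an
    # assumption rather than code from this thread); raises on an
    # oversized payload.
    if len(record) > 10:
        raise ValueError(f"mutation too large: {record[:100]!r}")
    return "ok"

def on_done(future):
    # Fire-and-forget skips this step. Attaching a completion callback
    # that records the error *and* the offending statement is what
    # makes the culprit findable later.
    exc = future.exception()
    if exc is not None:
        failures.append(str(exc))

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
    for rec in ["small", "way-too-big-record"]:
        pool.submit(write, rec).add_done_callback(on_done)

print(failures)  # the oversized record is captured; the good write is not
```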

Re: Mutation of X bytes is too large for the maximum size of Y

2016-08-03 Thread Ryan Svihla
On a related note, I still need to file a Jira just to make it easier to find large cells in general. I've had 2 customers now with a bunch of 10 MB+ writes (single cell) they weren't expecting, and tracking that down is equally challenging (Spark made it doable in both cases, but it was slow to find).

Re: Mutation of X bytes is too large for the maximum size of Y

2016-08-03 Thread Kevin Burton
Yes... logging it is far, far better. I think a lot of devs don't have experience working in actual production environments. Yes, the client should probably handle it, but WHICH client? This is why you log things. Log the statement that was aborted (at least the first 100 bytes).