Re: Understanding "nodetool netstats" on a multi region cluster

2020-04-17 Thread Erick Ramirez
That bootstrap session from January 31 won't go away until you restart the
node. But you don't need to restart unless it's affecting the node's
operation. Cheers!



Re: Disabling Swap for Cassandra

2020-04-17 Thread Alex Ott
I usually recommend the following document:
https://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/config/configRecommendedSettings.html
- it's about DSE, but applicable to OSS Cassandra as well...
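
For reference, the OS-level pieces of that advice boil down to something
like the sketch below (common defaults assumed; the limits.d path and the
cassandra user name may differ in your environment):

    # Disable swap entirely (the usual recommendation for C* nodes):
    sudo swapoff -a          # and remove/comment swap entries in /etc/fstab

    # If swap must stay, at least tell the kernel to avoid it:
    echo 'vm.swappiness = 1' | sudo tee /etc/sysctl.d/99-cassandra.conf
    sudo sysctl --system

    # Let the cassandra user lock the heap in RAM (memlock defaults to 64 kB).
    # In /etc/security/limits.d/cassandra.conf:
    #   cassandra - memlock unlimited

    # Verify the running node isn't actually swapping (assumes one C* process):
    grep VmSwap /proc/$(pgrep -f CassandraDaemon)/status

With memlock raised and JNA available, Cassandra locks the heap at startup
and logs a warning ("Unable to lock JVM memory") when it can't.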

Kunal  at "Thu, 16 Apr 2020 15:49:35 -0700" wrote:
 K> Hello,
 K>
 K> I need some suggestions from you all. I am new to Cassandra and was
 K> reading Cassandra best practices. One document mentioned that Cassandra
 K> should not be using swap, as it degrades performance.
 K>
 K> My question is: instead of disabling swap system-wide, can we force
 K> Cassandra not to use swap? Some documentation suggests using
 K> memory_locking_policy in cassandra.yaml.
 K>
 K> How do I check if our Cassandra already has this parameter and still
 K> uses swap? Is there any way I can check this? I already checked
 K> cassandra.yaml and don't see this parameter. Is there any other place I
 K> can check and confirm?
 K>
 K> Also, can I set the memlock parameter to unlimited (64 kB default), so
 K> the entire heap (Xms = Xmx) can be locked at node startup? Will that
 K> help?
 K>
 K> Or if you have any other suggestions, please let me know.
 K>
 K> Regards,
 K> Kunal



-- 
With best wishes,
Alex Ott
Principal Architect, DataStax
http://datastax.com/

-
To unsubscribe, e-mail: user-unsubscr...@cassandra.apache.org
For additional commands, e-mail: user-h...@cassandra.apache.org



Re: Disabling Swap for Cassandra

2020-04-17 Thread Reid Pinchback
I think there is some potential yak shaving in worrying excessively about swap. 
The reality is that you should know the memory demands of what you are running 
on your C* nodes and have things configured so that significant swap would be a 
highly abnormal situation.

I'd expect to see excessive churn on buffer cache long before I'd see excessive 
swap kicking in, but sometimes a little swap usage doesn't mean much beyond the 
O/S detecting that some memory allocation is so stale that it may as well push 
it out of the way.  This can happen for perfectly reasonable situations if, for 
example, you make heavy use of crond for automating system maintenance.  Also, 
if you are running on Dell boxes, Dell software updates can get a bit cranky 
and you see resource locking that has zilch to do with your application stack.

I'd worry less about how to crank down swap beyond the advice to make it a last 
resort, and more about how to monitor and alert on abnormal system behavior.  
When it's abnormal, you want a chance to see what is going on so you can fix 
it.  OOM'ing problems out of visibility makes it hard to investigate root 
causes.  I'd rather be paged while the cause is visible than be paged anyway 
for the down node and have nothing to inspect.
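
For example, a minimal sketch of the kind of check I mean (what counts as
abnormal depends on your own baseline):

    # si/so columns are pages swapped in/out per second; sustained non-zero
    # values during normal operation are the abnormality worth alerting on.
    vmstat 5

    # Per-process view of how much of C* has actually been pushed out:
    grep VmSwap /proc/$(pgrep -f CassandraDaemon)/status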

R


On 4/17/20, 6:12 AM, "Alex Ott"  wrote:

I usually recommend the following document:
https://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/config/configRecommendedSettings.html
- it's about DSE, but applicable to OSS Cassandra as well...

[remainder of quoted message snipped; see Alex's post above]





Re: Cassandra node JVM hang during node repair a table with materialized view

2020-04-17 Thread Reid Pinchback
I would pay attention to the dirty background writer activity at the O/S level. 
 If you see that it isn’t keeping up with flushing changes to disk, then you’ll 
be in an even worse situation as you increase the JVM heap size, because that 
will be done at the cost of the size of available buffer cache.  When Linux 
can’t flush to disk, it can manifest as malloc failures (although if your C* is 
configured to have the JVM pre-touch all memory allocations, that shouldn’t 
happen… I don’t know if C* versions as old as yours do that; current ones 
definitely are configured that way).

If you get stuck, you may want to consider upgrading to something recent in the 
3.11 versions, 3.11.5 or newer.  A setting for controlling merkle-tree height 
was back-ported from the work on C* version 4, and that lets you tune some of 
the memory pressure on repairs, trading memory-related performance for 
network-related performance.  Networks are faster these days, so it can be a 
reasonable tradeoff to consider. We used to periodically knock over C* nodes 
during repairs until we incorporated a patch for that issue.
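
A sketch of the knobs in play (the yaml setting name is from memory, so
treat it as an assumption and verify against NEWS.txt for your exact 3.11.x
release):

    # cassandra.yaml (3.11.5+): cap on the memory used per repair session
    # for merkle trees. Smaller = less heap pressure, more overstreaming.
    # repair_session_space_in_mb: 128

    # O/S side: is the dirty background writer keeping up?
    grep -E '^(Dirty|Writeback):' /proc/meminfo
    sysctl vm.dirty_background_ratio vm.dirty_ratio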

From: Ben G 
Reply-To: "user@cassandra.apache.org" 
Date: Thursday, April 16, 2020 at 3:32 AM
To: "user@cassandra.apache.org" 
Subject: Re: Cassandra node JVM hang during node repair a table with 
materialized view

Thanks a lot. We are working on removing views and controlling the partition 
size.  I hope the improvements help us

Best regards

Gb

Erick Ramirez <erick.rami...@datastax.com> wrote on Thursday, April 16, 2020 
at 2:08 PM:
The GC collector is G1.  I repaired the node again after scaling up, and the 
JVM issue reproduced.  Can I increase the heap to 40 GB on a 64 GB VM?

I wouldn't recommend going beyond 31GB on G1. It will be diminishing returns as 
I mentioned before.

Do you think the issue is related to materialized view or big partition?

Yes, materialised views are problematic and I don't recommend them for 
production since they're still experimental. But if I were to guess, I'd say 
your problem is more an issue with large partitions and too many tombstones 
both putting pressure on the heap.

The thing is, if you can't bootstrap because you're running into the 
TombstoneOverwhelmingException (I'm guessing), I can't see how you wouldn't run 
into it with repairs. In any case, try running repairs on the smaller tables 
first and work through the remaining tables one by one. But bootstrapping a 
node with repairs is a much more expensive exercise than just a plain old 
bootstrap. I get that you're in a tough spot right now, so good luck!
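
For what it's worth, two quick ways to check whether large partitions and
tombstones really are the culprits (a sketch: <keyspace> and <table> are
placeholders, and the log path assumes a package install):

    # Partition-size percentiles for a table; the Max row shows the monsters.
    nodetool tablehistograms <keyspace> <table>

    # Tombstone-heavy reads warn in the log before the exception fires:
    grep -i tombstone /var/log/cassandra/system.log | tail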


--

Thanks
Guo Bin


Re: Understanding "nodetool netstats" on a multi region cluster

2020-04-17 Thread Jai Bheemsen Rao Dhanwada
This is not causing any issues, and with a restart it goes away. But I
would like to understand why this is happening only in certain DCs and not
all of them.
I depend on this output to find out whether a node is doing any streams
or not.
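
For example, a rough check along these lines (a sketch: the idle-node text
matches 3.x netstats output as far as I know, and a stale session like the
one in this thread will fool it):

    # An idle node prints "Not sending any streams." under its Mode line.
    if nodetool netstats | grep -q 'Not sending any streams'; then
        echo 'no active streams'
    else
        echo 'streams in progress (or a stale session)'
    fi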

On Fri, Apr 17, 2020 at 2:37 AM Erick Ramirez 
wrote:

> That bootstrap session from January 31 won't go away until you restart the
> node. But you don't need to restart unless it's affecting the node's
> operation. Cheers!


Re: Multi DC replication between different Cassandra versions

2020-04-17 Thread Elliott Sims
If you're upgrading the whole cluster, I'd recommend going ahead and
upgrading all the way to 3.11.6 if possible.  In my experience it's been
noticeably faster, more reliable, and easier to manage compared to 3.0.x.

On Thu, Apr 16, 2020 at 6:37 PM Ashika Umagiliya 
wrote:

> Thank you for the clarifications,
>
> If this is not recommended, our last resort is to upgrade the entire
> cluster.
>
> About Kafka Connect, we found the following Source Connectors, which can be
> used to ingest data from C* to Kafka:
>
> https://debezium.io/documentation/reference/connectors/cassandra.html
> https://docs.lenses.io/2.0/connectors/source/cassandra-cdc.html
> https://docs.lenses.io/2.0/connectors/source/cassandra.html
>
> https://www.datastax.com/press-release/datastax-announces-change-data-capture-cdc-connector-apache-kafka
>
>
>
>
> On Thu, Apr 16, 2020 at 9:42 PM Durity, Sean R <
> sean_r_dur...@homedepot.com> wrote:
>
>> I agree – do not aim for a mixed version as normal. Mixed versions are
>> fine during an upgrade process, but the goal is to complete the upgrade as
>> soon as possible.
>>
>>
>>
>> As for other parts of your plan, the Kafka Connector is a “sink-only,”
>> which means that it can only insert into Cassandra. It doesn’t go the other
>> way.
>>
>>
>>
>> I usually suggest that if the data is needed in two (or more) places,
>> the application write to a queue. Then let the queue feed all the
>> downstream destinations.
>>
>>
>>
>>
>>
>> Sean Durity – Staff Systems Engineer, Cassandra
>>
>>
>>
>> *From:* Christopher Bradford 
>> *Sent:* Thursday, April 16, 2020 1:13 AM
>> *To:* user@cassandra.apache.org
>> *Subject:* [EXTERNAL] Re: Multi DC replication between different
>> Cassandra versions
>>
>>
>>
>> It’s worth noting there can be issues with streaming between different
>> versions of C*. Note this excerpt from
>>
>> https://thelastpickle.com/blog/2019/02/26/data-center-switch.html
>>
>>
>>
>>
>> Note that with an upgrade it’s important to keep in mind that *streaming
>> in a cluster running mixed versions of Cassandra is not recommended*
>>
>>
>>
>> Emphasis mine. With the approach you’re suggesting streaming would be
>> involved both during bootstrap and repair. Would it be possible to upgrade
>> to a more recent release prior to pursuing this course of action?
>>
>>
>>
>> On Thu, Apr 16, 2020 at 1:02 AM Erick Ramirez 
>> wrote:
>>
>> I don't mean any disrespect but let me offer you some friendly advice --
>> don't do it to yourself. I think you would have a very hard time finding
>> someone who would recommend implementing a solution that involves mixed
>> versions. If you run into issues, it would be hell trying to unscramble
>> that egg.
>>
>>
>>
>> On top of that, Cassandra 3.0.9 is an ancient version released 4 years
>> ago (September 2016). There are several pages of fixes deployed since then.
>> So in the nicest possible way, what you're planning to do is not a good
>> idea. I personally wouldn't do it. Cheers!
>>
>> --
>>
>>
>> Christopher Bradford
>>
>>
>>


Re: Multi DC replication between different Cassandra versions

2020-04-17 Thread Erick Ramirez
>
> If you're upgrading the whole cluster, I'd recommend going ahead and
> upgrading all the way to 3.11.6 if possible.  In my experience it's been
> noticeably faster, more reliable, and easier to manage compared to 3.0.x.
>

Thanks, Elliott. That's really good to know. 👍


Impact of setting low value for flag -XX:MaxDirectMemorySize

2020-04-17 Thread manish khandelwal
What will be the impact of setting -XX:MaxDirectMemorySize to some low
value? Currently, the default value for off-heap memory is equal to the
heap size.

I saw this open ticket discussing this but could not infer much from it.
https://issues.apache.org/jira/browse/CASSANDRA-10930

Regards
Manish


Re: Impact of setting low value for flag -XX:MaxDirectMemorySize

2020-04-17 Thread Erick Ramirez
Like most things, it depends on (a) what you're allowing and (b) how much
your nodes require. MaxDirectMemorySize is the upper-bound for off-heap
memory used for the direct byte buffer. C* uses it for Netty so if your
nodes are busy servicing requests, they'd have more IO threads consuming
memory.

During low-traffic periods, there's less memory allocated to service
requests, and it eventually gets freed up by GC tasks. But if traffic
volumes are high, memory doesn't get freed up quickly enough, so the max is
reached. When this happens, you'll see OOMs like "OutOfMemoryError: Direct
buffer memory" show up in the logs.

You can play around with different values but make sure you test it
exhaustively before trying it out in production. Cheers!
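
For OSS Cassandra 3.x the flag would typically go in conf/jvm.options; a
sketch (the 2G figure is purely illustrative and the log path assumes a
package install):

    # conf/jvm.options -- cap direct (off-heap) byte buffer allocations:
    -XX:MaxDirectMemorySize=2G

    # Watch for the failure mode described above:
    grep 'OutOfMemoryError: Direct buffer memory' /var/log/cassandra/system.log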

GOT QUESTIONS? Apache Cassandra experts from the community and DataStax
have answers! Share your expertise on https://community.datastax.com/.



Re: Impact of setting low value for flag -XX:MaxDirectMemorySize

2020-04-17 Thread HImanshu Sharma
From the codebase, as much as I understood: once a buffer is allocated, it
is not freed but added to a recyclable pool. When a new request comes in, an
effort is made to fetch memory from the recyclable pool, and if none is
available, a new allocation is made. If the memory limit is breached while
allocating a new buffer, we get this OOM error.

I would like to know if my understanding is correct.
If it is, is there a way to get this buffer pool reduced when there is low
traffic? What I have observed on my system is that this memory remains
static even when there is no traffic.
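
One way to see what those pooled buffers actually hold is native memory
tracking -- a sketch, assuming a HotSpot JVM (NMT itself adds a few percent
of overhead, so test it first):

    # 1. Enable NMT in conf/jvm.options, then restart the node:
    #    -XX:NativeMemoryTracking=summary

    # 2. Direct buffers show up under "Internal" (JDK 8) or "Other" (JDK 11):
    jcmd $(pgrep -f CassandraDaemon) VM.native_memory summary

    # Without a restart, the standard java.nio:type=BufferPool,name=direct
    # MBean exposes Count/MemoryUsed/TotalCapacity over JMX (e.g., JConsole).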

Regards
Manish

On Sat, Apr 18, 2020 at 11:13 AM Erick Ramirez 
wrote:

> Like most things, it depends on (a) what you're allowing and (b) how much
> your nodes require. MaxDirectMemorySize is the upper-bound for off-heap
> memory used for the direct byte buffer.
>
> [remainder of quoted message snipped; see Erick's post above]