Re: upgrading 2.1.x cluster with ec2multiregionsnitch system.peers "corruption"

2019-03-27 Thread Oleksandr Shulgin
On Tue, Mar 26, 2019 at 10:28 PM Carl Mueller wrote:

> - the AWS people say EIPs are a PITA.
>

Why?


> - if we hardcode the global IPs in the yaml, then yaml editing is required
> for the occasional hard instance reboot in AWS and its attendant global IP
> reassignment
> - if we try leaving broadcast_rpc_address blank, null, or commented out
> with rpc_address set to 0.0.0.0, then Cassandra refuses to start
>

Yeah, that's not nice.

> - if we take out rpc_address and broadcast_rpc_address, then cqlsh doesn't
> work with localhost anymore and that fucks up some of our cluster
> management tooling
>
> - we kind of are being lazy and just want what worked in 2.1 to work in 2.2
>

Makes total sense to me.

> I'll try to track down where Cassandra startup is complaining to us about
> rpc_address: 0.0.0.0 and broadcast_rpc_address being blank/null/commented
> out. That section of code may need an exception for EC2MRS.
>

It sounds like this check is done before the snitch is instantiated, and it
should be the other way round, so that the snitch has a chance to adjust
the configuration before it's checked for correctness. Do you have the
exact error message it complains with?

--
Alex
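Alex's point about ordering can be modelled in a few lines. The Python below is a hypothetical sketch, not Cassandra's Java startup code: the config is a plain dict, `validate` encodes the rule from the cassandra.yaml comments (a wildcard rpc_address requires a concrete broadcast_rpc_address), and `ec2_multi_region_snitch` is an assumed hook that fills in the public IP only when broadcast_rpc_address is unset.

```python
from typing import Callable, Optional

# Addresses treated as wildcards in this sketch (assumption: IPv4 and IPv6 "any").
WILDCARDS = {"0.0.0.0", "::"}


class ConfigurationError(Exception):
    """Raised when the simplified config fails validation."""


def validate(conf: dict) -> None:
    """Rule from the cassandra.yaml comments: a wildcard rpc_address needs a concrete broadcast_rpc_address."""
    rpc = conf.get("rpc_address")
    broadcast_rpc = conf.get("broadcast_rpc_address")
    if rpc in WILDCARDS and (broadcast_rpc is None or broadcast_rpc in WILDCARDS):
        raise ConfigurationError(
            "rpc_address is a wildcard, so broadcast_rpc_address must be set to a concrete address"
        )


def ec2_multi_region_snitch(conf: dict, public_ip: str) -> None:
    """Hypothetical snitch hook: broadcast the public IP when broadcast_rpc_address is unset."""
    if conf.get("broadcast_rpc_address") is None:
        conf["broadcast_rpc_address"] = public_ip


def start(conf: dict, snitch: Optional[Callable[[dict], None]], snitch_first: bool) -> dict:
    """Apply the snitch and the validation in the chosen order; returns the adjusted copy."""
    conf = dict(conf)  # never mutate the caller's config
    if snitch_first and snitch is not None:
        snitch(conf)    # the snitch fills in missing addresses first...
        validate(conf)  # ...so the check sees the adjusted config
    else:
        validate(conf)  # fails before the snitch can fill the gap
        if snitch is not None:
            snitch(conf)
    return conf
```

With `{"rpc_address": "0.0.0.0", "broadcast_rpc_address": None}` and a snitch bound to some public IP, `snitch_first=False` raises `ConfigurationError` (the behaviour described in this thread), while `snitch_first=True` returns a config whose broadcast_rpc_address is the public IP.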


RE: TWCS Compactions & Tombstones

2019-03-27 Thread Nick Hatfield
Awesome, thank you Jeff. Sorry, I had not seen this yet. We have this
enabled, so I guess it will just take time to finally chew through it all?

From: Jeff Jirsa [mailto:jji...@gmail.com]
Sent: Tuesday, March 26, 2019 9:41 PM
To: user@cassandra.apache.org
Subject: Re: TWCS Compactions & Tombstones


Or upgrade to a version with
https://issues.apache.org/jira/browse/CASSANDRA-13418 and enable that feature.

--
Jeff Jirsa


On Mar 26, 2019, at 6:23 PM, Rahul Singh <rahul.xavier.si...@gmail.com> wrote:
What's your time window? Roughly how much data is in each window?

If you examine the SSTable data and see that it is truly old data with little
chance that it contains any new data, you can just remove the SSTables. You can
do a rolling restart: take down a node, remove mc-254400-*, and then start it up.
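Rahul's rolling-restart approach boils down to moving every component file of one SSTable generation out of the table directory while the node is stopped. A cautious Python sketch, assuming you pass in the table's data directory yourself (all paths here are hypothetical) and preferring a move to a quarantine directory over outright deletion so the files can be put back:

```python
import shutil
from pathlib import Path
from typing import List


def quarantine_sstable(table_dir: Path, prefix: str, quarantine_dir: Path,
                       dry_run: bool = True) -> List[Path]:
    """Move every component file of one SSTable (e.g. prefix "mc-254400-") out of table_dir.

    Only run this while Cassandra is stopped on the node. With dry_run=True
    (the default) nothing is moved; the matching files are just returned.
    """
    components = sorted(table_dir.glob(prefix + "*"))
    if not dry_run:
        quarantine_dir.mkdir(parents=True, exist_ok=True)
        for path in components:
            # Move rather than delete so the SSTable can be restored if needed.
            shutil.move(str(path), str(quarantine_dir / path.name))
    return components
```

The trailing dash in the prefix (`mc-254400-`) keeps the glob from also matching a different generation such as `mc-2544001`.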


rahul.xavier.si...@gmail.com

http://cassandra.link



On Tue, Mar 26, 2019 at 8:01 AM Nick Hatfield <nick.hatfi...@metricly.com> wrote:
How does one properly get rid of SSTables that have fallen victim to overlapping
timestamps? I realized that we had TWCS set on our CF, which also had
read_repair = 0.1, and after correcting this to 0.0 I can clearly see the
effects over time on the new SSTables. However, I still have old SSTables that
date back to some time last year, and I need to remove them:

Max: 09/05/2018 Min: 09/04/2018 Estimated droppable tombstones: 
0.883205790993204613G Mar 26 11:34 mc-254400-big-Data.db


What is the best way to do this? This is on a production system, so any help
would be greatly appreciated.

Thanks,
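Rahul's question about the time window matters because TWCS groups SSTables by the window that contains each SSTable's maximum timestamp. The simplified Python model below is not the actual TWCS code; it assumes DAYS windows and timestamps in seconds since the epoch (UTC). It shows how an SSTable whose Min and Max dates fall in different windows (as the 09/04/2018 to 09/05/2018 one above would with a one-day window) holds data older than the bucket it is grouped into, which is the kind of overlap read repair can introduce.

```python
from datetime import datetime, timezone

SECONDS_PER_DAY = 86_400


def window_lower_bound(ts_seconds: int, window_days: int) -> int:
    """Start (epoch seconds, UTC) of the DAYS window that contains ts_seconds."""
    size = SECONDS_PER_DAY * window_days
    return ts_seconds - (ts_seconds % size)


def bucket_for(max_ts: int, window_days: int) -> int:
    """In this model an SSTable is grouped by the window of its *maximum* timestamp."""
    return window_lower_bound(max_ts, window_days)


def spans_multiple_windows(min_ts: int, max_ts: int, window_days: int) -> bool:
    """True if the SSTable holds data older than the window it is bucketed into."""
    return window_lower_bound(min_ts, window_days) != bucket_for(max_ts, window_days)


def epoch(year: int, month: int, day: int) -> int:
    """Midnight UTC of the given date, in seconds since the epoch."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())
```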


Re: upgrading 2.1.x cluster with ec2multiregionsnitch system.peers "corruption"

2019-03-27 Thread Carl Mueller
I'll try to reproduce the error and get the exact message, but it was along
the lines of the generic (snitch-agnostic) note in the cassandra.yaml comments
about setting rpc_address to 0.0.0.0: you must then also set
broadcast_rpc_address.



Re: TWCS Compactions & Tombstones

2019-03-27 Thread Jeff Jirsa
You would need to swap your compaction class from the com.jeffjirsa variant
(probably from 2.1 / 2.2) to the official TWCS class.

Once that happens, I suspect it'll go quite quickly, but I'm not sure.
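A sketch of the switch Jeff describes, written as a small Python helper that builds the CQL statement. The keyspace, table, and window settings are hypothetical placeholders (keep the window unit and size you already use). The `unsafe_aggressive_sstable_expiration` option name, and the accompanying JVM flag `-Dcassandra.allow_unsafe_aggressive_sstable_expiration=true`, are assumed from the CASSANDRA-13418 change; check both against the documentation of the version you upgrade to.

```python
def twcs_alter_statement(keyspace: str, table: str, window_unit: str, window_size: int,
                         aggressive_expiration: bool = False) -> str:
    """Build an ALTER TABLE that switches a table to the built-in TWCS class."""
    options = {
        "class": "TimeWindowCompactionStrategy",
        "compaction_window_unit": window_unit,
        "compaction_window_size": str(window_size),
    }
    if aggressive_expiration:
        # Assumed option from CASSANDRA-13418; assumed to also need the JVM flag
        # -Dcassandra.allow_unsafe_aggressive_sstable_expiration=true on every node.
        options["unsafe_aggressive_sstable_expiration"] = "true"
    body = ", ".join(f"'{key}': '{value}'" for key, value in options.items())
    return f"ALTER TABLE {keyspace}.{table} WITH compaction = {{{body}}};"
```

The generated statement can be run from cqlsh or any driver session.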



Re: upgrading 2.1.x cluster with ec2multiregionsnitch system.peers "corruption"

2019-03-27 Thread Carl Mueller
I filed https://issues.apache.org/jira/browse/CASSANDRA-15068

According to our AWS experts, EIPs cost money, are a limited resource (we have
a lot of VMs), and cause a lot of headaches in our autoscaling /
infrastructure-as-code systems.



Re: upgrading 2.1.x cluster with ec2multiregionsnitch system.peers "corruption"

2019-03-27 Thread Carl Mueller
We are probably going to just have a VM startup script for now that
automatically updates the yaml on instance restart. It seems to be the
least-sucky approach at this point.
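A minimal sketch of such a startup script in Python, meant to run at boot before Cassandra starts. Assumptions: the instance metadata endpoint `http://169.254.169.254/latest/meta-data/public-ipv4` answers without a session token (IMDSv1; instances that enforce IMDSv2 need a token first), broadcast_rpc_address should be the instance's public IP, and a line-based edit is acceptable so that the comments in cassandra.yaml survive.

```python
import re
import urllib.request
from pathlib import Path

# EC2 instance metadata URL for the public IPv4 address (IMDSv1-style request; assumption).
PUBLIC_IP_URL = "http://169.254.169.254/latest/meta-data/public-ipv4"

ACTIVE = re.compile(r"^broadcast_rpc_address:.*$", re.MULTILINE)
COMMENTED = re.compile(r"^#[ \t]*broadcast_rpc_address:.*$", re.MULTILINE)


def fetch_public_ip(timeout: float = 2.0) -> str:
    """Ask the instance metadata service for this instance's current public IPv4 address."""
    with urllib.request.urlopen(PUBLIC_IP_URL, timeout=timeout) as resp:
        return resp.read().decode("ascii").strip()


def set_broadcast_rpc_address(yaml_text: str, ip: str) -> str:
    """Set broadcast_rpc_address with a line-based edit so the yaml comments are preserved.

    An active setting is replaced first; otherwise the first commented-out example is
    uncommented; otherwise the setting is appended at the end of the file.
    """
    line = f"broadcast_rpc_address: {ip}"
    for pattern in (ACTIVE, COMMENTED):
        if pattern.search(yaml_text):
            return pattern.sub(lambda _match: line, yaml_text, count=1)
    return yaml_text.rstrip("\n") + "\n" + line + "\n"


def update_config(path: Path) -> None:
    """Rewrite cassandra.yaml in place; intended to run before Cassandra starts."""
    path.write_text(set_broadcast_rpc_address(path.read_text(), fetch_public_ip()))
```

Calling `update_config` with the path of your own cassandra.yaml from an instance boot hook, ordered before the Cassandra service, would cover the public IP reassignment after a hard reboot.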



RE: TWCS Compactions & Tombstones

2019-03-27 Thread Nick Hatfield
Awesome, thanks again!



RE: Cassandra 2.1.18 - NPE during startup

2019-03-27 Thread Steinmaurer, Thomas
Hello,

Any ideas regarding the issue below? It has happened again on a different node.

Thanks
Thomas

From: Steinmaurer, Thomas 
Sent: Tuesday, 5 February 2019 23:03
To: user@cassandra.apache.org
Subject: Cassandra 2.1.18 - NPE during startup

Hello,

At a particular customer site, we are seeing the following NPE during
startup with Cassandra 2.1.18:

INFO  [SSTableBatchOpen:2] 2019-02-03 13:32:56,131 SSTableReader.java:475 - Opening /var/opt/data/cassandra/system/schema_keyspaces-b0f2235744583cdb9631c43e59ce3676/system-schema_keyspaces-ka-130 (256 bytes)
ERROR [main] 2019-02-03 13:32:56,552 CassandraDaemon.java:583 - Exception encountered during startup
org.apache.cassandra.io.FSReadError: java.lang.NullPointerException
	at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:672) ~[apache-cassandra-2.1.18.jar:2.1.18]
	at org.apache.cassandra.service.CassandraDaemon.setup(CassandraDaemon.java:310) [apache-cassandra-2.1.18.jar:2.1.18]
	at org.apache.cassandra.service.CassandraDaemon.activate(CassandraDaemon.java:566) [apache-cassandra-2.1.18.jar:2.1.18]
	at org.apache.cassandra.service.CassandraDaemon.main(CassandraDaemon.java:655) [apache-cassandra-2.1.18.jar:2.1.18]
Caused by: java.lang.NullPointerException: null
	at org.apache.cassandra.db.ColumnFamilyStore.removeUnfinishedCompactionLeftovers(ColumnFamilyStore.java:664) ~[apache-cassandra-2.1.18.jar:2.1.18]
	... 3 common frames omitted

I found https://issues.apache.org/jira/browse/CASSANDRA-10501, but that should
already be fixed in 2.1.18.

Does the above log indicate that it is caused by an SSTable of a system keyspace?

This is a 3-node setup, and the other 2 nodes are running fine. If this is
related to a system table, and since (to my knowledge) LocalStrategy is used as
the replication strategy there, might simply copying over the data for the
schema_keyspaces table from another node fix it?

Any help appreciated.

Thanks.
Thomas
The contents of this e-mail are intended for the named addressee only. It 
contains information that may be confidential. Unless you are the named 
addressee or an authorized designee, you may not copy or use it, or disclose it 
to anyone else. If you received it in error please notify us immediately and 
then destroy it. Dynatrace Austria GmbH (registration number FN 91482h) is a 
company registered in Linz whose registered office is at 4040 Linz, Austria, 
Freistädterstraße 313