Re: OpsCenter sending alert emails or posting to a url never succeeded.

2016-07-26 Thread Yuan Fang
Ryan,

Thanks so much!
Your info is very very important!

Yuan

On Wed, Jul 20, 2016 at 1:54 PM, Ryan Springer 
wrote:

> Yuan,
>
> If you add a static key=value pair, you need to add it using the "fields="
> section.  However, when the "fields=" section is used, you must list all of
> the fields that you want to appear in the POST data.
>
> Here is an example that sends message_type and the message itself:
>
> fields=message_type=CRITICAL
>        message={message}
>
>
> Try placing this at the end of your posturl.conf where you currently only
> have "message_type=CRITICAL"
>
> Thank you,
>
> Ryan Springer
> Opscenter Provisioning Team
>
> On Tue, Jul 19, 2016 at 6:56 PM, Yuan Fang  wrote:
>
>> Anyone succeeded?
>>
>>
>>
>> Here is my setting for postUrl.
>> ==
>> ubuntu@ip-172-31-55-130:/etc/opscenter/event-plugins$ more posturl.conf
>>
>> [posturl]
>> enabled=1
>>
>> # levels can be comma delimited list of any of the following:
>> # DEBUG,INFO,WARN,ERROR,CRITICAL,ALERT
>> # If left empty, will listen for all levels
>> levels=
>>
>> # clusters is a comma delimited list of cluster names for which
>> # this alert config will be eligible to run.
>> # If left empty, this alert will be called for events on all clusters
>> clusters=
>>
>> # the URL to send a HTTP POST to
>> url=https://alert.victorops.com/integrations/generic*
>>
>> # Set a username for basic HTTP authorization
>> #username=foo
>>
>> # Set a password for basic HTTP authorization
>> #password=bar
>>
>> # Set the type of posted data. Available options are 'json' or 'form'
>> post_type=json
>>
>> # Fields specified here will override the default event data fields.
>> #
>> # They must be formatted as key-value pair, with key and value separated by
>> # an equals (=). Each pair after the first must be on its own line,
>> # indented beyond the first line
>> #
>> # You may use tokens found within the default event data for keys or in
>> # values. For example, some available keys are:
>> #   cluster, time, level_str, message, target_node, event_source,
>> #   success, api_source_ip, user, source_node
>> # Keys must be encapsulated in {brackets}.
>> #
>> #fields=textKey=value
>> #mixedKey=cluster-{cluster}
>> #event-msg={message}
>> message_type=CRITICAL
>>
>
>
>
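For reference, after Ryan's suggestion the end of posturl.conf would look
roughly like the sketch below. This is a minimal example rather than a
verified config: the message_type/message pair comes straight from Ryan's
reply, the cluster line is an optional extra that reuses the {cluster} token
documented in the file's own comments, and continuation lines have to be
indented as those comments require.

fields=message_type=CRITICAL
       message={message}
       cluster={cluster}

After editing the file, restarting opscenterd is the usual way to make the
event plugin pick up the change.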


Node after restart sees other nodes down for 10 minutes

2016-07-26 Thread Farzad Panahi
I am new to Cassandra and trying to figure out how the cluster behaves when
things go south.

I have a 6-node cluster, RF=3.

I stop the Cassandra service on a node for a while. All nodes see that node as
DN. After a while I start the Cassandra service on the down node again. The
interesting point is that all the other nodes now see it as UN, but the node
itself sees 4 nodes as DN and only one node as UN. After about 10 minutes the
node sees the other nodes as up as well.

I am trying to figure out where this delay is coming from.

I have attached the part of system.log that looks interesting. It looks like
the node only actually sees a peer as up once Gossiper logs "InetAddress  is
now UP", even though it has already handshaked with that peer before.

Any ideas?

Cheers

Farzad

--
INFO  [main] 2016-07-25 21:58:46,044 StorageService.java:533 - Cassandra
version: 3.0.8
INFO  [main] 2016-07-25 21:58:46,098 StorageService.java:534 - Thrift API
version: 20.1.0
INFO  [main] 2016-07-25 21:58:46,150 StorageService.java:535 - CQL
supported versions: 3.4.0 (default: 3.4.0)
INFO  [main] 2016-07-25 21:58:46,284 IndexSummaryManager.java:85 -
Initializing index summary manager with a memory pool size of 198 MB and a
resize interval of 60 minutes
INFO  [main] 2016-07-25 21:58:46,343 StorageService.java:554 - Loading
persisted ring state
INFO  [main] 2016-07-25 21:58:46,418 StorageService.java:743 - Starting up
server gossip
INFO  [main] 2016-07-25 21:58:46,680 TokenMetadata.java:429 - Updating
topology for ip-10-4-43-66.ec2.internal/10.4.43.66
INFO  [main] 2016-07-25 21:58:46,707 TokenMetadata.java:429 - Updating
topology for ip-10-4-43-66.ec2.internal/10.4.43.66
INFO  [main] 2016-07-25 21:58:46,792 MessagingService.java:557 - Starting
Messaging Service on ip-10-4-43-66.ec2.internal/10.4.43.66:7000 (eth0)

INFO  [HANDSHAKE-/10.4.68.222] 2016-07-25 21:58:46,920
OutboundTcpConnection.java:515 - Handshaking version with /10.4.68.222
INFO  [GossipStage:1] 2016-07-25 21:58:47,011 Gossiper.java:1028 - Node /
10.4.68.221 has restarted, now UP
INFO  [HANDSHAKE-/10.4.68.222] 2016-07-25 21:58:47,007
OutboundTcpConnection.java:515 - Handshaking version with /10.4.68.222
INFO  [main] 2016-07-25 21:58:47,030 StorageService.java:1902 - Node
ip-10-4-43-66.ec2.internal/10.4.43.66 state jump to NORMAL
INFO  [main] 2016-07-25 21:58:47,096 CassandraDaemon.java:644 - Waiting for
gossip to settle before accepting client requests...
INFO  [GossipStage:1] 2016-07-25 21:58:47,134 StorageService.java:1902 -
Node /10.4.68.221 state jump to NORMAL
INFO  [HANDSHAKE-/10.4.68.221] 2016-07-25 21:58:47,137
OutboundTcpConnection.java:515 - Handshaking version with /10.4.68.221
INFO  [GossipStage:1] 2016-07-25 21:58:47,211 TokenMetadata.java:429 -
Updating topology for /10.4.68.221
INFO  [GossipStage:1] 2016-07-25 21:58:47,261 TokenMetadata.java:429 -
Updating topology for /10.4.68.221
INFO  [GossipStage:1] 2016-07-25 21:58:47,295 Gossiper.java:1028 - Node /
10.4.68.222 has restarted, now UP
INFO  [GossipStage:1] 2016-07-25 21:58:47,337 StorageService.java:1902 -
Node /10.4.68.222 state jump to NORMAL
INFO  [GossipStage:1] 2016-07-25 21:58:47,385 TokenMetadata.java:429 -
Updating topology for /10.4.68.222
INFO  [GossipStage:1] 2016-07-25 21:58:47,452 TokenMetadata.java:429 -
Updating topology for /10.4.68.222
INFO  [GossipStage:1] 2016-07-25 21:58:47,497 Gossiper.java:1028 - Node /
10.4.54.176 has restarted, now UP
INFO  [GossipStage:1] 2016-07-25 21:58:47,544 StorageService.java:1902 -
Node /10.4.54.176 state jump to NORMAL
INFO  [HANDSHAKE-/10.4.54.176] 2016-07-25 21:58:47,548
OutboundTcpConnection.java:515 - Handshaking version with /10.4.54.176
INFO  [GossipStage:1] 2016-07-25 21:58:47,594 TokenMetadata.java:429 -
Updating topology for /10.4.54.176
INFO  [GossipStage:1] 2016-07-25 21:58:47,639 TokenMetadata.java:429 -
Updating topology for /10.4.54.176
WARN  [GossipTasks:1] 2016-07-25 21:58:47,678 FailureDetector.java:287 -
Not marking nodes down due to local pause of 43226235115 > 50
INFO  [HANDSHAKE-/10.4.43.65] 2016-07-25 21:58:47,679
OutboundTcpConnection.java:515 - Handshaking version with /10.4.43.65
INFO  [GossipStage:1] 2016-07-25 21:58:47,757 Gossiper.java:1028 - Node /
10.4.54.177 has restarted, now UP
INFO  [GossipStage:1] 2016-07-25 21:58:47,788 StorageService.java:1902 -
Node /10.4.54.177 state jump to NORMAL
INFO  [HANDSHAKE-/10.4.54.177] 2016-07-25 21:58:47,789
OutboundTcpConnection.java:515 - Handshaking version with /10.4.54.177
INFO  [GossipStage:1] 2016-07-25 21:58:47,836 TokenMetadata.java:429 -
Updating topology for /10.4.54.177
INFO  [GossipStage:1] 2016-07-25 21:58:47,887 TokenMetadata.java:429 -
Updating topology for /10.4.54.177
INFO  [GossipStage:1] 2016-07-25 21:58:47,926 Gossiper.java:1028 - Node /
10.4.43.65 has restarted, now UP
INFO  [GossipStage:1] 2016-07-25 21:58:47,976 StorageService.java:1902 -
Node /10.4.43.65 state jump to NORMAL
INFO  [GossipStage:1] 2016-07-25 21:58:48,036 TokenMetadata.java:42
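While waiting for the restarted node to converge, a quick way to see what it
thinks of its peers is to compare its own ring view with the gossip state and
the failure-detector messages in the log. A rough sketch, assuming a package
install with the log at /var/log/cassandra/system.log:

nodetool status          # ring state as seen by this node (which peers are DN vs UN)
nodetool gossipinfo      # per-endpoint gossip generation, heartbeat and STATUS
nodetool describecluster # whether all reachable nodes agree on the schema version
grep -E 'local pause|now UP|now DOWN' /var/log/cassandra/system.log | tail -n 50

The last command gives a timeline of when the failure detector flagged local
pauses and when each peer flipped to UP or DOWN from this node's point of
view, which helps pin down where the ten minutes are being spent.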

Re: Re : Recommended procedure for enabling SSL on a live production cluster

2016-07-26 Thread sai krishnam raju potturi
hi Nate;
thanks for the help. Upgrading to 2.1.12 seems to be the solution for
client-to-node encryption on the NATIVE port.

The other issue we are facing is with the STORAGE port. The reason is that we
need to switch back and forth between different internode_encryption modes,
and we need the C* servers to keep running while in a transient state during
the mode switch. Currently this is not possible. For example, we have an
internode_encryption=none cluster in a multi-region AWS environment and want
to move to internode_encryption=dc via a rolling restart of the C* nodes.
However, a node with internode_encryption=dc does not listen on the non-SSL
port, so we end up with a split-brain cluster.

Below is a ticket opened for the exact same issue. Has anybody overcome any
such issue on a production cluster? Thanks in advance.

https://issues.apache.org/jira/browse/CASSANDRA-8751

thanks
Sai
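For the native-port side, the knob Nate points to below (CASSANDRA-10559)
lives under client_encryption_options in cassandra.yaml. A rough sketch for
2.1.12 and later, with placeholder keystore details:

client_encryption_options:
    enabled: true
    optional: true                                 # accept both SSL and plaintext clients while they migrate
    keystore: /etc/cassandra/conf/keystore.jks     # placeholder path
    keystore_password: changeit                    # placeholder password

Once every client connects over SSL, optional can be flipped back to false in
a later rolling restart.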

On Wed, Jul 20, 2016 at 5:25 PM, Nate McCall  wrote:

> If you migrate to the latest 2.1 first, you can make this a non-issue as
> 2.1.12 and above support simultaneous SSL and plain on the same port for
> exactly this use case:
> https://issues.apache.org/jira/browse/CASSANDRA-10559
>
> On Thu, Jul 21, 2016 at 3:02 AM, sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
>
>> hi ;
>>  if possible could someone shed some light on this. I followed a
>> post from the lastpickle which was very informative, but we had some
>> concerns when it came to enabling SSL on a live production cluster.
>>
>>
>> http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html
>>
>> 1 : We generally remove application traffic from a DC which has ongoing
>> changes, just not to affect end customers if things go south during the
>> update.
>>
>> 2 : So once DC-A has been restarted after enabling SSL, it would be
>> missing writes from that period, as DC-A would be shown as down by
>> the other DCs. We will not be able to put application traffic back on DC-A
>> until we run inter-DC repairs, which can happen only when SSL has been
>> enabled on all DCs.
>>
>> 3 : Repeating the procedure for every DC will lead to some missed writes
>> across all DCs.
>>
>> 4 : We could do the rolling restart of DC-A with application traffic on,
>> but we are concerned that if we hit an infrastructure-related issue, we
>> would have to serve traffic from another DC-B, which might be missing
>> writes made to DC-A during that period.
>>
>> We have 4 DCs with 50 nodes each.
>>
>>
>> thanks
>> Sai
>>
>> -- Forwarded message --
>> From: sai krishnam raju potturi 
>> Date: Mon, Jul 18, 2016 at 11:06 AM
>> Subject: Re : Recommended procedure for enabling SSL on a live production
>> cluster
>> To: user@cassandra.apache.org
>>
>>
>> Hi;
>>   We have a Cassandra cluster (version 2.0.14) spanning 4
>> datacenters with 50 nodes each. We are planning to enable SSL between the
>> datacenters. We are following the standard procedure for enabling SSL (
>> http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html
>> ). We were planning to enable SSL one datacenter at a time.
>>
>> During the rolling restart, it's expected that the nodes in the
>> datacenter that had the service restarted will be shown as down by the
>> nodes in the other datacenters that have not yet restarted the service.
>> This would lead to missed writes among various nodes during this procedure.
>>
>> What would be the recommended procedure for enabling SSL on a live
>> production cluster without the chaos?
>>
>> thanks
>> Sai
>>
>>
>
>
> --
> -
> Nate McCall
> Wellington, NZ
> @zznate
>
> CTO
> Apache Cassandra Consulting
> http://www.thelastpickle.com
>
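For the storage port, the mode switching described above happens under
server_encryption_options in cassandra.yaml; a rough sketch of the dc setting,
with placeholder keystore and truststore details:

server_encryption_options:
    internode_encryption: dc                           # one of: all, none, dc, rack
    keystore: /etc/cassandra/conf/keystore.jks         # placeholder
    keystore_password: changeit                        # placeholder
    truststore: /etc/cassandra/conf/truststore.jks     # placeholder
    truststore_password: changeit                      # placeholder

In the 2.0/2.1 versions discussed in this thread there is no "optional"
equivalent for internode traffic, which is exactly what CASSANDRA-8751 asks
for: a node switched to dc (or all) stops listening for plaintext internode
connections, so a cluster that is mid-rollout splits as described above.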


regarding drain process

2016-07-26 Thread Varun Barala
Hi all,


Recently I have been facing a problem with Cassandra nodes: nodes go down very
frequently. I went through system.log and found the reason: somehow C*
triggers the *draining process*.
process.*

I know the purpose of *nodetool drain*, but it should not be triggered
automatically, right?

Or is there a specific setting that could cause this?


*We are using C*-2.1.13.*

please let me know if you need more info.

Thanking you!!

Regards,
Varun Barala
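One way to narrow this down is to look at what happens in the log right before
each drain, and to check whether anything on the box asked the node to stop at
that moment. A rough sketch, assuming a package install on a Debian-style
system; the DRAINING/DRAINED strings are the mode messages Cassandra prints
while draining:

grep -n -E 'DRAINING|DRAINED' /var/log/cassandra/system.log
grep -n -B 20 'DRAINING' /var/log/cassandra/system.log | less       # context just before each drain
crontab -l | grep -i nodetool                                       # any scheduled drain or restart jobs?
grep -i cassandra /var/log/syslog | grep -iE 'stop|kill' | tail     # did init/systemd or an operator stop the service?

Drain normally runs only when something asks the node to shut down cleanly (an
explicit nodetool drain, a service stop, or a configuration-management agent
restarting Cassandra), so finding who sent that request is usually the
quickest way to explain "automatic" drains.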