Re: OpsCenter sending alert emails or posting to a url never succeeded.
Ryan, thanks so much! Your info is very important!

Yuan

On Wed, Jul 20, 2016 at 1:54 PM, Ryan Springer wrote:
> Yuan,
>
> If you add a static key=value pair, you need to add it using the "fields="
> section. However, when the "fields=" section is used, you must list all of
> the fields that you want to appear in the POST data.
>
> Here is an example that sends message_type and the message itself:
>
> fields=message_type=CRITICAL
>        message={message}
>
> Try placing this at the end of your posturl.conf, where you currently only
> have "message_type=CRITICAL".
>
> Thank you,
>
> Ryan Springer
> OpsCenter Provisioning Team
>
> On Tue, Jul 19, 2016 at 6:56 PM, Yuan Fang wrote:
>
>> Has anyone succeeded?
>>
>> Here is my setting for postUrl.
>> ==
>> ubuntu@ip-172-31-55-130:/etc/opscenter/event-plugins$ more posturl.conf
>>
>> [posturl]
>> enabled=1
>>
>> # levels can be a comma-delimited list of any of the following:
>> # DEBUG,INFO,WARN,ERROR,CRITICAL,ALERT
>> # If left empty, will listen for all levels
>> levels=
>>
>> # clusters is a comma-delimited list of cluster names for which
>> # this alert config will be eligible to run.
>> # If left empty, this alert will be called for events on all clusters
>> clusters=
>>
>> # the URL to send an HTTP POST to
>> url=https://alert.victorops.com/integrations/generic*
>>
>> # Set a username for basic HTTP authorization
>> #username=foo
>>
>> # Set a password for basic HTTP authorization
>> #password=bar
>>
>> # Set the type of posted data. Available options are 'json' or 'form'
>> post_type=json
>>
>> # Fields specified here will override the default event data fields.
>> #
>> # They must be formatted as key=value pairs, with key and value separated by
>> # an equals sign (=). Each pair after the first must be on its own line,
>> # indented beyond the first line.
>> #
>> # You may use tokens found within the default event data for or in
>> # values. For example, some available keys are:
>> #   cluster, time, level_str, message, target_node, event_source,
>> #   success, api_source_ip, user, source_node
>> # Keys must be encapsulated in {brackets}.
>> #
>> #fields=textKey=value
>> #mixedKey=cluster-{cluster}
>> #event-msg={message}
>> message_type=CRITICAL
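For anyone else wiring this up, here is a minimal sketch of how the tail of posturl.conf would look after applying Ryan's suggestion. The continuation lines must be indented beyond the first "fields=" line, and the extra event-cluster field is only an illustration built from the tokens documented in the comments above, not something the plugin requires:

    [posturl]
    enabled=1
    url=https://alert.victorops.com/integrations/generic*
    post_type=json
    fields=message_type=CRITICAL
           message={message}
           event-cluster={cluster}

Because "fields=" overrides the default event data, only the keys listed there appear in the POST body, so anything you still want forwarded (such as the message itself) has to be repeated explicitly.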
Node after restart sees other nodes down for 10 minutes
I am new to Cassandra and trying to figure out how the cluster behaves when things go south. I have a 6-node cluster with RF=3. I stop the Cassandra service on one node for a while, and all the other nodes see that node as DN. After a while I start the Cassandra service on the downed node again. The interesting point is that all the other nodes now see the node as UN, but the node itself sees four nodes as DN and only one node as UN. Only after about 10 minutes does the node see the other nodes as up as well.

I am trying to figure out where this delay comes from. I have attached the part of system.log that looks interesting. It looks like the node only starts treating a peer as up once Gossiper logs that the InetAddress is now UP, even though it had already handshaked with that peer earlier.

Any ideas?

Cheers,
Farzad

--
INFO [main] 2016-07-25 21:58:46,044 StorageService.java:533 - Cassandra version: 3.0.8
INFO [main] 2016-07-25 21:58:46,098 StorageService.java:534 - Thrift API version: 20.1.0
INFO [main] 2016-07-25 21:58:46,150 StorageService.java:535 - CQL supported versions: 3.4.0 (default: 3.4.0)
INFO [main] 2016-07-25 21:58:46,284 IndexSummaryManager.java:85 - Initializing index summary manager with a memory pool size of 198 MB and a resize interval of 60 minutes
INFO [main] 2016-07-25 21:58:46,343 StorageService.java:554 - Loading persisted ring state
INFO [main] 2016-07-25 21:58:46,418 StorageService.java:743 - Starting up server gossip
INFO [main] 2016-07-25 21:58:46,680 TokenMetadata.java:429 - Updating topology for ip-10-4-43-66.ec2.internal/10.4.43.66
INFO [main] 2016-07-25 21:58:46,707 TokenMetadata.java:429 - Updating topology for ip-10-4-43-66.ec2.internal/10.4.43.66
INFO [main] 2016-07-25 21:58:46,792 MessagingService.java:557 - Starting Messaging Service on ip-10-4-43-66.ec2.internal/10.4.43.66:7000 (eth0)
INFO [HANDSHAKE-/10.4.68.222] 2016-07-25 21:58:46,920 OutboundTcpConnection.java:515 - Handshaking version with /10.4.68.222
INFO [GossipStage:1] 2016-07-25 21:58:47,011 Gossiper.java:1028 - Node /10.4.68.221 has restarted, now UP
INFO [HANDSHAKE-/10.4.68.222] 2016-07-25 21:58:47,007 OutboundTcpConnection.java:515 - Handshaking version with /10.4.68.222
INFO [main] 2016-07-25 21:58:47,030 StorageService.java:1902 - Node ip-10-4-43-66.ec2.internal/10.4.43.66 state jump to NORMAL
INFO [main] 2016-07-25 21:58:47,096 CassandraDaemon.java:644 - Waiting for gossip to settle before accepting client requests...
INFO [GossipStage:1] 2016-07-25 21:58:47,134 StorageService.java:1902 - Node /10.4.68.221 state jump to NORMAL
INFO [HANDSHAKE-/10.4.68.221] 2016-07-25 21:58:47,137 OutboundTcpConnection.java:515 - Handshaking version with /10.4.68.221
INFO [GossipStage:1] 2016-07-25 21:58:47,211 TokenMetadata.java:429 - Updating topology for /10.4.68.221
INFO [GossipStage:1] 2016-07-25 21:58:47,261 TokenMetadata.java:429 - Updating topology for /10.4.68.221
INFO [GossipStage:1] 2016-07-25 21:58:47,295 Gossiper.java:1028 - Node /10.4.68.222 has restarted, now UP
INFO [GossipStage:1] 2016-07-25 21:58:47,337 StorageService.java:1902 - Node /10.4.68.222 state jump to NORMAL
INFO [GossipStage:1] 2016-07-25 21:58:47,385 TokenMetadata.java:429 - Updating topology for /10.4.68.222
INFO [GossipStage:1] 2016-07-25 21:58:47,452 TokenMetadata.java:429 - Updating topology for /10.4.68.222
INFO [GossipStage:1] 2016-07-25 21:58:47,497 Gossiper.java:1028 - Node /10.4.54.176 has restarted, now UP
INFO [GossipStage:1] 2016-07-25 21:58:47,544 StorageService.java:1902 - Node /10.4.54.176 state jump to NORMAL
INFO [HANDSHAKE-/10.4.54.176] 2016-07-25 21:58:47,548 OutboundTcpConnection.java:515 - Handshaking version with /10.4.54.176
INFO [GossipStage:1] 2016-07-25 21:58:47,594 TokenMetadata.java:429 - Updating topology for /10.4.54.176
INFO [GossipStage:1] 2016-07-25 21:58:47,639 TokenMetadata.java:429 - Updating topology for /10.4.54.176
WARN [GossipTasks:1] 2016-07-25 21:58:47,678 FailureDetector.java:287 - Not marking nodes down due to local pause of 43226235115 > 50
INFO [HANDSHAKE-/10.4.43.65] 2016-07-25 21:58:47,679 OutboundTcpConnection.java:515 - Handshaking version with /10.4.43.65
INFO [GossipStage:1] 2016-07-25 21:58:47,757 Gossiper.java:1028 - Node /10.4.54.177 has restarted, now UP
INFO [GossipStage:1] 2016-07-25 21:58:47,788 StorageService.java:1902 - Node /10.4.54.177 state jump to NORMAL
INFO [HANDSHAKE-/10.4.54.177] 2016-07-25 21:58:47,789 OutboundTcpConnection.java:515 - Handshaking version with /10.4.54.177
INFO [GossipStage:1] 2016-07-25 21:58:47,836 TokenMetadata.java:429 - Updating topology for /10.4.54.177
INFO [GossipStage:1] 2016-07-25 21:58:47,887 TokenMetadata.java:429 - Updating topology for /10.4.54.177
INFO [GossipStage:1] 2016-07-25 21:58:47,926 Gossiper.java:1028 - Node /10.4.43.65 has restarted, now UP
INFO [GossipStage:1] 2016-07-25 21:58:47,976 StorageService.java:1902 - Node /10.4.43.65 state jump to NORMAL
INFO [GossipStage:1] 2016-07-25 21:58:48,036 TokenMetadata.java:42
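Not an answer to the delay itself, but a quick way to compare views while this is happening (a sketch; the prompt/host is illustrative and the peer address is just one of the IPs from the log above) is to run nodetool status on the restarted node and on a peer and compare the UN/DN columns, and to check nodetool gossipinfo on the restarted node for the state it has recorded for each endpoint:

    ubuntu@ip-10-4-43-66:~$ nodetool status
    ubuntu@ip-10-4-43-66:~$ nodetool gossipinfo | grep -A 4 '/10.4.68.221'

If gossipinfo already shows a NORMAL status for a peer that nodetool status still reports as DN, the discrepancy is in the failure detector's view of that peer rather than in the gossip state itself.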
Re: Re : Recommended procedure for enabling SSL on a live production cluster
hi Nate; thanks for the help. Upgrading to 2.1.12 seems to be the solution for client-to-node encryption on the NATIVE port.

The other issue we are facing is with the STORAGE port. The reason is that we need to switch back and forth between different internode_encryption modes, and we need the C* servers to keep running while in a transient state during the mode switch. Currently this is not possible. For example, we have an internode_encryption=none cluster in a multi-region AWS environment and want to move to internode_encryption=dc via a rolling restart of the C* nodes. However, a node with internode_encryption=dc does not open the non-SSL storage port for listening. As a result, we end up with a split-brain cluster. Below is a ticket opened for exactly this issue. Has anybody overcome such an issue on a production cluster? Thanks in advance.

https://issues.apache.org/jira/browse/CASSANDRA-8751

thanks
Sai

On Wed, Jul 20, 2016 at 5:25 PM, Nate McCall wrote:
> If you migrate to the latest 2.1 first, you can make this a non-issue, as
> 2.1.12 and above support simultaneous SSL and plain traffic on the same port
> for exactly this use case:
> https://issues.apache.org/jira/browse/CASSANDRA-10559
>
> On Thu, Jul 21, 2016 at 3:02 AM, sai krishnam raju potturi <
> pskraj...@gmail.com> wrote:
>
>> hi;
>> If possible, could someone shed some light on this? I followed a
>> post from The Last Pickle which was very informative, but we had some
>> concerns when it came to enabling SSL on a live production cluster.
>>
>> http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html
>>
>> 1: We generally remove application traffic from a DC that has ongoing
>> changes, so that end customers are not affected if things go south during
>> the update.
>>
>> 2: Once DC-A has been restarted after enabling SSL, it will be missing
>> writes from that period, since DC-A will have been shown as down by the
>> other DCs. We will not be able to put application traffic back on DC-A
>> until we run inter-DC repairs, which can happen only once SSL has been
>> enabled on all DCs.
>>
>> 3: Repeating the procedure for every DC will lead to some missed writes
>> across all DCs.
>>
>> 4: We could do the rolling restart of DC-A with application traffic on,
>> but we are concerned that if we hit an infrastructure-related issue, we
>> will have to serve traffic from another DC-B, which might be missing
>> writes made to DC-A during that period.
>>
>> We have 4 DCs with 50 nodes each.
>>
>> thanks
>> Sai
>>
>> -- Forwarded message --
>> From: sai krishnam raju potturi
>> Date: Mon, Jul 18, 2016 at 11:06 AM
>> Subject: Re: Recommended procedure for enabling SSL on a live production cluster
>> To: user@cassandra.apache.org
>>
>> Hi;
>> We have a Cassandra cluster (version 2.0.14) spanning 4 datacenters
>> with 50 nodes each. We are planning to enable SSL between the datacenters,
>> following the standard procedure (
>> http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html).
>> We were planning to enable SSL one datacenter at a time.
>>
>> During the rolling restart, it is expected that the nodes in the
>> datacenter that had the service restarted will be shown as down by the
>> nodes in the other datacenters that have not yet restarted the service.
>> This would lead to missed writes among various nodes during this procedure.
>>
>> What would be the recommended procedure for enabling SSL on a live
>> production cluster without this chaos?
>>
>> thanks
>> Sai
>>
>
> --
> -
> Nate McCall
> Wellington, NZ
> @zznate
>
> CTO
> Apache Cassandra Consulting
> http://www.thelastpickle.com
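For readers following along, the internode setting being toggled in this thread lives in cassandra.yaml under server_encryption_options. A rough sketch of the internode_encryption=dc configuration being rolled out looks like the following (keystore/truststore paths and passwords are placeholders, not values from this thread):

    server_encryption_options:
        internode_encryption: dc        # one of: none, all, dc, rack
        keystore: /etc/cassandra/conf/server-keystore.jks
        keystore_password: <keystore password>
        truststore: /etc/cassandra/conf/server-truststore.jks
        truststore_password: <truststore password>

As described earlier in the thread, a node switched to internode_encryption=dc stopped accepting the unencrypted storage-port connections its not-yet-switched peers were still making, which is what CASSANDRA-8751 tracks; the simultaneous SSL-and-plain support in CASSANDRA-10559 only covers the native (client) port.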
regarding drain process
Hi all,

Recently I have been facing a problem with Cassandra nodes: nodes go down very frequently. I went through system.log and found the reason: somehow C* triggers the *draining process*. I know the purpose of *nodetool drain*, but it should not be triggered automatically, right? Or is there any specific setting for this?

*We are using C*-2.1.13.*

Please let me know if you need more info. Thank you!!

Regards,
Varun Barala
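Not an answer, but since the drain messages are already in your system.log, one quick way to see what happens right before them (a sketch; the log path shown is the package-install default and may differ on your nodes) is:

    grep -i -n -B 20 "drain" /var/log/cassandra/system.log | less

The context lines before each match usually make it clear whether the process was being shut down at that moment (if I remember right, 2.1 also runs a drain as part of a normal shutdown) or whether something else was going on, which should help narrow down why the nodes keep going down.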