On Fri, Mar 3, 2017 at 7:56 AM, Romain Hardouin <romainh...@yahoo.fr> wrote:
> I suspect a lack of 3.x reliability. Cassandra could have given up with dropped messages, but not with a "drop keyspace". I mean, I have already seen Spark jobs with too many executors produce a high load average on a DC. I have seen a C* node with a 1 min. load avg of 140 that could still serve a P99 read latency of 40 ms. But I have never seen a keyspace disappear. There are old tickets regarding C* 1.x, but as far as I remember that was due to a create/drop/create keyspace sequence.
>
> On Friday, March 3, 2017 at 13:44, George Webster <webste...@gmail.com> wrote:
>
> Thank you for your reply and good to know about the debug statement. I haven't
>
> We never dropped or re-created the keyspace before. We haven't even performed writes to that keyspace in months. I also checked the permissions of the Apache user; it had read-only access.
>
> Unfortunately, I reverted from a backend recently. I cannot say for sure anymore if I saw something in system before the revert.
>
> Anyway, hopefully it was just a fluke. We have some crazy ML libraries running on it, so maybe Cassandra just gave up? Oh well, Cassandra is a champ and we haven't really had issues with it before.
>
> On Thu, Mar 2, 2017 at 6:51 PM, Romain Hardouin <romainh...@yahoo.fr> wrote:
>
> Did you inspect the system tables to see if there are any traces of your keyspace? Did you ever drop and re-create this keyspace before that?
>
> The lines in debug appear because the fd interval is > 2 seconds (the logged values are in nanoseconds). You can override the intervals via the -Dcassandra.fd_initial_value_ms and -Dcassandra.fd_max_interval_ms properties. Are you sure you didn't have these lines in the debug logs before? I used to see them a lot prior to increasing the intervals to 4 seconds.
>
> Best,
>
> Romain
>
> On Tuesday, February 28, 2017 at 18:25, George Webster <webste...@gmail.com> wrote:
>
> Hey Cassandra Users,
>
> We recently encountered an issue where a keyspace just disappeared.
> I was curious if anyone has had this occur before and can provide some insight.
>
> We are using Cassandra 3.10, with 2 DCs of 3 nodes each.
> The data was still present in the storage folder but was no longer visible inside Cassandra.
>
> I searched the logs for any hints of errors or commands being executed that could have caused the loss of a keyspace. Unfortunately I found nothing. The only unusual thing I saw in the logs was a series of read timeouts that occurred right around when the keyspace went away. Since then I see numerous entries in the debug log like the following:
>
> DEBUG [GossipStage:1] 2017-02-28 18:14:12,580 FailureDetector.java:457 - Ignoring interval time of 2155674599 for /x.x.x..12
> DEBUG [GossipStage:1] 2017-02-28 18:14:16,580 FailureDetector.java:457 - Ignoring interval time of 2945213745 for /x.x.x.81
> DEBUG [GossipStage:1] 2017-02-28 18:14:19,590 FailureDetector.java:457 - Ignoring interval time of 2006530862 for /x.x.x..69
> DEBUG [GossipStage:1] 2017-02-28 18:14:27,434 FailureDetector.java:457 - Ignoring interval time of 3441841231 for /x.x.x.82
> DEBUG [GossipStage:1] 2017-02-28 18:14:29,588 FailureDetector.java:457 - Ignoring interval time of 2153964846 for /x.x.x.82
> DEBUG [GossipStage:1] 2017-02-28 18:14:33,582 FailureDetector.java:457 - Ignoring interval time of 2588593281 for /x.x.x.82
> DEBUG [GossipStage:1] 2017-02-28 18:14:37,588 FailureDetector.java:457 - Ignoring interval time of 2005305693 for /x.x.x.69
> DEBUG [GossipStage:1] 2017-02-28 18:14:38,592 FailureDetector.java:457 - Ignoring interval time of 2009244850 for /x.x.x.82
> DEBUG [GossipStage:1] 2017-02-28 18:14:43,584 FailureDetector.java:457 - Ignoring interval time of 2149192677 for /x.x.x.69
> DEBUG [GossipStage:1] 2017-02-28 18:14:45,605 FailureDetector.java:457 - Ignoring interval time of 2021180918 for /x.x.x.85
> DEBUG [GossipStage:1] 2017-02-28 18:14:46,432 FailureDetector.java:457 - Ignoring interval time of 2436026101 for /x.x.x.81
> DEBUG [GossipStage:1] 2017-02-28 18:14:46,432 FailureDetector.java:457 - Ignoring interval time of 2436187894 for /x.x.x.82
>
> At the time the keyspace disappeared we had two concurrent activities:
> 1) Running a Spark job (via HDP 2.5.3 in Yarn) that was performing a countByKey. It was using the keyspace that disappeared. The operation crashed.
> 2) We created a new keyspace to test out a schema. The only "fancy" thing in that keyspace is a few materialized view tables. Data was being loaded into that keyspace during the crash. The load process was extracting information and then just writing to Cassandra.
>
> Any ideas? Anyone seen this before?
>
> Thanks,
> George

Cassandra takes snapshots for certain events. Does this extend to drop keyspace commands? Maybe it should.
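
On the FailureDetector debug lines quoted above: as Romain notes, the values are in nanoseconds and the lines show up when the interval exceeds 2 seconds. Here is a minimal Python sketch to turn them into per-peer seconds. It assumes the log layout matches the samples in this thread, and the function name is made up for illustration.

```python
import re

# Pattern derived only from the debug lines quoted in this thread
# (assumption: other Cassandra versions may format this message differently).
LINE_RE = re.compile(r"Ignoring interval time of (\d+) for (\S+)")

def parse_ignored_intervals(lines):
    """Return (peer, seconds) pairs for each matching FailureDetector line."""
    results = []
    for line in lines:
        match = LINE_RE.search(line)
        if match:
            nanos = int(match.group(1))
            results.append((match.group(2), nanos / 1e9))
    return results

# Two of the lines from the original report, unchanged.
sample = [
    "DEBUG [GossipStage:1] 2017-02-28 18:14:12,580 FailureDetector.java:457 - "
    "Ignoring interval time of 2155674599 for /x.x.x..12",
    "DEBUG [GossipStage:1] 2017-02-28 18:14:27,434 FailureDetector.java:457 - "
    "Ignoring interval time of 3441841231 for /x.x.x.82",
]

for peer, seconds in parse_ignored_intervals(sample):
    print(f"{peer}: {seconds:.3f} s")
```

All the intervals in the report fall between roughly 2 and 3.5 seconds, which is consistent with Romain's explanation rather than with anything specific to the missing keyspace.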