I suspect a lack of 3.x reliability. Cassandra could have given up with dropped 
messages, but not with a "drop keyspace". I mean, I have already seen Spark jobs 
with too many executors produce a high load average on a DC. I have seen a C* 
node with a 1-minute load average of 140 that could still serve a P99 read 
latency of 40 ms. But I have never seen a disappearing keyspace. There are old 
tickets regarding C* 1.x, but as far as I remember they were due to a 
create/drop/create keyspace sequence.

    On Friday, March 3, 2017 at 1:44 PM, George Webster <webste...@gmail.com> 
wrote:

 Thank you for your reply, and good to know about the debug statement.
We never dropped or re-created the keyspace before. We haven't even performed 
writes to that keyspace in months. I also checked the permissions of Apache; 
that user had read-only access. 
Unfortunately, I recently reverted the backend, so I cannot say for sure 
anymore whether I saw anything in the system tables before the revert. 
Anyway, hopefully it was just a fluke. We have some crazy ML libraries running 
on it; maybe Cassandra just gave up? Oh well, Cassandra is a champ and we 
haven't really had issues with it before. 
On Thu, Mar 2, 2017 at 6:51 PM, Romain Hardouin <romainh...@yahoo.fr> wrote:

Did you inspect the system tables to see if there are any traces of your 
keyspace? Did you ever drop and re-create this keyspace before?
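For reference, on Cassandra 3.x the schema is stored in the system_schema keyspace, so a couple of cqlsh queries along these lines would show whether the keyspace is still registered ('mykeyspace' is a placeholder for the keyspace that disappeared):

```sql
-- List all keyspaces the cluster currently knows about:
SELECT keyspace_name FROM system_schema.keyspaces;

-- Check whether any tables of the missing keyspace are still registered:
SELECT table_name FROM system_schema.tables
WHERE keyspace_name = 'mykeyspace';
```

If the keyspace shows up here but not in clients, that points at a different problem than a schema drop.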
The lines appear in the debug log because the failure detector interval is 
> 2 seconds (the logged values are in nanoseconds). You can override the 
intervals via the -Dcassandra.fd_initial_value_ms and 
-Dcassandra.fd_max_interval_ms properties. Are you sure you didn't have these 
lines in the debug logs before? I used to see them a lot before increasing the 
intervals to 4 seconds. 
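As an illustration (file locations may differ per install), those overrides can be appended to the JVM options in conf/cassandra-env.sh, e.g. raising both intervals from the 2000 ms default to 4 seconds:

```shell
# Failure detector overrides for conf/cassandra-env.sh:
# raise the expected gossip heartbeat interval to 4000 ms.
JVM_OPTS="$JVM_OPTS -Dcassandra.fd_initial_value_ms=4000"
JVM_OPTS="$JVM_OPTS -Dcassandra.fd_max_interval_ms=4000"
```

The nodes need a restart to pick up new JVM options.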
Best,
Romain

    On Tuesday, February 28, 2017 at 6:25 PM, George Webster 
<webste...@gmail.com> wrote:

 Hey Cassandra Users,
We recently encountered an issue where a keyspace just disappeared. I was 
curious if anyone has had this occur before and can provide some insight. 
We are using Cassandra 3.10, 2 DCs with 3 nodes each. The data files were still 
in the storage folder, but the keyspace was no longer visible inside Cassandra.
I searched the logs for any hints of errors or commands being executed that 
could have caused the loss of a keyspace. Unfortunately, I found nothing. The 
only unusual issue I saw in the logs was a series of read timeouts that 
occurred right around when the keyspace went away. Since then I see numerous 
entries in the debug log like the following:
DEBUG [GossipStage:1] 2017-02-28 18:14:12,580 FailureDetector.java:457 - Ignoring interval time of 2155674599 for /x.x.x..12
DEBUG [GossipStage:1] 2017-02-28 18:14:16,580 FailureDetector.java:457 - Ignoring interval time of 2945213745 for /x.x.x.81
DEBUG [GossipStage:1] 2017-02-28 18:14:19,590 FailureDetector.java:457 - Ignoring interval time of 2006530862 for /x.x.x..69
DEBUG [GossipStage:1] 2017-02-28 18:14:27,434 FailureDetector.java:457 - Ignoring interval time of 3441841231 for /x.x.x.82
DEBUG [GossipStage:1] 2017-02-28 18:14:29,588 FailureDetector.java:457 - Ignoring interval time of 2153964846 for /x.x.x.82
DEBUG [GossipStage:1] 2017-02-28 18:14:33,582 FailureDetector.java:457 - Ignoring interval time of 2588593281 for /x.x.x.82
DEBUG [GossipStage:1] 2017-02-28 18:14:37,588 FailureDetector.java:457 - Ignoring interval time of 2005305693 for /x.x.x.69
DEBUG [GossipStage:1] 2017-02-28 18:14:38,592 FailureDetector.java:457 - Ignoring interval time of 2009244850 for /x.x.x.82
DEBUG [GossipStage:1] 2017-02-28 18:14:43,584 FailureDetector.java:457 - Ignoring interval time of 2149192677 for /x.x.x.69
DEBUG [GossipStage:1] 2017-02-28 18:14:45,605 FailureDetector.java:457 - Ignoring interval time of 2021180918 for /x.x.x.85
DEBUG [GossipStage:1] 2017-02-28 18:14:46,432 FailureDetector.java:457 - Ignoring interval time of 2436026101 for /x.x.x.81
DEBUG [GossipStage:1] 2017-02-28 18:14:46,432 FailureDetector.java:457 - Ignoring interval time of 2436187894 for /x.x.x.82
Around the time the keyspace disappeared we had two concurrent activities:
1) A Spark job (via HDP 2.5.3 on YARN) performing a countByKey. It was using 
the keyspace that disappeared. The operation crashed.
2) We created a new keyspace to test out a schema. The only "fancy" thing in 
that keyspace is a few materialized view tables. Data was being loaded into 
that keyspace during the crash. The load process was extracting information and 
then just writing it to Cassandra. 
Any ideas? Has anyone seen this before?
Thanks,
George
