Re: WriteTimeoutException When only One Node is Down

2017-01-15 Thread Shalom Sagges
Hi Yuji,

Thanks for your reply.
That's what I don't understand. Since the writes are done at LOCAL_QUORUM, even
if a node fails, there should be enough replicas left to satisfy the request,
shouldn't there?
Otherwise, the whole idea behind "no single point of failure" would only be
partially true. Or is there something I'm missing?
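
For reference, here is a minimal sketch of how the client could catch and
retry such a timeout, in the spirit of the error-handling post you linked
(the table is made up, the statement is assumed to be idempotent, and only a
single retry is shown):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.ConsistencyLevel;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.SimpleStatement;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.WriteType;
import com.datastax.driver.core.exceptions.WriteTimeoutException;

public class WriteWithOneRetry {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build()) {
            Session session = cluster.connect("mykeyspace");

            // RF=3 per DC, so LOCAL_QUORUM needs 2 acks from the local DC.
            Statement stmt = new SimpleStatement(
                    "UPDATE some_table SET value = 'x' WHERE id = 1") // hypothetical table
                    .setConsistencyLevel(ConsistencyLevel.LOCAL_QUORUM)
                    .setIdempotent(true);

            try {
                session.execute(stmt);
            } catch (WriteTimeoutException e) {
                // The coordinator sent the write to all replicas but timed out
                // waiting for enough acks. For an idempotent SIMPLE write, a
                // single retry is a reasonable reaction.
                if (e.getWriteType() == WriteType.SIMPLE) {
                    session.execute(stmt);
                } else {
                    throw e;
                }
            }
        }
    }
}

(The same behaviour could also be centralised in a custom RetryPolicy set on
the Cluster builder.)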

Thanks!


Shalom Sagges
DBA



On Fri, Jan 13, 2017 at 4:15 AM, Yuji Ito  wrote:

> Hi Shalom,
>
> I also got a WriteTimeoutException in a destructive test similar to yours.
>
> When did you drop a node?
> A coordinator node sends a write request to all replicas.
> If one of the nodes goes down while the request is being executed, a
> WriteTimeoutException can sometimes happen.
>
> cf. http://www.datastax.com/dev/blog/cassandra-error-handling-done-right
>
> Thanks,
> Yuji
>
>
>
> On Thu, Jan 12, 2017 at 4:26 PM, Shalom Sagges 
> wrote:
>
>> Hi Everyone,
>>
>> I'm using C* v3.0.9 for a cluster of 3 DCs with RF 3 in each DC. All
>> read/write queries are set to consistency LOCAL_QUORUM.
>> The relevant keyspace is built as follows:
>>
>> CREATE KEYSPACE mykeyspace WITH replication = {'class':
>> 'NetworkTopologyStrategy', 'DC1': '3', 'DC2': '3', 'DC3': '3'} AND
>> durable_writes = true;
>>
>> I use the DataStax Java driver 3.0.1.
>>
>>
>> When I performed a resiliency test for the application, each time I
>> dropped one node, the client got the following error:
>>
>>
>> com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency TWO (2 replica were required but only 1 acknowledged the write)
>>   at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:73)
>>   at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:26)
>>   at com.datastax.driver.core.DriverThrowables.propagateCause(DriverThrowables.java:37)
>>   at com.datastax.driver.core.DefaultResultSetFuture.getUninterruptibly(DefaultResultSetFuture.java:245)
>>   at com.datastax.driver.core.AbstractSession.execute(AbstractSession.java:63)
>>   at humanclick.ldap.commImpl.siteData.CassandraSiteDataDaoSpring.updateJprunDomains(CassandraSiteDataDaoSpring.java:121)
>>   at humanclick.ldap.commImpl.siteData.CassandraSiteDataDaoSpring.createOrUpdate(CassandraSiteDataDaoSpring.java:97)
>>   at humanclick.ldapAdapter.dataUpdater.impl.SiteDataToLdapUpdater.update(SiteDataToLdapUpdater.java:280)
>>
>>
>> After a few seconds the error no longer recurs. I have no idea why
>> there's a timeout, since there are additional replicas that can satisfy the
>> consistency level, and I'm even more baffled that the error says "Cassandra
>> timeout during write query at consistency TWO (2 replica were required but
>> only 1 acknowledged the write)" when the queries are issued at LOCAL_QUORUM.
>>
>> Any ideas?  I'm quite at a loss here.
>>
>> Thanks!
>>
>>
>>
>> Shalom Sagges
>> DBA
>>
>>
>>
>>
>
>



Re: Backups eating up disk space

2017-01-15 Thread Chris Mawata
You don't have a viable solution because you are not making a snapshot as a
starting point. After a while you will have a lot of backup data. Using the
backups to get your cluster to a given state will involve copying a very
large amount of backup data, possibly more than the capacity of your cluster,
followed by a tremendous amount of compaction. If your topology changes, life
could really get miserable. I would counsel taking periodic snapshots so that
your possible bad day in the future is less bad.
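
Even a scheduled call to nodetool snapshot would give you that baseline. A
rough sketch of the idea in Java (purely illustrative: the keyspace name and
schedule are placeholders, it assumes nodetool is on the PATH of the Cassandra
host, and in practice cron or a dedicated backup tool is the usual way to do
this):

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PeriodicSnapshot {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Take a tagged snapshot of the keyspace once a day.
        scheduler.scheduleAtFixedRate(() -> {
            String tag = "daily-" + System.currentTimeMillis();
            try {
                new ProcessBuilder("nodetool", "snapshot", "-t", tag, "mykeyspace")
                        .inheritIO()
                        .start()
                        .waitFor();
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, 0, 24, TimeUnit.HOURS);
    }
}

(Old snapshots can be cleaned up with nodetool clearsnapshot once they have
been copied off the node.)
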
On Jan 13, 2017 8:01 AM, "Kunal Gangakhedkar" 
wrote:

> Great, thanks a lot to all for the help :)
>
> I finally took the dive and went with Razi's suggestions.
> In summary, this is what I did:
>
>- turn off incremental backups on each of the nodes in rolling fashion
>- remove the 'backups' directory from each keyspace on each node.
>
> This ended up freeing up almost 350GB on each node - yay :)
>
> Again, thanks a lot for the help, guys.
>
> Kunal
>
> On 12 January 2017 at 21:15, Khaja, Raziuddin (NIH/NLM/NCBI) [C] <
> raziuddin.kh...@nih.gov> wrote:
>
>> snapshots are slightly different than backups.
>>
>>
>>
>> In my explanation of the hardlinks created in the backups folder, notice
>> that compacted sstables never end up in the backups folder.
>>
>>
>>
>> On the other hand, a snapshot is meant to represent the data at a
>> particular moment in time. Thus, the snapshots directory contains hardlinks
>> to all active sstables at the time the snapshot was taken, which would
>> include compacted sstables, as well as any sstables from memtable flushes or
>> streamed from other nodes that exist in both the table directory and the
>> backups directory.
>>
>>
>>
>> So, that would be the difference between snapshots and backups.
>>
>>
>>
>> Best regards,
>>
>> -Razi
>>
>>
>>
>>
>>
>> From: Alain RODRIGUEZ 
>> Reply-To: "user@cassandra.apache.org" 
>> Date: Thursday, January 12, 2017 at 9:16 AM
>>
>> To: "user@cassandra.apache.org" 
>> Subject: Re: Backups eating up disk space
>>
>>
>>
>> My 2 cents,
>>
>>
>>
>> As I mentioned earlier, we're not currently using snapshots - it's only
>> the backups that are bothering me right now.
>>
>>
>>
>> I believe the backups folder is just the new name for what was previously
>> called the snapshots folder. But I could be completely wrong; I haven't
>> played that much with snapshots in the new versions yet.
>>
>>
>>
>> Anyway, some operations in Apache Cassandra can trigger a snapshot:
>>
>>
>>
>> - Repair (when using sequential repairs rather than the parallel option)
>>
>> - Truncating a table (by default)
>>
>> - Dropping a table (by default)
>>
>> - Maybe others I can't think of...?
>>
>>
>>
>> If you want to clean space but still keep a backup you can run:
>>
>>
>>
>> "nodetool clearsnapshots"
>>
>> "nodetool snapshot "
>>
>>
>>
>> This way, and for a while, the data won't take up extra space, as the old
>> files will be cleaned up and the new files will only be hardlinks, as
>> detailed above. Then you might want to work on a proper backup policy, which
>> probably implies getting the data off the production servers (a lot of
>> people use S3 or similar services). Or just do this from time to time,
>> meaning you only keep one backup, and disk space behaviour will be hard to
>> predict.
>>
>>
>>
>> C*heers,
>>
>> ---
>>
>> Alain Rodriguez - @arodream - al...@thelastpickle.com
>>
>> France
>>
>>
>>
>> The Last Pickle - Apache Cassandra Consulting
>>
>> http://www.thelastpickle.com
>>
>>
>>
>> 2017-01-12 6:42 GMT+01:00 Prasenjit Sarkar :
>>
>> Hi Kunal,
>>
>>
>>
>> Razi's post does give a very lucid description of how Cassandra manages
>> the hard links inside the backup directory.
>>
>>
>>
>> Where it needs clarification is the following:
>>
>> --> incremental backups is a system wide setting and so its an all or
>> nothing approach
>>
>>
>>
>> --> As multiple people have stated, incremental backups do not create
>> hard links to compacted sstables; even so, they can bloat the size of your
>> backups.
>>
>>
>>
>> --> Again, as stated, it is general industry practice to place backups
>> in a secondary storage location separate from the main production site. So
>> it is best to move the data to that secondary storage before applying rm on
>> the backups folder.
>>
>>
>>
>> In my experience with production clusters, managing the backups folder
>> across multiple nodes can be painful if the objective is to ever recover
>> the data. With the usual disclaimers, it is better to rely on third-party
>> vendors for this rather than scripts/tablesnap.
>>
>>
>>
>> Regards
>>
>> Prasenjit
>>
>>
>>
>>
>>
>> On Wed, Jan 11, 2017 at 7:49 AM, Khaja, Raziuddin (NIH/NLM/NCBI) [C] <
>> raziuddin.kh...@nih.gov> wrote:
>>
>> Hello Kunal,
>>
>>
>>
>> Caveat: I am not a super-expert on Cassandra, but it helps to explain
>> things to others in order to eventually become an expert, so if my
>> explanation is wrong, I hope others will correct me. :)
>>
>>
>>
>> The active sstables/data files are all the files located in th

Re: Help

2017-01-15 Thread Anshu Vajpayee
Setup is not in the cloud. We have a few nodes in one DC (DC1) and the same
number of nodes in the other DC (DC2). We have a dedicated firewall in front
of the nodes.

Reads and writes happen at LOCAL_QUORUM, so those don't get affected, but
hints accumulate from one DC to the other for replication. Hints are also
timing out sporadically in the logs.

nodetool describecluster didn't show any errors, but in some cases it took
longer to return.
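
For completeness, a similar quick check can be done from the client side with
the DataStax Java driver; a minimal sketch (the contact point is a placeholder,
and this only reflects what the driver sees, not inter-node connectivity):

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Host;
import com.datastax.driver.core.Metadata;

public class ClusterCheck {
    public static void main(String[] args) {
        try (Cluster cluster = Cluster.builder().addContactPoint("10.0.0.1").build()) {
            Metadata metadata = cluster.init().getMetadata();
            // Schema agreement is a rough proxy for "nodes can actually talk",
            // similar in spirit to nodetool describecluster.
            System.out.println("Schema agreement: " + metadata.checkSchemaAgreement());
            for (Host host : metadata.getAllHosts()) {
                System.out.printf("%s dc=%s rack=%s up=%s%n",
                        host.getAddress(), host.getDatacenter(), host.getRack(), host.isUp());
            }
        }
    }
}

(Like gossip, Host.isUp() reflects the driver's view and can lag, so this is
only a sanity check.)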

On Sun, Jan 15, 2017 at 3:01 AM, Aleksandr Ivanov  wrote:

> Could you share a bit about your cluster setup? Do you use cloud for your
> deployment or dedicated firewalls in front of nodes?
>
> Even if gossip shows that everything is up, it doesn't mean that all nodes
> can communicate with each other. I have seen situations where a TCP
> connection was killed by a firewall and Cassandra didn't reconnect
> automatically. This can easily be detected with the nodetool describecluster
> command.
>
> Aleksandr
>
>  shows - all nodes are up.
>>
>> But when we perform writes, the coordinator stores hints. That means the
>> coordinator was not able to deliver the writes to a few nodes after meeting
>> the consistency requirements.
>>
>> The nodes for which writes were failing are in a different DC. Those
>> nodes do not have any load.
>>
>> Gossip shows everything is up. I already set the write timeout to 60
>> seconds, but it didn't help.
>>
>> Has anyone encountered this scenario? On the network side everything is fine.
>>
>> Cassandra version is 2.1.13
>>
>> --
>> Regards,
>> Anshu
>>
>>
>>


-- 
Regards,
Anshu


Re: Help

2017-01-15 Thread Jonathan Haddad
I've heard enough stories of firewall issues that I'm willing to bet it's
the problem, if it's sitting between the nodes.
On Sun, Jan 15, 2017 at 9:32 AM Anshu Vajpayee 
wrote:

> Setup is not in the cloud. We have a few nodes in one DC (DC1) and the same
> number of nodes in the other DC (DC2). We have a dedicated firewall in front
> of the nodes.
>
> Reads and writes happen at LOCAL_QUORUM, so those don't get affected, but
> hints accumulate from one DC to the other for replication. Hints are also
> timing out sporadically in the logs.
>
> nodetool describecluster didn't show any errors, but in some cases it took
> longer to return.
>
> On Sun, Jan 15, 2017 at 3:01 AM, Aleksandr Ivanov 
> wrote:
>
> Could you share a bit about your cluster setup? Do you use cloud for your
> deployment or dedicated firewalls in front of nodes?
>
> Even if gossip shows that everything is up, it doesn't mean that all nodes
> can communicate with each other. I have seen situations where a TCP
> connection was killed by a firewall and Cassandra didn't reconnect
> automatically. This can easily be detected with the nodetool describecluster
> command.
>
> Aleksandr
>
>  shows - all nodes are up.
>
> But when we perform writes, the coordinator stores hints. That means the
> coordinator was not able to deliver the writes to a few nodes after meeting
> the consistency requirements.
>
> The nodes for which writes were failing are in a different DC. Those nodes
> do not have any load.
>
> Gossip shows everything is up. I already set the write timeout to 60
> seconds, but it didn't help.
>
> Has anyone encountered this scenario? On the network side everything is fine.
>
> Cassandra version is 2.1.13
>
> --
> Regards,
> Anshu
>
>
>
>
>
> --
> Regards,
> Anshu
>
>
>


Re: RemoveNode CPU Spike Question

2017-01-15 Thread Shalom Sagges
Hi Anubhav,

This happened to us as well, on all nodes in the DC. We found that after
performing removenode, all the other nodes suddenly started doing a lot of
compactions, which increased CPU.
To mitigate that, we used nodetool disableautocompaction before removing
the node. Then, after removal, we slowly enabled autocompaction (a few
minutes between each enable) on the nodes one by one.
This helped with the CPU increase you've mentioned.
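
If it helps, the staggered re-enable is easy to script. A rough sketch (host
names are placeholders, and it assumes nodetool can reach each node's JMX port
from wherever this runs):

import java.util.Arrays;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class StaggeredEnableAutocompaction {
    public static void main(String[] args) throws Exception {
        // Placeholder host names; one entry per node in the DC.
        List<String> nodes = Arrays.asList("node1", "node2", "node3");
        for (String node : nodes) {
            new ProcessBuilder("nodetool", "-h", node, "enableautocompaction")
                    .inheritIO()
                    .start()
                    .waitFor();
            // Give each node a few minutes to work through its compaction
            // backlog before enabling the next one.
            TimeUnit.MINUTES.sleep(5);
        }
    }
}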



Shalom Sagges
DBA



On Tue, Jan 10, 2017 at 8:03 PM, Anubhav Kale 
wrote:

> Well, looking through logs I confirmed that my understanding below is
> correct, but it would be good to hear from the experts for sure 😊
>
>
>
> From: Anubhav Kale [mailto:anubhav.k...@microsoft.com]
> Sent: Tuesday, January 10, 2017 9:58 AM
> To: user@cassandra.apache.org
> Cc: Sean Usher 
> Subject: RemoveNode CPU Spike Question
>
>
>
> Hello,
>
>
>
> Recently, I started noticing an interesting pattern. When I execute
> “removenode”, a subset of the nodes that now own the tokens show a CPU spike /
> disk activity, and sometimes the SSTable count on those nodes shoots up.
>
>
>
> After looking through the code, it appears to me that the function below
> forces data to be streamed from some of the new nodes to the node from which
> “removenode” was kicked off. Is my understanding correct?
>
>
>
> https://github.com/apache/cassandra/blob/d384e781d6f7c028dbe88cfe9dd3e966e72cd046/src/java/org/apache/cassandra/service/StorageService.java#L2548
>
>
>
> Our nodes don’t run very hot, but it appears this streaming causes them to
> have issues. Have other people seen this?
>
>
>
> Thanks !
>
