Everyone, thank you for the responses.

Jon, to answer your question: we’re using General Purpose SSD volumes with 
IOPS of 1500/3000, so by your definition I guess we’re using the awful ones, 
since they aren’t Provisioned IOPS. We’re also trying G1 garbage collection.

I also just looked at our application setting overrides, and it appears we are 
using CL=ONE with RF=2 in both DCs. We’ve also disabled durable writes, 
as shown in the keyspace creation statement below:


    CREATE KEYSPACE reporting WITH replication = {'class': 
    'NetworkTopologyStrategy', 'us-east_dc1': '2', 'us-east_dc2': '2'} 
    AND durable_writes = false;

The main table we’re interacting with has the following compaction settings 
(these are Akka Persistence journal tables):

compaction = {'bucket_high': '1.5', 'bucket_low': '0.5', 'class': 
'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'enabled': 
'true', 'max_threshold': '32', 'min_sstable_size': '50', 'min_threshold': '4', 
'tombstone_compaction_interval': '86400', 'tombstone_threshold': '0.2', 
'unchecked_tombstone_compaction': 'false'}

We’re also planning to set a TTL of about 3 hours on the table. We only use 
these tables for business continuity, so we don’t need the data to persist 
for long periods.
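
For illustration, one way to apply a table-wide TTL is the `default_time_to_live` table option, which takes a value in seconds. The sketch below only computes that value and builds the CQL statement as a string. The table name `reporting.messages` is a placeholder, not the actual journal table name, and the statement is not executed against any cluster.

```python
from datetime import timedelta

# TTL from the message: about 3 hours, expressed in seconds for CQL.
ttl_seconds = int(timedelta(hours=3).total_seconds())

# Hypothetical table name; substitute the real Akka persistence journal table.
table = "reporting.messages"

# default_time_to_live applies to data written after the change;
# rows already on disk keep whatever TTL they were written with.
statement = f"ALTER TABLE {table} WITH default_time_to_live = {ttl_seconds};"
print(statement)
```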

RICHARD NEY
TECHNICAL DIRECTOR, RESEARCH & DEVELOPMENT
+1 (978) 848.6640 WORK
+1 (916) 846.2353 MOBILE
UNITED STATES
richard....@aspect.com
http://www.aspect.com/


From: Jonathan Haddad <j...@jonhaddad.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Monday, December 26, 2016 at 2:02 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: Has anyone deployed a production cluster with less than 6 nodes 
per DC?

There's nothing wrong with running a 3 node DC.  A million writes an hour is 
averaging less than 300 writes a second, which is pretty trivial.
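
The arithmetic behind that figure can be checked with a short sketch. It assumes the million writes are spread evenly over the hour, which real traffic may not be.

```python
# Average write rate implied by one million writes per hour.
writes_per_hour = 1_000_000
seconds_per_hour = 60 * 60

# Assumes an even spread over the hour; bursts would push the peak higher.
writes_per_second = writes_per_hour / seconds_per_hour
print(f"{writes_per_second:.1f} writes/second on average")
```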

Are you running provisioned SSD EBS volumes or the traditional, awful ones?

RF=2 with Quorum is kind of pointless, that's the same as CL=ALL.  Not 
recommended.  I don't know why your timeouts are happening, but when they do, 
RF=2 w/ QUORUM is going to make the problem worse.  Either use RF=3 or use 
CL=ONE.
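
To make the RF=2 point concrete: Cassandra computes a quorum as floor(RF / 2) + 1. The sketch below assumes the quorum is evaluated against the replicas of a single DC (as with LOCAL_QUORUM); the function names are made up for illustration.

```python
def quorum(replication_factor: int) -> int:
    """Replicas that must respond for a quorum: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def tolerated_failures(replication_factor: int) -> int:
    """Replicas that can be down while a quorum is still reachable."""
    return replication_factor - quorum(replication_factor)


for rf in (2, 3):
    print(f"RF={rf}: quorum={quorum(rf)}, "
          f"tolerated replica failures={tolerated_failures(rf)}")
```

With RF=2 the quorum is both replicas, so a single slow or unavailable replica fails the request, just as with CL=ALL. With RF=3 the quorum is still two replicas, which leaves room for one to be down.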

Your management is correct here.  Throwing more hardware at this problem is the 
wrong solution given that your current hardware should be able to handle over 
100x what it's doing right now.

Jon
