Re: Cassandra as storage for cache data

Jeremy Hanna Tue, 25 Jun 2013 06:06:27 -0700

If you have rapidly expiring data, then tombstones are probably filling your 
disk and your heap (depending on how you order the data on disk).  To check 
whether your queries are affected by tombstones, you might try using the query 
tracing that's built into 1.2.
See:
http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets
  -- has an example of tracing where you can see tombstones affecting the query
http://www.datastax.com/dev/blog/tracing-in-cassandra-1-2

You'll want to consider reducing the gc_grace period from the default of 10 
days for those column families, with an understanding of why gc_grace exists 
in the first place; see http://wiki.apache.org/cassandra/DistributedDeletes .  
Even after the gc_grace period has passed, the tombstones will stay around 
until they are compacted away.  So there are currently two options for 
compacting them away more quickly:
1) Use leveled compaction.  See:
http://www.datastax.com/dev/blog/when-to-use-leveled-compaction
Leveled compaction only requires 10% headroom of free disk space, as opposed 
to 50% for size-tiered compaction.
2) If 1 doesn't work, and you're still seeing performance degrade and the 
tombstones aren't getting cleared out fast enough, you might consider staying 
with size-tiered compaction but running regular major compactions to get rid 
of expired data.
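
To make the gc_grace point concrete with the 20-60 minute TTLs from the 
original question, here is a rough Python sketch of when a TTL'd cache item 
first becomes eligible to be purged.  It assumes a simplified model: an 
expired column behaves like a tombstone and may only be dropped by a 
compaction once gc_grace has also elapsed.  The helper earliest_purge_time is 
hypothetical, not part of any Cassandra API; 864000 seconds is the 10-day 
default mentioned above.

```python
from datetime import datetime, timedelta

# Default gc_grace in Cassandra: 864000 seconds = 10 days.
DEFAULT_GC_GRACE = timedelta(seconds=864000)


def earliest_purge_time(write_time, ttl, gc_grace=DEFAULT_GC_GRACE):
    """Earliest moment a compaction may drop a TTL'd column entirely.

    Simplified model (an assumption, not Cassandra source code): the column
    expires at write_time + ttl, is then treated like a tombstone, and only
    becomes purgeable after gc_grace has also passed.  It is physically
    removed only when a later compaction actually processes it.
    """
    return write_time + ttl + gc_grace


if __name__ == "__main__":
    written = datetime(2013, 6, 25, 13, 0)
    ttl = timedelta(minutes=60)  # upper end of the 20-60 minute TTLs
    # With the default gc_grace, a one-hour item lingers for over ten days.
    print(earliest_purge_time(written, ttl))  # 2013-07-05 14:00:00
    # With a hypothetical gc_grace of one hour instead:
    print(earliest_purge_time(written, ttl, timedelta(hours=1)))  # 2013-06-25 15:00:00
```

In this model, data that is live for at most an hour occupies disk (and gets 
rewritten by compactions) for more than ten days with the default setting, 
which is why lowering gc_grace matters so much for cache-like data.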

Keep in mind, though, that if you use a gc_grace of 0 and do any kind of 
manual deletes outside of TTLs, you probably want to do the deletes at 
ConsistencyLevel.ALL.  Otherwise, if a node goes down and then comes back up, 
there's a chance that deleted data may be resurrected.  That only applies to 
non-TTL data that you delete manually.  See the explanation of distributed 
deletes for more information.
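
The resurrection risk can be illustrated with a deliberately simplified Python 
model of two replicas (replication factor 2, as in the original question).  
Everything here is hypothetical: Replica, delete, read and run are 
illustrative names, hinted handoff is ignored, and read repair is reduced to 
"newest timestamp wins".  It sketches the reasoning, not how Cassandra is 
implemented.

```python
# Toy model only: NOT Cassandra's implementation. Hinted handoff is ignored
# and read repair is simplified to "newest timestamp wins".

TOMBSTONE = object()  # marker stored in place of a deleted value


class UnavailableError(Exception):
    """Raised when a write demands every replica but some are down."""


class Replica:
    def __init__(self):
        self.data = {}  # key -> (timestamp, value or TOMBSTONE)
        self.up = True

    def write(self, key, ts, value):
        current = self.data.get(key)
        if current is None or ts > current[0]:
            self.data[key] = (ts, value)

    def compact_gc_grace_zero(self):
        # With gc_grace = 0, every tombstone is purgeable immediately.
        self.data = {k: v for k, v in self.data.items() if v[1] is not TOMBSTONE}


def delete(replicas, key, ts, require_all):
    live = [r for r in replicas if r.up]
    if require_all and len(live) < len(replicas):
        # Like ConsistencyLevel.ALL: refuse rather than delete partially.
        raise UnavailableError("not all replicas are up")
    for r in live:
        r.write(key, ts, TOMBSTONE)


def read(replicas, key):
    live = [r for r in replicas if r.up]
    versions = [r.data[key] for r in live if key in r.data]
    if not versions:
        return None
    newest = max(versions, key=lambda v: v[0])
    for r in live:  # simplified read repair
        r.write(key, *newest)
    return None if newest[1] is TOMBSTONE else newest[1]


def run(require_all):
    a, b = Replica(), Replica()
    replicas = [a, b]
    for r in replicas:
        r.write("k", 1, "cached value")
    b.up = False  # one replica goes down
    try:
        delete(replicas, "k", 2, require_all)
    except UnavailableError:
        b.up = True  # wait for the node to return, then retry the delete
        delete(replicas, "k", 2, require_all)
    for r in replicas:
        if r.up:
            r.compact_gc_grace_zero()  # tombstones dropped right away
    b.up = True  # the node comes back with its old copy
    return read(replicas, "k")


if __name__ == "__main__":
    print(run(require_all=False))  # cached value (resurrected)
    print(run(require_all=True))   # None (stays deleted)
```

With a partial delete, the surviving replica's purged tombstone can no longer 
shadow the stale copy, so read repair spreads the old value back.  Requiring 
every replica to accept the delete avoids that window, at the cost of the 
delete failing while any replica is down.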

On 25 Jun 2013, at 13:31, Dmitry Olshansky <dmitry.olshan...@gridnine.com> 
wrote:

> Hello,
> 
> we are using Cassandra as data storage for our caching system. Our 
> application generates about 20 put and get requests per second. The average 
> size of one cache item is about 500 KB.
> 
> Cache items are placed into one column family with a TTL of 20-60 minutes. 
> Keys and values are bytes (not utf8 strings). The compaction strategy is 
> SizeTieredCompactionStrategy.
> 
> We set up a Cassandra 1.2.6 cluster of 4 nodes. The replication factor is 2. 
> Each node has 10 GB of RAM and enough space on HDD.
> 
> Now, when we put this cluster under load, it quickly fills with our runtime 
> data (about 5 GB on every node) and we start observing performance 
> degradation, with frequent timeouts on the client side.
> 
> We see that compaction starts very frequently on each node and takes 
> several minutes to complete. It seems that each node is usually busy with 
> compaction.
> 
> Here are the questions:
> 
> What is the recommended configuration for our use case?
> 
> Does it make sense to somehow tell Cassandra to keep all data in memory 
> (memtables) to avoid flushing it to disk (sstables), thus decreasing the 
> number of compactions? How can we achieve this behavior?
> 
> Cassandra is started with the default shell script, which produces the 
> following command line:
> 
> jsvc.exec -user cassandra -home /usr/lib/jvm/java-6-openjdk-amd64/jre/bin/../ 
> -pidfile /var/run/cassandra.pid -errfile &1 -outfile 
> /var/log/cassandra/output.log -cp <CLASSPATH_SKIPPED> 
> -Dlog4j.configuration=log4j-server.properties 
> -Dlog4j.defaultInitOverride=true 
> -XX:HeapDumpPath=/var/lib/cassandra/java_1371805844.hprof 
> -XX:ErrorFile=/var/lib/cassandra/hs_err_1371805844.log -ea 
> -javaagent:/usr/share/cassandra/lib/jamm-0.2.5.jar -XX:+UseThreadPriorities 
> -XX:ThreadPriorityPolicy=42 -Xms2500M -Xmx2500M -Xmn400M 
> -XX:+HeapDumpOnOutOfMemoryError -Xss180k -XX:+UseParNewGC 
> -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:SurvivorRatio=8 
> -XX:MaxTenuringThreshold=1 -XX:CMSInitiatingOccupancyFraction=75 
> -XX:+UseCMSInitiatingOccupancyOnly -XX:+UseTLAB 
> -Djava.net.preferIPv4Stack=true -Dcom.sun.management.jmxremote.port=7199 
> -Dcom.sun.management.jmxremote.ssl=false 
> -Dcom.sun.management.jmxremote.authenticate=false 
> org.apache.cassandra.service.CassandraDaemon
> 
> -- 
> Best regards,
> Dmitry Olshansky
>
