>     * Why does nodetool repair increase the data size that much? It's not likely 
> that much data needs to be repaired. Will that happen for all subsequent 
> repairs?
Repair only detects differences at the granularity of entire rows. If you have 
very wide rows then small differences within a row can result in a large amount 
of streaming. Streaming creates new SSTables on the receiving side, which then 
need to be compacted, so repair often results in compaction doing its thing for a while. 
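
If you want to confirm that is what is happening, a rough way is to watch the 
streaming and the compaction backlog while the repair runs. Something like the 
following (the host is just a placeholder for your own node address):

    # watch data being streamed in by the repair session
    nodetool -h <node_address> netstats

    # watch the pending compactions created by the streamed SSTables
    nodetool -h <node_address> compactionstats

The pending count should climb while streams land and then drain as LCS works 
through the new SSTables.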

>     * How can we make LCS run faster? After almost a day, the LCS tasks only 
> dropped by 1000. I am afraid it will never catch up. We set

This is going to be tricky to diagnose, sorry for asking silly questions...

Do you have wide rows? Are you seeing logging about "Compacting wide rows"? 
Are you seeing GC activity logged, or CPU steal on a VM? 
Have you tried disabling multithreaded_compaction? 
Are you using key caches? Have you tried disabling compaction_preheat_key_cache? 
Can you enable DEBUG level logging and make the logs available? (A rough sketch 
of how to check these is below.)
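
For reference, something along these lines is what I mean. Paths assume a 
default 1.1 package layout, so adjust them to your install:

    # check the compaction related settings currently in cassandra.yaml
    grep -E 'multithreaded_compaction|compaction_preheat_key_cache|compaction_throughput_mb_per_sec' \
        /etc/cassandra/conf/cassandra.yaml

    # look for wide row compaction messages and GC pressure in the system log
    grep -i 'compacting' /var/log/cassandra/system.log | tail -20
    grep 'GCInspector' /var/log/cassandra/system.log | tail -20

    # turn on DEBUG logging (1.1 uses log4j); log4j should pick up the change,
    # otherwise restart the node
    sed -i 's/^log4j.rootLogger=INFO/log4j.rootLogger=DEBUG/' \
        /etc/cassandra/conf/log4j-server.properties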

Cheers

-----------------
Aaron Morton
Freelance Cassandra Developer
New Zealand

@aaronmorton
http://www.thelastpickle.com

On 29/01/2013, at 8:59 AM, Derek Williams <de...@fyrie.net> wrote:

> I could be wrong about this, but when repair is run, it isn't just individual 
> values that are streamed between nodes, it's entire SSTables. This causes a lot 
> of data that was already correct on the node to be written again as duplicates, 
> which then need to be compacted away.
> 
> As for speeding it up, no idea.
> 
> 
> On Mon, Jan 28, 2013 at 12:16 PM, Wei Zhu <wz1...@yahoo.com> wrote:
> Any thoughts?
> 
> Thanks.
> -Wei
> 
> ----- Original Message -----
> From: "Wei Zhu" <wz1...@yahoo.com>
> To: user@cassandra.apache.org
> Sent: Friday, January 25, 2013 10:09:37 PM
> Subject: Re: Cassandra pending compaction tasks keeps increasing
> 
> 
> To recap the problem,
> 1.1.6 on SSD, 5 nodes, RF = 3, one CF only.
> After the data load, initially all 5 nodes had very even data sizes (135G each). 
> I ran nodetool repair -pr on node 1, which has replicas on node 2 and node 3 
> since we set RF = 3.
> It appears that a huge amount of data got transferred. Node 1 has 220G, and nodes 
> 2 and 3 have around 170G. Pending LCS tasks on node 1 are at 15K, and nodes 2 and 
> 3 have around 7K each.
> Questions:
> 
>     * Why does nodetool repair increase the data size that much? It's not likely 
> that much data needs to be repaired. Will that happen for all subsequent 
> repairs?
>     * How can we make LCS run faster? After almost a day, the LCS tasks only 
> dropped by 1000. I am afraid it will never catch up. We set
> 
> 
>         * compaction_throughput_mb_per_sec = 500
>         * multithreaded_compaction: true
> 
> 
> Both disk and CPU util are less than 10%. I understand LCS is single 
> threaded; is there any chance to speed it up?
> 
> 
>     * We use the default SSTable size of 5M. Will increasing the SSTable size 
> help? What will happen if I change the setting after the data is loaded?
> 
> Any suggestion is very much appreciated.
> 
> -Wei
> 
> ----- Original Message -----
> 
> From: "Wei Zhu" <wz1...@yahoo.com>
> To: user@cassandra.apache.org
> Sent: Thursday, January 24, 2013 11:46:04 PM
> Subject: Re: Cassandra pending compaction tasks keeps increasing
> 
> I believe I am running into this one:
> 
> https://issues.apache.org/jira/browse/CASSANDRA-4765
> 
> By the way, I am using 1.1.6 (I thought I was using 1.1.7) and this one is 
> fixed in 1.1.7.
> 
> ----- Original Message -----
> 
> From: "Wei Zhu" <wz1...@yahoo.com>
> To: user@cassandra.apache.org
> Sent: Thursday, January 24, 2013 11:18:59 PM
> Subject: Re: Cassandra pending compaction tasks keeps increasing
> 
> Thanks Derek,
> in the cassandra-env.sh, it says
> 
> # reduce the per-thread stack size to minimize the impact of Thrift
> # thread-per-client. (Best practice is for client connections to
> # be pooled anyway.) Only do so on Linux where it is known to be
> # supported.
> # u34 and greater need 180k
> JVM_OPTS="$JVM_OPTS -Xss180k"
> 
> What value should I use? Java defaults at 400K? Maybe try that first.
> 
> Thanks.
> -Wei
> 
> ----- Original Message -----
> From: "Derek Williams" <de...@fyrie.net>
> To: user@cassandra.apache.org, "Wei Zhu" <wz1...@yahoo.com>
> Sent: Thursday, January 24, 2013 11:06:00 PM
> Subject: Re: Cassandra pending compaction tasks keeps increasing
> 
> 
> Increasing the stack size in cassandra-env.sh should help you get past the 
> stack overflow. Doesn't help with your original problem though.
> 
> 
> 
> On Fri, Jan 25, 2013 at 12:00 AM, Wei Zhu < wz1...@yahoo.com > wrote:
> 
> 
> Well, even after a restart, it throws the same exception. I am basically 
> stuck. Any suggestion to clear the pending compaction tasks? Below is the end 
> of the stack trace:
> 
> at com.google.common.collect.Sets$1.iterator(Sets.java:578)
> at com.google.common.collect.Sets$1.iterator(Sets.java:578)
> at com.google.common.collect.Sets$1.iterator(Sets.java:578)
> at com.google.common.collect.Sets$1.iterator(Sets.java:578)
> at com.google.common.collect.Sets$3.iterator(Sets.java:667)
> at com.google.common.collect.Sets$3.size(Sets.java:670)
> at com.google.common.collect.Iterables.size(Iterables.java:80)
> at org.apache.cassandra.db.DataTracker.buildIntervalTree(DataTracker.java:557)
> at 
> org.apache.cassandra.db.compaction.CompactionController.<init>(CompactionController.java:69)
> at 
> org.apache.cassandra.db.compaction.CompactionTask.execute(CompactionTask.java:105)
> at 
> org.apache.cassandra.db.compaction.LeveledCompactionTask.execute(LeveledCompactionTask.java:50)
> at 
> org.apache.cassandra.db.compaction.CompactionManager$1.runMayThrow(CompactionManager.java:154)
> at org.apache.cassandra.utils.WrappedRunnable.run(WrappedRunnable.java:30)
> at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
> at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
> at java.util.concurrent.FutureTask.run(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
> at java.lang.Thread.run(Unknown Source)
> 
> Any suggestion is very much appreciated
> 
> -Wei
> 
> 
> 
> ----- Original Message -----
> From: "Wei Zhu" < wz1...@yahoo.com >
> To: user@cassandra.apache.org
> Sent: Thursday, January 24, 2013 10:55:07 PM
> Subject: Re: Cassandra pending compaction tasks keeps increasing
> 
> Do you mean 90% of the reads should come from 1 SSTable?
> 
> By the way, after I finished migrating the data, I ran nodetool repair -pr on 
> one of the nodes. Before nodetool repair, all the nodes had the same disk 
> space usage. After I ran the nodetool repair, the disk space for that node 
> jumped from 135G to 220G, and there are more than 15000 pending compaction 
> tasks. After a while, Cassandra started to throw the exception below and 
> stopped compacting. I had to restart the node. By the way, we are using 
> 1.1.7. Something doesn't seem right.
> 
> 
> INFO [CompactionExecutor:108804] 2013-01-24 22:23:10,427 CompactionTask.java 
> (line 109) Compacting 
> [SSTableReader(path='/ssd/cassandra/data/zoosk/friends/zoosk-friends-hf-753782-Data.db')]
> INFO [CompactionExecutor:108804] 2013-01-24 22:23:11,610 CompactionTask.java 
> (line 221) Compacted to 
> [/ssd/cassandra/data/zoosk/friends/zoosk-friends-hf-754996-Data.db,]. 
> 5,259,403 to 5,259,403 (~100% of original) bytes for 1,983 keys at 
> 4.268730MB/s. Time: 1,175ms.
> INFO [CompactionExecutor:108805] 2013-01-24 22:23:11,617 CompactionTask.java 
> (line 109) Compacting 
> [SSTableReader(path='/ssd/cassandra/data/zoosk/friends/zoosk-friends-hf-754880-Data.db')]
> INFO [CompactionExecutor:108805] 2013-01-24 22:23:12,828 CompactionTask.java 
> (line 221) Compacted to 
> [/ssd/cassandra/data/zoosk/friends/zoosk-friends-hf-754997-Data.db,]. 
> 5,272,746 to 5,272,746 (~100% of original) bytes for 1,941 keys at 
> 4.152339MB/s. Time: 1,211ms.
> ERROR [CompactionExecutor:108806] 2013-01-24 22:23:13,048 
> AbstractCassandraDaemon.java (line 135) Exception in thread 
> Thread[CompactionExecutor:108806,1,main]
> java.lang.StackOverflowError
> at java.util.AbstractList$Itr.hasNext(Unknown Source)
> at com.google.common.collect.Iterators$5.hasNext(Iterators.java:517)
> at com.google.common.collect.Iterators$3.hasNext(Iterators.java:114)
> at com.google.common.collect.Iterators$5.hasNext(Iterators.java:517)
> at com.google.common.collect.Iterators$3.hasNext(Iterators.java:114)
> at com.google.common.collect.Iterators$5.hasNext(Iterators.java:517)
> at com.google.common.collect.Iterators$3.hasNext(Iterators.java:114)
> at com.google.common.collect.Iterators$5.hasNext(Iterators.java:517)
> at com.google.common.collect.Iterators$3.hasNext(Iterators.java:114)
> 
> 
> ----- Original Message -----
> From: "aaron morton" < aa...@thelastpickle.com >
> To: user@cassandra.apache.org
> Sent: Wednesday, January 23, 2013 2:40:45 PM
> Subject: Re: Cassandra pending compaction tasks keeps increasing
> 
> The histogram does not look right to me, too many SSTables for an LCS CF.
> 
> 
> It's a symptom, not a cause. If LCS catches up though, it should look more 
> like the distribution in the linked article.
> 
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> 
> On 23/01/2013, at 10:57 AM, Jim Cistaro < jcist...@netflix.com > wrote:
> 
> What version are you using? Are you seeing any compaction related assertions 
> in the logs?
> 
> 
> Might be https://issues.apache.org/jira/browse/CASSANDRA-4411
> 
> 
> We experienced this problem of the count only decreasing to a certain number 
> and then stopping. If you are idle, it should go to 0. I have not seen it 
> overestimate for zero, only for non-zero amounts.
> 
> 
> As for timeouts etc, you will need to look at things like nodetool tpstats to 
> see if you have pending transactions queueing up.
> 
> 
> Jc
> 
> 
> From: Wei Zhu < wz1...@yahoo.com >
> Reply-To: " user@cassandra.apache.org " < user@cassandra.apache.org >, Wei 
> Zhu < wz1...@yahoo.com >
> Date: Tuesday, January 22, 2013 12:56 PM
> To: " user@cassandra.apache.org " < user@cassandra.apache.org >
> Subject: Re: Cassandra pending compaction tasks keeps increasing
> 
> Thanks Aaron and Jim for your reply. The data import is done. We have about 
> 135G on each node and about 28K SSTables. For normal operation, we only 
> have about 90 writes per second, but when I ran nodetool compactionstats, it 
> remains at 9 and hardly changes. I guess it's just an estimated number.
> 
> 
> When I ran histogram,
> 
> 
> 
> Offset  SSTables  Write Latency  Read Latency  Row Size  Column Count
>      1      2644              0             0         0      18660057
>      2      8204              0             0         0       9824270
>      3     11198              0             0         0       6968475
>      4      4269              6             0         0       5510745
>      5       517             29             0         0       4595205
> 
> You can see about half of the reads touch 3 SSTables. The majority of read 
> latencies are under 5ms; only a dozen are over 10ms. We haven't fully turned 
> on reads yet, only 60 reads per second. We saw about 20 read timeouts during 
> the past 12 hours. Not a single warning in the Cassandra log.
> 
> 
> Is it normal for Cassandra to time out some requests? We set the rpc timeout 
> to 1s; shouldn't that be enough to avoid timing any of them out?
> 
> 
> Thanks.
> -Wei
> 
> From: aaron morton < aa...@thelastpickle.com >
> To: user@cassandra.apache.org
> Sent: Monday, January 21, 2013 12:21 AM
> Subject: Re: Cassandra pending compaction tasks keeps increasing
> 
> 
> 
> The main guarantee LCS gives you is that most reads will only touch 1 SSTable: 
> http://www.datastax.com/dev/blog/when-to-use-leveled-compaction
> 
> 
> If compaction is falling behind this may not hold.
> 
> 
> nodetool cfhistograms tells you how many SSTables were read from for reads. 
> It's a recent histogram that resets each time you read from it.
> 
> 
> Also, parallel levelled compaction in 1.2 
> http://www.datastax.com/dev/blog/performance-improvements-in-cassandra-1-2
> 
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Cassandra Developer
> New Zealand
> 
> 
> @aaronmorton
> http://www.thelastpickle.com
> 
> 
> On 20/01/2013, at 7:49 AM, Jim Cistaro < jcist...@netflix.com > wrote:
> 
> 1) In addition to iostat, dstat is a good tool to see what kind of disk 
> throughput you are getting. That would be one thing to monitor.
> 2) For LCS, we also see pending compactions skyrocket. During load, LCS will 
> create a lot of small sstables which will queue up for compaction.
> 3) For us the biggest concern is not how high the pending count gets, but how 
> often it gets back down near zero. If your load is something you can do in 
> segments or pause, then you can see how fast the cluster recovers on the 
> compactions.
> 4) One thing which we tune per cluster is the size of the files. Increasing 
> this from 5MB can sometimes improve things. But I forget if we have ever 
> changed this after starting data load.
> 
> 
> Is your cluster receiving read traffic during this data migration? If so, I 
> would say that read latency is your best measure. If the high number of 
> SSTables waiting to compact is not hurting your reads, then you are probably 
> ok. Since you are on SSD, there is a good chance the compactions are not 
> hurting you. As for compaction throughput, we set ours high for SSD. You 
> usually won't use it all because the compactions are usually single threaded. 
> Dstat will help you measure this.
> 
> 
> I hope this helps,
> jc
> 
> 
> From: Wei Zhu < wz1...@yahoo.com >
> Reply-To: " user@cassandra.apache.org " < user@cassandra.apache.org >, Wei 
> Zhu < wz1...@yahoo.com >
> Date: Friday, January 18, 2013 12:10 PM
> To: Cassandr usergroup < user@cassandra.apache.org >
> Subject: Cassandra pending compaction tasks keeps increasing
> 
> Hi,
> When I run nodetool compactionstats
> 
> 
> I see the number of pending tasks keeps going up steadily.
> 
> 
> I tried to increase the compaction throughput by using
> 
> 
> nodetool setcompactionthroughput
> 
> 
> I even tried the extreme to set it to 0 to disable the throttling.
> 
> 
> I checked iostat and we have SSDs for data; the disk util is less than 5%, 
> which means it's not I/O bound. CPU is also less than 10%.
> 
> 
> We are using leveled compaction and are in the process of migrating data. We 
> have 4500 writes per second and very few reads. We have about 70G of data now 
> and will grow to 150G when the migration finishes. We only have one CF, and 
> right now the number of SSTables is around 15000; write latency is still under 
> 0.1ms.
> 
> 
> Is there anything to be concerned about? Or anything I can do to reduce the 
> number of pending compactions?
> 
> 
> Thanks.
> -Wei
> 
> --
> 
> Derek Williams
> 
> -- 
> Derek Williams
