Are you able to put together a test case, maybe using the stress testing tool, that models your data layout?
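Something along these lines might model it, e.g. only 3 distinct indexed values spread across a few million rows. This is an untested sketch based on the 1.x tools/stress options, so please check the flag names against stress --help for your build:

  # Insert several million rows whose indexed column cycles through only
  # 3 distinct values (-C 3), creating a KEYS index on it (-x KEYS).
  tools/stress/bin/stress -d 127.0.0.1 -o INSERT -n 3000000 -C 3 -x KEYS

  # Then force a major compaction so the few very wide index rows are
  # rewritten, as in your log. Keyspace1 is the stress tool's default keyspace.
  bin/nodetool -h 127.0.0.1 compact Keyspace1

With only 3 distinct values the index column family should end up with one very wide row per value, which looks like the shape CASSANDRA-3592 describes.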
If so, can you add it to https://issues.apache.org/jira/browse/CASSANDRA-3592?

Thanks

-----------------
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 16/07/2012, at 8:17 PM, 黄荣桢 wrote:

> Hello,
>
> I find that compacting my secondary index takes a long time and occupies a
> lot of CPU:
>
> INFO [CompactionExecutor:8] 2012-07-16 12:03:16,408 CompactionTask.java
> (line 213) Compacted to [XXX]. 71,018,346 to 9,020 (~0% of original) bytes
> for 3 keys at 0.000022MB/s. Time: 397,602ms.
>
> The stack of the overloaded thread is:
>
> "CompactionReducer:5" - Thread t@1073
>   java.lang.Thread.State: RUNNABLE
>     at java.util.AbstractList$Itr.remove(AbstractList.java:360)
>     at org.apache.cassandra.db.ColumnFamilyStore.removeDeletedStandard(ColumnFamilyStore.java:851)
>     at org.apache.cassandra.db.ColumnFamilyStore.removeDeletedColumnsOnly(ColumnFamilyStore.java:835)
>     at org.apache.cassandra.db.ColumnFamilyStore.removeDeleted(ColumnFamilyStore.java:826)
>     at org.apache.cassandra.db.compaction.PrecompactedRow.removeDeletedAndOldShards(PrecompactedRow.java:77)
>     at org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer$MergeTask.call(ParallelCompactionIterable.java:224)
>     at org.apache.cassandra.db.compaction.ParallelCompactionIterable$Reducer$MergeTask.call(ParallelCompactionIterable.java:198)
>     at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:138)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
>     at java.lang.Thread.run(Thread.java:662)
>
>   Locked ownable synchronizers:
>     - locked <4be5863d> (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
>
> I suspect the problem is due to the huge number of columns in my index.
> The indexed column has only 3 distinct values, and one of those values
> matches several million records, so the index rows hold several million
> columns each. Compacting these columns takes a long time.
>
> I found a similar issue in JIRA:
> https://issues.apache.org/jira/browse/CASSANDRA-3592
>
> Is there any way to work around this issue? Is there any way to make
> compaction of this index more efficient?