Hi Chris, cfstats shows large partitions.
Percentile SSTables Write Latency Read Latency Partition Size Cell Count (micros) (micros) (bytes) 50% 5.00 60.00 770.00 924 17 75% 7.00 86.00 1109.00 1109 17 95% 14.00 103.00 4768.00 2759 17 98% 17.00 124.00 6866.00 5722 17 99% 17.00 149.00 20501.00 8239 17 Min 0.00 6.00 25.00 21 0 Max 20.00 24601.00 5839588.00 30130992 126934 We use primary key in most our query, however I still don’t figure out the query patter hanging. “Images" is a list of url in our tables. It seems to me that the cell has huge image urls, but I can’t find the key of that row. Below is a piece of logs, which shows it repeatedly collected images with the same timestamp and increasing uuid in several seconds… Many Thanks, Xihui TRACE [SharedPool-Worker-1] 2015-12-05 03:03:51,574 SliceQueryFilter.java:269 - collecting 0 of 2147483647: :false:0@1449255672527000!86400000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:51,575 SliceQueryFilter.java:269 - collecting 1 of 2147483647: author:false:31@1448270734894000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:51,576 SliceQueryFilter.java:269 - collecting 1 of 2147483647: content:true:4@1448270734894000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:51,576 SliceQueryFilter.java:269 - collecting 1 of 2147483647: created_at:false:8@1448270734894000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:51,577 SliceQueryFilter.java:269 - collecting 1 of 2147483647: excerpt:false:0@1448270734894000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,125 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:f676b9416e6b11e5989c57f4c5416142:false:39@1444384568020000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,125 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84072dc11e589dd1f1c8f5df499:false:39@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,126 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84172dc11e589dd1f1c8f5df499:false:39@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,127 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84272dc11e589dd1f1c8f5df499:false:39@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,127 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84372dc11e589dd1f1c8f5df499:false:39@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,128 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84472dc11e589dd1f1c8f5df499:false:58@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,129 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84572dc11e589dd1f1c8f5df499:false:58@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,129 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84672dc11e589dd1f1c8f5df499:false:58@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,130 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84772dc11e589dd1f1c8f5df499:false:58@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,131 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84872dc11e589dd1f1c8f5df499:false:39@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,131 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84972dc11e589dd1f1c8f5df499:false:39@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,132 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84a72dc11e589dd1f1c8f5df499:false:39@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,132 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84b72dc11e589dd1f1c8f5df499:false:39@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,133 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84c72dc11e589dd1f1c8f5df499:false:39@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,133 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84d72dc11e589dd1f1c8f5df499:false:39@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,134 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84e72dc11e589dd1f1c8f5df499:false:39@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,134 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab84f72dc11e589dd1f1c8f5df499:false:39@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,135 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab85072dc11e589dd1f1c8f5df499:false:39@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,135 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab85172dc11e589dd1f1c8f5df499:false:39@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,136 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab85272dc11e589dd1f1c8f5df499:false:39@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,136 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab85372dc11e589dd1f1c8f5df499:false:39@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,137 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab85472dc11e589dd1f1c8f5df499:false:39@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,137 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab85572dc11e589dd1f1c8f5df499:false:39@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,138 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab85672dc11e589dd1f1c8f5df499:false:39@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,138 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab85772dc11e589dd1f1c8f5df499:false:39@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,139 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab85872dc11e589dd1f1c8f5df499:false:39@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,139 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab85972dc11e589dd1f1c8f5df499:false:39@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,140 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab85a72dc11e589dd1f1c8f5df499:false:39@1444872636831000 TRACE [SharedPool-Worker-1] 2015-12-05 03:03:52,140 SliceQueryFilter.java:269 - collecting 1 of 2147483647: images:562ab85b72dc11e589dd1f1c8f5df499:false:39@1444872636831000 > On Dec 5, 2015, at 4:33 AM, Chris Lohfink <clohfin...@gmail.com> wrote: > > May just be going over a lot of data. Does output of 'nodetool cfstats' show > large partitions? (partition maximum bytes). "collecting 1 of 2147483647" is > suspicious. Are your queries using ALLOW FILTERING or have very high limits? > If trying to read 2 billion entries in 1 query you will have memory issues. > May want to check with jvmtop/htop to make sure its not GCs using CPU as > well. Is there a sane amount of sstables? Providing some more details can > help (cfstats, cfhistograms, queries your making, schema) > > Chris > > On Fri, Dec 4, 2015 at 10:43 AM, Xihui He <xihu...@gmail.com > <mailto:xihu...@gmail.com>> wrote: > Dear All, > > Recently one of node in our cluster has high cpu load ~100%. It seems to me > there is a infinite loop in SliceQueryFilter. > > The below log is repeated in 5000ms (range_request_timeout_in_ms). > TRACE [SharedPool-Worker-11] 2015-12-04 19:25:33,418 > SliceQueryFilter.java:269 - collecting 1 of 2147483647 <tel:2147483647>: > images:b608719e728d11e5812b57f4c5416142:false:62@1444838867382000 > > Our version is 2.19. Here is the bt: > org.apache.cassandra.db.composites.AbstractComposite.isEmpty(AbstractComposite.java:30) > org.apache.cassandra.db.OnDiskAtom$Serializer.deserializeFromSSTable(OnDiskAtom.java:76) > org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:52) > org.apache.cassandra.db.AbstractCell$1.computeNext(AbstractCell.java:46) > com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) > com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) > org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:83) > org.apache.cassandra.db.columniterator.SimpleSliceReader.computeNext(SimpleSliceReader.java:37) > com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) > com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) > org.apache.cassandra.db.columniterator.SSTableSliceIterator.hasNext(SSTableSliceIterator.java:82) > org.apache.cassandra.db.filter.QueryFilter$2.getNext(QueryFilter.java:173) > org.apache.cassandra.db.filter.QueryFilter$2.hasNext(QueryFilter.java:156) > org.apache.cassandra.utils.MergeIterator$Candidate.advance(MergeIterator.java:146) > org.apache.cassandra.utils.MergeIterator$ManyToOne.advance(MergeIterator.java:125) > org.apache.cassandra.utils.MergeIterator$ManyToOne.computeNext(MergeIterator.java:99) > com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:143) > com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:138) > org.apache.cassandra.db.filter.SliceQueryFilter.collectReducedColumns(SliceQueryFilter.java:264) > org.apache.cassandra.db.filter.QueryFilter.collateColumns(QueryFilter.java:108) > org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:82) > org.apache.cassandra.db.filter.QueryFilter.collateOnDiskAtom(QueryFilter.java:69) > org.apache.cassandra.db.CollationController.collectAllData(CollationController.java:314) > org.apache.cassandra.db.CollationController.getTopLevelColumns(CollationController.java:65) > org.apache.cassandra.db.ColumnFamilyStore.getTopLevelColumns(ColumnFamilyStore.java:2033) > org.apache.cassandra.db.ColumnFamilyStore.getColumnFamily(ColumnFamilyStore.java:1876) > org.apache.cassandra.db.Keyspace.getRow(Keyspace.java:357) > org.apache.cassandra.db.SliceFromReadCommand.getRow(SliceFromReadCommand.java:85) > org.apache.cassandra.db.ReadVerbHandler.doVerb(ReadVerbHandler.java:47) > org.apache.cassandra.net.MessageDeliveryTask.run(MessageDeliveryTask.java:64) > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > org.apache.cassandra.concurrent.AbstractTracingAwareExecutorService$FutureTask.run(AbstractTracingAwareExecutorService.java:164) > org.apache.cassandra.concurrent.SEPWorker.run(SEPWorker.java:105) > java.lang.Thread.run(Thread.java:745) > > Appreciate if anyone could help. > > Thanks in advance, > Xihui >