That's just the thing. There is nothing in the logs except the constant ParNew 
collections like

DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) GC 
for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634888

But the load is staying continuously high.

There's always some compaction going on for just that one table, media_tracks_raw, 
and those values rarely change (the remaining-time estimate is certainly meaningless):

pending tasks: 17
          compaction type   keyspace              table    completed         total   unit   progress
               Compaction      media   media_tracks_raw    444294932    1310653468   bytes     33.90%
               Compaction      media   media_tracks_raw    131931354    3411631999   bytes      3.87%
               Compaction      media   media_tracks_raw     30308970   23097672194   bytes      0.13%
               Compaction      media   media_tracks_raw    899216961    1815591081   bytes     49.53%
Active compaction remaining time :   0h27m56s

Here's a sample of a query trace:

 activity | timestamp | source | source_elapsed
----------+--------------+---------------+----------------
 execute_cql3_query | 00:11:46,612 | 10.140.22.236 | 0
 Parsing select * from media_tracks_raw where id =74fe9449-8ac4-accb-a723-4bad024101e3 limit 100; | 00:11:46,612 | 10.140.22.236 | 47
 Preparing statement | 00:11:46,612 | 10.140.22.236 | 234
 Sending message to /10.140.21.54 | 00:11:46,619 | 10.140.22.236 | 7190
 Message received from /10.140.22.236 | 00:11:46,622 | 10.140.21.54 | 12
 Executing single-partition query on media_tracks_raw | 00:11:46,644 | 10.140.21.54 | 21971
 Acquiring sstable references | 00:11:46,644 | 10.140.21.54 | 22029
 Merging memtable tombstones | 00:11:46,644 | 10.140.21.54 | 22131
 Bloom filter allows skipping sstable 1395 | 00:11:46,644 | 10.140.21.54 | 22245
 Bloom filter allows skipping sstable 1394 | 00:11:46,644 | 10.140.21.54 | 22279
 Bloom filter allows skipping sstable 1391 | 00:11:46,644 | 10.140.21.54 | 22293
 Bloom filter allows skipping sstable 1381 | 00:11:46,644 | 10.140.21.54 | 22304
 Bloom filter allows skipping sstable 1376 | 00:11:46,644 | 10.140.21.54 | 22317
 Bloom filter allows skipping sstable 1368 | 00:11:46,644 | 10.140.21.54 | 22328
 Bloom filter allows skipping sstable 1365 | 00:11:46,644 | 10.140.21.54 | 22340
 Bloom filter allows skipping sstable 1351 | 00:11:46,644 | 10.140.21.54 | 22352
 Bloom filter allows skipping sstable 1367 | 00:11:46,644 | 10.140.21.54 | 22363
 Bloom filter allows skipping sstable 1380 | 00:11:46,644 | 10.140.21.54 | 22374
 Bloom filter allows skipping sstable 1343 | 00:11:46,644 | 10.140.21.54 | 22386
 Bloom filter allows skipping sstable 1342 | 00:11:46,644 | 10.140.21.54 | 22397
 Bloom filter allows skipping sstable 1334 | 00:11:46,644 | 10.140.21.54 | 22408
 Bloom filter allows skipping sstable 1377 | 00:11:46,644 | 10.140.21.54 | 22429
 Bloom filter allows skipping sstable 1330 | 00:11:46,644 | 10.140.21.54 | 22441
 Bloom filter allows skipping sstable 1329 | 00:11:46,644 | 10.140.21.54 | 22452
 Bloom filter allows skipping sstable 1328 | 00:11:46,644 | 10.140.21.54 | 22463
 Bloom filter allows skipping sstable 1327 | 00:11:46,644 | 10.140.21.54 | 22475
 Bloom filter allows skipping sstable 1326 | 00:11:46,644 | 10.140.21.54 | 22488
 Bloom filter allows skipping sstable 1320 | 00:11:46,644 | 10.140.21.54 | 22506
 Bloom filter allows skipping sstable 1319 | 00:11:46,644 | 10.140.21.54 | 22518
 Bloom filter allows skipping sstable 1318 | 00:11:46,644 | 10.140.21.54 | 22528
 Bloom filter allows skipping sstable 1317 | 00:11:46,644 | 10.140.21.54 | 22540
 Bloom filter allows skipping sstable 1316 | 00:11:46,644 | 10.140.21.54 | 22552
 Bloom filter allows skipping sstable 1315 | 00:11:46,644 | 10.140.21.54 | 22563
 Bloom filter allows skipping sstable 1314 | 00:11:46,644 | 10.140.21.54 | 22572
 Bloom filter allows skipping sstable 1313 | 00:11:46,644 | 10.140.21.54 | 22583
 Bloom filter allows skipping sstable 1312 | 00:11:46,644 | 10.140.21.54 | 22594
 Bloom filter allows skipping sstable 1311 | 00:11:46,644 | 10.140.21.54 | 22605
 Bloom filter allows skipping sstable 1310 | 00:11:46,644 | 10.140.21.54 | 22616
 Bloom filter allows skipping sstable 1309 | 00:11:46,644 | 10.140.21.54 | 22628
 Bloom filter allows skipping sstable 1308 | 00:11:46,644 | 10.140.21.54 | 22640
 Bloom filter allows skipping sstable 1307 | 00:11:46,644 | 10.140.21.54 | 22651
 Bloom filter allows skipping sstable 1306 | 00:11:46,644 | 10.140.21.54 | 22663
 Bloom filter allows skipping sstable 1305 | 00:11:46,644 | 10.140.21.54 | 22674
 Bloom filter allows skipping sstable 1304 | 00:11:46,644 | 10.140.21.54 | 22684
 Bloom filter allows skipping sstable 1303 | 00:11:46,644 | 10.140.21.54 | 22696
 Bloom filter allows skipping sstable 1302 | 00:11:46,644 | 10.140.21.54 | 22707
 Bloom filter allows skipping sstable 1301 | 00:11:46,644 | 10.140.21.54 | 22718
 Bloom filter allows skipping sstable 1300 | 00:11:46,644 | 10.140.21.54 | 22729
 Bloom filter allows skipping sstable 1299 | 00:11:46,644 | 10.140.21.54 | 22740
 Bloom filter allows skipping sstable 1298 | 00:11:46,644 | 10.140.21.54 | 22752
 Bloom filter allows skipping sstable 1297 | 00:11:46,644 | 10.140.21.54 | 22763
 Bloom filter allows skipping sstable 1296 | 00:11:46,644 | 10.140.21.54 | 22774
 Key cache hit for sstable 1295 | 00:11:46,644 | 10.140.21.54 | 22817
 Seeking to partition beginning in data file | 00:11:46,644 | 10.140.21.54 | 22842
 Skipped 0/89 non-slice-intersecting sstables, included 0 due to tombstones | 00:11:46,646 | 10.140.21.54 | 24109
 Merging data from memtables and 1 sstables | 00:11:46,646 | 10.140.21.54 | 24238
 Read 101 live and 0 tombstoned cells | 00:11:46,663 | 10.140.21.54 | 41389
 Enqueuing response to /10.140.22.236 | 00:11:46,663 | 10.140.21.54 | 41831
 Sending message to /10.140.22.236 | 00:11:46,664 | 10.140.21.54 | 41972
 Message received from /10.140.21.54 | 00:11:46,671 | 10.140.22.236 | 59498
 Processing response from /10.140.21.54 | 00:11:46,672 | 10.140.22.236 | 59563
 Request complete | 00:11:46,704 | 10.140.22.236 | 92781

Every query I ran only ever had three mentions of tombstones:
  Merging memtable tombstones
  Skipped 0/89 non-slice-intersecting sstables, included 0 due to tombstones
  Read 101 live and 0 tombstoned cells
And unless I misread those, none of them claim that there are any tombstones.
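
For reference, here's roughly how a trace like the one above can be captured from 
cqlsh; just a sketch, assuming the keyspace is named "media" and reusing the id 
from the trace:

$ cqlsh 10.140.22.236
cqlsh> USE media;
cqlsh:media> TRACING ON;
cqlsh:media> SELECT * FROM media_tracks_raw WHERE id = 74fe9449-8ac4-accb-a723-4bad024101e3 LIMIT 100;

cqlsh prints the result set first and then the trace table, which is where the 
output above comes from.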


On Dec 16, 2014, at 4:26 PM, Ryan Svihla <rsvi...@datastax.com> wrote:

> Manual forced compactions create more problems than they solve. If you have 
> no evidence of tombstones in your selects (which seems odd; can you share 
> some of the tracing output?), then I'm not sure what it would solve for you.
> 
> Compaction running could explain a high load. Log messages with ERROR, WARN, 
> or GCInspector are all meaningful there; I suggest searching JIRA for your 
> version to see if there are any interesting bugs.
> 
> 
> 
> On Tue, Dec 16, 2014 at 6:14 PM, Arne Claassen <a...@emotient.com> wrote:
> I just did a wide set of selects and ran across no tombstones. But while on 
> the subject of gc_grace_seconds: any reason, on a small cluster, not to set it 
> to something low like a single day? It seems like 10 days is only needed for 
> large clusters undergoing long partition splits, or am I misunderstanding 
> gc_grace_seconds?
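> 
> For concreteness, the change I have in mind is just this; a sketch, assuming 
> the keyspace is named "media" and a one-day grace period:
> 
> ALTER TABLE media.media_tracks_raw WITH gc_grace_seconds = 86400;  -- one day
> 
> My understanding is that it mainly needs to be longer than the interval 
> between repairs, so a node that missed a delete can't resurrect the data.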
> 
> Now, given all that, does any of this explain a high load when the cluster is 
> idle? Is it compaction catching up and would manual forced compaction 
> alleviate that?
> 
> thanks,
> arne
> 
> 
> On Dec 16, 2014, at 3:28 PM, Ryan Svihla <rsvi...@datastax.com> wrote:
> 
>> So a delete is really another write for gc_grace_seconds (default 10 days); 
>> if you get enough tombstones it can make managing your cluster a challenge 
>> as is. Open up cqlsh, turn on tracing, and try a few queries: how many 
>> tombstones are scanned for a given query? It's possible the heap problems 
>> you're seeing are actually happening on the query side and not on the ingest 
>> side. The severity of this depends on the driver and Cassandra version, but 
>> older drivers and versions of Cassandra could easily overload the heap with 
>> expensive selects; when layered over tombstones, it certainly becomes a 
>> possibility that this is your root cause.
>> 
>> Now this will primarily create more load on compaction, and depending on your 
>> Cassandra version there may be some other issue at work, but something I can 
>> tell you is that every time I see 1 dropped mutation I see a cluster that was 
>> overloaded enough that it had to shed load. If I see 200k, I see a 
>> cluster/configuration/hardware that is badly overloaded.
>> 
>> I suggest the following (a rough sketch of the commands follows below):
>> - trace some of the queries used in prod
>> - monitor your ingest rate and see at what levels you run into issues 
>> (GCInspector log messages, dropped mutations, etc.)
>> - the heap configuration we mentioned earlier: go ahead and monitor heap usage; 
>> if it hits 75% repeatedly, that is an indication of heavy load
>> - monitor dropped mutations: any dropped mutation is evidence of an overloaded 
>> server; again, the root cause can be many other problems that are solvable 
>> with current hardware, and LOTS of people run with nodes of a similar 
>> configuration.
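>> 
>> A rough sketch of that monitoring; the log path is an assumption (a common 
>> package default), so adjust for your install:
>> 
>> nodetool tpstats                 # watch the Dropped section and "All time blocked"
>> nodetool info | grep -i heap     # heap used vs. total; repeatedly over ~75% means pressure
>> grep -c GCInspector /var/log/cassandra/system.log
>> grep -i dropped /var/log/cassandra/system.log | tail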
>> 
>> On Tue, Dec 16, 2014 at 5:08 PM, Arne Claassen <a...@emotient.com> wrote:
>> Not using any secondary indices, and memtable_flush_queue_size is the 
>> default 4.
>> 
>> But let me tell you how data is "mutated" right now; maybe that will give 
>> you an insight into how this is happening.
>> 
>> Basically the frame data table has the following primary key: PRIMARY KEY 
>> ((id), trackid, "timestamp")
>> 
>> Generally data is inserted once, so day-to-day writes are all new rows. 
>> However, when our process for generating analytics for these rows changes, 
>> we run the media back through again, causing overwrites.
>> 
>> Up until last night, this was just a new insert, because the PK never changed, 
>> so it was always a 1-to-1 overwrite of every row.
>> 
>> Last night was the first time a change went in where the PK could actually 
>> change, so now the process is always: DELETE by partition key, insert all 
>> rows for the partition key, repeat.
>> 
>> We have two tables that have similar frame-data projections, and some other 
>> aggregates with a much smaller row count per partition key.
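>> 
>> To make the shape concrete, here's a sketch of the table and the new write 
>> pattern; the non-key column and the types are placeholders, and I'm assuming 
>> the keyspace is named "media":
>> 
>> CREATE TABLE media.media_tracks_raw (
>>     id uuid,
>>     trackid int,
>>     "timestamp" double,
>>     payload text,   -- stand-in for the actual per-frame columns
>>     PRIMARY KEY ((id), trackid, "timestamp")
>> );
>> 
>> -- reprocessing a piece of media, per partition key:
>> DELETE FROM media.media_tracks_raw WHERE id = ?;
>> INSERT INTO media.media_tracks_raw (id, trackid, "timestamp", payload)
>>     VALUES (?, ?, ?, ?);   -- repeated for every row of that media id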
>> 
>> hope that helps,
>> arne
>> 
>> On Dec 16, 2014, at 2:46 PM, Ryan Svihla <rsvi...@datastax.com> wrote:
>> 
>>> So you've got some blocked flush writers, but you have an incredibly large 
>>> number of dropped mutations. Are you using secondary indexes, and if so, how 
>>> many? What is your flush queue set to?
>>> 
>>> On Tue, Dec 16, 2014 at 4:43 PM, Arne Claassen <a...@emotient.com> wrote:
>>> Of course QA decided to start a test batch (still relatively low traffic), 
>>> so I hope it doesn't throw the tpstats off too much
>>> 
>>> Node 1:
>>> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>>> MutationStage                     0         0       13804928         0                 0
>>> ReadStage                         0         0          10975         0                 0
>>> RequestResponseStage              0         0        7725378         0                 0
>>> ReadRepairStage                   0         0           1247         0                 0
>>> ReplicateOnWriteStage             0         0              0         0                 0
>>> MiscStage                         0         0              0         0                 0
>>> HintedHandoff                     1         1             50         0                 0
>>> FlushWriter                       0         0            306         0                31
>>> MemoryMeter                       0         0            719         0                 0
>>> GossipStage                       0         0         286505         0                 0
>>> CacheCleanupExecutor              0         0              0         0                 0
>>> InternalResponseStage             0         0              0         0                 0
>>> CompactionExecutor                4        14            159         0                 0
>>> ValidationExecutor                0         0              0         0                 0
>>> MigrationStage                    0         0              0         0                 0
>>> commitlog_archiver                0         0              0         0                 0
>>> AntiEntropyStage                  0         0              0         0                 0
>>> PendingRangeCalculator            0         0             11         0                 0
>>> MemtablePostFlusher               0         0           1781         0                 0
>>> 
>>> Message type           Dropped
>>> READ                         0
>>> RANGE_SLICE                  0
>>> _TRACE                       0
>>> MUTATION                391041
>>> COUNTER_MUTATION             0
>>> BINARY                       0
>>> REQUEST_RESPONSE             0
>>> PAGED_RANGE                  0
>>> READ_REPAIR                  0
>>> 
>>> Node 2:
>>> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>>> MutationStage                     0         0         997042         0                 0
>>> ReadStage                         0         0           2623         0                 0
>>> RequestResponseStage              0         0         706650         0                 0
>>> ReadRepairStage                   0         0            275         0                 0
>>> ReplicateOnWriteStage             0         0              0         0                 0
>>> MiscStage                         0         0              0         0                 0
>>> HintedHandoff                     2         2             12         0                 0
>>> FlushWriter                       0         0             37         0                 4
>>> MemoryMeter                       0         0             70         0                 0
>>> GossipStage                       0         0          14927         0                 0
>>> CacheCleanupExecutor              0         0              0         0                 0
>>> InternalResponseStage             0         0              0         0                 0
>>> CompactionExecutor                4         7             94         0                 0
>>> ValidationExecutor                0         0              0         0                 0
>>> MigrationStage                    0         0              0         0                 0
>>> commitlog_archiver                0         0              0         0                 0
>>> AntiEntropyStage                  0         0              0         0                 0
>>> PendingRangeCalculator            0         0              3         0                 0
>>> MemtablePostFlusher               0         0            114         0                 0
>>> 
>>> Message type           Dropped
>>> READ                         0
>>> RANGE_SLICE                  0
>>> _TRACE                       0
>>> MUTATION                     0
>>> COUNTER_MUTATION             0
>>> BINARY                       0
>>> REQUEST_RESPONSE             0
>>> PAGED_RANGE                  0
>>> READ_REPAIR                  0
>>> 
>>> Node 3:
>>> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>>> MutationStage                     0         0        1539324         0                 0
>>> ReadStage                         0         0           2571         0                 0
>>> RequestResponseStage              0         0         373300         0                 0
>>> ReadRepairStage                   0         0            325         0                 0
>>> ReplicateOnWriteStage             0         0              0         0                 0
>>> MiscStage                         0         0              0         0                 0
>>> HintedHandoff                     1         1             21         0                 0
>>> FlushWriter                       0         0             38         0                 5
>>> MemoryMeter                       0         0             59         0                 0
>>> GossipStage                       0         0          21491         0                 0
>>> CacheCleanupExecutor              0         0              0         0                 0
>>> InternalResponseStage             0         0              0         0                 0
>>> CompactionExecutor                4         9             85         0                 0
>>> ValidationExecutor                0         0              0         0                 0
>>> MigrationStage                    0         0              0         0                 0
>>> commitlog_archiver                0         0              0         0                 0
>>> AntiEntropyStage                  0         0              0         0                 0
>>> PendingRangeCalculator            0         0              6         0                 0
>>> MemtablePostFlusher               0         0            164         0                 0
>>> 
>>> Message type           Dropped
>>> READ                         0
>>> RANGE_SLICE                  0
>>> _TRACE                       0
>>> MUTATION                205259
>>> COUNTER_MUTATION             0
>>> BINARY                       0
>>> REQUEST_RESPONSE             0
>>> PAGED_RANGE                  0
>>> READ_REPAIR                 18
>>> 
>>> 
>>> Compaction seems like the only thing consistently active and pending
>>> 
>>> On Tue, Dec 16, 2014 at 2:18 PM, Ryan Svihla <rsvi...@datastax.com> wrote:
>>> Okay, based on those numbers I have a theory.
>>> 
>>> Can you show me nodetool tpstats for all 3 nodes?
>>> 
>>> On Tue, Dec 16, 2014 at 4:04 PM, Arne Claassen <a...@emotient.com> wrote:
>>> No problem with the follow-up questions. I'm on a crash course here trying 
>>> to understand what makes C* tick, so I appreciate all feedback.
>>> 
>>> We reprocessed all media (1200 partition keys) last night, where partition 
>>> keys had somewhere between 4k and 200k "rows". After that completed, no 
>>> traffic went to the cluster at all for ~8 hours, and throughout today we may 
>>> get a couple (less than 10) queries per second and maybe 3-4 write batches 
>>> per hour.
>>> 
>>> I assume the last value in the Partition Size histogram is the largest row:
>>> 
>>> 20924300 bytes: 79
>>> 25109160 bytes: 57
>>> 
>>> The majority seems clustered around 200000 bytes.
>>> 
>>> I will look at switching my inserts to unlogged batches since they are 
>>> always for one partition key.
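>>> 
>>> Something like this is what I have in mind; just a sketch with the same 
>>> placeholder column names as before, every statement in the batch sharing 
>>> one partition key:
>>> 
>>> BEGIN UNLOGGED BATCH
>>>   INSERT INTO media_tracks_raw (id, trackid, "timestamp", payload)
>>>       VALUES (74fe9449-8ac4-accb-a723-4bad024101e3, 1, 0.00, '...');
>>>   INSERT INTO media_tracks_raw (id, trackid, "timestamp", payload)
>>>       VALUES (74fe9449-8ac4-accb-a723-4bad024101e3, 1, 0.04, '...');
>>> APPLY BATCH;
>>> 
>>> Since every row targets the same partition, the batch should land on the 
>>> replicas as a single mutation, which is the case where batching actually helps.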
>>> 
>>> On Tue, Dec 16, 2014 at 1:47 PM, Ryan Svihla <rsvi...@datastax.com> wrote:
>>> Can you define what "virtually no traffic" is? Sorry to be repetitive about 
>>> that, but I've worked on a lot of clusters in the past year, and people have 
>>> wildly different ideas of what that means.
>>> 
>>> Unlogged batches on the same partition key are definitely a performance 
>>> optimization. Typically async is much faster and easier on the cluster when 
>>> you're using multi-partition-key batches.
>>> 
>>> nodetool cfhistograms <keyspace> <tablename>
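>>> 
>>> For example, assuming the keyspace is named "media":
>>> 
>>> nodetool cfhistograms media media_tracks_raw
>>> 
>>> The Partition Size column in that output is the distribution; the last 
>>> bucket with a non-zero count is roughly your widest partition.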
>>> 
>>> On Tue, Dec 16, 2014 at 3:42 PM, Arne Claassen <a...@emotient.com> wrote:
>>> Actually not sure why the machine was originally configured at 6GB since we 
>>> even started it on an r3.large with 15GB.
>>> 
>>> Re: Batches
>>> 
>>> Not using batches. I actually have that as a separate question on the list. 
>>> Currently I fan out async single inserts and I'm wondering if batches are 
>>> better since my data is inherently inserted in blocks of ordered rows for a 
>>> single partition key.
>>> 
>>> 
>>> Re: Traffic
>>> 
>>> There isn't all that much traffic. Inserts come in as blocks per partition 
>>> key, but a block can be 5k-200k rows for that partition key. Each of these 
>>> rows is less than 100k. It's small: lots of ordered rows of frame and 
>>> sub-frame information for media, and the rows for one piece of media (the 
>>> partition key) are inserted at once.
>>> 
>>> For the last 12 hours, while the load on all these machines has been stuck, 
>>> there's been virtually no traffic at all. The nodes are basically sitting 
>>> idle, except that they each have a load of 4.
>>> 
>>> BTW, how do you determine the widest row, or for that matter the number of 
>>> tombstones in a row?
>>> 
>>> thanks,
>>> arne
>>> 
>>> On Tue, Dec 16, 2014 at 1:24 PM, Ryan Svihla <rsvi...@datastax.com> wrote:
>>> So 1024 is still a good 2.5 times what I'm suggesting, and 6GB is hardly 
>>> enough to run Cassandra well in, especially if you're going full bore on 
>>> loads. However, you may just flat-out be CPU bound on your write throughput: 
>>> how many TPS and what size writes do you have? Also, what is your widest row?
>>> 
>>> Final question: what is your compaction throughput set at?
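>>> 
>>> For reference, a quick way to check and adjust it (the yaml path is an 
>>> assumption for your install; 16 MB/s is the stock default):
>>> 
>>> grep compaction_throughput_mb_per_sec /etc/cassandra/conf/cassandra.yaml
>>> nodetool setcompactionthroughput 32    # in MB/s; 0 disables throttling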
>>> 
>>> 
>>> On Tue, Dec 16, 2014 at 3:20 PM, Arne Claassen <a...@emotient.com> wrote:
>>> The starting configuration I had, which is still running on two of the 
>>> nodes, was 6GB Heap, 1024MB parnew which is close to what you are 
>>> suggesting and those have been pegged at load 4 for the over 12 hours with 
>>> hardly and read or write traffic. I will set one to 8GB/400MB and see if 
>>> its load changes.
>>> 
>>> On Tue, Dec 16, 2014 at 1:12 PM, Ryan Svihla <rsvi...@datastax.com> wrote:
>>> So a heap of that size without some tuning will create a number of problems 
>>> (high CPU usage being one of them). I suggest either an 8GB heap and 400MB 
>>> parnew (which I'd only set that low for that low a CPU count), or attempting 
>>> the tunings indicated in https://issues.apache.org/jira/browse/CASSANDRA-8150
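>>> 
>>> Concretely, that would be something like this in conf/cassandra-env.sh 
>>> (path varies by install), followed by a rolling restart:
>>> 
>>> MAX_HEAP_SIZE="8G"
>>> HEAP_NEWSIZE="400M"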
>>> 
>>> On Tue, Dec 16, 2014 at 3:06 PM, Arne Claassen <a...@emotient.com> wrote:
>>> Changed the 15GB node to 25GB heap and the nice CPU is down to ~20% now. 
>>> Checked my dev cluster to see if the ParNew log entries are just par for 
>>> the course, but not seeing them there. However, both have the following 
>>> every 30 seconds:
>>> 
>>> DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898 BatchlogManager.java (line 
>>> 165) Started replayAllFailedBatches
>>> DEBUG [MemtablePostFlusher:1] 2014-12-16 21:00:44,899 
>>> ColumnFamilyStore.java (line 866) forceFlush requested but everything is 
>>> clean in batchlog
>>> DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,899 BatchlogManager.java (line 
>>> 200) Finished replayAllFailedBatches
>>> 
>>> Is that just routine scheduled house-keeping or a sign of something else?
>>> 
>>> On Tue, Dec 16, 2014 at 12:52 PM, Arne Claassen <a...@emotient.com> wrote:
>>> Sorry, I meant a 15GB heap on the one machine that has less nice CPU% now. 
>>> The others are at 6GB.
>>> 
>>> On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen <a...@emotient.com> wrote:
>>> AWS r3.xlarge, 30GB, but only using a heap of 10GB and 2GB new gen, because 
>>> we might go c3.2xlarge instead if CPU is more important than RAM.
>>> Storage is EBS-optimized SSD (but iostat shows no real I/O going on).
>>> Each node only has about 10GB, with ownership of 67%, 64.7% & 68.3%.
>>> 
>>> On the node where I set the heap to 10GB from 6GB, the utilization has 
>>> dropped to 46% nice now, but the ParNew log messages still continue at the 
>>> same pace. I'm going to up the heap to 20GB for a bit and see if that brings 
>>> the nice CPU further down.
>>> 
>>> No TombstoneOverflowingExceptions.
>>> 
>>> On Tue, Dec 16, 2014 at 11:50 AM, Ryan Svihla <rsvi...@datastax.com> wrote:
>>> What's CPU, RAM, Storage layer, and data density per node? Exact heap 
>>> settings would be nice. In the logs look for TombstoneOverflowingException
>>> 
>>> 
>>> On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen <a...@emotient.com> wrote:
>>> I'm running 2.0.10.
>>> 
>>> The data is all time-series data, and as we change our pipeline we've 
>>> periodically been reprocessing the data sources, which causes each time 
>>> series to be overwritten, i.e. every row per partition key is deleted and 
>>> re-written, so I assume I've been collecting a bunch of tombstones.
>>> 
>>> Also, I assumed the ever-present and never-completing compactions were an 
>>> artifact of tombstoning, but I fully admit that's conjecture based on the 
>>> ~20 blog posts and Stack Overflow questions I've surveyed.
>>> 
>>> I doubled the heap on one node and it changed nothing regarding the load or 
>>> the ParNew log statements. New generation usage is 50%, Eden itself is 56%.
>>> 
>>> If there's anything else I should look at and report, let me know.
>>> 
>>> On Tue, Dec 16, 2014 at 11:14 AM, Jonathan Lacefield 
>>> <jlacefi...@datastax.com> wrote:
>>> Hello,
>>> 
>>>   What version of Cassandra are you running?  
>>> 
>>>   If it's 2.0, we recently experienced something similar with 8447 [1], 
>>> which 8485 [2] should hopefully resolve.  
>>> 
>>>   Please note that 8447 is not related to tombstones.  Tombstone processing 
>>> can put a lot of pressure on the heap as well. Why do you think you have a 
>>> lot of tombstones in that one particular table?
>>> 
>>>   [1] https://issues.apache.org/jira/browse/CASSANDRA-8447
>>>   [2] https://issues.apache.org/jira/browse/CASSANDRA-8485
>>> 
>>> Jonathan
>>> 
>>> 
>>> Jonathan Lacefield
>>> Solution Architect | (404) 822 3487 | jlacefi...@datastax.com
>>> 
>>> 
>>> On Tue, Dec 16, 2014 at 2:04 PM, Arne Claassen <a...@emotient.com> wrote:
>>> I have a three-node cluster where each node has been sitting at a load of 4 
>>> and 100% CPU utilization (although 92% nice) for the last 12 hours, ever 
>>> since some significant writes finished. I'm trying to determine what tuning 
>>> I should be doing to get it out of this state. The debug log is just an 
>>> endless series of:
>>> 
>>> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 
>>> 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is 
>>> 8000634880
>>> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java (line 
>>> 118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; max is 
>>> 8000634880
>>> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java (line 
>>> 118) GC for ParNew: 135 ms for 8 collections, 4402220568 used; max is 
>>> 8000634880
>>> 
>>> iostat shows virtually no I/O.
>>> 
>>> Compaction may enter into this, but I don't really know what to make of the 
>>> compaction stats, since they never change:
>>> 
>>> [root@cassandra-37919c3a ~]# nodetool compactionstats
>>> pending tasks: 10
>>>           compaction type   keyspace              table     completed         total   unit   progress
>>>                Compaction      media   media_tracks_raw     271651482     563615497   bytes     48.20%
>>>                Compaction      media   media_tracks_raw      30308910   21676695677   bytes      0.14%
>>>                Compaction      media   media_tracks_raw    1198384080    1815603161   bytes     66.00%
>>> Active compaction remaining time :   0h22m24s
>>> 
>>> 5 minutes later:
>>> 
>>> [root@cassandra-37919c3a ~]# nodetool compactionstats
>>> pending tasks: 9
>>>           compaction type   keyspace              table     completed         total   unit   progress
>>>                Compaction      media   media_tracks_raw     271651482     563615497   bytes     48.20%
>>>                Compaction      media   media_tracks_raw      30308910   21676695677   bytes      0.14%
>>>                Compaction      media   media_tracks_raw    1198384080    1815603161   bytes     66.00%
>>> Active compaction remaining time :   0h22m24s
>>> 
>>> Sure, the pending tasks went down by one, but the rest is identical. 
>>> media_tracks_raw likely has a bunch of tombstones (I can't figure out how to 
>>> get stats on that).
>>> 
>>> Is this behavior something that indicates that I need more heap, or a larger 
>>> new generation? Should I be manually running compaction on tables with lots 
>>> of tombstones?
>>> 
>>> Any suggestions or places to educate myself better on performance tuning 
>>> would be appreciated.
>>> 
>>> arne
>>> 
>>> 
