Cassandra 2.0.10 and Datastax Java Driver 2.1.1

On Dec 16, 2014, at 4:48 PM, Ryan Svihla <rsvi...@datastax.com> wrote:

> What version of Cassandra?
> 
> On Dec 16, 2014 6:36 PM, "Arne Claassen" <a...@emotient.com> wrote:
> That's just the thing. There is nothing in the logs except the constant 
> ParNew collections like
> 
> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634888
> 
> But the load is staying continuously high.
> 
> There's always some compaction going on for just that one table, media_tracks_raw, and those values rarely change (certainly the remaining time is meaningless):
> 
> pending tasks: 17
>           compaction type        keyspace              table       completed         total   unit   progress
>                Compaction           media   media_tracks_raw       444294932    1310653468   bytes     33.90%
>                Compaction           media   media_tracks_raw       131931354    3411631999   bytes      3.87%
>                Compaction           media   media_tracks_raw        30308970   23097672194   bytes      0.13%
>                Compaction           media   media_tracks_raw       899216961    1815591081   bytes     49.53%
> Active compaction remaining time :   0h27m56s
> 
> Here's a sample of a query trace:
> 
>  activity | timestamp | source | source_elapsed
> ----------------------------------------------------------------------------+--------------+---------------+----------------
>  execute_cql3_query | 00:11:46,612 | 10.140.22.236 | 0
>  Parsing select * from media_tracks_raw where id =74fe9449-8ac4-accb-a723-4bad024101e3 limit 100; | 00:11:46,612 | 10.140.22.236 | 47
>  Preparing statement | 00:11:46,612 | 10.140.22.236 | 234
>  Sending message to /10.140.21.54 | 00:11:46,619 | 10.140.22.236 | 7190
>  Message received from /10.140.22.236 | 00:11:46,622 | 10.140.21.54 | 12
>  Executing single-partition query on media_tracks_raw | 00:11:46,644 | 10.140.21.54 | 21971
>  Acquiring sstable references | 00:11:46,644 | 10.140.21.54 | 22029
>  Merging memtable tombstones | 00:11:46,644 | 10.140.21.54 | 22131
>  Bloom filter allows skipping sstable 1395 | 00:11:46,644 | 10.140.21.54 | 22245
>  Bloom filter allows skipping sstable 1394 | 00:11:46,644 | 10.140.21.54 | 22279
>  Bloom filter allows skipping sstable 1391 | 00:11:46,644 | 10.140.21.54 | 22293
>  Bloom filter allows skipping sstable 1381 | 00:11:46,644 | 10.140.21.54 | 22304
>  Bloom filter allows skipping sstable 1376 | 00:11:46,644 | 10.140.21.54 | 22317
>  Bloom filter allows skipping sstable 1368 | 00:11:46,644 | 10.140.21.54 | 22328
>  Bloom filter allows skipping sstable 1365 | 00:11:46,644 | 10.140.21.54 | 22340
>  Bloom filter allows skipping sstable 1351 | 00:11:46,644 | 10.140.21.54 | 22352
>  Bloom filter allows skipping sstable 1367 | 00:11:46,644 | 10.140.21.54 | 22363
>  Bloom filter allows skipping sstable 1380 | 00:11:46,644 | 10.140.21.54 | 22374
>  Bloom filter allows skipping sstable 1343 | 00:11:46,644 | 10.140.21.54 | 22386
>  Bloom filter allows skipping sstable 1342 | 00:11:46,644 | 10.140.21.54 | 22397
>  Bloom filter allows skipping sstable 1334 | 00:11:46,644 | 10.140.21.54 | 22408
>  Bloom filter allows skipping sstable 1377 | 00:11:46,644 | 10.140.21.54 | 22429
>  Bloom filter allows skipping sstable 1330 | 00:11:46,644 | 10.140.21.54 | 22441
>  Bloom filter allows skipping sstable 1329 | 00:11:46,644 | 10.140.21.54 | 22452
>  Bloom filter allows skipping sstable 1328 | 00:11:46,644 | 10.140.21.54 | 22463
>  Bloom filter allows skipping sstable 1327 | 00:11:46,644 | 10.140.21.54 | 22475
>  Bloom filter allows skipping sstable 1326 | 00:11:46,644 | 10.140.21.54 | 22488
>  Bloom filter allows skipping sstable 1320 | 00:11:46,644 | 10.140.21.54 | 22506
>  Bloom filter allows skipping sstable 1319 | 00:11:46,644 | 10.140.21.54 | 22518
>  Bloom filter allows skipping sstable 1318 | 00:11:46,644 | 10.140.21.54 | 22528
>  Bloom filter allows skipping sstable 1317 | 00:11:46,644 | 10.140.21.54 | 22540
>  Bloom filter allows skipping sstable 1316 | 00:11:46,644 | 10.140.21.54 | 22552
>  Bloom filter allows skipping sstable 1315 | 00:11:46,644 | 10.140.21.54 | 22563
>  Bloom filter allows skipping sstable 1314 | 00:11:46,644 | 10.140.21.54 | 22572
>  Bloom filter allows skipping sstable 1313 | 00:11:46,644 | 10.140.21.54 | 22583
>  Bloom filter allows skipping sstable 1312 | 00:11:46,644 | 10.140.21.54 | 22594
>  Bloom filter allows skipping sstable 1311 | 00:11:46,644 | 10.140.21.54 | 22605
>  Bloom filter allows skipping sstable 1310 | 00:11:46,644 | 10.140.21.54 | 22616
>  Bloom filter allows skipping sstable 1309 | 00:11:46,644 | 10.140.21.54 | 22628
>  Bloom filter allows skipping sstable 1308 | 00:11:46,644 | 10.140.21.54 | 22640
>  Bloom filter allows skipping sstable 1307 | 00:11:46,644 | 10.140.21.54 | 22651
>  Bloom filter allows skipping sstable 1306 | 00:11:46,644 | 10.140.21.54 | 22663
>  Bloom filter allows skipping sstable 1305 | 00:11:46,644 | 10.140.21.54 | 22674
>  Bloom filter allows skipping sstable 1304 | 00:11:46,644 | 10.140.21.54 | 22684
>  Bloom filter allows skipping sstable 1303 | 00:11:46,644 | 10.140.21.54 | 22696
>  Bloom filter allows skipping sstable 1302 | 00:11:46,644 | 10.140.21.54 | 22707
>  Bloom filter allows skipping sstable 1301 | 00:11:46,644 | 10.140.21.54 | 22718
>  Bloom filter allows skipping sstable 1300 | 00:11:46,644 | 10.140.21.54 | 22729
>  Bloom filter allows skipping sstable 1299 | 00:11:46,644 | 10.140.21.54 | 22740
>  Bloom filter allows skipping sstable 1298 | 00:11:46,644 | 10.140.21.54 | 22752
>  Bloom filter allows skipping sstable 1297 | 00:11:46,644 | 10.140.21.54 | 22763
>  Bloom filter allows skipping sstable 1296 | 00:11:46,644 | 10.140.21.54 | 22774
>  Key cache hit for sstable 1295 | 00:11:46,644 | 10.140.21.54 | 22817
>  Seeking to partition beginning in data file | 00:11:46,644 | 10.140.21.54 | 22842
>  Skipped 0/89 non-slice-intersecting sstables, included 0 due to tombstones | 00:11:46,646 | 10.140.21.54 | 24109
>  Merging data from memtables and 1 sstables | 00:11:46,646 | 10.140.21.54 | 24238
>  Read 101 live and 0 tombstoned cells | 00:11:46,663 | 10.140.21.54 | 41389
>  Enqueuing response to /10.140.22.236 | 00:11:46,663 | 10.140.21.54 | 41831
>  Sending message to /10.140.22.236 | 00:11:46,664 | 10.140.21.54 | 41972
>  Message received from /10.140.21.54 | 00:11:46,671 | 10.140.22.236 | 59498
>  Processing response from /10.140.21.54 | 00:11:46,672 | 10.140.22.236 | 59563
>  Request complete | 00:11:46,704 | 10.140.22.236 | 92781
> 
> Every query I did had just three mentions of tombstones:
>   Merging memtable tombstones
>   Skipped 0/89 non-slice-intersecting sstables, included 0 due to tombstones
>   Read 101 live and 0 tombstoned cells
> Unless I misread those, none of them claim that there are any tombstones.
> 
> 
> On Dec 16, 2014, at 4:26 PM, Ryan Svihla <rsvi...@datastax.com> wrote:
> 
>> Manual forced compactions create more problems than they solve. If you have no evidence of tombstones in your selects (which seems odd; can you share some of the tracing output?), then I'm not sure what it would solve for you.
>> 
>> Compaction running could explain a high load. Log messages with ERROR, WARN, or GCInspector are all meaningful there; I suggest searching JIRA for your version to see if there are any interesting bugs.
>> 
>> 
>> 
>> On Tue, Dec 16, 2014 at 6:14 PM, Arne Claassen <a...@emotient.com> wrote:
>> I just did a wide set of selects and ran across no tombstones. But while on the subject of gc_grace_seconds, is there any reason, on a small cluster, not to set it to something low like a single day? It seems like 10 days is only needed for large clusters undergoing long partition splits, or am I misunderstanding gc_grace_seconds?
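>> For example (the table name is from our schema; 86400 is just the "single day" value I have in mind):
>> 
>>     ALTER TABLE media.media_tracks_raw WITH gc_grace_seconds = 86400;   -- down from the default 864000 (10 days)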
>> 
>> Now, given all that, does any of this explain a high load when the cluster 
>> is idle? Is it compaction catching up and would manual forced compaction 
>> alleviate that?
>> 
>> thanks,
>> arne
>> 
>> 
>> On Dec 16, 2014, at 3:28 PM, Ryan Svihla <rsvi...@datastax.com> wrote:
>> 
>>> So a delete is really another write for gc_grace_seconds (default 10 days); if you get enough tombstones it can make managing your cluster a challenge as is. Open up cqlsh, turn on tracing, and try a few queries: how many tombstones are scanned for a given query? It's possible the heap problems you're seeing are actually happening on the query side and not on the ingest side. The severity of this depends on driver and Cassandra version, but older drivers and versions of Cassandra could easily overload the heap with expensive selects; when layered over tombstones, it certainly becomes a possibility that this is your root cause.
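>>> A quick sketch of what I mean in cqlsh (the table and id here are just examples from your setup):
>>> 
>>>     cqlsh> TRACING ON;
>>>     cqlsh> SELECT * FROM media.media_tracks_raw WHERE id = 74fe9449-8ac4-accb-a723-4bad024101e3 LIMIT 100;
>>> 
>>> With tracing on, each query prints a trace; lines like "Read N live and M tombstoned cells" tell you how many tombstones were scanned.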
>>> 
>>> Now this will primarily create more load on compaction, and depending on your Cassandra version there may be some other issue at work, but something I can tell you is that every time I see 1 dropped mutation I see a cluster that was overloaded enough that it had to shed load. If I see 200k, I see a cluster/configuration/hardware that is badly overloaded.
>>> 
>>> I suggest the following:
>>> - trace some of the queries used in prod
>>> - monitor your ingest rate; see at what levels you run into issues (GCInspector log messages, dropped mutations, etc.)
>>> - the heap configuration we mentioned earlier: go ahead and monitor heap usage; if it hits 75% repeatedly, this is an indication of heavy load
>>> - monitor dropped mutations: any dropped mutation is evidence of an overloaded server; again, the root cause can be many other problems that are solvable with current hardware, and LOTS of people run with nodes of a similar configuration.
>>> 
>>> On Tue, Dec 16, 2014 at 5:08 PM, Arne Claassen <a...@emotient.com> wrote:
>>> Not using any secondary indices, and memtable_flush_queue_size is the default 4.
>>> 
>>> But let me tell you how data is "mutated" right now; maybe that will give you an insight into how this is happening.
>>> 
>>> Basically the frame data table has the following primary key: PRIMARY KEY 
>>> ((id), trackid, "timestamp")
>>> 
>>> Generally data is inserted once. So day to day writes are all new rows.
>>> However, when our process for generating analytics for these rows changes, we run the media back through again, causing overwrites.
>>> 
>>> Up until last night, this was just a new insert because the PK never 
>>> changed so it was always 1-to-1 overwrite of every row.
>>> 
>>> Last night was the first time that a change went in where the PK could actually change, so now the process is always: DELETE by partition key, insert all rows for the partition key, repeat.
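>>> Roughly, the shape of it (the primary key is exact; the column types and the frame_data column are just stand-ins for our real per-frame columns):
>>> 
>>>     CREATE TABLE media.media_tracks_raw (
>>>         id uuid,
>>>         trackid uuid,
>>>         "timestamp" timestamp,
>>>         frame_data text,                              -- stand-in for the per-frame analytics columns
>>>         PRIMARY KEY ((id), trackid, "timestamp")
>>>     );
>>> 
>>>     -- reprocessing one piece of media: one partition-level delete, then re-insert every row
>>>     DELETE FROM media.media_tracks_raw WHERE id = 74fe9449-8ac4-accb-a723-4bad024101e3;
>>>     INSERT INTO media.media_tracks_raw (id, trackid, "timestamp", frame_data)
>>>     VALUES (74fe9449-8ac4-accb-a723-4bad024101e3,
>>>             0946a0ea-6b05-4b37-93b3-c71e0c8a3b1a,     -- illustrative trackid
>>>             '2014-12-16 00:00:00', 'frame payload');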
>>> 
>>> We have two tables that have similar frame data projections, and some other aggregates with a much smaller row count per partition key.
>>> 
>>> hope that helps,
>>> arne
>>> 
>>> On Dec 16, 2014, at 2:46 PM, Ryan Svihla <rsvi...@datastax.com> wrote:
>>> 
>>>> So you've got some blocked flush writers, but you have an incredibly large number of dropped mutations. Are you using secondary indexes? If so, how many? What is your flush queue set to?
>>>> 
>>>> On Tue, Dec 16, 2014 at 4:43 PM, Arne Claassen <a...@emotient.com> wrote:
>>>> Of course QA decided to start a test batch (still relatively low traffic), 
>>>> so I hope it doesn't throw the tpstats off too much
>>>> 
>>>> Node 1:
>>>> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>>>> MutationStage                     0         0       13804928         0                 0
>>>> ReadStage                         0         0          10975         0                 0
>>>> RequestResponseStage              0         0        7725378         0                 0
>>>> ReadRepairStage                   0         0           1247         0                 0
>>>> ReplicateOnWriteStage             0         0              0         0                 0
>>>> MiscStage                         0         0              0         0                 0
>>>> HintedHandoff                     1         1             50         0                 0
>>>> FlushWriter                       0         0            306         0                31
>>>> MemoryMeter                       0         0            719         0                 0
>>>> GossipStage                       0         0         286505         0                 0
>>>> CacheCleanupExecutor              0         0              0         0                 0
>>>> InternalResponseStage             0         0              0         0                 0
>>>> CompactionExecutor                4        14            159         0                 0
>>>> ValidationExecutor                0         0              0         0                 0
>>>> MigrationStage                    0         0              0         0                 0
>>>> commitlog_archiver                0         0              0         0                 0
>>>> AntiEntropyStage                  0         0              0         0                 0
>>>> PendingRangeCalculator            0         0             11         0                 0
>>>> MemtablePostFlusher               0         0           1781         0                 0
>>>> 
>>>> Message type           Dropped
>>>> READ                         0
>>>> RANGE_SLICE                  0
>>>> _TRACE                       0
>>>> MUTATION                391041
>>>> COUNTER_MUTATION             0
>>>> BINARY                       0
>>>> REQUEST_RESPONSE             0
>>>> PAGED_RANGE                  0
>>>> READ_REPAIR                  0
>>>> 
>>>> Node 2:
>>>> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>>>> MutationStage                     0         0         997042         0                 0
>>>> ReadStage                         0         0           2623         0                 0
>>>> RequestResponseStage              0         0         706650         0                 0
>>>> ReadRepairStage                   0         0            275         0                 0
>>>> ReplicateOnWriteStage             0         0              0         0                 0
>>>> MiscStage                         0         0              0         0                 0
>>>> HintedHandoff                     2         2             12         0                 0
>>>> FlushWriter                       0         0             37         0                 4
>>>> MemoryMeter                       0         0             70         0                 0
>>>> GossipStage                       0         0          14927         0                 0
>>>> CacheCleanupExecutor              0         0              0         0                 0
>>>> InternalResponseStage             0         0              0         0                 0
>>>> CompactionExecutor                4         7             94         0                 0
>>>> ValidationExecutor                0         0              0         0                 0
>>>> MigrationStage                    0         0              0         0                 0
>>>> commitlog_archiver                0         0              0         0                 0
>>>> AntiEntropyStage                  0         0              0         0                 0
>>>> PendingRangeCalculator            0         0              3         0                 0
>>>> MemtablePostFlusher               0         0            114         0                 0
>>>> 
>>>> Message type           Dropped
>>>> READ                         0
>>>> RANGE_SLICE                  0
>>>> _TRACE                       0
>>>> MUTATION                     0
>>>> COUNTER_MUTATION             0
>>>> BINARY                       0
>>>> REQUEST_RESPONSE             0
>>>> PAGED_RANGE                  0
>>>> READ_REPAIR                  0
>>>> 
>>>> Node 3:
>>>> Pool Name                    Active   Pending      Completed   Blocked  All time blocked
>>>> MutationStage                     0         0        1539324         0                 0
>>>> ReadStage                         0         0           2571         0                 0
>>>> RequestResponseStage              0         0         373300         0                 0
>>>> ReadRepairStage                   0         0            325         0                 0
>>>> ReplicateOnWriteStage             0         0              0         0                 0
>>>> MiscStage                         0         0              0         0                 0
>>>> HintedHandoff                     1         1             21         0                 0
>>>> FlushWriter                       0         0             38         0                 5
>>>> MemoryMeter                       0         0             59         0                 0
>>>> GossipStage                       0         0          21491         0                 0
>>>> CacheCleanupExecutor              0         0              0         0                 0
>>>> InternalResponseStage             0         0              0         0                 0
>>>> CompactionExecutor                4         9             85         0                 0
>>>> ValidationExecutor                0         0              0         0                 0
>>>> MigrationStage                    0         0              0         0                 0
>>>> commitlog_archiver                0         0              0         0                 0
>>>> AntiEntropyStage                  0         0              0         0                 0
>>>> PendingRangeCalculator            0         0              6         0                 0
>>>> MemtablePostFlusher               0         0            164         0                 0
>>>> 
>>>> Message type           Dropped
>>>> READ                         0
>>>> RANGE_SLICE                  0
>>>> _TRACE                       0
>>>> MUTATION                205259
>>>> COUNTER_MUTATION             0
>>>> BINARY                       0
>>>> REQUEST_RESPONSE             0
>>>> PAGED_RANGE                  0
>>>> READ_REPAIR                 18
>>>> 
>>>> 
>>>> Compaction seems like the only thing consistently active and pending
>>>> 
>>>> On Tue, Dec 16, 2014 at 2:18 PM, Ryan Svihla <rsvi...@datastax.com> wrote:
>>>> OK, based on those numbers I have a theory.
>>>> 
>>>> Can you show me nodetool tpstats for all 3 nodes?
>>>> 
>>>> On Tue, Dec 16, 2014 at 4:04 PM, Arne Claassen <a...@emotient.com> wrote:
>>>> No problem with the follow-up questions. I'm on a crash course here trying to understand what makes C* tick, so I appreciate all feedback.
>>>> 
>>>> We reprocessed all media (1200 partition keys) last night, where partition keys had somewhere between 4k and 200k "rows". After that completed, no traffic went to the cluster at all for ~8 hours, and throughout today we may get a couple (fewer than 10) queries per second and maybe 3-4 write batches per hour.
>>>> 
>>>> I assume the last value in the Partition Size histogram is the largest row:
>>>> 
>>>> 20924300 bytes: 79
>>>> 25109160 bytes: 57
>>>> 
>>>> The majority seems clustered around 200000 bytes.
>>>> 
>>>> I will look at switching my inserts to unlogged batches since they are 
>>>> always for one partition key.
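>>>> Something like this is what I have in mind (just a sketch; frame_data stands in for the real per-frame columns and the values are placeholders):
>>>> 
>>>>     BEGIN UNLOGGED BATCH
>>>>       -- every statement shares the same partition key (id), so the whole batch lands on one replica set
>>>>       INSERT INTO media.media_tracks_raw (id, trackid, "timestamp", frame_data)
>>>>       VALUES (74fe9449-8ac4-accb-a723-4bad024101e3, 0946a0ea-6b05-4b37-93b3-c71e0c8a3b1a, '2014-12-16 00:00:00', 'frame 0');
>>>>       INSERT INTO media.media_tracks_raw (id, trackid, "timestamp", frame_data)
>>>>       VALUES (74fe9449-8ac4-accb-a723-4bad024101e3, 0946a0ea-6b05-4b37-93b3-c71e0c8a3b1a, '2014-12-16 00:00:01', 'frame 1');
>>>>     APPLY BATCH;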
>>>> 
>>>> On Tue, Dec 16, 2014 at 1:47 PM, Ryan Svihla <rsvi...@datastax.com> wrote:
>>>> Can you define what "virtually no traffic" is? Sorry to be repetitive about that, but I've worked on a lot of clusters in the past year and people have wildly different ideas of what that means.
>>>> 
>>>> Unlogged batches of the same partition key are definitely a performance optimization. Typically async is much faster and easier on the cluster when you're using multi-partition-key batches.
>>>> 
>>>> nodetool cfhistograms <keyspace> <tablename>
>>>> 
>>>> On Tue, Dec 16, 2014 at 3:42 PM, Arne Claassen <a...@emotient.com> wrote:
>>>> Actually not sure why the machine was originally configured at 6GB since 
>>>> we even started it on an r3.large with 15GB.
>>>> 
>>>> Re: Batches
>>>> 
>>>> Not using batches. I actually have that as a separate question on the 
>>>> list. Currently I fan out async single inserts and I'm wondering if 
>>>> batches are better since my data is inherently inserted in blocks of 
>>>> ordered rows for a single partition key.
>>>> 
>>>> 
>>>> Re: Traffic
>>>> 
>>>> There isn't all that much traffic. Inserts come in as blocks per partition key, but those can be 5k-200k rows for that partition key. Each of these rows is less than 100k. It's small, lots of ordered rows: frame and sub-frame information for media, and the rows for one piece of media (the partition key) are inserted at once.
>>>> 
>>>> For the last 12 hours, while the load on all these machines has been stuck, there's been virtually no traffic at all. The nodes are basically sitting idle, except that they each had a load of 4.
>>>> 
>>>> BTW, how do you determine the widest row or, for that matter, the number of tombstones in a row?
>>>> 
>>>> thanks,
>>>> arne
>>>> 
>>>> On Tue, Dec 16, 2014 at 1:24 PM, Ryan Svihla <rsvi...@datastax.com> wrote:
>>>> So 1024 is still a good 2.5 times what I'm suggesting, and 6GB is hardly enough to run Cassandra well in, especially if you're going full bore on loads. However, you may just flat out be CPU-bound on your write throughput: how many TPS and what size writes do you have? Also, what is your widest row?
>>>> 
>>>> Final question: what is compaction throughput at?
>>>> 
>>>> 
>>>> On Tue, Dec 16, 2014 at 3:20 PM, Arne Claassen <a...@emotient.com> wrote:
>>>> The starting configuration I had, which is still running on two of the nodes, was a 6GB heap and 1024MB ParNew, which is close to what you are suggesting, and those have been pegged at load 4 for over 12 hours with hardly any read or write traffic. I will set one to 8GB/400MB and see if its load changes.
>>>> 
>>>> On Tue, Dec 16, 2014 at 1:12 PM, Ryan Svihla <rsvi...@datastax.com> wrote:
>>>> So a heap of that size without some tuning will create a number of problems (high CPU usage being one of them). I suggest either an 8GB heap and 400MB ParNew (which I'd only set that low for that low a CPU count), or attempt the tunings indicated in 
>>>> https://issues.apache.org/jira/browse/CASSANDRA-8150
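>>>> In cassandra-env.sh that would look roughly like this (assuming you set the sizes there rather than through JVM flags), followed by a node restart:
>>>> 
>>>>     MAX_HEAP_SIZE="8G"
>>>>     HEAP_NEWSIZE="400M"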
>>>> 
>>>> On Tue, Dec 16, 2014 at 3:06 PM, Arne Claassen <a...@emotient.com> wrote:
>>>> Changed the 15GB node to a 25GB heap and the nice CPU is down to ~20% now. I checked my dev cluster to see if the ParNew log entries are just par for the course, but I'm not seeing them there. However, both have the following every 30 seconds:
>>>> 
>>>> DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898 BatchlogManager.java (line 165) Started replayAllFailedBatches
>>>> DEBUG [MemtablePostFlusher:1] 2014-12-16 21:00:44,899 ColumnFamilyStore.java (line 866) forceFlush requested but everything is clean in batchlog
>>>> DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,899 BatchlogManager.java (line 200) Finished replayAllFailedBatches
>>>> 
>>>> Is that just routine scheduled house-keeping or a sign of something else?
>>>> 
>>>> On Tue, Dec 16, 2014 at 12:52 PM, Arne Claassen <a...@emotient.com> wrote:
>>>> Sorry, I meant a 15GB heap on the one machine that has less nice CPU% now. The others are at 6GB.
>>>> 
>>>> On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen <a...@emotient.com> wrote:
>>>> AWS r3.xlarge, 30GB, but only using a heap of 10GB and a new gen of 2GB, because we might go c3.2xlarge instead if CPU is more important than RAM.
>>>> Storage is EBS-optimized SSD (but iostat shows no real IO going on).
>>>> Each node only has about 10GB, with ownership of 67%, 64.7% & 68.3%.
>>>> 
>>>> On the node where I set the heap to 10GB from 6GB, the utilization has dropped to 46% nice now, but the ParNew log messages still continue at the same pace. I'm going to up the heap to 20GB for a bit and see if that brings that nice CPU further down.
>>>> 
>>>> No TombstoneOverflowingExceptions.
>>>> 
>>>> On Tue, Dec 16, 2014 at 11:50 AM, Ryan Svihla <rsvi...@datastax.com> wrote:
>>>> What's CPU, RAM, Storage layer, and data density per node? Exact heap 
>>>> settings would be nice. In the logs look for TombstoneOverflowingException
>>>> 
>>>> 
>>>> On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen <a...@emotient.com> wrote:
>>>> I'm running 2.0.10.
>>>> 
>>>> The data is all time series data, and as we change our pipeline we've periodically been reprocessing the data sources, which causes each time series to be overwritten, i.e. every row per partition key is deleted and re-written, so I assume I've been collecting a bunch of tombstones.
>>>> 
>>>> Also, I assumed the ever-present and never-completing compaction tasks were an artifact of tombstoning, but I fully admit that's conjecture based on the ~20 blog posts and Stack Overflow questions I've surveyed.
>>>> 
>>>> I doubled the Heap on one node and it changed nothing regarding the load 
>>>> or the ParNew log statements. New Generation Usage is 50%, Eden itself is 
>>>> 56%.
>>>> 
>>>> If there's anything else I should look at and report, let me know.
>>>> 
>>>> On Tue, Dec 16, 2014 at 11:14 AM, Jonathan Lacefield 
>>>> <jlacefi...@datastax.com> wrote:
>>>> Hello,
>>>> 
>>>>   What version of Cassandra are you running?  
>>>> 
>>>>   If it's 2.0, we recently experienced something similar with 8447 [1], 
>>>> which 8485 [2] should hopefully resolve.  
>>>> 
>>>>   Please note that 8447 is not related to tombstones.  Tombstone 
>>>> processing can put a lot of pressure on the heap as well. Why do you think 
>>>> you have a lot of tombstones in that one particular table?
>>>> 
>>>>   [1] https://issues.apache.org/jira/browse/CASSANDRA-8447
>>>>   [2] https://issues.apache.org/jira/browse/CASSANDRA-8485
>>>> 
>>>> Jonathan
>>>> 
>>>> 
>>>> Jonathan Lacefield
>>>> Solution Architect | (404) 822 3487 | jlacefi...@datastax.com
>>>> 
>>>>      
>>>> 
>>>> On Tue, Dec 16, 2014 at 2:04 PM, Arne Claassen <a...@emotient.com> wrote:
>>>> I have a three-node cluster where each node has been sitting at a load of 4 and 100% CPU utilization (although 92% nice) for the last 12 hours, ever since some significant writes finished. I'm trying to determine what tuning I should be doing to get it out of this state. The debug log is just an endless series of:
>>>> 
>>>> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634880
>>>> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java (line 118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; max is 8000634880
>>>> DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java (line 118) GC for ParNew: 135 ms for 8 collections, 4402220568 used; max is 8000634880
>>>> 
>>>> iostat shows virtually no I/O.
>>>> 
>>>> Compaction may enter into this, but I don't really know what to make of the compaction stats since they never change:
>>>> 
>>>> [root@cassandra-37919c3a ~]# nodetool compactionstats
>>>> pending tasks: 10
>>>>           compaction type        keyspace              table       completed         total   unit   progress
>>>>                Compaction           media   media_tracks_raw       271651482     563615497   bytes     48.20%
>>>>                Compaction           media   media_tracks_raw        30308910   21676695677   bytes      0.14%
>>>>                Compaction           media   media_tracks_raw      1198384080    1815603161   bytes     66.00%
>>>> Active compaction remaining time :   0h22m24s
>>>> 
>>>> 5 minutes later:
>>>> 
>>>> [root@cassandra-37919c3a ~]# nodetool compactionstats
>>>> pending tasks: 9
>>>>           compaction type        keyspace              table       completed         total   unit   progress
>>>>                Compaction           media   media_tracks_raw       271651482     563615497   bytes     48.20%
>>>>                Compaction           media   media_tracks_raw        30308910   21676695677   bytes      0.14%
>>>>                Compaction           media   media_tracks_raw      1198384080    1815603161   bytes     66.00%
>>>> Active compaction remaining time :   0h22m24s
>>>> 
>>>> Sure, the pending tasks went down by one, but the rest is identical. media_tracks_raw likely has a bunch of tombstones (I can't figure out how to get stats on that).
>>>> 
>>>> Is this behavior something that indicates I need a larger heap or a larger new generation? Should I be manually running compaction on tables with lots of tombstones?
>>>> 
>>>> Any suggestions or places to educate myself better on performance tuning 
>>>> would be appreciated.
>>>> 
>>>> arne
>>>> 
>>>> 
>>>> -- 
>>>> Ryan Svihla
>>>> Solution Architect
>>>> 
>>>> DataStax is the fastest, most scalable distributed database technology, delivering Apache Cassandra to the world's most innovative enterprises. DataStax is built to be agile, always-on, and predictably scalable to any size. With more than 500 customers in 45 countries, DataStax is the database technology and transactional backbone of choice for the world's most innovative companies such as Netflix, Adobe, Intuit, and eBay.
> 
