What version of Cassandra?

On Dec 16, 2014 6:36 PM, "Arne Claassen" <a...@emotient.com> wrote:
That's just the thing. There is nothing in the logs except the constant ParNew collections, like

    DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634888

But the load is staying continuously high.

There's always some compaction going on for just that one table, media_tracks_raw, and those values rarely change (certainly the remaining time is meaningless):

    pending tasks: 17
      compaction type   keyspace   table              completed    total        unit   progress
           Compaction    media     media_tracks_raw    444294932    1310653468  bytes    33.90%
           Compaction    media     media_tracks_raw    131931354    3411631999  bytes     3.87%
           Compaction    media     media_tracks_raw     30308970   23097672194  bytes     0.13%
           Compaction    media     media_tracks_raw    899216961    1815591081  bytes    49.53%
    Active compaction remaining time : 0h27m56s

Here's a sample of a query trace:

    activity | timestamp | source | source_elapsed
    ---------------------------------------------------------------------------+--------------+---------------+---------------
    execute_cql3_query | 00:11:46,612 | 10.140.22.236 | 0
    Parsing select * from media_tracks_raw where id = 74fe9449-8ac4-accb-a723-4bad024101e3 limit 100; | 00:11:46,612 | 10.140.22.236 | 47
    Preparing statement | 00:11:46,612 | 10.140.22.236 | 234
    Sending message to /10.140.21.54 | 00:11:46,619 | 10.140.22.236 | 7190
    Message received from /10.140.22.236 | 00:11:46,622 | 10.140.21.54 | 12
    Executing single-partition query on media_tracks_raw | 00:11:46,644 | 10.140.21.54 | 21971
    Acquiring sstable references | 00:11:46,644 | 10.140.21.54 | 22029
    Merging memtable tombstones | 00:11:46,644 | 10.140.21.54 | 22131
    Bloom filter allows skipping sstable 1395 | 00:11:46,644 | 10.140.21.54 | 22245
    Bloom filter allows skipping sstable 1394 | 00:11:46,644 | 10.140.21.54 | 22279
    Bloom filter allows skipping sstable 1391 | 00:11:46,644 | 10.140.21.54 | 22293
    Bloom filter allows skipping sstable 1381 | 00:11:46,644 | 10.140.21.54 | 22304
    Bloom filter allows skipping sstable 1376 | 00:11:46,644 | 10.140.21.54 | 22317
    Bloom filter allows skipping sstable 1368 | 00:11:46,644 | 10.140.21.54 | 22328
    Bloom filter allows skipping sstable 1365 | 00:11:46,644 | 10.140.21.54 | 22340
    Bloom filter allows skipping sstable 1351 | 00:11:46,644 | 10.140.21.54 | 22352
    Bloom filter allows skipping sstable 1367 | 00:11:46,644 | 10.140.21.54 | 22363
    Bloom filter allows skipping sstable 1380 | 00:11:46,644 | 10.140.21.54 | 22374
    Bloom filter allows skipping sstable 1343 | 00:11:46,644 | 10.140.21.54 | 22386
    Bloom filter allows skipping sstable 1342 | 00:11:46,644 | 10.140.21.54 | 22397
    Bloom filter allows skipping sstable 1334 | 00:11:46,644 | 10.140.21.54 | 22408
    Bloom filter allows skipping sstable 1377 | 00:11:46,644 | 10.140.21.54 | 22429
    Bloom filter allows skipping sstable 1330 | 00:11:46,644 | 10.140.21.54 | 22441
    Bloom filter allows skipping sstable 1329 | 00:11:46,644 | 10.140.21.54 | 22452
    Bloom filter allows skipping sstable 1328 | 00:11:46,644 | 10.140.21.54 | 22463
    Bloom filter allows skipping sstable 1327 | 00:11:46,644 | 10.140.21.54 | 22475
    Bloom filter allows skipping sstable 1326 | 00:11:46,644 | 10.140.21.54 | 22488
    Bloom filter allows skipping sstable 1320 | 00:11:46,644 | 10.140.21.54 | 22506
    Bloom filter allows skipping sstable 1319 | 00:11:46,644 | 10.140.21.54 | 22518
    Bloom filter allows skipping sstable 1318 | 00:11:46,644 | 10.140.21.54 | 22528
    Bloom filter allows skipping sstable 1317 | 00:11:46,644 | 10.140.21.54 | 22540
    Bloom filter allows skipping sstable 1316 | 00:11:46,644 | 10.140.21.54 | 22552
    Bloom filter allows skipping sstable 1315 | 00:11:46,644 | 10.140.21.54 | 22563
    Bloom filter allows skipping sstable 1314 | 00:11:46,644 | 10.140.21.54 | 22572
    Bloom filter allows skipping sstable 1313 | 00:11:46,644 | 10.140.21.54 | 22583
    Bloom filter allows skipping sstable 1312 | 00:11:46,644 | 10.140.21.54 | 22594
    Bloom filter allows skipping sstable 1311 | 00:11:46,644 | 10.140.21.54 | 22605
    Bloom filter allows skipping sstable 1310 | 00:11:46,644 | 10.140.21.54 | 22616
    Bloom filter allows skipping sstable 1309 | 00:11:46,644 | 10.140.21.54 | 22628
    Bloom filter allows skipping sstable 1308 | 00:11:46,644 | 10.140.21.54 | 22640
    Bloom filter allows skipping sstable 1307 | 00:11:46,644 | 10.140.21.54 | 22651
    Bloom filter allows skipping sstable 1306 | 00:11:46,644 | 10.140.21.54 | 22663
    Bloom filter allows skipping sstable 1305 | 00:11:46,644 | 10.140.21.54 | 22674
    Bloom filter allows skipping sstable 1304 | 00:11:46,644 | 10.140.21.54 | 22684
    Bloom filter allows skipping sstable 1303 | 00:11:46,644 | 10.140.21.54 | 22696
    Bloom filter allows skipping sstable 1302 | 00:11:46,644 | 10.140.21.54 | 22707
    Bloom filter allows skipping sstable 1301 | 00:11:46,644 | 10.140.21.54 | 22718
    Bloom filter allows skipping sstable 1300 | 00:11:46,644 | 10.140.21.54 | 22729
    Bloom filter allows skipping sstable 1299 | 00:11:46,644 | 10.140.21.54 | 22740
    Bloom filter allows skipping sstable 1298 | 00:11:46,644 | 10.140.21.54 | 22752
    Bloom filter allows skipping sstable 1297 | 00:11:46,644 | 10.140.21.54 | 22763
    Bloom filter allows skipping sstable 1296 | 00:11:46,644 | 10.140.21.54 | 22774
    Key cache hit for sstable 1295 | 00:11:46,644 | 10.140.21.54 | 22817
    Seeking to partition beginning in data file | 00:11:46,644 | 10.140.21.54 | 22842
    Skipped 0/89 non-slice-intersecting sstables, included 0 due to tombstones | 00:11:46,646 | 10.140.21.54 | 24109
    Merging data from memtables and 1 sstables | 00:11:46,646 | 10.140.21.54 | 24238
    Read 101 live and 0 tombstoned cells | 00:11:46,663 | 10.140.21.54 | 41389
    Enqueuing response to /10.140.22.236 | 00:11:46,663 | 10.140.21.54 | 41831
    Sending message to /10.140.22.236 | 00:11:46,664 | 10.140.21.54 | 41972
    Message received from /10.140.21.54 | 00:11:46,671 | 10.140.22.236 | 59498
    Processing response from /10.140.21.54 | 00:11:46,672 | 10.140.22.236 | 59563
    Request complete | 00:11:46,704 | 10.140.22.236 | 92781

Every query I ran had just three mentions of tombstones:

    Merging memtable tombstones
    Skipped 0/89 non-slice-intersecting sstables, included 0 due to tombstones
    Read 101 live and 0 tombstoned cells

And unless I misread those, none of them claim that there are any tombstones.

On Dec 16, 2014, at 4:26 PM, Ryan Svihla <rsvi...@datastax.com> wrote:

Manual forced compactions create more problems than they solve. If you have no evidence of tombstones in your selects (which seems odd, can you share some of the tracing output?), then I'm not sure what it would solve for you.

Compaction running could explain a high load. Log messages with ERROR, WARN, or GCInspector are all meaningful there; I suggest searching JIRA for your version to see if there are any interesting bugs.
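For anyone wanting to reproduce a trace like the one above, this is roughly the cqlsh session that produces it (keyspace name and id are taken from the trace output earlier in the thread; cqlsh prints the event listing after the result set):

    TRACING ON;
    USE media;
    SELECT * FROM media_tracks_raw
     WHERE id = 74fe9449-8ac4-accb-a723-4bad024101e3
     LIMIT 100;
    TRACING OFF;

The "Read N live and M tombstoned cells" line is the quickest tombstone indicator in that output; the "Skipped ... included N due to tombstones" line is the other one worth watching.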
On Tue, Dec 16, 2014 at 6:14 PM, Arne Claassen <a...@emotient.com> wrote:

I just did a wide set of selects and ran across no tombstones. But while on the subject of gc_grace_seconds: any reason, on a small cluster, not to set it to something low like a single day? It seems like 10 days is only needed for large clusters undergoing long partition splits, or am I misunderstanding gc_grace_seconds?

Now, given all that, does any of this explain a high load when the cluster is idle? Is it compaction catching up, and would manual forced compaction alleviate that?

thanks,
arne

On Dec 16, 2014, at 3:28 PM, Ryan Svihla <rsvi...@datastax.com> wrote:

So a delete is really another write for gc_grace_seconds (default 10 days); if you get enough tombstones it can make managing your cluster a challenge as is. Open up cqlsh, turn on tracing and try a few queries: how many tombstones are scanned for a given query? It's possible the heap problems you're seeing are actually happening on the query side and not on the ingest side. The severity of this depends on driver and Cassandra version, but older drivers and versions of Cassandra could easily overload the heap with expensive selects; layered over tombstones, it certainly becomes a possibility that this is your root cause.

Now this will primarily create more load on compaction, and depending on your Cassandra version there may be some other issue at work, but something I can tell you is that every time I see 1 dropped mutation I see a cluster that was overloaded enough it had to shed load. If I see 200k, I see a cluster/configuration/hardware that is badly overloaded.

I suggest the following:

- trace some of the queries used in prod
- monitor your ingest rate; see at what levels you run into issues (GCInspector log messages, dropped mutations, etc.)
- heap configuration we mentioned earlier: go ahead and monitor heap usage; if it hits 75% repeatedly this is an indication of heavy load
- monitor dropped mutations: any dropped mutation is evidence of an overloaded server; again, the root cause can be many other problems that are solvable with current hardware, and LOTS of people run with nodes with similar configuration

On Tue, Dec 16, 2014 at 5:08 PM, Arne Claassen <a...@emotient.com> wrote:

Not using any secondary indices, and memtable_flush_queue_size is the default 4.

But let me tell you how data is "mutated" right now; maybe that will give you an insight on how this is happening.

Basically the frame data table has the following primary key: PRIMARY KEY ((id), trackid, "timestamp")

Generally data is inserted once, so day-to-day writes are all new rows. However, when our process for generating analytics for these rows changes, we run the media back through again, causing overwrites.

Up until last night, this was just a new insert, because the PK never changed, so it was always a 1-to-1 overwrite of every row.

Last night was the first time that a new change went in where the PK could actually change, so now the process is always: DELETE by partition key, insert all rows for that partition key, repeat.

We have two tables that have similar frame data projections and some other aggregates with a much smaller row count per partition key.

hope that helps,
arne
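In CQL terms, the delete-then-reinsert cycle described above looks roughly like the sketch below. The trackid value and the frame_data column (and their types) are assumptions for illustration only; the partition key value is the one from the trace earlier in the thread. Note that each partition-wide DELETE writes a tombstone that is carried for gc_grace_seconds, which is what the question about lowering it to a day comes down to; the ALTER TABLE line shows that one-day setting.

    -- wipe the whole partition for one piece of media, then re-insert its rows
    DELETE FROM media.media_tracks_raw
     WHERE id = 74fe9449-8ac4-accb-a723-4bad024101e3;

    -- repeated for every (trackid, "timestamp") row of that media;
    -- frame_data is a hypothetical stand-in for the real value columns
    INSERT INTO media.media_tracks_raw (id, trackid, "timestamp", frame_data)
    VALUES (74fe9449-8ac4-accb-a723-4bad024101e3, 1, '2014-12-16 19:03:35', 0x00);

    -- if a 10-day tombstone lifetime is longer than this cluster needs,
    -- gc_grace_seconds can be lowered per table (86400 seconds = 1 day)
    ALTER TABLE media.media_tracks_raw WITH gc_grace_seconds = 86400;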
On Dec 16, 2014, at 2:46 PM, Ryan Svihla <rsvi...@datastax.com> wrote:

So you've got some blocked flush writers, but you have an incredibly large number of dropped mutations. Are you using secondary indexes, and if so, how many? What is your flush queue set to?

On Tue, Dec 16, 2014 at 4:43 PM, Arne Claassen <a...@emotient.com> wrote:

Of course QA decided to start a test batch (still relatively low traffic), so I hope it doesn't throw the tpstats off too much.

Node 1:

    Pool Name                Active  Pending  Completed  Blocked  All time blocked
    MutationStage                 0        0   13804928        0                 0
    ReadStage                     0        0      10975        0                 0
    RequestResponseStage          0        0    7725378        0                 0
    ReadRepairStage               0        0       1247        0                 0
    ReplicateOnWriteStage         0        0          0        0                 0
    MiscStage                     0        0          0        0                 0
    HintedHandoff                 1        1         50        0                 0
    FlushWriter                   0        0        306        0                31
    MemoryMeter                   0        0        719        0                 0
    GossipStage                   0        0     286505        0                 0
    CacheCleanupExecutor          0        0          0        0                 0
    InternalResponseStage         0        0          0        0                 0
    CompactionExecutor            4       14        159        0                 0
    ValidationExecutor            0        0          0        0                 0
    MigrationStage                0        0          0        0                 0
    commitlog_archiver            0        0          0        0                 0
    AntiEntropyStage              0        0          0        0                 0
    PendingRangeCalculator        0        0         11        0                 0
    MemtablePostFlusher           0        0       1781        0                 0

    Message type        Dropped
    READ                      0
    RANGE_SLICE               0
    _TRACE                    0
    MUTATION             391041
    COUNTER_MUTATION          0
    BINARY                    0
    REQUEST_RESPONSE          0
    PAGED_RANGE               0
    READ_REPAIR               0

Node 2:

    Pool Name                Active  Pending  Completed  Blocked  All time blocked
    MutationStage                 0        0     997042        0                 0
    ReadStage                     0        0       2623        0                 0
    RequestResponseStage          0        0     706650        0                 0
    ReadRepairStage               0        0        275        0                 0
    ReplicateOnWriteStage         0        0          0        0                 0
    MiscStage                     0        0          0        0                 0
    HintedHandoff                 2        2         12        0                 0
    FlushWriter                   0        0         37        0                 4
    MemoryMeter                   0        0         70        0                 0
    GossipStage                   0        0      14927        0                 0
    CacheCleanupExecutor          0        0          0        0                 0
    InternalResponseStage         0        0          0        0                 0
    CompactionExecutor            4        7         94        0                 0
    ValidationExecutor            0        0          0        0                 0
    MigrationStage                0        0          0        0                 0
    commitlog_archiver            0        0          0        0                 0
    AntiEntropyStage              0        0          0        0                 0
    PendingRangeCalculator        0        0          3        0                 0
    MemtablePostFlusher           0        0        114        0                 0

    Message type        Dropped
    READ                      0
    RANGE_SLICE               0
    _TRACE                    0
    MUTATION                  0
    COUNTER_MUTATION          0
    BINARY                    0
    REQUEST_RESPONSE          0
    PAGED_RANGE               0
    READ_REPAIR               0

Node 3:

    Pool Name                Active  Pending  Completed  Blocked  All time blocked
    MutationStage                 0        0    1539324        0                 0
    ReadStage                     0        0       2571        0                 0
    RequestResponseStage          0        0     373300        0                 0
    ReadRepairStage               0        0        325        0                 0
    ReplicateOnWriteStage         0        0          0        0                 0
    MiscStage                     0        0          0        0                 0
    HintedHandoff                 1        1         21        0                 0
    FlushWriter                   0        0         38        0                 5
    MemoryMeter                   0        0         59        0                 0
    GossipStage                   0        0      21491        0                 0
    CacheCleanupExecutor          0        0          0        0                 0
    InternalResponseStage         0        0          0        0                 0
    CompactionExecutor            4        9         85        0                 0
    ValidationExecutor            0        0          0        0                 0
    MigrationStage                0        0          0        0                 0
    commitlog_archiver            0        0          0        0                 0
    AntiEntropyStage              0        0          0        0                 0
    PendingRangeCalculator        0        0          6        0                 0
    MemtablePostFlusher           0        0        164        0                 0

    Message type        Dropped
    READ                      0
    RANGE_SLICE               0
    _TRACE                    0
    MUTATION             205259
    COUNTER_MUTATION          0
    BINARY                    0
    REQUEST_RESPONSE          0
    PAGED_RANGE               0
    READ_REPAIR              18

Compaction seems like the only thing consistently active and pending.

On Tue, Dec 16, 2014 at 2:18 PM, Ryan Svihla <rsvi...@datastax.com> wrote:

OK, based on those numbers I have a theory. Can you show me nodetool tpstats for all 3 nodes?

On Tue, Dec 16, 2014 at 4:04 PM, Arne Claassen <a...@emotient.com> wrote:

No problem with the follow-up questions. I'm on a crash course here trying to understand what makes C* tick, so I appreciate all feedback.

We reprocessed all media (1200 partition keys) last night, where partition keys had somewhere between 4k and 200k "rows". After that completed, no traffic went to the cluster at all for ~8 hours, and throughout today we may get a couple (less than 10) queries per second and maybe 3-4 write batches per hour.

I assume the last value in the Partition Size histogram is the largest row:

    20924300 bytes: 79
    25109160 bytes: 57

The majority seems clustered around 200000 bytes.

I will look at switching my inserts to unlogged batches since they are always for one partition key.

On Tue, Dec 16, 2014 at 1:47 PM, Ryan Svihla <rsvi...@datastax.com> wrote:

Can you define what "virtually no traffic" is? Sorry to be repetitive about that, but I've worked on a lot of clusters in the past year and people have wildly different ideas what that means.

Unlogged batches of the same partition key are definitely a performance optimization. Typically async is much faster and easier on the cluster when you're using multi-partition-key batches.

nodetool cfhistograms <keyspace> <tablename>

On Tue, Dec 16, 2014 at 3:42 PM, Arne Claassen <a...@emotient.com> wrote:

Actually not sure why the machine was originally configured at 6GB, since we even started it on an r3.large with 15GB.

Re: Batches

Not using batches. I actually have that as a separate question on the list. Currently I fan out async single inserts, and I'm wondering if batches are better since my data is inherently inserted in blocks of ordered rows for a single partition key.

Re: Traffic

There isn't all that much traffic. Inserts come in as blocks per partition key, but those can be 5k-200k rows for that partition key. Each of these rows is less than 100k. It's small, lots of ordered rows: frame and sub-frame information for media, and the rows for one piece of media (the partition key) are inserted at once.

For the last 12 hours, where the load on all these machines has been stuck, there's been virtually no traffic at all. This is the nodes basically sitting idle, except that they had a load of 4 each.

BTW, how do you determine the widest row, or for that matter the number of tombstones in a row?

thanks,
arne
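Since the writes for a given media id always target a single partition, the unlogged-batch variant being considered above would look something like the sketch below (same assumed columns as in the earlier sketch; the point is that every statement in the batch shares the partition key, so the whole batch lands on one replica set):

    BEGIN UNLOGGED BATCH
      INSERT INTO media.media_tracks_raw (id, trackid, "timestamp", frame_data)
      VALUES (74fe9449-8ac4-accb-a723-4bad024101e3, 1, '2014-12-16 19:03:35', 0x00);
      INSERT INTO media.media_tracks_raw (id, trackid, "timestamp", frame_data)
      VALUES (74fe9449-8ac4-accb-a723-4bad024101e3, 1, '2014-12-16 19:03:36', 0x01);
    APPLY BATCH;

For the widest-row question, the concrete form of the command Ryan mentions would be "nodetool cfhistograms media media_tracks_raw", whose partition-size section is presumably where the 20924300/25109160-byte figures above came from.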
On Tue, Dec 16, 2014 at 1:24 PM, Ryan Svihla <rsvi...@datastax.com> wrote:

So 1024 is still a good 2.5 times what I'm suggesting, and 6GB is hardly enough to run Cassandra well in, especially if you're going full bore on loads. However, you may just flat out be CPU bound on your write throughput. How many TPS and what size writes do you have? Also, what is your widest row?

Final question: what is compaction throughput at?

On Tue, Dec 16, 2014 at 3:20 PM, Arne Claassen <a...@emotient.com> wrote:

The starting configuration I had, which is still running on two of the nodes, was a 6GB heap and 1024MB ParNew, which is close to what you are suggesting, and those have been pegged at load 4 for over 12 hours with hardly any read or write traffic. I will set one to 8GB/400MB and see if its load changes.

On Tue, Dec 16, 2014 at 1:12 PM, Ryan Svihla <rsvi...@datastax.com> wrote:

So a heap of that size without some tuning will create a number of problems (high CPU usage being one of them). I suggest either an 8GB heap and 400MB ParNew (which I'd only set that low for that low a CPU count), or attempt the tunings as indicated in https://issues.apache.org/jira/browse/CASSANDRA-8150

On Tue, Dec 16, 2014 at 3:06 PM, Arne Claassen <a...@emotient.com> wrote:

Changed the 15GB node to a 25GB heap and the nice CPU is down to ~20% now. Checked my dev cluster to see if the ParNew log entries are just par for the course, but I'm not seeing them there. However, both have the following every 30 seconds:

    DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898 BatchlogManager.java (line 165) Started replayAllFailedBatches
    DEBUG [MemtablePostFlusher:1] 2014-12-16 21:00:44,899 ColumnFamilyStore.java (line 866) forceFlush requested but everything is clean in batchlog
    DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,899 BatchlogManager.java (line 200) Finished replayAllFailedBatches

Is that just routine scheduled housekeeping or a sign of something else?

On Tue, Dec 16, 2014 at 12:52 PM, Arne Claassen <a...@emotient.com> wrote:

Sorry, I meant a 15GB heap on the one machine that has less nice CPU% now. The others are 6GB.

On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen <a...@emotient.com> wrote:

AWS r3.xlarge, 30GB, but only using a heap of 10GB, new 2GB, because we might go c3.2xlarge instead if CPU is more important than RAM. Storage is optimized EBS SSD (but iostat shows no real I/O going on). Each node only has about 10GB with ownership of 67%, 64.7% & 68.3%.
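For reference, the 8GB/400MB combination suggested above is set in conf/cassandra-env.sh by overriding the auto-calculated sizing; a minimal sketch with just the values from this thread, everything else left at defaults:

    # conf/cassandra-env.sh -- override the calculated heap and new-gen sizes
    MAX_HEAP_SIZE="8G"
    HEAP_NEWSIZE="400M"

The compaction-throughput question can be checked with "nodetool getcompactionthroughput" and adjusted on a running node with "nodetool setcompactionthroughput", no restart needed.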
On the node where I set the heap to 10GB from 6GB, utilization has dropped to 46% nice now, but the ParNew log messages still continue at the same pace. I'm going to up the heap to 20GB for a bit and see if that brings the nice CPU further down.

No TombstoneOverflowingExceptions.

On Tue, Dec 16, 2014 at 11:50 AM, Ryan Svihla <rsvi...@datastax.com> wrote:

What's CPU, RAM, storage layer, and data density per node? Exact heap settings would be nice. In the logs, look for TombstoneOverflowingException.

On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen <a...@emotient.com> wrote:

I'm running 2.0.10.

The data is all time series data, and as we change our pipeline we've periodically been reprocessing the data sources, which causes each time series to be overwritten, i.e. every row per partition key is deleted and re-written, so I assume I've been collecting a bunch of tombstones.

Also, the ever-present and never-completing compactions I assumed were an artifact of tombstoning, but I fully admit that's conjecture based on the ~20 blog posts and Stack Overflow questions I've surveyed.

I doubled the heap on one node and it changed nothing regarding the load or the ParNew log statements. New generation usage is 50%, Eden itself is 56%.

Anything else I should look at and report, let me know.

On Tue, Dec 16, 2014 at 11:14 AM, Jonathan Lacefield <jlacefi...@datastax.com> wrote:

Hello,

What version of Cassandra are you running?

If it's 2.0, we recently experienced something similar with 8447 [1], which 8485 [2] should hopefully resolve.

Please note that 8447 is not related to tombstones. Tombstone processing can put a lot of pressure on the heap as well. Why do you think you have a lot of tombstones in that one particular table?
[1] https://issues.apache.org/jira/browse/CASSANDRA-8447
[2] https://issues.apache.org/jira/browse/CASSANDRA-8485

Jonathan

On Tue, Dec 16, 2014 at 2:04 PM, Arne Claassen <a...@emotient.com> wrote:

I have a three node cluster that has been sitting at a load of 4 (for each node) and 100% CPU utilization (although 92% nice) for the last 12 hours, ever since some significant writes finished. I'm trying to determine what tuning I should be doing to get it out of this state. The debug log is just an endless series of:

    DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634880
    DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java (line 118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; max is 8000634880
    DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java (line 118) GC for ParNew: 135 ms for 8 collections, 4402220568 used; max is 8000634880

iostat shows virtually no I/O.
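One way to watch that ParNew activity directly, rather than through the GCInspector debug lines, is plain jstat against the Cassandra JVM (the pgrep expression is just one way to find the pid):

    # sample young/old generation occupancy and GC counts/times once a second
    jstat -gcutil $(pgrep -f CassandraDaemon) 1000

If the Eden column fills and drains every second or two while the cluster is otherwise idle, that matches the constant ParNew collections reported in the log above.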
Compaction may enter into this, but I don't really know what to make of the compaction stats, since they never change:

    [root@cassandra-37919c3a ~]# nodetool compactionstats
    pending tasks: 10
      compaction type   keyspace   table              completed     total        unit   progress
           Compaction    media     media_tracks_raw    271651482      563615497  bytes    48.20%
           Compaction    media     media_tracks_raw     30308910    21676695677  bytes     0.14%
           Compaction    media     media_tracks_raw   1198384080     1815603161  bytes    66.00%
    Active compaction remaining time : 0h22m24s

5 minutes later:

    [root@cassandra-37919c3a ~]# nodetool compactionstats
    pending tasks: 9
      compaction type   keyspace   table              completed     total        unit   progress
           Compaction    media     media_tracks_raw    271651482      563615497  bytes    48.20%
           Compaction    media     media_tracks_raw     30308910    21676695677  bytes     0.14%
           Compaction    media     media_tracks_raw   1198384080     1815603161  bytes    66.00%
    Active compaction remaining time : 0h22m24s

Sure, the pending tasks went down by one, but the rest is identical. media_tracks_raw likely has a bunch of tombstones (I can't figure out how to get stats on that).

Is this behavior something that indicates that I need more heap, or a larger new generation? Should I be manually running compaction on tables with lots of tombstones?

Any suggestions or places to educate myself better on performance tuning would be appreciated.

arne
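On the "can't figure out how to get stats on that" point: if the sstablemetadata tool that ships with this Cassandra version is available (under tools/bin in the tarball; the data path and sstable file name below are purely illustrative, not taken from this cluster), it reports an estimated droppable-tombstone ratio per sstable:

    # inspect one of the table's sstables; look for "Estimated droppable tombstones" in the output
    sstablemetadata /var/lib/cassandra/data/media/media_tracks_raw/media-media_tracks_raw-jb-1295-Data.db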