What version of Cassandra?

On Dec 16, 2014 6:36 PM, "Arne Claassen" <a...@emotient.com> wrote:
That's just the thing. There is nothing in the logs except the constant ParNew collections, like

    DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634888

But the load is staying continuously high.

There's always some compaction going on for just that one table, media_tracks_raw, and those values rarely change (certainly the remaining time is meaningless):

    pending tasks: 17
      compaction type   keyspace   table              completed    total        unit   progress
           Compaction    media     media_tracks_raw    444294932    1310653468  bytes    33.90%
           Compaction    media     media_tracks_raw    131931354    3411631999  bytes     3.87%
           Compaction    media     media_tracks_raw     30308970   23097672194  bytes     0.13%
           Compaction    media     media_tracks_raw    899216961    1815591081  bytes    49.53%
    Active compaction remaining time : 0h27m56s

Here's a sample of a query trace:

    activity | timestamp | source | source_elapsed
    ---------------------------------------------------------------------------+--------------+---------------+---------------
    execute_cql3_query | 00:11:46,612 | 10.140.22.236 | 0
    Parsing select * from media_tracks_raw where id = 74fe9449-8ac4-accb-a723-4bad024101e3 limit 100; | 00:11:46,612 | 10.140.22.236 | 47
    Preparing statement | 00:11:46,612 | 10.140.22.236 | 234
    Sending message to /10.140.21.54 | 00:11:46,619 | 10.140.22.236 | 7190
    Message received from /10.140.22.236 | 00:11:46,622 | 10.140.21.54 | 12
    Executing single-partition query on media_tracks_raw | 00:11:46,644 | 10.140.21.54 | 21971
    Acquiring sstable references | 00:11:46,644 | 10.140.21.54 | 22029
    Merging memtable tombstones | 00:11:46,644 | 10.140.21.54 | 22131
    Bloom filter allows skipping sstable 1395 | 00:11:46,644 | 10.140.21.54 | 22245
    Bloom filter allows skipping sstable 1394 | 00:11:46,644 | 10.140.21.54 | 22279
    Bloom filter allows skipping sstable 1391 | 00:11:46,644 | 10.140.21.54 | 22293
    Bloom filter allows skipping sstable 1381 | 00:11:46,644 | 10.140.21.54 | 22304
    Bloom filter allows skipping sstable 1376 | 00:11:46,644 | 10.140.21.54 | 22317
    Bloom filter allows skipping sstable 1368 | 00:11:46,644 | 10.140.21.54 | 22328
    Bloom filter allows skipping sstable 1365 | 00:11:46,644 | 10.140.21.54 | 22340
    Bloom filter allows skipping sstable 1351 | 00:11:46,644 | 10.140.21.54 | 22352
    Bloom filter allows skipping sstable 1367 | 00:11:46,644 | 10.140.21.54 | 22363
    Bloom filter allows skipping sstable 1380 | 00:11:46,644 | 10.140.21.54 | 22374
    Bloom filter allows skipping sstable 1343 | 00:11:46,644 | 10.140.21.54 | 22386
    Bloom filter allows skipping sstable 1342 | 00:11:46,644 | 10.140.21.54 | 22397
    Bloom filter allows skipping sstable 1334 | 00:11:46,644 | 10.140.21.54 | 22408
    Bloom filter allows skipping sstable 1377 | 00:11:46,644 | 10.140.21.54 | 22429
    Bloom filter allows skipping sstable 1330 | 00:11:46,644 | 10.140.21.54 | 22441
    Bloom filter allows skipping sstable 1329 | 00:11:46,644 | 10.140.21.54 | 22452
    Bloom filter allows skipping sstable 1328 | 00:11:46,644 | 10.140.21.54 | 22463
    Bloom filter allows skipping sstable 1327 | 00:11:46,644 | 10.140.21.54 | 22475
    Bloom filter allows skipping sstable 1326 | 00:11:46,644 | 10.140.21.54 | 22488
    Bloom filter allows skipping sstable 1320 | 00:11:46,644 | 10.140.21.54 | 22506
    Bloom filter allows skipping sstable 1319 | 00:11:46,644 | 10.140.21.54 | 22518
    Bloom filter allows skipping sstable 1318 | 00:11:46,644 | 10.140.21.54 | 22528
    Bloom filter allows skipping sstable 1317 | 00:11:46,644 | 10.140.21.54 | 22540
    Bloom filter allows skipping sstable 1316 | 00:11:46,644 | 10.140.21.54 | 22552
    Bloom filter allows skipping sstable 1315 | 00:11:46,644 | 10.140.21.54 | 22563
    Bloom filter allows skipping sstable 1314 | 00:11:46,644 | 10.140.21.54 | 22572
    Bloom filter allows skipping sstable 1313 | 00:11:46,644 | 10.140.21.54 | 22583
    Bloom filter allows skipping sstable 1312 | 00:11:46,644 | 10.140.21.54 | 22594
    Bloom filter allows skipping sstable 1311 | 00:11:46,644 | 10.140.21.54 | 22605
    Bloom filter allows skipping sstable 1310 | 00:11:46,644 | 10.140.21.54 | 22616
    Bloom filter allows skipping sstable 1309 | 00:11:46,644 | 10.140.21.54 | 22628
    Bloom filter allows skipping sstable 1308 | 00:11:46,644 | 10.140.21.54 | 22640
    Bloom filter allows skipping sstable 1307 | 00:11:46,644 | 10.140.21.54 | 22651
    Bloom filter allows skipping sstable 1306 | 00:11:46,644 | 10.140.21.54 | 22663
    Bloom filter allows skipping sstable 1305 | 00:11:46,644 | 10.140.21.54 | 22674
    Bloom filter allows skipping sstable 1304 | 00:11:46,644 | 10.140.21.54 | 22684
    Bloom filter allows skipping sstable 1303 | 00:11:46,644 | 10.140.21.54 | 22696
    Bloom filter allows skipping sstable 1302 | 00:11:46,644 | 10.140.21.54 | 22707
    Bloom filter allows skipping sstable 1301 | 00:11:46,644 | 10.140.21.54 | 22718
    Bloom filter allows skipping sstable 1300 | 00:11:46,644 | 10.140.21.54 | 22729
    Bloom filter allows skipping sstable 1299 | 00:11:46,644 | 10.140.21.54 | 22740
    Bloom filter allows skipping sstable 1298 | 00:11:46,644 | 10.140.21.54 | 22752
    Bloom filter allows skipping sstable 1297 | 00:11:46,644 | 10.140.21.54 | 22763
    Bloom filter allows skipping sstable 1296 | 00:11:46,644 | 10.140.21.54 | 22774
    Key cache hit for sstable 1295 | 00:11:46,644 | 10.140.21.54 | 22817
    Seeking to partition beginning in data file | 00:11:46,644 | 10.140.21.54 | 22842
    Skipped 0/89 non-slice-intersecting sstables, included 0 due to tombstones | 00:11:46,646 | 10.140.21.54 | 24109
    Merging data from memtables and 1 sstables | 00:11:46,646 | 10.140.21.54 | 24238
    Read 101 live and 0 tombstoned cells | 00:11:46,663 | 10.140.21.54 | 41389
    Enqueuing response to /10.140.22.236 | 00:11:46,663 | 10.140.21.54 | 41831
    Sending message to /10.140.22.236 | 00:11:46,664 | 10.140.21.54 | 41972
    Message received from /10.140.21.54 | 00:11:46,671 | 10.140.22.236 | 59498
    Processing response from /10.140.21.54 | 00:11:46,672 | 10.140.22.236 | 59563
    Request complete | 00:11:46,704 | 10.140.22.236 | 92781

Every query I ran had just three mentions of tombstones:

    Merging memtable tombstones
    Skipped 0/89 non-slice-intersecting sstables, included 0 due to tombstones
    Read 101 live and 0 tombstoned cells

And unless I misread those, none of them claim that there are any tombstones.

On Dec 16, 2014, at 4:26 PM, Ryan Svihla <rsvi...@datastax.com> wrote:

Manual forced compactions create more problems than they solve. If you have no evidence of tombstones in your selects (which seems odd, can you share some of the tracing output?), then I'm not sure what it would solve for you.

Compaction running could explain a high load. Log messages with ERROR, WARN, or GCInspector are all meaningful there; I suggest searching JIRA for your version to see if there are any interesting bugs.
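For anyone wanting to reproduce a trace like the one above, this is roughly the cqlsh session that produces it (keyspace name and id are taken from the trace output earlier in the thread; cqlsh prints the event listing after the result set):

    TRACING ON;
    USE media;
    SELECT * FROM media_tracks_raw
     WHERE id = 74fe9449-8ac4-accb-a723-4bad024101e3
     LIMIT 100;
    TRACING OFF;

The "Read N live and M tombstoned cells" line is the quickest tombstone indicator in that output; the "Skipped ... included N due to tombstones" line is the other one worth watching.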
On Tue, Dec 16, 2014 at 6:14 PM, Arne Claassen <a...@emotient.com> wrote:

I just did a wide set of selects and ran across no tombstones. But while on the subject of gc_grace_seconds: any reason, on a small cluster, not to set it to something low like a single day? It seems like 10 days is only needed for large clusters undergoing long partition splits, or am I misunderstanding gc_grace_seconds?

Now, given all that, does any of this explain a high load when the cluster is idle? Is it compaction catching up, and would manual forced compaction alleviate that?

thanks,
arne

On Dec 16, 2014, at 3:28 PM, Ryan Svihla <rsvi...@datastax.com> wrote:

So a delete is really another write for gc_grace_seconds (default 10 days); if you get enough tombstones it can make managing your cluster a challenge as is. Open up cqlsh, turn on tracing and try a few queries: how many tombstones are scanned for a given query? It's possible the heap problems you're seeing are actually happening on the query side and not on the ingest side. The severity of this depends on driver and Cassandra version, but older drivers and versions of Cassandra could easily overload the heap with expensive selects; layered over tombstones, it certainly becomes a possibility that this is your root cause.

Now this will primarily create more load on compaction, and depending on your Cassandra version there may be some other issue at work, but something I can tell you is that every time I see 1 dropped mutation I see a cluster that was overloaded enough it had to shed load. If I see 200k, I see a cluster/configuration/hardware that is badly overloaded.

I suggest the following:

- trace some of the queries used in prod
- monitor your ingest rate; see at what levels you run into issues (GCInspector log messages, dropped mutations, etc.)
- heap configuration we mentioned earlier: go ahead and monitor heap usage; if it hits 75% repeatedly this is an indication of heavy load
- monitor dropped mutations: any dropped mutation is evidence of an overloaded server; again, the root cause can be many other problems that are solvable with current hardware, and LOTS of people run with nodes with similar configuration

On Tue, Dec 16, 2014 at 5:08 PM, Arne Claassen <a...@emotient.com> wrote:

Not using any secondary indices, and memtable_flush_queue_size is the default 4.

But let me tell you how data is "mutated" right now; maybe that will give you an insight on how this is happening.

Basically the frame data table has the following primary key: PRIMARY KEY ((id), trackid, "timestamp")

Generally data is inserted once, so day-to-day writes are all new rows. However, when our process for generating analytics for these rows changes, we run the media back through again, causing overwrites.

Up until last night, this was just a new insert, because the PK never changed, so it was always a 1-to-1 overwrite of every row.

Last night was the first time that a new change went in where the PK could actually change, so now the process is always: DELETE by partition key, insert all rows for that partition key, repeat.

We have two tables that have similar frame data projections and some other aggregates with a much smaller row count per partition key.

hope that helps,
arne
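In CQL terms, the delete-then-reinsert cycle described above looks roughly like the sketch below. The trackid value and the frame_data column (and their types) are assumptions for illustration only; the partition key value is the one from the trace earlier in the thread. Note that each partition-wide DELETE writes a tombstone that is carried for gc_grace_seconds, which is what the question about lowering it to a day comes down to; the ALTER TABLE line shows that one-day setting.

    -- wipe the whole partition for one piece of media, then re-insert its rows
    DELETE FROM media.media_tracks_raw
     WHERE id = 74fe9449-8ac4-accb-a723-4bad024101e3;

    -- repeated for every (trackid, "timestamp") row of that media;
    -- frame_data is a hypothetical stand-in for the real value columns
    INSERT INTO media.media_tracks_raw (id, trackid, "timestamp", frame_data)
    VALUES (74fe9449-8ac4-accb-a723-4bad024101e3, 1, '2014-12-16 19:03:35', 0x00);

    -- if a 10-day tombstone lifetime is longer than this cluster needs,
    -- gc_grace_seconds can be lowered per table (86400 seconds = 1 day)
    ALTER TABLE media.media_tracks_raw WITH gc_grace_seconds = 86400;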
On Dec 16, 2014, at 2:46 PM, Ryan Svihla <rsvi...@datastax.com> wrote:

So you've got some blocked flush writers, but you have an incredibly large number of dropped mutations. Are you using secondary indexes, and if so, how many? What is your flush queue set to?

On Tue, Dec 16, 2014 at 4:43 PM, Arne Claassen <a...@emotient.com> wrote:

Of course QA decided to start a test batch (still relatively low traffic), so I hope it doesn't throw the tpstats off too much.

Node 1:

    Pool Name                Active  Pending  Completed  Blocked  All time blocked
    MutationStage                 0        0   13804928        0                 0
    ReadStage                     0        0      10975        0                 0
    RequestResponseStage          0        0    7725378        0                 0
    ReadRepairStage               0        0       1247        0                 0
    ReplicateOnWriteStage         0        0          0        0                 0
    MiscStage                     0        0          0        0                 0
    HintedHandoff                 1        1         50        0                 0
    FlushWriter                   0        0        306        0                31
    MemoryMeter                   0        0        719        0                 0
    GossipStage                   0        0     286505        0                 0
    CacheCleanupExecutor          0        0          0        0                 0
    InternalResponseStage         0        0          0        0                 0
    CompactionExecutor            4       14        159        0                 0
    ValidationExecutor            0        0          0        0                 0
    MigrationStage                0        0          0        0                 0
    commitlog_archiver            0        0          0        0                 0
    AntiEntropyStage              0        0          0        0                 0
    PendingRangeCalculator        0        0         11        0                 0
    MemtablePostFlusher           0        0       1781        0                 0

    Message type        Dropped
    READ                      0
    RANGE_SLICE               0
    _TRACE                    0
    MUTATION             391041
    COUNTER_MUTATION          0
    BINARY                    0
    REQUEST_RESPONSE          0
    PAGED_RANGE               0
    READ_REPAIR               0

Node 2:

    Pool Name                Active  Pending  Completed  Blocked  All time blocked
    MutationStage                 0        0     997042        0                 0
    ReadStage                     0        0       2623        0                 0
    RequestResponseStage          0        0     706650        0                 0
    ReadRepairStage               0        0        275        0                 0
    ReplicateOnWriteStage         0        0          0        0                 0
    MiscStage                     0        0          0        0                 0
    HintedHandoff                 2        2         12        0                 0
    FlushWriter                   0        0         37        0                 4
    MemoryMeter                   0        0         70        0                 0
    GossipStage                   0        0      14927        0                 0
    CacheCleanupExecutor          0        0          0        0                 0
    InternalResponseStage         0        0          0        0                 0
    CompactionExecutor            4        7         94        0                 0
    ValidationExecutor            0        0          0        0                 0
    MigrationStage                0        0          0        0                 0
    commitlog_archiver            0        0          0        0                 0
    AntiEntropyStage              0        0          0        0                 0
    PendingRangeCalculator        0        0          3        0                 0
    MemtablePostFlusher           0        0        114        0                 0

    Message type        Dropped
    READ                      0
    RANGE_SLICE               0
    _TRACE                    0
    MUTATION                  0
    COUNTER_MUTATION          0
    BINARY                    0
    REQUEST_RESPONSE          0
    PAGED_RANGE               0
    READ_REPAIR               0

Node 3:

    Pool Name                Active  Pending  Completed  Blocked  All time blocked
    MutationStage                 0        0    1539324        0                 0
    ReadStage                     0        0       2571        0                 0
    RequestResponseStage          0        0     373300        0                 0
    ReadRepairStage               0        0        325        0                 0
    ReplicateOnWriteStage         0        0          0        0                 0
    MiscStage                     0        0          0        0                 0
    HintedHandoff                 1        1         21        0                 0
    FlushWriter                   0        0         38        0                 5
    MemoryMeter                   0        0         59        0                 0
    GossipStage                   0        0      21491        0                 0
    CacheCleanupExecutor          0        0          0        0                 0
    InternalResponseStage         0        0          0        0                 0
    CompactionExecutor            4        9         85        0                 0
    ValidationExecutor            0        0          0        0                 0
    MigrationStage                0        0          0        0                 0
    commitlog_archiver            0        0          0        0                 0
    AntiEntropyStage              0        0          0        0                 0
    PendingRangeCalculator        0        0          6        0                 0
    MemtablePostFlusher           0        0        164        0                 0

    Message type        Dropped
    READ                      0
    RANGE_SLICE               0
    _TRACE                    0
    MUTATION             205259
    COUNTER_MUTATION          0
    BINARY                    0
    REQUEST_RESPONSE          0
    PAGED_RANGE               0
    READ_REPAIR              18

Compaction seems like the only thing consistently active and pending.

On Tue, Dec 16, 2014 at 2:18 PM, Ryan Svihla <rsvi...@datastax.com> wrote:

OK, based on those numbers I have a theory. Can you show me nodetool tpstats for all 3 nodes?

On Tue, Dec 16, 2014 at 4:04 PM, Arne Claassen <a...@emotient.com> wrote:

No problem with the follow-up questions. I'm on a crash course here trying to understand what makes C* tick, so I appreciate all feedback.

We reprocessed all media (1200 partition keys) last night, where partition keys had somewhere between 4k and 200k "rows". After that completed, no traffic went to the cluster at all for ~8 hours, and throughout today we may get a couple (less than 10) queries per second and maybe 3-4 write batches per hour.

I assume the last value in the Partition Size histogram is the largest row:

    20924300 bytes: 79
    25109160 bytes: 57

The majority seems clustered around 200000 bytes.

I will look at switching my inserts to unlogged batches since they are always for one partition key.

On Tue, Dec 16, 2014 at 1:47 PM, Ryan Svihla <rsvi...@datastax.com> wrote:

Can you define what "virtually no traffic" is? Sorry to be repetitive about that, but I've worked on a lot of clusters in the past year and people have wildly different ideas what that means.

Unlogged batches of the same partition key are definitely a performance optimization. Typically async is much faster and easier on the cluster when you're using multi-partition-key batches.

nodetool cfhistograms <keyspace> <tablename>

On Tue, Dec 16, 2014 at 3:42 PM, Arne Claassen <a...@emotient.com> wrote:

Actually not sure why the machine was originally configured at 6GB, since we even started it on an r3.large with 15GB.

Re: Batches

Not using batches. I actually have that as a separate question on the list. Currently I fan out async single inserts, and I'm wondering if batches are better since my data is inherently inserted in blocks of ordered rows for a single partition key.

Re: Traffic

There isn't all that much traffic. Inserts come in as blocks per partition key, but those can be 5k-200k rows for that partition key. Each of these rows is less than 100k. It's small, lots of ordered rows: frame and sub-frame information for media, and the rows for one piece of media (the partition key) are inserted at once.

For the last 12 hours, where the load on all these machines has been stuck, there's been virtually no traffic at all. This is the nodes basically sitting idle, except that they had a load of 4 each.

BTW, how do you determine the widest row, or for that matter the number of tombstones in a row?

thanks,
arne
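Since the writes for a given media id always target a single partition, the unlogged-batch variant being considered above would look something like the sketch below (same assumed columns as in the earlier sketch; the point is that every statement in the batch shares the partition key, so the whole batch lands on one replica set):

    BEGIN UNLOGGED BATCH
      INSERT INTO media.media_tracks_raw (id, trackid, "timestamp", frame_data)
      VALUES (74fe9449-8ac4-accb-a723-4bad024101e3, 1, '2014-12-16 19:03:35', 0x00);
      INSERT INTO media.media_tracks_raw (id, trackid, "timestamp", frame_data)
      VALUES (74fe9449-8ac4-accb-a723-4bad024101e3, 1, '2014-12-16 19:03:36', 0x01);
    APPLY BATCH;

For the widest-row question, the concrete form of the command Ryan mentions would be "nodetool cfhistograms media media_tracks_raw", whose partition-size section is presumably where the 20924300/25109160-byte figures above came from.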
On Tue, Dec 16, 2014 at 1:24 PM, Ryan Svihla <rsvi...@datastax.com> wrote:

So 1024 is still a good 2.5 times what I'm suggesting, and 6GB is hardly enough to run Cassandra well in, especially if you're going full bore on loads. However, you may just flat out be CPU bound on your write throughput. How many TPS and what size writes do you have? Also, what is your widest row?

Final question: what is compaction throughput at?

On Tue, Dec 16, 2014 at 3:20 PM, Arne Claassen <a...@emotient.com> wrote:

The starting configuration I had, which is still running on two of the nodes, was a 6GB heap and 1024MB ParNew, which is close to what you are suggesting, and those have been pegged at load 4 for over 12 hours with hardly any read or write traffic. I will set one to 8GB/400MB and see if its load changes.

On Tue, Dec 16, 2014 at 1:12 PM, Ryan Svihla <rsvi...@datastax.com> wrote:

So a heap of that size without some tuning will create a number of problems (high CPU usage being one of them). I suggest either an 8GB heap and 400MB ParNew (which I'd only set that low for that low a CPU count), or attempt the tunings as indicated in https://issues.apache.org/jira/browse/CASSANDRA-8150

On Tue, Dec 16, 2014 at 3:06 PM, Arne Claassen <a...@emotient.com> wrote:

Changed the 15GB node to a 25GB heap and the nice CPU is down to ~20% now. Checked my dev cluster to see if the ParNew log entries are just par for the course, but I'm not seeing them there. However, both have the following every 30 seconds:

    DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,898 BatchlogManager.java (line 165) Started replayAllFailedBatches
    DEBUG [MemtablePostFlusher:1] 2014-12-16 21:00:44,899 ColumnFamilyStore.java (line 866) forceFlush requested but everything is clean in batchlog
    DEBUG [BatchlogTasks:1] 2014-12-16 21:00:44,899 BatchlogManager.java (line 200) Finished replayAllFailedBatches

Is that just routine scheduled housekeeping or a sign of something else?

On Tue, Dec 16, 2014 at 12:52 PM, Arne Claassen <a...@emotient.com> wrote:

Sorry, I meant a 15GB heap on the one machine that has less nice CPU% now. The others are 6GB.

On Tue, Dec 16, 2014 at 12:50 PM, Arne Claassen <a...@emotient.com> wrote:

AWS r3.xlarge, 30GB, but only using a heap of 10GB, new 2GB, because we might go c3.2xlarge instead if CPU is more important than RAM. Storage is optimized EBS SSD (but iostat shows no real I/O going on). Each node only has about 10GB with ownership of 67%, 64.7% & 68.3%.
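For reference, the 8GB/400MB combination suggested above is set in conf/cassandra-env.sh by overriding the auto-calculated sizing; a minimal sketch with just the values from this thread, everything else left at defaults:

    # conf/cassandra-env.sh -- override the calculated heap and new-gen sizes
    MAX_HEAP_SIZE="8G"
    HEAP_NEWSIZE="400M"

The compaction-throughput question can be checked with "nodetool getcompactionthroughput" and adjusted on a running node with "nodetool setcompactionthroughput", no restart needed.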
On the node where I set the heap to 10GB from 6GB, utilization has dropped to 46% nice now, but the ParNew log messages still continue at the same pace. I'm going to up the heap to 20GB for a bit and see if that brings the nice CPU further down.

No TombstoneOverflowingExceptions.

On Tue, Dec 16, 2014 at 11:50 AM, Ryan Svihla <rsvi...@datastax.com> wrote:

What's CPU, RAM, storage layer, and data density per node? Exact heap settings would be nice. In the logs, look for TombstoneOverflowingException.

On Tue, Dec 16, 2014 at 1:36 PM, Arne Claassen <a...@emotient.com> wrote:

I'm running 2.0.10.

The data is all time series data, and as we change our pipeline we've periodically been reprocessing the data sources, which causes each time series to be overwritten, i.e. every row per partition key is deleted and re-written, so I assume I've been collecting a bunch of tombstones.

Also, the ever-present and never-completing compactions I assumed were an artifact of tombstoning, but I fully admit that's conjecture based on the ~20 blog posts and Stack Overflow questions I've surveyed.

I doubled the heap on one node and it changed nothing regarding the load or the ParNew log statements. New generation usage is 50%, Eden itself is 56%.

Anything else I should look at and report, let me know.

On Tue, Dec 16, 2014 at 11:14 AM, Jonathan Lacefield <jlacefi...@datastax.com> wrote:

Hello,

What version of Cassandra are you running?

If it's 2.0, we recently experienced something similar with 8447 [1], which 8485 [2] should hopefully resolve.

Please note that 8447 is not related to tombstones. Tombstone processing can put a lot of pressure on the heap as well. Why do you think you have a lot of tombstones in that one particular table?
[1] https://issues.apache.org/jira/browse/CASSANDRA-8447
[2] https://issues.apache.org/jira/browse/CASSANDRA-8485

Jonathan

On Tue, Dec 16, 2014 at 2:04 PM, Arne Claassen <a...@emotient.com> wrote:

I have a three node cluster that has been sitting at a load of 4 (for each node) and 100% CPU utilization (although 92% nice) for the last 12 hours, ever since some significant writes finished. I'm trying to determine what tuning I should be doing to get it out of this state. The debug log is just an endless series of:

    DEBUG [ScheduledTasks:1] 2014-12-16 19:03:35,042 GCInspector.java (line 118) GC for ParNew: 166 ms for 10 collections, 4400928736 used; max is 8000634880
    DEBUG [ScheduledTasks:1] 2014-12-16 19:03:36,043 GCInspector.java (line 118) GC for ParNew: 165 ms for 10 collections, 4440011176 used; max is 8000634880
    DEBUG [ScheduledTasks:1] 2014-12-16 19:03:37,043 GCInspector.java (line 118) GC for ParNew: 135 ms for 8 collections, 4402220568 used; max is 8000634880

iostat shows virtually no I/O.
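One way to watch that ParNew activity directly, rather than through the GCInspector debug lines, is plain jstat against the Cassandra JVM (the pgrep expression is just one way to find the pid):

    # sample young/old generation occupancy and GC counts/times once a second
    jstat -gcutil $(pgrep -f CassandraDaemon) 1000

If the Eden column fills and drains every second or two while the cluster is otherwise idle, that matches the constant ParNew collections reported in the log above.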
Compaction may enter into this, but I don't really know what to make of the compaction stats, since they never change:

    [root@cassandra-37919c3a ~]# nodetool compactionstats
    pending tasks: 10
      compaction type   keyspace   table              completed     total        unit   progress
           Compaction    media     media_tracks_raw    271651482      563615497  bytes    48.20%
           Compaction    media     media_tracks_raw     30308910    21676695677  bytes     0.14%
           Compaction    media     media_tracks_raw   1198384080     1815603161  bytes    66.00%
    Active compaction remaining time : 0h22m24s

5 minutes later:

    [root@cassandra-37919c3a ~]# nodetool compactionstats
    pending tasks: 9
      compaction type   keyspace   table              completed     total        unit   progress
           Compaction    media     media_tracks_raw    271651482      563615497  bytes    48.20%
           Compaction    media     media_tracks_raw     30308910    21676695677  bytes     0.14%
           Compaction    media     media_tracks_raw   1198384080     1815603161  bytes    66.00%
    Active compaction remaining time : 0h22m24s

Sure, the pending tasks went down by one, but the rest is identical. media_tracks_raw likely has a bunch of tombstones (I can't figure out how to get stats on that).

Is this behavior something that indicates that I need more heap, or a larger new generation? Should I be manually running compaction on tables with lots of tombstones?

Any suggestions or places to educate myself better on performance tuning would be appreciated.

arne
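On the "can't figure out how to get stats on that" point: if the sstablemetadata tool that ships with this Cassandra version is available (under tools/bin in the tarball; the data path and sstable file name below are purely illustrative, not taken from this cluster), it reports an estimated droppable-tombstone ratio per sstable:

    # inspect one of the table's sstables; look for "Estimated droppable tombstones" in the output
    sstablemetadata /var/lib/cassandra/data/media/media_tracks_raw/media-media_tracks_raw-jb-1295-Data.db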