There is one peculiar thing about this problem When I copy the cluster to other 2 servers (much weaker servers in means of CPU and RAM) and run cassandra there, I see no massive CPU load and no huge GC times. Thus I suppose it has something to do with hardware or software installed on the machines. I plan to check software and hardware issues step-by-step to find that difference between servers which brings big problems to one of (otherwise identical) clusters.
What I plan to compare between 'bad' cluster and 'good' cluster: - Configs, schemas, data etc: same - java version : same - RAM and CPU : 'bad' cluster has more - Ubuntu version - Networking - What else??? 26.07.2012, 10:25, "Nikolay Kоvshov" <nkovs...@yandex.ru>: > Sun Java 6 didn't help it at all > > Sar shows no special activity on the long GC times > > And I have remembered that I do a lot of get_range_slices requests, each time > on a range of 100, if this could be important. > > Still my GC times are growing incredible > > INFO [ScheduledTasks:1] 2012-07-25 12:17:04,029 GCInspector.java (line 122) > GC for ParNew: 497253 ms for 1 collections, 114036128 used; max is 8464105472 > INFO [ScheduledTasks:1] 2012-07-25 13:26:21,282 GCInspector.java (line 122) > GC for ParNew: 149508 ms for 1 collections, 87015440 used; max is 8464105472 > INFO [ScheduledTasks:1] 2012-07-25 18:39:44,654 GCInspector.java (line 122) > GC for ParNew: 83584 ms for 1 collections, 54368032 used; max is 8464105472 > INFO [ScheduledTasks:1] 2012-07-25 22:45:42,448 GCInspector.java (line 122) > GC for ParNew: 48454 ms for 1 collections, 44072488 used; max is 8464105472 > > 24.07.2012, 13:07, "Joost van de Wijgerd" <jwijg...@gmail.com>: > >> You are better off using Sun Java 6 to run Cassandra. In the past >> there were issues reported on 7. Can you try running it on Sun Java 6? >> >> kind regards >> >> Joost >> >> On Tue, Jul 24, 2012 at 10:04 AM, Nikolay Kоvshov <nkovs...@yandex.ru> >> wrote: >>> 48 G of Ram on that machine, swap is not used. I will disable swap at all >>> just in case >>> I have 4 cassandra processes (parts of 4 different clusters), each >>> allocated 8 GB and using 4 of them >>>> java -version >>> java version "1.7.0" >>> Java(TM) SE Runtime Environment (build 1.7.0-b147) >>> Java HotSpot(TM) 64-Bit Server VM (build 21.0-b17, mixed mode) >>> >>> 23.07.2012, 20:12, "Joost van de Wijgerd" <jwijg...@gmail.com>: >>>> Howmuch memory do you have on the machine. Seems like you have 8G >>>> reserved for the Cassandra java process, If this is all the memory on >>>> the machine you might be swapping. Also which jvm do you use? >>>> >>>> kind regards >>>> >>>> Joost >>>> >>>> On Mon, Jul 23, 2012 at 10:07 AM, Nikolay Kоvshov <nkovs...@yandex.ru> >>>> wrote: >>>>> 21th I have mirgated to cassandra 1.1.2 but see no improvement >>>>> >>>>> cat /var/log/cassandra/Earth1.log | grep "GC for" >>>>> INFO [ScheduledTasks:1] 2012-05-22 17:42:48,445 GCInspector.java (line >>>>> 123) GC for ParNew: 345 ms for 1 collections, 82451888 used; max is >>>>> 8464105472 >>>>> INFO [ScheduledTasks:1] 2012-05-23 02:47:13,911 GCInspector.java (line >>>>> 123) GC for ParNew: 312 ms for 1 collections, 110617416 used; max is >>>>> 8464105472 >>>>> INFO [ScheduledTasks:1] 2012-05-23 11:57:54,317 GCInspector.java (line >>>>> 123) GC for ParNew: 298 ms for 1 collections, 98161920 used; max is >>>>> 8464105472 >>>>> INFO [ScheduledTasks:1] 2012-07-02 08:52:37,019 GCInspector.java (line >>>>> 123) GC for ParNew: 196886 ms for 1 collections, 2310058496 used; max is >>>>> 8464105472 >>>>> INFO [ScheduledTasks:1] 2012-07-16 17:41:25,940 GCInspector.java (line >>>>> 123) GC for ParNew: 200146 ms for 1 collections, 2345987088 used; max is >>>>> 8464105472 >>>>> === Migrated from 1.0.0 to 1.1.2 >>>>> INFO [ScheduledTasks:1] 2012-07-21 09:05:08,280 GCInspector.java (line >>>>> 122) GC for ParNew: 282 ms for 1 collections, 466406864 used; max is >>>>> 8464105472 >>>>> INFO [ScheduledTasks:1] 2012-07-21 12:38:43,132 GCInspector.java (line >>>>> 122) GC for ParNew: 233 ms for 1 collections, 405269504 used; max is >>>>> 8464105472 >>>>> INFO [ScheduledTasks:1] 2012-07-22 02:29:09,596 GCInspector.java (line >>>>> 122) GC for ParNew: 253 ms for 1 collections, 389700768 used; max is >>>>> 8464105472 >>>>> INFO [ScheduledTasks:1] 2012-07-22 17:45:46,357 GCInspector.java (line >>>>> 122) GC for ParNew: 57391 ms for 1 collections, 400083984 used; max is >>>>> 8464105472 >>>>> >>>>> Memory and yaml memory-related settings are default >>>>> I do not do deletes >>>>> I have 2 CF's and no secondary indexes >>>>> >>>>> LiveRatio's: >>>>> INFO [pool-1-thread-1] 2012-06-09 02:36:07,759 Memtable.java (line >>>>> 177) CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 >>>>> (just-counted was 1.0). calculation took 85ms for 6236 columns >>>>> INFO [MemoryMeter:1] 2012-07-21 09:04:47,614 Memtable.java (line 213) >>>>> CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 1.0 >>>>> (just-counted was 1.0). calculation took 8ms for 1 columns >>>>> INFO [MemoryMeter:1] 2012-07-21 09:04:51,012 Memtable.java (line 213) >>>>> CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 >>>>> (just-counted was 1.0). calculation took 99ms for 1094 columns >>>>> INFO [MemoryMeter:1] 2012-07-21 09:04:51,331 Memtable.java (line 213) >>>>> CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 1.0 >>>>> (just-counted was 1.0). calculation took 80ms for 242 columns >>>>> INFO [MemoryMeter:1] 2012-07-21 09:04:51,856 Memtable.java (line 213) >>>>> CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 >>>>> (just-counted was 1.0). calculation took 505ms for 2678 columns >>>>> INFO [MemoryMeter:1] 2012-07-21 09:04:52,881 Memtable.java (line 213) >>>>> CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 >>>>> (just-counted was 1.0). calculation took 776ms for 5236 columns >>>>> INFO [MemoryMeter:1] 2012-07-21 09:04:52,945 Memtable.java (line 213) >>>>> CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 1.0 >>>>> (just-counted was 1.0). calculation took 64ms for 389 columns >>>>> INFO [MemoryMeter:1] 2012-07-21 09:04:55,162 Memtable.java (line 213) >>>>> CFS(Keyspace='Keyspace1', ColumnFamily='PSS') liveRatio is 1.0 >>>>> (just-counted was 1.0). calculation took 1378ms for 8948 columns >>>>> INFO [MemoryMeter:1] 2012-07-21 09:04:55,304 Memtable.java (line 213) >>>>> CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is 1.0 >>>>> (just-counted was 1.0). calculation took 140ms for 1082 columns >>>>> INFO [MemoryMeter:1] 2012-07-21 09:05:08,439 Memtable.java (line 213) >>>>> CFS(Keyspace='Keyspace1', ColumnFamily='Standard1') liveRatio is >>>>> 2.5763038186160894 (just-counted was 2.5763038186160894). calculation >>>>> took 8796ms for 102193 columns >>>>> >>>>> 18.07.2012, 07:51, "aaron morton" <aa...@thelastpickle.com>: >>>>>> Assuming all the memory and yaml settings default that does not sound >>>>>> right. The first thought would be the memory meter not counting >>>>>> correctly... >>>>>> Do you do a lot of deletes ? >>>>>> Do you have a lot of CF's and/or secondary indexes ? >>>>>> Can you see log lines about the "liveRatio" for your cf's ? >>>>>> I would upgrade to 1.0.10 before getting too carried away though. >>>>>> Cheers >>>>>> >>>>>> ----------------- >>>>>> Aaron Morton >>>>>> Freelance Developer >>>>>> @aaronmorton >>>>>> http://www.thelastpickle.com >>>>>> >>>>>> On 17/07/2012, at 8:14 PM, Nikolay Kоvshov wrote: >>>>>>> This is a cluster of 2 nodes, each having 8G of operating memory, >>>>>>> replicationfactor=2 >>>>>>> Write/read pressure is quite low and almost never exceeds 10/second >>>>>>> >>>>>>> From time to time (2-3 times in a month) I see GC activity in logs >>>>>>> and for this time cassandra stops responding to requests which results >>>>>>> in a timeout in upper-layer application. Total time of unavailability >>>>>>> can be over 5 minues (like in the following case) >>>>>>> >>>>>>> What can I do with that? Wiil it become much worse when my cluster >>>>>>> grows up? >>>>>>> >>>>>>> INFO [GossipTasks:1] 2012-07-16 13:10:50,055 Gossiper.java (line >>>>>>> 736) InetAddress /10.220.50.9 is now dead. >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,056 GCInspector.java >>>>>>> (line 123) GC for ParNew: 391383 ms for 1 collections, 2025808488 used; >>>>>>> max is 8464105472 >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,067 StatusLogger.java >>>>>>> (line 50) Pool Name Active Pending Blocked >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,071 StatusLogger.java >>>>>>> (line 65) ReadStage 0 0 0 >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,071 StatusLogger.java >>>>>>> (line 65) RequestResponseStage 0 0 0 >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,072 StatusLogger.java >>>>>>> (line 65) ReadRepairStage 0 0 0 >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,072 StatusLogger.java >>>>>>> (line 65) MutationStage 0 0 0 >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,073 StatusLogger.java >>>>>>> (line 65) ReplicateOnWriteStage 0 0 0 >>>>>>> INFO [GossipStage:1] 2012-07-16 13:10:50,074 Gossiper.java (line >>>>>>> 722) InetAddress /10.220.50.9 is now UP >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,075 StatusLogger.java >>>>>>> (line 65) GossipStage 1 59 0 >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,075 StatusLogger.java >>>>>>> (line 65) AntiEntropyStage 0 0 0 >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,076 StatusLogger.java >>>>>>> (line 65) MigrationStage 0 0 0 >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,076 StatusLogger.java >>>>>>> (line 65) StreamStage 0 0 0 >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,077 StatusLogger.java >>>>>>> (line 65) MemtablePostFlusher 0 0 0 >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,077 StatusLogger.java >>>>>>> (line 65) FlushWriter 0 0 0 >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,077 StatusLogger.java >>>>>>> (line 65) MiscStage 0 0 0 >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,078 StatusLogger.java >>>>>>> (line 65) InternalResponseStage 0 0 0 >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,078 StatusLogger.java >>>>>>> (line 65) HintedHandoff 0 0 0 >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,079 StatusLogger.java >>>>>>> (line 69) CompactionManager n/a 0 >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,079 StatusLogger.java >>>>>>> (line 81) MessagingService n/a 0,0 >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,080 StatusLogger.java >>>>>>> (line 85) ColumnFamily Memtable ops,data Row cache >>>>>>> size/cap Key cache size/cap >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,080 StatusLogger.java >>>>>>> (line 88) Keyspace1.PSS 36712,343842617 >>>>>>> 0/0 97995/1000000 >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,085 StatusLogger.java >>>>>>> (line 88) Keyspace1.Standard1 128679,162567721 >>>>>>> 0/0 0/1000000 >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,085 StatusLogger.java >>>>>>> (line 88) system.NodeIdInfo 0,0 >>>>>>> 0/0 0/1 >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,086 StatusLogger.java >>>>>>> (line 88) system.IndexInfo 0,0 >>>>>>> 0/0 0/1 >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,086 StatusLogger.java >>>>>>> (line 88) system.LocationInfo 0,0 >>>>>>> 0/0 1/2 >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,086 StatusLogger.java >>>>>>> (line 88) system.Migrations 0,0 >>>>>>> 0/0 3/3 >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,087 StatusLogger.java >>>>>>> (line 88) system.HintsColumnFamily 1,528 >>>>>>> 0/0 0/1 >>>>>>> INFO [ScheduledTasks:1] 2012-07-16 13:10:50,087 StatusLogger.java >>>>>>> (line 88) system.Schema 0,0 >>>>>>> 0/0 2/3 >>>>>>> -------- Завершение пересылаемого сообщения -------- >>>> -- >>>> Joost van de Wijgerd >>>> Visseringstraat 21B >>>> 1051KH Amsterdam >>>> +31624111401 >>>> joost.van.de.wijgerd@Skype >>>> http://www.linkedin.com/in/jwijgerd >> -- >> Joost van de Wijgerd >> Visseringstraat 21B >> 1051KH Amsterdam >> +31624111401 >> joost.van.de.wijgerd@Skype >> http://www.linkedin.com/in/jwijgerd