As a follow up to this conversation; we are still having issues with our
Cassandra cluster on EC2.

It *looks* to be related to Garbage Collection; however we aren't sure what
the root cause of the problem is. Here is an extract from logs:

 INFO [GMFD:1] 2010-09-20 15:22:00,242 Gossiper.java (line 578) InetAddress
/10.102.57.197 is now UP
 INFO [HINTED-HANDOFF-POOL:1] 2010-09-20 15:22:00,242
HintedHandOffManager.java (line 165) Started hinted handoff for endPoint /
10.102.57.197
 INFO [HINTED-HANDOFF-POOL:1] 2010-09-20 15:22:00,247
HintedHandOffManager.java (line 222) Finished hinted handoff of 0 rows to
endpoint /10.102.57.197
 INFO [GC inspection] 2010-09-20 15:27:42,046 GCInspector.java (line 129) GC
for ParNew: 325411 ms, 84284896 reclaimed leaving 640770336 used; max is
25907560448
 INFO [WRITE-/10.102.7.187] 2010-09-20 15:27:42,052
OutboundTcpConnection.java (line 105) error writing to /10.102.7.187
 INFO [GC inspection] 2010-09-20 15:27:42,082 GCInspector.java (line 150)
Pool Name                    Active   Pending
 INFO [GC inspection] 2010-09-20 15:27:42,083 GCInspector.java (line 156)
STREAM-STAGE                      0         0
 INFO [GC inspection] 2010-09-20 15:27:42,083 GCInspector.java (line 156)
RESPONSE-STAGE                    0         3
 INFO [GC inspection] 2010-09-20 15:27:42,083 GCInspector.java (line 156)
ROW-READ-STAGE                    6        25
 INFO [GC inspection] 2010-09-20 15:27:42,084 GCInspector.java (line 156)
LB-OPERATIONS                     0         0
 INFO [GC inspection] 2010-09-20 15:27:42,084 GCInspector.java (line 156)
MISCELLANEOUS-POOL                0         0
 INFO [GC inspection] 2010-09-20 15:27:42,084 GCInspector.java (line 156)
GMFD                              1       129
 INFO [GC inspection] 2010-09-20 15:27:42,084 GCInspector.java (line 156)
CONSISTENCY-MANAGER               0         0
 INFO [GC inspection] 2010-09-20 15:27:42,085 GCInspector.java (line 156)
LB-TARGET                         0         0
 INFO [GC inspection] 2010-09-20 15:27:42,085 GCInspector.java (line 156)
ROW-MUTATION-STAGE                1         1
 INFO [GC inspection] 2010-09-20 15:27:42,085 GCInspector.java (line 156)
MESSAGE-STREAMING-POOL            0         0
 INFO [GC inspection] 2010-09-20 15:27:42,086 GCInspector.java (line 156)
LOAD-BALANCER-STAGE               0         0
 INFO [GC inspection] 2010-09-20 15:27:42,086 GCInspector.java (line 156)
FLUSH-SORTER-POOL                 0         0
 INFO [GC inspection] 2010-09-20 15:27:42,086 GCInspector.java (line 156)
MEMTABLE-POST-FLUSHER             0         0
 INFO [GC inspection] 2010-09-20 15:27:42,086 GCInspector.java (line 156)
AE-SERVICE-STAGE                  0         0
 INFO [GC inspection] 2010-09-20 15:27:42,087 GCInspector.java (line 156)
FLUSH-WRITER-POOL                 0         0
 INFO [GC inspection] 2010-09-20 15:27:42,087 GCInspector.java (line 156)
HINTED-HANDOFF-POOL               0         0
 INFO [GC inspection] 2010-09-20 15:27:42,087 GCInspector.java (line 161)
CompactionManager               n/a         0
 INFO [GMFD:1] 2010-09-20 15:27:42,088 Gossiper.java (line 592) Node /
10.102.7.187 has restarted, now UP again
 INFO [GMFD:1] 2010-09-20 15:27:42,089 StorageService.java (line 548) Node /
10.102.7.187 state jump to normal
 INFO [GMFD:1] 2010-09-20 15:27:42,089 StorageService.java (line 555) Will
not change my token ownership to /10.102.7.187
 INFO [HINTED-HANDOFF-POOL:1] 2010-09-20 15:27:42,089
HintedHandOffManager.java (line 165) Started hinted handoff for endPoint /
10.102.7.187
 INFO [HINTED-HANDOFF-POOL:1] 2010-09-20 15:27:42,112
HintedHandOffManager.java (line 222) Finished hinted handoff of 0 rows to
endpoint /10.102.7.187
 INFO [WRITE-/10.102.7.187] 2010-09-20 15:27:42,389
OutboundTcpConnection.java (line 105) error writing to /10.102.7.187
 INFO [WRITE-/10.102.57.197] 2010-09-20 15:34:15,911
OutboundTcpConnection.java (line 105) error writing to /10.102.57.197
 INFO [GC inspection] 2010-09-20 15:34:15,924 GCInspector.java (line 129) GC
for ParNew: 372272 ms, 82471240 reclaimed leaving 671616240 used; max is
25907560448
 INFO [GC inspection] 2010-09-20 15:34:15,925 GCInspector.java (line 150)
Pool Name                    Active   Pending
 INFO [GC inspection] 2010-09-20 15:34:15,925 GCInspector.java (line 156)
STREAM-STAGE                      0         0
 INFO [GC inspection] 2010-09-20 15:34:15,926 GCInspector.java (line 156)
RESPONSE-STAGE                    0        21
 INFO [Timer-0] 2010-09-20 15:34:15,926 Gossiper.java (line 180) InetAddress
/10.102.7.187 is now dead.
 INFO [GC inspection] 2010-09-20 15:34:15,926 GCInspector.java (line 156)
ROW-READ-STAGE                    1        85
 INFO [GC inspection] 2010-09-20 15:34:15,926 GCInspector.java (line 156)
LB-OPERATIONS                     0         0
 INFO [GC inspection] 2010-09-20 15:34:15,935 GCInspector.java (line 156)
MISCELLANEOUS-POOL                0         0
 INFO [GC inspection] 2010-09-20 15:34:15,935 GCInspector.java (line 156)
GMFD                              1       548
 INFO [GC inspection] 2010-09-20 15:34:15,935 GCInspector.java (line 156)
CONSISTENCY-MANAGER               0         0
 INFO [GC inspection] 2010-09-20 15:34:15,936 GCInspector.java (line 156)
LB-TARGET                         0         0
 INFO [GC inspection] 2010-09-20 15:34:15,936 GCInspector.java (line 156)
ROW-MUTATION-STAGE                0         0
 INFO [GC inspection] 2010-09-20 15:34:15,936 GCInspector.java (line 156)
MESSAGE-STREAMING-POOL            0         0
 INFO [GC inspection] 2010-09-20 15:34:15,937 GCInspector.java (line 156)
LOAD-BALANCER-STAGE               0         0
 INFO [GC inspection] 2010-09-20 15:34:15,937 GCInspector.java (line 156)
FLUSH-SORTER-POOL                 0         0
 INFO [GC inspection] 2010-09-20 15:34:15,937 GCInspector.java (line 156)
MEMTABLE-POST-FLUSHER             0         0
 INFO [GC inspection] 2010-09-20 15:34:15,938 GCInspector.java (line 156)
AE-SERVICE-STAGE                  0         0
 INFO [GC inspection] 2010-09-20 15:34:15,938 GCInspector.java (line 156)
FLUSH-WRITER-POOL                 0         0
 INFO [GC inspection] 2010-09-20 15:34:15,938 GCInspector.java (line 156)
HINTED-HANDOFF-POOL               0         0
 INFO [GC inspection] 2010-09-20 15:34:15,938 GCInspector.java (line 161)
CompactionManager               n/a         0
 INFO [WRITE-/10.102.57.197] 2010-09-20 15:34:15,931
OutboundTcpConnection.java (line 105) error writing to /10.102.57.197
 INFO [GMFD:1] 2010-09-20 15:34:15,960 Gossiper.java (line 578) InetAddress
/10.102.7.187 is now UP
 INFO [GMFD:1] 2010-09-20 15:34:15,961 Gossiper.java (line 592) Node /
10.102.57.197 has restarted, now UP again
 INFO [GMFD:1] 2010-09-20 15:34:15,961 StorageService.java (line 548) Node /
10.102.57.197 state jump to normal
 INFO [GMFD:1] 2010-09-20 15:34:15,961 StorageService.java (line 555) Will
not change my token ownership to /10.102.57.197
 INFO [HINTED-HANDOFF-POOL:1] 2010-09-20 15:34:15,961
HintedHandOffManager.java (line 165) Started hinted handoff for endPoint /
10.102.7.187


Current thoughts - launching a GC for ParNew at 15:27:42, which takes 325411
ms, followed (almost immediately) by another at 15:34:15 which takes 372272
ms is probably sub-optimal.

 INFO [GC inspection] 2010-09-20 15:27:42,046 GCInspector.java (line 129) GC
for ParNew: 325411 ms, 84284896 reclaimed leaving 640770336 used; max is
25907560448
 INFO [GC inspection] 2010-09-20 15:34:15,924 GCInspector.java (line 129) GC
for ParNew: 372272 ms, 82471240 reclaimed leaving 671616240 used; max is
25907560448

Can anyone help shed any light on why this might be happening? We've tried a
variety of JVM settings to alleviate this; currently with no luck.

Dave



On 17 September 2010 17:04, Jedd Rashbrooke <jedd.rashbro...@imagini.net>wrote:

>  Hi Rob,
>
>  Thanks for your suggestions.  I should have been a bit more verbose
>  in my platform description -- I'm using 64-bit instances, which I think
>  in a Ben Black video I saw led to a sensible default usage of mmap
>  when left at auto.  Should I look at forcing this setting?
>
> > You don't mention which version of the deb package you're using, but :
>
>  I'm using 0.6.5 - the ones bundled by Eric Evans <eev...@apache.org>
>
>  Because these are 32GB machines, I've not configured them with
>  any swap at all.  I've rarely done this in the past - but was aware
>  there was this swap-hell scenario with JVM's, and the rationale
>  makes sense -- it's better to have the JVM crash ''cleanly" than
>  to have it grind the machine to a halt and make it impossible
>  to get onto the machine to kill the process.
>
>  cheers,
>  Jedd.
>

Reply via email to