aaron morton <aaron@thelastpickle.com> writes:

> 
> The cluster is running into GC problems, and this is slowing it down under
> the stress test. When it slows down, one or more of the nodes fails to
> perform the write within rpc_timeout. This causes the coordinator of the
> write to raise the TimedOutException.
> Your options are:
> 
> * allocate more memory
> * ease back on the stress test. 
> * write at CL QUORUM so that one node failing does not result in the error (see the sketch below).
> 
> see also http://wiki.apache.org/cassandra/FAQ#slows_down_after_lotso_inserts
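> 
> For illustration, something like this with the Hector client (which the
> stack trace below suggests is in use) should do it -- a minimal sketch, with
> the cluster name and column values hypothetical:
> 
>     import me.prettyprint.cassandra.model.ConfigurableConsistencyLevel;
>     import me.prettyprint.cassandra.serializers.StringSerializer;
>     import me.prettyprint.hector.api.Cluster;
>     import me.prettyprint.hector.api.HConsistencyLevel;
>     import me.prettyprint.hector.api.Keyspace;
>     import me.prettyprint.hector.api.factory.HFactory;
>     import me.prettyprint.hector.api.mutation.Mutator;
> 
>     public class QuorumWriteSketch {
>         public static void main(String[] args) {
>             Cluster cluster = HFactory.getOrCreateCluster("TestCluster", "localhost:9160");
> 
>             // With RF=3, QUORUM needs 2 of 3 replicas to ack, so one slow or
>             // failed node no longer fails the whole write (unlike CL ALL).
>             ConfigurableConsistencyLevel ccl = new ConfigurableConsistencyLevel();
>             ccl.setDefaultWriteConsistencyLevel(HConsistencyLevel.QUORUM);
>             ccl.setDefaultReadConsistencyLevel(HConsistencyLevel.QUORUM);
> 
>             Keyspace ks = HFactory.createKeyspace("myks", cluster, ccl);
>             Mutator<String> mutator = HFactory.createMutator(ks, StringSerializer.get());
>             mutator.insert("rowKey", "queue",
>                     HFactory.createStringColumn("name", "value"));
>         }
>     }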
> 
> 
> Cheers
> 
> -----------------
> Aaron Morton
> Freelance Developer
> @aaronmorton
> http://www.thelastpickle.com
> 
> On 28/05/2012, at 12:59 PM, Jason Tang wrote:
> Hi,
> My system is a 4-node, 64-bit Cassandra cluster with a 6 GB heap per node
> and the default configuration (which means 1/3 of the heap for memtables),
> replication factor 3, writing at CL ALL and reading at CL ONE.
> When I run the stress load test I get this TimedOutException, some
> operations fail, and all traffic hangs for a while.
> 
> When I ran a 32-bit Cassandra with 1 GB of memory in standalone mode, I
> didn't see such frequent "stop the world" behavior.
> 
> So I wonder what kind of operation can hang the Cassandra system.
> 
> 
> How do I collect information for tuning?
> 
> From the system log and the documentation, I guess there are three types of
> operation that could cause this (see the nodetool question below):
> 1) Flushing a memtable when it reaches its maximum size
> 2) Compacting SSTables (why?)
> 3) Java GC
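> 
> Would watching each of these with nodetool be the right approach? e.g.
> (assuming the 1.x subcommands; the host flag is illustrative):
> 
>     nodetool -h localhost tpstats           # pending/blocked flush and mutation stages
>     nodetool -h localhost compactionstats   # pending and running compactions
>     nodetool -h localhost info              # current heap usage on the node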
> 
> system.log:
> 
>  INFO [main] 2012-05-25 16:12:17,054 ColumnFamilyStore.java (line 688) Enqueuing flush of Memtable-LocationInfo@1229893321(53/66 serialized/live bytes, 2 ops)
>  INFO [FlushWriter:1] 2012-05-25 16:12:17,054 Memtable.java (line 239) Writing Memtable-LocationInfo@1229893321(53/66 serialized/live bytes, 2 ops)
>  INFO [FlushWriter:1] 2012-05-25 16:12:17,166 Memtable.java (line 275) Completed flushing /var/proclog/raw/cassandra/data/system/LocationInfo-hb-2-Data.db (163 bytes)
> 
> ...
> 
>  INFO [CompactionExecutor:441] 2012-05-28 08:02:55,345 CompactionTask.java (line 112) Compacting [SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-41-Data.db'), SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-32-Data.db'), SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-37-Data.db'), SSTableReader(path='/var/proclog/raw/cassandra/data/myks/queue-hb-53-Data.db')]
> ...
> 
> 
>  WARN [ScheduledTasks:1] 2012-05-28 08:02:26,619 GCInspector.java (line 146) Heap is 0.7993011015621736 full.  You may need to reduce memtable and/or cache sizes.  Cassandra will now flush up to the two largest memtables to free up memory.  Adjust flush_largest_memtables_at threshold in cassandra.yaml if you don't want Cassandra to do this automatically
>  INFO [ScheduledTasks:1] 2012-05-28 08:02:54,980 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 728 ms for 2 collections, 3594946600 used; max is 6274678784
>  INFO [ScheduledTasks:1] 2012-05-28 08:41:34,030 GCInspector.java (line 123) GC for ParNew: 1668 ms for 1 collections, 4171503448 used; max is 6274678784
>  INFO [ScheduledTasks:1] 2012-05-28 08:41:48,978 GCInspector.java (line 123) GC for ParNew: 1087 ms for 1 collections, 2623067496 used; max is 6274678784
>  INFO [ScheduledTasks:1] 2012-05-28 08:41:48,987 GCInspector.java (line 123) GC for ConcurrentMarkSweep: 3198 ms for 3 collections, 2623361280 used; max is 6274678784
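> 
> The WARN line above mentions the emergency-flush threshold; the related
> settings live in cassandra.yaml (values below are, I believe, the 1.x
> defaults -- worth verifying against the deployed version):
> 
>     # fraction of max heap at which the largest memtables are force-flushed
>     flush_largest_memtables_at: 0.75
>     # fraction of max heap at which caches are shrunk to reclaim memory
>     reduce_cache_sizes_at: 0.85
>     # fraction of current cache capacity kept after shrinking
>     reduce_cache_capacity_to: 0.6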
> 
> Timeout Exception:
> 
> Caused by: org.apache.cassandra.thrift.TimedOutException: null
>         at org.apache.cassandra.thrift.Cassandra$batch_mutate_result.read(Cassandra.java:19495) ~[na:na]
>         at org.apache.cassandra.thrift.Cassandra$Client.recv_batch_mutate(Cassandra.java:1035) ~[na:na]
>         at org.apache.cassandra.thrift.Cassandra$Client.batch_mutate(Cassandra.java:1009) ~[na:na]
>         at me.prettyprint.cassandra.service.KeyspaceServiceImpl$1.execute(KeyspaceServiceImpl.java:95) ~[na:na]
>         ... 64 common frames omitted
> 
> 
> BRs
> //Tang Weiqiang

Hi, I've been running into the same type of issue, but on a single machine at
CL ONE, also with a custom insertion stress utility. What would I need to do to
address the timeouts? By "allocate more memory", do you mean increasing the
heap size in the environment conf file?
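
Concretely, a sketch of what I mean, assuming the stock conf/cassandra-env.sh
(the variable names are standard there; the values are illustrative only):

    # conf/cassandra-env.sh
    MAX_HEAP_SIZE="8G"      # total JVM heap given to Cassandra
    HEAP_NEWSIZE="800M"     # young generation; the file suggests ~100 MB per core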

Thanks,
J.




