Pavel,

In my case, the heap was filling up faster than it was draining. I am still
looking for the cause, since draining should be really fast with the SSDs I have.

However, in your case you could check nodetool tpstats (AFAIK) and see if
there are too many pending write tasks, for instance. Maybe you really are
writing more than the nodes are able to flush to disk.
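
Something like this should show it (just a sketch; I am going from memory of
the 2.0.x tpstats output, which AFAIK lists Active/Pending/Blocked counters
per thread pool):

nodetool tpstats
nodetool tpstats | grep -E 'MutationStage|FlushWriter'

If MutationStage or FlushWriter keep a large Pending count, or anything shows
up under "All time blocked", the node is probably receiving writes faster
than it can flush them.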

How many writes per second are you achieving?
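
If you are not measuring it on the client side, a quick and dirty estimate
(assuming cfstats exposes a cumulative write counter per column family, as it
does on the versions I have used) is to sample it twice:

nodetool cfstats | grep -i 'write count'
sleep 60
nodetool cfstats | grep -i 'write count'

The difference for your column families, divided by 60, gives the approximate
writes/sec on that node.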

Also, I would look for GCInspector in the log:

cat system.log* | grep GCInspector | wc -l
tail -1000 system.log | grep GCInspector
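
To see how long each pause took (assuming the GCInspector lines still report
the pause as "<n> ms", as in the 2.0.x logs I have seen):

tail -1000 system.log | grep GCInspector | grep -o '[0-9]* ms'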

Do you see it running a lot? Is it taking longer and longer each time
it runs?

I am no Cassandra expert, but I would try these things first and post the
results here. Maybe other people on the list have more ideas.

Best regards,
Marcelo.


2014-06-20 8:50 GMT-03:00 Pavel Kogan <pavel.ko...@cortica.com>:

> The cluster is new, so no updates were done. Version 2.0.8.
> It happened when I did many writes (no reads). Writes are done in small
> batches of 2 inserts (writing to 2 column families). The values are big
> blobs (up to 100 KB).
>
> Any clues?
>
> Pavel
>
>
> On Thu, Jun 19, 2014 at 8:07 PM, Marcelo Elias Del Valle <
> marc...@s1mbi0se.com.br> wrote:
>
>> Pavel,
>>
>> Out of curiosity, did it start to happen after some update? Which
>> version of Cassandra are you using?
>>
>> []s
>>
>>
>> 2014-06-19 16:10 GMT-03:00 Pavel Kogan <pavel.ko...@cortica.com>:
>>
>>> What a coincidence! Today it happened in my cluster of 7 nodes as well.
>>>
>>> Regards,
>>>   Pavel
>>>
>>>
>>> On Wed, Jun 18, 2014 at 11:13 AM, Marcelo Elias Del Valle <
>>> marc...@s1mbi0se.com.br> wrote:
>>>
>>>> I have a 10-node cluster with Cassandra 2.0.8.
>>>>
>>>> I am getting these exceptions in the log when I run my code. What my code
>>>> does is just read data from a CF and, in some cases, write new data.
>>>>
>>>>  WARN [Native-Transport-Requests:553] 2014-06-18 11:04:51,391
>>>> BatchStatement.java (line 228) Batch of prepared statements for
>>>> [identification1.entity, identification1.entity_lookup] is of size 6165,
>>>> exceeding specified threshold of 5120 by 1045.
>>>>  WARN [Native-Transport-Requests:583] 2014-06-18 11:05:01,152
>>>> BatchStatement.java (line 228) Batch of prepared statements for
>>>> [identification1.entity, identification1.entity_lookup] is of size 21266,
>>>> exceeding specified threshold of 5120 by 16146.
>>>>  WARN [Native-Transport-Requests:581] 2014-06-18 11:05:20,229
>>>> BatchStatement.java (line 228) Batch of prepared statements for
>>>> [identification1.entity, identification1.entity_lookup] is of size 22978,
>>>> exceeding specified threshold of 5120 by 17858.
>>>>  INFO [MemoryMeter:1] 2014-06-18 11:05:32,682 Memtable.java (line 481)
>>>> CFS(Keyspace='OpsCenter', ColumnFamily='rollups300') liveRatio is
>>>> 14.249755859375 (just-counted was 9.85302734375).  calculation took 3ms for
>>>> 1024 cells
>>>>
>>>> After some time, one node of the cluster goes down. Then it comes back
>>>> after a few seconds and another node goes down. It keeps happening, and there
>>>> is always a node down in the cluster; when one comes back, another one falls.
>>>>
>>>> The only exception I see in the log is "connection reset by peer",
>>>> which seems to be related to the gossip protocol, when a node goes down.
>>>>
>>>> Any hints on what I could do to investigate this problem further?
>>>>
>>>> Best regards,
>>>> Marcelo Valle.
>>>>
>>>
>>>
>>
>
