Hi,

We have a Cassandra cluster of 28 nodes. Each one is an EC2 m1.xlarge
instance based on the DataStax AMI, with 4 storage volumes in RAID 0.

Here is the ticket we opened with Amazon support:

"This RAID array was created using the DataStax public AMI: ami-b2212dc6.
Sources are also available here: https://github.com/riptano/ComboAMI

As you can see in the attached screenshot (
http://imageshack.com/a/img854/4592/xbqc.jpg), randomly but frequently one
of the volumes becomes fully utilized (100%) while the 3 others stay at low
usage.

Because of this, the node becomes slow and the whole Cassandra cluster is
impacted. We are losing data due to failed writes, and availability for our
customers is affected.

The node was in this state for one hour, so we decided to restart it.

We have already removed 3 other instances because of this same issue."
(see other screenshots)
http://imageshack.com/a/img824/2391/s7q3.jpg
http://imageshack.com/a/img10/556/zzk8.jpg

Amazon support took a close look at the instance as well as its underlying
hardware for any potential health issues, and both seem to be healthy.

Has anyone already experienced something like this?

Should I contact the AMI author instead?

Thanks a lot,

Philippe.
