Alain,

Can you post your mdadm --detail /dev/md0 output here, as well as your
iostat -x -d output, when that happens? A bad ephemeral drive on EC2 is not
unheard of.
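
For reference, this is roughly what I'd run on the affected node (a sketch;
I'm assuming the md0 / xvd* layout Alain described, so adjust device names to
your setup):

    # Array layout and the state of each member device
    sudo mdadm --detail /dev/md0

    # Extended per-device stats, sampled every 5 seconds for one minute,
    # captured while the imbalance is happening
    iostat -x -d 5 12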

Alexis | @alq | http://datadog.com

P.S. Also, disk utilization is not a reliable metric; iostat's await and
svctm are more useful, imho.
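
To make that concrete, something like this (again assuming Alain's xvd*
device names) limits iostat to the RAID members; await and svctm are the
columns I'd watch:

    # await = avg time (ms) a request spends queued + being serviced
    # svctm = avg service time (ms) per request issued to the device
    # A member whose await climbs while the others stay flat points at that
    # device rather than at overall load.
    iostat -x -d xvdb xvdc xvdd xvde 5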


On Sun, Mar 31, 2013 at 6:03 AM, aaron morton <aa...@thelastpickle.com> wrote:

> Ok, if you're going to look into it, please keep me/us posted.
>
> It's not on my radar.
>
> Cheers
>
>    -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 28/03/2013, at 2:43 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>
> Ok, if you're going to look into it, please keep me/us posted.
>
> It happened twice for me, the same day, within a few hours, on the same node,
> and it only happened to 1 node out of 12, making this node almost unreachable.
>
>
> 2013/3/28 aaron morton <aa...@thelastpickle.com>
>
>> I noticed this on an m1.xlarge (Cassandra 1.1.10) instance today as well:
>> 1 or 2 disks in a RAID 0 running at 85 to 100% while the others sat at 35 to 50%.
>>
>> Have not looked into it.
>>
>> Cheers
>>
>>    -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 26/03/2013, at 11:57 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>>
>> We use C* on m1.xlarge AWS EC2 servers, with 4 disks (xvdb, xvdc, xvdd,
>> xvde) as part of a logical RAID 0 array (md0).
>>
>> I usually see their usage increase in the same way. This morning there was
>> a normal minor compaction followed by dropped messages on one node (out of
>> 12).
>>
>> Looking closely at this node I saw the following:
>>
>> http://img69.imageshack.us/img69/9425/opscenterweirddisk.png
>>
>> On this node, one of the four disks (xvdd) started working much harder while
>> the others worked less intensively.
>>
>> This is quite weird since I have always seen these 4 disks being used in
>> exactly the same way at every moment (as you can see on 5 other nodes, or
>> when the node ".239" comes back to normal).
>>
>> Any idea what happened and how it can be avoided?
>>
>> Alain
>>
>>
>>
>
>
