I've seen the same behaviour (a SLOW ephemeral disk) a few times. You can't do anything about a single slow disk except stop using it. Our solution has always been to replace the m1.xlarge instance ASAP, and everything is good again. -Rudolf.
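Before replacing the instance, it can be worth confirming that one ephemeral disk really is slow rather than just busy. A minimal sketch, assuming the four ephemeral volumes are attached as /dev/xvdb through /dev/xvde (as in Alain's setup quoted below); the block size and count are only illustrative:

    # Compare raw sequential read throughput of each RAID member.
    # iflag=direct bypasses the page cache, so the figures reflect the
    # devices themselves; note this adds read load to a live node.
    for dev in /dev/xvdb /dev/xvdc /dev/xvdd /dev/xvde; do
        echo "=== $dev ==="
        dd if="$dev" of=/dev/null bs=1M count=1024 iflag=direct 2>&1 | tail -n 1
    done

If a single device reports a fraction of its siblings' throughput, that backs up the "replace the instance" call; if they all look similar, the imbalance is more likely coming from the workload (e.g. compaction) than from the hardware.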
On 31.03.2013, at 18:58, Alexis Lê-Quôc wrote:

> Alain,
>
> Can you post your mdadm --detail /dev/md0 output here, as well as your
> iostat -x -d output, when that happens? A bad ephemeral drive on EC2 is
> not unheard of.
>
> Alexis | @alq | http://datadog.com
>
> P.S. Also, disk utilization is not a reliable metric; iostat's await and
> svctm are more useful imho.
>
>
> On Sun, Mar 31, 2013 at 6:03 AM, aaron morton <aa...@thelastpickle.com> wrote:
>> Ok, if you're going to look into it, please keep me/us posted.
>
> It's not on my radar.
>
> Cheers
>
> -----------------
> Aaron Morton
> Freelance Cassandra Consultant
> New Zealand
>
> @aaronmorton
> http://www.thelastpickle.com
>
> On 28/03/2013, at 2:43 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>
>> Ok, if you're going to look into it, please keep me/us posted.
>>
>> It happened twice for me, the same day, within a few hours on the same
>> node, and only happened to 1 node out of 12, making that node almost
>> unreachable.
>>
>>
>> 2013/3/28 aaron morton <aa...@thelastpickle.com>
>> I noticed this on an m1.xlarge (Cassandra 1.1.10) instance today as well:
>> 1 or 2 disks in a RAID 0 running at 85 to 100% while the others were at
>> 35 to 50%.
>>
>> Have not looked into it.
>>
>> Cheers
>>
>> -----------------
>> Aaron Morton
>> Freelance Cassandra Consultant
>> New Zealand
>>
>> @aaronmorton
>> http://www.thelastpickle.com
>>
>> On 26/03/2013, at 11:57 PM, Alain RODRIGUEZ <arodr...@gmail.com> wrote:
>>
>>> We use C* on m1.xlarge AWS EC2 servers, with 4 disks (xvdb, xvdc, xvdd,
>>> xvde) forming a logical RAID 0 (md0).
>>>
>>> I usually see their utilization increase in the same way. This morning
>>> there was a normal minor compaction followed by dropped messages on one
>>> node (out of 12).
>>>
>>> Looking closely at this node I saw the following:
>>>
>>> http://img69.imageshack.us/img69/9425/opscenterweirddisk.png
>>>
>>> On this node, one of the four disks (xvdd) started working very hard
>>> while the others worked much less intensively.
>>>
>>> This is quite weird since I have always seen these 4 disks being used in
>>> exactly the same way at every moment (as you can see on the 5 other
>>> nodes, or when node ".239" comes back to normal).
>>>
>>> Any idea what happened and how it can be avoided?
>>>
>>> Alain
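For anyone hitting the same thing, here is roughly how to capture what Alexis asks for above; the commands themselves come from his message, only the sampling interval and count are made up:

    # State of the RAID 0 array and its member devices.
    mdadm --detail /dev/md0

    # Extended per-device statistics: 12 samples at 5-second intervals.
    # Compare the await (average time per request, queueing + service)
    # and svctm (estimated service time) columns across xvdb..xvde; one
    # member with a much higher await than its peers points at a bad
    # ephemeral drive rather than an uneven workload.
    iostat -x -d 5 12

As his P.S. notes, %util on its own can be misleading on a striped array, so the await/svctm comparison across the members is the more telling part of the output.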