Re: [ceph-users] Running on disks that lose their head

2013-11-07 Thread James Pearce
when one head out of ten fails: disks can keep working with the nine remaining heads... some info on this at last in the SATA-IO 3.2 Spec... "Rebuild Assist... Some info on the command set (SAS & SATA implementations): http://www.seagate.com/files/staticfiles/docs/pdf/whitepaper/tp620-1-1110us

Re: [ceph-users] Running on disks that lose their head

2013-11-07 Thread james
On 2013-11-06 09:33, Sage Weil wrote: On Wed, 6 Nov 2013, Loic Dachary wrote: Hi Ceph, People from Western Digital suggested ways to better take advantage of the disk error reporting... when one head out of ten fails: disks can keep working with the nine remaining heads. Losing 1/10 of the

Re: [ceph-users] Running on disks that lose their head

2013-11-07 Thread Kyle Bader
> Zackc, Loicd, and I have been the main participants in a weekly Teuthology call the past few weeks. We've talked mostly about methods to extend Teuthology to capture performance metrics. Would you be willing to join us during the Teuthology and Ceph-Brag sessions at the Firefly Developer

Re: [ceph-users] Running on disks that lose their head

2013-11-07 Thread Mike Dawson
Thanks, Mike Dawson Co-Founder & Director of Cloud Architecture Cloudapt LLC 6330 East 75th Street, Suite 170 Indianapolis, IN 46250 On 11/7/2013 2:12 PM, Kyle Bader wrote: Once I know a drive has had a head failure, do I trust that the rest of the drive isn't going to go at an inconvenient

Re: [ceph-users] Running on disks that lose their head

2013-11-07 Thread Kyle Bader
>> Once I know a drive has had a head failure, do I trust that the rest of the drive isn't going to go at an inconvenient moment vs just fixing it right now when it's not 3AM on Christmas morning? (true story) As good as Ceph is, do I trust that Ceph is smart enough to prevent spreadin

Re: [ceph-users] Running on disks that lose their head

2013-11-06 Thread Loic Dachary
> Putting my sysadmin hat on: Once I know a drive has had a head failure, do I trust that the rest of the drive isn't going to go at an inconvenient moment vs just fixing it right now when it's not 3AM on Christmas morning? (true story) As good as Ceph is, do I trust that Ceph is s

Re: [ceph-users] Running on disks that lose their head

2013-11-06 Thread Loic Dachary
An anonymous kernel developer sends this link: http://en.wikipedia.org/wiki/Error_recovery_control On 06/11/2013 08:32, Loic Dachary wrote: > Hi Ceph, People from Western Digital suggested ways to better take advantage of the disk error reporting. They gave two examples that struck my im

Re: [ceph-users] Running on disks that lose their head

2013-11-06 Thread Mark Nelson
On 11/06/2013 03:33 AM, Sage Weil wrote: On Wed, 6 Nov 2013, Loic Dachary wrote: Hi Ceph, People from Western Digital suggested ways to better take advantage of the disk error reporting. They gave two examples that struck my imagination. First there are errors that look like the disk is dying (

Re: [ceph-users] Running on disks that lose their head

2013-11-06 Thread james
On 2013-11-06 09:33, Sage Weil wrote: This makes me think we really need to build or integrate with some generic SMART reporting infrastructure so that we can identify disks that are failing or going to fail. It could be of use especially for SSD devices used for journals. Unfortunately ther

Re: [ceph-users] Running on disks that lose their head

2013-11-06 Thread Sage Weil
On Wed, 6 Nov 2013, Loic Dachary wrote: > Hi Ceph, People from Western Digital suggested ways to better take advantage of the disk error reporting. They gave two examples that struck my imagination. First there are errors that look like the disk is dying (read / write failures) but

Re: [ceph-users] Running on disks that lose their head

2013-11-05 Thread james
It is cool - and it's interesting that more and more access to the inner workings of the drives would be useful, given ATA controller history (an evolution of the WD1010 MFM controller) having hidden steadily more, to maintain compatibility with the old CHS addressing (later LBA). The streami