Re: [ceph-users] Disk failures

2016-06-15 Thread Christian Balzer
On Wed, 15 Jun 2016 12:46:49 +0200 Gandalf Corvotempesta wrote:
> On 15 Jun 2016 09:58, "Christian Balzer" wrote:
> > You _do_ know how and where Ceph/RBD store their data?
> >
> > Right now that's on disks/SSDs, formatted with a file system.
> > And XFS or EXT4 will not protect against bitrot

Re: [ceph-users] Disk failures

2016-06-15 Thread Gandalf Corvotempesta
On 15 Jun 2016 09:58, "Christian Balzer" wrote:
> You _do_ know how and where Ceph/RBD store their data?
>
> Right now that's on disks/SSDs, formatted with a file system.
> And XFS or EXT4 will not protect against bitrot, while BTRFS and ZFS will.

Wait, I'm new to ceph and some things are no

Re: [ceph-users] Disk failures

2016-06-15 Thread Christian Balzer
On Wed, 15 Jun 2016 09:50:43 +0200 Gandalf Corvotempesta wrote:
> On 15 Jun 2016 09:42, "Christian Balzer" wrote:
> >
> > This is why people are using BTRFS and ZFS for filestore (despite the problems they in turn create) and why the roadmap for bluestore has checksums for reads on i

Re: [ceph-users] Disk failures

2016-06-15 Thread Gandalf Corvotempesta
On 15 Jun 2016 09:42, "Christian Balzer" wrote:
>
> This is why people are using BTRFS and ZFS for filestore (despite the problems they in turn create) and why the roadmap for bluestore has checksums for reads on it as well (or so we've been told).

Bitrot happens only on files? What abou

Re: [ceph-users] Disk failures

2016-06-15 Thread Christian Balzer
Hello,

On Wed, 15 Jun 2016 08:48:57 +0200 Gandalf Corvotempesta wrote:
> On 15 Jun 2016 03:27, "Christian Balzer" wrote:
> > And that makes deep-scrubbing something of quite limited value.
>
> This is not true.

Did you read what I and Jan wrote?

> If you checksum *before* writing to dis

Re: [ceph-users] Disk failures

2016-06-14 Thread Gandalf Corvotempesta
On 15 Jun 2016 03:27, "Christian Balzer" wrote:
> And that makes deep-scrubbing something of quite limited value.

This is not true. If you checksum *before* writing to disk (so while the data is still in RAM), then when reading back from disk you can verify the checksum, and if it doesn't m
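To make the idea above concrete, here is a minimal sketch of checksum-at-write / verify-at-read with fallback to another replica. Plain files, a ".sha256" sidecar and SHA-256 are assumptions made purely for illustration; this is not Ceph's filestore code.

import hashlib

def write_with_checksum(path, data):
    # Checksum the buffer while it is still in RAM, then persist data and digest.
    digest = hashlib.sha256(data).hexdigest()
    with open(path, "wb") as f:
        f.write(data)
    with open(path + ".sha256", "w") as f:
        f.write(digest)

def read_verified(replica_paths):
    # Return data from the first replica whose bytes still match the stored digest.
    for path in replica_paths:
        with open(path, "rb") as f:
            data = f.read()
        with open(path + ".sha256") as f:
            expected = f.read().strip()
        if hashlib.sha256(data).hexdigest() == expected:
            return data
        # mismatch: silent corruption on this copy, try the next replica
    raise IOError("all replicas failed checksum verification")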

Re: [ceph-users] Disk failures

2016-06-14 Thread Bill Sharer
This is why I use btrfs mirror sets underneath Ceph, and hopefully more than make up for the space loss by going with 2 replicas instead of 3 plus on-the-fly lzo compression. The Ceph deep scrubs replace any need for btrfs scrubs, but I still get the benefit of self-healing when btrfs finds bit
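For context on the space trade-off described above, a back-of-the-envelope comparison (the two-way mirror assumption and the 10 TB figure are illustrative, not measurements from this cluster): two Ceph replicas on top of two-way btrfs mirrors consume 4x raw space versus 3x for a plain three-replica pool, so lzo would need to average better than roughly 4:3 compression just to break even on capacity.

# Rough raw-space overhead comparison (illustrative assumptions, not benchmarks).
logical_tb = 10.0                       # logical data stored in the pool

plain_3x  = logical_tb * 3              # 3 replicas on single-disk OSDs
mirror_2x = logical_tb * 2 * 2          # 2 replicas, each on a 2-way btrfs mirror

breakeven_ratio = mirror_2x / plain_3x  # compression ratio needed to match plain 3x
print("3x replicas:            %.0f TB raw" % plain_3x)
print("2x replicas on mirrors: %.0f TB raw" % mirror_2x)
print("lzo would need ~%.2f:1 compression just to break even" % breakeven_ratio)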

Re: [ceph-users] Disk failures

2016-06-14 Thread Christian Balzer
Hello,

On Tue, 14 Jun 2016 14:26:41 +0200 Jan Schermer wrote:
> Hi,
> bit rot is not "bit rot" per se - nothing is rotting on the drive platter.

Never mind that I used the wrong terminology (according to Wiki) and that my long experience with "laser-rot" probably caused me to choose that ter

Re: [ceph-users] Disk failures

2016-06-14 Thread Jan Schermer
Hi, bit rot is not "bit rot" per se - nothing is rotting on the drive platter. It occurs during reads (mostly, anyway), and it's random. You can happily read a block and get the correct data, then read it again and get garbage, then get correct data again. This could be caused by a worn out cell
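A toy illustration of the failure mode Jan describes, where the same block can read back differently on successive attempts. The re-read-and-compare approach below is just a sketch to show how such transient read errors can be spotted; it is not something Ceph does.

import collections

def read_block(dev, offset, length):
    # Read one block from an already-open block device or file object.
    dev.seek(offset)
    return dev.read(length)

def read_with_majority(dev, offset, length, attempts=3):
    # Re-read the block a few times; a transient read error shows up as disagreement.
    results = [read_block(dev, offset, length) for _ in range(attempts)]
    counts = collections.Counter(results)
    data, votes = counts.most_common(1)[0]
    if votes < attempts:
        print("warning: %d of %d reads disagreed at offset %d"
              % (attempts - votes, attempts, offset))
    return data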

Re: [ceph-users] Disk failures

2016-06-09 Thread Gandalf Corvotempesta
2016-06-09 10:28 GMT+02:00 Christian Balzer:
> Define "small" cluster.

Max 14 OSD nodes with 12 disks each, replica 3.

> Your smallest failure domain both in Ceph (CRUSH rules) and for calculating how much over-provisioning you need should always be the node/host.
> This is the default CRUSH
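A quick illustration of the over-provisioning point, using the 14-node / 12-disk figures from the thread (the 4 TB disk size is an assumed value): with host as the failure domain, losing one whole node means its data must be re-replicated onto the surviving nodes, so at least that fraction of the cluster has to be free beforehand, plus margin below the full ratios.

# Headroom needed to survive the loss of one full node (host = failure domain).
nodes = 14
disks_per_node = 12
disk_tb = 4.0                                # assumed disk size, for illustration only

raw_total = nodes * disks_per_node * disk_tb
raw_per_node = disks_per_node * disk_tb

# Data from the failed node re-replicates across the surviving nodes,
# so at minimum that fraction of the cluster must be free beforehand.
headroom_fraction = raw_per_node / raw_total  # = 1/14, about 7.1%
print("raw capacity: %.0f TB, per node: %.0f TB" % (raw_total, raw_per_node))
print("minimum free space to re-replicate one node: %.1f%%" % (headroom_fraction * 100))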

Re: [ceph-users] Disk failures

2016-06-09 Thread Christian Balzer
Hello,

On Thu, 9 Jun 2016 09:59:04 +0200 Gandalf Corvotempesta wrote:
> 2016-06-09 9:16 GMT+02:00 Christian Balzer:
> > Neither, a journal failure is lethal for the OSD involved and unless you have LOTS of money RAID1 SSDs are a waste.
>
> Ok, so if a journal failure is lethal, ceph automa
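As a sketch of the down -> out sequence being discussed (a toy model, not Ceph source code): the OSD with the dead journal is reported down, and once it has stayed down longer than mon_osd_down_out_interval it is marked out, at which point CRUSH re-replicates its data elsewhere; deleting the OSD from the map remains a manual admin step. The 600-second value below is the commonly documented default and should be checked against your release.

import time

DOWN_OUT_INTERVAL = 600  # seconds; mirrors mon_osd_down_out_interval (check your release)

class OsdState:
    # Toy model of the monitor's down -> out progression for a single OSD.
    def __init__(self):
        self.down_since = None
        self.out = False

    def report_down(self, now):
        # e.g. peers report heartbeat failures after the journal device died
        if self.down_since is None:
            self.down_since = now

    def tick(self, now):
        # After the interval the OSD is marked out and its PGs are re-replicated
        # elsewhere; the OSD itself is not deleted from the map automatically.
        if self.down_since is not None and not self.out:
            if now - self.down_since >= DOWN_OUT_INTERVAL:
                self.out = True
                print("OSD marked out -> backfill/recovery starts")

osd = OsdState()
osd.report_down(now=time.time())
osd.tick(now=time.time() + DOWN_OUT_INTERVAL + 1)  # simulate the interval elapsing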

Re: [ceph-users] Disk failures

2016-06-09 Thread Gandalf Corvotempesta
2016-06-09 9:16 GMT+02:00 Christian Balzer:
> Neither, a journal failure is lethal for the OSD involved and unless you have LOTS of money RAID1 SSDs are a waste.

Ok, so if a journal failure is lethal, Ceph automatically removes the affected OSD and starts a rebalance, right?

> Additionally your c

Re: [ceph-users] Disk failures

2016-06-09 Thread Christian Balzer
Hello,

On Thu, 9 Jun 2016 08:43:23 +0200 Gandalf Corvotempesta wrote:
> On 9 Jun 2016 02:09, "Christian Balzer" wrote:
> > Ceph currently doesn't do any (relevant) checksumming at all, so if a PRIMARY PG suffers from bit-rot this will be undetected until the next deep-scrub.

Re: [ceph-users] Disk failures

2016-06-08 Thread Gandalf Corvotempesta
On 9 Jun 2016 02:09, "Christian Balzer" wrote:
> Ceph currently doesn't do any (relevant) checksumming at all, so if a PRIMARY PG suffers from bit-rot this will be undetected until the next deep-scrub.
>
> This is one of the longest and gravest outstanding issues with Ceph and supposed

Re: [ceph-users] Disk failures

2016-06-08 Thread Christian Balzer
Hello,

On Wed, 08 Jun 2016 20:26:56 + Krzysztof Nowicki wrote:
> Hi,
>
> On Wed, 8 Jun 2016 at 21:35, Gandalf Corvotempesta <gandalf.corvotempe...@gmail.com> wrote:
>
> > 2016-06-08 20:49 GMT+02:00 Krzysztof Nowicki <krzysztof.a.nowi...@gmail.com>:
> > > From my own experien

Re: [ceph-users] Disk failures

2016-06-08 Thread list
As long as there hasn't been a change recently, Ceph does not store checksums. Deep scrub compares checksums across replicas. See http://lists.ceph.com/pipermail/ceph-users-ceph.com/2013-October/034646.html

On 8 June 2016 22:27:46, Krzysztof Nowicki wrote:

Hi,

On Wed, 8 Jun 2016 at 21:35, u
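A toy illustration of the deep-scrub behaviour described above: digests are computed across the replicas at scrub time and compared, rather than verified against a stored checksum on every read. The paths and the digest algorithm are assumptions for the sketch; with no authoritative checksum, a scrub can flag an inconsistency but cannot by itself decide which replica is the bad one.

import hashlib

def deep_scrub(replica_paths):
    # Hash every replica of an object and report whether the copies agree.
    digests = {}
    for path in replica_paths:
        with open(path, "rb") as f:
            digests[path] = hashlib.sha256(f.read()).hexdigest()

    if len(set(digests.values())) > 1:
        print("scrub error: replicas disagree")
        for path, digest in digests.items():
            print("  %s: %s" % (path, digest))
        return False
    return True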

Re: [ceph-users] Disk failures

2016-06-08 Thread list
On 8 June 2016 22:27:46, Krzysztof Nowicki wrote:

Hi,

On Wed, 8 Jun 2016 at 21:35, Gandalf Corvotempesta <gandalf.corvotempe...@gmail.com> wrote:

2016-06-08 20:49 GMT+02:00 Krzysztof Nowicki <krzysztof.a.nowi...@gmail.com>:
> From my own experience with failing HDDs I've seen

Re: [ceph-users] Disk failures

2016-06-08 Thread Krzysztof Nowicki
Hi,

On Wed, 8 Jun 2016 at 21:35, Gandalf Corvotempesta <gandalf.corvotempe...@gmail.com> wrote:
> 2016-06-08 20:49 GMT+02:00 Krzysztof Nowicki <krzysztof.a.nowi...@gmail.com>:
> > From my own experience with failing HDDs I've seen cases where the drive was failing silently initia

Re: [ceph-users] Disk failures

2016-06-08 Thread Gandalf Corvotempesta
2016-06-08 20:49 GMT+02:00 Krzysztof Nowicki:
> From my own experience with failing HDDs I've seen cases where the drive was failing silently initially. This manifested itself in repeated deep scrub failures. Correct me if I'm wrong here, but Ceph keeps checksums of data being written and in

Re: [ceph-users] Disk failures

2016-06-08 Thread Krzysztof Nowicki
Hi,

From my own experience with failing HDDs I've seen cases where the drive was failing silently initially. This manifested itself in repeated deep scrub failures. Correct me if I'm wrong here, but Ceph keeps checksums of data being written and in case that data is read back corrupted on one of

[ceph-users] Disk failures

2016-06-07 Thread Gandalf Corvotempesta
Hi,

How does Ceph detect and manage disk failures? What happens if some data is written to a bad sector? Is there any chance that the bad sector gets "distributed" across the cluster due to replication? Is Ceph able to remove the OSD bound to the failed disk automatically?