hrm. intel had, until a year ago, been very good with ssds. the description of your experience definitely doesn't inspire confidence. intel also dropping the entire s3xxx and p3xxx series last year, before having a viable replacement, has been driving me nuts.
i don't know that i have the luxury of being able to return all of the ones i have or just buy replacements. i'm going to need to at least try them in production. it'll probably happen with the s4600 limited to a particular fault domain. these are also going to be filestore osds, so maybe that will result in different behavior. i'll try to post updates as i have them. mike On Thu, Feb 22, 2018 at 2:33 PM, David Herselman <d...@syrex.co> wrote: > Hi Mike, > > > > I eventually got hold of a customer relations manager at Intel but his > attitude was lacklustre and Intel never officially responded to any > correspondence we sent them. The Intel s4600 drives all passed our standard > burn-in tests; they appear to fail only once they handle production > BlueStore usage, generally after a couple of days' use. > > > > Intel really didn’t seem interested, even after explaining that the drives > were in different physical systems in different data centres and that I had > been in contact with another Intel customer who had experienced similar > failures in Dell equipment (our servers are pure Intel). > > > > > > Perhaps a lawyer would be interested in picking up the issue and their > attitude. Not advising customers of a known issue which leads to data loss > is simply negligent, especially on a product that they tout as being more > reliable than spinners and that carries their Data Centre reliability stamp. > > > > I returned the lot and am done with Intel SSDs, and will advise as many > customers and peers as possible to do the same… > > > > > > Regards > > David Herselman > > > > > > *From:* ceph-users [mailto:ceph-users-boun...@lists.ceph.com] *On Behalf > Of *Mike Lovell > *Sent:* Thursday, 22 February 2018 11:19 PM > *To:* ceph-users@lists.ceph.com > > *Subject:* Re: [ceph-users] Many concurrent drive failures - How do I > activate pgs? > > > > has anyone tried the most recent firmware from intel? i've had a > number of s4600 960gb drives that have been waiting for me to get around to > adding them to a ceph cluster. this, as well as having 2 die almost > simultaneously in a different storage box, is giving me pause. i noticed > that David listed some output showing his ssds were running firmware > version SCV10100. the drives i have came with the same one. it looks > like SCV10111 is available through the latest isdct package. i'm working > through upgrading mine and attempting some burn-in testing. just curious if > anyone has had any luck there. > > > > mike > > > > On Thu, Feb 22, 2018 at 9:49 AM, Chris Sarginson <csarg...@gmail.com> > wrote: > > Hi Caspar, > > > > Sean and I replaced the problematic DC S4600 disks (after all but one had > failed) in our cluster with Samsung SM863a disks. > > There was an NDA for new Intel firmware (as mentioned earlier in the > thread by David), but given the problems we were experiencing we moved all > Intel disks to a single failure domain and were unable to deploy the > additional firmware to test. > > > The Samsung should fit your requirements. > > > > http://www.samsung.com/semiconductor/minisite/ssd/product/enterprise/sm863a/ > > > > Regards > > Chris > > > > On Thu, 22 Feb 2018 at 12:50 Caspar Smit <caspars...@supernas.eu> wrote: > > Hi Sean and David, > > > > Do you have any follow-ups / news on the Intel DC S4600 case? We are > looking into these drives to use as DB/WAL devices for a new cluster we are > about to build. > > > > Did Intel provide anything (like new firmware) which should fix the issues > you were having, or are these drives still unreliable?
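for anyone wanting to go the firmware route mentioned above, a rough sketch of the isdct flow (drive index 0 is only an example, so check the output of the show command first; isdct also prompts for confirmation before actually flashing anything):

  isdct show -intelssd       # list attached intel ssds with their index, Firmware and FirmwareUpdateAvailable fields
  isdct load -intelssd 0     # flash the firmware bundled with the installed isdct release onto the drive at index 0

re-running the show command afterwards should report the new firmware version, e.g. SCV10111 with the current isdct package.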
> > > > At the moment we are also looking into the Intel DC S3610 as an > alternative, which is a step back in performance but should be very > reliable. > > > > Are there maybe any other recommendations for a ~200GB 2.5" SATA SSD to use as > DB/WAL? (Aiming for ~3 DWPD should be sufficient for DB/WAL?) > > > > Kind regards, > > Caspar > > > > 2018-01-12 15:45 GMT+01:00 Sean Redmond <sean.redmo...@gmail.com>: > > Hi David, > > > > To follow up on this, I had a 4th drive fail (out of 12) and have opted to > order the below disks as a replacement. I have an ongoing case with Intel > via the supplier and will report back anything useful, but I am going to > avoid the Intel s4600 2TB SSDs for the moment. > > > > 1.92TB Samsung SM863a 2.5" Enterprise SSD, SATA3 6Gb/s, 2-bit MLC V-NAND > > > > Regards > > Sean Redmond > > > > On Wed, Jan 10, 2018 at 11:08 PM, Sean Redmond <sean.redmo...@gmail.com> > wrote: > > Hi David, > > > > Thanks for your email; they are connected inside a Dell R730XD (2.5 inch 24 > disk model) in non-RAID mode via a PERC RAID card. > > > > The version of Ceph is Jewel, with kernel 4.13.x and Ubuntu 16.04. > > > > Thanks for your feedback on the HGST disks. > > > > Thanks > > > > On Wed, Jan 10, 2018 at 10:55 PM, David Herselman <d...@syrex.co> wrote: > > Hi Sean, > > > > No, Intel’s feedback has been… Pathetic… I have yet to receive anything > more than a request to ‘sign’ a non-disclosure agreement, to obtain beta > firmware. No official answer as to whether or not one can logically unlock > the drives, no answer to my question whether or not Intel publish serial > numbers anywhere pertaining to recalled batches, and no information > pertaining to whether or not firmware updates would address any known > issues. > > > > This with us being an accredited Intel Gold partner… > > > > > > We’ve returned the lot and ended up with 9/12 of the drives failing in the > same manner. The replaced drives, which had different serial number ranges, > also failed. Very frustrating is that the drives fail in a way that results > in unbootable servers, unless one adds ‘rootdelay=240’ to the kernel > command line (a quick grub example follows below). > > > > > > I would be interested to know what platform your drives were in and > whether or not they were connected to a RAID module/card. > > > > PS: After much searching we’ve decided to order the NVMe conversion kit > and have ordered HGST UltraStar SN200 2.5 inch SFF drives with a 3 DWPD > rating. > > > > > > Regards > > David Herselman > > > > *From:* Sean Redmond [mailto:sean.redmo...@gmail.com] > *Sent:* Thursday, 11 January 2018 12:45 AM > *To:* David Herselman <d...@syrex.co> > *Cc:* Christian Balzer <ch...@gol.com>; ceph-users@lists.ceph.com > > > *Subject:* Re: [ceph-users] Many concurrent drive failures - How do I > activate pgs? > > > > Hi, > > > > I have a case where 3 out of 12 of these Intel S4600 2TB models failed > within a matter of days after being burn-in tested and then placed into > production. > > > > I am interested to know, did you ever get any further feedback from the > vendor on your issue? > > > > Thanks > > > > On Thu, Dec 21, 2017 at 1:38 PM, David Herselman <d...@syrex.co> wrote: > > Hi, > > I assume this can only be a physical manufacturing flaw or a firmware bug? > Do Intel publish advisories on recalled equipment? Should others be > concerned about using Intel DC S4600 SSD drives? Could this be an > electrical issue on the Hot Swap Backplane, or a BMC firmware issue? Either > way, all pure Intel... 
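Regarding the rootdelay=240 workaround referenced above: on a Debian/Proxmox system booting via grub, persisting a kernel parameter is roughly the following sketch (the existing "quiet" option and the file contents are assumptions, adjust to match the actual /etc/default/grub):

  # /etc/default/grub
  GRUB_CMDLINE_LINUX_DEFAULT="quiet rootdelay=240"

  # regenerate the grub configuration, then reboot
  update-grub

This only gives the root device extra time to settle at boot; it does nothing for the failed SSDs themselves.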
> > The hole is only 1.3 GB (4 MB x 339 objects) but perfectly striped through > images, file systems are subsequently severely damaged. > > Is it possible to get Ceph to read in partial data shards? It would > provide between 25-75% more yield... > > > Is there anything wrong with how we've proceeded thus far? Would be nice > to reference examples of using ceph-objectstore-tool but documentation is > virtually non-existent. > > We used another SSD drive to simulate bringing all the SSDs back online. > We carved up the drive to provide equal partitions to essentially simulate > the original SSDs: > # Partition a drive to provide 12 x 150GB partitions, eg: > sdd 8:48 0 1.8T 0 disk > |-sdd1 8:49 0 140G 0 part > |-sdd2 8:50 0 140G 0 part > |-sdd3 8:51 0 140G 0 part > |-sdd4 8:52 0 140G 0 part > |-sdd5 8:53 0 140G 0 part > |-sdd6 8:54 0 140G 0 part > |-sdd7 8:55 0 140G 0 part > |-sdd8 8:56 0 140G 0 part > |-sdd9 8:57 0 140G 0 part > |-sdd10 8:58 0 140G 0 part > |-sdd11 8:59 0 140G 0 part > +-sdd12 8:60 0 140G 0 part > > > Pre-requisites: > ceph osd set noout; > apt-get install uuid-runtime; > > > for ID in `seq 24 35`; do > UUID=`uuidgen`; > OSD_SECRET=`ceph-authtool --gen-print-key`; > DEVICE='/dev/sdd'$[$ID-23]; # 24-23 = /dev/sdd1, 35-23 = /dev/sdd12 > echo "{\"cephx_secret\": \"$OSD_SECRET\"}" | ceph osd new $UUID $ID -i > - -n client.bootstrap-osd -k /var/lib/ceph/bootstrap-osd/ceph.keyring; > mkdir /var/lib/ceph/osd/ceph-$ID; > mkfs.xfs $DEVICE; > mount $DEVICE /var/lib/ceph/osd/ceph-$ID; > ceph-authtool --create-keyring /var/lib/ceph/osd/ceph-$ID/keyring > --name osd.$ID --add-key $OSD_SECRET; > ceph-osd -i $ID --mkfs --osd-uuid $UUID; > chown -R ceph:ceph /var/lib/ceph/osd/ceph-$ID; > systemctl enable ceph-osd@$ID; > systemctl start ceph-osd@$ID; > done > > > Once up we imported previous exports of empty head files in to 'real' OSDs: > kvm5b: > systemctl stop ceph-osd@8; > ceph-objectstore-tool --op import --pgid 7.4s0 --data-path > /var/lib/ceph/osd/ceph-8 --journal-path /var/lib/ceph/osd/ceph-8/journal > --file /var/lib/vz/template/ssd_recovery/osd8_7.4s0.export; > chown ceph:ceph -R /var/lib/ceph/osd/ceph-8; > systemctl start ceph-osd@8; > kvm5f: > systemctl stop ceph-osd@23; > ceph-objectstore-tool --op import --pgid 7.fs0 --data-path > /var/lib/ceph/osd/ceph-23 --journal-path /var/lib/ceph/osd/ceph-23/journal > --file /var/lib/vz/template/ssd_recovery/osd23_7.fs0.export; > chown ceph:ceph -R /var/lib/ceph/osd/ceph-23; > systemctl start ceph-osd@23; > > > Bulk import previously exported objects: > cd /var/lib/vz/template/ssd_recovery; > for FILE in `ls -1A osd*_*.export | grep -Pv '^osd(8|23)_'`; do > OSD=`echo $FILE | perl -pe 's/^osd(\d+).*/\1/'`; > PGID=`echo $FILE | perl -pe 's/^osd\d+_(.*?).export/\1/g'`; > echo -e "systemctl stop ceph-osd@$OSD\t ceph-objectstore-tool --op > import --pgid $PGID --data-path /var/lib/ceph/osd/ceph-$OSD --journal-path > /var/lib/ceph/osd/ceph-$OSD/journal --file /var/lib/vz/template/ssd_ > recovery/osd"$OSD"_$PGID.export"; > done | sort > > Sample output (this will wrap): > systemctl stop ceph-osd@27 ceph-objectstore-tool --op import --pgid > 7.4s3 --data-path /var/lib/ceph/osd/ceph-27 --journal-path > /var/lib/ceph/osd/ceph-27/journal --file /var/lib/vz/template/ssd_ > recovery/osd27_7.4s3.export > systemctl stop ceph-osd@27 ceph-objectstore-tool --op import --pgid > 7.fs5 --data-path /var/lib/ceph/osd/ceph-27 --journal-path > /var/lib/ceph/osd/ceph-27/journal --file /var/lib/vz/template/ssd_ > recovery/osd27_7.fs5.export > systemctl stop 
ceph-osd@30 ceph-objectstore-tool --op import --pgid > 7.fs4 --data-path /var/lib/ceph/osd/ceph-30 --journal-path > /var/lib/ceph/osd/ceph-30/journal --file /var/lib/vz/template/ssd_ > recovery/osd30_7.fs4.export > systemctl stop ceph-osd@31 ceph-objectstore-tool --op import --pgid > 7.4s2 --data-path /var/lib/ceph/osd/ceph-31 --journal-path > /var/lib/ceph/osd/ceph-31/journal --file /var/lib/vz/template/ssd_ > recovery/osd31_7.4s2.export > systemctl stop ceph-osd@32 ceph-objectstore-tool --op import --pgid > 7.4s4 --data-path /var/lib/ceph/osd/ceph-32 --journal-path > /var/lib/ceph/osd/ceph-32/journal --file /var/lib/vz/template/ssd_ > recovery/osd32_7.4s4.export > systemctl stop ceph-osd@32 ceph-objectstore-tool --op import --pgid > 7.fs2 --data-path /var/lib/ceph/osd/ceph-32 --journal-path > /var/lib/ceph/osd/ceph-32/journal --file /var/lib/vz/template/ssd_ > recovery/osd32_7.fs2.export > systemctl stop ceph-osd@34 ceph-objectstore-tool --op import --pgid > 7.4s5 --data-path /var/lib/ceph/osd/ceph-34 --journal-path > /var/lib/ceph/osd/ceph-34/journal --file /var/lib/vz/template/ssd_ > recovery/osd34_7.4s5.export > systemctl stop ceph-osd@34 ceph-objectstore-tool --op import --pgid > 7.fs1 --data-path /var/lib/ceph/osd/ceph-34 --journal-path > /var/lib/ceph/osd/ceph-34/journal --file /var/lib/vz/template/ssd_ > recovery/osd34_7.fs1.export > > > Reset permissions and then started the OSDs: > for OSD in 27 30 31 32 34; do > chown -R ceph:ceph /var/lib/ceph/osd/ceph-$OSD; > systemctl start ceph-osd@$OSD; > done > > > Then finally started all the OSDs... Now to hope that Intel have a way of > accessing drives that are in a 'disable logical state'. > > > > The imports succeed, herewith a link to the output after running an import > for placement group 7.4s2 on OSD 31: > https://drive.google.com/open?id=1-Jo1jmrWrGLO2OgflacGPlEf2p32Y4hn > > Sample snippet: > Write 1#7:fffcd2ec:::rbd_data.4.be8e9974b0dc51.0000000000002869:head# > snapset 0=[]:{} > Write 1#7:fffd4823:::rbd_data.4.ba24ef2ae8944a.000000000000a2b0:head# > snapset 0=[]:{} > Write 1#7:fffd6fb6:::benchmark_data_kvm5b_20945_object14722:head# > snapset 0=[]:{} > Write 1#7:ffffa069:::rbd_data.4.ba24ef2ae8944a.000000000000aea9:head# > snapset 0=[]:{} > Import successful > > > Data does get written, I can tell by the size of the FileStore mount > points: > [root@kvm5b ssd_recovery]# df -h | grep -P 'ceph-(27|30|31|32|34)$' > /dev/sdd4 140G 5.2G 135G 4% /var/lib/ceph/osd/ceph-27 > /dev/sdd7 140G 14G 127G 10% /var/lib/ceph/osd/ceph-30 > /dev/sdd8 140G 14G 127G 10% /var/lib/ceph/osd/ceph-31 > /dev/sdd9 140G 22G 119G 16% /var/lib/ceph/osd/ceph-32 > /dev/sdd11 140G 22G 119G 16% /var/lib/ceph/osd/ceph-34 > > > How do I tell Ceph to read these object shards? > > > > PS: It's probably a good idea to reweight the OSDs to 0 before starting > again. This should prevent data flowing on to them, if they are not in a > different device class or other crush selection ruleset. Ie: > for OSD in `seq 24 35`; do > ceph osd crush reweight osd.$OSD 0; > done > > > Regards > David Herselman > > -----Original Message----- > > From: David Herselman > Sent: Thursday, 21 December 2017 3:49 AM > To: 'Christian Balzer' <ch...@gol.com>; ceph-users@lists.ceph.com > Subject: RE: [ceph-users] Many concurrent drive failures - How do I > activate pgs? 
> > Hi Christian, > > Thanks for taking the time, I haven't been contacted by anyone yet but > managed to get the down placement groups cleared by exporting 7.4s0 and > 7.fs0 and then marking them as complete on the surviving OSDs: > kvm5c: > ceph-objectstore-tool --op export --pgid 7.4s0 --data-path > /var/lib/ceph/osd/ceph-8 --journal-path /var/lib/ceph/osd/ceph-8/journal > --file /var/lib/vz/template/ssd_recovery/osd8_7.4s0.export; > ceph-objectstore-tool --op mark-complete --data-path > /var/lib/ceph/osd/ceph-8 --journal-path /var/lib/ceph/osd/ceph-8/journal > --pgid 7.4s0; > kvm5f: > ceph-objectstore-tool --op export --pgid 7.fs0 --data-path > /var/lib/ceph/osd/ceph-23 --journal-path /var/lib/ceph/osd/ceph-23/journal > --file /var/lib/vz/template/ssd_recovery/osd23_7.fs0.export; > ceph-objectstore-tool --op mark-complete --data-path > /var/lib/ceph/osd/ceph-23 --journal-path /var/lib/ceph/osd/ceph-23/journal > --pgid 7.fs0; > > This would presumably simply punch holes in the RBD images but at least we > can copy them out of that pool and hope that Intel can somehow unlock the > drives for us to then export/import objects. > > > To answer your questions though, we have 6 near identical Intel Wildcat > Pass 1U servers and have Proxmox loaded on them. Proxmox uses a Debian 9 > base with the Ubuntu kernel, for which they apply cherry picked kernel > patches (eg Intel NIC driver updates, vhost perf regression and mem-leak > fixes, etc): > > kvm5a: > Intel R1208WTTGSR System (serial: BQWS55091014) > Intel S2600WTTR Motherboard (serial: BQWL54950385, BIOS ID: > SE5C610.86B.01.01.0021.032120170601) > 2 x Intel Xeon E5-2640v4 2.4GHz (HT disabled) > 24 x Micron 8GB DDR4 2133MHz (24 x 18ASF1G72PZ-2G1B1) > Intel AXX10GBNIA I/O Module > kvm5b: > Intel R1208WTTGS System (serial: BQWS53890178) > Intel S2600WTT Motherboard (serial: BQWL52550359, BIOS ID: > SE5C610.86B.01.01.0021.032120170601) > 2 x Intel Xeon E5-2640v4 2.4GHz (HT enabled) > 4 x Micron 64GB DDR4 2400MHz LR-DIMM (4 x 72ASS8G72LZ-2G3B2) > Intel AXX10GBNIA I/O Module > kvm5c: > Intel R1208WT2GS System (serial: BQWS50490279) > Intel S2600WT2 Motherboard (serial: BQWL44650203, BIOS ID: > SE5C610.86B.01.01.0021.032120170601) > 2 x Intel Xeon E5-2640v3 2.6GHz (HT enabled) > 4 x Micron 64GB DDR4 2400MHz LR-DIMM (4 x 72ASS8G72LZ-2G3B2) > Intel AXX10GBNIA I/O Module > kvm5d: > Intel R1208WTTGSR System (serial: BQWS62291318) > Intel S2600WTTR Motherboard (serial: BQWL61855187, BIOS ID: > SE5C610.86B.01.01.0021.032120170601) > 2 x Intel Xeon E5-2640v4 2.4GHz (HT enabled) > 4 x Micron 64GB DDR4 2400MHz LR-DIMM (4 x 72ASS8G72LZ-2G3B2) > Intel AXX10GBNIA I/O Module > kvm5e: > Intel R1208WTTGSR System (serial: BQWS64290162) > Intel S2600WTTR Motherboard (serial: BQWL63953066, BIOS ID: > SE5C610.86B.01.01.0021.032120170601) > 2 x Intel Xeon E5-2640v4 2.4GHz (HT enabled) > 4 x Micron 64GB DDR4 2400MHz LR-DIMM (4 x 72ASS8G72LZ-2G3B2) > Intel AXX10GBNIA I/O Module > kvm5f: > Intel R1208WTTGSR System (serial: BQWS71790632) > Intel S2600WTTR Motherboard (serial: BQWL71050622, BIOS ID: > SE5C610.86B.01.01.0021.032120170601) > 2 x Intel Xeon E5-2640v4 2.4GHz (HT enabled) > 4 x Micron 64GB DDR4 2400MHz LR-DIMM (4 x 72ASS8G72LZ-2G3B2) > Intel AXX10GBNIA I/O Module > > Summary: > * 5b has an Intel S2600WTT, 5c has an Intel S2600WT2, all others have > S2600WTTR Motherboards > * 5a has ECC Registered Dual Rank DDR DIMMs, all others have ECC > LoadReduced-DIMMs > * 5c has an Intel X540-AT2 10 GbE adapter as the on-board NICs are only > 1 GbE > > > Each system has 
identical discs: > * 2 x 480 GB Intel SSD DC S3610 (SSDSC2BX480G4) - partitioned as > software RAID1 OS volume and Ceph FileStore journals (spinners) > * 4 x 2 TB Seagate discs (ST2000NX0243) - Ceph FileStore OSDs (journals > in S3610 partitions) > * 2 x 1.9 TB Intel SSD DC S4600 (SSDSC2KG019T7) - Ceph BlueStore OSDs > (problematic) > > > Additional information: > * All drives are directly attached to the on-board AHCI SATA > controllers, via the standard 2.5 inch drive chassis hot-swap bays. > * We added 12 x 1.9 TB SSD DC S4600 drives last week Thursday, 2 in each > system's slots 7 & 8 > * Systems have been operating with existing Intel SSD DC S3610 and 2 TB > Seagate discs for over a year; we added the most recent node (kvm5f) on the > 23rd of November. > * 6 of the 12 Intel SSD DC S4600 drives failed in less than 100 hours. > * They work perfectly until they suddenly stop responding and are > thereafter, even with us physically shutting down the server and powering > it back up again, completely inaccessible. Intel diagnostic tool reports > 'logically locked'. > > > Drive failures appear random to me: > kvm5a - bay 7 - offline - Model=INTEL SSDSC2KG019T7, FwRev=SCV10100, > SerialNo=BTYM739208851P9DGN > kvm5a - bay 8 - online - Model=INTEL SSDSC2KG019T7, FwRev=SCV10100, > SerialNo=PHYM727602TM1P9DGN > kvm5b - bay 7 - offline - Model=INTEL SSDSC2KG019T7, FwRev=SCV10100, > SerialNo=PHYM7276031E1P9DGN > kvm5b - bay 8 - online - Model=INTEL SSDSC2KG019T7, FwRev=SCV10100, > SerialNo=BTYM7392087W1P9DGN > kvm5c - bay 7 - offline - Model=INTEL SSDSC2KG019T7, FwRev=SCV10100, > SerialNo=BTYM739200ZJ1P9DGN > kvm5c - bay 8 - offline - Model=INTEL SSDSC2KG019T7, FwRev=SCV10100, > SerialNo=BTYM7392088B1P9DGN > kvm5d - bay 7 - offline - Model=INTEL SSDSC2KG019T7, FwRev=SCV10100, > SerialNo=BTYM738604Y11P9DGN > kvm5d - bay 8 - online - Model=INTEL SSDSC2KG019T7, FwRev=SCV10100, > SerialNo=PHYM727603181P9DGN > kvm5e - bay 7 - online - Model=INTEL SSDSC2KG019T7, FwRev=SCV10100, > SerialNo=BTYM7392013B1P9DGN > kvm5e - bay 8 - offline - Model=INTEL SSDSC2KG019T7, FwRev=SCV10100, > SerialNo=BTYM7392087E1P9DGN > kvm5f - bay 7 - online - Model=INTEL SSDSC2KG019T7, FwRev=SCV10100, > SerialNo=BTYM739208721P9DGN > kvm5f - bay 8 - online - Model=INTEL SSDSC2KG019T7, FwRev=SCV10100, > SerialNo=BTYM739208C41P9DGN > > > Intel SSD Data Center Tool reports: > C:\isdct>isdct.exe show -intelssd > > - Intel SSD DC S4600 Series PHYM7276031E1P9DGN - > > Bootloader : Property not found > DevicePath : \\\\.\\PHYSICALDRIVE1 > DeviceStatus : Selected drive is in a disable logical state. > Firmware : SCV10100 > FirmwareUpdateAvailable : Please contact Intel Customer Support for > further assistance at the following website: http://www.intel.com/go/ > ssdsupport. > Index : 0 > ModelNumber : INTEL SSDSC2KG019T7 > ProductFamily : Intel SSD DC S4600 Series SerialNumber : PHYM7276031E1P9DGN > > > C:\isdct>isdct show -a -intelssd 0 > > - Intel SSD DC S4600 Series PHYM7276031E1P9DGN - > > AccessibleMaxAddressSupported : True > AggregationThreshold : Selected drive is in a disable logical state. > AggregationTime : Selected drive is in a disable logical state. > ArbitrationBurst : Selected drive is in a disable logical state. > BusType : 11 > CoalescingDisable : Selected drive is in a disable logical state. 
> ControllerCompatibleIDs : PCI\\VEN_8086&DEV_8C02&REV_ > 05PCI\\VEN_8086&DEV_8C02PCI\\VEN_8086&CC_010601PCI\\VEN_ > 8086&CC_0106PCI\\VEN_8086PCI\\CC_010601PCI\\CC_0106 > ControllerDescription : @mshdc.inf,%pci\\cc_010601.devicedesc%;Standard > SATA AHCI Controller ControllerID : PCI\\VEN_8086&DEV_8C02&SUBSYS_ > 78461462&REV_05\\3&11583659&0&FA > ControllerIDEMode : False > ControllerManufacturer : @mshdc.inf,%ms-ahci%;Standard SATA AHCI > Controller ControllerService : storahci DIPMEnabled : False DIPMSupported : > False DevicePath : \\\\.\\PHYSICALDRIVE1 DeviceStatus : Selected drive is > in a disable logical state. > DigitalFenceSupported : False > DownloadMicrocodePossible : True > DriverDescription : Standard SATA AHCI Controller DriverMajorVersion : 10 > DriverManufacturer : Standard SATA AHCI Controller DriverMinorVersion : 0 > DriverVersion : 10.0.16299.98 DynamicMMIOEnabled : The selected drive does > not support this feature. > EnduranceAnalyzer : Selected drive is in a disable logical state. > ErrorString : *BAD_CONTEXT_2020 F4 > Firmware : SCV10100 > FirmwareUpdateAvailable : Please contact Intel Customer Support for > further assistance at the following website: http://www.intel.com/go/ > ssdsupport. > HDD : False > HighPriorityWeightArbitration : Selected drive is in a disable logical > state. > IEEE1667Supported : False > IOCompletionQueuesRequested : Selected drive is in a disable logical state. > IOSubmissionQueuesRequested : Selected drive is in a disable logical state. > Index : 0 > Intel : True > IntelGen3SATA : True > IntelNVMe : False > InterruptVector : Selected drive is in a disable logical state. > IsDualPort : False > LatencyTrackingEnabled : Selected drive is in a disable logical state. > LowPriorityWeightArbitration : Selected drive is in a disable logical > state. > Lun : 0 > MaximumLBA : 3750748847 > MediumPriorityWeightArbitration : Selected drive is in a disable logical > state. > ModelNumber : INTEL SSDSC2KG019T7 > NVMePowerState : Selected drive is in a disable logical state. > NativeMaxLBA : Selected drive is in a disable logical state. > OEM : Generic > OpalState : Selected drive is in a disable logical state. > PLITestTimeInterval : Selected drive is in a disable logical state. > PNPString : SCSI\\DISK&VEN_INTEL&PROD_SSDSC2KG019T7\\4&2BE6C224&0&010000 > PathID : 1 > PhySpeed : Selected drive is in a disable logical state. > PhysicalSectorSize : Selected drive is in a disable logical state. > PhysicalSize : 1920383410176 > PowerGovernorAveragePower : Selected drive is in a disable logical state. > PowerGovernorBurstPower : Selected drive is in a disable logical state. > PowerGovernorMode : Selected drive is in a disable logical state. > Product : Youngsville > ProductFamily : Intel SSD DC S4600 Series ProductProtocol : ATA > ReadErrorRecoveryTimer : Selected drive is in a disable logical state. > RemoteSecureEraseSupported : False > SCSIPortNumber : 0 > SMARTEnabled : True > SMARTHealthCriticalWarningsConfiguration : Selected drive is in a disable > logical state. > SMARTSelfTestSupported : True > SMBusAddress : Selected drive is in a disable logical state. 
> SSCEnabled : False > SanitizeBlockEraseSupported : False > SanitizeCryptoScrambleSupported : True > SanitizeSupported : True > SataGen1 : True > SataGen2 : True > SataGen3 : True > SataNegotiatedSpeed : Unknown > SectorSize : 512 > SecurityEnabled : False > SecurityFrozen : False > SecurityLocked : False > SecuritySupported : False > SerialNumber : PHYM7276031E1P9DGN > TCGSupported : False > TargetID : 0 > TempThreshold : Selected drive is in a disable logical state. > TemperatureLoggingInterval : Selected drive is in a disable logical state. > TimeLimitedErrorRecovery : Selected drive is in a disable logical state. > TrimSize : 4 > TrimSupported : True > VolatileWriteCacheEnabled : Selected drive is in a disable logical state. > WWID : 3959312879584368077 > WriteAtomicityDisableNormal : Selected drive is in a disable logical state. > WriteCacheEnabled : True > WriteCacheReorderingStateEnabled : Selected drive is in a disable logical > state. > WriteCacheState : Selected drive is in a disable logical state. > WriteCacheSupported : True > WriteErrorRecoveryTimer : Selected drive is in a disable logical state. > > > > SMART information is inaccessible, overall status is failed. Herewith the > stats from a partner disc which was still working when the others failed: > Device Model: INTEL SSDSC2KG019T7 > Serial Number: PHYM727602TM1P9DGN > LU WWN Device Id: 5 5cd2e4 14e1636bb > Firmware Version: SCV10100 > User Capacity: 1,920,383,410,176 bytes [1.92 TB] > Sector Sizes: 512 bytes logical, 4096 bytes physical > Rotation Rate: Solid State Device > Form Factor: 2.5 inches > Device is: Not in smartctl database [for details use: -P showall] > ATA Version is: ACS-3 T13/2161-D revision 5 > SATA Version is: SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s) > Local Time is: Mon Dec 18 19:33:51 2017 SAST > SMART support is: Available - device has SMART capability. 
> SMART support is: Enabled > > ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED > WHEN_FAILED RAW_VALUE > 5 Reallocated_Sector_Ct 0x0032 100 100 000 Old_age Always > - 0 > 9 Power_On_Hours 0x0032 100 100 000 Old_age Always > - 98 > 12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always > - 3 > 170 Unknown_Attribute 0x0033 100 100 010 Pre-fail Always > - 0 > 171 Unknown_Attribute 0x0032 100 100 000 Old_age Always > - 1 > 172 Unknown_Attribute 0x0032 100 100 000 Old_age Always > - 0 > 174 Unknown_Attribute 0x0032 100 100 000 Old_age Always > - 0 > 175 Program_Fail_Count_Chip 0x0033 100 100 010 Pre-fail Always > - 17567121432 > 183 Runtime_Bad_Block 0x0032 100 100 000 Old_age Always > - 0 > 184 End-to-End_Error 0x0033 100 100 090 Pre-fail Always > - 0 > 187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always > - 0 > 190 Airflow_Temperature_Cel 0x0022 077 076 000 Old_age Always > - 23 (Min/Max 17/29) > 192 Power-Off_Retract_Count 0x0032 100 100 000 Old_age Always > - 0 > 194 Temperature_Celsius 0x0022 100 100 000 Old_age Always > - 23 > 197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always > - 0 > 199 UDMA_CRC_Error_Count 0x003e 100 100 000 Old_age Always > - 0 > 225 Unknown_SSD_Attribute 0x0032 100 100 000 Old_age Always > - 14195 > 226 Unknown_SSD_Attribute 0x0032 100 100 000 Old_age Always > - 0 > 227 Unknown_SSD_Attribute 0x0032 100 100 000 Old_age Always > - 42 > 228 Power-off_Retract_Count 0x0032 100 100 000 Old_age Always > - 5905 > 232 Available_Reservd_Space 0x0033 100 100 010 Pre-fail Always > - 0 > 233 Media_Wearout_Indicator 0x0032 100 100 000 Old_age Always > - 0 > 234 Unknown_Attribute 0x0032 100 100 000 Old_age Always > - 0 > 241 Total_LBAs_Written 0x0032 100 100 000 Old_age Always > - 14195 > 242 Total_LBAs_Read 0x0032 100 100 000 Old_age Always > - 10422 > 243 Unknown_Attribute 0x0032 100 100 000 Old_age Always > - 41906 > > > Media wear out : 0% used > LBAs written: 14195 > Power on hours: <100 > Power cycle count: once at the factory, once at our offices to check if > there was newer firmware (there wasn't) and once when we restarted the node > to see if it could then access a failed drive. > > > Regards > David Herselman > > > -----Original Message----- > From: Christian Balzer [mailto:ch...@gol.com] > Sent: Thursday, 21 December 2017 3:24 AM > To: ceph-users@lists.ceph.com > Cc: David Herselman <d...@syrex.co> > Subject: Re: [ceph-users] Many concurrent drive failures - How do I > activate pgs? > > Hello, > > first off, I don't have anything to add to your conclusions of the current > status, alas there are at least 2 folks here on the ML making a living from > Ceph disaster recovery, so I hope you have been contacted already. > > Now once your data is safe or you have a moment, I and others here would > probably be quite interested in some more details, see inline below. > > On Wed, 20 Dec 2017 22:25:23 +0000 David Herselman wrote: > > [snip] > > > > We've happily been running a 6 node cluster with 4 x FileStore HDDs per > node (journals on SSD partitions) for over a year and recently upgraded all > nodes to Debian 9, Ceph Luminous 12.2.2 and kernel 4.13.8. We ordered 12 x > Intel DC S4600 SSDs which arrived last week so we added two per node on > Thursday evening and brought them up as BlueStore OSDs. We had proactively > updated our existing pools to reference only devices classed as 'hdd', so > that we could move select images over to ssd replicated and erasure coded > pools. 
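For reference, tying pools to a device class in Luminous works roughly as follows; the rule, pool and profile names here are illustrative rather than the exact ones used in this cluster, and k=4/m=2 is only an assumption:

  # replicated rule that only selects OSDs carrying the 'hdd' device class
  ceph osd crush rule create-replicated replicated_hdd default host hdd
  ceph osd pool set rbd crush_rule replicated_hdd

  # for an erasure coded pool the device class is part of the profile used at pool creation
  ceph osd erasure-code-profile set ec_ssd_profile k=4 m=2 crush-device-class=ssd

Once a pool references a class-restricted rule, backfill moves its placement groups onto OSDs of that class only, which is also what makes evacuating a pool by pointing its rule back at the hdd class possible.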
> > > Could you tell us more about that cluster, as in HW, how are the SSDs > connected and FW version of the controller if applicable. > > Kernel 4.13.8 suggests that this is a handrolled, upstream kernel. > While not necessarily related I'll note that as far as Debian kernels > (which are very lightly if at all patched) are concerned, nothing beyond > 4.9 has been working to my satisfaction. > 4.11 still worked, but 4.12 crash-reboot-looped on all my Supermicro X10 > machines (quite a varied selection). > The current 4.13.13 backport boots on some of those machines, but still > throws errors with the EDAC devices, which works fine with 4.9. > > 4.14 is known to happily destroy data if used with bcache and even if one > doesn't use that it should give you pause. > > > We were pretty diligent and downloaded Intel's Firmware Update Tool and > validated that each new drive had the latest available firmware before > installing them in the nodes. We did numerous benchmarks on Friday and > eventually moved some images over to the new storage pools. Everything was > working perfectly and extensive tests on Sunday showed excellent > performance. Sunday night one of the new SSDs died and Ceph replicated and > redistributed data accordingly, then another failed in the early hours of > Monday morning and Ceph did what it needed to. > > > > We had the two failed drives replaced by 11am and Ceph was up to > 2/4918587 objects degraded (0.000%) when a third drive failed. At this > point we updated the crush maps for the rbd_ssd and ec_ssd pools and set > the device class to 'hdd', to essentially evacuate everything off the SSDs. > Other SSDs then failed at 3:22pm, 4:19pm, 5:49pm and 5:50pm. We've > ultimately lost half the Intel S4600 drives, which are all completely > inaccessible. Our status at 11:42pm Monday night was: 1/1398478 objects > unfound (0.000%) and 339/4633062 objects degraded (0.007%). > > > The relevant logs when and how those SSDs failed would be interesting. > Was the distribution of the failed SSDs random among the cluster? > Are you running smartd and did it have something to say? > > Completely inaccessible sounds a lot like the infamous "self-bricking" of > Intel SSDs when they discover something isn't right, or they don't like the > color scheme of the server inside (^.^). > > I'm using quite a lot of Intel SSDs and had only one "fatal" incident. > A DC S3700 detected that its powercap had failed, but of course kept > working fine. Until a reboot was need, when it promptly bricked itself, > data inaccessible, SMART reporting barely that something was there. > > So one wonders what caused your SSDs to get their knickers in such a twist. > Are the survivors showing any unusual signs in their SMART output? > > Of course what your vendor/Intel will have to say will also be of > interest. 
^o^ > > Regards, > > Christian > -- > Christian Balzer Network/Systems Engineer > ch...@gol.com Rakuten Communications
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com