Hi Sean and David,

Do you have any follow-ups / news on the Intel DC S4600 case? We are
looking into these drives to use as DB/WAL devices for a new cluster we
are about to build.

Did Intel provide anything (like new firmware) which should fix the issues
you were having or are these drives still unreliable?

At the moment we are also looking into the Intel DC S3610 as an alternative,
which is a step back in performance but should be very reliable.

Do you have any other recommendations for a ~200GB 2.5" SATA SSD to use as
DB/WAL? (We are aiming for ~3 DWPD, which should be sufficient for DB/WAL?)
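
As a rough back-of-the-envelope check (assuming ~200GB of capacity, 3 DWPD
and a 5-year warranty period; figures purely illustrative):

  # daily write budget and total rated endurance
  echo "$(( 200 * 3 )) GB/day"                       # ~600 GB/day
  echo "$(( 200 * 3 * 365 * 5 / 1000 )) TB over 5y"  # ~1095 TB (~1.1 PBW)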

Kind regards,
Caspar

2018-01-12 15:45 GMT+01:00 Sean Redmond <sean.redmo...@gmail.com>:

> Hi David,
>
> To follow up on this, I had a 4th drive fail (out of 12) and have opted to
> order the below disks as a replacement. I have an ongoing case with Intel
> via the supplier and will report back anything useful, but I am going to
> avoid the Intel S4600 2TB SSDs for the moment.
>
> 1.92TB Samsung SM863a 2.5" Enterprise SSD, SATA3 6Gb/s, 2-bit MLC V-NAND
>
> Regards
> Sean Redmond
>
> On Wed, Jan 10, 2018 at 11:08 PM, Sean Redmond <sean.redmo...@gmail.com>
> wrote:
>
>> Hi David,
>>
>> Thanks for your email. They are connected inside a Dell R730XD (2.5 inch,
>> 24-disk model) in non-RAID mode via a PERC RAID card.
>>
>> The version of Ceph is Jewel, with kernel 4.13.x on Ubuntu 16.04.
>>
>> Thanks for your feedback on the HGST disks.
>>
>> Thanks
>>
>> On Wed, Jan 10, 2018 at 10:55 PM, David Herselman <d...@syrex.co> wrote:
>>
>>> Hi Sean,
>>>
>>>
>>>
>>> No, Intel’s feedback has been… Pathetic… I have yet to receive anything
>>> more than a request to ‘sign’ a non-disclosure agreement to obtain beta
>>> firmware. There has been no official answer as to whether one can logically
>>> unlock the drives, no answer to my question of whether Intel publish serial
>>> numbers anywhere pertaining to recalled batches, and no information on
>>> whether firmware updates would address any known issues.
>>>
>>>
>>>
>>> This with us being an accredited Intel Gold partner…
>>>
>>>
>>>
>>>
>>>
>>> We’ve returned the lot and ended up with 9/12 of the drives failing in
>>> the same manner. The replaced drives, which had different serial number
>>> ranges, also failed. Very frustrating is that the drives fail in a way that
>>> results in unbootable servers, unless one adds ‘rootdelay=240’ to the
>>> kernel command line.
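>>>
>>> For anyone hitting the same unbootable-server symptom, a minimal sketch of
>>> adding that parameter (assuming GRUB on a Debian-based system; adjust for
>>> your boot loader and keep your existing options):
>>>
>>>   # /etc/default/grub -- append rootdelay=240 to the existing options, eg:
>>>   GRUB_CMDLINE_LINUX_DEFAULT="quiet rootdelay=240"
>>>
>>>   # regenerate the GRUB configuration and reboot:
>>>   update-grub;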
>>>
>>>
>>>
>>>
>>>
>>> I would be interested to know what platform your drives were in and
>>> whether or not they were connected to a RAID module/card.
>>>
>>>
>>>
>>> PS: After much searching we’ve decided to order the NVMe conversion kit
>>> and have ordered HGST UltraStar SN200 2.5 inch SFF drives with a 3 DWPD
>>> rating.
>>>
>>>
>>>
>>>
>>>
>>> Regards
>>>
>>> David Herselman
>>>
>>>
>>>
>>> From: Sean Redmond [mailto:sean.redmo...@gmail.com]
>>> Sent: Thursday, 11 January 2018 12:45 AM
>>> To: David Herselman <d...@syrex.co>
>>> Cc: Christian Balzer <ch...@gol.com>; ceph-users@lists.ceph.com
>>> Subject: Re: [ceph-users] Many concurrent drive failures - How do I
>>> activate pgs?
>>>
>>>
>>>
>>> Hi,
>>>
>>>
>>>
>>> I have a case where 3 out of 12 of these Intel S4600 2TB models failed
>>> within a matter of days after being burn-in tested and then placed into
>>> production.
>>>
>>>
>>>
>>> I am interested to know, did you ever get any further feedback from the
>>> vendor on your issue?
>>>
>>>
>>>
>>> Thanks
>>>
>>>
>>>
>>> On Thu, Dec 21, 2017 at 1:38 PM, David Herselman <d...@syrex.co> wrote:
>>>
>>> Hi,
>>>
>>> I assume this can only be a physical manufacturing flaw or a firmware
>>> bug? Do Intel publish advisories on recalled equipment? Should others be
>>> concerned about using Intel DC S4600 SSD drives? Could this be an
>>> electrical issue on the Hot Swap Backplane or BMC firmware issue? Either
>>> way, all pure Intel...
>>>
>>> The hole is only 1.3 GB (4 MB x 339 objects) but is perfectly striped
>>> through the images, so the file systems are subsequently severely damaged.
>>>
>>> Is it possible to get Ceph to read in partial data shards? It would
>>> provide between 25% and 75% more yield...
>>>
>>>
>>> Is there anything wrong with how we've proceeded thus far? It would be
>>> nice to reference examples of using ceph-objectstore-tool, but the
>>> documentation is virtually non-existent.
>>>
>>> We used another SSD drive to simulate bringing all the SSDs back online.
>>> We carved up the drive to provide equal partitions to essentially simulate
>>> the original SSDs:
>>>   # Partition a drive to provide 12 x 150GB partitions, eg:
>>>     sdd       8:48   0   1.8T  0 disk
>>>     |-sdd1    8:49   0   140G  0 part
>>>     |-sdd2    8:50   0   140G  0 part
>>>     |-sdd3    8:51   0   140G  0 part
>>>     |-sdd4    8:52   0   140G  0 part
>>>     |-sdd5    8:53   0   140G  0 part
>>>     |-sdd6    8:54   0   140G  0 part
>>>     |-sdd7    8:55   0   140G  0 part
>>>     |-sdd8    8:56   0   140G  0 part
>>>     |-sdd9    8:57   0   140G  0 part
>>>     |-sdd10   8:58   0   140G  0 part
>>>     |-sdd11   8:59   0   140G  0 part
>>>     +-sdd12   8:60   0   140G  0 part
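>>>
>>> A sketch of how such a layout could be created (the device name /dev/sdd
>>> and the ~150GB size are taken from the listing above; sgdisk comes from
>>> the gdisk package):
>>>   # Create 12 equal ~150GB GPT partitions on /dev/sdd, eg:
>>>     for PART in `seq 1 12`; do
>>>       sgdisk --new=$PART:0:+150G --typecode=$PART:8300 /dev/sdd;
>>>     done
>>>     partprobe /dev/sdd;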
>>>
>>>
>>>   Pre-requisites:
>>>     ceph osd set noout;
>>>     apt-get install uuid-runtime;
>>>
>>>
>>>   for ID in `seq 24 35`; do
>>>     UUID=`uuidgen`;
>>>     OSD_SECRET=`ceph-authtool --gen-print-key`;
>>>     DEVICE='/dev/sdd'$[$ID-23]; # 24-23 = /dev/sdd1, 35-23 = /dev/sdd12
>>>     echo "{\"cephx_secret\": \"$OSD_SECRET\"}" | ceph osd new $UUID $ID
>>> -i - -n client.bootstrap-osd -k /var/lib/ceph/bootstrap-osd/ceph.keyring;
>>>     mkdir /var/lib/ceph/osd/ceph-$ID;
>>>     mkfs.xfs $DEVICE;
>>>     mount $DEVICE /var/lib/ceph/osd/ceph-$ID;
>>>     ceph-authtool --create-keyring /var/lib/ceph/osd/ceph-$ID/keyring
>>> --name osd.$ID --add-key $OSD_SECRET;
>>>     ceph-osd -i $ID --mkfs --osd-uuid $UUID;
>>>     chown -R ceph:ceph /var/lib/ceph/osd/ceph-$ID;
>>>     systemctl enable ceph-osd@$ID;
>>>     systemctl start ceph-osd@$ID;
>>>   done
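>>>
>>>   A quick sanity check after the loop that the placeholder OSDs registered
>>>   and came up (read-only commands):
>>>     ceph osd tree | grep -E 'osd\.(2[4-9]|3[0-5])';
>>>     ceph -s;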
>>>
>>>
>>> Once up, we imported previous exports of empty head files into 'real'
>>> OSDs:
>>>   kvm5b:
>>>     systemctl stop ceph-osd@8;
>>>     ceph-objectstore-tool --op import --pgid 7.4s0 --data-path
>>> /var/lib/ceph/osd/ceph-8 --journal-path /var/lib/ceph/osd/ceph-8/journal
>>> --file /var/lib/vz/template/ssd_recovery/osd8_7.4s0.export;
>>>     chown ceph:ceph -R /var/lib/ceph/osd/ceph-8;
>>>     systemctl start ceph-osd@8;
>>>   kvm5f:
>>>     systemctl stop ceph-osd@23;
>>>     ceph-objectstore-tool --op import --pgid 7.fs0 --data-path
>>> /var/lib/ceph/osd/ceph-23 --journal-path /var/lib/ceph/osd/ceph-23/journal
>>> --file /var/lib/vz/template/ssd_recovery/osd23_7.fs0.export;
>>>     chown ceph:ceph -R /var/lib/ceph/osd/ceph-23;
>>>     systemctl start ceph-osd@23;
>>>
>>>
>>> Bulk import previously exported objects:
>>>     cd /var/lib/vz/template/ssd_recovery;
>>>     for FILE in `ls -1A osd*_*.export | grep -Pv '^osd(8|23)_'`; do
>>>       OSD=`echo $FILE | perl -pe 's/^osd(\d+).*/\1/'`;
>>>       PGID=`echo $FILE | perl -pe 's/^osd\d+_(.*?).export/\1/g'`;
>>>       echo -e "systemctl stop ceph-osd@$OSD\t ceph-objectstore-tool
>>> --op import --pgid $PGID --data-path /var/lib/ceph/osd/ceph-$OSD
>>> --journal-path /var/lib/ceph/osd/ceph-$OSD/journal --file
>>> /var/lib/vz/template/ssd_recovery/osd"$OSD"_$PGID.export";
>>>     done | sort
>>>
>>> Sample output (this will wrap):
>>> systemctl stop ceph-osd@27       ceph-objectstore-tool --op import
>>> --pgid 7.4s3 --data-path /var/lib/ceph/osd/ceph-27 --journal-path
>>> /var/lib/ceph/osd/ceph-27/journal
>>> --file /var/lib/vz/template/ssd_recovery/osd27_7.4s3.export
>>> systemctl stop ceph-osd@27       ceph-objectstore-tool --op import
>>> --pgid 7.fs5 --data-path /var/lib/ceph/osd/ceph-27 --journal-path
>>> /var/lib/ceph/osd/ceph-27/journal
>>> --file /var/lib/vz/template/ssd_recovery/osd27_7.fs5.export
>>> systemctl stop ceph-osd@30       ceph-objectstore-tool --op import
>>> --pgid 7.fs4 --data-path /var/lib/ceph/osd/ceph-30 --journal-path
>>> /var/lib/ceph/osd/ceph-30/journal
>>> --file /var/lib/vz/template/ssd_recovery/osd30_7.fs4.export
>>> systemctl stop ceph-osd@31       ceph-objectstore-tool --op import
>>> --pgid 7.4s2 --data-path /var/lib/ceph/osd/ceph-31 --journal-path
>>> /var/lib/ceph/osd/ceph-31/journal
>>> --file /var/lib/vz/template/ssd_recovery/osd31_7.4s2.export
>>> systemctl stop ceph-osd@32       ceph-objectstore-tool --op import
>>> --pgid 7.4s4 --data-path /var/lib/ceph/osd/ceph-32 --journal-path
>>> /var/lib/ceph/osd/ceph-32/journal
>>> --file /var/lib/vz/template/ssd_recovery/osd32_7.4s4.export
>>> systemctl stop ceph-osd@32       ceph-objectstore-tool --op import
>>> --pgid 7.fs2 --data-path /var/lib/ceph/osd/ceph-32 --journal-path
>>> /var/lib/ceph/osd/ceph-32/journal
>>> --file /var/lib/vz/template/ssd_recovery/osd32_7.fs2.export
>>> systemctl stop ceph-osd@34       ceph-objectstore-tool --op import
>>> --pgid 7.4s5 --data-path /var/lib/ceph/osd/ceph-34 --journal-path
>>> /var/lib/ceph/osd/ceph-34/journal
>>> --file /var/lib/vz/template/ssd_recovery/osd34_7.4s5.export
>>> systemctl stop ceph-osd@34       ceph-objectstore-tool --op import
>>> --pgid 7.fs1 --data-path /var/lib/ceph/osd/ceph-34 --journal-path
>>> /var/lib/ceph/osd/ceph-34/journal
>>> --file /var/lib/vz/template/ssd_recovery/osd34_7.fs1.export
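>>>
>>> (The loop above only prints the commands; a sketch of running the same
>>> imports non-interactively, stopping each OSD first and leaving the
>>> restart for the permissions/start step below:)
>>>     cd /var/lib/vz/template/ssd_recovery;
>>>     for FILE in `ls -1A osd*_*.export | grep -Pv '^osd(8|23)_'`; do
>>>       OSD=`echo $FILE | perl -pe 's/^osd(\d+).*/\1/'`;
>>>       PGID=`echo $FILE | perl -pe 's/^osd\d+_(.*?).export/\1/g'`;
>>>       systemctl stop ceph-osd@$OSD;
>>>       ceph-objectstore-tool --op import --pgid $PGID \
>>>         --data-path /var/lib/ceph/osd/ceph-$OSD \
>>>         --journal-path /var/lib/ceph/osd/ceph-$OSD/journal \
>>>         --file /var/lib/vz/template/ssd_recovery/osd"$OSD"_$PGID.export;
>>>     done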
>>>
>>>
>>> Reset permissions and then started the OSDs:
>>> for OSD in 27 30 31 32 34; do
>>>   chown -R ceph:ceph /var/lib/ceph/osd/ceph-$OSD;
>>>   systemctl start ceph-osd@$OSD;
>>> done
>>>
>>>
>>> Then finally started all the OSDs... Now to hope that Intel have a way
>>> of accessing drives that are in a 'disable logical state'.
>>>
>>>
>>>
>>> The imports succeed; herewith a link to the output after running an
>>> import for placement group 7.4s2 on OSD 31:
>>>   https://drive.google.com/open?id=1-Jo1jmrWrGLO2OgflacGPlEf2p32Y4hn
>>>
>>> Sample snippet:
>>>     Write 1#7:fffcd2ec:::rbd_data.4.be8e9974b0dc51.0000000000002869:head#
>>>     snapset 0=[]:{}
>>>     Write 1#7:fffd4823:::rbd_data.4.ba24ef2ae8944a.000000000000a2b0:head#
>>>     snapset 0=[]:{}
>>>     Write 1#7:fffd6fb6:::benchmark_data_kvm5b_20945_object14722:head#
>>>     snapset 0=[]:{}
>>>     Write 1#7:ffffa069:::rbd_data.4.ba24ef2ae8944a.000000000000aea9:head#
>>>     snapset 0=[]:{}
>>>     Import successful
>>>
>>>
>>> Data does get written; I can tell by the size of the FileStore mount
>>> points:
>>>   [root@kvm5b ssd_recovery]# df -h | grep -P 'ceph-(27|30|31|32|34)$'
>>>   /dev/sdd4       140G  5.2G  135G   4% /var/lib/ceph/osd/ceph-27
>>>   /dev/sdd7       140G   14G  127G  10% /var/lib/ceph/osd/ceph-30
>>>   /dev/sdd8       140G   14G  127G  10% /var/lib/ceph/osd/ceph-31
>>>   /dev/sdd9       140G   22G  119G  16% /var/lib/ceph/osd/ceph-32
>>>   /dev/sdd11      140G   22G  119G  16% /var/lib/ceph/osd/ceph-34
>>>
>>>
>>> How do I tell Ceph to read these object shards?
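>>>
>>> (For reference, some read-only checks to see whether the imported shards
>>> are being considered during peering; pg IDs as used above:)
>>>   ceph health detail;
>>>   ceph pg 7.4 query > /tmp/pg7.4.json;   # inspect 'recovery_state' / 'peer_info'
>>>   ceph pg 7.f query > /tmp/pg7.f.json;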
>>>
>>>
>>>
>>> PS: It's probably a good idea to reweight the OSDs to 0 before starting
>>> again. This should prevent data flowing onto them, if they are not in a
>>> different device class or other crush selection ruleset, i.e.:
>>>   for OSD in `seq 24 35`; do
>>>     ceph osd crush reweight osd.$OSD 0;
>>>   done
>>>
>>>
>>> Regards
>>> David Herselman
>>>
>>> -----Original Message-----
>>>
>>> From: David Herselman
>>> Sent: Thursday, 21 December 2017 3:49 AM
>>> To: 'Christian Balzer' <ch...@gol.com>; ceph-users@lists.ceph.com
>>> Subject: RE: [ceph-users] Many concurrent drive failures - How do I
>>> activate pgs?
>>>
>>> Hi Christian,
>>>
>>> Thanks for taking the time. I haven't been contacted by anyone yet, but I
>>> managed to get the down placement groups cleared by exporting 7.4s0 and
>>> 7.fs0 and then marking them as complete on the surviving OSDs:
>>>     kvm5c:
>>>       ceph-objectstore-tool --op export --pgid 7.4s0 --data-path
>>> /var/lib/ceph/osd/ceph-8 --journal-path /var/lib/ceph/osd/ceph-8/journal
>>> --file /var/lib/vz/template/ssd_recovery/osd8_7.4s0.export;
>>>       ceph-objectstore-tool --op mark-complete --data-path
>>> /var/lib/ceph/osd/ceph-8 --journal-path /var/lib/ceph/osd/ceph-8/journal
>>> --pgid 7.4s0;
>>>     kvm5f:
>>>       ceph-objectstore-tool --op export --pgid 7.fs0 --data-path
>>> /var/lib/ceph/osd/ceph-23 --journal-path /var/lib/ceph/osd/ceph-23/journal
>>> --file /var/lib/vz/template/ssd_recovery/osd23_7.fs0.export;
>>>       ceph-objectstore-tool --op mark-complete --data-path
>>> /var/lib/ceph/osd/ceph-23 --journal-path /var/lib/ceph/osd/ceph-23/journal
>>> --pgid 7.fs0;
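>>>
>>>     A quick read-only check afterwards that the down PGs cleared:
>>>       ceph health detail | grep -iE 'down|incomplete';
>>>       ceph pg dump_stuck inactive;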
>>>
>>> This would presumably simply punch holes in the RBD images but at least
>>> we can copy them out of that pool and hope that Intel can somehow unlock
>>> the drives for us to then export/import objects.
>>>
>>>
>>> To answer your questions though, we have 6 near-identical Intel Wildcat
>>> Pass 1U servers with Proxmox loaded on them. Proxmox uses a Debian 9
>>> base with the Ubuntu kernel, to which they apply cherry-picked kernel
>>> patches (e.g. Intel NIC driver updates, vhost perf regression and mem-leak
>>> fixes, etc):
>>>
>>> kvm5a:
>>>        Intel R1208WTTGSR System (serial: BQWS55091014)
>>>        Intel S2600WTTR Motherboard (serial: BQWL54950385, BIOS ID:
>>> SE5C610.86B.01.01.0021.032120170601)
>>>        2 x Intel Xeon E5-2640v4 2.4GHz (HT disabled)
>>>        24 x Micron 8GB DDR4 2133MHz (24 x 18ASF1G72PZ-2G1B1)
>>>        Intel AXX10GBNIA I/O Module
>>> kvm5b:
>>>        Intel R1208WTTGS System (serial: BQWS53890178)
>>>        Intel S2600WTT Motherboard (serial: BQWL52550359, BIOS ID:
>>> SE5C610.86B.01.01.0021.032120170601)
>>>        2 x Intel Xeon E5-2640v4 2.4GHz (HT enabled)
>>>        4 x Micron 64GB DDR4 2400MHz LR-DIMM (4 x 72ASS8G72LZ-2G3B2)
>>>        Intel AXX10GBNIA I/O Module
>>> kvm5c:
>>>        Intel R1208WT2GS System (serial: BQWS50490279)
>>>        Intel S2600WT2 Motherboard (serial: BQWL44650203, BIOS ID:
>>> SE5C610.86B.01.01.0021.032120170601)
>>>        2 x Intel Xeon E5-2640v3 2.6GHz (HT enabled)
>>>        4 x Micron 64GB DDR4 2400MHz LR-DIMM (4 x 72ASS8G72LZ-2G3B2)
>>>        Intel AXX10GBNIA I/O Module
>>> kvm5d:
>>>        Intel R1208WTTGSR System (serial: BQWS62291318)
>>>        Intel S2600WTTR Motherboard (serial: BQWL61855187, BIOS ID:
>>> SE5C610.86B.01.01.0021.032120170601)
>>>        2 x Intel Xeon E5-2640v4 2.4GHz (HT enabled)
>>>        4 x Micron 64GB DDR4 2400MHz LR-DIMM (4 x 72ASS8G72LZ-2G3B2)
>>>        Intel AXX10GBNIA I/O Module
>>> kvm5e:
>>>        Intel R1208WTTGSR System (serial: BQWS64290162)
>>>        Intel S2600WTTR Motherboard (serial: BQWL63953066, BIOS ID:
>>> SE5C610.86B.01.01.0021.032120170601)
>>>        2 x Intel Xeon E5-2640v4 2.4GHz (HT enabled)
>>>        4 x Micron 64GB DDR4 2400MHz LR-DIMM (4 x 72ASS8G72LZ-2G3B2)
>>>        Intel AXX10GBNIA I/O Module
>>> kvm5f:
>>>        Intel R1208WTTGSR System (serial: BQWS71790632)
>>>        Intel S2600WTTR Motherboard (serial: BQWL71050622, BIOS ID:
>>> SE5C610.86B.01.01.0021.032120170601)
>>>        2 x Intel Xeon E5-2640v4 2.4GHz (HT enabled)
>>>        4 x Micron 64GB DDR4 2400MHz LR-DIMM (4 x 72ASS8G72LZ-2G3B2)
>>>        Intel AXX10GBNIA I/O Module
>>>
>>> Summary:
>>>   * 5b has an Intel S2600WTT, 5c has an Intel S2600WT2, all others have
>>> S2600WTTR Motherboards
>>>   * 5a has ECC Registered Dual Rank DDR DIMMs, all others have ECC
>>> LoadReduced-DIMMs
>>>   * 5c has an Intel X540-AT2 10 GbE adapter as the on-board NICs are
>>> only 1 GbE
>>>
>>>
>>> Each system has identical discs:
>>>   * 2 x 480 GB Intel SSD DC S3610 (SSDSC2BX480G4) - partitioned as
>>> software RAID1 OS volume and Ceph FileStore journals (spinners)
>>>   * 4 x 2 TB Seagate discs (ST2000NX0243) - Ceph FileStore OSDs
>>> (journals in S3610 partitions)
>>>   * 2 x 1.9 TB Intel SSD DC S4600 (SSDSC2KG019T7) - Ceph BlueStore OSDs
>>> (problematic)
>>>
>>>
>>> Additional information:
>>>   * All drives are directly attached to the on-board AHCI SATA
>>> controllers, via the standard 2.5 inch drive chassis hot-swap bays.
>>>   * We added 12 x 1.9 TB SSD DC S4600 drives last week Thursday, 2 in
>>> each system's slots 7 & 8
>>>   * Systems have been operating with existing Intel SSD DC S3610 and 2
>>> TB Seagate discs for over a year; we added the most recent node (kvm5f) on
>>> the 23rd of November.
>>>   * 6 of the 12 Intel SSD DC S4600 drives failed in less than 100 hours.
>>>   * They work perfectly until they suddenly stop responding; thereafter
>>> they are completely inaccessible, even after physically shutting down the
>>> server and powering it back up again. The Intel diagnostic tool reports
>>> 'logically locked'.
>>>
>>>
>>> Drive failures appear random to me:
>>>     kvm5a - bay 7 - offline - Model=INTEL SSDSC2KG019T7, FwRev=SCV10100,
>>> SerialNo=BTYM739208851P9DGN
>>>     kvm5a - bay 8 - online  - Model=INTEL SSDSC2KG019T7, FwRev=SCV10100,
>>> SerialNo=PHYM727602TM1P9DGN
>>>     kvm5b - bay 7 - offline - Model=INTEL SSDSC2KG019T7, FwRev=SCV10100,
>>> SerialNo=PHYM7276031E1P9DGN
>>>     kvm5b - bay 8 - online  - Model=INTEL SSDSC2KG019T7, FwRev=SCV10100,
>>> SerialNo=BTYM7392087W1P9DGN
>>>     kvm5c - bay 7 - offline - Model=INTEL SSDSC2KG019T7, FwRev=SCV10100,
>>> SerialNo=BTYM739200ZJ1P9DGN
>>>     kvm5c - bay 8 - offline - Model=INTEL SSDSC2KG019T7, FwRev=SCV10100,
>>> SerialNo=BTYM7392088B1P9DGN
>>>     kvm5d - bay 7 - offline - Model=INTEL SSDSC2KG019T7, FwRev=SCV10100,
>>> SerialNo=BTYM738604Y11P9DGN
>>>     kvm5d - bay 8 - online  - Model=INTEL SSDSC2KG019T7, FwRev=SCV10100,
>>> SerialNo=PHYM727603181P9DGN
>>>     kvm5e - bay 7 - online  - Model=INTEL SSDSC2KG019T7, FwRev=SCV10100,
>>> SerialNo=BTYM7392013B1P9DGN
>>>     kvm5e - bay 8 - offline - Model=INTEL SSDSC2KG019T7, FwRev=SCV10100,
>>> SerialNo=BTYM7392087E1P9DGN
>>>     kvm5f - bay 7 - online  - Model=INTEL SSDSC2KG019T7, FwRev=SCV10100,
>>> SerialNo=BTYM739208721P9DGN
>>>     kvm5f - bay 8 - online  - Model=INTEL SSDSC2KG019T7, FwRev=SCV10100,
>>> SerialNo=BTYM739208C41P9DGN
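>>>
>>> (For reference, per-drive identification in the Model=/FwRev=/SerialNo=
>>> form above can be gathered with something like the following; the device
>>> list is illustrative and failed drives will simply return nothing:)
>>>     for DEV in /dev/sd?; do echo "== $DEV"; hdparm -i $DEV 2>/dev/null | grep Model; done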
>>>
>>>
>>> Intel SSD Data Center Tool reports:
>>> C:\isdct>isdct.exe show -intelssd
>>>
>>> - Intel SSD DC S4600 Series PHYM7276031E1P9DGN -
>>>
>>> Bootloader : Property not found
>>> DevicePath : \\\\.\\PHYSICALDRIVE1
>>> DeviceStatus : Selected drive is in a disable logical state.
>>> Firmware : SCV10100
>>> FirmwareUpdateAvailable : Please contact Intel Customer Support for
>>> further assistance at the following website:
>>> http://www.intel.com/go/ssdsupport.
>>> Index : 0
>>> ModelNumber : INTEL SSDSC2KG019T7
>>> ProductFamily : Intel SSD DC S4600 Series
>>> SerialNumber : PHYM7276031E1P9DGN
>>>
>>>
>>> C:\isdct>isdct show -a -intelssd 0
>>>
>>> - Intel SSD DC S4600 Series PHYM7276031E1P9DGN -
>>>
>>> AccessibleMaxAddressSupported : True
>>> AggregationThreshold : Selected drive is in a disable logical state.
>>> AggregationTime : Selected drive is in a disable logical state.
>>> ArbitrationBurst : Selected drive is in a disable logical state.
>>> BusType : 11
>>> CoalescingDisable : Selected drive is in a disable logical state.
>>> ControllerCompatibleIDs : PCI\\VEN_8086&DEV_8C02&REV_05PCI\\VEN_8086&DEV_8C02PCI\\VEN_8086&CC_010601PCI\\VEN_8086&CC_0106PCI\\VEN_8086PCI\\CC_010601PCI\\CC_0106
>>> ControllerDescription : @mshdc.inf,%pci\\cc_010601.devicedesc%;Standard SATA AHCI Controller
>>> ControllerID : PCI\\VEN_8086&DEV_8C02&SUBSYS_78461462&REV_05\\3&11583659&0&FA
>>> ControllerIDEMode : False
>>> ControllerManufacturer : @mshdc.inf,%ms-ahci%;Standard SATA AHCI Controller
>>> ControllerService : storahci
>>> DIPMEnabled : False
>>> DIPMSupported : False
>>> DevicePath : \\\\.\\PHYSICALDRIVE1
>>> DeviceStatus : Selected drive is in a disable logical state.
>>> DigitalFenceSupported : False
>>> DownloadMicrocodePossible : True
>>> DriverDescription : Standard SATA AHCI Controller
>>> DriverMajorVersion : 10
>>> DriverManufacturer : Standard SATA AHCI Controller
>>> DriverMinorVersion : 0
>>> DriverVersion : 10.0.16299.98
>>> DynamicMMIOEnabled : The selected drive does not support this feature.
>>> EnduranceAnalyzer : Selected drive is in a disable logical state.
>>> ErrorString : *BAD_CONTEXT_2020 F4
>>> Firmware : SCV10100
>>> FirmwareUpdateAvailable : Please contact Intel Customer Support for
>>> further assistance at the following website:
>>> http://www.intel.com/go/ssdsupport.
>>> HDD : False
>>> HighPriorityWeightArbitration : Selected drive is in a disable logical
>>> state.
>>> IEEE1667Supported : False
>>> IOCompletionQueuesRequested : Selected drive is in a disable logical
>>> state.
>>> IOSubmissionQueuesRequested : Selected drive is in a disable logical
>>> state.
>>> Index : 0
>>> Intel : True
>>> IntelGen3SATA : True
>>> IntelNVMe : False
>>> InterruptVector : Selected drive is in a disable logical state.
>>> IsDualPort : False
>>> LatencyTrackingEnabled : Selected drive is in a disable logical state.
>>> LowPriorityWeightArbitration : Selected drive is in a disable logical
>>> state.
>>> Lun : 0
>>> MaximumLBA : 3750748847
>>> MediumPriorityWeightArbitration : Selected drive is in a disable
>>> logical state.
>>> ModelNumber : INTEL SSDSC2KG019T7
>>> NVMePowerState : Selected drive is in a disable logical state.
>>> NativeMaxLBA : Selected drive is in a disable logical state.
>>> OEM : Generic
>>> OpalState : Selected drive is in a disable logical state.
>>> PLITestTimeInterval : Selected drive is in a disable logical state.
>>> PNPString : SCSI\\DISK&VEN_INTEL&PROD_SSDSC2KG019T7\\4&2BE6C224&0&010000
>>> PathID : 1
>>> PhySpeed : Selected drive is in a disable logical state.
>>> PhysicalSectorSize : Selected drive is in a disable logical state.
>>> PhysicalSize : 1920383410176
>>> PowerGovernorAveragePower : Selected drive is in a disable logical state.
>>> PowerGovernorBurstPower : Selected drive is in a disable logical state.
>>> PowerGovernorMode : Selected drive is in a disable logical state.
>>> Product : Youngsville
>>> ProductFamily : Intel SSD DC S4600 Series
>>> ProductProtocol : ATA
>>> ReadErrorRecoveryTimer : Selected drive is in a disable logical state.
>>> RemoteSecureEraseSupported : False
>>> SCSIPortNumber : 0
>>> SMARTEnabled : True
>>> SMARTHealthCriticalWarningsConfiguration : Selected drive is in a
>>> disable logical state.
>>> SMARTSelfTestSupported : True
>>> SMBusAddress : Selected drive is in a disable logical state.
>>> SSCEnabled : False
>>> SanitizeBlockEraseSupported : False
>>> SanitizeCryptoScrambleSupported : True
>>> SanitizeSupported : True
>>> SataGen1 : True
>>> SataGen2 : True
>>> SataGen3 : True
>>> SataNegotiatedSpeed : Unknown
>>> SectorSize : 512
>>> SecurityEnabled : False
>>> SecurityFrozen : False
>>> SecurityLocked : False
>>> SecuritySupported : False
>>> SerialNumber : PHYM7276031E1P9DGN
>>> TCGSupported : False
>>> TargetID : 0
>>> TempThreshold : Selected drive is in a disable logical state.
>>> TemperatureLoggingInterval : Selected drive is in a disable logical
>>> state.
>>> TimeLimitedErrorRecovery : Selected drive is in a disable logical state.
>>> TrimSize : 4
>>> TrimSupported : True
>>> VolatileWriteCacheEnabled : Selected drive is in a disable logical state.
>>> WWID : 3959312879584368077
>>> WriteAtomicityDisableNormal : Selected drive is in a disable logical
>>> state.
>>> WriteCacheEnabled : True
>>> WriteCacheReorderingStateEnabled : Selected drive is in a disable
>>> logical state.
>>> WriteCacheState : Selected drive is in a disable logical state.
>>> WriteCacheSupported : True
>>> WriteErrorRecoveryTimer : Selected drive is in a disable logical state.
>>>
>>>
>>>
>>> SMART information is inaccessible and the overall status is failed.
>>> Herewith the stats from a partner disc which was still working when the
>>> others failed:
>>> Device Model:     INTEL SSDSC2KG019T7
>>> Serial Number:    PHYM727602TM1P9DGN
>>> LU WWN Device Id: 5 5cd2e4 14e1636bb
>>> Firmware Version: SCV10100
>>> User Capacity:    1,920,383,410,176 bytes [1.92 TB]
>>> Sector Sizes:     512 bytes logical, 4096 bytes physical
>>> Rotation Rate:    Solid State Device
>>> Form Factor:      2.5 inches
>>> Device is:        Not in smartctl database [for details use: -P showall]
>>> ATA Version is:   ACS-3 T13/2161-D revision 5
>>> SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 6.0 Gb/s)
>>> Local Time is:    Mon Dec 18 19:33:51 2017 SAST
>>> SMART support is: Available - device has SMART capability.
>>> SMART support is: Enabled
>>>
>>> ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
>>>   5 Reallocated_Sector_Ct   0x0032   100   100   000    Old_age   Always       -       0
>>>   9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       -       98
>>>  12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       3
>>> 170 Unknown_Attribute       0x0033   100   100   010    Pre-fail  Always       -       0
>>> 171 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       1
>>> 172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
>>> 174 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
>>> 175 Program_Fail_Count_Chip 0x0033   100   100   010    Pre-fail  Always       -       17567121432
>>> 183 Runtime_Bad_Block       0x0032   100   100   000    Old_age   Always       -       0
>>> 184 End-to-End_Error        0x0033   100   100   090    Pre-fail  Always       -       0
>>> 187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       -       0
>>> 190 Airflow_Temperature_Cel 0x0022   077   076   000    Old_age   Always       -       23 (Min/Max 17/29)
>>> 192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       0
>>> 194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always       -       23
>>> 197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       0
>>> 199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       0
>>> 225 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       14195
>>> 226 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       0
>>> 227 Unknown_SSD_Attribute   0x0032   100   100   000    Old_age   Always       -       42
>>> 228 Power-off_Retract_Count 0x0032   100   100   000    Old_age   Always       -       5905
>>> 232 Available_Reservd_Space 0x0033   100   100   010    Pre-fail  Always       -       0
>>> 233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       -       0
>>> 234 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       0
>>> 241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       -       14195
>>> 242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       -       10422
>>> 243 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       -       41906
>>>
>>>
>>> Media wear out : 0% used
>>> LBAs written: 14195
>>> Power on hours: <100
>>> Power cycle count: once at the factory, once at our offices to check if
>>> there was newer firmware (there wasn't) and once when we restarted the node
>>> to see if it could then access a failed drive.
>>>
>>>
>>> Regards
>>> David Herselman
>>>
>>>
>>> -----Original Message-----
>>> From: Christian Balzer [mailto:ch...@gol.com]
>>> Sent: Thursday, 21 December 2017 3:24 AM
>>> To: ceph-users@lists.ceph.com
>>> Cc: David Herselman <d...@syrex.co>
>>> Subject: Re: [ceph-users] Many concurrent drive failures - How do I
>>> activate pgs?
>>>
>>> Hello,
>>>
>>> first off, I don't have anything to add to your conclusions of the
>>> current status, alas there are at least 2 folks here on the ML making a
>>> living from Ceph disaster recovery, so I hope you have been contacted
>>> already.
>>>
>>> Now once your data is safe or you have a moment, I and others here would
>>> probably be quite interested in some more details, see inline below.
>>>
>>> On Wed, 20 Dec 2017 22:25:23 +0000 David Herselman wrote:
>>>
>>> [snip]
>>> >
>>> > We've happily been running a 6 node cluster with 4 x FileStore HDDs
>>> per node (journals on SSD partitions) for over a year and recently upgraded
>>> all nodes to Debian 9, Ceph Luminous 12.2.2 and kernel 4.13.8. We ordered
>>> 12 x Intel DC S4600 SSDs which arrived last week so we added two per node
>>> on Thursday evening and brought them up as BlueStore OSDs. We had
>>> proactively updated our existing pools to reference only devices classed as
>>> 'hdd', so that we could move select images over to ssd replicated and
>>> erasure coded pools.
>>> >
>>> Could you tell us more about that cluster, as in the HW, how the SSDs are
>>> connected, and the FW version of the controller if applicable.
>>>
>>> Kernel 4.13.8 suggests that this is a handrolled, upstream kernel.
>>> While not necessarily related I'll note that as far as Debian kernels
>>> (which are very lightly if at all patched) are concerned, nothing beyond
>>> 4.9 has been working to my satisfaction.
>>> 4.11 still worked, but 4.12 crash-reboot-looped on all my Supermicro X10
>>> machines (quite a varied selection).
>>> The current 4.13.13 backport boots on some of those machines, but still
>>> throws errors with the EDAC devices, which work fine with 4.9.
>>>
>>> 4.14 is known to happily destroy data if used with bcache and even if
>>> one doesn't use that it should give you pause.
>>>
>>> > We were pretty diligent and downloaded Intel's Firmware Update Tool
>>> and validated that each new drive had the latest available firmware before
>>> installing them in the nodes. We did numerous benchmarks on Friday and
>>> eventually moved some images over to the new storage pools. Everything was
>>> working perfectly and extensive tests on Sunday showed excellent
>>> performance. Sunday night one of the new SSDs died and Ceph replicated and
>>> redistributed data accordingly, then another failed in the early hours of
>>> Monday morning and Ceph did what it needed to.
>>> >
>>> > We had the two failed drives replaced by 11am and Ceph was up to
>>> 2/4918587 objects degraded (0.000%) when a third drive failed. At this
>>> point we updated the crush maps for the rbd_ssd and ec_ssd pools and set
>>> the device class to 'hdd', to essentially evacuate everything off the SSDs.
>>> Other SSDs then failed at 3:22pm, 4:19pm, 5:49pm and 5:50pm. We've
>>> ultimately lost half the Intel S4600 drives, which are all completely
>>> inaccessible. Our status at 11:42pm Monday night was: 1/1398478 objects
>>> unfound (0.000%) and 339/4633062 objects degraded (0.007%).
>>> >
>>> The relevant logs of when and how those SSDs failed would be interesting.
>>> Was the distribution of the failed SSDs random among the cluster?
>>> Are you running smartd and did it have something to say?
>>>
>>> Completely inaccessible sounds a lot like the infamous "self-bricking"
>>> of Intel SSDs when they discover something isn't right, or they don't like
>>> the color scheme of the server inside (^.^).
>>>
>>> I'm using quite a lot of Intel SSDs and have had only one "fatal" incident.
>>> A DC S3700 detected that its powercap had failed, but of course kept
>>> working fine. Until a reboot was needed, when it promptly bricked itself:
>>> data inaccessible, SMART barely reporting that something was there.
>>>
>>> So one wonders what caused your SSDs to get their knickers in such a
>>> twist.
>>> Are the survivors showing any unusual signs in their SMART output?
>>>
>>> Of course what your vendor/Intel will have to say will also be of
>>> interest. ^o^
>>>
>>> Regards,
>>>
>>> Christian
>>> --
>>> Christian Balzer        Network/Systems Engineer
>>> ch...@gol.com           Rakuten Communications
>>>
>>>
>>>
>>
>>
>
>
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
