Re: [ceph-users] replace dead SSD journal

2015-05-06 Thread Andrija Panic
Well, seems like they are on satellite :) On 6 May 2015 at 02:58, Matthew Monaco wrote: > On 05/05/2015 08:55 AM, Andrija Panic wrote: > > Hi, > > > > small update: > > > > in 3 months - we lost 5 out of 6 Samsung 128GB 850 PROs (just a few days > > between each SSD death) - can't believe it

Re: [ceph-users] replace dead SSD journal

2015-05-05 Thread Matthew Monaco
On 05/05/2015 08:55 AM, Andrija Panic wrote: > Hi, > > small update: > > in 3 months - we lost 5 out of 6 Samsung 128GB 850 PROs (just a few days > between each SSD death) - can't believe it - NOT due to wearing out... I > really hope we got a defective series from the supplier... > That's ridiculou

Re: [ceph-users] replace dead SSD journal

2015-05-05 Thread Andrija Panic
Hi, small update: in 3 months - we lost 5 out of 6 Samsung 128GB 850 PROs (just a few days between each SSD death) - can't believe it - NOT due to wearing out... I really hope we got a defective series from the supplier... Regards On 18 April 2015 at 14:24, Andrija Panic wrote: > yes I know, but t

Re: [ceph-users] replace dead SSD journal

2015-04-18 Thread Andrija Panic
yes I know, but too late now, I'm afraid :) On 18 April 2015 at 14:18, Josef Johansson wrote: > Have you looked into the Samsung 845 DC? They are not that expensive last > time I checked. > > /Josef > On 18 Apr 2015 13:15, "Andrija Panic" wrote: > >> might be true, yes - we had Intel 128GB (inte

Re: [ceph-users] replace dead SSD journal

2015-04-18 Thread Josef Johansson
Have you looked into the Samsung 845 DC? They are not that expensive last time I checked. /Josef On 18 Apr 2015 13:15, "Andrija Panic" wrote: > might be true, yes - we had Intel 128GB (Intel S3500 or S3700) - but these > have horrible random/sequential speeds - Samsung 850 PROs are 3 times at > le

Re: [ceph-users] replace dead SSD journal

2015-04-18 Thread Andrija Panic
might be true, yes - we had Intel 128GB (Intel S3500 or S3700) - but these have horrible random/sequential speeds - Samsung 850 PROs are at least 3 times faster on sequential, and more than 3 times faster on random/IOPS measures. And of course modern enterprise drives = ... On 18 April 2015 at 12:

Re: [ceph-users] replace dead SSD journal

2015-04-18 Thread Mark Kirkwood
Yes, it sure is - my experience with 'consumer' SSD is that they die with obscure firmware bugs (wrong capacity, zero capacity, not detected in bios anymore) rather than flash wearout. It seems that the 'enterprise' tagged drives are less inclined to suffer this fate. Regards Mark On 18/04/1

Re: [ceph-users] replace dead SSD journal

2015-04-18 Thread Andrija Panic
these 2 drives are on the regular SATA (on-board) controller, and besides this, there are 12 x 4TB on the front of the servers - normal backplane on the front. Anyway, we are going to check those dead SSDs on a PC/laptop or so, just to confirm they are really dead - but this is the way they die, not we

Re: [ceph-users] replace dead SSD journal

2015-04-18 Thread Andrija Panic
heh :) yes, interesting last name :) anyway, all are the exact same age, we implemented new Ceph nodes at exactly the same time - but it's not a wearing problem - the dead SSDs were simply DEAD - smartctl -a showing nothing, except 600 PB space/size :) On 18 April 2015 at 09:41, Steffen W Sørensen wrote:

Re: [ceph-users] replace dead SSD journal

2015-04-18 Thread Josef Johansson
If the same chassis/chip/backplane is behind both drives and maybe other drives in the chassis have troubles, it may be a defect there as well. On 18 Apr 2015 09:42, "Steffen W Sørensen" wrote: > > > On 17/04/2015, at 21.07, Andrija Panic wrote: > > > > nah, Samsung 850 PRO 128GB - dead after 3mon

Re: [ceph-users] replace dead SSD journal

2015-04-18 Thread Steffen W Sørensen
> On 17/04/2015, at 21.07, Andrija Panic wrote: > > nah, Samsung 850 PRO 128GB - dead after 3 months - 2 of these died... wearing > level is 96%, so only 4% wasted... (yes I know these are not enterprise, etc… ) Damn… but maybe your surname says it all - Don’t Panic :) But making sure same type

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Krzysztof Nowicki
Checked the SMART status. All of the Samsungs have Wear Leveling Count equal to 99 (raw values 29, 36 and 15). I'm going to have to monitor them - I could afford losing one of them, but losing two would mean loss of data. On Fri, 17 Apr 2015 at 21:22, Josef Johansson wrote: > the massi

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Josef Johansson
the massive rebalancing does not affect the SSDs in a good way either. But from what I've gathered the PRO should be fine. Massive amount of write errors in the logs? /Josef On 17 Apr 2015 21:07, "Andrija Panic" wrote: > nah, Samsung 850 PRO 128GB - dead after 3 months - 2 of these died... > wear

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Andrija Panic
damn, good news for me, possibly bad news for you :) what is the wearing level (smartctl -a /dev/sdX) - attribute near the end of the attribute list... thx On 17 April 2015 at 21:12, Krzysztof Nowicki wrote: > I have two of them in my cluster (plus one 256GB version) for about half a > year now. So f
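A minimal sketch of pulling the wear attribute out of smartctl's attribute table, assuming the usual ten-column layout (ID, name, flag, value, worst, thresh, type, updated, when-failed, raw). The function name `wear_leveling` and the sample text are illustrative: the sample is hand-written in that layout using the values 99 and 29 reported in this thread, not captured from a real drive.

```python
from typing import Optional, Tuple


def wear_leveling(smart_output: str) -> Optional[Tuple[int, int]]:
    """Return (normalized value, raw value) of Wear_Leveling_Count, or None."""
    for line in smart_output.splitlines():
        fields = line.split()
        # Attribute rows have at least ten whitespace-separated columns;
        # column 1 is the attribute name, 3 the normalized value, 9 the raw value.
        if len(fields) >= 10 and fields[1] == "Wear_Leveling_Count":
            return int(fields[3]), int(fields[9])
    return None


# Hand-written sample in smartctl's attribute-table layout (assumption).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
177 Wear_Leveling_Count     0x0013   099   099   000    Pre-fail  Always       -       29
"""

print(wear_leveling(SAMPLE))
```

In practice the text would come from running smartctl against each journal device; a drive that has died the way described in this thread may return no attribute table at all, in which case the function returns None.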

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Krzysztof Nowicki
I have two of them in my cluster (plus one 256GB version) for about half a year now. So far so good. I'll be keeping a closer eye on them. On Fri, 17 Apr 2015, 21:07 Andrija Panic wrote: > nah, Samsung 850 PRO 128GB - dead after 3 months - 2 of these died... > wearing level is 96%, so

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Andrija Panic
nah, Samsung 850 PRO 128GB - dead after 3 months - 2 of these died... wearing level is 96%, so only 4% wasted... (yes I know these are not enterprise, etc... ) On 17 April 2015 at 21:01, Josef Johansson wrote: > tough luck, hope everything comes up ok afterwards. What model are the SSDs? > > /Jose

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Josef Johansson
tough luck, hope everything comes up ok afterwards. What model are the SSDs? /Josef On 17 Apr 2015 20:05, "Andrija Panic" wrote: > SSD died that hosted journals for 6 OSDs - 2 x SSD died, so 12 OSDs are > down, and rebalancing is about to finish... after which I need to fix the OSDs. > > On 17 April

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Andrija Panic
SSD died that hosted journals for 6 OSDs - 2 x SSD died, so 12 OSDs are down, and rebalancing is about to finish... after which I need to fix the OSDs. On 17 April 2015 at 19:01, Josef Johansson wrote: > Hi, > > Did 6 other OSDs go down when re-adding? > > /Josef > > On 17 Apr 2015, at 18:49, Andri

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Josef Johansson
Hi, Did 6 other OSDs go down when re-adding? /Josef > On 17 Apr 2015, at 18:49, Andrija Panic wrote: > > 12 OSDs down - I expect less work with removing and adding OSDs? > > On Apr 17, 2015 6:35 PM, "Krzysztof Nowicki" wrote: > Why not just wipe out the

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Andrija Panic
12 OSDs down - I expect less work with removing and adding OSDs? On Apr 17, 2015 6:35 PM, "Krzysztof Nowicki" wrote: > Why not just wipe out the OSD filesystem, run ceph-osd --mkfs with the > existing OSD UUID, copy the keyring and let it populate itself? > > On Fri, 17 Apr 2015 at 18:31, An

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Krzysztof Nowicki
Why not just wipe out the OSD filesystem, run ceph-osd --mkfs with the existing OSD UUID, copy the keyring and let it populate itself? On Fri, 17 Apr 2015 at 18:31, Andrija Panic wrote: > Thx guys, that's what I will be doing in the end. > > Cheers > On Apr 17, 2015 6:24 PM, "Robert LeBla
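The suggestion above can be laid out as an ordered plan. This is only a sketch: the function name `rebuild_steps`, the OSD id and the all-zero UUID are hypothetical, and only the ceph-osd invocation is spelled out as a command, because the thread does not say how the filesystem is wiped or where the keyring is copied from. The `-i`, `--mkfs` and `--osd-uuid` options are the ones ceph-osd provides for initialising an OSD data directory; check them against your Ceph release before use.

```python
from typing import List, Optional, Tuple

# A step is a description plus an optional argv list for the command to run.
Step = Tuple[str, Optional[List[str]]]


def rebuild_steps(osd_id: int, osd_uuid: str) -> List[Step]:
    """Ordered steps for rebuilding one OSD in place, reusing its UUID."""
    return [
        ("wipe and recreate the OSD data filesystem", None),
        ("initialise the empty OSD with its existing UUID",
         ["ceph-osd", "-i", str(osd_id), "--mkfs", "--osd-uuid", osd_uuid]),
        ("copy the OSD keyring back into the data directory", None),
        ("start the OSD and let recovery repopulate it", None),
    ]


# Hypothetical id and UUID, for illustration only; nothing is executed here.
for description, argv in rebuild_steps(3, "00000000-0000-0000-0000-000000000000"):
    print(description, "->", " ".join(argv) if argv else "(manual step)")
```

Because the OSD keeps its id and UUID, the cluster map does not need to change; the trade-off is that every object on that OSD is backfilled from its peers.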

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Andrija Panic
Thx guys, that's what I will be doing in the end. Cheers On Apr 17, 2015 6:24 PM, "Robert LeBlanc" wrote: > Delete and re-add all six OSDs. > > On Fri, Apr 17, 2015 at 3:36 AM, Andrija Panic > wrote: > >> Hi guys, >> >> I have 1 SSD that hosted 6 OSDs' journals, that is dead, so 6 OSDs are down, >> c

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Robert LeBlanc
Delete and re-add all six OSDs. On Fri, Apr 17, 2015 at 3:36 AM, Andrija Panic wrote: > Hi guys, > > I have 1 SSD that hosted 6 OSDs' journals, that is dead, so 6 OSDs are down, > ceph rebalanced etc. > > Now I have a new SSD inside, and I will partition it etc - but would like to > know, how to procee
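A rough sketch of the "delete" half of that advice, assuming the six OSDs are already down and out (the thread says the cluster has rebalanced). It only builds the command lines, following the manual removal sequence in the Ceph documentation (remove from the CRUSH map, delete the auth key, remove the OSD); the ids 0 to 5 and the function name `removal_commands` are hypothetical, and re-adding the OSDs afterwards is left out because the thread does not say which tool was used for that.

```python
from typing import Iterable, List


def removal_commands(osd_ids: Iterable[int]) -> List[List[str]]:
    """Build the per-OSD removal commands as argv lists, in order."""
    commands: List[List[str]] = []
    for osd_id in osd_ids:
        name = f"osd.{osd_id}"
        commands.append(["ceph", "osd", "crush", "remove", name])  # drop from CRUSH map
        commands.append(["ceph", "auth", "del", name])             # delete its auth key
        commands.append(["ceph", "osd", "rm", str(osd_id)])        # remove the OSD entry
    return commands


# Hypothetical ids for the six OSDs whose journals were lost; nothing is executed.
for argv in removal_commands(range(6)):
    print(" ".join(argv))
```

Printing the commands first, rather than running them directly, leaves room to review the list before touching the cluster.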

Re: [ceph-users] replace dead SSD journal

2015-04-17 Thread Steffen W Sørensen
> I have 1 SSD that hosted 6 OSDs' journals, that is dead, so 6 OSDs are down, ceph > rebalanced etc. > > Now I have a new SSD inside, and I will partition it etc - but would like to > know how to proceed now with the journal recreation for those 6 OSDs that > are down now. Well assuming the OSDs ar

[ceph-users] replace dead SSD journal

2015-04-17 Thread Andrija Panic
Hi guys, I have 1 SSD that hosted 6 OSDs' journals, that is dead, so 6 OSDs are down, ceph rebalanced etc. Now I have a new SSD inside, and I will partition it etc - but would like to know how to proceed now with the journal recreation for those 6 OSDs that are down now. Should I flush journal (where