Are you asking to add the OSD back with its data, or to add it back in as a fresh OSD? What is your `ceph status`?
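If the cluster is already health_ok and you just want the disk back as a fresh OSD (discarding its old data), a minimal sketch, assuming the device is /dev/sdb and the Jewel-era ceph-disk tooling:

    ceph osd crush remove osd.1    # drop the old entry from the crush map
    ceph auth del osd.1            # remove its cephx key
    ceph osd rm 1                  # remove the osd id itself
    ceph-disk zap /dev/sdb         # WARNING: this destroys the old data
    ceph-disk prepare /dev/sdb     # partition/format; udev should activate it

If you still want the data off of that disk, don't zap it until you've pulled everything you need.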
On Tue, Sep 19, 2017, 5:23 AM Gonzalo Aguilar Delgado <gagui...@aguilardelgado.com> wrote:

> Hi David,
>
> Thank you for the great explanation of the weights. I thought that ceph
> was adjusting them based on disk size, but it seems it's not.
>
> But the problem was not that. I think the node was failing because of a
> software bug, because the disk was not full by any means:
>
> Filesystem  1K-blocks      Used  Available Use% Mounted on
> /dev/sdb1   976284608 172396756  803887852  18% /var/lib/ceph/osd/ceph-1
>
> Now the question is whether I can safely add this osd back again. Is it
> possible?
>
> Best regards,
>
>
> On 14/09/17 23:29, David Turner wrote:
>
> Your weights should more closely represent the size of the OSDs. OSD3 and
> OSD6 are weighted properly, but your other 3 OSDs have the same weight
> even though OSD0 is twice the size of OSD2 and OSD4.
>
> Your OSD weights are what I thought you were referring to when you said
> you set the crush map to 1. At some point it does look like you set all
> of your OSD weights to 1, which would apply to OSD1. If the OSD was too
> small for that much data, it would have filled up and been too full to
> start. Can you mount that disk and see how much free space is on it?
>
> Just so you understand what that weight is: it is how much data the
> cluster is going to put on the OSD. The default is for the weight to be
> the size of the OSD in TiB (1024-based, instead of TB, which is
> 1000-based). If you set the weight of a 1TB disk and a 4TB disk both to
> 1, then the cluster will try to give them the same amount of data. If you
> set the 4TB disk to a weight of 4, then the cluster will try to give it
> 4x more data than the 1TB drive (usually what you want).
>
> In your case, your 926G OSD0 has a weight of 1 and your 460G OSD2 has a
> weight of 1, so the cluster thinks they should each receive the same
> amount of data (which it did; they each have ~275GB of data). OSD3 has a
> weight of 1.36380 (its size in TiB) and OSD6 has a weight of 0.90919, and
> they have basically the same %used space (17%), as opposed to the same
> amount of data, because the weight is based on their size.
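> For reference, a minimal sketch of bringing the other three weights in
> line with the disk sizes above (the values are just size-in-GiB divided
> by 1024, so treat them as illustrative, and expect data movement when you
> apply them):
>
>     ceph osd crush reweight osd.0 0.904    # 926G / 1024
>     ceph osd crush reweight osd.2 0.449    # 460G / 1024
>     ceph osd crush reweight osd.4 0.454    # 465G / 1024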
> As long as you had enough replicas of your data in the cluster for it to
> recover from you removing OSD1, such that your cluster is health_ok
> without any missing objects, then there is nothing that you need off of
> OSD1, and ceph recovered from the lost disk successfully.
>
> On Thu, Sep 14, 2017 at 4:39 PM Gonzalo Aguilar Delgado <
> gagui...@aguilardelgado.com> wrote:
>
>> Hello,
>>
>> I was on an old version of ceph, and it showed a warning saying:
>>
>>     crush map has straw_calc_version=0
>>
>> I read that adjusting it will rebalance everything, so the admin should
>> choose when to do it. So I went straight ahead and ran:
>>
>>     ceph osd crush tunables optimal
>>
>> It rebalanced as it said, but then I started to see lots of wrong PGs. I
>> discovered that it was because of my OSD1. I thought it was a disk
>> failure, so I added a new OSD6 and the system started to rebalance.
>> Anyway, the OSD was not starting.
>>
>> I thought about wiping it all, but I preferred to leave the disk as it
>> was, with the journal intact, in case I can recover and get data from
>> it. (See mail: [ceph-users] Scrub failing all the time, new
>> inconsistencies keep appearing.)
>>
>> So here's the information. But it has OSD1 replaced by OSD3, sorry.
>>
>> ID WEIGHT  REWEIGHT  SIZE   USE  AVAIL %USE  VAR  PGS
>>  0 1.00000 1.00000   926G  271G   654G 29.34 1.10 369
>>  2 1.00000 1.00000   460G  284G   176G 61.67 2.32 395
>>  4 1.00000 1.00000   465G  151G   313G 32.64 1.23 214
>>  3 1.36380 1.00000  1396G  239G  1157G 17.13 0.64 340
>>  6 0.90919 1.00000   931G  164G   766G 17.70 0.67 210
>>             TOTAL   4179G 1111G  3067G 26.60
>> MIN/MAX VAR: 0.64/2.32  STDDEV: 16.99
>>
>> As I said, I still have OSD1 intact, so I can do whatever you need
>> except re-adding it to the cluster, since I don't know what it would do;
>> it might cause havoc.
>>
>> Best regards,
>>
>> On 14/09/17 17:12, David Turner wrote:
>>
>> What do you mean by "updated crush map to 1"? Can you please provide a
>> copy of your crush map and `ceph osd df`?
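>> A sketch of pulling a readable copy of the crush map (the file paths are
>> just examples):
>>
>>     ceph osd getcrushmap -o /tmp/crushmap.bin
>>     crushtool -d /tmp/crushmap.bin -o /tmp/crushmap.txt
>>     ceph osd df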
>> On Wed, Sep 13, 2017 at 6:39 AM Gonzalo Aguilar Delgado <
>> gagui...@aguilardelgado.com> wrote:
>>
>>> Hi,
>>>
>>> I recently updated the crush map to 1 and did all the relocation of the
>>> PGs. At the end I found that one of the OSDs is not starting.
>>>
>>> This is what it shows:
>>>
>>> 2017-09-13 10:37:34.287248 7f49cbe12700 -1 *** Caught signal (Aborted) **
>>>  in thread 7f49cbe12700 thread_name:filestore_sync
>>>
>>>  ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
>>>  1: (()+0x9616ee) [0xa93c6ef6ee]
>>>  2: (()+0x11390) [0x7f49d9937390]
>>>  3: (gsignal()+0x38) [0x7f49d78d3428]
>>>  4: (abort()+0x16a) [0x7f49d78d502a]
>>>  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> const*)+0x26b) [0xa93c7ef43b]
>>>  6: (FileStore::sync_entry()+0x2bbb) [0xa93c47fcbb]
>>>  7: (FileStore::SyncThread::entry()+0xd) [0xa93c4adcdd]
>>>  8: (()+0x76ba) [0x7f49d992d6ba]
>>>  9: (clone()+0x6d) [0x7f49d79a53dd]
>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> needed to interpret this.
>>>
>>> --- begin dump of recent events ---
>>>     -3> 2017-09-13 10:37:34.253808 7f49dac6e8c0  5 osd.1 pg_epoch: 6293
>>> pg[10.8c( v 6220'575937 (4942'572901,6220'575937] local-les=6235 n=282
>>> ec=419 les/c/f 6235/6235/0 6293/6293/6290) [1,2]/[2] r=-1 lpr=0
>>> pi=6234-6292/24 crt=6220'575937 lcod 0'0 inactive NOTIFY NIBBLEWISE]
>>> exit Initial 0.029683 0 0.000000
>>>     -2> 2017-09-13 10:37:34.253848 7f49dac6e8c0  5 osd.1 pg_epoch: 6293
>>> pg[10.8c( v 6220'575937 (4942'572901,6220'575937] local-les=6235 n=282
>>> ec=419 les/c/f 6235/6235/0 6293/6293/6290) [1,2]/[2] r=-1 lpr=0
>>> pi=6234-6292/24 crt=6220'575937 lcod 0'0 inactive NOTIFY NIBBLEWISE]
>>> enter Reset
>>>     -1> 2017-09-13 10:37:34.255018 7f49dac6e8c0  5 osd.1 pg_epoch: 6293
>>> pg[10.90(unlocked)] enter Initial
>>>      0> 2017-09-13 10:37:34.287248 7f49cbe12700 -1 *** Caught signal
>>> (Aborted) **
>>>  in thread 7f49cbe12700 thread_name:filestore_sync
>>>
>>>  ceph version 10.2.7 (50e863e0f4bc8f4b9e31156de690d765af245185)
>>>  1: (()+0x9616ee) [0xa93c6ef6ee]
>>>  2: (()+0x11390) [0x7f49d9937390]
>>>  3: (gsignal()+0x38) [0x7f49d78d3428]
>>>  4: (abort()+0x16a) [0x7f49d78d502a]
>>>  5: (ceph::__ceph_assert_fail(char const*, char const*, int, char
>>> const*)+0x26b) [0xa93c7ef43b]
>>>  6: (FileStore::sync_entry()+0x2bbb) [0xa93c47fcbb]
>>>  7: (FileStore::SyncThread::entry()+0xd) [0xa93c4adcdd]
>>>  8: (()+0x76ba) [0x7f49d992d6ba]
>>>  9: (clone()+0x6d) [0x7f49d79a53dd]
>>>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>> needed to interpret this.
>>>
>>> --- logging levels ---
>>>    0/ 5 none
>>>    0/ 1 lockdep
>>>    0/ 1 context
>>>    1/ 1 crush
>>>    1/ 5 mds
>>>    1/ 5 mds_balancer
>>>    1/ 5 mds_locker
>>>    1/ 5 mds_log
>>>    1/ 5 mds_log_expire
>>>    1/ 5 mds_migrator
>>>    0/ 1 buffer
>>>    0/ 1 timer
>>>    0/ 1 filer
>>>    0/ 1 striper
>>>    0/ 1 objecter
>>>    0/ 5 rados
>>>    0/ 5 rbd
>>>    0/ 5 rbd_mirror
>>>    0/ 5 rbd_replay
>>>    0/ 5 journaler
>>>    0/ 5 objectcacher
>>>    0/ 5 client
>>>    0/ 5 osd
>>>    0/ 5 optracker
>>>    0/ 5 objclass
>>>    1/ 3 filestore
>>>    1/ 3 journal
>>>    0/ 5 ms
>>>    1/ 5 mon
>>>    0/10 monc
>>>    1/ 5 paxos
>>>    0/ 5 tp
>>>    1/ 5 auth
>>>    1/ 5 crypto
>>>    1/ 1 finisher
>>>    1/ 5 heartbeatmap
>>>    1/ 5 perfcounter
>>>    1/ 5 rgw
>>>    1/10 civetweb
>>>    1/ 5 javaclient
>>>    1/ 5 asok
>>>    1/ 1 throttle
>>>    0/ 0 refs
>>>    1/ 5 xio
>>>    1/ 5 compressor
>>>    1/ 5 newstore
>>>    1/ 5 bluestore
>>>    1/ 5 bluefs
>>>    1/ 3 bdev
>>>    1/ 5 kstore
>>>    4/ 5 rocksdb
>>>    4/ 5 leveldb
>>>    1/ 5 kinetic
>>>    1/ 5 fuse
>>>   -2/-2 (syslog threshold)
>>>   -1/-1 (stderr threshold)
>>>   max_recent     10000
>>>   max_new         1000
>>>   log_file /var/log/ceph/ceph-osd.1.log
>>> --- end dump of recent events ---
>>>
>>> Is there any way to recover it, or should I open a bug?
>>>
>>> Best regards
>>> _______________________________________________
>>> ceph-users mailing list
>>> ceph-users@lists.ceph.com
>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com