[ceph-users] How to recover/mount mirrored rbd image for file recovery
Hello,

my goal is to back up a Proxmox cluster with rbd-mirror for disaster recovery. Promoting/demoting etc. works great.

But how can I access a single file on the mirrored cluster? I tried:

   root@ceph01:~# rbd-nbd --read-only map cluster5-rbd/vm-114-disk-1 --cluster backup
   /dev/nbd1

But I get:

   root@ceph01:~# fdisk -l /dev/nbd1
   fdisk: cannot open /dev/nbd1: Input/output error

dmesg shows entries like:

   [Thu Mar 19 09:29:55 2020] nbd1: unable to read partition table
   [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
   [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)

Here is my state:

root@ceph01:~# rbd --cluster backup mirror pool status cluster5-rbd --verbose
health: OK
images: 3 total
    3 replaying

vm-106-disk-0:
  global_id:   0bc18ee1-1749-4787-a45d-01c7e946ff06
  state:       up+replaying
  description: replaying, master_position=[object_number=3, tag_tid=2, entry_tid=3], mirror_position=[object_number=3, tag_tid=2, entry_tid=3], entries_behind_master=0
  last_update: 2020-03-19 09:29:17

vm-114-disk-1:
  global_id:   2219ffa9-a4e0-4f89-b352-ff30b1ffe9b9
  state:       up+replaying
  description: replaying, master_position=[object_number=390, tag_tid=6, entry_tid=334290], mirror_position=[object_number=382, tag_tid=6, entry_tid=328526], entries_behind_master=5764
  last_update: 2020-03-19 09:29:17

vm-115-disk-0:
  global_id:   2b0af493-14c1-4b10-b557-84928dc37dd1
  state:       up+replaying
  description: replaying, master_position=[object_number=72, tag_tid=1, entry_tid=67796], mirror_position=[object_number=72, tag_tid=1, entry_tid=67796], entries_behind_master=0
  last_update: 2020-03-19 09:29:17

More dmesg output:

[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] blk_update_request: 95 callbacks suppressed
[Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 0
[Thu Mar 19 09:30:02 2020] buffer_io_error: 94 callbacks suppressed
[Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block 0, async page read
[Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 1
[Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block 1, async page read
[Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 2
[Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block 2, async page read
[Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 3
[Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block 3, async page read
[Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 4
[Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block 4, async page read
[Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 5
[Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block 5, async page read
[Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 6
[Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block 6, async page read
[Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 7
[Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block 7, async page read

Do I have to stop the replaying, or how else can I mount the image on the backup cluster?

Thanks,
Michael
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: How to recover/mount mirrored rbd image for file recovery
Hi,

one workaround would be to create a protected snapshot on the primary image (the snapshot is mirrored along with the image) and then clone that snapshot on the remote site. That clone can be accessed as required.

I'm not sure if there's a way to directly access the remote image itself, since it's read-only.

Regards,
Eugen


Zitat von Ml Ml :

Hello,

my goal is to back up a Proxmox cluster with rbd-mirror for disaster recovery. Promoting/demoting etc. works great.

But how can I access a single file on the mirrored cluster? I tried:

   root@ceph01:~# rbd-nbd --read-only map cluster5-rbd/vm-114-disk-1 --cluster backup
   /dev/nbd1

But I get:

   root@ceph01:~# fdisk -l /dev/nbd1
   fdisk: cannot open /dev/nbd1: Input/output error

[...]

Do I have to stop the replaying, or how else can I mount the image on the backup cluster?

Thanks,
Michael
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
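A rough sketch of that snapshot/clone workaround, using the pool and image names from this thread; the snapshot name, clone name, mount point and partition number are made-up placeholders:

   # On the primary cluster: create and protect a snapshot; it gets mirrored to the backup cluster
   rbd snap create cluster5-rbd/vm-114-disk-1@restore
   rbd snap protect cluster5-rbd/vm-114-disk-1@restore

   # On the backup cluster: wait until the snapshot has been replayed, then clone it
   rbd --cluster backup snap ls cluster5-rbd/vm-114-disk-1
   rbd --cluster backup clone cluster5-rbd/vm-114-disk-1@restore cluster5-rbd/vm-114-restore

   # Map and mount the clone (a normal, writable image) instead of the read-only mirror
   rbd-nbd --cluster backup map cluster5-rbd/vm-114-restore
   mount /dev/nbd1p1 /mnt

   # Clean up afterwards
   umount /mnt
   rbd-nbd unmap /dev/nbd1
   rbd --cluster backup rm cluster5-rbd/vm-114-restore
   rbd snap unprotect cluster5-rbd/vm-114-disk-1@restore
   rbd snap rm cluster5-rbd/vm-114-disk-1@restore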
[ceph-users] Re: Full OSD's on cephfs_metadata pool
Hi,

> I have tried extending the LV of one of the OSD's but it can't make use
> of it and I have added a separate db volume but that didn't help.

Can you tell why it can't make use of the additional space? Extending LVs has worked for me in Nautilus. Maybe you could share the steps you performed?

Regards,
Eugen


Zitat von Robert Ruge :

Hi All.

Nautilus 14.2.8.

I came in this morning to find that six of my eight NVMe OSDs that were housing the cephfs_metadata pool had mysteriously filled up and crashed overnight, and they won't come back up. These OSDs are all single logical volume devices with no separate WAL or DB.

I have tried extending the LV of one of the OSDs but it can't make use of it, and I have added a separate DB volume but that didn't help.

In the meantime I have told the cluster to move cephfs_metadata back to HDD, which it has kindly done and emptied my two live OSDs, but I am left with 10 PGs inactive.

BLUEFS_SPILLOVER BlueFS spillover detected on 6 OSD(s)
    osd.93 spilled over 521 MiB metadata from 'db' device (26 GiB used of 50 GiB) to slow device
    osd.95 spilled over 456 MiB metadata from 'db' device (26 GiB used of 50 GiB) to slow device
    osd.100 spilled over 2.1 GiB metadata from 'db' device (26 GiB used of 50 GiB) to slow device
    osd.107 spilled over 782 MiB metadata from 'db' device (26 GiB used of 50 GiB) to slow device
    osd.112 spilled over 1.3 GiB metadata from 'db' device (27 GiB used of 50 GiB) to slow device
    osd.115 spilled over 1.4 GiB metadata from 'db' device (27 GiB used of 50 GiB) to slow device
PG_AVAILABILITY Reduced data availability: 10 pgs inactive, 10 pgs down
    pg 2.4e is down, acting [60,6,120]
    pg 2.60 is down, acting [105,132,15]
    pg 2.61 is down, acting [8,13,112]
    pg 2.72 is down, acting [93,112,0]
    pg 2.9f is down, acting [117,1,35]
    pg 2.b9 is down, acting [95,25,6]
    pg 2.c3 is down, acting [97,139,5]
    pg 2.c6 is down, acting [95,7,127]
    pg 2.d1 is down, acting [36,107,17]
    pg 2.f4 is down, acting [23,117,138]

Can I backup and recreate an OSD on a larger volume?
Can I remove a good PG from an offline OSD to free up some space?

ceph-bluestore-tool repair fails. "bluefs enospc" seems to be the critical error.

So currently my cephfs is unavailable, so any help would be greatly appreciated.

Regards
Robert Ruge

Important Notice: The contents of this email are intended solely for the named addressee and are confidential; any unauthorised use, reproduction or storage of the contents is expressly prohibited. If you have received this email in error, please delete it and any attachments immediately and advise the sender by return email or telephone. Deakin University does not warrant that this email and any attachments are error or virus free.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
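For comparison, the sequence that has worked on Nautilus for growing such an OSD is roughly the following; the OSD id, VG/LV names and size are placeholders and need to be adapted:

   # Stop the OSD, then grow the LV backing its bluestore block device
   systemctl stop ceph-osd@100
   lvextend -L +50G /dev/ceph-block-vg/osd-block-100

   # Let bluestore pick up the extra space, then restart the OSD
   ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-100
   systemctl start ceph-osd@100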
[ceph-users] Re: OSDs continuously restarting under load
Hi, Samuel,

I've never seen that sort of signal in real life:

2020-03-18 18:39:26.426584 201e35fdb40 -1 *** Caught signal (Bus error) **

I suppose this has some hardware roots. Have you checked dmesg output?

Just in case, here is some info on the "Bus Error" signal, maybe it will provide some insight: https://en.wikipedia.org/wiki/Bus_error

Thanks,
Igor

On 3/18/2020 5:06 PM, huxia...@horebdata.cn wrote:

Hello, folks,

I am trying to add a Ceph node into an existing Ceph cluster. Once the reweight of a newly-added OSD on the new node exceeds roughly 0.4, the OSD becomes unresponsive and keeps restarting, eventually going down.

What could be the problem? Any suggestion would be highly appreciated.

best regards,

samuel

root@node81:/var/log/ceph#
root@node81:/var/log/ceph#
root@node81:/var/log/ceph#
root@node81:/var/log/ceph# ceph osd df
ID CLASS  WEIGHT REWEIGHT SIZE    USE     AVAIL   %USE VAR  PGS
12 hybrid 1.0    1.0      3.81TiB 38.3GiB 3.77TiB 0.98 1.32 316
13 hybrid 1.0    1.0      3.81TiB 37.6GiB 3.77TiB 0.96 1.29 308
14 hybrid 1.0    1.0      3.81TiB 36.9GiB 3.77TiB 0.95 1.27 301
15 hybrid 1.0    1.0      3.81TiB 37.1GiB 3.77TiB 0.95 1.28 297
 0 hybrid 1.0    1.0      3.81TiB 37.6GiB 3.77TiB 0.96 1.29 305
 1 hybrid 1.0    1.0      3.81TiB 38.2GiB 3.77TiB 0.98 1.31 309
 2 hybrid 1.0    1.0      3.81TiB 37.4GiB 3.77TiB 0.96 1.29 296
 3 hybrid 1.0    1.0      3.81TiB 37.9GiB 3.77TiB 0.97 1.30 303
 4 hdd    0.2    1.0      3.42TiB 10.5GiB 3.41TiB 0.30 0.40   0
 5 hdd    0.2    1.0      3.42TiB 9.63GiB 3.41TiB 0.28 0.37  87
 6 hdd    0.2    1.0      3.42TiB 1.91GiB 3.42TiB 0.05 0.07   0
 7 hdd    0.2    1.0      3.42TiB 11.3GiB 3.41TiB 0.32 0.43  83
16 hdd    0.3    1.0      1.79TiB 16.3GiB 1.78TiB 0.89 1.19 142
              TOTAL      45.9TiB  351GiB  45.6TiB 0.75

Log:

root@node81:/var/log/ceph# cat ceph-osd.6.log | grep load_pgs
2020-03-18 18:33:57.808747 2000b556000 0 osd.6 0 load_pgs
2020-03-18 18:33:57.808763 2000b556000 0 osd.6 0 load_pgs opened 0 pgs
-1324> 2020-03-18 18:33:57.808747 2000b556000 0 osd.6 0 load_pgs
-1323> 2020-03-18 18:33:57.808763 2000b556000 0 osd.6 0 load_pgs opened 0 pgs
2020-03-18 18:35:04.363341 2000327 0 osd.6 5222 load_pgs
2020-03-18 18:36:15.318489 2000327 0 osd.6 5222 load_pgs opened 202 pgs
-466> 2020-03-18 18:35:04.363341 2000327 0 osd.6 5222 load_pgs
-465> 2020-03-18 18:36:15.318489 2000327 0 osd.6 5222 load_pgs opened 202 pgs
2020-03-18 18:36:32.367450 2000326e000 0 osd.6 5236 load_pgs
2020-03-18 18:37:40.747347 2000326e000 0 osd.6 5236 load_pgs opened 177 pgs
-422> 2020-03-18 18:36:32.367450 2000326e000 0 osd.6 5236 load_pgs
-421> 2020-03-18 18:37:40.747347 2000326e000 0 osd.6 5236 load_pgs opened 177 pgs
2020-03-18 18:37:56.579371 2000f374000 0 osd.6 5247 load_pgs
2020-03-18 18:39:03.376838 2000f374000 0 osd.6 5247 load_pgs opened 170 pgs
-67> 2020-03-18 18:37:56.579371 2000f374000 0 osd.6 5247 load_pgs
-66> 2020-03-18 18:39:03.376838 2000f374000 0 osd.6 5247 load_pgs opened 170 pgs

2020-03-18 18:39:09.483868 201df5fdb40 0 0x201c4c90c90 4.22d unexpected need for 4:b47f2043:::rbd_data.8a738625558ec.56a3:head have 3291'557 flags = none tried to add 3291'557 flags = none
2020-03-18 18:39:09.483882 201df5fdb40 0 0x201c4c90c90 4.22d unexpected need for 4:b47f2a18:::rbd_data.9177446e87ccd.10f8:head have 4738'731 flags = none tried to add 4738'731 flags = none
2020-03-18 18:39:09.483896 201df5fdb40 0 0x201c4c90c90 4.22d unexpected need for 4:b47fc7a4:::rbd_data.58f426b8b4567.0221:head have 1789'169 flags = delete tried to add 1789'169 flags = delete
2020-03-18 18:39:20.985370 2000fc61b40 0 -- 192.168.230.122:6806/1159687 >> 192.168.230.11:0/3129700933 conn(0x200140cb3f0 :6806 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg: challenging authorizer
2020-03-18 18:39:21.495101 2000ec1fb40 0 -- 192.168.230.122:6806/1159687 >> 192.168.230.12:0/4111063261 conn(0x200140c55a0 :6806 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg: challenging authorizer
2020-03-18 18:39:21.495101 2000fc61b40 0 -- 192.168.230.122:6806/1159687 >> 192.168.230.13:0/464497787 conn(0x200140fd4b0 :6806 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg: challenging authorizer
2020-03-18 18:39:21.629021 2000ec1fb40 0 -- 192.168.230.122:6806/1159687 >> 192.168.230.201:0/4088469422 conn(0x20014100b10 :6806 s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg: challenging authorizer
2020-03-18 18:39:26.426584 201e35fdb40 -1 *** Caught signal (Bus error) **
 in thread 201e35fdb40 thread_name:tp_osd_tp

 ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous (stable)
 1: (()+0x145882c) [0x2000245882c]
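A generic sketch of the kind of hardware triage meant here; the grep patterns are only examples, not an exhaustive check:

   # Kernel log with readable timestamps, filtered for hardware-related errors
   dmesg -T | grep -iE 'bus error|machine check|mce|hardware error|i/o error'

   # Same via the journal: kernel messages of priority "err" and worse since boot
   journalctl -k -b -p err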
[ceph-users] Re: Full OSD's on cephfs_metadata pool
Hi Robert,

there was a thread named "bluefs enospc" a couple of days ago where Derek shared the steps to bring in a standalone DB volume and get rid of the "enospc" error.

I'm currently working on a fix which will hopefully allow recovery from this failure, but it might take some time before it lands in Nautilus.

Thanks,
Igor

On 3/19/2020 6:10 AM, Robert Ruge wrote:

Hi All.

Nautilus 14.2.8.

I came in this morning to find that six of my eight NVMe OSDs that were housing the cephfs_metadata pool had mysteriously filled up and crashed overnight, and they won't come back up. These OSDs are all single logical volume devices with no separate WAL or DB.

[...]

"bluefs enospc" seems to be the critical error.

So currently my cephfs is unavailable, so any help would be greatly appreciated.

Regards
Robert Ruge
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
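The gist of that workaround is to attach a standalone DB volume to the full OSD with ceph-bluestore-tool. A rough sketch follows; the OSD id, VG/LV names and size are placeholders, and the original "bluefs enospc" thread has the exact steps:

   # Create an LV on a spare device for the new DB volume
   lvcreate -L 50G -n osd-db-100 ceph-db-vg

   # With the OSD stopped, attach it as a new 'db' device
   ceph-bluestore-tool bluefs-bdev-new-db \
       --path /var/lib/ceph/osd/ceph-100 \
       --dev-target /dev/ceph-db-vg/osd-db-100

   # Make sure the OSD user can open the new device, then start the OSD again
   chown ceph:ceph /dev/ceph-db-vg/osd-db-100
   systemctl start ceph-osd@100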
[ceph-users] Re: Full OSD's on cephfs_metadata pool
Thanks Igor. I found that thread in my mailbox a few hours into the episode and it saved the day. I managed to get 6 of the 8 OSDs up, which was enough to get the 10 missing PGs online and transitioned back onto HDD.

However, I also appear to have killed two of the OSDs, maybe through using inappropriate SSDs.

There was no warning from the cluster that those OSDs were getting full, unless some unusual event caused them to fill overnight.

I don't have enough NVMe to support this model of operation, so I will need to live with HDDs for a bit longer.

Regards
Robert

From: Igor Fedotov
Sent: Thursday, March 19, 2020 10:15:46 PM
To: Robert Ruge ; ceph-users@ceph.io
Subject: Re: [ceph-users] Full OSD's on cephfs_metadata pool

Hi Robert,

there was a thread named "bluefs enospc" a couple of days ago where Derek shared the steps to bring in a standalone DB volume and get rid of the "enospc" error.

I'm currently working on a fix which will hopefully allow recovery from this failure, but it might take some time before it lands in Nautilus.

Thanks,
Igor

On 3/19/2020 6:10 AM, Robert Ruge wrote:
> [...]

Important Notice: The contents of this email are intended solely for the named addressee and are confidential; any unauthorised use, reproduction or storage of the contents is expressly prohibited. If you have received this email in error, please delete it and any attachments immediately and advise the sender by return email or telephone. Deakin University does not warrant that this email and any attachments are error or virus free.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
[ceph-users] Re: How to recover/mount mirrored rbd image for file recovery
On Thu, Mar 19, 2020 at 6:19 AM Eugen Block wrote:
>
> Hi,
>
> one workaround would be to create a protected snapshot on the primary
> image which is also mirrored, and then clone that snapshot on the
> remote site. That clone can be accessed as required.

+1. This is the correct approach. If you are using a Mimic+ cluster (i.e. require OSD release >= Mimic), you can skip the protect step.

> I'm not sure if there's a way to directly access the remote image
> since it's read-only.
>
> Regards,
> Eugen
>
>
> Zitat von Ml Ml :
>
> > Hello,
> >
> > my goal is to back up a Proxmox cluster with rbd-mirror for disaster
> > recovery. Promoting/demoting etc. works great.
> >
> > But how can I access a single file on the mirrored cluster? I tried:
> >
> >    root@ceph01:~# rbd-nbd --read-only map cluster5-rbd/vm-114-disk-1
> > --cluster backup
> >    /dev/nbd1
> >
> > But I get:
> >    root@ceph01:~# fdisk -l /dev/nbd1
> >    fdisk: cannot open /dev/nbd1: Input/output error
> >
> > [...]
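A sketch of that variant, assuming clone v2 is usable on both clusters (Mimic or later); the clone-format override and all names here are assumptions:

   # On the primary cluster: a plain snapshot, no protect needed with clone v2
   rbd snap create cluster5-rbd/vm-114-disk-1@restore

   # On the backup cluster: clone directly from the unprotected snapshot
   rbd --cluster backup clone --rbd-default-clone-format 2 \
       cluster5-rbd/vm-114-disk-1@restore cluster5-rbd/vm-114-restore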
[ceph-users] Re: OSDs continuously restarting under load
Hi, Igor,

thanks for the tip. dmesg does not show anything suspicious. I will investigate whether the hardware has any problem or not.

best regards,
samuel

huxia...@horebdata.cn

From: Igor Fedotov
Sent: 2020-03-19 12:07
To: huxia...@horebdata.cn; ceph-users; ceph-users
Subject: Re: [ceph-users] OSDs continuously restarting under load

Hi, Samuel,

I've never seen that sort of signal in real life:

2020-03-18 18:39:26.426584 201e35fdb40 -1 *** Caught signal (Bus error) **

I suppose this has some hardware roots. Have you checked dmesg output?

Just in case, here is some info on the "Bus Error" signal, maybe it will provide some insight: https://en.wikipedia.org/wiki/Bus_error

Thanks,
Igor

On 3/18/2020 5:06 PM, huxia...@horebdata.cn wrote:
> Hello, folks,
>
> I am trying to add a Ceph node into an existing Ceph cluster. Once the
> reweight of a newly-added OSD on the new node exceeds roughly 0.4, the
> OSD becomes unresponsive and keeps restarting, eventually going down.
>
> What could be the problem? Any suggestion would be highly appreciated.
>
> best regards,
>
> samuel
>
> [...]
[ceph-users] Re: Full OSD's on cephfs_metadata pool
Hi Robert,

Sorry to hear that this impacted you, but I feel a bit better that I wasn't alone. Did you have a lot of log segments to trim on the MDSs when you recovered?

I would agree that this was a very odd, sudden onset of space consumption for us. We usually had about 600 GB consumed of around 8.5 TB of available NVMe space until the issue started, and then we were suddenly at maximum capacity. I could explain this if it is the case that, when the MDS is behind on trimming, the log segments pile up in the metadata pool; if we got far enough behind, that could simply have filled up the pool.

Thanks,
derek

On 3/19/20 7:50 AM, Robert Ruge wrote:
> Thanks Igor. I found that thread in my mailbox a few hours into the episode
> and it saved the day. I managed to get 6 of the 8 OSDs up, which was enough
> to get the 10 missing PGs online and transitioned back onto HDD.
>
> However I also appear to have killed two of the OSDs, maybe through using
> inappropriate SSDs.
>
> There was no warning from the cluster that those OSDs were getting full
> unless some unusual event caused them to fill overnight.
>
> I don't have enough NVMe to support this model of operation so I will need
> to live with HDDs for a bit longer.
>
> Regards
> Robert
>
> [...]
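For reference, a rough way to check whether an MDS is behind on trimming; the daemon name is a placeholder:

   # Cluster-level warning, if any (look for MDS_TRIM / "Behind on trimming")
   ceph health detail | grep -i trim

   # Per-daemon view: journal segment counters vs. the configured trimming target
   ceph daemon mds.<name> perf dump mds_log
   ceph daemon mds.<name> config get mds_log_max_segments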
[ceph-users] Re: MGRs failing once per day and generally slow response times
Sorry for nagging, but is there a solution to this? Routinely restarting my MGRs every few hours isn't how I want to spend my time (although I guess I could schedule a cron job for that).

On 16/03/2020 09:35, Janek Bevendorff wrote:
> Over the weekend, all five MGRs failed, which means we have no more
> Prometheus monitoring data. We are obviously monitoring the MGR status
> as well, so we can detect the failure, but it's still a pretty serious
> issue. Any ideas as to why this might happen?
>
>
> On 13/03/2020 16:56, Janek Bevendorff wrote:
>> Indeed. I just had another MGR go bye-bye. I don't think host clock
>> skew is the problem.
>>
>>
>> On 13/03/2020 15:29, Anthony D'Atri wrote:
>>> Chrony does converge faster, but I doubt this will solve your
>>> problem if you don’t have quality peers. Or if it’s not really a
>>> time problem.

On Mar 13, 2020, at 6:44 AM, Janek Bevendorff wrote:
I replaced ntpd with chronyd and will let you know if it changes anything. Thanks.

> On 13/03/2020 06:25, Konstantin Shalygin wrote:
>> On 3/13/20 12:57 AM, Janek Bevendorff wrote:
>> NTPd is running, all the nodes have the same time to the second.
>> I don't think that is the problem.
> As always in such cases - try to switch your ntpd to default EL7
> daemon - chronyd.
>
> k
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
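If the cron-job stopgap is really wanted, a minimal sketch; the schedule is arbitrary and this only papers over the symptom rather than fixing the MGR issue:

   # /etc/cron.d/ceph-mgr-restart
   0 */6 * * * root /usr/bin/systemctl restart ceph-mgr.target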
[ceph-users] Re: Full OSD's on cephfs_metadata pool
Derek, you are my champion. Your instructions were spot on and so timely. Thank you so much for posting them for those who follow in your footsteps.

How do I tell if my MDS was behind on log trimming? I didn't see any health messages to that effect.

I think my NVMe OSDs were too small for this task, as I was using spare capacity on the Optane drives, which meant each OSD was only 198 GB (x 8 OSDs) and didn't leave much room for unexpected occurrences. However, my current ceph df shows cephfs_metadata sitting at 148 GiB stored (x 3), so it should have easily fitted.

May your cephfs live long and prosper.

Regards
Robert Ruge

-----Original Message-----
From: Derek Yarnell
Sent: Friday, 20 March 2020 12:14 AM
To: Robert Ruge ; ceph-users@ceph.io; Igor Fedotov
Subject: Re: [ceph-users] Re: Full OSD's on cephfs_metadata pool

Hi Robert,

Sorry to hear that this impacted you, but I feel a bit better that I wasn't alone. Did you have a lot of log segments to trim on the MDSs when you recovered?

I would agree that this was a very odd, sudden onset of space consumption for us. We usually had about 600 GB consumed of around 8.5 TB of available NVMe space until the issue started, and then we were suddenly at maximum capacity. I could explain this if it is the case that, when the MDS is behind on trimming, the log segments pile up in the metadata pool; if we got far enough behind, that could simply have filled up the pool.

Thanks,
derek

On 3/19/20 7:50 AM, Robert Ruge wrote:
> [...]