[ceph-users] How to recover/mount mirrored rbd image for file recovery

2020-03-19 Thread Ml Ml
Hello,

my goal is to back up a Proxmox cluster with rbd-mirror for disaster
recovery. Promoting/demoting etc. works great.

But how can I access a single file on the mirrored cluster? I tried:

   root@ceph01:~# rbd-nbd --read-only map cluster5-rbd/vm-114-disk-1
--cluster backup
   /dev/nbd1

But I get:
   root@ceph01:~# fdisk -l /dev/nbd1
   fdisk: cannot open /dev/nbd1: Input/output error

dmesg shows stuff like:
   [Thu Mar 19 09:29:55 2020]  nbd1: unable to read partition table
   [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
   [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)

Here is my state:

root@ceph01:~# rbd --cluster backup mirror pool status cluster5-rbd --verbose
health: OK
images: 3 total
3 replaying

vm-106-disk-0:
  global_id:   0bc18ee1-1749-4787-a45d-01c7e946ff06
  state:   up+replaying
  description: replaying, master_position=[object_number=3, tag_tid=2,
entry_tid=3], mirror_position=[object_number=3, tag_tid=2,
entry_tid=3], entries_behind_master=0
  last_update: 2020-03-19 09:29:17

vm-114-disk-1:
  global_id:   2219ffa9-a4e0-4f89-b352-ff30b1ffe9b9
  state:   up+replaying
  description: replaying, master_position=[object_number=390,
tag_tid=6, entry_tid=334290], mirror_position=[object_number=382,
tag_tid=6, entry_tid=328526], entries_behind_master=5764
  last_update: 2020-03-19 09:29:17

vm-115-disk-0:
  global_id:   2b0af493-14c1-4b10-b557-84928dc37dd1
  state:   up+replaying
  description: replaying, master_position=[object_number=72,
tag_tid=1, entry_tid=67796], mirror_position=[object_number=72,
tag_tid=1, entry_tid=67796], entries_behind_master=0
  last_update: 2020-03-19 09:29:17

More dmesg stuff:
[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] blk_update_request: 95 callbacks suppressed
[Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 0
[Thu Mar 19 09:30:02 2020] buffer_io_error: 94 callbacks suppressed
[Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
0, async page read
[Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 1
[Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
1, async page read
[Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 2
[Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
2, async page read
[Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 3
[Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
3, async page read
[Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 4
[Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
4, async page read
[Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 5
[Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
5, async page read
[Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 6
[Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
6, async page read
[Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 7
[Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
7, async page read

Do I have to stop the replaying, or how can I mount the image on the
backup cluster?

Thanks,
Michael
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to recover/mount mirrored rbd image for file recovery

2020-03-19 Thread Eugen Block

Hi,

one workaround would be to create a protected snapshot on the primary  
image which is also mirrored, and then clone that snapshot on the  
remote site. That clone can be accessed as required.
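
Roughly something like this (untested sketch; the snapshot and clone
names and the mount point are just examples, pool/image names taken
from your post):

   # on the primary cluster: snapshot and protect the image
   rbd snap create cluster5-rbd/vm-114-disk-1@restore
   rbd snap protect cluster5-rbd/vm-114-disk-1@restore

   # on the backup cluster, once the snapshot has replicated
   # (entries_behind_master=0): clone it and map the clone
   rbd --cluster backup clone cluster5-rbd/vm-114-disk-1@restore \
       cluster5-rbd/vm-114-disk-1-restore
   rbd-nbd --cluster backup map cluster5-rbd/vm-114-disk-1-restore
   mount -o ro /dev/nbd1p1 /mnt

The clone is a normal (writable) image on the backup site, so mounting
it and copying files out should work; remove the clone and the snapshot
when you're done.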


I'm not sure if there's a way to directly access the remote image  
since it's read-only.


Regards,
Eugen


Zitat von Ml Ml :


Hello,

my goal is to back up a proxmox cluster with rbd-mirror for desaster
recovery. Promoting/Demoting, etc.. works great.

But how can i access a single file on the mirrored cluster? I tried:

   root@ceph01:~# rbd-nbd --read-only map cluster5-rbd/vm-114-disk-1
--cluster backup
   /dev/nbd1

But i get:
   root@ceph01:~# fdisk -l /dev/nbd1
   fdisk: cannot open /dev/nbd1: Input/output error

dmesg shows stuff like:
   [Thu Mar 19 09:29:55 2020]  nbd1: unable to read partition table
   [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
   [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)

Here is my state:

root@ceph01:~# rbd --cluster backup mirror pool status cluster5-rbd --verbose
health: OK
images: 3 total
3 replaying

vm-106-disk-0:
  global_id:   0bc18ee1-1749-4787-a45d-01c7e946ff06
  state:   up+replaying
  description: replaying, master_position=[object_number=3, tag_tid=2,
entry_tid=3], mirror_position=[object_number=3, tag_tid=2,
entry_tid=3], entries_behind_master=0
  last_update: 2020-03-19 09:29:17

vm-114-disk-1:
  global_id:   2219ffa9-a4e0-4f89-b352-ff30b1ffe9b9
  state:   up+replaying
  description: replaying, master_position=[object_number=390,
tag_tid=6, entry_tid=334290], mirror_position=[object_number=382,
tag_tid=6, entry_tid=328526], entries_behind_master=5764
  last_update: 2020-03-19 09:29:17

vm-115-disk-0:
  global_id:   2b0af493-14c1-4b10-b557-84928dc37dd1
  state:   up+replaying
  description: replaying, master_position=[object_number=72,
tag_tid=1, entry_tid=67796], mirror_position=[object_number=72,
tag_tid=1, entry_tid=67796], entries_behind_master=0
  last_update: 2020-03-19 09:29:17

More dmesg stuff:
[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] blk_update_request: 95 callbacks suppressed
[Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 0
[Thu Mar 19 09:30:02 2020] buffer_io_error: 94 callbacks suppressed
[Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
0, async page read
[Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 1
[Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
1, async page read
[Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 2
[Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
2, async page read
[Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 3
[Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
3, async page read
[Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 4
[Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
4, async page read
[Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 5
[Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
5, async page read
[Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 6
[Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
6, async page read
[Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
[Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 7
[Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
7, async page read

Do i have to stop the replaying or how can i mount the image on the
backup cluster?

Thanks,
Michael
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Full OSD's on cephfs_metadata pool

2020-03-19 Thread Eugen Block

Hi,

I have tried extending the LV of one of the OSD's but it can't make  
use of it and I have added a separate db volume but that didn't help.


can you tell why it can't make use of additional space? Extending LVs  
has worked for me in Nautilus. Maybe you could share the steps you  
performed?
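
For comparison, the rough sequence that has worked here (sketch only;
OSD id, VG/LV names and the size are placeholders) is to grow the LV
and then tell BlueStore about the new size:

   systemctl stop ceph-osd@<id>
   lvextend -L +20G /dev/<vg>/<osd-lv>
   ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-<id>
   systemctl start ceph-osd@<id>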


Regards,
Eugen


Zitat von Robert Ruge :


Hi All.

Nautilus 14.2.8.

I came in this morning to find that six of my eight NVME OSD's that  
were housing the cephfs_metadata pool had mysteriously filled up and  
crashed overnight and they won't come back up. These OSD's are all  
single logical volume devices with no separate WAL or DB.
I have tried extending the LV of one of the OSD's but it can't make  
use of it and I have added a separate db volume but that didn't help.
In the meantime I have told the cluster to move cephfs_metadata back  
to HDD which it has kindly done and emptied my two live OSD's but I  
am left with 10 pgs inactive.


BLUEFS_SPILLOVER BlueFS spillover detected on 6 OSD(s)
 osd.93 spilled over 521 MiB metadata from 'db' device (26 GiB  
used of 50 GiB) to slow device
 osd.95 spilled over 456 MiB metadata from 'db' device (26 GiB  
used of 50 GiB) to slow device
 osd.100 spilled over 2.1 GiB metadata from 'db' device (26 GiB  
used of 50 GiB) to slow device
 osd.107 spilled over 782 MiB metadata from 'db' device (26 GiB  
used of 50 GiB) to slow device
 osd.112 spilled over 1.3 GiB metadata from 'db' device (27 GiB  
used of 50 GiB) to slow device
 osd.115 spilled over 1.4 GiB metadata from 'db' device (27 GiB  
used of 50 GiB) to slow device

PG_AVAILABILITY Reduced data availability: 10 pgs inactive, 10 pgs down
pg 2.4e is down, acting [60,6,120]
pg 2.60 is down, acting [105,132,15]
pg 2.61 is down, acting [8,13,112]
pg 2.72 is down, acting [93,112,0]
pg 2.9f is down, acting [117,1,35]
pg 2.b9 is down, acting [95,25,6]
pg 2.c3 is down, acting [97,139,5]
pg 2.c6 is down, acting [95,7,127]
pg 2.d1 is down, acting [36,107,17]
pg 2.f4 is down, acting [23,117,138]

Can I backup and recreate an OSD on a larger volume?
Can I remove a good pg from an offline OSD to remove some space?

Ceph-bluestore-tool repair fails.
"bluefs enospc" seems to be the critical error.

So currently my cephfs is unavailable so any help would be greatly  
appreciated.


Regards
Robert Ruge


Important Notice: The contents of this email are intended solely for  
the named addressee and are confidential; any unauthorised use,  
reproduction or storage of the contents is expressly prohibited. If  
you have received this email in error, please delete it and any  
attachments immediately and advise the sender by return email or  
telephone.


Deakin University does not warrant that this email and any  
attachments are error or virus free.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSDs continuously restarting under load

2020-03-19 Thread Igor Fedotov

Hi, Samuel,

I've never seen that sort of signal in real life:

2020-03-18 18:39:26.426584 201e35fdb40 -1 *** Caught signal (Bus error) **


I suppose this has some hardware roots. Have you checked dmesg output?


Just in case, here is some info on the "Bus error" signal; maybe it will 
provide some insight: https://en.wikipedia.org/wiki/Bus_error



Thanks,

Igor


On 3/18/2020 5:06 PM, huxia...@horebdata.cn wrote:

Hello, folks,

I am trying to add a ceph node to an existing ceph cluster. Once the reweight 
of a newly added OSD on the new node exceeds roughly 0.4, the OSD becomes 
unresponsive and keeps restarting, eventually going down.

What could be the problem?  Any suggestion would be highly appreciated.

best regards,

samuel


root@node81:/var/log/ceph#
root@node81:/var/log/ceph#
root@node81:/var/log/ceph#
root@node81:/var/log/ceph# ceph osd df
ID CLASS  WEIGHT  REWEIGHT SIZE    USE     AVAIL   %USE VAR  PGS
12 hybrid 1.0  1.0 3.81TiB 38.3GiB 3.77TiB 0.98 1.32 316
13 hybrid 1.0  1.0 3.81TiB 37.6GiB 3.77TiB 0.96 1.29 308
14 hybrid 1.0  1.0 3.81TiB 36.9GiB 3.77TiB 0.95 1.27 301
15 hybrid 1.0  1.0 3.81TiB 37.1GiB 3.77TiB 0.95 1.28 297
  0 hybrid 1.0  1.0 3.81TiB 37.6GiB 3.77TiB 0.96 1.29 305
  1 hybrid 1.0  1.0 3.81TiB 38.2GiB 3.77TiB 0.98 1.31 309
  2 hybrid 1.0  1.0 3.81TiB 37.4GiB 3.77TiB 0.96 1.29 296
  3 hybrid 1.0  1.0 3.81TiB 37.9GiB 3.77TiB 0.97 1.30 303
  4hdd 0.2  1.0 3.42TiB 10.5GiB 3.41TiB 0.30 0.40   0
  5hdd 0.2  1.0 3.42TiB 9.63GiB 3.41TiB 0.28 0.37  87
  6hdd 0.2  1.0 3.42TiB 1.91GiB 3.42TiB 0.05 0.07   0
  7hdd 0.2  1.0 3.42TiB 11.3GiB 3.41TiB 0.32 0.43  83
16hdd 0.3  1.0 1.79TiB 16.3GiB 1.78TiB 0.89 1.19 142
  TOTAL 45.9TiB  351GiB 45.6TiB 0.75


 Log excerpt:

root@node81:/var/log/ceph# cat ceph-osd.6.log | grep load_pgs
2020-03-18 18:33:57.808747 2000b556000  0 osd.6 0 load_pgs
2020-03-18 18:33:57.808763 2000b556000  0 osd.6 0 load_pgs opened 0 pgs
  -1324> 2020-03-18 18:33:57.808747 2000b556000  0 osd.6 0 load_pgs
  -1323> 2020-03-18 18:33:57.808763 2000b556000  0 osd.6 0 load_pgs opened 0 pgs
2020-03-18 18:35:04.363341 2000327  0 osd.6 5222 load_pgs
2020-03-18 18:36:15.318489 2000327  0 osd.6 5222 load_pgs opened 202 pgs
   -466> 2020-03-18 18:35:04.363341 2000327  0 osd.6 5222 load_pgs
   -465> 2020-03-18 18:36:15.318489 2000327  0 osd.6 5222 load_pgs opened 
202 pgs
2020-03-18 18:36:32.367450 2000326e000  0 osd.6 5236 load_pgs
2020-03-18 18:37:40.747347 2000326e000  0 osd.6 5236 load_pgs opened 177 pgs
   -422> 2020-03-18 18:36:32.367450 2000326e000  0 osd.6 5236 load_pgs
   -421> 2020-03-18 18:37:40.747347 2000326e000  0 osd.6 5236 load_pgs opened 
177 pgs
2020-03-18 18:37:56.579371 2000f374000  0 osd.6 5247 load_pgs
2020-03-18 18:39:03.376838 2000f374000  0 osd.6 5247 load_pgs opened 170 pgs
-67> 2020-03-18 18:37:56.579371 2000f374000  0 osd.6 5247 load_pgs
-66> 2020-03-18 18:39:03.376838 2000f374000  0 osd.6 5247 load_pgs opened 
170 pgs


2020-03-18 18:39:09.483868 201df5fdb40  0 0x201c4c90c90 4.22d unexpected need 
for 4:b47f2043:::rbd_data.8a738625558ec.56a3:head have 3291'557 
flags = none tried to add 3291'557 flags = none
2020-03-18 18:39:09.483882 201df5fdb40  0 0x201c4c90c90 4.22d unexpected need 
for 4:b47f2a18:::rbd_data.9177446e87ccd.10f8:head have 4738'731 
flags = none tried to add 4738'731 flags = none
2020-03-18 18:39:09.483896 201df5fdb40  0 0x201c4c90c90 4.22d unexpected need 
for 4:b47fc7a4:::rbd_data.58f426b8b4567.0221:head have 1789'169 
flags = delete tried to add 1789'169 flags = delete
2020-03-18 18:39:20.985370 2000fc61b40  0 -- 192.168.230.122:6806/1159687 >> 
192.168.230.11:0/3129700933 conn(0x200140cb3f0 :6806 
s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg: 
challenging authorizer
2020-03-18 18:39:21.495101 2000ec1fb40  0 -- 192.168.230.122:6806/1159687 >> 
192.168.230.12:0/4111063261 conn(0x200140c55a0 :6806 
s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg: 
challenging authorizer
2020-03-18 18:39:21.495101 2000fc61b40  0 -- 192.168.230.122:6806/1159687 >> 
192.168.230.13:0/464497787 conn(0x200140fd4b0 :6806 
s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg: 
challenging authorizer
2020-03-18 18:39:21.629021 2000ec1fb40  0 -- 192.168.230.122:6806/1159687 >> 
192.168.230.201:0/4088469422 conn(0x20014100b10 :6806 
s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg: 
challenging authorizer
2020-03-18 18:39:26.426584 201e35fdb40 -1 *** Caught signal (Bus error) **
  in thread 201e35fdb40 thread_name:tp_osd_tp

  ceph version 12.2.7 (3ec878d1e53e1aeb47a9f619c49d9e7c0aa384d5) luminous 
(stable)
  1: (()+0x145882c) [0x2000245882c]

[ceph-users] Re: Full OSD's on cephfs_metadata pool

2020-03-19 Thread Igor Fedotov

Hi Robert,

there was a thread named "bluefs enospc" a couple of days ago where Derek 
shared steps to bring in a standalone DB volume and get rid of the "enospc" 
error.
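
The general shape of that workaround (rough sketch only; OSD id and the
target device are placeholders, please follow Derek's post for the exact
procedure) is to attach a sufficiently large standalone DB device with
ceph-bluestore-tool so BlueFS gets room to breathe:

   systemctl stop ceph-osd@<id>
   ceph-bluestore-tool bluefs-bdev-new-db \
       --path /var/lib/ceph/osd/ceph-<id> --dev-target /dev/<new-db-device>
   systemctl start ceph-osd@<id>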



I'm currently working on a fix which will hopefully allow recovery 
from this failure, but it might take some time before it lands in Nautilus.



Thanks,

Igor

On 3/19/2020 6:10 AM, Robert Ruge wrote:

Hi All.

Nautilus 14.2.8.

I came in this morning to find that six of my eight NVME OSD's that were 
housing the cephfs_metadata pool had mysteriously filled up and crashed 
overnight and they won't come back up. These OSD's are all single logical 
volume devices with no separate WAL or DB.
I have tried extending the LV of one of the OSD's but it can't make use of it 
and I have added a separate db volume but that didn't help.
In the meantime I have told the cluster to move cephfs_metadata back to HDD 
which it has kindly done and emptied my two live OSD's but I am left with 10 
pgs inactive.

BLUEFS_SPILLOVER BlueFS spillover detected on 6 OSD(s)
  osd.93 spilled over 521 MiB metadata from 'db' device (26 GiB used of 50 
GiB) to slow device
  osd.95 spilled over 456 MiB metadata from 'db' device (26 GiB used of 50 
GiB) to slow device
  osd.100 spilled over 2.1 GiB metadata from 'db' device (26 GiB used of 50 
GiB) to slow device
  osd.107 spilled over 782 MiB metadata from 'db' device (26 GiB used of 50 
GiB) to slow device
  osd.112 spilled over 1.3 GiB metadata from 'db' device (27 GiB used of 50 
GiB) to slow device
  osd.115 spilled over 1.4 GiB metadata from 'db' device (27 GiB used of 50 
GiB) to slow device
PG_AVAILABILITY Reduced data availability: 10 pgs inactive, 10 pgs down
 pg 2.4e is down, acting [60,6,120]
 pg 2.60 is down, acting [105,132,15]
 pg 2.61 is down, acting [8,13,112]
 pg 2.72 is down, acting [93,112,0]
 pg 2.9f is down, acting [117,1,35]
 pg 2.b9 is down, acting [95,25,6]
 pg 2.c3 is down, acting [97,139,5]
 pg 2.c6 is down, acting [95,7,127]
 pg 2.d1 is down, acting [36,107,17]
 pg 2.f4 is down, acting [23,117,138]

Can I backup and recreate an OSD on a larger volume?
Can I remove a good pg from an offline OSD to remove some space?

Ceph-bluestore-tool repair fails.
"bluefs enospc" seems to be the critical error.

So currently my cephfs is unavailable so any help would be greatly appreciated.

Regards
Robert Ruge


Important Notice: The contents of this email are intended solely for the named 
addressee and are confidential; any unauthorised use, reproduction or storage 
of the contents is expressly prohibited. If you have received this email in 
error, please delete it and any attachments immediately and advise the sender 
by return email or telephone.

Deakin University does not warrant that this email and any attachments are 
error or virus free.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Full OSD's on cephfs_metadata pool

2020-03-19 Thread Robert Ruge
Thanks Igor. I found that thread in my mailbox a few hours into the episode and 
it saved the day. I managed to get 6 of the 8 OSD's up, which was enough to get 
the 10 missing pg's online and transitioned back onto HDD.

However, I also appear to have killed two of the OSD's, perhaps by using 
unsuitable SSDs.

There was no warning from the cluster that those OSD's were getting full, 
unless some unusual event caused them to fill overnight.

I don't have enough NVMe to support this model of operation, so I will need to 
live with HDDs for a bit longer.

Regards
Robert

From: Igor Fedotov 
Sent: Thursday, March 19, 2020 10:15:46 PM
To: Robert Ruge ; ceph-users@ceph.io 

Subject: Re: [ceph-users] Full OSD's on cephfs_metadata pool

Hi Robert,

there was a thread named "bluefs enospc" a couple day ago where Derek
shared steps to bring in a standalone DB volume and get rid of "enospc"
error.


I'm currently working on a fix which hopefully will allow to recover
from this failure but it might take some time before it lands to Nautilus.


Thanks,

Igor

On 3/19/2020 6:10 AM, Robert Ruge wrote:
> Hi All.
>
> Nautilus 14.2.8.
>
> I came in this morning to find that six of my eight NVME OSD's that were 
> housing the cephfs_metadata pool had mysteriously filled up and crashed 
> overnight and they won't come back up. These OSD's are all single logical 
> volume devices with no separate WAL or DB.
> I have tried extending the LV of one of the OSD's but it can't make use of it 
> and I have added a separate db volume but that didn't help.
> In the meantime I have told the cluster to move cephfs_metadata back to HDD 
> which it has kindly done and emptied my two live OSD's but I am left with 10 
> pgs inactive.
>
> BLUEFS_SPILLOVER BlueFS spillover detected on 6 OSD(s)
>   osd.93 spilled over 521 MiB metadata from 'db' device (26 GiB used of 
> 50 GiB) to slow device
>   osd.95 spilled over 456 MiB metadata from 'db' device (26 GiB used of 
> 50 GiB) to slow device
>   osd.100 spilled over 2.1 GiB metadata from 'db' device (26 GiB used of 
> 50 GiB) to slow device
>   osd.107 spilled over 782 MiB metadata from 'db' device (26 GiB used of 
> 50 GiB) to slow device
>   osd.112 spilled over 1.3 GiB metadata from 'db' device (27 GiB used of 
> 50 GiB) to slow device
>   osd.115 spilled over 1.4 GiB metadata from 'db' device (27 GiB used of 
> 50 GiB) to slow device
> PG_AVAILABILITY Reduced data availability: 10 pgs inactive, 10 pgs down
>  pg 2.4e is down, acting [60,6,120]
>  pg 2.60 is down, acting [105,132,15]
>  pg 2.61 is down, acting [8,13,112]
>  pg 2.72 is down, acting [93,112,0]
>  pg 2.9f is down, acting [117,1,35]
>  pg 2.b9 is down, acting [95,25,6]
>  pg 2.c3 is down, acting [97,139,5]
>  pg 2.c6 is down, acting [95,7,127]
>  pg 2.d1 is down, acting [36,107,17]
>  pg 2.f4 is down, acting [23,117,138]
>
> Can I backup and recreate an OSD on a larger volume?
> Can I remove a good pg from an offline OSD to remove some space?
>
> Ceph-bluestore-tool repair fails.
> "bluefs enospc" seems to be the critical error.
>
> So currently my cephfs is unavailable so any help would be greatly 
> appreciated.
>
> Regards
> Robert Ruge
>
>
> Important Notice: The contents of this email are intended solely for the 
> named addressee and are confidential; any unauthorised use, reproduction or 
> storage of the contents is expressly prohibited. If you have received this 
> email in error, please delete it and any attachments immediately and advise 
> the sender by return email or telephone.
>
> Deakin University does not warrant that this email and any attachments are 
> error or virus free.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

Important Notice: The contents of this email are intended solely for the named 
addressee and are confidential; any unauthorised use, reproduction or storage 
of the contents is expressly prohibited. If you have received this email in 
error, please delete it and any attachments immediately and advise the sender 
by return email or telephone.

Deakin University does not warrant that this email and any attachments are 
error or virus free.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: How to recover/mount mirrored rbd image for file recovery

2020-03-19 Thread Jason Dillaman
On Thu, Mar 19, 2020 at 6:19 AM Eugen Block  wrote:
>
> Hi,
>
> one workaround would be to create a protected snapshot on the primary
> image which is also mirrored, and then clone that snapshot on the
> remote site. That clone can be accessed as required.

+1. This is the correct approach. If you are using a Mimic+ cluster
(i.e. the required OSD release is >= Mimic), you can skip the protect
step.
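
In other words, on the primary only the snapshot itself is needed
(sketch, the snapshot name is just an example):

   rbd snap create cluster5-rbd/vm-114-disk-1@restore

and the clone on the backup cluster can then be created directly from
that unprotected snapshot (clone v2).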

> I'm not sure if there's a way to directly access the remote image
> since it's read-only.
>
> Regards,
> Eugen
>
>
> Zitat von Ml Ml :
>
> > Hello,
> >
> > my goal is to back up a proxmox cluster with rbd-mirror for desaster
> > recovery. Promoting/Demoting, etc.. works great.
> >
> > But how can i access a single file on the mirrored cluster? I tried:
> >
> >root@ceph01:~# rbd-nbd --read-only map cluster5-rbd/vm-114-disk-1
> > --cluster backup
> >/dev/nbd1
> >
> > But i get:
> >root@ceph01:~# fdisk -l /dev/nbd1
> >fdisk: cannot open /dev/nbd1: Input/output error
> >
> > dmesg shows stuff like:
> >[Thu Mar 19 09:29:55 2020]  nbd1: unable to read partition table
> >[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> >[Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> >
> > Here is my state:
> >
> > root@ceph01:~# rbd --cluster backup mirror pool status cluster5-rbd 
> > --verbose
> > health: OK
> > images: 3 total
> > 3 replaying
> >
> > vm-106-disk-0:
> >   global_id:   0bc18ee1-1749-4787-a45d-01c7e946ff06
> >   state:   up+replaying
> >   description: replaying, master_position=[object_number=3, tag_tid=2,
> > entry_tid=3], mirror_position=[object_number=3, tag_tid=2,
> > entry_tid=3], entries_behind_master=0
> >   last_update: 2020-03-19 09:29:17
> >
> > vm-114-disk-1:
> >   global_id:   2219ffa9-a4e0-4f89-b352-ff30b1ffe9b9
> >   state:   up+replaying
> >   description: replaying, master_position=[object_number=390,
> > tag_tid=6, entry_tid=334290], mirror_position=[object_number=382,
> > tag_tid=6, entry_tid=328526], entries_behind_master=5764
> >   last_update: 2020-03-19 09:29:17
> >
> > vm-115-disk-0:
> >   global_id:   2b0af493-14c1-4b10-b557-84928dc37dd1
> >   state:   up+replaying
> >   description: replaying, master_position=[object_number=72,
> > tag_tid=1, entry_tid=67796], mirror_position=[object_number=72,
> > tag_tid=1, entry_tid=67796], entries_behind_master=0
> >   last_update: 2020-03-19 09:29:17
> >
> > More dmesg stuff:
> > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> > [Thu Mar 19 09:29:55 2020] block nbd1: Other side returned error (30)
> > [Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
> > [Thu Mar 19 09:30:02 2020] blk_update_request: 95 callbacks suppressed
> > [Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 0
> > [Thu Mar 19 09:30:02 2020] buffer_io_error: 94 callbacks suppressed
> > [Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
> > 0, async page read
> > [Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
> > [Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 1
> > [Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
> > 1, async page read
> > [Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
> > [Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 2
> > [Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
> > 2, async page read
> > [Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
> > [Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 3
> > [Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
> > 3, async page read
> > [Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
> > [Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 4
> > [Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
> > 4, async page read
> > [Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
> > [Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 5
> > [Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
> > 5, async page read
> > [Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
> > [Thu Mar 19 09:30:02 2020] blk_update_request: I/O error, dev nbd1, sector 6
> > [Thu Mar 19 09:30:02 2020] Buffer I/O error on dev nbd1, logical block
> > 6, async page read
> > [Thu Mar 19 09:30:02 2020] block nbd1: Other side returned error (30)
> > [Thu Mar 19 09:30:02 2020

[ceph-users] Re: OSDs continuously restarting under load

2020-03-19 Thread huxia...@horebdata.cn
Hi, Igor,

thanks for the tip. dmesg does not show anything suspicious.

I will investigate whether the hardware has any problems.

best regards,

samuel





huxia...@horebdata.cn
 
From: Igor Fedotov
Sent: 2020-03-19 12:07
To: huxia...@horebdata.cn; ceph-users; ceph-users
Subject: Re: [ceph-users] OSDs continuously restarting under load
Hi, Samuel,
 
I've never seen that sort of signal in the real life:
 
2020-03-18 18:39:26.426584 201e35fdb40 -1 *** Caught signal (Bus error) **
 
 
I suppose this has some hardware roots. Have you checked dmesg output?
 
 
Just in case, here is some info on "Bus Error" signal, may be it will 
provide some insight: https://en.wikipedia.org/wiki/Bus_error
 
 
Thanks,
 
Igor
 
 
On 3/18/2020 5:06 PM, huxia...@horebdata.cn wrote:
> Hello, folks,
>
> I am trying to add a ceph node into an existing ceph cluster. Once the 
> reweight of newly-added OSD on the new node exceed 0.4 somewhere, the osd 
> becomes unresponsive and restarting, eventually go down.
>
> What could be the problem?  Any suggestion would be highly appreciated.
>
> best regards,
>
> samuel
>
> 
> root@node81:/var/log/ceph#
> root@node81:/var/log/ceph#
> root@node81:/var/log/ceph#
> root@node81:/var/log/ceph# ceph osd df
> ID CLASS  WEIGHT  REWEIGHT SIZEUSE AVAIL   %USE VAR  PGS
> 12 hybrid 1.0  1.0 3.81TiB 38.3GiB 3.77TiB 0.98 1.32 316
> 13 hybrid 1.0  1.0 3.81TiB 37.6GiB 3.77TiB 0.96 1.29 308
> 14 hybrid 1.0  1.0 3.81TiB 36.9GiB 3.77TiB 0.95 1.27 301
> 15 hybrid 1.0  1.0 3.81TiB 37.1GiB 3.77TiB 0.95 1.28 297
>   0 hybrid 1.0  1.0 3.81TiB 37.6GiB 3.77TiB 0.96 1.29 305
>   1 hybrid 1.0  1.0 3.81TiB 38.2GiB 3.77TiB 0.98 1.31 309
>   2 hybrid 1.0  1.0 3.81TiB 37.4GiB 3.77TiB 0.96 1.29 296
>   3 hybrid 1.0  1.0 3.81TiB 37.9GiB 3.77TiB 0.97 1.30 303
>   4hdd 0.2  1.0 3.42TiB 10.5GiB 3.41TiB 0.30 0.40   0
>   5hdd 0.2  1.0 3.42TiB 9.63GiB 3.41TiB 0.28 0.37  87
>   6hdd 0.2  1.0 3.42TiB 1.91GiB 3.42TiB 0.05 0.07   0
>   7hdd 0.2  1.0 3.42TiB 11.3GiB 3.41TiB 0.32 0.43  83
> 16hdd 0.3  1.0 1.79TiB 16.3GiB 1.78TiB 0.89 1.19 142
>   TOTAL 45.9TiB  351GiB 45.6TiB 0.75
>
> 
>  日志
>
> root@node81:/var/log/ceph# cat ceph-osd.6.log | grep load_pgs
> 2020-03-18 18:33:57.808747 2000b556000  0 osd.6 0 load_pgs
> 2020-03-18 18:33:57.808763 2000b556000  0 osd.6 0 load_pgs opened 0 pgs
>   -1324> 2020-03-18 18:33:57.808747 2000b556000  0 osd.6 0 load_pgs
>   -1323> 2020-03-18 18:33:57.808763 2000b556000  0 osd.6 0 load_pgs opened 0 
> pgs
> 2020-03-18 18:35:04.363341 2000327  0 osd.6 5222 load_pgs
> 2020-03-18 18:36:15.318489 2000327  0 osd.6 5222 load_pgs opened 202 pgs
>-466> 2020-03-18 18:35:04.363341 2000327  0 osd.6 5222 load_pgs
>-465> 2020-03-18 18:36:15.318489 2000327  0 osd.6 5222 load_pgs opened 
> 202 pgs
> 2020-03-18 18:36:32.367450 2000326e000  0 osd.6 5236 load_pgs
> 2020-03-18 18:37:40.747347 2000326e000  0 osd.6 5236 load_pgs opened 177 pgs
>-422> 2020-03-18 18:36:32.367450 2000326e000  0 osd.6 5236 load_pgs
>-421> 2020-03-18 18:37:40.747347 2000326e000  0 osd.6 5236 load_pgs opened 
> 177 pgs
> 2020-03-18 18:37:56.579371 2000f374000  0 osd.6 5247 load_pgs
> 2020-03-18 18:39:03.376838 2000f374000  0 osd.6 5247 load_pgs opened 170 pgs
> -67> 2020-03-18 18:37:56.579371 2000f374000  0 osd.6 5247 load_pgs
> -66> 2020-03-18 18:39:03.376838 2000f374000  0 osd.6 5247 load_pgs opened 
> 170 pgs
>
>
> 2020-03-18 18:39:09.483868 201df5fdb40  0 0x201c4c90c90 4.22d unexpected need 
> for 4:b47f2043:::rbd_data.8a738625558ec.56a3:head have 3291'557 
> flags = none tried to add 3291'557 flags = none
> 2020-03-18 18:39:09.483882 201df5fdb40  0 0x201c4c90c90 4.22d unexpected need 
> for 4:b47f2a18:::rbd_data.9177446e87ccd.10f8:head have 4738'731 
> flags = none tried to add 4738'731 flags = none
> 2020-03-18 18:39:09.483896 201df5fdb40  0 0x201c4c90c90 4.22d unexpected need 
> for 4:b47fc7a4:::rbd_data.58f426b8b4567.0221:head have 1789'169 
> flags = delete tried to add 1789'169 flags = delete
> 2020-03-18 18:39:20.985370 2000fc61b40  0 -- 192.168.230.122:6806/1159687 >> 
> 192.168.230.11:0/3129700933 conn(0x200140cb3f0 :6806 
> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg: 
> challenging authorizer
> 2020-03-18 18:39:21.495101 2000ec1fb40  0 -- 192.168.230.122:6806/1159687 >> 
> 192.168.230.12:0/4111063261 conn(0x200140c55a0 :6806 
> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_msg: 
> challenging authorizer
> 2020-03-18 18:39:21.495101 2000fc61b40  0 -- 192.168.230.122:6806/1159687 >> 
> 192.168.230.13:0/464497787 conn(0x200140fd4b0 :6806 
> s=STATE_ACCEPTING_WAIT_CONNECT_MSG_AUTH pgs=0 cs=0 l=1).handle_connect_m

[ceph-users] Re: Full OSD's on cephfs_metadata pool

2020-03-19 Thread Derek Yarnell
Hi Robert,

Sorry to hear that this impacted you, but I feel a bit better that I
wasn't alone.  Did you have a lot of log segments to trim on the MDSs
when you recovered?  I would agree that this was a very odd, sudden onset
of space consumption for us.  We usually have around 600GB consumed of the
roughly 8.5TB of available NVMe space; when the issue started we were
suddenly at maximum capacity.

I could explain this if, when the MDS is behind on trimming, it lands
the log segments in the metadata pool.  If we got far enough behind,
that could simply have filled up the pool.
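
One way to keep an eye on this (rough sketch; the MDS name is a
placeholder) would be to watch the journal counters on the active MDS
and the cluster health, e.g.:

   ceph daemon mds.<name> perf dump mds_log
   ceph health detail | grep -i trim

A steadily growing number of segments, or an MDS_TRIM / "behind on
trimming" warning, would point in that direction.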

Thanks,
derek

On 3/19/20 7:50 AM, Robert Ruge wrote:
> Thanks Igor. I found that thread in my mailbox a few hours into the episode 
> and it saved the day. I managed to get 6 of the 8 OSD's up which was enough 
> to get the 10 missing pg's online and transitioned back onto hdd.
> 
> However I also appear to have killed two of the OSD's through maybe using 
> inappropriate ssd's.
> 
> There was no warning from the cluster that those OSD's  were getting full 
> unless some unusual event caused them to fill overnight.
> 
> I don't have enough nvme to support this model of operation so I will need to 
> live with hdd's for a bit longer.
> 
> Regards
> Robert
> 
> 
> Regards
> Robert
> 
> From: Igor Fedotov 
> Sent: Thursday, March 19, 2020 10:15:46 PM
> To: Robert Ruge ; ceph-users@ceph.io 
> 
> Subject: Re: [ceph-users] Full OSD's on cephfs_metadata pool
> 
> Hi Robert,
> 
> there was a thread named "bluefs enospc" a couple day ago where Derek
> shared steps to bring in a standalone DB volume and get rid of "enospc"
> error.
> 
> 
> I'm currently working on a fix which hopefully will allow to recover
> from this failure but it might take some time before it lands to Nautilus.
> 
> 
> Thanks,
> 
> Igor
> 
> On 3/19/2020 6:10 AM, Robert Ruge wrote:
>> Hi All.
>>
>> Nautilus 14.2.8.
>>
>> I came in this morning to find that six of my eight NVME OSD's that were 
>> housing the cephfs_metadata pool had mysteriously filled up and crashed 
>> overnight and they won't come back up. These OSD's are all single logical 
>> volume devices with no separate WAL or DB.
>> I have tried extending the LV of one of the OSD's but it can't make use of 
>> it and I have added a separate db volume but that didn't help.
>> In the meantime I have told the cluster to move cephfs_metadata back to HDD 
>> which it has kindly done and emptied my two live OSD's but I am left with 10 
>> pgs inactive.
>>
>> BLUEFS_SPILLOVER BlueFS spillover detected on 6 OSD(s)
>>   osd.93 spilled over 521 MiB metadata from 'db' device (26 GiB used of 
>> 50 GiB) to slow device
>>   osd.95 spilled over 456 MiB metadata from 'db' device (26 GiB used of 
>> 50 GiB) to slow device
>>   osd.100 spilled over 2.1 GiB metadata from 'db' device (26 GiB used of 
>> 50 GiB) to slow device
>>   osd.107 spilled over 782 MiB metadata from 'db' device (26 GiB used of 
>> 50 GiB) to slow device
>>   osd.112 spilled over 1.3 GiB metadata from 'db' device (27 GiB used of 
>> 50 GiB) to slow device
>>   osd.115 spilled over 1.4 GiB metadata from 'db' device (27 GiB used of 
>> 50 GiB) to slow device
>> PG_AVAILABILITY Reduced data availability: 10 pgs inactive, 10 pgs down
>>  pg 2.4e is down, acting [60,6,120]
>>  pg 2.60 is down, acting [105,132,15]
>>  pg 2.61 is down, acting [8,13,112]
>>  pg 2.72 is down, acting [93,112,0]
>>  pg 2.9f is down, acting [117,1,35]
>>  pg 2.b9 is down, acting [95,25,6]
>>  pg 2.c3 is down, acting [97,139,5]
>>  pg 2.c6 is down, acting [95,7,127]
>>  pg 2.d1 is down, acting [36,107,17]
>>  pg 2.f4 is down, acting [23,117,138]
>>
>> Can I backup and recreate an OSD on a larger volume?
>> Can I remove a good pg from an offline OSD to remove some space?
>>
>> Ceph-bluestore-tool repair fails.
>> "bluefs enospc" seems to be the critical error.
>>
>> So currently my cephfs is unavailable so any help would be greatly 
>> appreciated.
>>
>> Regards
>> Robert Ruge
>>
>>
>> Important Notice: The contents of this email are intended solely for the 
>> named addressee and are confidential; any unauthorised use, reproduction or 
>> storage of the contents is expressly prohibited. If you have received this 
>> email in error, please delete it and any attachments immediately and advise 
>> the sender by return email or telephone.
>>
>> Deakin University does not warrant that this email and any attachments are 
>> error or virus free.
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> Important Notice: The contents of this email are intended solely for the 
> named addressee and are confidential; any unauthorised use, reproduction or 
> storage of the contents is expressly prohibited. If you have received this 
> email in error, please delete it and any attachments immedi

[ceph-users] Re: MGRs failing once per day and generally slow response times

2020-03-19 Thread Janek Bevendorff
Sorry for nagging, but is there a solution to this? Routinely restarting
my MGRs every few hours isn't how I want to spend my time (although I
guess I could schedule a cron job for that).
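
(If it comes to that, something along these lines on each MGR host
would probably do -- purely a workaround sketch, the unit name assumes
a standard package install:

   # /etc/cron.d/restart-ceph-mgr
   0 */6 * * * root /usr/bin/systemctl restart ceph-mgr.target

but I'd much rather understand why they fail in the first place.)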


On 16/03/2020 09:35, Janek Bevendorff wrote:
> Over the weekend, all five MGRs failed, which means we have no more
> Prometheus monitoring data. We are obviously monitoring the MGR status
> as well, so we can detect the failure, but it's still a pretty serious
> issue. Any ideas as to why this might happen?
>
>
> On 13/03/2020 16:56, Janek Bevendorff wrote:
>> Indeed. I just had another MGR go bye-bye. I don't think host clock
>> skew is the problem.
>>
>>
>> On 13/03/2020 15:29, Anthony D'Atri wrote:
>>> Chrony does converge faster, but I doubt this will solve your
>>> problem if you don’t have quality peers. Or if it’s not really a
>>> time problem.
>>>
 On Mar 13, 2020, at 6:44 AM, Janek Bevendorff
  wrote:

 I replaced ntpd with chronyd and will let you know if it changes
 anything. Thanks.


> On 13/03/2020 06:25, Konstantin Shalygin wrote:
>> On 3/13/20 12:57 AM, Janek Bevendorff wrote:
>> NTPd is running, all the nodes have the same time to the second.
>> I don't think that is the problem.
> As always in such cases - try to switch your ntpd to default EL7
> daemon - chronyd.
>
>
>
> k
 ___
 ceph-users mailing list -- ceph-users@ceph.io
 To unsubscribe send an email to ceph-users-le...@ceph.io
>>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Full OSD's on cephfs_metadata pool

2020-03-19 Thread Robert Ruge
Derek, you are my champion. Your instructions were spot on and so timely. Thank 
you so much for posting them for those who follow in your footsteps.

How do I tell if my MDS was behind on log trimming? I didn't see any health 
messages to that effect.

I think my NVMe OSD's were too small for this task, as I was using spare 
capacity on the Optane drives, which meant each OSD was only 198GB (x 8 OSD's) 
and didn't leave much room for unexpected occurrences. However, my current 
ceph df shows cephfs_metadata sitting at 148GiB stored x 3, so it should have 
easily fitted.

May your cephfs live long and prosper.

Regards
Robert Ruge

-Original Message-
From: Derek Yarnell 
Sent: Friday, 20 March 2020 12:14 AM
To: Robert Ruge ; ceph-users@ceph.io; Igor Fedotov 

Subject: Re: [ceph-users] Re: Full OSD's on cephfs_metadata pool

Hi Robert,

Sorry to hear that this impacted you but I feel a bit better that I wasn't 
alone.  Did you have a lot of log segments to trim on the MDSs when you 
recovered?  I would agree that this was a very odd sudden onset of space 
consumption for us.  We have usually like 600GB consumed of around 8.5TB 
available NVMe space until the issue started and we were at maximum capacity 
all the sudden.

I could explain this if I understood that when the MDS is behind on trimming it 
lands the log segments in the metadata pool.  If we got so far behind it could 
have just filled up the pool.

Thanks,
derek

On 3/19/20 7:50 AM, Robert Ruge wrote:
> Thanks Igor. I found that thread in my mailbox a few hours into the episode 
> and it saved the day. I managed to get 6 of the 8 OSD's up which was enough 
> to get the 10 missing pg's online and transitioned back onto hdd.
>
> However I also appear to have killed two of the OSD's through maybe using 
> inappropriate ssd's.
>
> There was no warning from the cluster that those OSD's  were getting full 
> unless some unusual event caused them to fill overnight.
>
> I don't have enough nvme to support this model of operation so I will need to 
> live with hdd's for a bit longer.
>
> Regards
> Robert
>
>
> Regards
> Robert
> 
> From: Igor Fedotov 
> Sent: Thursday, March 19, 2020 10:15:46 PM
> To: Robert Ruge ; ceph-users@ceph.io
> 
> Subject: Re: [ceph-users] Full OSD's on cephfs_metadata pool
>
> Hi Robert,
>
> there was a thread named "bluefs enospc" a couple day ago where Derek
> shared steps to bring in a standalone DB volume and get rid of "enospc"
> error.
>
>
> I'm currently working on a fix which hopefully will allow to recover
> from this failure but it might take some time before it lands to Nautilus.
>
>
> Thanks,
>
> Igor
>
> On 3/19/2020 6:10 AM, Robert Ruge wrote:
>> Hi All.
>>
>> Nautilus 14.2.8.
>>
>> I came in this morning to find that six of my eight NVME OSD's that were 
>> housing the cephfs_metadata pool had mysteriously filled up and crashed 
>> overnight and they won't come back up. These OSD's are all single logical 
>> volume devices with no separate WAL or DB.
>> I have tried extending the LV of one of the OSD's but it can't make use of 
>> it and I have added a separate db volume but that didn't help.
>> In the meantime I have told the cluster to move cephfs_metadata back to HDD 
>> which it has kindly done and emptied my two live OSD's but I am left with 10 
>> pgs inactive.
>>
>> BLUEFS_SPILLOVER BlueFS spillover detected on 6 OSD(s)
>>   osd.93 spilled over 521 MiB metadata from 'db' device (26 GiB used of 
>> 50 GiB) to slow device
>>   osd.95 spilled over 456 MiB metadata from 'db' device (26 GiB used of 
>> 50 GiB) to slow device
>>   osd.100 spilled over 2.1 GiB metadata from 'db' device (26 GiB used of 
>> 50 GiB) to slow device
>>   osd.107 spilled over 782 MiB metadata from 'db' device (26 GiB used of 
>> 50 GiB) to slow device
>>   osd.112 spilled over 1.3 GiB metadata from 'db' device (27 GiB used of 
>> 50 GiB) to slow device
>>   osd.115 spilled over 1.4 GiB metadata from 'db' device (27 GiB
>> used of 50 GiB) to slow device PG_AVAILABILITY Reduced data availability: 10 
>> pgs inactive, 10 pgs down
>>  pg 2.4e is down, acting [60,6,120]
>>  pg 2.60 is down, acting [105,132,15]
>>  pg 2.61 is down, acting [8,13,112]
>>  pg 2.72 is down, acting [93,112,0]
>>  pg 2.9f is down, acting [117,1,35]
>>  pg 2.b9 is down, acting [95,25,6]
>>  pg 2.c3 is down, acting [97,139,5]
>>  pg 2.c6 is down, acting [95,7,127]
>>  pg 2.d1 is down, acting [36,107,17]
>>  pg 2.f4 is down, acting [23,117,138]
>>
>> Can I backup and recreate an OSD on a larger volume?
>> Can I remove a good pg from an offline OSD to remove some space?
>>
>> Ceph-bluestore-tool repair fails.
>> "bluefs enospc" seems to be the critical error.
>>
>> So currently my cephfs is unavailable so any help would be greatly 
>> appreciated.
>>
>> Regards
>> Robert Ruge
>>
>>
>> Important Notice: The contents of this email are intended solely for the 
>> named addre