Hi Robert,

There was a thread named "bluefs enospc" a couple of days ago where Derek shared steps to bring in a standalone DB volume and get rid of the "enospc" error.
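
Roughly, the idea is to create a new LV and attach it to the OSD as a dedicated
block.db device with ceph-bluestore-tool. A minimal sketch only (not the exact
steps from that thread); the VG/LV names, size and OSD id are placeholders:

    # Sketch only: VG/LV names, size and OSD id are placeholders.
    # 1. Create an LV to hold the new DB.
    lvcreate -n osd-93-db -L 50G cephdb-vg

    # 2. With osd.93 stopped, attach the LV as a dedicated block.db device.
    ceph-bluestore-tool bluefs-bdev-new-db \
        --path /var/lib/ceph/osd/ceph-93 \
        --dev-target /dev/cephdb-vg/osd-93-db

    # 3. Check the label written to the new device.
    ceph-bluestore-tool show-label --dev /dev/cephdb-vg/osd-93-db

    # Note: for ceph-volume to activate the OSD with the new DB after a reboot,
    # the LVM tags on the OSD's volumes (ceph.db_device / ceph.db_uuid) may also
    # need updating to point at the new LV.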


I'm currently working on a fix which will hopefully allow recovery from this failure, but it might take some time before it lands in Nautilus.


Thanks,

Igor

On 3/19/2020 6:10 AM, Robert Ruge wrote:
Hi All.

Nautilus 14.2.8.

I came in this morning to find that six of my eight NVMe OSDs housing the
cephfs_metadata pool had mysteriously filled up and crashed overnight, and they
won't come back up. These OSDs are each a single logical volume with no
separate WAL or DB.
I have tried extending the LV of one of the OSDs, but it can't make use of the
extra space, and I have added a separate DB volume, but that didn't help.
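
A rough sketch of the usual LV-extend sequence (VG/LV name, size and OSD id are
placeholders); BlueStore only picks up the extra space after the explicit
BlueFS expand step:

    # Sketch only: VG/LV name, size and OSD id are placeholders.
    # Grow the logical volume backing the OSD's block device...
    lvextend -L +20G /dev/ceph-nvme-vg/osd-block-93

    # ...then, with the OSD stopped, let BlueStore/BlueFS see the new size.
    ceph-bluestore-tool bluefs-bdev-expand --path /var/lib/ceph/osd/ceph-93
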
In the meantime I have told the cluster to move cephfs_metadata back to HDD,
which it has kindly done and emptied my two live OSDs, but I am left with 10
pgs inactive.

BLUEFS_SPILLOVER BlueFS spillover detected on 6 OSD(s)
      osd.93 spilled over 521 MiB metadata from 'db' device (26 GiB used of 50 GiB) to slow device
      osd.95 spilled over 456 MiB metadata from 'db' device (26 GiB used of 50 GiB) to slow device
      osd.100 spilled over 2.1 GiB metadata from 'db' device (26 GiB used of 50 GiB) to slow device
      osd.107 spilled over 782 MiB metadata from 'db' device (26 GiB used of 50 GiB) to slow device
      osd.112 spilled over 1.3 GiB metadata from 'db' device (27 GiB used of 50 GiB) to slow device
      osd.115 spilled over 1.4 GiB metadata from 'db' device (27 GiB used of 50 GiB) to slow device
PG_AVAILABILITY Reduced data availability: 10 pgs inactive, 10 pgs down
     pg 2.4e is down, acting [60,6,120]
     pg 2.60 is down, acting [105,132,15]
     pg 2.61 is down, acting [8,13,112]
     pg 2.72 is down, acting [93,112,0]
     pg 2.9f is down, acting [117,1,35]
     pg 2.b9 is down, acting [95,25,6]
     pg 2.c3 is down, acting [97,139,5]
     pg 2.c6 is down, acting [95,7,127]
     pg 2.d1 is down, acting [36,107,17]
     pg 2.f4 is down, acting [23,117,138]

Can I back up and recreate an OSD on a larger volume?
Can I remove a good PG from an offline OSD to free some space?
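
For reference, this is the kind of thing ceph-objectstore-tool can do when a
stopped OSD can still be opened offline; a sketch, with the OSD id, PG id and
file path as placeholders (an OSD whose BlueFS is completely full may refuse to
open at all):

    # Sketch only: OSD id, PG id and file path are placeholders.
    systemctl stop ceph-osd@95

    # Export the PG to a file...
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-95 \
        --pgid 2.b9 --op export --file /root/pg-2.b9.export

    # ...and, once the export is safely stored, remove it from this OSD.
    ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-95 \
        --pgid 2.b9 --op remove --force

    # If needed later, the PG can be imported into another (stopped) OSD:
    # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-NN \
    #     --op import --file /root/pg-2.b9.export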

ceph-bluestore-tool repair fails.
"bluefs enospc" seems to be the critical error.

My CephFS is currently unavailable, so any help would be greatly appreciated.

Regards
Robert Ruge


_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io