[ceph-users] MGR data on md RAID 1 or not

2022-02-23 Thread Roel van Meer

Hi list!

We've got a Ceph cluster where the OS of the Ceph nodes lives on a set of 
SSD disks in mdadm RAID 1. We were wondering if there are any (performance)  
benefits of moving the MGR data away from this RAID 1 and onto a dedicated  
non-RAID SSD partition. The drawback would be reduced protection against OS  
SSD failure of course, but would it also have any performance benefits?


Best regards,

Roel

--
Wij zijn ISO 27001 gecertificeerd

1A First Alternative BV
T: +31 (0)88 0016405
W: https://www.1afa.com
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] OSD SLOW_OPS is filling MONs disk space

2022-02-23 Thread Gaël THEROND
Hi everyone, I've been having a really nasty issue for around two days now, where
our cluster reports a bunch of SLOW_OPS on one of our OSDs, as shown here:

https://paste.openstack.org/show/b3DkgnJDVx05vL5o4OmY/

Here is the cluster specification:
  * Used to store Openstack-related data (VMs/Snapshots/Volumes/Swift).
  * Based on CEPH Nautilus 14.2.8 installed using ceph-ansible.
  * Use an EC based storage profile.
  * We have a separate and dedicated frontend and backend 10Gbps network.
  * We don't have any network issues observed or reported by our monitoring
system.

Here is our current cluster status:
https://paste.openstack.org/show/biVnkm9Yyog3lmSUn0UK/
Here is a detailed view of our cluster status:
https://paste.openstack.org/show/bgKCSVuow0JUZITo2Ndj/

My main issue here is that this health alert is starting to fill the
Monitor's disk and so triggers a MON_DISK_BIG alert.

I'm worried, as I'm having a hard time identifying which OSD operation is
actually slow and, especially, which image it concerns and which client
is using it.

So far I've tried:
  * To match this client ID with any watcher of our stored
volumes/VMs/snapshots by extracting the whole list and then using the
following command: rbd status <pool>/<image>
    Unfortunately, none of the watchers matches the client reported by the
OSD on any pool.

  * To map this reported chunk of data to any of our stored images using:
ceph osd map <pool> rbd_data.5.89a4a940aba90b.00a0
    Unfortunately, every pool name existing within our cluster gives me back
an answer with no image information and a different watcher client ID.

So my questions are:

How can I identify which operation this OSD is trying to achieve, as the
osd_op() line is a bit large ^^ ?
Does the snapc information within the log relate to snapshots, or is
that something totally different?
How can I identify the images related to this data chunk?
Is there official documentation about SLOW_OPS operations explaining
how to read these logs, e.g. which block is the PG number, which one is
the ID of something, etc.?

Thanks a lot everyone and feel free to ask for additional information!
G.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD SLOW_OPS is filling MONs disk space

2022-02-23 Thread Eugen Block

Hi,


How can I identify which operation this OSD is trying to achieve as
osd_op() is a bit large ^^ ?


I would start by querying the OSD for historic_slow_ops:

ceph daemon osd.<ID> dump_historic_slow_ops to see which operation it is.
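
For example, something like this (a sketch, assuming jq is available on the
OSD host and <ID> is the OSD from the health warning) prints just the op
descriptions and how long they have been pending:

ceph daemon osd.<ID> dump_historic_slow_ops | jq '.ops[] | {description, duration}'

The description field contains the full osd_op() line, including the object
name and the PG.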


How can I identify the related images to this data chunk?


You could go through all rbd images and check for the line containing
block_name_prefix; this could take some time depending on how many
images you have:


block_name_prefix: rbd_data.ca69416b8b4567

I sometimes do that with this for loop:

for i in `rbd -p <pool> ls`; do if [ $(rbd info <pool>/$i | grep -c <block_name_prefix>) -gt 0 ]; then echo "image: $i"; break; fi; done


So in your case it would look something like this:

for i in `rbd -p <pool> ls`; do if [ $(rbd info <pool>/$i | grep -c 89a4a940aba90b) -gt 0 ]; then echo "image: $i"; break; fi; done


To see which clients are connected you can check the mon daemon:

ceph daemon mon.<ID> sessions

The mon daemon also has a history of slow ops:

ceph daemon mon.<ID> dump_historic_slow_ops

Regards,
Eugen


Quoting Gaël THEROND:


Hi everyone, I'm having a really nasty issue since around two days where
our cluster report a bunch of SLOW_OPS on one of our OSD as:

https://paste.openstack.org/show/b3DkgnJDVx05vL5o4OmY/

Here is the cluster specification:
  * Used to store Openstack related data (VMs/Snaphots/Volumes/Swift).
  * Based on CEPH Nautilus 14.2.8 installed using ceph-ansible.
  * Use an EC based storage profile.
  * We have a separate and dedicated frontend and backend 10Gbps network.
  * We don't have any network issues observed or reported by our monitoring
system.

Here is our current cluster status:
https://paste.openstack.org/show/biVnkm9Yyog3lmSUn0UK/
Here is a detailed view of our cluster status:
https://paste.openstack.org/show/bgKCSVuow0JUZITo2Ndj/

My main issue here is that this health alert is starting to fill the
Monitor's disk and so trigger a MON_DISK_BIG alert.

I'm worried as I'm having a hard time to identify which osd operation is
actually slow and especially, which image does it concern and which client
is using it.

So far I've try:
  * To match this client ID with any watcher of our stored
volumes/vms/snaphots by extracting the whole list and then using the
following command: *rbd status /*
 Unfortunately none of the watchers is matching my reported client from
the OSD on any pool.

*  * *To map this reported chunk of data to any of our store image
using:  *ceph
osd map /rbd_data.5.89a4a940aba90b.00a0*
 Unfortunately any pool name existing within our cluster give me back
an answer with no image information and a different watcher client ID.

So my questions are:

How can I identify which operation this OSD is trying to achieve as
osd_op() is a bit large ^^ ?
Does the *snapc *information part within the log relate to snapshot or is
that something totally different?
How can I identify the related images to this data chunk?
Is there official documentation about SLOW_OPS operations code explaining
how to read the logs like something that explains which block is PG
number, which is the ID of something etc?

Thanks a lot everyone and feel free to ask for additional information!
G.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unclear on metadata config for new Pacific cluster

2022-02-23 Thread Eugen Block

Hi,

if you want to have DB and WAL on the same device, just don't specify  
WAL in your drivegroup. It will be automatically created on the DB  
device, too. In your case the rotational flag should be enough to  
distinguish between data and DB.
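
A minimal spec along those lines could look like this (just a sketch based on
the spec quoted below from your mail, with db_devices matched by the NVMe
model and no wal_devices section, so the WAL is colocated with the DB on the
NVMe):

spec:
  data_devices:
    rotational: true
  db_devices:
    model: SSDPE2KE032T8L
  filter_logic: AND
  objectstore: bluestore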



based on the suggestion in the docs that this would be sufficient for both
DB and WAL (
https://docs.ceph.com/en/pacific/cephadm/services/osd/#the-simple-case)
ended up with metadata on the HDD data disks, as demonstrated by quite a
lot of space being consumed even with no actual data.


How exactly did you determine that there was actual WAL data on the HDDs?
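
One way to check where DB and WAL actually live (a sketch, assuming you can
reach the OSD's admin socket, e.g. from within a cephadm shell; <id> is a
placeholder for the OSD id):

ceph osd metadata <id> | grep -E 'bluefs|bluestore_bdev'
ceph daemon osd.<id> perf dump bluefs | grep -E 'db_used_bytes|wal_used_bytes|slow_used_bytes'

If BlueFS reports a growing slow_used_bytes, DB/WAL data is spilling over to
the slow (HDD) device.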


Quoting Adam Huffman:


Hello

We have a new Pacific cluster configured via Cephadm.

For the OSDs, the spec is like this, with the intention for DB and WAL to
be on NVMe:

spec:

  data_devices:

rotational: true

  db_devices:

model: SSDPE2KE032T8L

  filter_logic: AND

  objectstore: bluestore

  wal_devices:

model: SSDPE2KE032T8L

This was after an initial attempt like this:

spec:

  data_devices:

rotational: 1

  db_devices:

rotational: 0

based on the suggestion in the docs that this would be sufficient for both
DB and WAL (
https://docs.ceph.com/en/pacific/cephadm/services/osd/#the-simple-case)
ended up with metadata on the HDD data disks, as demonstrated by quite a
lot of space being consumed even with no actual data.

With the new spec, the usage looks more normal. However, it's not clear
whether both DB and WAL are in fact on the faster devices as desired.

Here's an excerpt from one of the new OSDs:

{

"id": 107,

"arch": "x86_64",

"back_iface": "",

"bluefs": "1",

"bluefs_dedicated_db": "0",

"bluefs_dedicated_wal": "1",

"bluefs_single_shared_device": "0",

"bluefs_wal_access_mode": "blk",

"bluefs_wal_block_size": "4096",

"bluefs_wal_dev_node": "/dev/dm-40",

"bluefs_wal_devices": "nvme0n1",

"bluefs_wal_driver": "KernelDevice",

"bluefs_wal_partition_path": "/dev/dm-40",

"bluefs_wal_rotational": "0",

"bluefs_wal_size": "355622453248",

"bluefs_wal_support_discard": "1",

"bluefs_wal_type": "ssd",

"bluestore_bdev_access_mode": "blk",

"bluestore_bdev_block_size": "4096",

"bluestore_bdev_dev_node": "/dev/dm-39",

"bluestore_bdev_devices": "sdr",

"bluestore_bdev_driver": "KernelDevice",

"bluestore_bdev_partition_path": "/dev/dm-39",

"bluestore_bdev_rotational": "1",

"bluestore_bdev_size": "8001561821184",

"bluestore_bdev_support_discard": "0",

"bluestore_bdev_type": "hdd",

"ceph_release": "pacific",

"ceph_version": "ceph version 16.2.7
(dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)",

"ceph_version_short": "16.2.7",

   8<>8

"container_image": "
quay.io/ceph/ceph@sha256:a39107f8d3daab4d756eabd6ee1630d1bc7f31eaa76fff41a77fa32d0b903061
",

"cpu": "AMD EPYC 7352 24-Core Processor",

"default_device_class": "hdd",

"device_ids":
"nvme0n1=SSDPE2KE032T8L_PHLN0195002R3P2BGN,sdr=LENOVO_ST8000NM010A_EX_WKD2CHZLE02930J6",

"device_paths":
"nvme0n1=/dev/disk/by-path/pci-:c1:00.0-nvme-1,sdr=/dev/disk/by-path/pci-:41:00.0-scsi-0:0:41:0",

"devices": "nvme0n1,sdr",

"distro": "centos",

"distro_description": "CentOS Stream 8",

"distro_version": "8",

8<   >8

"journal_rotational": "0",

"kernel_description": "#1 SMP Thu Feb 10 16:11:23 UTC 2022",

"kernel_version": "4.18.0-365.el8.x86_64",

"mem_swap_kb": "4194300",

"mem_total_kb": "131583928",

"network_numa_unknown_ifaces": "back_iface,front_iface",

"objectstore_numa_nodes": "0",

"objectstore_numa_unknown_devices": "sdr",

"os": "Linux",

"osd_data": "/var/lib/ceph/osd/ceph-107",

"osd_objectstore": "bluestore",

"osdspec_affinity": "dashboard-admin-1645460246886",

"rotational": "1"

}

Note:

   "bluefs_dedicated_db": "0",

   "bluefs_dedicated_wal": "1",

   "bluefs_single_shared_device": "0",

On one of our Nautilus clusters, we have:

"bluefs_single_shared_device": "1",

and the same on an Octopus cluster.

I've heard of the WAL being hosted in the DB, but not the other way
around...

Best Wishes,
Adam
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD SLOW_OPS is filling MONs disk space

2022-02-23 Thread Gaël THEROND
Thanks a lot Eugen, I dumbly forgot about the rbd block prefix!

I'll try that this afternoon and tell you how it went.

On Wed, 23 Feb 2022 at 11:41, Eugen Block wrote:

> Hi,
>
> > How can I identify which operation this OSD is trying to achieve as
> > osd_op() is a bit large ^^ ?
>
> I would start by querying the OSD for historic_slow_ops:
>
> ceph daemon osd. dump_historic_slow_ops to see which operation it is.
>
> > How can I identify the related images to this data chunk?
>
> You could go through all rbd images and check for the line containing
> block_name_prefix, this could take some time depending on how many
> images you have:
>
>  block_name_prefix: rbd_data.ca69416b8b4567
>
> I sometimes do that with this for loop:
>
> for i in `rbd -p  ls`; do if [ $(rbd info /$i | grep -c
> ) -gt 0 ]; then echo "image: $i"; break; fi; done
>
> So in your case it would look something like this:
>
> for i in `rbd -p  ls`; do if [ $(rbd info /$i | grep -c
> 89a4a940aba90b -gt 0 ]; then echo "image: $i"; break; fi; done
>
> To see which clients are connected you can check the mon daemon:
>
> ceph daemon mon. sessions
>
> The mon daemon also has a history of slow ops:
>
> ceph daemon mon. dump_historic_slow_ops
>
> Regards,
> Eugen
>
>
> Zitat von Gaël THEROND :
>
> > Hi everyone, I'm having a really nasty issue since around two days where
> > our cluster report a bunch of SLOW_OPS on one of our OSD as:
> >
> > https://paste.openstack.org/show/b3DkgnJDVx05vL5o4OmY/
> >
> > Here is the cluster specification:
> >   * Used to store Openstack related data (VMs/Snaphots/Volumes/Swift).
> >   * Based on CEPH Nautilus 14.2.8 installed using ceph-ansible.
> >   * Use an EC based storage profile.
> >   * We have a separate and dedicated frontend and backend 10Gbps network.
> >   * We don't have any network issues observed or reported by our
> monitoring
> > system.
> >
> > Here is our current cluster status:
> > https://paste.openstack.org/show/biVnkm9Yyog3lmSUn0UK/
> > Here is a detailed view of our cluster status:
> > https://paste.openstack.org/show/bgKCSVuow0JUZITo2Ndj/
> >
> > My main issue here is that this health alert is starting to fill the
> > Monitor's disk and so trigger a MON_DISK_BIG alert.
> >
> > I'm worried as I'm having a hard time to identify which osd operation is
> > actually slow and especially, which image does it concern and which
> client
> > is using it.
> >
> > So far I've try:
> >   * To match this client ID with any watcher of our stored
> > volumes/vms/snaphots by extracting the whole list and then using the
> > following command: *rbd status /*
> >  Unfortunately none of the watchers is matching my reported client
> from
> > the OSD on any pool.
> >
> > *  * *To map this reported chunk of data to any of our store image
> > using:  *ceph
> > osd map /rbd_data.5.89a4a940aba90b.00a0*
> >  Unfortunately any pool name existing within our cluster give me back
> > an answer with no image information and a different watcher client ID.
> >
> > So my questions are:
> >
> > How can I identify which operation this OSD is trying to achieve as
> > osd_op() is a bit large ^^ ?
> > Does the *snapc *information part within the log relate to snapshot or is
> > that something totally different?
> > How can I identify the related images to this data chunk?
> > Is there official documentation about SLOW_OPS operations code explaining
> > how to read the logs like something that explains which block is PG
> > number, which is the ID of something etc?
> >
> > Thanks a lot everyone and feel free to ask for additional information!
> > G.
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Unclear on metadata config for new Pacific cluster

2022-02-23 Thread Adam Huffman
On Wed, 23 Feb 2022 at 11:25, Eugen Block  wrote:

> Hi,
>
> if you want to have DB and WAL on the same device, just don't specify
> WAL in your drivegroup. It will be automatically created on the DB
> device, too. In your case the rotational flag should be enough to
> distinguish between data and DB.
>
> > based on the suggestion in the docs that this would be sufficient for
> both
> > DB and WAL (
> > https://docs.ceph.com/en/pacific/cephadm/services/osd/#the-simple-case)
> > ended up with metadata on the HDD data disks, as demonstrated by quite a
> > lot of space being consumed even with no actual data.
>
> How exactly did you determine that there was actual WAL data on the HDDs?
>
>
>
I couldn't say exactly what it was, but 7 or so TB were in use, even with
no user data at all.

With the latest iteration, there were just a few GB in use immediately
after creation.



> Zitat von Adam Huffman :
>
> > Hello
> >
> > We have a new Pacific cluster configured via Cephadm.
> >
> > For the OSDs, the spec is like this, with the intention for DB and WAL to
> > be on NVMe:
> >
> > spec:
> >
> >   data_devices:
> >
> > rotational: true
> >
> >   db_devices:
> >
> > model: SSDPE2KE032T8L
> >
> >   filter_logic: AND
> >
> >   objectstore: bluestore
> >
> >   wal_devices:
> >
> > model: SSDPE2KE032T8L
> >
> > This was after an initial attempt like this:
> >
> > spec:
> >
> >   data_devices:
> >
> > rotational: 1
> >
> >   db_devices:
> >
> > rotational: 0
> >
> > based on the suggestion in the docs that this would be sufficient for
> both
> > DB and WAL (
> > https://docs.ceph.com/en/pacific/cephadm/services/osd/#the-simple-case)
> > ended up with metadata on the HDD data disks, as demonstrated by quite a
> > lot of space being consumed even with no actual data.
> >
> > With the new spec, the usage looks more normal. However, it's not clear
> > whether both DB and WAL are in fact on the faster devices as desired.
> >
> > Here's an except of one of the new OSDs:
> >
> > {
> >
> > "id": 107,
> >
> > "arch": "x86_64",
> >
> > "back_iface": "",
> >
> > "bluefs": "1",
> >
> > "bluefs_dedicated_db": "0",
> >
> > "bluefs_dedicated_wal": "1",
> >
> > "bluefs_single_shared_device": "0",
> >
> > "bluefs_wal_access_mode": "blk",
> >
> > "bluefs_wal_block_size": "4096",
> >
> > "bluefs_wal_dev_node": "/dev/dm-40",
> >
> > "bluefs_wal_devices": "nvme0n1",
> >
> > "bluefs_wal_driver": "KernelDevice",
> >
> > "bluefs_wal_partition_path": "/dev/dm-40",
> >
> > "bluefs_wal_rotational": "0",
> >
> > "bluefs_wal_size": "355622453248",
> >
> > "bluefs_wal_support_discard": "1",
> >
> > "bluefs_wal_type": "ssd",
> >
> > "bluestore_bdev_access_mode": "blk",
> >
> > "bluestore_bdev_block_size": "4096",
> >
> > "bluestore_bdev_dev_node": "/dev/dm-39",
> >
> > "bluestore_bdev_devices": "sdr",
> >
> > "bluestore_bdev_driver": "KernelDevice",
> >
> > "bluestore_bdev_partition_path": "/dev/dm-39",
> >
> > "bluestore_bdev_rotational": "1",
> >
> > "bluestore_bdev_size": "8001561821184",
> >
> > "bluestore_bdev_support_discard": "0",
> >
> > "bluestore_bdev_type": "hdd",
> >
> > "ceph_release": "pacific",
> >
> > "ceph_version": "ceph version 16.2.7
> > (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable)",
> >
> > "ceph_version_short": "16.2.7",
> >
> >8<>8
> >
> > "container_image": "
> >
> quay.io/ceph/ceph@sha256:a39107f8d3daab4d756eabd6ee1630d1bc7f31eaa76fff41a77fa32d0b903061
> > ",
> >
> > "cpu": "AMD EPYC 7352 24-Core Processor",
> >
> > "default_device_class": "hdd",
> >
> > "device_ids":
> >
> "nvme0n1=SSDPE2KE032T8L_PHLN0195002R3P2BGN,sdr=LENOVO_ST8000NM010A_EX_WKD2CHZLE02930J6",
> >
> > "device_paths":
> >
> "nvme0n1=/dev/disk/by-path/pci-:c1:00.0-nvme-1,sdr=/dev/disk/by-path/pci-:41:00.0-scsi-0:0:41:0",
> >
> > "devices": "nvme0n1,sdr",
> >
> > "distro": "centos",
> >
> > "distro_description": "CentOS Stream 8",
> >
> > "distro_version": "8",
> >
> > 8<   >8
> >
> > "journal_rotational": "0",
> >
> > "kernel_description": "#1 SMP Thu Feb 10 16:11:23 UTC 2022",
> >
> > "kernel_version": "4.18.0-365.el8.x86_64",
> >
> > "mem_swap_kb": "4194300",
> >
> > "mem_total_kb": "131583928",
> >
> > "network_numa_unknown_ifaces": "back_iface,front_iface",
> >
> > "objectstore_numa_nodes": "0",
> >
> > "objectstore_numa_unknown_devices": "sdr",
> >
> > "os": "Linux",
> >
> > "osd_data": "/var/lib/ceph/osd/ceph-107",
> >
> > "osd_objectstore": "bluestore",
> >
> > "osdspec_affinity": "dashboa

[ceph-users] Re: MDS crash due to seemingly unrecoverable metadata error

2022-02-23 Thread Xiubo Li
Have you tried to back up and then remove the 'mds%d_openfiles.%x' object
to see whether you can then start the MDS?
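
For example (a sketch, assuming MDS rank 0 and the metadata pool name
cephfs_meta from your earlier mail, so the first such object would be
mds0_openfiles.0; adjust both to your setup):

rados -p cephfs_meta get mds0_openfiles.0 /root/mds0_openfiles.0.backup
rados -p cephfs_meta rm mds0_openfiles.0

The open file table is only an optimization and should be rewritten by the
MDS; the backup is there in case you want to restore it.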


Thanks.


On 2/23/22 7:07 PM, Wolfgang Mair wrote:
Update: I managed to clear the inode errors by deleting the parent 
directory entry from the metadata pool. However the MDS still refuses 
to start, which makes me wonder if the error killing it had to do with 
the inode issue in the first place.


Can anyone make sense of this error and point me where to investigate 
further?


Feb 23 11:44:54 herta ceph-mds[3384124]: -1> 
2022-02-23T11:44:54.195+0100 7f0502a33700 -1 ./src/mds/CInode.cc: In 
function 'CDir* CInode::get_or_open_dirfrag(MDCache*, frag_t)' thread 
7f0502a33700 time 2022-02-23T11:44:54.196476+0100
Feb 23 11:44:54 herta ceph-mds[3384124]: ./src/mds/CInode.cc: 785: 
FAILED ceph_assert(is_dir())
Feb 23 11:44:54 herta ceph-mds[3384124]: ceph version 16.2.7 
(f9aa029788115b5df5328f584156565ee5b7) pacific (stable)
Feb 23 11:44:54 herta ceph-mds[3384124]: 1: 
(ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x124) [0x7f050df11046]
Feb 23 11:44:54 herta ceph-mds[3384124]: 2: 
/usr/lib/ceph/libceph-common.so.2(+0x2511d1) [0x7f050df111d1]
Feb 23 11:44:54 herta ceph-mds[3384124]: 3: 
(CInode::get_or_open_dirfrag(MDCache*, frag_t)+0x105) [0x5605fd69e365]
Feb 23 11:44:54 herta ceph-mds[3384124]: 4: 
(OpenFileTable::_prefetch_dirfrags()+0x2ad) [0x5605fd74645d]
Feb 23 11:44:54 herta ceph-mds[3384124]: 5: 
(MDSContext::complete(int)+0x50) [0x5605fd717980]
Feb 23 11:44:54 herta ceph-mds[3384124]: 6: (void 
finish_contexts > 
>(ceph::common::CephContext*, std::vectorstd::allocator >&, int)+0x98) [0x5605fd3edd58]
Feb 23 11:44:54 herta ceph-mds[3384124]: 7: 
(MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&, 
int)+0x138) [0x5605fd53bfc8]
Feb 23 11:44:54 herta ceph-mds[3384124]: 8: 
(MDCache::_open_ino_backtrace_fetched(inodeno_t, 
ceph::buffer::v15_2_0::list&, int)+0x277) [0x5605fd543717]
Feb 23 11:44:54 herta ceph-mds[3384124]: 9: 
(MDSContext::complete(int)+0x50) [0x5605fd717980]
Feb 23 11:44:54 herta ceph-mds[3384124]: 10: 
(MDSIOContextBase::complete(int)+0x524) [0x5605fd7180f4]
Feb 23 11:44:54 herta ceph-mds[3384124]: 11: 
(Finisher::finisher_thread_entry()+0x18d) [0x7f050dfaec0d]
Feb 23 11:44:54 herta ceph-mds[3384124]: 12: 
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7f050dc6cea7]

Feb 23 11:44:54 herta ceph-mds[3384124]: 13: clone()
Feb 23 11:44:54 herta ceph-mds[3384124]: 0> 
2022-02-23T11:44:54.195+0100 7f0502a33700 -1 *** Caught signal 
(Aborted) **
Feb 23 11:44:54 herta ceph-mds[3384124]: in thread 7f0502a33700 
thread_name:MR_Finisher
Feb 23 11:44:54 herta ceph-mds[3384124]: ceph version 16.2.7 
(f9aa029788115b5df5328f584156565ee5b7) pacific (stable)
Feb 23 11:44:54 herta ceph-mds[3384124]: 1: 
/lib/x86_64-linux-gnu/libpthread.so.0(+0x14140) [0x7f050dc78140]

Feb 23 11:44:54 herta ceph-mds[3384124]: 2: gsignal()
Feb 23 11:44:54 herta ceph-mds[3384124]: 3: abort()
Feb 23 11:44:54 herta ceph-mds[3384124]: 4: 
(ceph::__ceph_assert_fail(char const*, char const*, int, char 
const*)+0x16e) [0x7f050df11090]
Feb 23 11:44:54 herta ceph-mds[3384124]: 5: 
/usr/lib/ceph/libceph-common.so.2(+0x2511d1) [0x7f050df111d1]
Feb 23 11:44:54 herta ceph-mds[3384124]: 6: 
(CInode::get_or_open_dirfrag(MDCache*, frag_t)+0x105) [0x5605fd69e365]
Feb 23 11:44:54 herta ceph-mds[3384124]: 7: 
(OpenFileTable::_prefetch_dirfrags()+0x2ad) [0x5605fd74645d]
Feb 23 11:44:54 herta ceph-mds[3384124]: 8: 
(MDSContext::complete(int)+0x50) [0x5605fd717980]
Feb 23 11:44:54 herta ceph-mds[3384124]: 9: (void 
finish_contexts > 
>(ceph::common::CephContext*, std::vectorstd::allocator >&, int)+0x98) [0x5605fd3edd58]
Feb 23 11:44:54 herta ceph-mds[3384124]: 10: 
(MDCache::open_ino_finish(inodeno_t, MDCache::open_ino_info_t&, 
int)+0x138) [0x5605fd53bfc8]
Feb 23 11:44:54 herta ceph-mds[3384124]: 11: 
(MDCache::_open_ino_backtrace_fetched(inodeno_t, 
ceph::buffer::v15_2_0::list&, int)+0x277) [0x5605fd543717]
Feb 23 11:44:54 herta ceph-mds[3384124]: 12: 
(MDSContext::complete(int)+0x50) [0x5605fd717980]
Feb 23 11:44:54 herta ceph-mds[3384124]: 13: 
(MDSIOContextBase::complete(int)+0x524) [0x5605fd7180f4]
Feb 23 11:44:54 herta ceph-mds[3384124]: 14: 
(Finisher::finisher_thread_entry()+0x18d) [0x7f050dfaec0d]
Feb 23 11:44:54 herta ceph-mds[3384124]: 15: 
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8ea7) [0x7f050dc6cea7]

Feb 23 11:44:54 herta ceph-mds[3384124]: 16: clone()
Feb 23 11:44:54 herta ceph-mds[3384124]: NOTE: a copy of the 
executable, or `objdump -rdS ` is needed to interpret this.


Best,

Wolfgang


On 21.02.22 17:01, Wolfgang Mair wrote:

Hi

I have a weird problem with my ceph cluster:

basic info:

 - 3-node cluster
 - cephfs runs on three data pools:
    - cephfs_meta (replicated)
    - ec_basic (erasure coded)
    - ec_sensitive (erasure coded with higher redundancy)

My MDS keeps crashing with a bad backtrace error:
2022-02-21T16:11:09.661+0100 7fd2cd290700 -1 log_channel(cluster

[ceph-users] Re: OSD SLOW_OPS is filling MONs disk space

2022-02-23 Thread Gaël THEROND
So!

Here is a really mysterious resolution.
The issue vanished the moment I queried the OSD for its slow_ops
history.

I didn't have time to do anything except look at the OSD ops history,
which was actually empty :-)

I'll keep all your suggestions in case it ever comes back :-)

Thanks a lot!

On Wed, 23 Feb 2022 at 12:51, Gaël THEROND wrote:

> Thanks a lot Eugene, I dumbly forgot about the rbd block prefix!
>
> I’ll try that this afternoon and told you how it went.
>
> Le mer. 23 févr. 2022 à 11:41, Eugen Block  a écrit :
>
>> Hi,
>>
>> > How can I identify which operation this OSD is trying to achieve as
>> > osd_op() is a bit large ^^ ?
>>
>> I would start by querying the OSD for historic_slow_ops:
>>
>> ceph daemon osd. dump_historic_slow_ops to see which operation it is.
>>
>> > How can I identify the related images to this data chunk?
>>
>> You could go through all rbd images and check for the line containing
>> block_name_prefix, this could take some time depending on how many
>> images you have:
>>
>>  block_name_prefix: rbd_data.ca69416b8b4567
>>
>> I sometimes do that with this for loop:
>>
>> for i in `rbd -p  ls`; do if [ $(rbd info /$i | grep -c
>> ) -gt 0 ]; then echo "image: $i"; break; fi; done
>>
>> So in your case it would look something like this:
>>
>> for i in `rbd -p  ls`; do if [ $(rbd info /$i | grep -c
>> 89a4a940aba90b -gt 0 ]; then echo "image: $i"; break; fi; done
>>
>> To see which clients are connected you can check the mon daemon:
>>
>> ceph daemon mon. sessions
>>
>> The mon daemon also has a history of slow ops:
>>
>> ceph daemon mon. dump_historic_slow_ops
>>
>> Regards,
>> Eugen
>>
>>
>> Zitat von Gaël THEROND :
>>
>> > Hi everyone, I'm having a really nasty issue since around two days where
>> > our cluster report a bunch of SLOW_OPS on one of our OSD as:
>> >
>> > https://paste.openstack.org/show/b3DkgnJDVx05vL5o4OmY/
>> >
>> > Here is the cluster specification:
>> >   * Used to store Openstack related data (VMs/Snaphots/Volumes/Swift).
>> >   * Based on CEPH Nautilus 14.2.8 installed using ceph-ansible.
>> >   * Use an EC based storage profile.
>> >   * We have a separate and dedicated frontend and backend 10Gbps
>> network.
>> >   * We don't have any network issues observed or reported by our
>> monitoring
>> > system.
>> >
>> > Here is our current cluster status:
>> > https://paste.openstack.org/show/biVnkm9Yyog3lmSUn0UK/
>> > Here is a detailed view of our cluster status:
>> > https://paste.openstack.org/show/bgKCSVuow0JUZITo2Ndj/
>> >
>> > My main issue here is that this health alert is starting to fill the
>> > Monitor's disk and so trigger a MON_DISK_BIG alert.
>> >
>> > I'm worried as I'm having a hard time to identify which osd operation is
>> > actually slow and especially, which image does it concern and which
>> client
>> > is using it.
>> >
>> > So far I've try:
>> >   * To match this client ID with any watcher of our stored
>> > volumes/vms/snaphots by extracting the whole list and then using the
>> > following command: *rbd status /*
>> >  Unfortunately none of the watchers is matching my reported client
>> from
>> > the OSD on any pool.
>> >
>> > *  * *To map this reported chunk of data to any of our store image
>> > using:  *ceph
>> > osd map /rbd_data.5.89a4a940aba90b.00a0*
>> >  Unfortunately any pool name existing within our cluster give me
>> back
>> > an answer with no image information and a different watcher client ID.
>> >
>> > So my questions are:
>> >
>> > How can I identify which operation this OSD is trying to achieve as
>> > osd_op() is a bit large ^^ ?
>> > Does the *snapc *information part within the log relate to snapshot or
>> is
>> > that something totally different?
>> > How can I identify the related images to this data chunk?
>> > Is there official documentation about SLOW_OPS operations code
>> explaining
>> > how to read the logs like something that explains which block is PG
>> > number, which is the ID of something etc?
>> >
>> > Thanks a lot everyone and feel free to ask for additional information!
>> > G.
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>>
>>
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: OSD SLOW_OPS is filling MONs disk space

2022-02-23 Thread Eugen Block
That is indeed unexpected, but good for you. ;-) Is the rest of the  
cluster healthy now?


Quoting Gaël THEROND:


So!

Here is really mysterious resolution.
The issue vanished at the moment I requested the osd about its slow_ops
history.

I didn’t had time to do anything except to look for the osd ops history
that was actually empty :-)

I’ll keep all your suggestions if it ever came back :-)

Thanks a lot!

Le mer. 23 févr. 2022 à 12:51, Gaël THEROND  a
écrit :


Thanks a lot Eugene, I dumbly forgot about the rbd block prefix!

I’ll try that this afternoon and told you how it went.

Le mer. 23 févr. 2022 à 11:41, Eugen Block  a écrit :


Hi,

> How can I identify which operation this OSD is trying to achieve as
> osd_op() is a bit large ^^ ?

I would start by querying the OSD for historic_slow_ops:

ceph daemon osd. dump_historic_slow_ops to see which operation it is.

> How can I identify the related images to this data chunk?

You could go through all rbd images and check for the line containing
block_name_prefix, this could take some time depending on how many
images you have:

 block_name_prefix: rbd_data.ca69416b8b4567

I sometimes do that with this for loop:

for i in `rbd -p  ls`; do if [ $(rbd info /$i | grep -c
) -gt 0 ]; then echo "image: $i"; break; fi; done

So in your case it would look something like this:

for i in `rbd -p  ls`; do if [ $(rbd info /$i | grep -c
89a4a940aba90b -gt 0 ]; then echo "image: $i"; break; fi; done

To see which clients are connected you can check the mon daemon:

ceph daemon mon. sessions

The mon daemon also has a history of slow ops:

ceph daemon mon. dump_historic_slow_ops

Regards,
Eugen


Zitat von Gaël THEROND :

> Hi everyone, I'm having a really nasty issue since around two days where
> our cluster report a bunch of SLOW_OPS on one of our OSD as:
>
> https://paste.openstack.org/show/b3DkgnJDVx05vL5o4OmY/
>
> Here is the cluster specification:
>   * Used to store Openstack related data (VMs/Snaphots/Volumes/Swift).
>   * Based on CEPH Nautilus 14.2.8 installed using ceph-ansible.
>   * Use an EC based storage profile.
>   * We have a separate and dedicated frontend and backend 10Gbps
network.
>   * We don't have any network issues observed or reported by our
monitoring
> system.
>
> Here is our current cluster status:
> https://paste.openstack.org/show/biVnkm9Yyog3lmSUn0UK/
> Here is a detailed view of our cluster status:
> https://paste.openstack.org/show/bgKCSVuow0JUZITo2Ndj/
>
> My main issue here is that this health alert is starting to fill the
> Monitor's disk and so trigger a MON_DISK_BIG alert.
>
> I'm worried as I'm having a hard time to identify which osd operation is
> actually slow and especially, which image does it concern and which
client
> is using it.
>
> So far I've try:
>   * To match this client ID with any watcher of our stored
> volumes/vms/snaphots by extracting the whole list and then using the
> following command: *rbd status /*
>  Unfortunately none of the watchers is matching my reported client
from
> the OSD on any pool.
>
> *  * *To map this reported chunk of data to any of our store image
> using:  *ceph
> osd map /rbd_data.5.89a4a940aba90b.00a0*
>  Unfortunately any pool name existing within our cluster give me
back
> an answer with no image information and a different watcher client ID.
>
> So my questions are:
>
> How can I identify which operation this OSD is trying to achieve as
> osd_op() is a bit large ^^ ?
> Does the *snapc *information part within the log relate to snapshot or
is
> that something totally different?
> How can I identify the related images to this data chunk?
> Is there official documentation about SLOW_OPS operations code
explaining
> how to read the logs like something that explains which block is PG
> number, which is the ID of something etc?
>
> Thanks a lot everyone and feel free to ask for additional information!
> G.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io







___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] CephFS snaptrim bug?

2022-02-23 Thread Linkriver Technology
Hello,

I have upgraded our Ceph cluster from Nautilus to Octopus (15.2.15) over the
weekend. The upgrade went well as far as I can tell.

Earlier today, noticing that our CephFS data pool was approaching capacity, I
removed some old CephFS snapshots (taken weekly at the root of the filesystem),
keeping only the most recent one (created today, 2022-02-21). As expected, a
good fraction of the PGs transitioned from active+clean to active+clean+snaptrim
or active+clean+snaptrim_wait. On previous occasions when I removed a snapshot
it took a few days for snaptrimming to complete. This would happen without
noticeably impacting other workloads, and would also free up an appreciable
amount of disk space.

This time around, after a few hours of snaptrimming, users complained of high IO
latency, and indeed Ceph reported "slow ops" on a number of OSDs and on the
active MDS. I attributed this to the snaptrimming and decided to reduce it by
initially setting osd_pg_max_concurrent_snap_trims to 1, which didn't seem to
help much, so I then set it to 0, which had the surprising effect of
transitioning all PGs back to active+clean (is this intended?). I also restarted
the MDS which seemed to be struggling. IO latency went back to normal
immediately.
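
For reference, this is the knob I was adjusting, shown here as a sketch using
the cluster-wide config interface (removing the override afterwards returns it
to its default, which I believe is 2):

ceph config set osd osd_pg_max_concurrent_snap_trims 1
ceph config rm osd osd_pg_max_concurrent_snap_trims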

Outside of users' working hours, I decided to resume snaptrimming by setting
osd_pg_max_concurrent_snap_trims back to 1. Much to my surprise, nothing
happened. All PGs remained (and still remain at time of writing) in the state
active+clean, even after restarting some of them. This definitely seems
abnormal, as I mentioned earlier, snaptrimming this FS previously would take in
the order of multiple days. Moreover, if snaptrim were truly complete, I would
expect pool usage to have dropped by appreciable amounts (at least a dozen
terabytes), but that doesn't seem to be the case.

A du on the CephFS root gives:

# du -sh /mnt/pve/cephfs
31T     /mnt/pve/cephfs

But:

# ceph df

--- POOLS ---
POOL             ID  PGS  STORED   OBJECTS  USED     %USED  MAX AVAIL
cephfs_data       7  512   43 TiB  190.83M  147 TiB  93.22    3.6 TiB
cephfs_metadata   8   32   89 GiB  694.60k  266 GiB   1.32    6.4 TiB


ceph pg dump reports a SNAPTRIMQ_LEN of 0 on all PGs.
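
A sketch of the check, assuming the JSON field matching that column is called
snaptrimq_len as the header suggests; this prints any PG with a non-empty
snaptrim queue and currently returns nothing:

ceph pg dump -f json 2>/dev/null | jq '.pg_map.pg_stats[] | select(.snaptrimq_len > 0) | .pgid'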

Did CephFS just leak a massive 12 TiB worth of objects...? It seems to me that
the snaptrim operation did not complete at all.

Perhaps relatedly:

# ceph daemon mds.choi dump snaps
{
"last_created": 93,
"last_destroyed": 94,
"snaps": [
{
"snapid": 93,
"ino": 1,
"stamp": "2022-02-21T00:00:01.245459+0800",
"name": "2022-02-21"
}
]
}

How can last_destroyed > last_created? The last snapshot to have been taken on
this FS is indeed #93, and the removed snapshots were all created on previous
weeks.

Could someone shed some light please? Assuming that snaptrim didn't run to
completion, how can I manually delete objects from now-removed snapshots? I
believe this is what the Ceph documentation calls a "backwards scrub" - but I
didn't find anything in the Ceph suite that can run such a scrub. This pool is
filling up fast, I'll throw in some more OSDs for the moment to buy some time,
but I certainly would appreciate your help!

Happy to attach any logs or info you deem necessary.

Regards,

LRT
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 3 OSDs can not be started after a server reboot - rocksdb Corruption

2022-02-23 Thread Sebastian Mazza
Hi Alexander,

thank you for your suggestion! All my nodes have ECC memory. However, I have
now checked that it was recognized correctly on every system (dmesg | grep
EDAC). Furthermore, I checked whether an error had occurred by using `edac-util` and also
by searching the logs of the mainboard BMCs. Everything looks perfectly fine.
So I think we can now be sure that it is not a memory issue.

Thanks for reminding me to check memory!


Best regards,
Sebastian

 

> On 23.02.2022, at 02:03, Alexander E. Patrakov  wrote:
> 
> I have another suggestion: check the RAM, just in case, with memtest86
> or https://github.com/martinwhitaker/pcmemtest (which is a fork of
> memtest86+). Ignore the suggestion if you have ECC RAM.
> 
> вт, 22 февр. 2022 г. в 15:45, Igor Fedotov :
>> 
>> Hi Sebastian,
>> 
>> On 2/22/2022 3:01 AM, Sebastian Mazza wrote:
>>> Hey Igor!
>>> 
>>> 
 thanks a lot for the new logs - looks like they provides some insight.
>>> I'm glad the logs are helpful.
>>> 
>>> 
 At this point I think the root cause is apparently a race between deferred 
 writes replay and some DB maintenance task happening on OSD startup. It 
 seems that deferred write replay updates a block extent which 
 RocksDB/BlueFS are using. Hence the target BlueFS file gets all-zeros 
 content. Evidently that's just a matter of chance whether they use 
 conflicting physical extent or not hence the occasional nature of the 
 issue...
>>> 
>>> Do I understand that correct: The corruption of the rocksDB (Table 
>>> overwritten by zeros) happens at the first start of the OSD after  “*** 
>>> Immediate shutdown (osd_fast_shutdown=true) ***”? Before the system 
>>> launches the OSD Service the RocksDB is still fine?
>> Looks like that. From logs I can see an unexpected write to specific
>> extent (LBA 0x63) which shouldn't occur and at which RocksDB
>> subsequently fails.
>>> 
>>> 
 So first of all I'm curious if you have any particular write patterns that 
 can be culprits? E.g. something like disk wiping procedure which writes 
 all-zeros to an object followed by object truncate or removal comes to my 
 mind. If you can identify something like that - could you please collect 
 OSD log for such an operation (followed by OSD restart) with 
 debug-bluestore set to 20?
>>> Best to my knowledge the OSD was hardly doing anything and I do not see any 
>>> pattern that would fit to you explanation.
>>> However, you certainly understand a lot more about it than I do, so I try 
>>> to explain everything that could be relevant.
>>> 
>>> The Cluster has 3 Nodes. Each has a 240GB NVMe m.2 SSD as boot drive, which 
>>> should not be relevant. Each node has 3 OSDs, one is on an U.2 NVMe SSD 
>>> with 2TB and the other two are on 12TB HDDs.
>>> 
>>> I have configured two crush rules ‘c3nvme’ and ‘ec4x2hdd’. The ‘c3nvme’ is 
>>> a replicated rule that uses only OSDs with class ’nvme’. The second rule is 
>>> a tricky erasure rule. It selects exactly 2 OSDs on exactly 4 Hosts with 
>>> class ‘hdd’. So it only works for a size of exactly 8. That means that a 
>>> pool that uses this rule has always only “undersized” placement groups, 
>>> since the cluster has only 3 nodes. (I did not add the fourth server after 
>>> the first crash in December, since we want to reproduce the problem.)
>>> 
>>> The pools device_health_metrics, test-pool, fs.metadata-root-pool, 
>>> fs.data-root-pool, fs.data-nvme.c-pool, and block-nvme.c-pool uses the 
>>> crush rule ‘c3nvme’ with a size of 3 and a min size of 2. The pools 
>>> fs.data-hdd.ec-pool, block-hdd.ec-pool uses the crush rule ‘ec4x2hdd’ with 
>>> k=5,m=3 and a min size of 6.
>>> 
>>> The pool fs.data-nvme.c-pool is not used and the pool test-pool was used 
>>> for rados bench a few month ago.
>>> 
>>> The pool fs.metadata-root-pool is used as metadata pool for cephFS and 
>>> fs.data-root-pool as the root data pool for the cephFS. The pool 
>>> fs.data-hdd.ec-pool is an additional data pool for the cephFS and is 
>>> specified as ceph.dir.layout for some folders of the cephFS. The whole 
>>> cephFS is mounted by each of the 3 nodes.
>>> 
>>> The pool block-nvme.c-pool hosts two RBD images that are used as boot 
>>> drives for two VMs. The first VM runes with Ubuntu Desktop and the second 
>>> with Debian as OS. The pool block-hdd.ec-pool hosts one RBD image (the data 
>>> part, metadata on block-nvme.c-pool) that is attached to the Debian VM as 
>>> second drive formatted with BTRFS. Furthermore the Debian VM mounts a sub 
>>> directory of the cephFS that has the fs.data-hdd.ec-pool set as layout. 
>>> Both VMs was doing nothing, except from being booted, in the last couple of 
>>> days.
>>> 
>>> I try to illustrate the pool usage as a tree:
>>> * c3nvme (replicated, size=3, min_size=2)
>>> + device_health_metrics
>>> + test-pool
>>> - rados bench
>>> + fs.metadata-root-pool
>>> - CephFS (metadata)
>>> + fs.data-root-pool
>>> 

[ceph-users] Re: 3 OSDs can not be started after a server reboot - rocksdb Corruption

2022-02-23 Thread Sebastian Mazza
Hi Igor,

I let Ceph rebuild OSD.7. Then I added
```
[osd]
debug bluefs = 20
debug bdev = 20
debug bluestore = 20
```
to the ceph.conf of all 3 nodes and shut down all 3 nodes without writing 
anything to the pools on the HDDs (the Debian VM was not even running).
Immediately at the first boot, OSD.5 and OSD.6 crashed with the same “Bad table
magic number” error. OSDs 5 and 6 are on the same node, but not on the node
of OSD 7, which crashed the last two times.

Logs and corrupted RocksDB files: https://we.tl/t-ZBXYp8r4Hq
I have saved the entire /var/log directory of every node and the result of 
```
$ ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-5 --out-dir 
/tmp/osd.5-data
$ ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-6 --out-dir 
/tmp/osd.6-data
```
Let me know if you need something else. 


I hope you can now track it down. I'm really looking forward to your
interpretation of the logs.


Best Regards,
Sebastian


> On 22.02.2022, at 11:44, Igor Fedotov  wrote:
> 
> Hi Sebastian,
> 
> On 2/22/2022 3:01 AM, Sebastian Mazza wrote:
>> Hey Igor!
>> 
>> 
>>> thanks a lot for the new logs - looks like they provides some insight.
>> I'm glad the logs are helpful.
>> 
>> 
>>> At this point I think the root cause is apparently a race between deferred 
>>> writes replay and some DB maintenance task happening on OSD startup. It 
>>> seems that deferred write replay updates a block extent which 
>>> RocksDB/BlueFS are using. Hence the target BlueFS file gets all-zeros 
>>> content. Evidently that's just a matter of chance whether they use 
>>> conflicting physical extent or not hence the occasional nature of the 
>>> issue...
>> 
>> Do I understand that correct: The corruption of the rocksDB (Table 
>> overwritten by zeros) happens at the first start of the OSD after  “*** 
>> Immediate shutdown (osd_fast_shutdown=true) ***”? Before the system launches 
>> the OSD Service the RocksDB is still fine?
> Looks like that. From logs I can see an unexpected write to specific extent 
> (LBA 0x63) which shouldn't occur and at which RocksDB subsequently fails.
>> 
>> 
>>> So first of all I'm curious if you have any particular write patterns that 
>>> can be culprits? E.g. something like disk wiping procedure which writes 
>>> all-zeros to an object followed by object truncate or removal comes to my 
>>> mind. If you can identify something like that - could you please collect 
>>> OSD log for such an operation (followed by OSD restart) with 
>>> debug-bluestore set to 20?
>> Best to my knowledge the OSD was hardly doing anything and I do not see any 
>> pattern that would fit to you explanation.
>> However, you certainly understand a lot more about it than I do, so I try to 
>> explain everything that could be relevant.
>> 
>> The Cluster has 3 Nodes. Each has a 240GB NVMe m.2 SSD as boot drive, which 
>> should not be relevant. Each node has 3 OSDs, one is on an U.2 NVMe SSD with 
>> 2TB and the other two are on 12TB HDDs.
>> 
>> I have configured two crush rules ‘c3nvme’ and ‘ec4x2hdd’. The ‘c3nvme’ is a 
>> replicated rule that uses only OSDs with class ’nvme’. The second rule is a 
>> tricky erasure rule. It selects exactly 2 OSDs on exactly 4 Hosts with class 
>> ‘hdd’. So it only works for a size of exactly 8. That means that a pool that 
>> uses this rule has always only “undersized” placement groups, since the 
>> cluster has only 3 nodes. (I did not add the fourth server after the first 
>> crash in December, since we want to reproduce the problem.)
>> 
>> The pools device_health_metrics, test-pool, fs.metadata-root-pool, 
>> fs.data-root-pool, fs.data-nvme.c-pool, and block-nvme.c-pool uses the crush 
>> rule ‘c3nvme’ with a size of 3 and a min size of 2. The pools 
>> fs.data-hdd.ec-pool, block-hdd.ec-pool uses the crush rule ‘ec4x2hdd’ with 
>> k=5,m=3 and a min size of 6.
>> 
>> The pool fs.data-nvme.c-pool is not used and the pool test-pool was used for 
>> rados bench a few month ago.
>> 
>> The pool fs.metadata-root-pool is used as metadata pool for cephFS and 
>> fs.data-root-pool as the root data pool for the cephFS. The pool 
>> fs.data-hdd.ec-pool is an additional data pool for the cephFS and is 
>> specified as ceph.dir.layout for some folders of the cephFS. The whole 
>> cephFS is mounted by each of the 3 nodes.
>> 
>> The pool block-nvme.c-pool hosts two RBD images that are used as boot drives 
>> for two VMs. The first VM runes with Ubuntu Desktop and the second with 
>> Debian as OS. The pool block-hdd.ec-pool hosts one RBD image (the data part, 
>> metadata on block-nvme.c-pool) that is attached to the Debian VM as second 
>> drive formatted with BTRFS. Furthermore the Debian VM mounts a sub directory 
>> of the cephFS that has the fs.data-hdd.ec-pool set as layout. Both VMs was 
>> doing nothing, except from being booted, in the last couple of days.
>> 
>> I try to illustrate the pool usage as a tree:
>> * c3nvme (replicated, si

[ceph-users] Re: Error removing snapshot schedule

2022-02-23 Thread Venky Shankar
On Thu, Feb 24, 2022 at 8:00 AM Jeremy Hansen  wrote:
>
> Can’t figure out what I’m doing wrong. Is there another way to remove a 
> snapshot schedule?
>
> [ceph: root@cephn1 /]# ceph fs snap-schedule status / / testfs
> {"fs": "testfs", "subvol": null, "path": "/", "rel_path": "/", "schedule": 
> "1h", "retention": {}, "start": "2022-02-22T20:08:30", "created": 
> "2022-02-23T04:08:46", "first": "2022-02-24T01:08:30", "last": 
> "2022-02-24T02:08:30", "last_pruned": null, "created_count": 2, 
> "pruned_count": 0, "active": true}
> ===
> {"fs": "testfs", "subvol": null, "path": "/", "rel_path": "/", "schedule": 
> "1h", "retention": {}, "start": "2022-02-23T04:30:00", "created": 
> "2022-02-23T04:15:45", "first": "2022-02-24T00:30:00", "last": 
> "2022-02-24T01:30:00", "last_pruned": "2022-02-24T01:30:00", "created_count": 
> 2, "pruned_count": 1, "active": true}
> [ceph: root@cephn1 /]# ceph fs snap-schedule remove / 1h 2022-02-23T04:30:00 
> / testfs

Could you try:

ceph fs snap-schedule remove / 1h 2022-02-23T04:30:00 --fs testfs

The "/" before the file system name is optional. We should clear that
in the docs.
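
Afterwards you can verify the schedule is gone with the same --fs form (a
sketch):

ceph fs snap-schedule status / --fs testfs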

> Error EINVAL: Traceback (most recent call last):
> File "/usr/share/ceph/mgr/mgr_module.py", line 1386, in _handle_command
> return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
> File "/usr/share/ceph/mgr/mgr_module.py", line 397, in call
> return self.func(mgr, **kwargs)
> File "/usr/share/ceph/mgr/snap_schedule/module.py", line 149, in 
> snap_schedule_rm
> abs_path = self.resolve_subvolume_path(fs, subvol, path)
> File "/usr/share/ceph/mgr/snap_schedule/module.py", line 37, in 
> resolve_subvolume_path
> fs, subvol)
> File "/usr/share/ceph/mgr/mgr_module.py", line 1770, in remote
> args, kwargs)
> ImportError: Module not found
>
> -jeremy
>
> > On Tuesday, Feb 22, 2022 at 8:36 PM, Jeremy Hansen <jer...@skidrow.la> wrote:
> > Ceph Pacific 16.2.7 using podman containers for orchestration on Rocky 
> > Linux 8.5
> >
> > I’m able to add schedules, but trying to remove them, or even use the 
> > remove command at all results in a python barf.
> >
> > [root@cephn1 ~]# cephadm shell
> > [ceph: root@cephn1 /]# ceph fs snap-schedule status / / testfs
> > {"fs": "testfs", "subvol": null, "path": "/", "rel_path": "/", "schedule": 
> > "1h", "retention": {}, "start": "2022-02-22T20:08:30", "created": 
> > "2022-02-23T04:08:46", "first": null, "last": null, "last_pruned": null, 
> > "created_count": 0, "pruned_count": 0, "active": true}
> > ===
> > {"fs": "testfs", "subvol": null, "path": "/", "rel_path": "/", "schedule": 
> > "1h", "retention": {}, "start": "2022-02-23T04:30:00", "created": 
> > "2022-02-23T04:15:45", "first": null, "last": null, "last_pruned": null, 
> > "created_count": 0, "pruned_count": 0, "active": true}
> > (failed reverse-i-search)`remve': ceph fs snap-schedule ^Cmove
> > [ceph: root@cephn1 /]# ceph fs snap-schedule remove / 1h 
> > 2022-02-22T20:08:30 / testfs
> > Error EINVAL: Traceback (most recent call last):
> > File "/usr/share/ceph/mgr/mgr_module.py", line 1386, in _handle_command
> > return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
> > File "/usr/share/ceph/mgr/mgr_module.py", line 397, in call
> > return self.func(mgr, **kwargs)
> > File "/usr/share/ceph/mgr/snap_schedule/module.py", line 149, in 
> > snap_schedule_rm
> > abs_path = self.resolve_subvolume_path(fs, subvol, path)
> > File "/usr/share/ceph/mgr/snap_schedule/module.py", line 37, in 
> > resolve_subvolume_path
> > fs, subvol)
> > File "/usr/share/ceph/mgr/mgr_module.py", line 1770, in remote
> > args, kwargs)
> > ImportError: Module not found
> >
> > [ceph: root@cephn1 /]# ceph fs snap-schedule remove
> > Error EINVAL: Traceback (most recent call last):
> > File "/usr/share/ceph/mgr/mgr_module.py", line 1386, in _handle_command
> > return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
> > File "/usr/share/ceph/mgr/mgr_module.py", line 397, in call
> > return self.func(mgr, **kwargs)
> > File "/usr/share/ceph/mgr/snap_schedule/module.py", line 150, in 
> > snap_schedule_rm
> > self.client.rm_snap_schedule(use_fs, abs_path, repeat, start)
> > File "/usr/share/ceph/mgr/snap_schedule/fs/schedule_client.py", line 51, in 
> > f
> > func(self, fs, schedule_or_path, *args)
> > File "/usr/share/ceph/mgr/snap_schedule/fs/schedule_client.py", line 274, 
> > in rm_snap_schedule
> > Schedule.rm_schedule(db, path, schedule, start)
> > File "/usr/share/ceph/mgr/snap_schedule/fs/schedule.py", line 278, in 
> > rm_schedule
> > if len(row) == 0:
> > TypeError: object of type 'NoneType' has no len()
> >
> >
> >
> > Seems like a python version thing, but how do I get around that since this 
> > is a container?
> >
> > Thanks
> > -jeremy
> >
> >
> >
> >
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io



-- 
Cheers,
Venky

__