[ceph-users] Re: Failed to probe daemons or devices

2022-10-25 Thread Sake Paulusma
I've created an issue: https://tracker.ceph.com/issues/57918
What more can I do to get this issue fixed?

And here is the output of the requested commands:
[cephadm@mdshost2 ~]$ sudo lvs -a
  LV VG Attr LSize Pool Origin Data% Meta% Move Log Cpy%Sync Convert
  lv_home vg_sys -wi-ao 256.00m
  lv_opt vg_sys -wi-ao 3.00g
  lv_root vg_sys -wi-ao 5.00g
  lv_swap vg_sys -wi-ao 7.56g
  lv_tmp vg_sys -wi-ao 1.00g
  lv_var vg_sys -wi-ao 15.00g
  lv_var_log vg_sys -wi-ao 5.00g
  lv_var_log_audit vg_sys -wi-ao 512.00m

[cephadm@mdshost2 ~]$ sudo vgs -a
  VG #PV #LV #SN Attr VSize VFree
  vg_sys 1 8 0 wz--n- <49.00g 11.68g

[cephadm@mdshost2 ~]$ sudo parted --list
Model: VMware Virtual disk (scsi)
Disk /dev/sda: 53.7GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number Start End Size Type File system Flags
 1 1049kB 1075MB 1074MB primary xfs boot
 2 1075MB 53.7GB 52.6GB primary lvm

Error: /dev/sdb: unrecognised disk label
Model: VMware Virtual disk (scsi)
Disk /dev/sdb: 53.7GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:

From: Guillaume Abrioux 
Sent: Monday, October 24, 2022 5:50:20 PM
To: Sake Paulusma 
Cc: ceph-users@ceph.io 
Subject: Re: [ceph-users] Failed to probe daemons or devices

Hello Sake,

Could you share the output of vgs / lvs commands?
Also, I would suggest you open a tracker issue [1].

Thanks!

[1] 
https://tracker.ceph.com/projects/ceph-volume

On Mon, 24 Oct 2022 at 10:51, Sake Paulusma <sake1...@hotmail.com> wrote:
Last Friday I upgraded the Ceph cluster from 17.2.3 to 17.2.5 with "ceph orch 
upgrade start --image 
localcontainerregistry.local.com:5000/ceph/ceph:v17.2.5-20221017".
After some time, maybe an hour, I got a health warning: CEPHADM_REFRESH_FAILED: 
failed to probe daemons or devices. I'm using only CephFS on the cluster and 
it's still working correctly.
Checking the running services, everything is up and running: mon, osd and mds. 
But on the hosts running mon and mds services I get errors in cephadm.log; 
see the log lines below.

It looks like cephadm tries to start a container to check something? What 
could be wrong?


On mon nodes I got the following:
2022-10-24 10:31:43,880 7f179e5bfb80 DEBUG 

cephadm ['gather-facts']
2022-10-24 10:31:44,333 7fc2d52b6b80 DEBUG 

cephadm ['--image', 
'localcontainerregistry.local.com:5000/ceph/ceph@sha256:122436e2f1df0c803666c5591db4a9b6c9196a71b4d44c6bd5d18102509dfca0',
 'ceph-volume', '--fsid', '8909ef90-22ea-11ed-8df1-0050569ee1b1', '--', 
'inventory', '--format=json-pretty', '--filter-for-batch']
2022-10-24 10:31:44,663 7fc2d52b6b80 INFO Inferring config 
/var/lib/ceph/8909ef90-22ea-11ed-8df1-0050569ee1b1/mon.oqsoel24332/config
2022-10-24 10:31:44,663 7fc2d52b6b80 DEBUG Using specified fsid: 
8909ef90-22ea-11ed-8df1-0050569ee1b1
2022-10-24 10:31:45,574 7fc2d52b6b80 INFO Non-zero exit code 1 from /bin/podman 
run --rm --ipc=host --stop-signal=SIGTERM --authfile=/etc/ceph/podman-auth.json 
--net=host --entrypoint /usr/sbin/ceph-volume --privileged --group-add=disk 
--init -e 
CONTAINER_IMAGE=localcontainerregistry.local.com:5000/ceph/ceph@sha256:122436e2f1df0c803666c5591db4a9b6c9196a71b4d44c6bd5d18102509dfca0

[ceph-users] Why did ceph turn /etc/ceph/ceph.client.admin.keyring into a directory?

2022-10-25 Thread Martin Johansen
Hi

Two questions:

1) Why does ceph delete /etc/ceph/ceph.client.admin.keyring several times a
day?

2) Why was it turned into a directory? It contains one file
"ceph.client.admin.keyring.new". This then causes an error in the ceph logs
when ceph tries to remove the file: "rm: cannot remove
'/etc/ceph/ceph.client.admin.keyring': Is a directory".

Best Regards,

Martin Johansen
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Failed to probe daemons or devices

2022-10-25 Thread Sake Paulusma
I fixed the issue by removing the blank/unlabeled disk. It is still a bug, 
so hopefully it can get fixed for someone else who can't easily remove a disk :)
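
For anyone who cannot simply remove such a disk: a possible workaround (untested
here, and assuming it is acceptable to write an empty GPT label to the blank
device) would be to give it a partition table so that parted/ceph-volume can
probe it, and then re-run the inventory that cephadm was choking on:

$ sudo parted -s /dev/sdb mklabel gpt      # give the blank disk an (empty) GPT label
$ sudo cephadm ceph-volume -- inventory    # verify that probing now succeeds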
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why did ceph turn /etc/ceph/ceph.client.admin.keyring into a directory?

2022-10-25 Thread Marc
> 
> 1) Why does ceph delete /etc/ceph/ceph.client.admin.keyring several
> times a
> day?
> 
> 2) Why was it turned into a directory? It contains one file
> "ceph.client.admin.keyring.new". This then causes an error in the ceph
> logs
> when ceph tries to remove the file: "rm: cannot remove
> '/etc/ceph/ceph.client.admin.keyring': Is a directory".
> 

Are you using the ceph-csi driver? The ceph csi people just delete your 
existing ceph files and mount your root fs when you are not running the driver 
in a container. They seem to think that checking for files and validating 
parameters is not necessary. 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why did ceph turn /etc/ceph/ceph.client.admin.keyring into a directory?

2022-10-25 Thread Martin Johansen
Yes, we are using the ceph-csi driver in a kubernetes cluster. Is it that
that is causing this?

Best Regards,

Martin Johansen


On Tue, Oct 25, 2022 at 9:44 AM Marc  wrote:

> >
> > 1) Why does ceph delete /etc/ceph/ceph.client.admin.keyring several
> > times a
> > day?
> >
> > 2) Why was it turned into a directory? It contains one file
> > "ceph.client.admin.keyring.new". This then causes an error in the ceph
> > logs
> > when ceph tries to remove the file: "rm: cannot remove
> > '/etc/ceph/ceph.client.admin.keyring': Is a directory".
> >
>
> Are you using the ceph-csi driver? The ceph csi people just delete your
> existing ceph files and mount your root fs when you are not running the
> driver in a container. They seem to think that checking for files and
> validating parameters is not necessary.
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Temporary shutdown of subcluster and cephfs

2022-10-25 Thread Frank Schilder
Hi Patrick,

thanks for your answer. This is exactly the behaviour we need.

For future reference some more background:

We need to prepare a quite large installation for planned power outages. Even 
though they are called planned, we will not be able to handle these manually in 
good time for reasons irrelevant here. Our installation is protected by an UPS, 
but the guaranteed uptime on outage is only 6 minutes. So, we talk more about 
transient protection than uninterrupted power supply. Although we survived more 
than 20 minute power outages without loss of power to the DC, we need to plan 
with these 6 minutes.

In these 6 minutes, we need to wait for at least 1-2 minutes to avoid 
unintended shut-downs. In the remaining 4 minutes, we need to take down a 500-node 
HPC cluster and a 1000-OSD+12-MDS+2-MON Ceph sub-cluster. Part of this Ceph 
cluster will continue running on another site with higher power redundancy. 
This gives maybe 1-2 minutes of response time for the Ceph cluster, and the best we 
can do is to try to achieve a "consistent at rest" state and hope we can 
cleanly power down the system before the power is cut.

Why am I so concerned about a "consistent at rest" state?

It's because while not all instances of a power loss lead to data loss, all 
instances of data loss I know of that were not caused by admin errors were 
caused by a power loss (see https://tracker.ceph.com/issues/46847). We were 
asked to prepare for a worst case of weekly power cuts, so no room for taking 
too many chances here. Our approach is: unmount as much as possible, quickly fail 
the FS to stop all remaining IO, give OSDs and MDSes a chance to flush 
pending operations to disk or journal, and then try a clean shutdown.

I will also have to temporarily adjust a number of parameters to ensure that 
the remaining sub-cluster continues to operate as normally as possible, for 
example, handles OSD failures in the usual way despite 90% of the OSDs being down 
already.
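
For reference, a minimal sketch of the kind of sequence meant above (the command
names are real, but the exact flags and their scope are site-specific assumptions,
not a tested procedure; a cluster-wide noout, for instance, would also affect the
surviving site, so per-CRUSH-bucket flags may be preferable):

$ ceph osd set-group noout DC1 DC2      # DC1/DC2 are placeholder CRUSH bucket names
$ ceph fs fail <fs-name>                # cut off all remaining client IO quickly
# ... clean shutdown of MDS/OSD/MON hosts, power cut, power restored ...
$ ceph fs set <fs-name> joinable true
$ ceph osd unset-group noout DC1 DC2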

Thanks for your input and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Patrick Donnelly 
Sent: 24 October 2022 20:01:01
To: Frank Schilder
Cc: Dan van der Ster; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Temporary shutdown of subcluster and cephfs

On Wed, Oct 19, 2022 at 7:54 AM Frank Schilder  wrote:
>
> Hi Dan,
>
> I know that "fs fail ..." is not ideal, but we will not have time for a clean 
> "fs down true" and wait for journal flush procedure to complete (on our 
> cluster this takes at least 20 minutes, which is way too long). My question 
> is more along the lines 'Is an "fs fail" destructive?'

It is not, but lingering clients will not be evicted automatically by
the MDS. If you can, unmount before doing `fs fail`.

A journal flush is not really necessary. You should only need to wait ~10
seconds after the last client unmounts to give the MDS time to write
out any outstanding events to its journal.

> , that is, will an FS come up again after
>
> - fs fail
> ...
> - fs set  joinable true

Yes.

--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] MGR failures and pg autoscaler

2022-10-25 Thread Lo Re Giuseppe
Hi,
A few weeks ago we started to use the pg autoscaler on our pools.
We run version 16.2.7.
Maybe a coincidence, maybe not, but around the same time we started to experience mgr 
progress module failures:

“””
[root@naret-monitor01 ~]# ceph -s
  cluster:
    id:     63334166-d991-11eb-99de-40a6b72108d0
    health: HEALTH_ERR
            Module 'progress' has failed: ('346ee7e0-35f0-4fdf-960e-a36e7e2441e4',)
            1 pool(s) full

  services:
    mon: 3 daemons, quorum naret-monitor01,naret-monitor02,naret-monitor03 (age 5d)
    mgr: naret-monitor02.ciqvgv(active, since 6d), standbys: naret-monitor03.escwyg, naret-monitor01.suwugf
    mds: 1/1 daemons up, 2 standby
    osd: 760 osds: 760 up (since 4d), 760 in (since 4d); 10 remapped pgs
    rgw: 3 daemons active (3 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   32 pools, 6250 pgs
    objects: 977.79M objects, 3.6 PiB
    usage:   5.7 PiB used, 5.1 PiB / 11 PiB avail
    pgs:     4602612/5990777501 objects misplaced (0.077%)
             6214 active+clean
             25   active+clean+scrubbing+deep
             10   active+remapped+backfilling
             1    active+clean+scrubbing

  io:
    client:   243 MiB/s rd, 292 MiB/s wr, 1.68k op/s rd, 842 op/s wr
    recovery: 430 MiB/s, 109 objects/s

  progress:
    Global Recovery Event (14h)
      [===.] (remaining: 70s)
“””

In the mgr logs I see:
“””

debug 2022-10-20T23:09:03.859+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 2 has overlapping roots: {-60, -1}
debug 2022-10-20T23:09:03.863+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 3 has overlapping roots: {-60, -1, -2}
debug 2022-10-20T23:09:03.866+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 5 has overlapping roots: {-60, -1, -2}
debug 2022-10-20T23:09:03.870+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 6 has overlapping roots: {-60, -1, -2}
debug 2022-10-20T23:09:03.873+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 10 has overlapping roots: {-105, -60, -1, -2}
debug 2022-10-20T23:09:03.877+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 11 has overlapping roots: {-105, -60, -1, -2}
debug 2022-10-20T23:09:03.880+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 12 has overlapping roots: {-105, -60, -1, -2}
debug 2022-10-20T23:09:03.884+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 13 has overlapping roots: {-105, -60, -1, -2}
debug 2022-10-20T23:09:03.887+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 14 has overlapping roots: {-105, -60, -1, -2}
debug 2022-10-20T23:09:03.891+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 15 has overlapping roots: {-105, -60, -1, -2}
debug 2022-10-20T23:09:03.894+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 26 has overlapping roots: {-105, -60, -1, -2}
debug 2022-10-20T23:09:03.898+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 28 has overlapping roots: {-105, -60, -1, -2}
debug 2022-10-20T23:09:03.901+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 29 has overlapping roots: {-105, -60, -1, -2}
debug 2022-10-20T23:09:03.905+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 30 has overlapping roots: {-105, -60, -1, -2}

...
“””
Do you have any explanation/fix for these errors?
Regards,

Giuseppe

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why did ceph turn /etc/ceph/ceph.client.admin.keyring into a directory?

2022-10-25 Thread Martin Johansen
How should we fix it? Should we remove the directory and add back the
keyring file?
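One possible sequence for that -- assuming the client.admin key still exists in
the cluster's auth database and there is still a node (or cephadm shell) with a
working admin keyring -- would be:

$ rm -rf /etc/ceph/ceph.client.admin.keyring     # remove the bogus directory
$ ceph auth get client.admin -o /etc/ceph/ceph.client.admin.keyring
$ chmod 600 /etc/ceph/ceph.client.admin.keyring

That only restores the file, of course; it does not address whatever keeps
recreating the directory.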

Best Regards,

Martin Johansen


On Tue, Oct 25, 2022 at 9:45 AM Martin Johansen  wrote:

> Yes, we are using the ceph-csi driver in a kubernetes cluster. Is it that
> that is causing this?
>
> Best Regards,
>
> Martin Johansen
>
>
> On Tue, Oct 25, 2022 at 9:44 AM Marc  wrote:
>
>> >
>> > 1) Why does ceph delete /etc/ceph/ceph.client.admin.keyring several
>> > times a
>> > day?
>> >
>> > 2) Why was it turned into a directory? It contains one file
>> > "ceph.client.admin.keyring.new". This then causes an error in the ceph
>> > logs
>> > when ceph tries to remove the file: "rm: cannot remove
>> > '/etc/ceph/ceph.client.admin.keyring': Is a directory".
>> >
>>
>> Are you using the ceph-csi driver? The ceph csi people just delete your
>> existing ceph files and mount your root fs when you are not running the
>> driver in a container. They seem to think that checking for files and
>> validating parameters is not necessary.
>>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph status does not report IO any more

2022-10-25 Thread Frank Schilder
Hi all,

I have a strange problem. I just completed an increase of pg_num on a pool and 
since then "ceph status" does not report aggregated client/recovery IO any 
more. It just looks like this now:

# ceph status
  cluster:
id:
health: HEALTH_OK
 
  services:
mon: 5 daemons, quorum ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 (age 2w)
mgr: ceph-25(active, since 2w), standbys: ceph-26, ceph-03, ceph-02, ceph-01
mds: con-fs2:8 4 up:standby 8 up:active
osd: 1120 osds: 1115 up (since 11m), 1114 in (since 18h)
 
  task status:
 
  data:
pools:   14 pools, 18399 pgs
objects: 1.41G objects, 2.5 PiB
usage:   3.2 PiB used, 8.3 PiB / 12 PiB avail
pgs: 18378 active+clean
 20active+clean+scrubbing+deep
 1 active+clean+scrubbing

Any idea what the problem could be?

Thanks and best regards.
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rgw multisite octopus - bucket can not be resharded after cancelling prior reshard process

2022-10-25 Thread Boris Behrens
Opened a bug on the tracker for it: https://tracker.ceph.com/issues/57919

Am Fr., 7. Okt. 2022 um 11:30 Uhr schrieb Boris Behrens :

> Hi,
> I just wanted to reshard a bucket but mistyped the number of shards. In a
> reflex I hit ctrl-c and waited. It looked like the resharding did not
> finish, so I canceled it, and now the bucket is in this state.
> How can I fix it? It does not show up in the stale-instances list. It's also
> a multisite environment (we only sync metadata).
>
> $ radosgw-admin reshard status --bucket bucket
> [
> {
> "reshard_status": "not-resharding",
> "new_bucket_instance_id": "",
> "num_shards": -1
> }
> ]
>
> $ radosgw-admin bucket stats --bucket bucket
> {
> "bucket": "bucket",
> *"num_shards": 0,*
> ...
> *"id": "ff7a8b0c-07e6-463a-861b-78f0adeba8ad.2296333939.14",*
> "marker": "ff7a8b0c-07e6-463a-861b-78f0adeba8ad.2296333939.14",
> ...
> }
>
> $ radosgw-admin metadata get
> bucket.instance:bucket:ff7a8b0c-07e6-463a-861b-78f0adeba8ad.2368407345.1
> {
> "key":
> "bucket.instance:bucket:ff7a8b0c-07e6-463a-861b-78f0adeba8ad.2368407345.1",
> "ver": {
> "tag": "QndcbsKPFDjs6rYKKDHde9bM",
> "ver": 2
> },
> "mtime": "2022-10-07T07:16:49.231685Z",
> "data": {
> "bucket_info": {
> "bucket": {
> "name": "bucket",
> "marker":
> "ff7a8b0c-07e6-463a-861b-78f0adeba8ad.2296333939.14",
> *"bucket_id":
> "ff7a8b0c-07e6-463a-861b-78f0adeba8ad.2368407345.1",*
> ...
> },
> ...
> *"num_shards": 211,*
> ...
> },
> }
>
>
> Cheers
>  Boris
>


-- 
The self-help group "UTF-8-Probleme" is meeting in the large hall this time,
as an exception.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why did ceph turn /etc/ceph/ceph.client.admin.keyring into a directory?

2022-10-25 Thread Marc
Wtf, unbelievable that it is still like this. You can't fix it; I had to fork 
and patch it because these @#$@#$@ ignored it. I don't know much about 
Kubernetes, I am running Mesos. Can't you set/configure Kubernetes to launch the 
driver in container mode?


> 
> How should we fix it? Should we remove the directory and add back the
> keyring file?
> 
> 
> 
> On Tue, Oct 25, 2022 at 9:45 AM Martin Johansen   > wrote:
> 
> 
>   Yes, we are using the ceph-csi driver in a kubernetes cluster. Is
> it that that is causing this?
> 
>   Best Regards,
> 
> 
>   Martin Johansen
> 
> 
>   On Tue, Oct 25, 2022 at 9:44 AM Marc   > wrote:
> 
> 
>   >
>   > 1) Why does ceph delete /etc/ceph/ceph.client.admin.keyring
> several
>   > times a
>   > day?
>   >
>   > 2) Why was it turned into a directory? It contains one file
>   > "ceph.client.admin.keyring.new". This then causes an error
> in the ceph
>   > logs
>   > when ceph tries to remove the file: "rm: cannot remove
>   > '/etc/ceph/ceph.client.admin.keyring': Is a directory".
>   >
> 
>   Are you using the ceph-csi driver? The ceph csi people just
> delete your existing ceph files and mount your root fs when you are not
> running the driver in a container. They seem to think that checking for
> files and validating parameters is not necessary.
> 

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MGR failures and pg autoscaler

2022-10-25 Thread Lo Re Giuseppe
I have found the logs showing the progress module failure:

debug 2022-10-25T05:06:08.877+ 7f40868e7700  0 [rbd_support INFO root] execute_trash_remove: task={"sequence": 150, "id": "fcc864a0-9bde-4512-9f84-be10976613db", "message": "Removing image fulen-hdd/f3f237d2f7e304 from trash", "refs": {"action": "trash remove", "pool_name": "fulen-hdd", "pool_namespace": "", "image_id": "f3f237d2f7e304"}, "in_progress": true, "progress": 0.0}
debug 2022-10-25T05:06:08.884+ 7f4106e90700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'progress' while running on mgr.naret-monitor03.escwyg: ('42efb95d-ceaa-4a91-a9b2-b91f65f1834d',)
debug 2022-10-25T05:06:08.884+ 7f4106e90700 -1 progress.serve:
debug 2022-10-25T05:06:08.897+ 7f4139e96700  0 log_channel(audit) log [DBG] : from='client.22182342 -' entity='client.combin' cmd=[{"format":"json","group_name":"combin","prefix":"fs subvolume info","sub_name":"combin-4b53e28d-2f59-11ed-8aa5-9aa9e2c5aae2","vol_name":"cephfs"}]: dispatch
debug 2022-10-25T05:06:08.884+ 7f4106e90700 -1 Traceback (most recent call last):
  File "/usr/share/ceph/mgr/progress/module.py", line 716, in serve
    self._process_pg_summary()
  File "/usr/share/ceph/mgr/progress/module.py", line 629, in _process_pg_summary
    ev = self._events[ev_id]
KeyError: '42efb95d-ceaa-4a91-a9b2-b91f65f1834d'




On 25.10.22, 09:58, "Lo Re  Giuseppe"  wrote:

Hi,
Since some weeks we started to us pg autoscale on our pools.
We run with version 16.2.7.
Maybe a coincidence, maybe not,  from some weeks we started to experience 
mgr progress module failures:

“””
    [root@naret-monitor01 ~]# ceph -s
      cluster:
        id:     63334166-d991-11eb-99de-40a6b72108d0
        health: HEALTH_ERR
                Module 'progress' has failed: ('346ee7e0-35f0-4fdf-960e-a36e7e2441e4',)
                1 pool(s) full

      services:
        mon: 3 daemons, quorum naret-monitor01,naret-monitor02,naret-monitor03 (age 5d)
        mgr: naret-monitor02.ciqvgv(active, since 6d), standbys: naret-monitor03.escwyg, naret-monitor01.suwugf
        mds: 1/1 daemons up, 2 standby
        osd: 760 osds: 760 up (since 4d), 760 in (since 4d); 10 remapped pgs
        rgw: 3 daemons active (3 hosts, 1 zones)

      data:
        volumes: 1/1 healthy
        pools:   32 pools, 6250 pgs
        objects: 977.79M objects, 3.6 PiB
        usage:   5.7 PiB used, 5.1 PiB / 11 PiB avail
        pgs:     4602612/5990777501 objects misplaced (0.077%)
                 6214 active+clean
                 25   active+clean+scrubbing+deep
                 10   active+remapped+backfilling
                 1    active+clean+scrubbing

      io:
        client:   243 MiB/s rd, 292 MiB/s wr, 1.68k op/s rd, 842 op/s wr
        recovery: 430 MiB/s, 109 objects/s

      progress:
        Global Recovery Event (14h)
          [===.] (remaining: 70s)
“””

In the mgr logs I see:
“””

    debug 2022-10-20T23:09:03.859+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 2 has overlapping roots: {-60, -1}
    debug 2022-10-20T23:09:03.863+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 3 has overlapping roots: {-60, -1, -2}
    debug 2022-10-20T23:09:03.866+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 5 has overlapping roots: {-60, -1, -2}
    debug 2022-10-20T23:09:03.870+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 6 has overlapping roots: {-60, -1, -2}
    debug 2022-10-20T23:09:03.873+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 10 has overlapping roots: {-105, -60, -1, -2}
    debug 2022-10-20T23:09:03.877+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 11 has overlapping roots: {-105, -60, -1, -2}
    debug 2022-10-20T23:09:03.880+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 12 has overlapping roots: {-105, -60, -1, -2}
    debug 2022-10-20T23:09:03.884+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 13 has overlapping roots: {-105, -60, -1, -2}
    debug 2022-10-20T23:09:03.887+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 14 has overlapping roots: {-105, -60, -1, -2}
    debug 2022-10-20T23:09:03.891+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 15 has overlapping roots: {-105, -60, -1, -2}
    debug 2022-10-20T23:09:03.894+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 26 has overlapping roots: {-105, -60, -1, -2}
    debug 2022-10-20T23:09:03.898+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 28 has overlapping roots: {-105, -60, -1, -2}
    debug 2022-10-20T23:09:03.901+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 29 has overlapping roots: {-105, -60, -1, -2}
    debug 2022-10-20T23:09:03.905+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 30 has overlapping roots: {-105, -60, -1, -2}

    ...
“””
Do you have any explanation/fix for this errors?
Regards,

Giuseppe

__

[ceph-users] changing alerts in cephadm (pacific) installed prometheus/alertmanager

2022-10-25 Thread Lasse Aagren
Hello,

Is it possible to change/remove any of the provided alerts? The only way we've
found so far is to change ceph_alerts.yml in the running containers, which
doesn't persist across redeploys.

Best regards,
Lasse Aagren
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why did ceph turn /etc/ceph/ceph.client.admin.keyring into a directory?

2022-10-25 Thread Martin Johansen
Could you explain? I have just deployed Ceph CSI just like the docs
specified. What mode is it running in if not container mode?

Best Regards,

Martin Johansen


On Tue, Oct 25, 2022 at 10:56 AM Marc  wrote:

> Wtf, unbelievable that it is still like this. You can't fix it, I had to
> fork and patch it because these @#$@#$@ ignored it. I don't know much about
> kubernetes I am running mesos. Can't you set/configure kubernetes to launch
> the driver in a container mode?
>
>
> >
> > How should we fix it? Should we remove the directory and add back the
> > keyring file?
> >
> >
> >
> > On Tue, Oct 25, 2022 at 9:45 AM Martin Johansen  >  > wrote:
> >
> >
> >   Yes, we are using the ceph-csi driver in a kubernetes cluster. Is
> > it that that is causing this?
> >
> >   Best Regards,
> >
> >
> >   Martin Johansen
> >
> >
> >   On Tue, Oct 25, 2022 at 9:44 AM Marc  >  > wrote:
> >
> >
> >   >
> >   > 1) Why does ceph delete
> /etc/ceph/ceph.client.admin.keyring
> > several
> >   > times a
> >   > day?
> >   >
> >   > 2) Why was it turned into a directory? It contains one
> file
> >   > "ceph.client.admin.keyring.new". This then causes an
> error
> > in the ceph
> >   > logs
> >   > when ceph tries to remove the file: "rm: cannot remove
> >   > '/etc/ceph/ceph.client.admin.keyring': Is a directory".
> >   >
> >
> >   Are you using the ceph-csi driver? The ceph csi people just
> > delete your existing ceph files and mount your root fs when you are not
> > running the driver in a container. They seem to think that checking for
> > files and validating parameters is not necessary.
> >
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RGW/S3 after a cluster is/was full

2022-10-25 Thread Ulrich Klein
Hi,

I have a problem with a full cluster and getting it back to a healthy state.
Fortunately it's a small test cluster with no valuable data in it.
It is used exclusively for RGW/S3, running 17.2.3.

I intentionally filled it up via rclone/S3 until it got into HEALTH_ERR, to see 
what would happen in that situation. 
At first it sort-of looks ok, as the cluster apparently goes into a read-only 
state. I can still get the stored data via S3.

But then there seems to be no way to get out of the full state. Via S3 one 
can't delete any objects or buckets.
Or did I miss anything? The requests just hang until they time out.

So, I used "rados rm -p   --force-full" to delete a bunch of those 
multipart corpses and other "old" objects.
That got the cluster back into HEALTH_OK.

But now the RGW gc seems to be screwed up:
# radosgw-admin gc list --include-all | grep oid | wc -l
158109
# radosgw-admin gc process --include-all
# radosgw-admin gc list --include-all | grep oid | wc -l
158109

I.e., it has 158109 objects to clean up, but doesn't clean up anything.
I guess that's because the objects it wants to collect don't exist anymore, but 
are in some index or other list.
Is there any way to reset or clean up?

I'd appreciate any hints.

Ciao, Uli

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Why did ceph turn /etc/ceph/ceph.client.admin.keyring into a directory?

2022-10-25 Thread E Taka
Question 1) makes me wonder too.

This results in errors:

2022-10-25T11:20:00.000109+0200 mon.ceph00 [INF] overall HEALTH_OK
2022-10-25T11:21:05.422793+0200 mon.ceph00 [WRN] Health check failed:
failed to probe daemons or devices (CEPHADM_REFRESH_FAILED)
2022-10-25T11:22:06.037456+0200 mon.ceph00 [INF] Health check cleared:
CEPHADM_REFRESH_FAILED (was: failed to probe daemons or devices)
2022-10-25T11:22:06.037491+0200 mon.ceph00 [INF] Cluster is now healthy
2022-10-25T11:30:00.71+0200 mon.ceph00 [INF] overall HEALTH_OK

I would like to stop this behavior. But how?

Am Di., 25. Okt. 2022 um 09:44 Uhr schrieb Marc :

> >
> > 1) Why does ceph delete /etc/ceph/ceph.client.admin.keyring several
> > times a
> > day?
> >
> > 2) Why was it turned into a directory? It contains one file
> > "ceph.client.admin.keyring.new". This then causes an error in the ceph
> > logs
> > when ceph tries to remove the file: "rm: cannot remove
> > '/etc/ceph/ceph.client.admin.keyring': Is a directory".
> >
>
> Are you using the ceph-csi driver? The ceph csi people just delete your
> existing ceph files and mount your root fs when you are not running the
> driver in a container. They seem to think that checking for files and
> validating parameters is not necessary.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] setting unique labels in cephadm installed (pacific) prometheus.yml

2022-10-25 Thread Lasse Aagren
Hello,

In our cephadm installed cluster (pacific) we are running two instances of
prometheus.

By altering the prometheus.yml.j2 template (ceph config-key set
mgr/cephadm/services/prometheus/prometheus.yml ...) we set the Prometheus
instances to remote-write to a corporate setup for long-term retention of metrics.

As we have two prometheus instances we need a way to create a unique label
to not have clashing data in the upstream setup.

So far we haven't been able to do that in this template:

https://github.com/ceph/ceph/blob/v16.2.10/src/pybind/mgr/cephadm/templates/services/prometheus/prometheus.yml.j2

other than something like:


  - target_label: prometheus_replica
replacement: 'prometheus-{{ range(1, 51) | random }}'


And keeping fingers crossed that "random" won't be the same two times in a
row.

Do anyone have a better idea on how to do this?

Best regards,
Lasse Aagren
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: setting unique labels in cephadm installed (pacific) prometheus.yml

2022-10-25 Thread Lasse Aagren
The context provided, when parsing the template:

https://github.com/ceph/ceph/blob/v16.2.10/src/pybind/mgr/cephadm/services/monitoring.py#L319-L331

doesn't seem to provide any per-host uniqueness.

On Tue, Oct 25, 2022 at 12:35 PM Lasse Aagren  wrote:

> Hello,
>
> In our cephadm installed cluster (pacific) we are running two instances of
> prometheus.
>
> Through altering the prometheus.yml.j2 template (ceph config-key set
> mgr/cephadm/services/prometheus/prometheus.yml ...) we set the prometheus'
> to remote write to a corporate setup for long time retention of metrics)
>
> As we have two prometheus instances we need a way to create a unique label
> to not have clashing data in the upstream setup.
>
> So far we haven't been able to do that in this template:
>
>
> https://github.com/ceph/ceph/blob/v16.2.10/src/pybind/mgr/cephadm/templates/services/prometheus/prometheus.yml.j2
>
> other than something like:
>
> 
>   - target_label: prometheus_replica
> replacement: 'prometheus-{{ range(1, 51) | random }}'
> 
>
> And keeping fingers crossed that "random" won't be the same two times in a
> row.
>
> Do anyone have a better idea on how to do this?
>
> Best regards,
> Lasse Aagren
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: setting unique labels in cephadm installed (pacific) prometheus.yml

2022-10-25 Thread Redouane Kachach Elhichou
Currently the generated template is the same for all the hosts and there's
no way to have a dedicated template for a specific host AFAIK.

On Tue, Oct 25, 2022 at 12:45 PM Lasse Aagren  wrote:

> The context provided, when parsing the template:
>
>
> https://github.com/ceph/ceph/blob/v16.2.10/src/pybind/mgr/cephadm/services/monitoring.py#L319-L331
>
> doesn't seem to provide any per host uniqueness
>
> On Tue, Oct 25, 2022 at 12:35 PM Lasse Aagren  wrote:
>
> > Hello,
> >
> > In our cephadm installed cluster (pacific) we are running two instances
> of
> > prometheus.
> >
> > Through altering the prometheus.yml.j2 template (ceph config-key set
> > mgr/cephadm/services/prometheus/prometheus.yml ...) we set the
> prometheus'
> > to remote write to a corporate setup for long time retention of metrics)
> >
> > As we have two prometheus instances we need a way to create a unique
> label
> > to not have clashing data in the upstream setup.
> >
> > So far we haven't been able to do that in this template:
> >
> >
> >
> https://github.com/ceph/ceph/blob/v16.2.10/src/pybind/mgr/cephadm/templates/services/prometheus/prometheus.yml.j2
> >
> > other than something like:
> >
> > 
> >   - target_label: prometheus_replica
> > replacement: 'prometheus-{{ range(1, 51) | random }}'
> > 
> >
> > And keeping fingers crossed that "random" won't be the same two times in
> a
> > row.
> >
> > Do anyone have a better idea on how to do this?
> >
> > Best regards,
> > Lasse Aagren
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Using multiple SSDs as DB

2022-10-25 Thread Christian
Thank you!

Robert Sander  schrieb am Fr. 21. Okt. 2022

> This is a bug in certain versions of ceph-volume:
>
> https://tracker.ceph.com/issues/56031
>
> It should be fixed in the latest releases.


For completeness' sake: the cluster is on 16.2.10.
The issue is resolved and marked as backported; 16.2.10 was released shortly
before the backport.
The fixed version for Pacific should be 16.2.11.

A partial workaround I found was limiting data_devices to 8 and db_devices
to 1. This resulted in correct DB usage for one DB device.
I then tried 16 data and 2 db devices: this did not work; it (would have) resulted in
8 extra OSDs with no DB device.
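
For reference, a sketch of the kind of OSD service spec used for this workaround
(the service_id, device filters and limits below are illustrative assumptions, not
the exact spec from this cluster):

$ cat > osd-spec.yaml <<'EOF'
service_type: osd
service_id: osd_hdd_with_db
placement:
  host_pattern: '*'
spec:
  data_devices:
    rotational: 1
    limit: 8
  db_devices:
    rotational: 0
    limit: 1
EOF
$ ceph orch apply -i osd-spec.yaml --dry-run   # preview the result before applying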

Best,
Christian
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cephadm container configurations

2022-10-25 Thread Mikhail Sidorov
Hello!

I am planning a ceph cluster and evaluating cephadm as an orchestration tool.
My cluster is going to be relatively small at the start, so I am planning
to run monitor daemons on the same nodes as the OSDs. But I wanted to provide
some QoS on memory and CPU resources, so I am wondering if it is possible
to set resource limits for the containers via cephadm? And if not, wouldn't
they be overwritten if I configure them some other way? What is the most
convenient way to do so?
Also I wanted to configure the containers to use jumbo frames and
preferably to use host networking to avoid additional overhead; is that
possible?

Best regards,
Michael
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm container configurations

2022-10-25 Thread Robert Gallop
Host networking is used by default as the network layer (no IP forwarding
requirement), so if your OS uses jumbo frames, your containers do too.

As for the resources, I'll let someone more knowledgeable answer that, but you can
certainly run MONs and OSDs on the same box assuming you have enough CPU
and memory. I have a small POC cluster where some nodes run MON, MGR,
and OSDs, and other nodes run MON, OSD and MDS; all works great with
192GB RAM and 48-core boxes.



On Tue, Oct 25, 2022 at 6:39 AM Mikhail Sidorov 
wrote:

> Hello!
>
> I am planning a ceph cluster and evaluating cephadm as an orchestration
> tool
> My cluster is going to be relatively small at the start, so I am planning
> to run monitor daemons on the same node as osd. But I wanted to provide
> some QoS on memory and cpu resources, so I am wondering if it is possible
> to set the resource limits for containers via cephadm? And if not, wouldn't
> they be overwritten, if I configure them some other way? What is the most
> convenient way to do so?
> Also I wanted to configure the containers to use jumbo frames and
> preferably to use host networking to avoid additional overhead, is ithat
> possible?
>
> Best regards,
> Michael
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] 1 pg stale, 1 pg undersized

2022-10-25 Thread Alexander Fiedler
Hello,

we run a ceph cluster with the following error which came up suddenly without 
any maintenance/changes:

HEALTH_WARN Reduced data availability: 1 pg stale; Degraded data redundancy: 1 
pg undersized

The PG in question is PG 25

Output of ceph pg dump_stuck stale:

PG_STAT  STATE UP  UP_PRIMARY  ACTING   
ACTING_PRIMARY
25.0 stale+active+undersized+remapped  []  -1  [66,64]  
66

Both acting OSDs and the mons+managers were rebooted. All OSDs in the cluster 
are up.

Do you have any idea why 1 PG is stuck?
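
For reference, the state can be inspected further with something like (read-only
commands, PG id taken from the dump above):

$ ceph pg map 25.0         # where the PG maps to right now
$ ceph pg 25.0 query       # detailed peering state (may hang while the PG is stale)
$ ceph osd pool ls detail  # size/min_size and crush rule of the affected pool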

Best regards

Alexander Fiedler


--
imos Gesellschaft fuer Internet-Marketing und Online-Services mbH
Alfons-Feifel-Str. 9 // D-73037 Goeppingen // Stauferpark Ost
Tel: 07161 93339- // Fax: 07161 93339-99 // Internet: www.imos.net

Registered in the commercial register of the Amtsgericht Ulm, HRB 532571
Represented by the managing directors Alfred and Rolf Wallender
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Temporary shutdown of subcluster and cephfs

2022-10-25 Thread Patrick Donnelly
On Tue, Oct 25, 2022 at 3:48 AM Frank Schilder  wrote:
>
> Hi Patrick,
>
> thanks for your answer. This is exactly the behaviour we need.
>
> For future reference some more background:
>
> We need to prepare a quite large installation for planned power outages. Even 
> though they are called planned, we will not be able to handle these manually 
> in good time for reasons irrelevant here. Our installation is protected by an 
> UPS, but the guaranteed uptime on outage is only 6 minutes. So, we talk more 
> about transient protection than uninterrupted power supply. Although we 
> survived more than 20 minute power outages without loss of power to the DC, 
> we need to plan with these 6 minutes.
>
> In these 6 minutes, we need to wait for at least 1-2 minutes to avoid 
> unintended shut-downs. In the remaining 4 minutes, we need to take down a 500 
> node HPC cluster and an 1000OSD+12MDS+2MON ceph sub-cluster. Part of this 
> ceph cluster will continue running on another site with higher power 
> redundancy. This gives maybe 1-2 minutes response time for the ceph cluster 
> and the best we can do is to try to achieve a "consistent at rest" state and 
> hope we can cleanly power down the system before the power is cut.
>
> Why am I so concerned about a "consistent at rest" state?
>
> Its because while not all instances of a power loss lead to data loss, all 
> instances of data loss I know of and were not caused by admin errors were 
> caused by a power loss (see https://tracker.ceph.com/issues/46847). We were 
> asked to prepare for a worst case of weekly power cuts, so no room for taking 
> too many chances here. Our approach is: unmount as much as possible, fail the 
> quickly FS to stop all remaining IO, give OSDs and MDSes a chance to flush 
> pending operations to disk or journal and then try a clean shut down.

To be clear in case there is any confusion: once you do `fs fail`, the
MDS are removed from the cluster and they will respawn. They are not
given any time to flush remaining I/O.

FYI as this may interest you: we have a ticket to set a flag on the
file system to prevent new client mounts:
https://tracker.ceph.com/issues/57090

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Cephadm container configurations

2022-10-25 Thread Adam King
If you're using a fairly recent cephadm version, there is the ability to
provide miscellaneous container arguments in the service spec
https://docs.ceph.com/en/quincy/cephadm/services/#extra-container-arguments.
This means you can have cephadm deploy each container in that service with,
for example, the --cpus and --memory flags that the podman/docker run command
provides (or any other flags the podman/docker run commands take), which I think
should allow you to accomplish the limiting.
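A minimal sketch of what that could look like (placement and limit values are just
illustrative assumptions, and this of course requires a cephadm version recent
enough to support extra_container_args):

$ cat > mon-spec.yaml <<'EOF'
service_type: mon
placement:
  count: 3
extra_container_args:
  - "--cpus=2"
  - "--memory=4g"
EOF
$ ceph orch apply -i mon-spec.yaml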

On Tue, Oct 25, 2022 at 8:39 AM Mikhail Sidorov 
wrote:

> Hello!
>
> I am planning a ceph cluster and evaluating cephadm as an orchestration
> tool
> My cluster is going to be relatively small at the start, so I am planning
> to run monitor daemons on the same node as osd. But I wanted to provide
> some QoS on memory and cpu resources, so I am wondering if it is possible
> to set the resource limits for containers via cephadm? And if not, wouldn't
> they be overwritten, if I configure them some other way? What is the most
> convenient way to do so?
> Also I wanted to configure the containers to use jumbo frames and
> preferably to use host networking to avoid additional overhead, is ithat
> possible?
>
> Best regards,
> Michael
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Temporary shutdown of subcluster and cephfs

2022-10-25 Thread Frank Schilder
Hi Patrick.

> To be clear in case there is any confusion: once you do `fs fail`, the
> MDS are removed from the cluster and they will respawn. They are not
> given any time to flush remaining I/O.

This is fine, there is not enough time to flush anything. As long as they leave 
the meta-data- and data pools in a consistent state, that is, after an "fs set 
 joinable true" the MDSes start replaying the journal etc. and the FS 
comes up healthy, everything is fine. If user IO in flight gets lost in this 
process, this is not a problem. A problem would be a corruption of the file 
system itself.

In my experience, an mds fail is a clean (non-destructive) operation. I have 
never had an FS corruption due to an mds fail. As long as an "fs fail" is also 
non-destructive, it is the best way I can see to cut off all user IO as fast as 
possible and bring all hardware to rest. What I would like to avoid is a power 
loss on a busy cluster where I would have to rely on too many things to be 
implemented correctly. With >800 disks you start seeing unusual firmware failures, 
and disk failures after power-up are not uncommon. I just want to take as 
much as possible out of the "does this really work in all corner cases" 
equation and rather rely on "I did this 100 times in the past without a 
problem" situations.

That users may have to repeat a task is not a problem. Damaging the file system 
itself, on the other hand, is.

Thanks and best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Patrick Donnelly 
Sent: 25 October 2022 14:51:33
To: Frank Schilder
Cc: Dan van der Ster; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: Temporary shutdown of subcluster and cephfs

On Tue, Oct 25, 2022 at 3:48 AM Frank Schilder  wrote:
>
> Hi Patrick,
>
> thanks for your answer. This is exactly the behaviour we need.
>
> For future reference some more background:
>
> We need to prepare a quite large installation for planned power outages. Even 
> though they are called planned, we will not be able to handle these manually 
> in good time for reasons irrelevant here. Our installation is protected by an 
> UPS, but the guaranteed uptime on outage is only 6 minutes. So, we talk more 
> about transient protection than uninterrupted power supply. Although we 
> survived more than 20 minute power outages without loss of power to the DC, 
> we need to plan with these 6 minutes.
>
> In these 6 minutes, we need to wait for at least 1-2 minutes to avoid 
> unintended shut-downs. In the remaining 4 minutes, we need to take down a 500 
> node HPC cluster and an 1000OSD+12MDS+2MON ceph sub-cluster. Part of this 
> ceph cluster will continue running on another site with higher power 
> redundancy. This gives maybe 1-2 minutes response time for the ceph cluster 
> and the best we can do is to try to achieve a "consistent at rest" state and 
> hope we can cleanly power down the system before the power is cut.
>
> Why am I so concerned about a "consistent at rest" state?
>
> Its because while not all instances of a power loss lead to data loss, all 
> instances of data loss I know of and were not caused by admin errors were 
> caused by a power loss (see https://tracker.ceph.com/issues/46847). We were 
> asked to prepare for a worst case of weekly power cuts, so no room for taking 
> too many chances here. Our approach is: unmount as much as possible, fail the 
> quickly FS to stop all remaining IO, give OSDs and MDSes a chance to flush 
> pending operations to disk or journal and then try a clean shut down.

To be clear in case there is any confusion: once you do `fs fail`, the
MDS are removed from the cluster and they will respawn. They are not
given any time to flush remaining I/O.

FYI as this may interest you: we have a ticket to set a flag on the
file system to prevent new client mounts:
https://tracker.ceph.com/issues/57090

--
Patrick Donnelly, Ph.D.
He / Him / His
Principal Software Engineer
Red Hat, Inc.
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Large OMAP Objects & Pubsub

2022-10-25 Thread Alex Hussein-Kershaw (HE/HIM)
Hi All,

Looking to get some advice on an issue my clusters have been suffering from. 
I realize there is a lot of text below. Thanks in advance for your consideration.

The cluster has a health warning of "32 large omap objects". It's persisted for 
several months.

It appears functional and there are no indications of a performance problem at 
the client for now (no slow ops - everything seems to work fine). It is a 
multisite cluster with CephFS & S3 in use, as well as pubsub. It is running 
Ceph version 15.2.13.

We run automated client load tests against this system every day and have been 
doing so for a year or longer. The key counts of the 
large OMAP objects in question are growing; I've monitored this over a period 
of several months. Intuitively I gather this means that at some point in the future 
I will hit performance problems as a result.

Large OMAP objects are split across two pools: siteApubsub.rgw.log and 
siteApubsub.rgw.buckets.index. My client is responsible for processing the 
pubsub queue. It appears to be doing that successfully: there are no objects in 
the pubsub data pool as shown in the details below.

I've been keeping a spreadsheet to track the growth of these, assuming I can't 
attach a file to the mailing list so I've uploaded an image of it here: 
https://imgur.com/a/gAtAcvp. The data shows constant growth of all of these 
objects through the last couple of months. It also includes the names of the 
objects, where there are two categories:

  *   16 instances of objects with names like: 9:03d18f4d:::data_log.47:head
  *   16 instances of objects with names like: 
13:0118e6b8:::.dir.4f442377-4b71-4c6a-aaa9-ba945d7694f8.84778.1.15:head

Please find output of a few Ceph commands below giving details of the cluster.

  *   I'm really keen to understand this better and would be more than happy to 
share additional diags.
  *   I'd like to understand what I need to do to remove these large OMAP 
objects and prevent future build ups, so I don't need to worry about the 
stability of this system.
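
For reference, a few read-only commands that show what is accumulating (pool and
object names taken from the health details below; this is only a diagnostic
sketch, not a fix):

$ rados -p siteApubsub.rgw.log listomapkeys data_log.47 | wc -l   # key count of one flagged object
$ radosgw-admin datalog status                # per-shard markers of the multisite data log
$ radosgw-admin reshard stale-instances list  # leftover bucket index instances, if any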

Thanks,
Alex


$ ceph -s
id: 0b91b8be-3e01-4240-bea5-df01c7e53b7c
health: HEALTH_WARN
32 large omap objects

  services:
mon: 3 daemons, quorum albans_sc0,albans_sc1,albans_sc2 (age 6w)
mgr: albans_sc2(active, since 6w), standbys: albans_sc1, albans_sc0
mds: cephfs:1 {0=albans_sc2=up:active} 2 up:standby
osd: 3 osds: 3 up (since 6w), 3 in (since 10M)
rgw: 6 daemons active (albans_sc0.pubsub, albans_sc0.rgw0, 
albans_sc1.pubsub, albans_sc1.rgw0, albans_sc2.pubsub, albans_sc2.rgw0)

  task status:

  data:
pools:   14 pools, 137 pgs
objects: 4.52M objects, 160 GiB
usage:   536 GiB used, 514 GiB / 1.0 TiB avail
pgs: 137 active+clean

  io:
client:   28 MiB/s rd, 1.2 MiB/s wr, 673 op/s rd, 189 op/s wr


$ ceph health detail
HEALTH_WARN 32 large omap objects
[WRN] LARGE_OMAP_OBJECTS: 32 large omap objects
16 large objects found in pool 'siteApubsub.rgw.log'
16 large objects found in pool 'siteApubsub.rgw.buckets.index'
Search the cluster log for 'Large omap object found' for more details.

$ ceph df
--- RAW STORAGE ---
CLASS  SIZE AVAILUSED RAW USED  %RAW USED
ssd1.0 TiB  514 GiB  496 GiB   536 GiB  51.07
TOTAL  1.0 TiB  514 GiB  496 GiB   536 GiB  51.07

--- POOLS ---
POOL   ID  PGS  STORED   OBJECTS  USED %USED  MAX 
AVAIL
device_health_metrics   11  0 B0  0 B  0153 
GiB
cephfs_data 2   32  135 GiB1.99M  415 GiB  47.50153 
GiB
cephfs_metadata 3   32  3.3 GiB2.09M  9.8 GiB   2.09153 
GiB
siteA.rgw.buckets.data  4   32   24 GiB  438.62k   80 GiB  14.88153 
GiB
.rgw.root   54   19 KiB   29  1.3 MiB  0153 
GiB
siteA.rgw.log   64   79 MiB  799  247 MiB   0.05153 
GiB
siteA.rgw.control   74  0 B8  0 B  0153 
GiB
siteA.rgw.meta  84   13 KiB   37  1.6 MiB  0153 
GiB
siteApubsub.rgw.log 94  1.9 GiB  789  5.7 GiB   1.22153 
GiB
siteA.rgw.buckets.index104  456 MiB   31  1.3 GiB   0.29153 
GiB
siteApubsub.rgw.control114  0 B8  0 B  0153 
GiB
siteApubsub.rgw.meta   124   11 KiB   40  1.7 MiB  0153 
GiB
siteApubsub.rgw.buckets.index  134  2.0 GiB   47  6.1 GiB   1.31153 
GiB
siteApubsub.rgw.buckets.data   144  0 B0  0 B  0153 
GiB





___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MGR failures and pg autoscaler

2022-10-25 Thread Andreas Haupt
Hi Giuseppe,

On Tue, 2022-10-25 at 07:54 +, Lo Re  Giuseppe wrote:
> “””
> 
> In the mgr logs I see:
> “””
> 
> debug 2022-10-20T23:09:03.859+ 7fba5f300700  0 [pg_autoscaler ERROR root] 
> pool 2 has overlapping roots: {-60, -1}

This is unrelated, I asked the same question some days ago:

https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/message/OZTOVT2TXEA23NI2TPTWD3WU2AZM6YSH/

Starting with Pacific the autoscaler is unable to deal with mixed pools
spread over different storage device classes. Although this is documented,
I'd call it a regression - the same kind of setup still worked with
autoscaler in Octopus.

You will find the overlapping roots by listing the device-class-based
shadow entries:

ceph osd crush tree --show-shadow
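
One common way to get rid of the overlapping roots is to give every pool a CRUSH
rule restricted to a single device class, so that each pool maps to exactly one
(shadow) root. A sketch, assuming a replicated pool that should live entirely on
SSDs (rule and pool names are placeholders):

ceph osd crush rule create-replicated replicated-ssd default host ssd
ceph osd pool set <pool> crush_rule replicated-ssd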


Regarding your problem, you need to look for further errors. Last time an
mgr module failed here it was due to some missing python modules ...

Something suspicious in the output of "ceph crash ls" ?

Cheers,
Andreas
-- 
| Andreas Haupt| E-Mail: andreas.ha...@desy.de
|  DESY Zeuthen| WWW:http://www-zeuthen.desy.de/~ahaupt
|  Platanenallee 6 | Phone:  +49/33762/7-7359
|  D-15738 Zeuthen | Fax:+49/33762/7-7216

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MGR process regularly not responding

2022-10-25 Thread Eugen Block

Hi,

I see the same on different Nautilus clusters; I was pointed to this  
tracker issue: https://tracker.ceph.com/issues/39264
In one cluster disabling the prometheus module seemed to have stopped  
the failing MGRs. But they happen so rarely that it might be something  
different and we just didn't wait long enough. So it seems to be a  
recurring issue; you could try to see if it occurs with the  
prometheus mgr module disabled, if you use it, of course.
Just two days ago we had the same thing in another cluster where the  
prometheus module is disabled, so there it might be something else  
with similar symptoms.


Regards,
Eugen

Quoting Gilles Mocellin :


Hi,

In our Ceph Pacific clusters (16.2.10) (1 for OpenStack and S3, 2  
for backup on RBD and S3),
since the upgrade to Pacific, we have regularly the MGR not  
responding, not seen anymore in ceph status.

The process is still there.
Noting in the MGR log, just no more logs.

Restarting the service make it come back.

When all MGR are down, we have a warning in ceph status, but not before.

I can't find a similar bug in the Tracker.

Does someone also have that symptom ?
Do you have a workaround or solution ?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] post-mortem of a ceph disruption

2022-10-25 Thread Simon Oosthoek

Dear list,

recently we experienced a short outage of our ceph storage. It had a 
surprising cause and probably indicates a subtle misconfiguration on 
our part; I'm hoping for a useful suggestion ;-)


We are running a 3PB cluster with 21 osd nodes (spread across 3 
datacenters), 3 mon/mgrs and 2mds nodes. Currently we are on octopus 
15.2.16 (will upgrade to .17 soon).
The cluster has a single network interface (most are a bond) with 
25Gbit/s. The physical nodes are all Dell AMD EPYC hardware.


The "cluster network" and "public network" configurations in 
/etc/ceph/ceph.conf were all set to 0.0.0.0/0 since we only have a 
single interface for all Ceph nodes (or so we thought...)


Our nodes are managed using cfengine3 (community), though we avoid 
package upgrades during normal operation. New packages are installed 
though, if commanded by cfengine.


Last Sunday at around 23:05 (local time) we experienced a short network 
glitch (an MLAG link lost one sublink for 4 seconds). Our logs show 
that it should have been relatively painless, since the peer-link took 
over and after 4s the MLAG went back to FULL mode. However, it seems a 
lot of ceph-osd services restarted or re-connected to the network, 
failed to find the other OSDs, and consequently shut themselves 
down. Shortly after this happened, the Ceph services became unavailable 
due to not enough OSD nodes being up, so services of ours depending on Ceph 
became unavailable as well.


At this point I was able to start trying to fix it: I tried rebooting a 
ceph osd machine and also tried restarting just the osd services on the 
nodes. Both seemed to work, and I could soon turn in when all was well again.


When trying to understand what had happened, we obviously suspected all 
kinds of unrelated things (the ceph logs are way too noisy to quickly 
get to the point), but one message, "osd.54 662927 set_numa_affinity unable 
to identify public interface '' numa node: (2) No such file or 
directory", turned out, after some googling, to be more important than we 
first thought. 
(https://forum.proxmox.com/threads/ceph-set_numa_affinity-unable-to-identify-public-interface.58239/)


We couldn't understand why the network glitch could cause such a massive 
die-off of ceph-osd services.
Assuming that sooner or later we were going to need some help 
with this, it seemed a good idea to first update the 
nodes to the latest supported release of ceph, so we started the 
upgrade to 15.2.17 today.


The upgrade of the 2 virtual and 1 physical mon went OK, and the first 
osd node was fine as well. But on the second osd node, the osd services would 
not keep running after the upgrade + reboot.


Again we noticed this numa message, but now 6 times in a row and then 
the nice: "_committed_osd_maps marked down 6 > osd_max_markdown_count 5 
in last 600.00 seconds, shutting down"

and
"received  signal: Interrupt from Kernel"

At this point, one of us noticed that a strange IP address was mentioned: 
169.254.0.2. It turns out that a recently added package (openmanage) and 
some configuration had added this interface and address to the Dell hardware 
nodes. For us, the single-interface assumption is now out the 
window, and 0.0.0.0/0 is a bad idea in /etc/ceph/ceph.conf for the public and 
cluster network (even though it's the same network for us).


Our 3 datacenters are on three different subnets so it becomes a bit 
difficult to make it more specific. The nodes are all under the same 
/16, so we can choose that, but it is starting to look like a weird 
network setup.
I've always thought that this configuration was kind of non-intuitive 
and I still do. And now it has bitten us :-(
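
One way to make it more specific would be to list the per-datacenter subnets
explicitly (comma-separated) instead of 0.0.0.0/0, either in ceph.conf or
centrally; the subnets below are placeholders, not our real ones:

ceph config set global public_network "192.168.1.0/24,192.168.2.0/24,192.168.3.0/24"
ceph config set global cluster_network "192.168.1.0/24,192.168.2.0/24,192.168.3.0/24"
# daemons pick up the new setting on restart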



Thanks for reading and if you have any suggestions on how to fix/prevent 
this kind of error, we'll be glad to hear it!


Cheers

/Simon
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: What is the use case of your Ceph cluster? Developers want to know!

2022-10-25 Thread Laura Flores
Reminder that the survey will close this Friday, October 28th at 10:00pm
UTC. Thanks to everyone who has submitted responses!

On Thu, Oct 20, 2022 at 10:25 AM Laura Flores  wrote:

> Dear Ceph Users,
>
> Ceph developers and doc writers are looking for responses from people in
> the user/dev community who have experience with a Ceph cluster. Our
> question: *What is the use case of your Ceph cluster?*
>
> Since the first official Argonaut release in 2012, Ceph has greatly
> expanded its features and user base. With the next major release on the
> horizon, developers are now more curious than ever to know how people are
> using their clusters in the wild.
>
> Our goal is to share these insightful results with the community, as well
> as make it easy for beginning developers (e.g. students from Google
> Summer of Code, Outreachy, or Grace Hopper) to understand all the ways
> that Ceph can be used.
>
> We plan to add interesting use cases to our website [1] and/or documentation [2].
>
> In completing this survey, you'll have the option of providing your name
> or remaining anonymous. If your use case is chosen to include on the
> website or documentation, we will be sure to honor your choice of being
> recognized or remaining anonymous.
>
> Follow this link [3] to begin the survey. Feel free to reach out to me with any questions!
>
> - Laura Flores
>
> 1. Ceph website: https://ceph.io/
> 2. Ceph documentation: https://docs.ceph.com/en/latest/
> 3. Survey link:
> https://docs.google.com/forms/d/e/1FAIpQLSceR8i2vmjdL34hbkhqyU5dAJjZKzjVokx2rI4sB2n1Q0fHKA/viewform?usp=sf_link
>
> --
>
> Laura Flores
>
> She/Her/Hers
>
> Software Engineer, Ceph Storage
>
> Red Hat Inc. 
>
> Chicago, IL
>
> lflo...@redhat.com
> M: +17087388804
> 
> 
>
>

-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage

Red Hat Inc. 

Chicago, IL

lflo...@redhat.com
M: +17087388804


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MGR process regularly not responding

2022-10-25 Thread Gilles Mocellin
Thank you,

Indeed, the last messages I have in the logs are Prometheus accesses.

But I use Prometheus (for the Ceph Dashboard and other Grafana dashboards), so I 
can't really deactivate it for days...

One thing I can try instead of restarting the MGR is to disable/enable the 
Prometheus module, and perhaps other modules too...

But it would be great to have more logs.

Ah, one other thing I saw: the process still has many TCP connections (checked 
with lsof), but they are in CLOSE_WAIT state.
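
For the record, roughly what I plan to try; a sketch only, the mgr PID below 
is a placeholder:

ceph mgr module disable prometheus
ceph mgr module enable prometheus
ceph mgr module ls                              # confirm which modules are enabled
# check for lingering sockets of the active mgr process (PID is a placeholder)
lsof -nP -p 12345 -a -i | grep CLOSE_WAIT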


On Tuesday, 25 October 2022 at 16:03:43 CEST, Eugen Block wrote:
> Hi,
> 
> I see the same on different Nautilus clusters, I was pointed to this
> tracker issue: https://tracker.ceph.com/issues/39264
> In one cluster disabling the prometheus module seemed to have stopped
> the failing MGRs. But they happen so rarely that it might be something
> different and we just didn't wait long enough. So it seems to be a
> recurring issue; you could try to see whether it still occurs with the
> prometheus mgr module disabled, if you use it, of course.
> Just two days ago we had the same thing in another cluster where the
> prometheus module is disabled, so there it might be something else
> just with similar symptoms.
> 
> Regards,
> Eugen
> 
> Quoting Gilles Mocellin :
> > Hi,
> > 
> > In our Ceph Pacific clusters (16.2.10) (1 for OpenStack and S3, 2
> > for backup on RBD and S3),
> > since the upgrade to Pacific, we have regularly the MGR not
> > responding, not seen anymore in ceph status.
> > The process is still there.
> > Noting in the MGR log, just no more logs.
> > 
> > Restarting the service make it come back.
> > 
> > When all MGR are down, we have a warning in ceph status, but not before.
> > 
> > I can't find a similar bug in the Tracker.
> > 
> > Does someone also have that symptom ?
> > Do you have a workaround or solution ?
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: MGR failures and pg autoscaler

2022-10-25 Thread Satoru Takeuchi
Hi Lo,

On Tuesday, 25 October 2022 at 18:01, Lo Re Giuseppe wrote:
>
> I have found the logs showing the progress module failure:
>
> debug 2022-10-25T05:06:08.877+ 7f40868e7700  0 [rbd_support INFO root] execute_trash_remove: task={"sequence": 150, "id": "fcc864a0-9bde-4512-9f84-be10976613db", "message": "Removing image fulen-hdd/f3f237d2f7e304 from trash", "refs": {"action": "trash remove", "pool_name": "fulen-hdd", "pool_namespace": "", "image_id": "f3f237d2f7e304"}, "in_progress": true, "progress": 0.0}
> debug 2022-10-25T05:06:08.884+ 7f4106e90700 -1 log_channel(cluster) log [ERR] : Unhandled exception from module 'progress' while running on mgr.naret-monitor03.escwyg: ('42efb95d-ceaa-4a91-a9b2-b91f65f1834d',)
> debug 2022-10-25T05:06:08.884+ 7f4106e90700 -1 progress.serve:
> debug 2022-10-25T05:06:08.897+ 7f4139e96700  0 log_channel(audit) log [DBG] : from='client.22182342 -' entity='client.combin' cmd=[{"format":"json","group_name":"combin","prefix":"fs subvolume info","sub_name":"combin-4b53e28d-2f59-11ed-8aa5-9aa9e2c5aae2","vol_name":"cephfs"}]: dispatch
> debug 2022-10-25T05:06:08.884+ 7f4106e90700 -1 Traceback (most recent call last):
>   File "/usr/share/ceph/mgr/progress/module.py", line 716, in serve
>     self._process_pg_summary()
>   File "/usr/share/ceph/mgr/progress/module.py", line 629, in _process_pg_summary
>     ev = self._events[ev_id]
> KeyError: '42efb95d-ceaa-4a91-a9b2-b91f65f1834d'

I encountered a similar problem and reported this to this ML.

https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/Q7A3TM6Z3XMRJPRBSHWGGACR653ICWXT/

I guess that you have multiple CRUSH rules and at least one pool uses the
default root.
I'm not sure about the details of your question, but I hope this information
helps you.
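
If it helps, a quick sketch of how to check that (plain ceph CLI, nothing
cluster-specific assumed): look at which CRUSH rule each pool uses and which
root each rule takes.

ceph osd pool ls detail        # shows the crush_rule assigned to each pool
ceph osd crush rule dump       # the 'take' step shows which root each rule starts from
ceph osd crush tree            # shows the bucket hierarchy, including all roots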

Thanks,
Satoru

>
>
>
>
> On 25.10.22, 09:58, "Lo Re  Giuseppe"  wrote:
>
> Hi,
> A few weeks ago we started to use the pg autoscaler on our pools.
> We run version 16.2.7.
> Maybe a coincidence, maybe not, but since around the same time we have been
> experiencing mgr progress module failures:
>
> “””
> [root@naret-monitor01 ~]# ceph -s
>   cluster:
>     id:     63334166-d991-11eb-99de-40a6b72108d0
>     health: HEALTH_ERR
>             Module 'progress' has failed: ('346ee7e0-35f0-4fdf-960e-a36e7e2441e4',)
>             1 pool(s) full
>
>   services:
>     mon: 3 daemons, quorum naret-monitor01,naret-monitor02,naret-monitor03 (age 5d)
>     mgr: naret-monitor02.ciqvgv(active, since 6d), standbys: naret-monitor03.escwyg, naret-monitor01.suwugf
>     mds: 1/1 daemons up, 2 standby
>     osd: 760 osds: 760 up (since 4d), 760 in (since 4d); 10 remapped pgs
>     rgw: 3 daemons active (3 hosts, 1 zones)
>
>   data:
>     volumes: 1/1 healthy
>     pools:   32 pools, 6250 pgs
>     objects: 977.79M objects, 3.6 PiB
>     usage:   5.7 PiB used, 5.1 PiB / 11 PiB avail
>     pgs:     4602612/5990777501 objects misplaced (0.077%)
>              6214 active+clean
>              25   active+clean+scrubbing+deep
>              10   active+remapped+backfilling
>              1    active+clean+scrubbing
>
>   io:
>     client:   243 MiB/s rd, 292 MiB/s wr, 1.68k op/s rd, 842 op/s wr
>     recovery: 430 MiB/s, 109 objects/s
>
>   progress:
>     Global Recovery Event (14h)
>       [===.] (remaining: 70s)
> “””
>
> In the mgr logs I see:
> “””
>
> debug 2022-10-20T23:09:03.859+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 2 has overlapping roots: {-60, -1}
> debug 2022-10-20T23:09:03.863+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 3 has overlapping roots: {-60, -1, -2}
> debug 2022-10-20T23:09:03.866+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 5 has overlapping roots: {-60, -1, -2}
> debug 2022-10-20T23:09:03.870+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 6 has overlapping roots: {-60, -1, -2}
> debug 2022-10-20T23:09:03.873+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 10 has overlapping roots: {-105, -60, -1, -2}
> debug 2022-10-20T23:09:03.877+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 11 has overlapping roots: {-105, -60, -1, -2}
> debug 2022-10-20T23:09:03.880+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 12 has overlapping roots: {-105, -60, -1, -2}
> debug 2022-10-20T23:09:03.884+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 13 has overlapping roots: {-105, -60, -1, -2}
> debug 2022-10-20T23:09:03.887+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 14 has overlapping roots: {-105, -60, -1, -2}
> debug 2022-10-20T23:09:03.891+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 15 has overlapping roots: {-105, -60, -1, -2}
> debug 2022-10-20T23:09:03.894+ 7fba5f300700  0 [pg_autoscaler ERROR root] pool 26 has

[ceph-users] Statefull set usage with ceph storage class

2022-10-25 Thread Oğuz Yarımtepe
Hi,

I am wondering whether there are good or bad experiences using Ceph together
with application-level replication, e.g. MongoDB or Kafka on Kubernetes. Is
it a bad idea to define separate pools and storage classes for such
applications and set the replication factor to 1?
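
To make the question more concrete, this is roughly the kind of setup I mean;
a sketch only, pool name and PG counts are made up, and as far as I understand
recent releases also require mon_allow_pool_size_one plus
--yes-i-really-mean-it before a size of 1 is accepted:

ceph osd pool create kafka-data 64 64 replicated
ceph config set global mon_allow_pool_size_one true
ceph osd pool set kafka-data size 1 --yes-i-really-mean-it
# a dedicated Kubernetes StorageClass would then point at this pool via the CSI driver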

Regards.
-- 
Oğuz Yarımtepe
http://about.me/oguzy
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io