[ceph-users] Re: Quincy recovery load

2022-07-25 Thread Satoru Takeuchi
I'm trying to upgrade my Pacific cluster to Quincy and found this
thread. Let me confirm a few things.

- Does this problem not exist in Pacific and older versions?
- Does this problem happen only if `osd_op_queue=mclock_scheduler`?
- Do all parameters written in the OPERATIONS section not work if
mclock scheduler is used?
  
https://docs.ceph.com/en/pacific/rados/configuration/osd-config-ref/#operations

Thanks,
Satoru

2022年7月22日(金) 12:33 Sridhar Seshasayee :
>
> On Fri, Jul 22, 2022 at 12:47 AM Sridhar Seshasayee 
> wrote:
>
> > I forgot to mention that the charts show CPU utilization when both client
> > ops and recoveries are going on. The steep drop in CPU utilization is when
> > client ops are stopped but recoveries are still going on.
> >
>
> It looks like the charts were filtered out. In case you wish to see the
> charts, they have
> been uploaded to the PR as part of the following comment:
>
> https://github.com/ceph/ceph/pull/47216#issuecomment-1192141530
>
> -Sridhar
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph health "overall_status": "HEALTH_WARN"

2022-07-25 Thread Konstantin Shalygin
Hi,

Mimic has many HEALTH quirks like this.
Mimic has been EOL for years; I suggest you upgrade to at least Nautilus 14.2.22.
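
If you need to script against this output in the meantime, note that
"overall_status" is the deprecated pre-Luminous field; the field to read is
"health.status". A minimal check (a sketch, assuming jq is available):

ceph status -f json | jq -r '.health.status'
ceph health detail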


k

> On 25 Jul 2022, at 11:45, Frank Schilder  wrote:
> 
> Hi all,
> 
> I made a strange observation on our cluster. The command ceph status -f 
> json-pretty returns at the beginning
> 
>"health": {
>"checks": {},
>"status": "HEALTH_OK",
>"overall_status": "HEALTH_WARN"
>},
> 
> I'm a bit worried about what "overall_status": "HEALTH_WARN" could mean in 
> this context. I can't seem to find any more info about that. Ceph health 
> detail returns HEALTH_OK.
> 
> Any hint is welcome. version is mimic 13.2.10.
> 
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph health "overall_status": "HEALTH_WARN"

2022-07-25 Thread Monish Selvaraj
Hi all,

Recently I deployed Ceph Pacific with ceph orch on my nodes, with 5 mons, 5
mgrs, 238 OSDs and 5 RGWs.

Yesterday, 4 OSDs went out and 2 RGWs went down, so I restarted all the RGWs
with "ceph orch restart rgw.rgw". After two minutes, all the RGW nodes went
down.

Then I brought the 4 OSDs back up and waited for the cluster to become
HEALTH_OK. The RGW services are up and running and the port is bound, but
they never finish initializing (see the logs below).

RGW logs :

root@cephr04:/var/log/ceph/0df2c8fe-fdf1-11ec-9713-b175dcec685a# tail -800
ceph-client.rgw.rgw.cephr04.wpfaui.log
2022-07-25T06:06:37.528+ 7f89423035c0 0 deferred set uid:gid to 167:167
(ceph:ceph)
2022-07-25T06:06:37.528+ 7f89423035c0 0 ceph version 16.2.9
(4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable), process
radosgw, pid 6
2022-07-25T06:06:37.528+ 7f89423035c0 0 framework: beast
2022-07-25T06:06:37.528+ 7f89423035c0 0 framework conf key: port, val:
80
2022-07-25T06:06:37.528+ 7f89423035c0 1 radosgw_Main not setting numa
affinity
2022-07-25T06:11:37.529+ 7f892d9be700 -1 Initialization timeout, failed
to initialize
2022-07-25T06:11:47.841+ 7fae36b985c0 0 deferred set uid:gid to 167:167
(ceph:ceph)
2022-07-25T06:11:47.841+ 7fae36b985c0 0 ceph version 16.2.9
(4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable), process
radosgw, pid 7
2022-07-25T06:11:47.841+ 7fae36b985c0 0 framework: beast
2022-07-25T06:11:47.841+ 7fae36b985c0 0 framework conf key: port, val:
80
2022-07-25T06:11:47.841+ 7fae36b985c0 1 radosgw_Main not setting numa
affinity
2022-07-25T06:16:47.842+ 7fae22253700 -1 Initialization timeout, failed
to initialize
2022-07-25T06:16:58.114+ 7fb4bac385c0 0 deferred set uid:gid to 167:167
(ceph:ceph)
2022-07-25T06:16:58.114+ 7fb4bac385c0 0 ceph version 16.2.9
(4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable), process
radosgw, pid 7
2022-07-25T06:16:58.114+ 7fb4bac385c0 0 framework: beast
2022-07-25T06:16:58.114+ 7fb4bac385c0 0 framework conf key: port, val:
80
2022-07-25T06:16:58.114+ 7fb4bac385c0 1 radosgw_Main not setting numa
affinity
2022-07-25T06:21:58.111+ 7fb4a62f3700 -1 Initialization timeout, failed
to initialize
2022-07-25T06:22:08.359+ 7f4b33dbd5c0 0 deferred set uid:gid to 167:167
(ceph:ceph)
2022-07-25T06:22:08.359+ 7f4b33dbd5c0 0 ceph version 16.2.9
(4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable), process
radosgw, pid 7
2022-07-25T06:22:08.359+ 7f4b33dbd5c0 0 framework: beast
2022-07-25T06:22:08.359+ 7f4b33dbd5c0 0 framework conf key: port, val:
80
2022-07-25T06:22:08.359+ 7f4b33dbd5c0 1 radosgw_Main not setting numa
affinity
2022-07-25T06:25:03.189+ 7fa6920085c0 0 deferred set uid:gid to 167:167
(ceph:ceph)
2022-07-25T06:25:03.189+ 7fa6920085c0 0 ceph version 16.2.9
(4c3647a322c0ff5a1dd2344e039859dcbd28c830) pacific (stable), process
radosgw, pid 7
2022-07-25T06:25:03.189+ 7fa6920085c0 0 framework: beast
2022-07-25T06:25:03.189+ 7fa6920085c0 0 framework conf key: port, val:
80
2022-07-25T06:25:03.189+ 7fa6920085c0 1 radosgw_Main not setting numa
affinity

Environment:

OS Ubuntu 20.04
Kernel 5.4.0-122-generic
Docker version 20.10.17
Ceph version ceph version 16.2.9 (4c3647a322c0ff5a1dd2344e039859dcbd28c830)
pacific (stable)
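
For what it's worth, the five-minute gap between each start and the
"Initialization timeout" line matches the default rgw_init_timeout of 300
seconds, which usually means radosgw cannot complete I/O against its pools
(for example while PGs are inactive or OSDs are down). A sketch of checks
along those lines, assuming the daemon name shown in the log above:

ceph pg stat                        # inactive/down PGs will stall radosgw startup
ceph osd pool ls detail | grep rgw  # confirm the RGW pools are present
ceph config set client.rgw.rgw.cephr04.wpfaui rgw_init_timeout 600  # only buys time; the PGs still have to recover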

On Mon, Jul 25, 2022 at 2:54 PM Konstantin Shalygin  wrote:

> Hi,
>
> Mimic has many HEALTH quirks like this.
> Mimic has been EOL for years; I suggest you upgrade to at least Nautilus
> 14.2.22.
>
>
> k
>
> > On 25 Jul 2022, at 11:45, Frank Schilder  wrote:
> >
> > Hi all,
> >
> > I made a strange observation on our cluster. The command ceph status -f
> json-pretty returns at the beginning
> >
> >"health": {
> >"checks": {},
> >"status": "HEALTH_OK",
> >"overall_status": "HEALTH_WARN"
> >},
> >
> > I'm a bit worried about what "overall_status": "HEALTH_WARN" could mean
> in this context. I can't seem to find any more info about that. Ceph health
> detail returns HEALTH_OK.
> >
> > Any hint is welcome. version is mimic 13.2.10.
> >
> > Best regards,
> > =
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Quincy recovery load

2022-07-25 Thread Sridhar Seshasayee
On Mon, Jul 25, 2022 at 2:05 PM Satoru Takeuchi 
wrote:

>
> - Does this problem not exist in Pacific and older versions?
>
This problem does not exist in Pacific and prior versions. On Pacific, the
default osd_op_queue
is set to 'wpq'  and so this issue is not observed.

> - Does this problem happen only if `osd_op_queue=mclock_scheduler`?
>
Yes, that's correct.

> - Do all parameters written in the OPERATIONS section not work if
> mclock scheduler is used?
>
> https://docs.ceph.com/en/pacific/rados/configuration/osd-config-ref/#operations
>
Only the sleep and priority related parameters will not work if mclock
scheduler is used.
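
For anyone who wants to double-check which scheduler their OSDs are actually
running, a minimal sketch (osd.0 is just an example id):

ceph config get osd osd_op_queue           # cluster-wide default
ceph config show osd.0 osd_op_queue        # value the running daemon is using
ceph config show osd.0 osd_mclock_profile  # active mclock profile, if any
# switching back to wpq is possible, but only takes effect after an OSD restart:
# ceph config set osd osd_op_queue wpq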

-Sridhar


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Quincy recovery load

2022-07-25 Thread Satoru Takeuchi
2022年7月25日(月) 18:45 Sridhar Seshasayee :

>
>
> On Mon, Jul 25, 2022 at 2:05 PM Satoru Takeuchi 
> wrote:
>
>>
>> - Does this problem not exist in Pacific and older versions?
>>
> This problem does not exist in Pacific and prior versions. On Pacific, the
> default osd_op_queue
> is set to 'wpq'  and so this issue is not observed.
>
> - Does this problem happen only if `osd_op_queue=mclock_scheduler`?
>>
> Yes, that's correct.
>
> - Do all parameters written in the OPERATIONS section not work if
>> mclock scheduler is used?
>>
>> https://docs.ceph.com/en/pacific/rados/configuration/osd-config-ref/#operations
>>
>> Only the sleep and priority related parameters will not work if mclock
> scheduler is used.
>
> -Sridhar
>

Thank you very much!

Satoru


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Issues after a shutdown

2022-07-25 Thread Jeremy Hansen
I transitioned some servers to a new rack and now I'm having major issues
with Ceph upon bringing things back up.

I believe the issue may be related to the ceph nodes coming back up with
different IPs before VLANs were set.  That's just a guess because I can't
think of any other reason this would happen.

Current state:

Every 2.0s: ceph -s
   cn01.ceph.la1.clx.corp: Mon Jul 25 10:13:05 2022

  cluster:
id: bfa2ad58-c049-11eb-9098-3c8cf8ed728d
health: HEALTH_WARN
1 filesystem is degraded
2 MDSs report slow metadata IOs
2/5 mons down, quorum cn02,cn03,cn01
9 osds down
3 hosts (17 osds) down
Reduced data availability: 97 pgs inactive, 9 pgs down
Degraded data redundancy: 13860144/30824413 objects degraded
(44.965%), 411 pgs degraded, 482 pgs undersized

  services:
mon: 5 daemons, quorum cn02,cn03,cn01 (age 62m), out of quorum: cn05,
cn04
mgr: cn02.arszct(active, since 5m)
mds: 2/2 daemons up, 2 standby
osd: 35 osds: 15 up (since 62m), 24 in (since 58m); 222 remapped pgs

  data:
volumes: 1/2 healthy, 1 recovering
pools:   8 pools, 545 pgs
objects: 7.71M objects, 6.7 TiB
usage:   15 TiB used, 39 TiB / 54 TiB avail
pgs: 0.367% pgs unknown
 17.431% pgs not active
 13860144/30824413 objects degraded (44.965%)
 1137693/30824413 objects misplaced (3.691%)
 280 active+undersized+degraded
 67  undersized+degraded+remapped+backfilling+peered
 57  active+undersized+remapped
 45  active+clean+remapped
 44  active+undersized+degraded+remapped+backfilling
 18  undersized+degraded+peered
 10  active+undersized
 9   down
 7   active+clean
 3   active+undersized+remapped+backfilling
 2   active+undersized+degraded+remapped+backfill_wait
 2   unknown
 1   undersized+peered

  io:
client:   170 B/s rd, 0 op/s rd, 0 op/s wr
recovery: 168 MiB/s, 158 keys/s, 166 objects/s

I have to disable and re-enable the dashboard just to use it.  It seems to
get bogged down after a few moments.

The three servers that were moved to the new rack are marked as "Down" by
Ceph, but if I do a cephadm host-check, they all seem to pass:

- cn01.ceph.-
podman (/usr/bin/podman) version 4.0.2 is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
- cn02.ceph.-
podman (/usr/bin/podman) version 4.0.2 is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
- cn03.ceph.-
podman (/usr/bin/podman) version 4.0.2 is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
- cn04.ceph.-
podman (/usr/bin/podman) version 4.0.2 is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
- cn05.ceph.-
podman|docker (/usr/bin/podman) is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
- cn06.ceph.-
podman (/usr/bin/podman) version 4.0.2 is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK

It seems to be recovering with what it has left, but a large number of OSDs
are down. When trying to restart one of the downed OSDs, I see a huge dump.

Jul 25 03:19:38 cn06.ceph
ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug
2022-07-25T10:19:38.532+ 7fce14a6c080  0 osd.34 30689 done with init,
starting boot process
Jul 25 03:19:38 cn06.ceph
ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug
2022-07-25T10:19:38.532+ 7fce14a6c080  1 osd.34 30689 start_boot
Jul 25 03:20:10 cn06.ceph
ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug
2022-07-25T10:20:10.655+ 7fcdfd12d700  1 osd.34 30689 start_boot
Jul 25 03:20:41 cn06.ceph
ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug
2022-07-25T10:20:41.159+ 7fcdfd12d700  1 osd.34 30689 start_boot
Jul 25 03:21:11 cn06.ceph
ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d-osd-34[9516]: debug
2022-07-25T10:21:11.662+ 7fcdfd12d700  1 osd.34 30689 start_boot

At this point it just keeps printing start_boot, but the dashboard has it
marked as "in" but "down".

On these three hosts that moved, there were a bunch marked as "out" and
"down", and some with "in" but "down".

Not sure where to go next.  I'm going to let the recovery continue and hope
that my 4x replication on these pools saves me.

Not sure where to go from here.  Any help is very much appreciated.  This
Ceph cluster holds all of our Cloudstack images...  it would be terrible to
lose this data.

[ceph-users] Default erasure code profile not working for 3 node cluster?

2022-07-25 Thread Mark S. Holliman
Dear All,

I've recently setup a 3 node Ceph Quincy (17.2) cluster to serve a pair of 
CephFS mounts for a Slurm cluster. Each ceph node has 6 x SSD and 6 x HDD, and 
I've setup the pools and crush rules to create separate CephFS filesystems 
using the different disk classes. I used the default erasure-code-profile to 
create the pools (see details below), as the documentation states that it works 
on a 3 node cluster. The system looked healthy after the initial setup, but now 
a few weeks in I'm seeing signs of problems: a growing count of pgs not 
deep-scrubbed in time, significant numbers of pgs in 
"active+undersized"/"active+undersized+degraded", most pgs in a 
"active+clean+remapped" state, and no recovery activity.

I looked at some of the pgs in the stuck states, and noticed that they all list 
a "NONE" OSD in their 'last acting' list, which points to this issue: 
https://docs.ceph.com/en/quincy/rados/troubleshooting/troubleshooting-pg/#erasure-coded-pgs-are-not-active-clean
 .

It's likely that is what is causing the pgs to get stuck in a degraded state 
and the ever growing list of late deep scrubs. But I'm confused why the 
documentation states that the default erasure code should work on a 3 node 
cluster - 
https://docs.ceph.com/en/latest/rados/operations/erasure-code/#creating-a-sample-erasure-coded-pool
  Is this documentation in error? Or is there something else going on with my 
setup? What is an ideal erasure code profile for a 3 node system?

Cheers,
  Mark

### Commands used to create the CephFS filesystem ###
ceph osd pool create cephfsHDD_data 1024 1024 erasure
ceph osd pool create cephfsHDD_metadata 64 64
ceph osd erasure-code-profile set dataHDD crush-device-class=hdd
ceph osd crush rule create-erasure dataHDD dataHDD
ceph osd pool set cephfsHDD_data crush_rule dataHDD
ceph osd pool set cephfsHDD_data allow_ec_overwrites true
ceph fs new cephfsHDD cephfsHDD_metadata cephfsHDD_data -force

### Example Status
health: HEALTH_WARN
Degraded data redundancy: 750/10450 objects degraded (7.177%), 313 
pgs degraded, 775 pgs undersized
887 pgs not deep-scrubbed in time
887 pgs not scrubbed in time
  services:
mon: 3 daemons...
mgr: ...
mds: 2/2 daemons up, 2 standby
osd: 36 osds: 36 up (since 27h), 36 in (since 5w); 1272 remapped pgs
  data:
volumes: 2/2 healthy
pools:   5 pools, 2176 pgs
objects: 2.82k objects, 361 MiB
usage:   40 GiB used, 262 TiB / 262 TiB avail
pgs: 750/10450 objects degraded (7.177%)
 1240/10450 objects misplaced (11.866%)
 1272 active+clean+remapped
 462  active+undersized
 313  active+undersized+degraded
 129  active+clean

### Erasure Code Profile
k=2
m=2
plugin=jerasure
technique=reed_sol_van

### Pool details
root@dokkalfar01:~# ceph osd pool get cephfsHDD_data all
size: 4
min_size: 3
pg_num: 1023
pgp_num: 972
crush_rule: dataHDD
hashpspool: true
allow_ec_overwrites: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
erasure_code_profile: default
fast_read: 0
pg_autoscale_mode: on
eio: false
bulk: false


### Example health details of unhappy pgs
pg 3.282 is stuck undersized for 27h, current state 
active+undersized+degraded, last acting [29,15,5,NONE]
pg 3.285 is stuck undersized for 27h, current state 
active+undersized+degraded, last acting [0,17,28,NONE]
pg 3.286 is stuck undersized for 27h, current state 
active+undersized+degraded, last acting [3,17,26,NONE]
pg 3.288 is stuck undersized for 27h, current state 
active+undersized+degraded, last acting [13,NONE,0,24]
pg 3.28e is stuck undersized for 27h, current state 
active+undersized+degraded, last acting [28,NONE,5,14]
pg 3.297 is stuck undersized for 27h, current state 
active+undersized+degraded, last acting [25,5,13,NONE]



---
Mark Holliman
Wide Field Astronomy Unit
Institute for Astronomy
University of Edinburgh

The University of Edinburgh is a charitable body, registered in Scotland, with 
registration number SC005336.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Default erasure code profile not working for 3 node cluster?

2022-07-25 Thread Levin Ng
Hi Mark,


A k=2 + m=2 EC profile with the failure domain set to host requires at least
4 nodes. "The simplest erasure coded pool is equivalent to RAID5 and requires
at least three hosts" assumes an EC profile of k=2 + m=1, which is the
technical minimum but is generally NOT recommended for data durability.
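
For illustration, a profile that does fit a 3-node cluster with a host
failure domain would look roughly like this (a sketch; the profile and pool
names are placeholders, and a pool's profile cannot be changed in place, so
this is for a newly created pool):

ceph osd erasure-code-profile set hdd-k2m1 k=2 m=1 crush-failure-domain=host crush-device-class=hdd
ceph osd erasure-code-profile get hdd-k2m1
ceph osd pool create cephfsHDD_data_k2m1 1024 1024 erasure hdd-k2m1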

Regards, Levin






From: Mark S. Holliman 
Date: Monday, 25 July 2022 at 21:15
To: ceph-users@ceph.io 
Subject: [ceph-users] Default erasure code profile not working for 3 node 
cluster?
Dear All,

I've recently setup a 3 node Ceph Quincy (17.2) cluster to serve a pair of 
CephFS mounts for a Slurm cluster. Each ceph node has 6 x SSD and 6 x HDD, and 
I've setup the pools and crush rules to create separate CephFS filesystems 
using the different disk classes. I used the default erasure-code-profile to 
create the pools (see details below), as the documentation states that it works 
on a 3 node cluster. The system looked healthy after the initial setup, but now 
a few weeks in I'm seeing signs of problems: a growing count of pgs not 
deep-scrubbed in time, significant numbers of pgs in 
"active+undersized"/"active+undersized+degraded", most pgs in a 
"active+clean+remapped" state, and no recovery activity.

I looked at some of the pgs in the stuck states, and noticed that they all list 
a "NONE" OSD in their 'last acting' list, which points to this issue: 
https://docs.ceph.com/en/quincy/rados/troubleshooting/troubleshooting-pg/#erasure-coded-pgs-are-not-active-clean
 .

It's likely that is what is causing the pgs to get stuck in a degraded state 
and the ever growing list of late deep scrubs. But I'm confused why the 
documentation states that the default erasure code should work on a 3 node 
cluster - 
https://docs.ceph.com/en/latest/rados/operations/erasure-code/#creating-a-sample-erasure-coded-pool
  Is this documentation in error? Or is there something else going on with my 
setup? What is an ideal erasure code profile for a 3 node system?

Cheers,
  Mark

### Commands used to create the CephFS filesystem ###
ceph osd pool create cephfsHDD_data 1024 1024 erasure
ceph osd pool create cephfsHDD_metadata 64 64
ceph osd erasure-code-profile set dataHDD crush-device-class=hdd
ceph osd crush rule create-erasure dataHDD dataHDD
ceph osd pool set cephfsHDD_data crush_rule dataHDD
ceph osd pool set cephfsHDD_data allow_ec_overwrites true
ceph fs new cephfsHDD cephfsHDD_metadata cephfsHDD_data -force

### Example Status
health: HEALTH_WARN
Degraded data redundancy: 750/10450 objects degraded (7.177%), 313 
pgs degraded, 775 pgs undersized
887 pgs not deep-scrubbed in time
887 pgs not scrubbed in time
  services:
mon: 3 daemons...
mgr: ...
mds: 2/2 daemons up, 2 standby
osd: 36 osds: 36 up (since 27h), 36 in (since 5w); 1272 remapped pgs
  data:
volumes: 2/2 healthy
pools:   5 pools, 2176 pgs
objects: 2.82k objects, 361 MiB
usage:   40 GiB used, 262 TiB / 262 TiB avail
pgs: 750/10450 objects degraded (7.177%)
 1240/10450 objects misplaced (11.866%)
 1272 active+clean+remapped
 462  active+undersized
 313  active+undersized+degraded
 129  active+clean

### Erasure Code Profile
k=2
m=2
plugin=jerasure
technique=reed_sol_van

### Pool details
root@dokkalfar01:~# ceph osd pool get cephfsHDD_data all
size: 4
min_size: 3
pg_num: 1023
pgp_num: 972
crush_rule: dataHDD
hashpspool: true
allow_ec_overwrites: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
erasure_code_profile: default
fast_read: 0
pg_autoscale_mode: on
eio: false
bulk: false


### Example health details of unhappy pgs
pg 3.282 is stuck undersized for 27h, current state 
active+undersized+degraded, last acting [29,15,5,NONE]
pg 3.285 is stuck undersized for 27h, current state 
active+undersized+degraded, last acting [0,17,28,NONE]
pg 3.286 is stuck undersized for 27h, current state 
active+undersized+degraded, last acting [3,17,26,NONE]
pg 3.288 is stuck undersized for 27h, current state 
active+undersized+degraded, last acting [13,NONE,0,24]
pg 3.28e is stuck undersized for 27h, current state 
active+undersized+degraded, last acting [28,NONE,5,14]
pg 3.297 is stuck undersized for 27h, current state 
active+undersized+degraded, last acting [25,5,13,NONE]



---
Mark Holliman
Wide Field Astronomy Unit
Institute for Astronomy
University of Edinburgh

The University of Edinburgh is a charitable body, registered in Scotland, with 
registration number SC005336.

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Default erasure code profile not working for 3 node cluster?

2022-07-25 Thread Danny Webb
The only thing I can see from your setup is that you've not set a failure
domain in your crush rule, so it defaults to host. A 2/2 erasure code won't
work in that scenario, as each stripe of the EC must be in its own failure
domain. If you wanted it to work with that setup you'd need to change the
crush failure domain to OSD rather than host (but then you'd lose the ability
to survive a host failure). If you wanted to keep a failure domain of host
you'd need to set your k/m values to 2/1, and even then you'd still not be
able to lose a host and keep a writable cluster.
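
To make the first option concrete, a sketch of a profile that keeps k=2/m=2
but places chunks per OSD instead of per host (profile, rule and pool names
are placeholders; an existing pool cannot be switched to it, so data would
have to be migrated to a new pool):

ceph osd erasure-code-profile set hdd-k2m2-osd k=2 m=2 crush-failure-domain=osd crush-device-class=hdd
ceph osd crush rule create-erasure dataHDD-osd hdd-k2m2-osd
ceph osd pool create cephfsHDD_data_osd 1024 1024 erasure hdd-k2m2-osd

As noted above, this trades away host-level redundancy, since several chunks
of the same PG can land on one host.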

From: Mark S. Holliman 
Sent: 25 July 2022 14:13
To: ceph-users@ceph.io 
Subject: [ceph-users] Default erasure code profile not working for 3 node 
cluster?

CAUTION: This email originates from outside THG

Dear All,

I've recently setup a 3 node Ceph Quincy (17.2) cluster to serve a pair of 
CephFS mounts for a Slurm cluster. Each ceph node has 6 x SSD and 6 x HDD, and 
I've setup the pools and crush rules to create separate CephFS filesystems 
using the different disk classes. I used the default erasure-code-profile to 
create the pools (see details below), as the documentation states that it works 
on a 3 node cluster. The system looked healthy after the initial setup, but now 
a few weeks in I'm seeing signs of problems: a growing count of pgs not 
deep-scrubbed in time, significant numbers of pgs in 
"active+undersized"/"active+undersized+degraded", most pgs in a 
"active+clean+remapped" state, and no recovery activity.

I looked at some of the pgs in the stuck states, and noticed that they all list 
a "NONE" OSD in their 'last acting' list, which points to this issue: 
https://docs.ceph.com/en/quincy/rados/troubleshooting/troubleshooting-pg/#erasure-coded-pgs-are-not-active-clean
 .

It's likely that is what is causing the pgs to get stuck in a degraded state 
and the ever growing list of late deep scrubs. But I'm confused why the 
documentation states that the default erasure code should work on a 3 node 
cluster - 
https://docs.ceph.com/en/latest/rados/operations/erasure-code/#creating-a-sample-erasure-coded-pool
 Is this documentation in error? Or is there something else going on with my 
setup? What is an ideal erasure code profile for a 3 node system?

Cheers,
Mark

### Commands used to create the CephFS filesystem ###
ceph osd pool create cephfsHDD_data 1024 1024 erasure
ceph osd pool create cephfsHDD_metadata 64 64
ceph osd erasure-code-profile set dataHDD crush-device-class=hdd
ceph osd crush rule create-erasure dataHDD dataHDD
ceph osd pool set cephfsHDD_data crush_rule dataHDD
ceph osd pool set cephfsHDD_data allow_ec_overwrites true
ceph fs new cephfsHDD cephfsHDD_metadata cephfsHDD_data -force

### Example Status
health: HEALTH_WARN
Degraded data redundancy: 750/10450 objects degraded (7.177%), 313 pgs 
degraded, 775 pgs undersized
887 pgs not deep-scrubbed in time
887 pgs not scrubbed in time
services:
mon: 3 daemons...
mgr: ...
mds: 2/2 daemons up, 2 standby
osd: 36 osds: 36 up (since 27h), 36 in (since 5w); 1272 remapped pgs
data:
volumes: 2/2 healthy
pools: 5 pools, 2176 pgs
objects: 2.82k objects, 361 MiB
usage: 40 GiB used, 262 TiB / 262 TiB avail
pgs: 750/10450 objects degraded (7.177%)
1240/10450 objects misplaced (11.866%)
1272 active+clean+remapped
462 active+undersized
313 active+undersized+degraded
129 active+clean

### Erasure Code Profile
k=2
m=2
plugin=jerasure
technique=reed_sol_van

### Pool details
root@dokkalfar01:~# ceph osd pool get cephfsHDD_data all
size: 4
min_size: 3
pg_num: 1023
pgp_num: 972
crush_rule: dataHDD
hashpspool: true
allow_ec_overwrites: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
erasure_code_profile: default
fast_read: 0
pg_autoscale_mode: on
eio: false
bulk: false


### Example health details of unhappy pgs
pg 3.282 is stuck undersized for 27h, current state active+undersized+degraded, 
last acting [29,15,5,NONE]
pg 3.285 is stuck undersized for 27h, current state active+undersized+degraded, 
last acting [0,17,28,NONE]
pg 3.286 is stuck undersized for 27h, current state active+undersized+degraded, 
last acting [3,17,26,NONE]
pg 3.288 is stuck undersized for 27h, current state active+undersized+degraded, 
last acting [13,NONE,0,24]
pg 3.28e is stuck undersized for 27h, current state active+undersized+degraded, 
last acting [28,NONE,5,14]
pg 3.297 is stuck undersized for 27h, current state active+undersized+degraded, 
last acting [25,5,13,NONE]



---
Mark Holliman
Wide Field Astronomy Unit
Institute for Astronomy
University of Edinburgh

The University of Edinburgh is a charitable body, registered in Scotland, with 
registration number SC005336.

[ceph-users] Re: Default erasure code profile not working for 3 node cluster?

2022-07-25 Thread Mark S. Holliman
Danny, Levin,

Thanks, both your answers helped (and are exactly what I suspected was the 
case). Looking back at the documentation I can see where my confusion began, as 
it isn't clear there that the "simplest" and "default" erasure code profiles 
are different. I'll report a documentation bug with the hope that they clarify 
things (I know of at least one other admin who hit the same issue I'm seeing, 
so I'm not the only one...).

Cheers,
  Mark

From: Danny Webb 
Sent: 25 July 2022 14:32
To: Mark S. Holliman ; ceph-users@ceph.io
Subject: Re: Default erasure code profile not working for 3 node cluster?

The only thing I can see from your setup is you've not set a failure domain in 
your crush rule, so it would default to host.  And a 2/2 erasure code wouldn't 
work in that scenario as each stripe of the EC must be in it's own failure 
domain.   If you wanted it to work with that setup you'd need to change the 
crush failure domain to OSD and not host (but you'd not have the ability to 
lose a host then).  If you wanted to use a failure domain of host you'd need to 
set your k / m value to 2/1.  And with that you'd still not be able to lose a 
host and still have a writable cluster.

From: Mark S. Holliman mailto:m...@roe.ac.uk>>
Sent: 25 July 2022 14:13
To: ceph-users@ceph.io 
mailto:ceph-users@ceph.io>>
Subject: [ceph-users] Default erasure code profile not working for 3 node 
cluster?

CAUTION: This email originates from outside THG

Dear All,

I've recently setup a 3 node Ceph Quincy (17.2) cluster to serve a pair of 
CephFS mounts for a Slurm cluster. Each ceph node has 6 x SSD and 6 x HDD, and 
I've setup the pools and crush rules to create separate CephFS filesystems 
using the different disk classes. I used the default erasure-code-profile to 
create the pools (see details below), as the documentation states that it works 
on a 3 node cluster. The system looked healthy after the initial setup, but now 
a few weeks in I'm seeing signs of problems: a growing count of pgs not 
deep-scrubbed in time, significant numbers of pgs in 
"active+undersized"/"active+undersized+degraded", most pgs in a 
"active+clean+remapped" state, and no recovery activity.

I looked at some of the pgs in the stuck states, and noticed that they all list 
a "NONE" OSD in their 'last acting' list, which points to this issue: 
https://docs.ceph.com/en/quincy/rados/troubleshooting/troubleshooting-pg/#erasure-coded-pgs-are-not-active-clean
 .

It's likely that is what is causing the pgs to get stuck in a degraded state 
and the ever growing list of late deep scrubs. But I'm confused why the 
documentation states that the default erasure code should work on a 3 node 
cluster - 
https://docs.ceph.com/en/latest/rados/operations/erasure-code/#creating-a-sample-erasure-coded-pool
 Is this documentation in error? Or is there something else going on with my 
setup? What is an ideal erasure code profile for a 3 node system?

Cheers,
Mark

### Commands used to create the CephFS filesystem ###
ceph osd pool create cephfsHDD_data 1024 1024 erasure
ceph osd pool create cephfsHDD_metadata 64 64
ceph osd erasure-code-profile set dataHDD crush-device-class=hdd
ceph osd crush rule create-erasure dataHDD dataHDD
ceph osd pool set cephfsHDD_data crush_rule dataHDD
ceph osd pool set cephfsHDD_data allow_ec_overwrites true
ceph fs new cephfsHDD cephfsHDD_metadata cephfsHDD_data -force

### Example Status
health: HEALTH_WARN
Degraded data redundancy: 750/10450 objects degraded (7.177%), 313 pgs 
degraded, 775 pgs undersized
887 pgs not deep-scrubbed in time
887 pgs not scrubbed in time
services:
mon: 3 daemons...
mgr: ...
mds: 2/2 daemons up, 2 standby
osd: 36 osds: 36 up (since 27h), 36 in (since 5w); 1272 remapped pgs
data:
volumes: 2/2 healthy
pools: 5 pools, 2176 pgs
objects: 2.82k objects, 361 MiB
usage: 40 GiB used, 262 TiB / 262 TiB avail
pgs: 750/10450 objects degraded (7.177%)
1240/10450 objects misplaced (11.866%)
1272 active+clean+remapped
462 active+undersized
313 active+undersized+degraded
129 active+clean

### Erasure Code Profile
k=2
m=2
plugin=jerasure
technique=reed_sol_van

### Pool details
root@dokkalfar01:~# ceph osd pool get cephfsHDD_data all
size: 4
min_size: 3
pg_num: 1023
pgp_num: 972
crush_rule: dataHDD
hashpspool: true
allow_ec_overwrites: true
nodelete: false
nopgchange: false
nosizechange: false
write_fadvise_dontneed: false
noscrub: false
nodeep-scrub: false
use_gmt_hitset: 1
erasure_code_profile: default
fast_read: 0
pg_autoscale_mode: on
eio: false
bulk: false


### Example health details of unhappy pgs
pg 3.282 is stuck undersized for 27h, current state active+undersized+degraded, 
last acting [29,15,5,NONE]
pg 3.285 is stuck undersized for 27h, current state active+undersized+degraded, 
last acting [0,17,28,NONE]
pg 3.286 is stuck undersized for 27h, current state active+undersized+degraded, 
last acting [3,17,26,NONE]
pg 3.288 is stuck undersized for 27h, 

[ceph-users] Re: LibCephFS Python Mount Failure

2022-07-25 Thread Bogdan Adrian Velica
Hi Adam,

I think this might be related to the user you are running the script as;
try running the script as the ceph user (or whichever user your Ceph runs
as). Also make sure the CEPH_ARGS variable is really set in the environment,
e.g. check it with os.environ.get (I might be mistaken here); do a print of
it first to see that the keyring path is picked up.
Just my 2 cents...
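
For reference, one way this is commonly wired up end to end (a sketch; the
filesystem name "cephfs" and the client name "client.quota-ro" are
placeholders). The keyring alone is not enough: as far as I know libcephfs
defaults to client.admin, so the matching client id has to be passed as
well, e.g. via CEPH_ARGS:

ceph fs authorize cephfs client.quota-ro / r > /etc/ceph/ceph.client.quota-ro.keyring
chmod 644 /etc/ceph/ceph.client.quota-ro.keyring   # readable by the monitoring user
CEPH_ARGS="--id quota-ro --keyring /etc/ceph/ceph.client.quota-ro.keyring" python3 quota_script.py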

Best of luck,

 --
Bogdan Velica
Ceph Support Engineer

croit GmbH, Freseniusstr. 31h, 81247 Munich
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io

On Mon, Jul 25, 2022 at 4:06 PM Adam Carrgilson (NBI) <
adam.carrgil...@nbi.ac.uk> wrote:

> Hi all,
>
> I'm trying to put together a script to gather CephFS quota utilisation.
> I'm using the CephFS Python library from here:
> https://docs.ceph.com/en/latest/cephfs/api/libcephfs-py/
> and I've followed a rather good guide on how to use it here:
> https://jayjeetc.medium.com/up-and-running-with-libcephfs-7629455f0cdc#934a
>
> I have been able to get this working, however; I want this to be able to
> be portable to run it on our monitoring agents, and specifically, I want to
> be able to use a limited permission account, so read-only permissions and
> network limitations.
> I originally couldn't find a method to specify a custom keyfile to use
> through the library, but with some assistance, I've found that I can use
> the Python command: os.environ["CEPH_ARGS"] = "--keyring=/path/to/keyring"
> to provide the library with that which works great (with my admin account).
>
> And therein lies the error, I've created an account just for this use
> case, but when I provide the keyfile with those credentials, I get the
> response: OSError(13, 'error calling ceph_init')
> I can give the limited permission account capabilities matching the admin
> account, but it continues to fail in the same way, pointing to it not being
> a permissions issue.
>
> Is there something obvious that I've done wrong, or an alternative method
> that might be a better approach, how can I get this to function for my
> monitoring?
>
> I've installed python-cephfs through my systems package manager, it's
> version 14.2.22, and I'm connecting through to a Nautilus system which is
> likewise version 14.2.22.
>
> Many Thanks,
> Adam.
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Map RBD to multiple nodes (like NFS)

2022-07-25 Thread Thomas Schneider

Hi,

I have this use case:
Multi-node DB must write backup to a device that is accessible by any node.

The backup is currently provided as RBD, and this RBD is mapped on any 
node belonging to the multi-node DB.


Is it possible that any node has access to the same files, independent 
of which node has written the file to RBD, like NFS?

If yes, how must the RBD be configured here?
If no, is there any possibility in Ceph to provide such a shared storage?


Regards
Thomas
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] failed OSD daemon

2022-07-25 Thread Magnus Hagdorn
Hi there,
on our Pacific (16.2.9) cluster one of the OSD daemons has died and
fails to restart. The OSD is backed by an NVMe drive and sits in one of 4
identical machines. We are using podman to orchestrate the ceph
daemons. The underlying OS is managed. The system worked fine without
any issues until recently; the other 3 machines are still working fine.
No errors are reported by the NVMe drive. The systemd unit fails after
restart; I have rebooted the system, which didn't help. We end up with
an awful lot of stuff in the log, which is difficult to sift through.
The OSD is part of a pool with replication level 5 containing the
metadata for a CephFS.
Any suggestion on what to look for?
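
For context, the log I am referring to is what something like the following
produces (osd id and crash id are placeholders):

cephadm logs --name osd.12 | tail -n 500   # journal output of the failed OSD container
ceph crash ls                              # recent daemon crashes, if any were recorded
ceph crash info <crash-id>                 # backtrace for a specific crash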
Cheers
magnus
The University of Edinburgh is a charitable body, registered in Scotland, with 
registration number SC005336. Is e buidheann carthannais a th’ ann an Oilthigh 
Dhùn Èideann, clàraichte an Alba, àireamh clàraidh SC005336.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Map RBD to multiple nodes (like NFS)

2022-07-25 Thread Wesley Dillingham
You probably want CephFS instead of RBD. Overview here:
https://docs.ceph.com/en/quincy/cephfs/
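
A minimal sketch of that route, assuming a new filesystem and a dedicated
client for the DB hosts (the names backupfs and client.dbbackup are
placeholders):

ceph fs volume create backupfs
ceph fs authorize backupfs client.dbbackup / rw > /etc/ceph/ceph.client.dbbackup.keyring
# repeat on every DB node; they all see the same files
mount -t ceph :/ /mnt/backup -o name=dbbackup
# if the cluster already has another filesystem, add the fs name to the mount options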

Respectfully,

*Wes Dillingham*
w...@wesdillingham.com
LinkedIn 


On Mon, Jul 25, 2022 at 11:00 AM Thomas Schneider <74cmo...@gmail.com>
wrote:

> Hi,
>
> I have this use case:
> Multi-node DB must write backup to a device that is accessible by any node.
>
> The backup is currently provided as RBD, and this RBD is mapped on any
> node belonging to the multi-node DB.
>
> Is it possible that any node has access to the same files, independent
> of which node has written the file to RBD, like NFS?
> If yes, how must the RBD be configured here?
> If no, is there any possibility in Ceph to provide such a shared storage?
>
>
> Regards
> Thomas
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: weird performance issue on ceph

2022-07-25 Thread Mark Nelson

Hi Zoltan,


We have a very similar setup with one of our upstream community 
performance test clusters.  60 4TB PM983 drives spread across 10 nodes.  
We get similar numbers to what you are initially seeing (scaled down to 
60 drives) though with somewhat lower random read IOPS (we tend to max 
out at around 2M with 60 drives on this HW). I haven't seen any issues 
with quincy like what you are describing, but on this cluster most of 
the tests have been on bare metal.  One issue we have noticed with the 
PM983 drives is that they may be more susceptible to non-optimal write 
patterns causing slowdowns vs other NVMe drives in the lab.  We actually 
had to issue a last minute PR for quincy to change the disk allocation 
behavior to deal with it.  See:



https://github.com/ceph/ceph/pull/45771

https://github.com/ceph/ceph/pull/45884


I don't *think* this is the issue you are hitting since the fix in 
#45884 should have taken care of it, but it might be something to keep 
in the back of your mind.  Otherwise, the fact that you are seeing such 
a dramatic difference across both small and large read/write benchmarks 
makes me think there is something else going on.  Is there any chance 
that some other bottleneck is being imposed when the pods and volumes 
are deleted and recreated? Might be worth looking at memory and CPU 
usage of the OSDs in all of the cases and RocksDB flushing/compaction 
stats from the OSD logs.  Also a quick check with collectl/iostat/sar 
during the slow case to make sure none of the drives are showing latency 
and built up IOs in the device queues.
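
For the memory/CPU and device-level checks, something along these lines per
OSD usually does it (a sketch; osd.0 is an example, and with rook the ceph
daemon commands need to run inside the OSD container):

ceph daemon osd.0 dump_mempools   # onode cache / rocksdb memory breakdown
ceph daemon osd.0 perf dump       # bluestore and rocksdb perf counters
iostat -x 1                       # per-device latency and queue depth during the slow run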


If you want to go deeper down the rabbit hole you can try running my 
wallclock profiler against one of your OSDs in the fast/slow cases, but 
you'll have to make sure it has access to debug symbols:



https://github.com/markhpc/uwpmp.git


run it like:


./uwpmp -n 1 -p <pid> -b libdw > output.txt


If the libdw backend is having problems you can use -b libdwarf instead, 
but it's much slower and takes longer to collect as many samples (you 
might want to do -n 1000 instead).



Mark


On 7/25/22 11:17, Zoltan Langi wrote:
Hi people, we got an interesting issue here and I would like to ask if 
anyone has seen anything like this before.



First: our system:

The ceph version is 17.2.1 but we also seen the same behaviour on 16.2.9.

Our kernel version is 5.13.0-51 and our NVMe disks are Samsung PM983.

In our deployment we got 12 nodes in total, 72 disks and 2 osd per 
disk makes 144 osd in total.


The deployment was done by ceph-rook with default values, 6 CPU cores 
allocated to the OSD each and 4GB of memory allocated to each OSD.



The issue we are experiencing: We create for example 100 volumes via 
ceph-csi and attach it to kubernetes pods via rbd. We talk about 100 
volumes in total, 2GB each. We run fio performance tests (read, write, 
mixed) on them so the volumes are being used heavily. Ceph delivers 
good performance, no problems as all.


Performance we get for example: read iops 3371027 write iops: 727714 
read bw: 79.9 GB/s write bw: 31.2 GB/s



After the tests are complete, these volumes just sitting there doing 
nothing for a longer period of time for example 48 hours. After that, 
we clean the pods up, clean the volumes up and delete them.


Recreate the volumes and pods once more, same spec (2GB each 100 pods) 
then run the same tests once again. We don’t even get half the 
performance that we measured before leaving the pods sitting 
there doing nothing for 2 days.



Performance we get after deleting the volumes and recreating them, 
rerun the tests: read iops: 1716239 write iops: 370631 read bw: 37.8 
GB/s write bw: 7.47 GB/s


We can clearly see that it’s a big performance loss.


If we clean up the ceph deployment, wipe the disks out completely and 
redeploy, the cluster once again delivering great performance.



We haven’t seen such a behaviour with ceph version 14.x


Has anyone seen such a thing? Thanks in advance!

Zoltan
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph orch commands non-responsive after mgr/mon reboots 16.2.9

2022-07-25 Thread Tim Olow
I just wanted to follow up on this issue, as it corrected itself today.  I 
started a drain/remove on two hosts a few weeks back; after the rolling restart 
of the mgr/mon daemons on the cluster, it seems that the ops queue either became 
locked or was overwhelmed with requests.  I had a degraded PG during the rolling 
reboot of the mon/mgr, and that seems to have blocked the ceph orch, balancer, and 
autoscale-status CLI commands from returning.  I could see in the manager debug 
logs that the balancer was indeed running and returning results from its internal 
scheduled process, but the CLI would hang indefinitely.   This morning 
the last degraded/offline PG got resolved and all commands are running again.

Moving forward is there a method to view the ops queue or monitor if the queue 
gets full and starts to deprioritize CLI commands?

Tim


On 7/22/22, 6:32 PM, "Tim Olow"  wrote:

Howdy,

I seem to be facing a problem on my 16.2.9 ceph cluster.  After a staggered 
reboot of my 3 infra nodes all of ceph orch commands are hanging much like in 
this previous reported issue [1]

I have paused orch and rebuilt a manager by hand as outlined here [2], and 
the issue continues to persist.   I am unable to scale up or down of services, 
restart daemons, etc.

ceph orch ls –verbose

[{'flags': 8,
  'help': 'List services known to orchestrator',
  'module': 'mgr',
  'perm': 'r',
  'sig': [argdesc(, req=True, 
name=prefix, n=1, numseen=0, prefix=orch),
  argdesc(, req=True, 
name=prefix, n=1, numseen=0, prefix=ls),
  argdesc(, req=False, 
name=service_type, n=1, numseen=0),
  argdesc(, req=False, 
name=service_name, n=1, numseen=0),
  argdesc(, req=False, name=export, 
n=1, numseen=0),
  argdesc(, req=False, 
name=format, n=1, numseen=0, 
strings=plain|json|json-pretty|yaml|xml-pretty|xml),
  argdesc(, req=False, 
name=refresh, n=1, numseen=0)]}]
Submitting command:  {'prefix': 'orch ls', 'target': ('mon-mgr', '')}
submit {"prefix": "orch ls", "target": ["mon-mgr", ""]} to mon-mgr




Debug output on the manager:

debug 2022-07-22T23:27:12.509+ 7fc180230700  0 log_channel(audit) log 
[DBG] : from='client.1084220 -' entity='client.admin' cmd=[{"prefix": "orch 
ls", "target": ["mon-mgr", ""]}]: dispatch

I have collected a startup of the manager and uploaded it for review [3]


Many Thanks,

Tim


[1] https://www.spinics.net/lists/ceph-users/msg68398.html
[2] 
https://docs.ceph.com/en/quincy/cephadm/troubleshooting/#manually-deploying-a-mgr-daemon
[3] https://pastebin.com/Dvb8sEbz

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: weird performance issue on ceph

2022-07-25 Thread Mark Nelson
I don't think so if this is just plain old RBD.  RBD  shouldn't require 
a bunch of RocksDB iterator seeks in the read/write hot path and writes 
should pretty quickly clear out tombstones as part of the memtable flush 
and compaction process even in the slow case.  Maybe in some kind of 
pathologically bad read-only corner case with no onode cache but it 
would be bad for more reasons than what's happening in that tracker 
ticket imho (even reading onodes from rocksdb block cache is 
significantly slower than BlueStore's onode cache).


If RBD mirror (or snapshots) are involved that could be a different 
story though.  I believe to deal with deletes in that case we have to go 
through iteration/deletion loops that have same root issue as what's 
going on in the tracker ticket and it can end up impacting client IO.  
Gabi and Paul are testing/reworking how the snapmapper works, and I've 
started a sort of a catch-all PR for improving our RocksDB tunings/glue 
here:



https://github.com/ceph/ceph/pull/47221


Mark

On 7/25/22 12:48, Frank Schilder wrote:

Could it be related to this performance death trap: 
https://tracker.ceph.com/issues/55324 ?
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: Mark Nelson 
Sent: 25 July 2022 18:50
To: ceph-users@ceph.io
Subject: [ceph-users] Re: weird performance issue on ceph

Hi Zoltan,


We have a very similar setup with one of our upstream community
performance test clusters.  60 4TB PM983 drives spread across 10 nodes.
We get similar numbers to what you are initially seeing (scaled down to
60 drives) though with somewhat lower random read IOPS (we tend to max
out at around 2M with 60 drives on this HW). I haven't seen any issues
with quincy like what you are describing, but on this cluster most of
the tests have been on bare metal.  One issue we have noticed with the
PM983 drives is that they may be more susceptible to non-optimal write
patterns causing slowdowns vs other NVMe drives in the lab.  We actually
had to issue a last minute PR for quincy to change the disk allocation
behavior to deal with it.  See:


https://github.com/ceph/ceph/pull/45771

https://github.com/ceph/ceph/pull/45884


I don't *think* this is the issue you are hitting since the fix in
#45884 should have taken care of it, but it might be something to keep
in the back of your mind.  Otherwise, the fact that you are seeing such
a dramatic difference across both small and large read/write benchmarks
makes me think there is something else going on.  Is there any chance
that some other bottleneck is being imposed when the pods and volumes
are deleted and recreated? Might be worth looking at memory and CPU
usage of the OSDs in all of the cases and RocksDB flushing/compaction
stats from the OSD logs.  Also a quick check with collectl/iostat/sar
during the slow case to make sure none of the drives are showing latency
and built up IOs in the device queues.

If you want to go deeper down the rabbit hole you can try running my
wallclock profiler against one of your OSDs in the fast/slow cases, but
you'll have to make sure it has access to debug symbols:


https://github.com/markhpc/uwpmp.git


run it like:


./uwpmp -n 1 -p <pid> -b libdw > output.txt


If the libdw backend is having problems you can use -b libdwarf instead,
but it's much slower and takes longer to collect as many samples (you
might want to do -n 1000 instead).


Mark


On 7/25/22 11:17, Zoltan Langi wrote:

Hi people, we got an interesting issue here and I would like to ask if
anyone has seen anything like this before.


First: our system:

The ceph version is 17.2.1 but we also seen the same behaviour on 16.2.9.

Our kernel version is 5.13.0-51 and our NVMe disks are Samsung PM983.

In our deployment we got 12 nodes in total, 72 disks and 2 osd per
disk makes 144 osd in total.

The deployment was done by ceph-rook with default values, 6 CPU cores
allocated to the OSD each and 4GB of memory allocated to each OSD.


The issue we are experiencing: We create for example 100 volumes via
ceph-csi and attach it to kubernetes pods via rbd. We talk about 100
volumes in total, 2GB each. We run fio performance tests (read, write,
mixed) on them so the volumes are being used heavily. Ceph delivers
good performance, no problems as all.

Performance we get for example: read iops 3371027 write iops: 727714
read bw: 79.9 GB/s write bw: 31.2 GB/s


After the tests are complete, these volumes just sitting there doing
nothing for a longer period of time for example 48 hours. After that,
we clean the pods up, clean the volumes up and delete them.

Recreate the volumes and pods once more, same spec (2GB each 100 pods)
then run the same tests once again. We don’t even get half the
performance that we measured before leaving the pods sitting
there doing nothing for 2 days.


Performance we get after deleting the volumes and recreating them,
rerun the tests: read iops: 1716239 write iops: 3

[ceph-users] Re: Issues after a shutdown

2022-07-25 Thread Jeremy Hansen
Pretty desperate here.  Can someone suggest what I might be able to do to
get these OSDs back up?  It looks like my recovery has stalled.


On Mon, Jul 25, 2022 at 7:26 AM Anthony D'Atri 
wrote:

> Do your values for public and cluster network include the new addresses on
> all nodes?
>

This cluster only has one network.  There is no separation between
public and cluster.  Three of the nodes momentarily came up using a
different IP address.
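
For reference, the addresses the cluster has on record can be compared with
what the hosts have now using something like (a sketch):

ceph mon dump                      # v1/v2 addresses each mon is expected at
ceph osd dump | grep 'osd\.'       # addresses the OSDs registered when they last booted
ceph config get mon public_network
ceph orch host ls                  # addresses cephadm uses to reach each host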

I've also noticed that on one of the nodes that did not move or have any IP
issue, the dashboard names the same device for two different OSDs:

2 cn01 out destroyed hdd TOSHIBA_MG04SCA40EE_21M0A0CKFWZB Unknown sdb osd.2

3 cn01 out destroyed ssd SAMSUNG_MZILT3T8HBLS/007_S5G0NE0R200159 Unknown
sdb osd.3


[ceph: root@cn01 /]# ceph-volume inventory

Device Path   Size rotates available Model name
/dev/sda  3.64 TB  TrueFalse MG04SCA40EE
/dev/sdb  3.49 TB  False   False MZILT3T8HBLS/007
/dev/sdc  3.64 TB  TrueFalse MG04SCA40EE
/dev/sdd  3.64 TB  TrueFalse MG04SCA40EE
/dev/sde  3.49 TB  False   False MZILT3T8HBLS/007
/dev/sdf  3.64 TB  TrueFalse MG04SCA40EE
/dev/sdg  698.64 GBTrueFalse SEAGATE ST375064

[ceph: root@cn01 /]# ceph osd info
osd.0 down out weight 0 up_from 30231 up_thru 30564 down_at 30688
last_clean_interval [25500,30228) [v2:
192.168.30.15:6818/2512683421,v1:192.168.30.15:6819/2512683421] [v2:
192.168.30.15:6824/2512683421,v1:192.168.30.15:6826/2512683421]
autoout,exists d14cf503-a303-4fa4-a713-9530b67d613a
osd.1 down out weight 0 up_from 30393 up_thru 30688 down_at 30697
last_clean_interval [25518,30321) [v2:
192.168.30.16:6834/1781855831,v1:192.168.30.16:6835/1781855831] [v2:
192.168.30.16:6836/1781855831,v1:192.168.30.16:6837/1781855831]
autoout,exists 0d521411-c835-4fa3-beca-3631b4ff6bf7
osd.2 down out weight 0 up_from 31316 up_thru 31293 down_at 31317
last_clean_interval [31218,31296) [v2:
192.168.30.11:6810/894589880,v1:192.168.30.11:6811/894589880] [v2:
192.168.30.11:6812/894589880,v1:192.168.30.11:6813/894589880]
destroyed,exists
osd.3 down out weight 0 up_from 31265 up_thru 31266 down_at 31268
last_clean_interval [31254,31256) [v2:
192.168.30.11:6818/1641948535,v1:192.168.30.11:6819/1641948535] [v2:
192.168.30.11:6820/1641948535,v1:192.168.30.11:6821/1641948535]
destroyed,exists
osd.4 up   in  weight 1 up_from 31356 up_thru 31581 down_at 31339
last_clean_interval [31320,31338) [v2:
192.168.30.11:6802/2785067179,v1:192.168.30.11:6803/2785067179] [v2:
192.168.30.11:6804/2785067179,v1:192.168.30.11:6805/2785067179] exists,up
3afd06db-b91d-44fe-9305-5eb95f7a59b9
osd.5 up   in  weight 1 up_from 31347 up_thru 31699 down_at 31339
last_clean_interval [31311,31338) [v2:
192.168.30.11:6818/1936771540,v1:192.168.30.11:6819/1936771540] [v2:
192.168.30.11:6820/1936771540,v1:192.168.30.11:6821/1936771540] exists,up
063c2ccf-02ce-4f5e-8252-dddfbb258a95
osd.6 up   in  weight 1 up_from 31218 up_thru 31711 down_at 31217
last_clean_interval [30978,31195) [v2:
192.168.30.12:6816/1585973160,v1:192.168.30.12:6817/1585973160] [v2:
192.168.30.12:6818/1585973160,v1:192.168.30.12:6819/1585973160] exists,up
94250ea2-f12e-4dc6-9135-b626086ccffd
osd.7 down out weight 0 up_from 30353 up_thru 30558 down_at 30688
last_clean_interval [25533,30349) [v2:
192.168.30.14:6816/4083104061,v1:192.168.30.14:6817/4083104061] [v2:
192.168.30.14:6840/4094104061,v1:192.168.30.14:6841/4094104061]
autoout,exists de351aec-b91e-4c22-a0bf-85369bc14579
osd.8 up   in  weight 1 up_from 31226 up_thru 31668 down_at 31225
last_clean_interval [30983,31195) [v2:
192.168.30.12:6824/1312484329,v1:192.168.30.12:6825/1312484329] [v2:
192.168.30.12:6826/1312484329,v1:192.168.30.12:6827/1312484329] exists,up
51f665b4-fa5b-4b17-8390-ed130145ef04
osd.9 up   in  weight 1 up_from 31351 up_thru 31673 down_at 31340
last_clean_interval [31315,31338) [v2:
192.168.30.11:6810/1446838877,v1:192.168.30.11:6811/1446838877] [v2:
192.168.30.11:6812/1446838877,v1:192.168.30.11:6813/1446838877] exists,up
985f1127-d126-4629-b8cd-03cf2d914d99
osd.10 up   in  weight 1 up_from 31219 up_thru 31639 down_at 31218
last_clean_interval [30980,31195) [v2:
192.168.30.12:6808/1587842953,v1:192.168.30.12:6809/1587842953] [v2:
192.168.30.12:6810/1587842953,v1:192.168.30.12:6811/1587842953] exists,up
c7fca03e-4bd5-4485-a090-658ca967d5f6
osd.11 up   in  weight 1 up_from 31234 up_thru 31659 down_at 31223
last_clean_interval [30978,31195) [v2:
192.168.30.12:6840/3403200742,v1:192.168.30.12:6841/3403200742] [v2:
192.168.30.12:6842/3403200742,v1:192.168.30.12:6843/3403200742] exists,up
81074bd7-ad9f-4e56-8885-cca4745f6c95
osd.12 up   in  weight 1 up_from 31230 up_thru 31717 down_at 31223
last_clean_interval [30975,31195) [v2:
192.168.30.13:6816/4268732910,v1:192.168.30.13:6817/4268732910] [v2:
192.168.30.13:6818/4268732910

[ceph-users] Re: Issues after a shutdown

2022-07-25 Thread Jeremy Hansen
Here's some more info:

HEALTH_WARN 2 failed cephadm daemon(s); 3 hosts fail cephadm check; 2
filesystems are degraded; 1 MDSs report slow metadata IOs; 2/5 mons down,
quorum cn02,cn03,cn01; 10 osds down; 3 hosts (17 osds) down; Reduced data
availability: 13 pgs inactive, 9 pgs down; Degraded data redundancy:
8515690/30862245 objects degraded (27.593%), 326 pgs degraded, 447 pgs
undersized
[WRN] CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
daemon osd.3 on cn01.ceph is in error state
daemon osd.2 on cn01.ceph is in error state
[WRN] CEPHADM_HOST_CHECK_FAILED: 3 hosts fail cephadm check
host cn04.ceph (192.168.30.14) failed check: Failed to connect to
cn04.ceph (192.168.30.14).
Please make sure that the host is reachable and accepts connections using
the cephadm SSH key

To add the cephadm SSH key to the host:
> ceph cephadm get-pub-key > ~/ceph.pub
> ssh-copy-id -f -i ~/ceph.pub root@192.168.30.14

To check that the host is reachable open a new shell with the --no-hosts
flag:
> cephadm shell --no-hosts

Then run the following:
> ceph cephadm get-ssh-config > ssh_config
> ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
> chmod 0600 ~/cephadm_private_key
> ssh -F ssh_config -i ~/cephadm_private_key root@192.168.30.14
host cn06.ceph (192.168.30.16) failed check: Failed to connect to
cn06.ceph (192.168.30.16).
Please make sure that the host is reachable and accepts connections using
the cephadm SSH key

To add the cephadm SSH key to the host:
> ceph cephadm get-pub-key > ~/ceph.pub
> ssh-copy-id -f -i ~/ceph.pub root@192.168.30.16

To check that the host is reachable open a new shell with the --no-hosts
flag:
> cephadm shell --no-hosts

Then run the following:
> ceph cephadm get-ssh-config > ssh_config
> ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
> chmod 0600 ~/cephadm_private_key
> ssh -F ssh_config -i ~/cephadm_private_key root@192.168.30.16
host cn05.ceph (192.168.30.15) failed check: Failed to connect to
cn05.ceph (192.168.30.15).
Please make sure that the host is reachable and accepts connections using
the cephadm SSH key

To add the cephadm SSH key to the host:
> ceph cephadm get-pub-key > ~/ceph.pub
> ssh-copy-id -f -i ~/ceph.pub root@192.168.30.15

To check that the host is reachable open a new shell with the --no-hosts
flag:
> cephadm shell --no-hosts

Then run the following:
> ceph cephadm get-ssh-config > ssh_config
> ceph config-key get mgr/cephadm/ssh_identity_key > ~/cephadm_private_key
> chmod 0600 ~/cephadm_private_key
> ssh -F ssh_config -i ~/cephadm_private_key root@192.168.30.15
[WRN] FS_DEGRADED: 2 filesystems are degraded
fs coldlogix is degraded
fs btc is degraded
[WRN] MDS_SLOW_METADATA_IO: 1 MDSs report slow metadata IOs
mds.coldlogix.cn01.uriofo(mds.0): 2 slow metadata IOs are blocked > 30
secs, oldest blocked for 2096 secs
[WRN] MON_DOWN: 2/5 mons down, quorum cn02,cn03,cn01
mon.cn05 (rank 0) addr [v2:192.168.30.15:3300/0,v1:192.168.30.15:6789/0]
is down (out of quorum)
mon.cn04 (rank 3) addr [v2:192.168.30.14:3300/0,v1:192.168.30.14:6789/0]
is down (out of quorum)
[WRN] OSD_DOWN: 10 osds down
osd.0 (root=default,host=cn05) is down
osd.1 (root=default,host=cn06) is down
osd.7 (root=default,host=cn04) is down
osd.13 (root=default,host=cn06) is down
osd.15 (root=default,host=cn05) is down
osd.18 (root=default,host=cn04) is down
osd.20 (root=default,host=cn04) is down
osd.33 (root=default,host=cn06) is down
osd.34 (root=default,host=cn06) is down
osd.36 (root=default,host=cn05) is down
[WRN] OSD_HOST_DOWN: 3 hosts (17 osds) down
host cn04 (root=default) (6 osds) is down
host cn05 (root=default) (5 osds) is down
host cn06 (root=default) (6 osds) is down
[WRN] PG_AVAILABILITY: Reduced data availability: 13 pgs inactive, 9 pgs
down
pg 9.3a is down, acting [8]
pg 9.7a is down, acting [8]
pg 9.ba is down, acting [8]
pg 9.fa is down, acting [8]
pg 11.3 is stuck inactive for 39h, current state
undersized+degraded+peered, last acting [11]
pg 11.11 is down, acting [19,9]
pg 11.1f is stuck inactive for 13h, current state
undersized+degraded+peered, last acting [10]
pg 12.36 is down, acting [21,16]
pg 12.59 is down, acting [26,5]
pg 12.66 is down, acting [5]
pg 19.4 is stuck inactive for 39h, current state
undersized+degraded+peered, last acting [6]
pg 19.1c is down, acting [21,16,11]
pg 21.1 is stuck inactive for 36m, current state unknown, last acting []
[WRN] PG_DEGRADED: Degraded data redundancy: 8515690/30862245 objects
degraded (27.593%), 326 pgs degraded, 447 pgs undersized
pg 9.75 is stuck undersized for 34m, current state
active+undersized+remapped, last acting [4,8,35]
pg 9.76 is stuck undersized for 35m, current state
active+undersized+degraded, last acting [35,10,21]
pg 9.77 is stuck undersized for 34m, current state
active+undersized+remapped, last acting [32,35,4]
pg 9.
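A quicker connectivity probe than the manual SSH steps above, as a sketch
only (the hostname and address are the ones from the warning itself), is the
orchestrator's built-in host check:

ceph cephadm check-host cn04.ceph 192.168.30.14
ceph orch host ls    # lists the hosts cephadm manages and their status

Neither command changes any state; they only report whether cephadm can get
back onto the host.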

[ceph-users] Re: octopus v15.2.17 QE Validation status

2022-07-25 Thread Adam King
orch approved. The test_cephadm_repos test failure is, I believe, just a
problem with the test itself, not any actual Ceph code. I don't think the
other selinux denial is new.

Thanks,
  - Adam King

On Sun, Jul 24, 2022 at 11:33 AM Yuri Weinstein  wrote:

> Still seeking approvals for:
>
> rados - Travis, Ernesto, Adam
> rgw - Casey
> fs, kcephfs, multimds - Venky, Patrick
> ceph-ansible - Brad pls take a look
>
> Josh, upgrade/client-upgrade-nautilus-octopus failed, do we need to fix
> it, pls take a look/approve.
>
>
> On Fri, Jul 22, 2022 at 10:06 AM Neha Ojha  wrote:
>
>> On Thu, Jul 21, 2022 at 8:47 AM Ilya Dryomov  wrote:
>> >
>> > On Thu, Jul 21, 2022 at 4:24 PM Yuri Weinstein 
>> wrote:
>> > >
>> > > Details of this release are summarized here:
>> > >
>> > > https://tracker.ceph.com/issues/56484
>> > > Release Notes - https://github.com/ceph/ceph/pull/47198
>> > >
>> > > Seeking approvals for:
>> > >
>> > > rados - Neha, Travis, Ernesto, Adam
>>
>> rados approved!
>> known issue https://tracker.ceph.com/issues/55854
>>
>> Thanks,
>> Neha
>>
>> >
>> > > rgw - Casey
>> > > fs, kcephfs, multimds - Venky, Patrick
>> > > rbd - Ilya, Deepika
>> > > krbd  Ilya, Deepika
>> >
>> > rbd and krbd approved.
>> >
>> > Thanks,
>> >
>> > Ilya
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>>
>> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: octopus v15.2.17 QE Validation status

2022-07-25 Thread Casey Bodley
On Sun, Jul 24, 2022 at 11:33 AM Yuri Weinstein  wrote:
>
> Still seeking approvals for:
>
> rados - Travis, Ernesto, Adam
> rgw - Casey

rgw approved

> fs, kcephfs, multimds - Venky, Patrick
> ceph-ansible - Brad pls take a look
>
> Josh, upgrade/client-upgrade-nautilus-octopus failed, do we need to fix it, 
> pls take a look/approve.
>
>
> On Fri, Jul 22, 2022 at 10:06 AM Neha Ojha  wrote:
>>
>> On Thu, Jul 21, 2022 at 8:47 AM Ilya Dryomov  wrote:
>> >
>> > On Thu, Jul 21, 2022 at 4:24 PM Yuri Weinstein  wrote:
>> > >
>> > > Details of this release are summarized here:
>> > >
>> > > https://tracker.ceph.com/issues/56484
>> > > Release Notes - https://github.com/ceph/ceph/pull/47198
>> > >
>> > > Seeking approvals for:
>> > >
>> > > rados - Neha, Travis, Ernesto, Adam
>>
>> rados approved!
>> known issue https://tracker.ceph.com/issues/55854
>>
>> Thanks,
>> Neha
>>
>> >
>> > > rgw - Casey
>> > > fs, kcephfs, multimds - Venky, Patrick
>> > > rbd - Ilya, Deepika
>> > > krbd  Ilya, Deepika
>> >
>> > rbd and krbd approved.
>> >
>> > Thanks,
>> >
>> > Ilya
>> > ___
>> > ceph-users mailing list -- ceph-users@ceph.io
>> > To unsubscribe send an email to ceph-users-le...@ceph.io
>> >
>>
> ___
> Dev mailing list -- d...@ceph.io
> To unsubscribe send an email to dev-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [Warning Possible spam] Re: Issues after a shutdown

2022-07-25 Thread Jeremy Hansen
I noticed this on the initial run of ceph health, but I no longer see it.
When you say "don't use ceph adm", can you explain why this is bad?

This is ceph health outside of cephadm shell:

HEALTH_WARN 1 filesystem is degraded; 2 MDSs report slow metadata IOs; 2/5
mons down, quorum cn02,cn03,cn01; 10 osds down; 3 hosts (17 osds) down;
Reduced data ava
ilability: 13 pgs inactive, 9 pgs down; Degraded data redundancy:
8515690/30862245 objects degraded (27.593%), 326 pgs degraded, 447 pgs
undersized
[WRN] FS_DEGRADED: 1 filesystem is degraded
fs coldlogix is degraded
[WRN] MDS_SLOW_METADATA_IO: 2 MDSs report slow metadata IOs
mds.coldlogix.cn01.uriofo(mds.0): 2 slow metadata IOs are blocked > 30
secs, oldest blocked for 3701 secs
mds.btc.cn02.ouvaus(mds.0): 1 slow metadata IOs are blocked > 30 secs,
oldest blocked for 382 secs
[WRN] MON_DOWN: 2/5 mons down, quorum cn02,cn03,cn01
mon.cn05 (rank 0) addr [v2:192.168.30.15:3300/0,v1:192.168.30.15:6789/0]
is down (out of quorum)
mon.cn04 (rank 3) addr [v2:192.168.30.14:3300/0,v1:192.168.30.14:6789/0]
is down (out of quorum)
[WRN] OSD_DOWN: 10 osds down
osd.0 (root=default,host=cn05) is down
osd.1 (root=default,host=cn06) is down
osd.7 (root=default,host=cn04) is down
osd.13 (root=default,host=cn06) is down
osd.15 (root=default,host=cn05) is down
osd.18 (root=default,host=cn04) is down
osd.20 (root=default,host=cn04) is down
osd.33 (root=default,host=cn06) is down
osd.34 (root=default,host=cn06) is down
osd.36 (root=default,host=cn05) is down
[WRN] OSD_HOST_DOWN: 3 hosts (17 osds) down
host cn04 (root=default) (6 osds) is down
host cn05 (root=default) (5 osds) is down
host cn06 (root=default) (6 osds) is down
[WRN] PG_AVAILABILITY: Reduced data availability: 13 pgs inactive, 9 pgs
down
pg 9.3a is down, acting [8]
pg 9.7a is down, acting [8]
pg 9.ba is down, acting [8]
pg 9.fa is down, acting [8]
pg 11.3 is stuck inactive for 39h, current state
undersized+degraded+peered, last acting [11]
pg 11.11 is down, acting [19,9]
pg 11.1f is stuck inactive for 13h, current state
undersized+degraded+peered, last acting [10]
pg 12.36 is down, acting [21,16]
pg 12.59 is down, acting [26,5]
pg 12.66 is down, acting [5]
pg 19.4 is stuck inactive for 39h, current state
undersized+degraded+peered, last acting [6]
pg 19.1c is down, acting [21,16,11]
pg 21.1 is stuck inactive for 2m, current state unknown, last acting []
[WRN] PG_DEGRADED: Degraded data redundancy: 8515690/30862245 objects
degraded (27.593%), 326 pgs degraded, 447 pgs undersized
pg 9.75 is stuck undersized for 61m, current state
active+undersized+remapped, last acting [4,8,35]
pg 9.76 is stuck undersized for 62m, current state
active+undersized+degraded, last acting [35,10,21]
pg 9.77 is stuck undersized for 61m, current state
active+undersized+remapped, last acting [32,35,4]
pg 9.78 is stuck undersized for 62m, current state
active+undersized+degraded, last acting [14,10]
pg 9.79 is stuck undersized for 62m, current state
active+undersized+degraded, last acting [21,32]
pg 9.7b is stuck undersized for 61m, current state
active+undersized+degraded, last acting [8,12,5]
pg 9.7c is stuck undersized for 61m, current state
active+undersized+degraded, last acting [4,35,10]
pg 9.7d is stuck undersized for 62m, current state
active+undersized+degraded, last acting [5,19,10]
pg 9.7e is stuck undersized for 62m, current state
active+undersized+remapped, last acting [21,10,17]
pg 9.80 is stuck undersized for 61m, current state
active+undersized+degraded, last acting [8,4,17]
pg 9.81 is stuck undersized for 62m, current state
active+undersized+degraded, last acting [14,26]
pg 9.82 is stuck undersized for 62m, current state
active+undersized+degraded, last acting [26,16]
pg 9.83 is stuck undersized for 61m, current state
active+undersized+degraded, last acting [8,4]
pg 9.84 is stuck undersized for 61m, current state
active+undersized+degraded, last acting [4,35,6]
pg 9.85 is stuck undersized for 61m, current state
active+undersized+degraded, last acting [32,12,9]
pg 9.86 is stuck undersized for 61m, current state
active+undersized+degraded, last acting [35,5,8]
pg 9.87 is stuck undersized for 61m, current state
active+undersized+degraded, last acting [9,12]
pg 9.88 is stuck undersized for 62m, current state
active+undersized+remapped, last acting [19,32,35]
pg 9.89 is stuck undersized for 61m, current state
active+undersized+degraded, last acting [10,14,4]
pg 9.8a is stuck undersized for 62m, current state
active+undersized+degraded, last acting [21,19]
pg 9.8b is stuck undersized for 61m, current state
active+undersized+degraded, last acting [8,35]
pg 9.8c is stuck undersized for 58m, current state
active+undersized+remapped, last acting [10,19,5]
pg 9.8d is stuck undersized for 61m, current state
active+undersized+re

[ceph-users] Re: [Warning Possible spam] Re: Issues after a shutdown

2022-07-25 Thread Adam King
Do the journal logs for any of the OSDs that are marked down give any
useful info on why they're failing to start back up? If the host-level IP
issues have gone away, I think that would be the next place to check.
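As a rough sketch of how to pull those logs on a cephadm-managed host
(osd.7 is just an example taken from the health output above, and the fsid
is the one reported by ceph -s):

cephadm logs --name osd.7 -- -n 200
journalctl -u ceph-bfa2ad58-c049-11eb-9098-3c8cf8ed728d@osd.7.service -n 200

Both read the same systemd journal for that daemon; run them on the host
that carries the down OSD.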

On Mon, Jul 25, 2022 at 5:03 PM Jeremy Hansen 
wrote:

> I noticed this on the initial run of ceph health, but I no longer see it.
> When you say "don't use ceph adm", can you explain why this is bad?
>
> This is ceph health outside of cephadm shell:
>
> HEALTH_WARN 1 filesystem is degraded; 2 MDSs report slow metadata IOs; 2/5
> mons down, quorum cn02,cn03,cn01; 10 osds down; 3 hosts (17 osds) down;
> Reduced data ava
> ilability: 13 pgs inactive, 9 pgs down; Degraded data redundancy:
> 8515690/30862245 objects degraded (27.593%), 326 pgs degraded, 447 pgs
> undersized
> [WRN] FS_DEGRADED: 1 filesystem is degraded
> fs coldlogix is degraded
> [WRN] MDS_SLOW_METADATA_IO: 2 MDSs report slow metadata IOs
> mds.coldlogix.cn01.uriofo(mds.0): 2 slow metadata IOs are blocked > 30
> secs, oldest blocked for 3701 secs
> mds.btc.cn02.ouvaus(mds.0): 1 slow metadata IOs are blocked > 30 secs,
> oldest blocked for 382 secs
> [WRN] MON_DOWN: 2/5 mons down, quorum cn02,cn03,cn01
> mon.cn05 (rank 0) addr [v2:
> 192.168.30.15:3300/0,v1:192.168.30.15:6789/0]
> is down (out of quorum)
> mon.cn04 (rank 3) addr [v2:
> 192.168.30.14:3300/0,v1:192.168.30.14:6789/0]
> is down (out of quorum)
> [WRN] OSD_DOWN: 10 osds down
> osd.0 (root=default,host=cn05) is down
> osd.1 (root=default,host=cn06) is down
> osd.7 (root=default,host=cn04) is down
> osd.13 (root=default,host=cn06) is down
> osd.15 (root=default,host=cn05) is down
> osd.18 (root=default,host=cn04) is down
> osd.20 (root=default,host=cn04) is down
> osd.33 (root=default,host=cn06) is down
> osd.34 (root=default,host=cn06) is down
> osd.36 (root=default,host=cn05) is down
> [WRN] OSD_HOST_DOWN: 3 hosts (17 osds) down
> host cn04 (root=default) (6 osds) is down
> host cn05 (root=default) (5 osds) is down
> host cn06 (root=default) (6 osds) is down
> [WRN] PG_AVAILABILITY: Reduced data availability: 13 pgs inactive, 9 pgs
> down
> pg 9.3a is down, acting [8]
> pg 9.7a is down, acting [8]
> pg 9.ba is down, acting [8]
> pg 9.fa is down, acting [8]
> pg 11.3 is stuck inactive for 39h, current state
> undersized+degraded+peered, last acting [11]
> pg 11.11 is down, acting [19,9]
> pg 11.1f is stuck inactive for 13h, current state
> undersized+degraded+peered, last acting [10]
> pg 12.36 is down, acting [21,16]
> pg 12.59 is down, acting [26,5]
> pg 12.66 is down, acting [5]
> pg 19.4 is stuck inactive for 39h, current state
> undersized+degraded+peered, last acting [6]
> pg 19.1c is down, acting [21,16,11]
> pg 21.1 is stuck inactive for 2m, current state unknown, last acting []
> [WRN] PG_DEGRADED: Degraded data redundancy: 8515690/30862245 objects
> degraded (27.593%), 326 pgs degraded, 447 pgs undersized
> pg 9.75 is stuck undersized for 61m, current state
> active+undersized+remapped, last acting [4,8,35]
> pg 9.76 is stuck undersized for 62m, current state
> active+undersized+degraded, last acting [35,10,21]
> pg 9.77 is stuck undersized for 61m, current state
> active+undersized+remapped, last acting [32,35,4]
> pg 9.78 is stuck undersized for 62m, current state
> active+undersized+degraded, last acting [14,10]
> pg 9.79 is stuck undersized for 62m, current state
> active+undersized+degraded, last acting [21,32]
> pg 9.7b is stuck undersized for 61m, current state
> active+undersized+degraded, last acting [8,12,5]
> pg 9.7c is stuck undersized for 61m, current state
> active+undersized+degraded, last acting [4,35,10]
> pg 9.7d is stuck undersized for 62m, current state
> active+undersized+degraded, last acting [5,19,10]
> pg 9.7e is stuck undersized for 62m, current state
> active+undersized+remapped, last acting [21,10,17]
> pg 9.80 is stuck undersized for 61m, current state
> active+undersized+degraded, last acting [8,4,17]
> pg 9.81 is stuck undersized for 62m, current state
> active+undersized+degraded, last acting [14,26]
> pg 9.82 is stuck undersized for 62m, current state
> active+undersized+degraded, last acting [26,16]
> pg 9.83 is stuck undersized for 61m, current state
> active+undersized+degraded, last acting [8,4]
> pg 9.84 is stuck undersized for 61m, current state
> active+undersized+degraded, last acting [4,35,6]
> pg 9.85 is stuck undersized for 61m, current state
> active+undersized+degraded, last acting [32,12,9]
> pg 9.86 is stuck undersized for 61m, current state
> active+undersized+degraded, last acting [35,5,8]
> pg 9.87 is stuck undersized for 61m, current state
> active+undersized+degraded, last acting [9,12]
> pg 9.88 is stuck undersized for 62m, current state
> active+undersized+remapped, last acting [19,32,35]
> pg 9.89 is stuck unders

[ceph-users] Re: Issues after a shutdown

2022-07-25 Thread Jeremy Hansen
MTU is the same across all hosts:

- cn01.ceph.la1.clx.corp-
enp2s0: flags=4163  mtu 9000
inet 192.168.30.11  netmask 255.255.255.0  broadcast 192.168.30.255
inet6 fe80::3e8c:f8ff:feed:728d  prefixlen 64  scopeid 0x20
ether 3c:8c:f8:ed:72:8d  txqueuelen 1000  (Ethernet)
RX packets 3163785  bytes 213625 (1.9 GiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 6890933  bytes 40233267272 (37.4 GiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

- cn02.ceph.la1.clx.corp-
enp2s0: flags=4163  mtu 9000
inet 192.168.30.12  netmask 255.255.255.0  broadcast 192.168.30.255
inet6 fe80::3e8c:f8ff:feed:ff0c  prefixlen 64  scopeid 0x20
ether 3c:8c:f8:ed:ff:0c  txqueuelen 1000  (Ethernet)
RX packets 3976256  bytes 2761764486 (2.5 GiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 9270324  bytes 56984933585 (53.0 GiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

- cn03.ceph.la1.clx.corp-
enp2s0: flags=4163  mtu 9000
inet 192.168.30.13  netmask 255.255.255.0  broadcast 192.168.30.255
inet6 fe80::3e8c:f8ff:feed:feba  prefixlen 64  scopeid 0x20
ether 3c:8c:f8:ed:fe:ba  txqueuelen 1000  (Ethernet)
RX packets 13081847  bytes 93614795356 (87.1 GiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 4001854  bytes 2536322435 (2.3 GiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

- cn04.ceph.la1.clx.corp-
enp2s0: flags=4163  mtu 9000
inet 192.168.30.14  netmask 255.255.255.0  broadcast 192.168.30.255
inet6 fe80::3e8c:f8ff:feed:6f89  prefixlen 64  scopeid 0x20
ether 3c:8c:f8:ed:6f:89  txqueuelen 1000  (Ethernet)
RX packets 60018  bytes 5622542 (5.3 MiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 59889  bytes 17463794 (16.6 MiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

- cn05.ceph.la1.clx.corp-
enp2s0: flags=4163  mtu 9000
inet 192.168.30.15  netmask 255.255.255.0  broadcast 192.168.30.255
inet6 fe80::3e8c:f8ff:feed:7245  prefixlen 64  scopeid 0x20
ether 3c:8c:f8:ed:72:45  txqueuelen 1000  (Ethernet)
RX packets 69163  bytes 8085511 (7.7 MiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 73539  bytes 17069869 (16.2 MiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

- cn06.ceph.la1.clx.corp-
enp2s0: flags=4163  mtu 9000
inet 192.168.30.16  netmask 255.255.255.0  broadcast 192.168.30.255
inet6 fe80::3e8c:f8ff:feed:feab  prefixlen 64  scopeid 0x20
ether 3c:8c:f8:ed:fe:ab  txqueuelen 1000  (Ethernet)
RX packets 23570  bytes 2251531 (2.1 MiB)
RX errors 0  dropped 0  overruns 0  frame 0
TX packets 22268  bytes 16186794 (15.4 MiB)
TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

10G.

On Mon, Jul 25, 2022 at 2:51 PM Sean Redmond 
wrote:

> Is the MTU in the new rack set correctly?
>
> On Mon, 25 Jul 2022, 11:30 Jeremy Hansen, 
> wrote:
>
>> I transitioned some servers to a new rack and now I'm having major issues
>> with Ceph upon bringing things back up.
>>
>> I believe the issue may be related to the ceph nodes coming back up with
>> different IPs before VLANs were set.  That's just a guess because I can't
>> think of any other reason this would happen.
>>
>> Current state:
>>
>> Every 2.0s: ceph -s
>>cn01.ceph.la1.clx.corp: Mon Jul 25 10:13:05 2022
>>
>>   cluster:
>> id: bfa2ad58-c049-11eb-9098-3c8cf8ed728d
>> health: HEALTH_WARN
>> 1 filesystem is degraded
>> 2 MDSs report slow metadata IOs
>> 2/5 mons down, quorum cn02,cn03,cn01
>> 9 osds down
>> 3 hosts (17 osds) down
>> Reduced data availability: 97 pgs inactive, 9 pgs down
>> Degraded data redundancy: 13860144/30824413 objects degraded
>> (44.965%), 411 pgs degraded, 482 pgs undersized
>>
>>   services:
>> mon: 5 daemons, quorum cn02,cn03,cn01 (age 62m), out of quorum: cn05,
>> cn04
>> mgr: cn02.arszct(active, since 5m)
>> mds: 2/2 daemons up, 2 standby
>> osd: 35 osds: 15 up (since 62m), 24 in (since 58m); 222 remapped pgs
>>
>>   data:
>> volumes: 1/2 healthy, 1 recovering
>> pools:   8 pools, 545 pgs
>> objects: 7.71M objects, 6.7 TiB
>> usage:   15 TiB used, 39 TiB / 54 TiB avail
>> pgs: 0.367% pgs unknown
>>  17.431% pgs not active
>>  13860144/30824413 objects degraded (44.965%)
>>  1137693/30824413 objects misplaced (3.691%)
>>  280 active+undersized+degraded
>>  67  undersized+degraded+remapped+backfilling+peered
>>  57  active+undersized+remapped
>>  45  active+clean+remapped
>>  44  a

[ceph-users] Re: Issues after a shutdown

2022-07-25 Thread Jeremy Hansen
Does Ceph do any kind of I/O fencing if it notices an anomaly? Do I need to
do something to re-enable these hosts if they get marked as bad?

On Mon, Jul 25, 2022 at 2:56 PM Jeremy Hansen 
wrote:

> MTU is the same across all hosts:
>
> - cn01.ceph.la1.clx.corp-
> enp2s0: flags=4163  mtu 9000
> inet 192.168.30.11  netmask 255.255.255.0  broadcast 192.168.30.255
> inet6 fe80::3e8c:f8ff:feed:728d  prefixlen 64  scopeid 0x20
> ether 3c:8c:f8:ed:72:8d  txqueuelen 1000  (Ethernet)
> RX packets 3163785  bytes 213625 (1.9 GiB)
> RX errors 0  dropped 0  overruns 0  frame 0
> TX packets 6890933  bytes 40233267272 (37.4 GiB)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>
> - cn02.ceph.la1.clx.corp-
> enp2s0: flags=4163  mtu 9000
> inet 192.168.30.12  netmask 255.255.255.0  broadcast 192.168.30.255
> inet6 fe80::3e8c:f8ff:feed:ff0c  prefixlen 64  scopeid 0x20
> ether 3c:8c:f8:ed:ff:0c  txqueuelen 1000  (Ethernet)
> RX packets 3976256  bytes 2761764486 (2.5 GiB)
> RX errors 0  dropped 0  overruns 0  frame 0
> TX packets 9270324  bytes 56984933585 (53.0 GiB)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>
> - cn03.ceph.la1.clx.corp-
> enp2s0: flags=4163  mtu 9000
> inet 192.168.30.13  netmask 255.255.255.0  broadcast 192.168.30.255
> inet6 fe80::3e8c:f8ff:feed:feba  prefixlen 64  scopeid 0x20
> ether 3c:8c:f8:ed:fe:ba  txqueuelen 1000  (Ethernet)
> RX packets 13081847  bytes 93614795356 (87.1 GiB)
> RX errors 0  dropped 0  overruns 0  frame 0
> TX packets 4001854  bytes 2536322435 (2.3 GiB)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>
> - cn04.ceph.la1.clx.corp-
> enp2s0: flags=4163  mtu 9000
> inet 192.168.30.14  netmask 255.255.255.0  broadcast 192.168.30.255
> inet6 fe80::3e8c:f8ff:feed:6f89  prefixlen 64  scopeid 0x20
> ether 3c:8c:f8:ed:6f:89  txqueuelen 1000  (Ethernet)
> RX packets 60018  bytes 5622542 (5.3 MiB)
> RX errors 0  dropped 0  overruns 0  frame 0
> TX packets 59889  bytes 17463794 (16.6 MiB)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>
> - cn05.ceph.la1.clx.corp-
> enp2s0: flags=4163  mtu 9000
> inet 192.168.30.15  netmask 255.255.255.0  broadcast 192.168.30.255
> inet6 fe80::3e8c:f8ff:feed:7245  prefixlen 64  scopeid 0x20
> ether 3c:8c:f8:ed:72:45  txqueuelen 1000  (Ethernet)
> RX packets 69163  bytes 8085511 (7.7 MiB)
> RX errors 0  dropped 0  overruns 0  frame 0
> TX packets 73539  bytes 17069869 (16.2 MiB)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>
> - cn06.ceph.la1.clx.corp-
> enp2s0: flags=4163  mtu 9000
> inet 192.168.30.16  netmask 255.255.255.0  broadcast 192.168.30.255
> inet6 fe80::3e8c:f8ff:feed:feab  prefixlen 64  scopeid 0x20
> ether 3c:8c:f8:ed:fe:ab  txqueuelen 1000  (Ethernet)
> RX packets 23570  bytes 2251531 (2.1 MiB)
> RX errors 0  dropped 0  overruns 0  frame 0
> TX packets 22268  bytes 16186794 (15.4 MiB)
> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>
> 10G.
>
> On Mon, Jul 25, 2022 at 2:51 PM Sean Redmond 
> wrote:
>
>> Is the MTU in the new rack set correctly?
>>
>> On Mon, 25 Jul 2022, 11:30 Jeremy Hansen, 
>> wrote:
>>
>>> I transitioned some servers to a new rack and now I'm having major issues
>>> with Ceph upon bringing things back up.
>>>
>>> I believe the issue may be related to the ceph nodes coming back up with
>>> different IPs before VLANs were set.  That's just a guess because I can't
>>> think of any other reason this would happen.
>>>
>>> Current state:
>>>
>>> Every 2.0s: ceph -s
>>>cn01.ceph.la1.clx.corp: Mon Jul 25 10:13:05 2022
>>>
>>>   cluster:
>>> id: bfa2ad58-c049-11eb-9098-3c8cf8ed728d
>>> health: HEALTH_WARN
>>> 1 filesystem is degraded
>>> 2 MDSs report slow metadata IOs
>>> 2/5 mons down, quorum cn02,cn03,cn01
>>> 9 osds down
>>> 3 hosts (17 osds) down
>>> Reduced data availability: 97 pgs inactive, 9 pgs down
>>> Degraded data redundancy: 13860144/30824413 objects degraded
>>> (44.965%), 411 pgs degraded, 482 pgs undersized
>>>
>>>   services:
>>> mon: 5 daemons, quorum cn02,cn03,cn01 (age 62m), out of quorum: cn05,
>>> cn04
>>> mgr: cn02.arszct(active, since 5m)
>>> mds: 2/2 daemons up, 2 standby
>>> osd: 35 osds: 15 up (since 62m), 24 in (since 58m); 222 remapped pgs
>>>
>>>   data:
>>> volumes: 1/2 healthy, 1 recovering
>>> pools:   8 pools, 545 pgs
>>> objects: 7.71M objects, 6.7 TiB
>>> usage:   15 TiB used, 39 TiB / 54 TiB avail
>>> pgs: 0.367% pgs unknown
>>>  17.4

[ceph-users] Re: Issues after a shutdown

2022-07-25 Thread Jeremy Hansen
That results in packet loss:

[root@cn01 ~]# ping -M do -s 8972 192.168.30.14
PING 192.168.30.14 (192.168.30.14) 8972(9000) bytes of data.
^C
--- 192.168.30.14 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2062ms

That's very weird...  but this gives me something to figure out.  Hmmm.
Thank you.
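For anyone reading along: 8972 is the largest ICMP payload that fits a
9000-byte MTU once the 20-byte IP and 8-byte ICMP headers are added, so the
loss above means the path is not passing jumbo frames even though the host
interfaces are set to 9000. A quick sweep to see where the path tops out
could look like this (the destination address is just the one from the test
above):

for size in 1472 4072 8972; do
  ping -M do -c 3 -s $size 192.168.30.14 > /dev/null && echo "ok at $size" || echo "fails at $size"
done

If 1472 passes but 8972 fails, the hosts are fine and something in the
switching path is still at a 1500-byte MTU.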

On Mon, Jul 25, 2022 at 3:01 PM Sean Redmond 
wrote:

> Looks good, just confirm it with a large ping with don't fragment flag set
> between each host.
>
> ping -M do -s 8972 [destination IP]
>
>
> On Mon, 25 Jul 2022, 22:56 Jeremy Hansen, 
> wrote:
>
>> MTU is the same across all hosts:
>>
>> - cn01.ceph.la1.clx.corp-
>> enp2s0: flags=4163  mtu 9000
>> inet 192.168.30.11  netmask 255.255.255.0  broadcast
>> 192.168.30.255
>> inet6 fe80::3e8c:f8ff:feed:728d  prefixlen 64  scopeid 0x20
>> ether 3c:8c:f8:ed:72:8d  txqueuelen 1000  (Ethernet)
>> RX packets 3163785  bytes 213625 (1.9 GiB)
>> RX errors 0  dropped 0  overruns 0  frame 0
>> TX packets 6890933  bytes 40233267272 (37.4 GiB)
>> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>>
>> - cn02.ceph.la1.clx.corp-
>> enp2s0: flags=4163  mtu 9000
>> inet 192.168.30.12  netmask 255.255.255.0  broadcast
>> 192.168.30.255
>> inet6 fe80::3e8c:f8ff:feed:ff0c  prefixlen 64  scopeid 0x20
>> ether 3c:8c:f8:ed:ff:0c  txqueuelen 1000  (Ethernet)
>> RX packets 3976256  bytes 2761764486 (2.5 GiB)
>> RX errors 0  dropped 0  overruns 0  frame 0
>> TX packets 9270324  bytes 56984933585 (53.0 GiB)
>> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>>
>> - cn03.ceph.la1.clx.corp-
>> enp2s0: flags=4163  mtu 9000
>> inet 192.168.30.13  netmask 255.255.255.0  broadcast
>> 192.168.30.255
>> inet6 fe80::3e8c:f8ff:feed:feba  prefixlen 64  scopeid 0x20
>> ether 3c:8c:f8:ed:fe:ba  txqueuelen 1000  (Ethernet)
>> RX packets 13081847  bytes 93614795356 (87.1 GiB)
>> RX errors 0  dropped 0  overruns 0  frame 0
>> TX packets 4001854  bytes 2536322435 (2.3 GiB)
>> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>>
>> - cn04.ceph.la1.clx.corp-
>> enp2s0: flags=4163  mtu 9000
>> inet 192.168.30.14  netmask 255.255.255.0  broadcast
>> 192.168.30.255
>> inet6 fe80::3e8c:f8ff:feed:6f89  prefixlen 64  scopeid 0x20
>> ether 3c:8c:f8:ed:6f:89  txqueuelen 1000  (Ethernet)
>> RX packets 60018  bytes 5622542 (5.3 MiB)
>> RX errors 0  dropped 0  overruns 0  frame 0
>> TX packets 59889  bytes 17463794 (16.6 MiB)
>> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>>
>> - cn05.ceph.la1.clx.corp-
>> enp2s0: flags=4163  mtu 9000
>> inet 192.168.30.15  netmask 255.255.255.0  broadcast
>> 192.168.30.255
>> inet6 fe80::3e8c:f8ff:feed:7245  prefixlen 64  scopeid 0x20
>> ether 3c:8c:f8:ed:72:45  txqueuelen 1000  (Ethernet)
>> RX packets 69163  bytes 8085511 (7.7 MiB)
>> RX errors 0  dropped 0  overruns 0  frame 0
>> TX packets 73539  bytes 17069869 (16.2 MiB)
>> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>>
>> - cn06.ceph.la1.clx.corp-
>> enp2s0: flags=4163  mtu 9000
>> inet 192.168.30.16  netmask 255.255.255.0  broadcast
>> 192.168.30.255
>> inet6 fe80::3e8c:f8ff:feed:feab  prefixlen 64  scopeid 0x20
>> ether 3c:8c:f8:ed:fe:ab  txqueuelen 1000  (Ethernet)
>> RX packets 23570  bytes 2251531 (2.1 MiB)
>> RX errors 0  dropped 0  overruns 0  frame 0
>> TX packets 22268  bytes 16186794 (15.4 MiB)
>> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>>
>> 10G.
>>
>> On Mon, Jul 25, 2022 at 2:51 PM Sean Redmond 
>> wrote:
>>
>>> Is the MTU in the new rack set correctly?
>>>
>>> On Mon, 25 Jul 2022, 11:30 Jeremy Hansen, 
>>> wrote:
>>>
 I transitioned some servers to a new rack and now I'm having major
 issues
 with Ceph upon bringing things back up.

 I believe the issue may be related to the ceph nodes coming back up with
 different IPs before VLANs were set.  That's just a guess because I
 can't
 think of any other reason this would happen.

 Current state:

 Every 2.0s: ceph -s
cn01.ceph.la1.clx.corp: Mon Jul 25 10:13:05 2022

   cluster:
 id: bfa2ad58-c049-11eb-9098-3c8cf8ed728d
 health: HEALTH_WARN
 1 filesystem is degraded
 2 MDSs report slow metadata IOs
 2/5 mons down, quorum cn02,cn03,cn01
 9 osds down
 3 hosts (17 osds) down
 Reduced data availability: 97 pgs inactive, 9 pgs down
 Degraded data redundancy: 13860144/30824413 objects degraded
 (44.965%), 411 pgs degraded, 482 pgs undersized

[ceph-users] Re: Issues after a shutdown

2022-07-25 Thread Sean Redmond
Yea, assuming you can ping with a lower MTU, check the MTU on your
switching.
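If it helps narrow it down, tracepath reports the path MTU hop by hop (the
destination below is just the example address from this thread):

tracepath -n 192.168.30.14

Any hop that comes back with pmtu 1500 is the one still sitting on the
default MTU.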

On Mon, 25 Jul 2022, 23:05 Jeremy Hansen, 
wrote:

> That results in packet loss:
>
> [root@cn01 ~]# ping -M do -s 8972 192.168.30.14
> PING 192.168.30.14 (192.168.30.14) 8972(9000) bytes of data.
> ^C
> --- 192.168.30.14 ping statistics ---
> 3 packets transmitted, 0 received, 100% packet loss, time 2062ms
>
> That's very weird...  but this gives me something to figure out.  Hmmm.
> Thank you.
>
> On Mon, Jul 25, 2022 at 3:01 PM Sean Redmond 
> wrote:
>
>> Looks good, just confirm it with a large ping with don't fragment flag
>> set between each host.
>>
>> ping -M do -s 8972 [destination IP]
>>
>>
>> On Mon, 25 Jul 2022, 22:56 Jeremy Hansen, 
>> wrote:
>>
>>> MTU is the same across all hosts:
>>>
>>> - cn01.ceph.la1.clx.corp-
>>> enp2s0: flags=4163  mtu 9000
>>> inet 192.168.30.11  netmask 255.255.255.0  broadcast
>>> 192.168.30.255
>>> inet6 fe80::3e8c:f8ff:feed:728d  prefixlen 64  scopeid 0x20
>>> ether 3c:8c:f8:ed:72:8d  txqueuelen 1000  (Ethernet)
>>> RX packets 3163785  bytes 213625 (1.9 GiB)
>>> RX errors 0  dropped 0  overruns 0  frame 0
>>> TX packets 6890933  bytes 40233267272 (37.4 GiB)
>>> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>>>
>>> - cn02.ceph.la1.clx.corp-
>>> enp2s0: flags=4163  mtu 9000
>>> inet 192.168.30.12  netmask 255.255.255.0  broadcast
>>> 192.168.30.255
>>> inet6 fe80::3e8c:f8ff:feed:ff0c  prefixlen 64  scopeid 0x20
>>> ether 3c:8c:f8:ed:ff:0c  txqueuelen 1000  (Ethernet)
>>> RX packets 3976256  bytes 2761764486 (2.5 GiB)
>>> RX errors 0  dropped 0  overruns 0  frame 0
>>> TX packets 9270324  bytes 56984933585 (53.0 GiB)
>>> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>>>
>>> - cn03.ceph.la1.clx.corp-
>>> enp2s0: flags=4163  mtu 9000
>>> inet 192.168.30.13  netmask 255.255.255.0  broadcast
>>> 192.168.30.255
>>> inet6 fe80::3e8c:f8ff:feed:feba  prefixlen 64  scopeid 0x20
>>> ether 3c:8c:f8:ed:fe:ba  txqueuelen 1000  (Ethernet)
>>> RX packets 13081847  bytes 93614795356 (87.1 GiB)
>>> RX errors 0  dropped 0  overruns 0  frame 0
>>> TX packets 4001854  bytes 2536322435 (2.3 GiB)
>>> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>>>
>>> - cn04.ceph.la1.clx.corp-
>>> enp2s0: flags=4163  mtu 9000
>>> inet 192.168.30.14  netmask 255.255.255.0  broadcast
>>> 192.168.30.255
>>> inet6 fe80::3e8c:f8ff:feed:6f89  prefixlen 64  scopeid 0x20
>>> ether 3c:8c:f8:ed:6f:89  txqueuelen 1000  (Ethernet)
>>> RX packets 60018  bytes 5622542 (5.3 MiB)
>>> RX errors 0  dropped 0  overruns 0  frame 0
>>> TX packets 59889  bytes 17463794 (16.6 MiB)
>>> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>>>
>>> - cn05.ceph.la1.clx.corp-
>>> enp2s0: flags=4163  mtu 9000
>>> inet 192.168.30.15  netmask 255.255.255.0  broadcast
>>> 192.168.30.255
>>> inet6 fe80::3e8c:f8ff:feed:7245  prefixlen 64  scopeid 0x20
>>> ether 3c:8c:f8:ed:72:45  txqueuelen 1000  (Ethernet)
>>> RX packets 69163  bytes 8085511 (7.7 MiB)
>>> RX errors 0  dropped 0  overruns 0  frame 0
>>> TX packets 73539  bytes 17069869 (16.2 MiB)
>>> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>>>
>>> - cn06.ceph.la1.clx.corp-
>>> enp2s0: flags=4163  mtu 9000
>>> inet 192.168.30.16  netmask 255.255.255.0  broadcast
>>> 192.168.30.255
>>> inet6 fe80::3e8c:f8ff:feed:feab  prefixlen 64  scopeid 0x20
>>> ether 3c:8c:f8:ed:fe:ab  txqueuelen 1000  (Ethernet)
>>> RX packets 23570  bytes 2251531 (2.1 MiB)
>>> RX errors 0  dropped 0  overruns 0  frame 0
>>> TX packets 22268  bytes 16186794 (15.4 MiB)
>>> TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
>>>
>>> 10G.
>>>
>>> On Mon, Jul 25, 2022 at 2:51 PM Sean Redmond 
>>> wrote:
>>>
Is the MTU in the new rack set correctly?

 On Mon, 25 Jul 2022, 11:30 Jeremy Hansen, <
 farnsworth.mcfad...@gmail.com> wrote:

> I transitioned some servers to a new rack and now I'm having major
> issues
> with Ceph upon bringing things back up.
>
> I believe the issue may be related to the ceph nodes coming back up
> with
> different IPs before VLANs were set.  That's just a guess because I
> can't
> think of any other reason this would happen.
>
> Current state:
>
> Every 2.0s: ceph -s
>cn01.ceph.la1.clx.corp: Mon Jul 25 10:13:05 2022
>
>   cluster:
> id: bfa2ad58-c049-11eb-9098-3c8cf8ed728d
> health: HEALTH_WARN
> 1 filesystem is degraded
> 2 MDSs report slow metadata IOs
> 2/5 mons d

[ceph-users] Re: octopus v15.2.17 QE Validation status

2022-07-25 Thread Neha Ojha
Hello Frank,

15.2.17 includes
https://github.com/ceph/ceph/pull/46611/commits/263e0fa6b3e6e1d6e7b382923a1d586d9d1ffa1b,
which adds capability in the ceph-objectstore-tool to trim the dup ops
that led to memory growth in https://tracker.ceph.com/issues/53729.
The complete fix is being tested in
https://github.com/ceph/ceph/pull/47047, but given that 15.2.17 is the
EOL release for Octopus, we don't want to risk merging a premature
patch in it.
We will be shipping the complete fix in a Pacific point release, so
users will have an option to upgrade to a fixed version of Pacific.

I hope this helps.

Thanks,
Neha
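For anyone who needs the offline workaround in the meantime, the rough shape
of it (a sketch only, not a tested recipe; check the exact op name and
options against the linked PR and the release notes) is to stop the affected
OSD and trim the dups per PG with the objectstore tool:

systemctl stop ceph-<fsid>@osd.0.service
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 --op trim-pg-log-dups --pgid <pgid>

osd.0, <fsid> and <pgid> are placeholders; restart the OSD afterwards.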


On Mon, Jul 25, 2022 at 1:49 PM Frank Schilder  wrote:
>
> Just a question about this bug-fix release. Will it contain the patch for the 
> pg-dup memory leak (https://tracker.ceph.com/issues/53729, 
> https://www.clyso.com/blog/osds-with-unlimited-ram-growth/)? Its a really 
> nasty problem and I'm waiting for this to show up in octopus.
>
> Thanks and best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Neha Ojha 
> Sent: 22 July 2022 19:06
> To: Yuri Weinstein
> Cc: dev; ceph-users
> Subject: [ceph-users] Re: octopus v15.2.17 QE Validation status
>
> On Thu, Jul 21, 2022 at 8:47 AM Ilya Dryomov  wrote:
> >
> > On Thu, Jul 21, 2022 at 4:24 PM Yuri Weinstein  wrote:
> > >
> > > Details of this release are summarized here:
> > >
> > > https://tracker.ceph.com/issues/56484
> > > Release Notes - https://github.com/ceph/ceph/pull/47198
> > >
> > > Seeking approvals for:
> > >
> > > rados - Neha, Travis, Ernesto, Adam
>
> rados approved!
> known issue https://tracker.ceph.com/issues/55854
>
> Thanks,
> Neha
>
> >
> > > rgw - Casey
> > > fs, kcephfs, multimds - Venky, Patrick
> > > rbd - Ilya, Deepika
> > > krbd  Ilya, Deepika
> >
> > rbd and krbd approved.
> >
> > Thanks,
> >
> > Ilya
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Issues after a shutdown

2022-07-25 Thread Jeremy Hansen
I use Ubiquiti equipment, mainly because I'm not a network admin...  I
rebooted the 10G switches and now everything is working and recovering.  I
hate when there's not a definitive answer but that's kind of the deal when
you use Ubiquiti stuff.  Thank you Sean and Frank.  Frank, you were right.
It made no sense because from a very basic point of view the network seemed
fine, but Sean's ping revealed that it clearly wasn't.

Thank you!
-jeremy


On Mon, Jul 25, 2022 at 3:08 PM Sean Redmond 
wrote:

> Yea, assuming you can ping with a lower MTU, check the MTU on your
> switching.
>
> On Mon, 25 Jul 2022, 23:05 Jeremy Hansen, 
> wrote:
>
>> That results in packet loss:
>>
>> [root@cn01 ~]# ping -M do -s 8972 192.168.30.14
>> PING 192.168.30.14 (192.168.30.14) 8972(9000) bytes of data.
>> ^C
>> --- 192.168.30.14 ping statistics ---
>> 3 packets transmitted, 0 received, 100% packet loss, time 2062ms
>>
>> That's very weird...  but this gives me something to figure out.  Hmmm.
>> Thank you.
>>
>> On Mon, Jul 25, 2022 at 3:01 PM Sean Redmond 
>> wrote:
>>
>>> Looks good, just confirm it with a large ping with don't fragment flag
>>> set between each host.
>>>
>>> ping -M do -s 8972 [destination IP]
>>>
>>>
>>> On Mon, 25 Jul 2022, 22:56 Jeremy Hansen, 
>>> wrote:
>>>
 MTU is the same across all hosts:

 - cn01.ceph.la1.clx.corp-
 enp2s0: flags=4163  mtu 9000
 inet 192.168.30.11  netmask 255.255.255.0  broadcast
 192.168.30.255
 inet6 fe80::3e8c:f8ff:feed:728d  prefixlen 64  scopeid
 0x20
 ether 3c:8c:f8:ed:72:8d  txqueuelen 1000  (Ethernet)
 RX packets 3163785  bytes 213625 (1.9 GiB)
 RX errors 0  dropped 0  overruns 0  frame 0
 TX packets 6890933  bytes 40233267272 (37.4 GiB)
 TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

 - cn02.ceph.la1.clx.corp-
 enp2s0: flags=4163  mtu 9000
 inet 192.168.30.12  netmask 255.255.255.0  broadcast
 192.168.30.255
 inet6 fe80::3e8c:f8ff:feed:ff0c  prefixlen 64  scopeid
 0x20
 ether 3c:8c:f8:ed:ff:0c  txqueuelen 1000  (Ethernet)
 RX packets 3976256  bytes 2761764486 (2.5 GiB)
 RX errors 0  dropped 0  overruns 0  frame 0
 TX packets 9270324  bytes 56984933585 (53.0 GiB)
 TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

 - cn03.ceph.la1.clx.corp-
 enp2s0: flags=4163  mtu 9000
 inet 192.168.30.13  netmask 255.255.255.0  broadcast
 192.168.30.255
 inet6 fe80::3e8c:f8ff:feed:feba  prefixlen 64  scopeid
 0x20
 ether 3c:8c:f8:ed:fe:ba  txqueuelen 1000  (Ethernet)
 RX packets 13081847  bytes 93614795356 (87.1 GiB)
 RX errors 0  dropped 0  overruns 0  frame 0
 TX packets 4001854  bytes 2536322435 (2.3 GiB)
 TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

 - cn04.ceph.la1.clx.corp-
 enp2s0: flags=4163  mtu 9000
 inet 192.168.30.14  netmask 255.255.255.0  broadcast
 192.168.30.255
 inet6 fe80::3e8c:f8ff:feed:6f89  prefixlen 64  scopeid
 0x20
 ether 3c:8c:f8:ed:6f:89  txqueuelen 1000  (Ethernet)
 RX packets 60018  bytes 5622542 (5.3 MiB)
 RX errors 0  dropped 0  overruns 0  frame 0
 TX packets 59889  bytes 17463794 (16.6 MiB)
 TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

 - cn05.ceph.la1.clx.corp-
 enp2s0: flags=4163  mtu 9000
 inet 192.168.30.15  netmask 255.255.255.0  broadcast
 192.168.30.255
 inet6 fe80::3e8c:f8ff:feed:7245  prefixlen 64  scopeid
 0x20
 ether 3c:8c:f8:ed:72:45  txqueuelen 1000  (Ethernet)
 RX packets 69163  bytes 8085511 (7.7 MiB)
 RX errors 0  dropped 0  overruns 0  frame 0
 TX packets 73539  bytes 17069869 (16.2 MiB)
 TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

 - cn06.ceph.la1.clx.corp-
 enp2s0: flags=4163  mtu 9000
 inet 192.168.30.16  netmask 255.255.255.0  broadcast
 192.168.30.255
 inet6 fe80::3e8c:f8ff:feed:feab  prefixlen 64  scopeid
 0x20
 ether 3c:8c:f8:ed:fe:ab  txqueuelen 1000  (Ethernet)
 RX packets 23570  bytes 2251531 (2.1 MiB)
 RX errors 0  dropped 0  overruns 0  frame 0
 TX packets 22268  bytes 16186794 (15.4 MiB)
 TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

 10G.

 On Mon, Jul 25, 2022 at 2:51 PM Sean Redmond 
 wrote:

> Is the MTU in the new rack set correctly?
>
> On Mon, 25 Jul 2022, 11:30 Jeremy Hansen, <
> farnsworth.mcfad...@gmail.com> wrote:
>
>> I transitioned some servers to a new rack and now I'm h

[ceph-users] Re: octopus v15.2.17 QE Validation status

2022-07-25 Thread Neha Ojha
On Mon, Jul 25, 2022 at 3:48 PM Neha Ojha  wrote:
>
> Hello Frank,
>
> 15.2.17 includes
> https://github.com/ceph/ceph/pull/46611/commits/263e0fa6b3e6e1d6e7b382923a1d586d9d1ffa1b,
> which adds capability in the ceph-objectstore-tool to trim the dup ops
> that led to memory growth in https://tracker.ceph.com/issues/53729.
> The complete fix is being tested in
> https://github.com/ceph/ceph/pull/47047, but given that 15.2.17 is the

slight correction, https://github.com/ceph/ceph/pull/47046 is the PR I
meant to link

> EOL release for Octopus, we don't want to risk merging a premature
> patch in it.
> We will be shipping the complete fix in a Pacific point release, so
> users will have an option to upgrade to a fixed version of Pacific.
>
> I hope this helps.
>
> Thanks,
> Neha
>
>
> On Mon, Jul 25, 2022 at 1:49 PM Frank Schilder  wrote:
> >
> > Just a question about this bug-fix release. Will it contain the patch for 
> > the pg-dup memory leak (https://tracker.ceph.com/issues/53729, 
> > https://www.clyso.com/blog/osds-with-unlimited-ram-growth/)? Its a really 
> > nasty problem and I'm waiting for this to show up in octopus.
> >
> > Thanks and best regards,
> > =
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > 
> > From: Neha Ojha 
> > Sent: 22 July 2022 19:06
> > To: Yuri Weinstein
> > Cc: dev; ceph-users
> > Subject: [ceph-users] Re: octopus v15.2.17 QE Validation status
> >
> > On Thu, Jul 21, 2022 at 8:47 AM Ilya Dryomov  wrote:
> > >
> > > On Thu, Jul 21, 2022 at 4:24 PM Yuri Weinstein  
> > > wrote:
> > > >
> > > > Details of this release are summarized here:
> > > >
> > > > https://tracker.ceph.com/issues/56484
> > > > Release Notes - https://github.com/ceph/ceph/pull/47198
> > > >
> > > > Seeking approvals for:
> > > >
> > > > rados - Neha, Travis, Ernesto, Adam
> >
> > rados approved!
> > known issue https://tracker.ceph.com/issues/55854
> >
> > Thanks,
> > Neha
> >
> > >
> > > > rgw - Casey
> > > > fs, kcephfs, multimds - Venky, Patrick
> > > > rbd - Ilya, Deepika
> > > > krbd  Ilya, Deepika
> > >
> > > rbd and krbd approved.
> > >
> > > Thanks,
> > >
> > > Ilya
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> > >
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Quincy full osd(s)

2022-07-25 Thread Nigel Williams
Hi Wesley, thank you for the follow-up.

Anthony D'Atri kindly helped me out with some guidance and advice and we
believe the problem is resolved now.

This was a brand-new install of a Quincy cluster, and I made the mistake of
presuming that autoscale would adjust the PGs as required; however, it never
kicked into action. I then neglected to check that the pools were configured
correctly and was surprised that, after copying 50 TB to the cluster, it was
full. I went down the wrong rabbit-holes while diagnosing; Anthony got me
back on track. Thanks, Anthony!

During this process Anthony also identified a documentation/setup issue and
will be doing some PRs to improve the documentation. The gist of it is that
the device class needs to be set on the .mgr/.nfs pools in order for
autoscale-status to produce any output.
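A minimal sketch of that fix, assuming SSD OSDs and the default CRUSH tree
(adjust the device class and rule name to the actual cluster):

ceph osd crush rule create-replicated replicated_ssd default host ssd
ceph osd pool set .mgr crush_rule replicated_ssd
ceph osd pool set .nfs crush_rule replicated_ssd
ceph osd pool autoscale-status

Once every pool maps to a rule with an explicit device class,
autoscale-status should report all pools again.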
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Two osd's assigned to one device

2022-07-25 Thread Jeremy Hansen
I have a situation (not sure how it happened) where Ceph believes I have two
OSDs assigned to a single device.

I tried to delete osd.2 and osd.3, but the removal just hangs. I'm also
trying to zap sdc, which supposedly does not have an OSD on it, but I'm
unable to zap it. Any suggestions?


Device     Type  Vendor   Model             Size     Daemons
/dev/sdb   HDD   TOSHIBA  MG04SCA40EE       3.6 TiB  osd.2 osd.3
/dev/sdc   SSD   SAMSUNG  MZILT3T8HBLS/007  3.5 TiB
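One possible cleanup path via the orchestrator, as a sketch only (cn01.ceph
is assumed from the earlier health output that showed osd.2/osd.3 in error
state there; adjust the host and device to match):

ceph orch osd rm status
ceph orch osd rm 2 3 --force
ceph orch device zap cn01.ceph /dev/sdc --force

Check ceph osd tree afterwards to confirm the duplicate OSD entries are gone
before re-adding the device.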
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] 1 stray daemon(s) not managed by cephadm

2022-07-25 Thread Jeremy Hansen
How do I track down what is the stray daemon?

Thanks
-jeremy
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 1 stray daemon(s) not managed by cephadm

2022-07-25 Thread Adam King
Usually it's pretty explicit in "ceph health detail". What does it say
there?
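A sketch of how that usually gets narrowed down (the grep context size is
arbitrary):

ceph health detail | grep -A 3 CEPHADM_STRAY_DAEMON
ceph orch ps
cephadm ls    # on the host named in the warning: everything actually deployed there

The stray is whatever shows up on the host (or in ceph health detail) but
not in ceph orch ps.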

On Mon, Jul 25, 2022 at 9:05 PM Jeremy Hansen 
wrote:

> How do I track down what is the stray daemon?
>
> Thanks
> -jeremy
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io