[ceph-users] recovery from catastrophic mon and mds failure after reboot and ip address change

2022-06-27 Thread Florian Jonas

Dear experts,

we have a small computing cluster with 21 OSDs, 3 monitors and 3 MDS 
running ceph version 13.2.10 on Ubuntu 18.04. A few days ago we had an 
unexpected reboot of all machines, as well as a change of the IP address 
of one machine, which was hosting an MDS as well as a monitor. I am not 
exactly sure what played out during that night, but we lost quorum of 
all three monitors and no filesystem was visible anymore, so we are 
starting to get quite worried about data loss. We tried destroying and 
recreating the monitor whose IP address had changed, but it did not help 
(which in hindsight might have been a mistake).


Long story short, we adapted the changed IP address in the config and 
tried to recover the monitors using the information from the OSDs, 
following the procedure outlined here:


https://docs.ceph.com/en/latest/rados/troubleshooting/troubleshooting-mon/#mon-store-recovery-using-osds
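For reference, the core of that procedure (with all OSDs stopped) is roughly the
following; paths and the keyring location are placeholders, and the full script,
including the rsync of the accumulated store between hosts, is in the linked page:

# on each OSD host, extract the cluster map from every OSD into a temporary store
for osd in /var/lib/ceph/osd/ceph-*; do
  ceph-objectstore-tool --data-path "$osd" --op update-mon-db --mon-store-path /tmp/mon-store
done

# then, with the accumulated store copied to one node, rebuild the mon DB
ceph-monstore-tool /tmp/mon-store rebuild -- --keyring /path/to/admin.keyring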

We are now in a situation where ceph status shows the following:

  cluster:
    id: 61fd9a61-89d6-4383-a2e6-ec4f4a13830f
    health: HEALTH_WARN
    43 slow ops, oldest one blocked for 57132 sec, daemons 
[mon.dip01,mon.pc078,mon.pc147] have slow ops.


  services:
    mon: 3 daemons, quorum pc147,pc078,dip01
    mgr: dip01(active)
    osd: 22 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0  objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:

The monitors show a quorum (I think that's a good start), but we do not 
see any of the pools that were previously there, and no filesystem is 
visible either. Running "ceph fs status" shows all MDS in standby and no 
active filesystem.


I looked into the HEALTH_WARN by checking journalctl -xe on the monitor 
machines, where one finds errors of the type:


Jun 24 09:10:30 dip01 ceph-mon[69148]: 2022-06-24 09:10:30.978 
7f0173e02700 -1 mon.dip01@2(peon) e15 get_health_metrics reporting 4 
slow ops, oldest is osd_boot(osd.12 booted 0 features 
4611087854031667195 v13031)


To check what is going on with the osd_boot error, I looked at the logs 
on the OSD machines and found warnings such as:


2022-06-24 09:16:42.383 7fdc165d5c00  0  
/build/ceph-13.2.10/src/cls/cephfs/cls_cephfs.cc:197: loading cephfs

2022-06-24 09:16:42.383 7fdc165d5c00  0 _get_class not permitted to load kvs
2022-06-24 09:16:42.383 7fdc165d5c00  0  
/build/ceph-13.2.10/src/cls/hello/cls_hello.cc:296: loading cls_hello

2022-06-24 09:16:42.383 7fdc165d5c00  0 _get_class not permitted to load lua
2022-06-24 09:16:42.387 7fdc165d5c00  0 _get_class not permitted to load sdk
2022-06-24 09:16:42.387 7fdc165d5c00  1 osd.6 13035 warning: got an 
error loading one or more classes: (1) Operation not permitted
2022-06-24 09:16:42.387 7fdc165d5c00  0 osd.6 13035 crush map has 
features 288514051259236352, adjusting msgr requires for clients
2022-06-24 09:16:42.387 7fdc165d5c00  0 osd.6 13035 crush map has 
features 288514051259236352 was 8705, adjusting msgr requires for mons
2022-06-24 09:16:42.387 7fdc165d5c00  0 osd.6 13035 crush map has 
features 1009089991638532096, adjusting msgr requires for osds
2022-06-24 09:16:42.387 7fdc165d5c00  1 osd.6 13035 
check_osdmap_features require_osd_release 0 ->

2022-06-24 09:16:44.527 7fdc165d5c00  0 osd.6 13035 load_pgs
2022-06-24 09:16:50.375 7fdc165d5c00  0 osd.6 13035 load_pgs opened 67 pgs
2022-06-24 09:16:50.375 7fdc165d5c00  0 osd.6 13035 using 
weightedpriority op queue with priority op cut off at 64.
2022-06-24 09:16:50.375 7fdc165d5c00 -1 osd.6 13035 log_to_monitors 
{default=true}
2022-06-24 09:16:50.383 7fdc165d5c00  0 osd.6 13035 done with init, 
starting boot process

2022-06-24 09:16:50.383 7fdc165d5c00  1 osd.6 13035 start_boot
2022-06-24 09:16:50.495 7fdbec933700  1 osd.6 pg_epoch: 13035 pg[5.1( v 
2785'2 (0'0,2785'2] local-lis/les=12997/12999 n=1 ec=2782/2782 lis/c 
12997/12997 les/c/f 12999/12999/0 12997/12997/12954) [6,17,14] r=0 
lpr=13021 crt=2785'2 lcod 0'0 mlcod 0'0 unknown mbc={}] state: 
transitioning to Primary


The 21 OSDs themselves show as "exists,new" in "ceph osd status", even 
though they remained untouched during the whole incident (which I hope 
means they still contain all our data somewhere).


We only started operating our distributed filesystem about one year ago, 
and I must admit that with this problem we are a bit out of our depth, so 
we would very much appreciate any leads/help we can get on getting our 
filesystem up and running again. Alternatively, if all else fails, we 
would also appreciate any information about the possibility of recovering 
the data from the 21 OSDs, which amounts to over 60 TB.


Attached you will find our ceph.conf file, as well as the logs from one 
example monitor and one OSD node. If you need any other information, let 
us know.


Thank you in advance for your help, I know your time is valuable!

Best regards,

Florian Jonas

p.s. to the moderators: this message is a resubmit with a smaller log.

[ceph-users] Re: Conversion to Cephadm

2022-06-27 Thread Redouane Kachach Elhichou
From the error message:

2022-06-25 21:51:59,798 7f4748727b80 INFO /usr/bin/ceph-mon: stderr too many
arguments:
[--default-log-to-journald=true,--default-mon-cluster-log-to-journald=true]

it seems that you are not using the cephadm that corresponds to your Ceph
version. Please try to get the cephadm for Octopus.
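In case it helps, fetching the standalone Octopus cephadm script should be roughly
the following (double-check the branch/URL against the Octopus docs):

curl --silent --remote-name --location https://github.com/ceph/ceph/raw/octopus/src/cephadm/cephadm
chmod +x cephadm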

-Redo

On Sun, Jun 26, 2022 at 4:07 AM Brent Kennedy  wrote:

> I successfully converted to cephadm after upgrading the cluster to octopus.
> I am on CentOS 7 and am attempting to convert some of the nodes over to
> rocky, but when I try to add a rocky node in and start the mgr or mon
> service, it tries to start an octopus container and the service comes back
> with an error.  Is there a way to force it to start a quincy container on
> the new host?
>
>
>
> I tried to start an upgrade, which did deploy the manager daemons to the new
> hosts, but it failed converting the monitors and now one is dead (a CentOS
> 7 one).  It seems it can spin up Quincy containers on the new nodes, but
> because it failed upgrading, it is still trying to deploy the Octopus ones to
> the new node.
>
>
>
> Cephadm log on new node:
>
>
>
> 2022-06-25 21:51:34,427 7f4748727b80 DEBUG stat: Copying blob
> sha256:7a0437f04f83f084b7ed68ad9c4a4947e12fc4e1b006b38129bac89114ec3621
>
> 2022-06-25 21:51:34,647 7f4748727b80 DEBUG stat: Copying blob
> sha256:7a0437f04f83f084b7ed68ad9c4a4947e12fc4e1b006b38129bac89114ec3621
>
> 2022-06-25 21:51:34,652 7f4748727b80 DEBUG stat: Copying blob
> sha256:731c3beff4deece7d4e54bc26ecf6d99988b19ea8414524277d83bc5a5d6eb70
>
> 2022-06-25 21:51:59,006 7f4748727b80 DEBUG stat: Copying config
> sha256:2cf504fded3980c76b59a354fca8f301941f86e369215a08752874d1ddb69b73
>
> 2022-06-25 21:51:59,008 7f4748727b80 DEBUG stat: Writing manifest to image
> destination
>
> 2022-06-25 21:51:59,008 7f4748727b80 DEBUG stat: Storing signatures
>
> 2022-06-25 21:51:59,239 7f4748727b80 DEBUG stat: 167 167
>
> 2022-06-25 21:51:59,703 7f4748727b80 DEBUG /usr/bin/ceph-mon: too many
> arguments:
> [--default-log-to-journald=true,--default-mon-cluster-log-to-journald=true]
>
> 2022-06-25 21:51:59,797 7f4748727b80 INFO Non-zero exit code 1 from
> /bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host
> --entrypoint /usr/bin/ceph-mon --init -e
> CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=tpixmon5 -e
> CEPH_USE_RANDOM_NONCE=1 -v
> /var/log/ceph/33ca8009-79d6-45cf-a67e-9753ab4dc861:/var/log/ceph:z -v
>
> /var/lib/ceph/33ca8009-79d6-45cf-a67e-9753ab4dc861/mon.tpixmon5:/var/lib/cep
> h/mon/ceph-tpixmon5:z -v /tmp/ceph-tmp7xmra8lk:/tmp/keyring:z -v
> /tmp/ceph-tmp7mid2k57:/tmp/config:z docker.io/ceph/ceph:v15 --mkfs -i
> tpixmon5 --fsid 33ca8009-79d6-45cf-a67e-9753ab4dc861 -c /tmp/config
> --keyring /tmp/keyring --setuser ceph --setgroup ceph
> --default-log-to-file=false --default-log-to-journald=true
> --default-log-to-stderr=false --default-mon-cluster-log-to-file=false
> --default-mon-cluster-log-to-journald=true
> --default-mon-cluster-log-to-stderr=false
>
> 2022-06-25 21:51:59,798 7f4748727b80 INFO /usr/bin/ceph-mon: stderr too
> many
> arguments:
> [--default-log-to-journald=true,--default-mon-cluster-log-to-journald=true]
>
>
>
> Podman Images:
>
> REPOSITORY   TAG IMAGE ID  CREATEDSIZE
>
> quay.io/ceph/ceph  e1d6a67b021e  2 weeks ago1.32 GB
>
> docker.io/ceph/ceph  v15 2cf504fded39  13 months ago  1.05 GB
>
>
>
> I don't even know what that top one is, because it's not tagged and it keeps
> pulling it.  Why would it be pulling a docker.io image (is that the only place
> to get octopus images?)?
>
>
>
> I also tried to force upgrade the older failed monitor, but the cephadm tool
> says that the OS is too old.  It's just odd to me that we would say "go to
> containers because the OS won't matter" and then it actually still matters,
> because the podman versions are tied to newer images.
>
>
>
> -Brent
>
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: scrubbing+deep+repair PGs since Upgrade

2022-06-27 Thread Marcus Müller
Hi Stefan,

thanks for the fast reply. I did some research and have the following output:

~ $ rados list-inconsistent-pg {pool-name1}
[]
 ~ $ rados list-inconsistent-pg {pool-name2}
[]
 ~ $ rados list-inconsistent-pg {pool-name3}
[]

—

 ~ $ rados list-inconsistent-obj 7.989
{"epoch":3006349,"inconsistents":[]}

~ $ rados list-inconsistent-obj 7.28f
{"epoch":3006337,"inconsistents":[]}

 ~ $ rados list-inconsistent-obj 7.603
{"epoch":3006329,"inconsistents":[]}


 ~ $ ceph config dump |grep osd_scrub_auto_repair 

Is empty 

 $ ceph daemon mon.ceph4 config get osd_scrub_auto_repair
{
"osd_scrub_auto_repair": "true"
}

What does this tell me now? The setting can be changed to false of course, but as 
list-inconsistent-obj shows something, I would like to find the reason for that 
first.
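For completeness, the checks/changes being discussed would look roughly like this
on our side (the config set line is only a sketch of how auto-repair could be
disabled globally, not something I have run yet):

 ~ $ ceph pg dump pgs_brief | grep repair
 ~ $ ceph config set osd osd_scrub_auto_repair false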

Regards
Marcus


> On 27.06.2022 at 08:56, Stefan Kooman wrote:
> 
> On 6/27/22 08:48, Marcus Müller wrote:
>> Hi all,
>> we recently upgraded from Ceph Luminous (12.x) to Ceph Octopus (15.x) (of 
>> course with Mimic and Nautilus in between). Since this upgrade we see a 
>> constant number of active+clean+scrubbing+deep+repair PGs. We never had this 
>> in the past; now it happens all the time (like 10 or 20 PGs at the same time 
>> with the +repair flag).
>> Does anyone know how to debug this more in detail ?
> 
> ceph daemon mon.$mon-id config get osd_scrub_auto_repair
> 
> ^^ This is disabled by default (Octopus 15.2.16), but you might have this 
> setting changed to true?
> 
> ceph config dump |grep osd_scrub_auto_repair to check if it's a global 
> setting.
> 
> Do the following commands return any info?
> 
> rados list-inconsistent-pg
> rados list-inconsistent-obj
> 
> Gr. Stefan

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] recommended Linux distro for Ceph Pacific small cluster

2022-06-27 Thread Bobby
Hi,

What is the recommended Linux distro for Ceph Pacific? I would like to set
up a small cluster having around 4-5 OSDs, one monitor node and one client
node.
Earlier I have been using CentOS. Is it recommended to continue with
CentOS, or should I go for another distro? Please do comment.

Looking forward to the reply.

Thanks
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephadm orch thinks hosts are offline

2022-06-27 Thread Thomas Roth

Hi Adam,

no, this is the 'feature' where the reboot of a mgr host causes all known 
hosts to become unmanaged.


> # lxbk0375 # ceph cephadm check-host lxbk0374 10.20.2.161
> mgr.server reply reply (1) Operation not permitted check-host failed:
> Host 'lxbk0374' not found. Use 'ceph orch host ls' to see all managed hosts.

In some email on this issue that I can't find at the moment, someone describes a 
workaround that allows restarting the entire orchestrator business.

But that sounded risky.
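If I remember correctly it boiled down to bouncing the mgr/cephadm side, roughly
something like the following (listed here only as a sketch, not verified):

ceph mgr fail
# or, more drastically
ceph mgr module disable cephadm
ceph mgr module enable cephadm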

Regards
Thomas


On 23/06/2022 19.42, Adam King wrote:

Hi Thomas,

What happens if you run "ceph cephadm check-host " for one of the
hosts that is offline (and if that fails "ceph cephadm check-host
 ")? Usually, the hosts get marked offline when some ssh
connection to them fails. The check-host command will attempt a connection
and maybe let us see why it's failing, or, if there is no longer an issue
connecting to the host, should mark the host online again.

Thanks,
   - Adam King

On Thu, Jun 23, 2022 at 12:30 PM Thomas Roth  wrote:


Hi all,

found this bug https://tracker.ceph.com/issues/51629  (Octopus 15.2.13),
reproduced it in Pacific and
now again in Quincy:
- new cluster
- 3 mgr nodes
- reboot active mgr node
- (only in Quincy:) standby mgr node takes over, rebooted node becomes
standby
- `ceph orch host ls` shows all hosts as `offline`
- add a new host: not offline

In my setup, hostnames and IPs are well known, thus

# ceph orch host ls
HOST  ADDR LABELS  STATUS
lxbk0374  10.20.2.161  _admin  Offline
lxbk0375  10.20.2.162  Offline
lxbk0376  10.20.2.163  Offline
lxbk0377  10.20.2.164  Offline
lxbk0378  10.20.2.165  Offline
lxfs416   10.20.2.178  Offline
lxfs417   10.20.2.179  Offline
lxfs418   10.20.2.222  Offline
lxmds22   10.20.6.67
lxmds23   10.20.6.72   Offline
lxmds24   10.20.6.74   Offline


(All lxbk are mon nodes, the first 3 are mgr, 'lxmds22' was added after
the fatal reboot.)


Does this matter at all?
The old bug report is one year old, now with priority 'Low'. And some people
must have rebooted one or the other host in their clusters...

There is a cephfs on our cluster, operations seem to be unaffected.


Cheers
Thomas

--

Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io





--

Thomas Roth
Department: Informationstechnologie
Location: SB3 2.291
Phone: +49-6159-71 1453  Fax: +49-6159-71 2986


GSI Helmholtzzentrum für Schwerionenforschung GmbH
Planckstraße 1, 64291 Darmstadt, Germany, www.gsi.de

Commercial Register / Handelsregister: Amtsgericht Darmstadt, HRB 1528
Managing Directors / Geschäftsführung:
Professor Dr. Paolo Giubellino, Dr. Ulrich Breuer, Jörg Blaurock
Chairman of the Supervisory Board / Vorsitzender des GSI-Aufsichtsrats:
State Secretary / Staatssekretär Dr. Volkmar Dietz

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Multiple subnet single cluster

2022-06-27 Thread Tahder Xunil
Hi,

I'm a bit confused about my setup, as I ran into issues using ceph-ansible and
am getting an error in regards to the RADOS gateways. I intend to implement one
RGW per subnet (I have 3 public subnets, 192.168.50.x/24, 192.168.100.x/24 and
192.168.150.x/24, each with 2 servers of 16 OSDs), but the cluster network is
common to the 3 subnets. The Ceph cluster itself seems to be working well,
except that I am stuck on the RGW implementation.
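For context, the network part of the intended setup would be roughly the following
in ceph.conf / the ceph-ansible group_vars (the cluster network address below is
only a placeholder):

public network  = 192.168.50.0/24, 192.168.100.0/24, 192.168.150.0/24
cluster network = 10.10.10.0/24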

Any insights on where to start? Or which configurations to alter? Or
perhaps a new tool like cephadm?


Regards,
Mario
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Set device-class via service specification file

2022-06-27 Thread Robert Reihs
Hi,
We are setting up a test cluster with cephadm. We would like to
set different device classes for the OSDs. Is there a possibility to set
this via the service specification YAML file? This is the configuration for
the OSD service:

---
service_type: osd
service_id: osd_mon_disk_layout_fast
placement:
  hosts:
- fsn1-ceph-01
- fsn1-ceph-02
- fsn1-ceph-03
spec:
  data_devices:
paths:
  - /dev/vdb
  encrypted: true
  journal_devices:
paths:
  - /dev/vdc
  db_devices:
paths:
  - /dev/vdc

We would then use this in the CRUSH rule. Or is there another way to set
this up?
Thanks
Best
Robert
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Set device-class via service specification file

2022-06-27 Thread David Orman
Hi Robert,

We had the same question and ended up creating a PR for this:
https://github.com/ceph/ceph/pull/46480 - there are backports, as well, so
I'd expect it will be in the next release or two.
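Until the backports land, the class can also be set by hand once the OSDs exist,
roughly like this (the class name and OSD id are just examples):

ceph osd crush rm-device-class osd.0
ceph osd crush set-device-class fast osd.0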

David

On Mon, Jun 27, 2022 at 8:07 AM Robert Reihs  wrote:

> Hi,
> We are setting up a test cluster with cephadm. We would like to
> set different device classes for the OSDs. Is there a possibility to set
> this via the service specification YAML file? This is the configuration for
> the OSD service:
> 
> ---
> service_type: osd
> service_id: osd_mon_disk_layout_fast
> placement:
>   hosts:
> - fsn1-ceph-01
> - fsn1-ceph-02
> - fsn1-ceph-03
> spec:
>   data_devices:
> paths:
>   - /dev/vdb
>   encrypted: true
>   journal_devices:
> paths:
>   - /dev/vdc
>   db_devices:
> paths:
>   - /dev/vdc
>
> We would then use this in the CRUSH rule. Or is there another way to set
> this up?
> Thanks
> Best
> Robert
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] runaway mon DB

2022-06-27 Thread Wyll Ingersoll

Running Ceph Pacific 16.2.7

We have a very large cluster with 3 monitors.  One of the monitor DBs is more 
than 2x the size of the other 2 and is growing constantly (store.db fills up) 
until it eventually fills the /var partition on that server.  The monitor in 
question is not the leader.  The cluster itself is quite full, but currently we 
cannot remove any data due to its current mission requirements, so it is 
constantly rebalancing and bumping up against the "toofull" limits.

How can we keep the monitor DB from growing so fast?
Why is it only on a secondary monitor, not the primary?
Can we force a monitor to compact its DB while the system is actively 
repairing?
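For reference, what we are considering so far is along these lines (not yet
applied, listed only as a sketch):

ceph tell mon.<id> compact                      # one-off compaction of a single monitor
ceph config set mon mon_compact_on_start true   # compact the store whenever a mon restarts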

Thanks,
  Wyllys Ingersoll

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Conversion to Cephadm

2022-06-27 Thread Eugen Block

Hi,

there are some defaults for container images when used with cephadm.  
If you didn't change anything you probably get docker.io... when  
running:


ceph config dump | grep image
global  basic  container_image  docker.io/ceph/ceph@sha256...


This is a pacific one-node test cluster. If you want to set it to  
quay.io you can change it like this:


# ceph config set global container_image quay.io/.../ceph-something
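If already-running daemons should also move to a different image, one way is to
point the orchestrator upgrade at that image, e.g. (the tag is just an example):

# ceph orch upgrade start --image quay.io/ceph/ceph:v15.2.17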


I successfully converted to cephadm after upgrading the cluster to octopus.
I am on CentOS 7 and am attempting to convert some of the nodes over to
rocky, but when I try to add a rocky node in and start the mgr or mon
service, it tries to start an octopus container and the service comes back
with an error.  Is there a way to force it to start a quincy container on
the new host?


Just to be clear, you upgraded to Octopus successfully, then tried to  
add new nodes with a newer OS and it tries to start an Octopus  
container, but that's expected, isn't it? Can you share more details  
about which errors occur when you try to start Octopus containers?



Zitat von Brent Kennedy :


I successfully converted to cephadm after upgrading the cluster to octopus.
I am on CentOS 7 and am attempting to convert some of the nodes over to
rocky, but when I try to add a rocky node in and start the mgr or mon
service, it tries to start an octopus container and the service comes back
with an error.  Is there a way to force it to start a quincy container on
the new host?



I tried to start an upgrade, which did deploy the manager daemons to the new
hosts, but it failed converting the monitors and now one is dead (a CentOS
7 one).  It seems it can spin up Quincy containers on the new nodes, but
because it failed upgrading, it is still trying to deploy the Octopus ones to
the new node.



Cephadm log on new node:



2022-06-25 21:51:34,427 7f4748727b80 DEBUG stat: Copying blob
sha256:7a0437f04f83f084b7ed68ad9c4a4947e12fc4e1b006b38129bac89114ec3621

2022-06-25 21:51:34,647 7f4748727b80 DEBUG stat: Copying blob
sha256:7a0437f04f83f084b7ed68ad9c4a4947e12fc4e1b006b38129bac89114ec3621

2022-06-25 21:51:34,652 7f4748727b80 DEBUG stat: Copying blob
sha256:731c3beff4deece7d4e54bc26ecf6d99988b19ea8414524277d83bc5a5d6eb70

2022-06-25 21:51:59,006 7f4748727b80 DEBUG stat: Copying config
sha256:2cf504fded3980c76b59a354fca8f301941f86e369215a08752874d1ddb69b73

2022-06-25 21:51:59,008 7f4748727b80 DEBUG stat: Writing manifest to image
destination

2022-06-25 21:51:59,008 7f4748727b80 DEBUG stat: Storing signatures

2022-06-25 21:51:59,239 7f4748727b80 DEBUG stat: 167 167

2022-06-25 21:51:59,703 7f4748727b80 DEBUG /usr/bin/ceph-mon: too many
arguments:
[--default-log-to-journald=true,--default-mon-cluster-log-to-journald=true]

2022-06-25 21:51:59,797 7f4748727b80 INFO Non-zero exit code 1 from
/bin/podman run --rm --ipc=host --stop-signal=SIGTERM --net=host
--entrypoint /usr/bin/ceph-mon --init -e
CONTAINER_IMAGE=docker.io/ceph/ceph:v15 -e NODE_NAME=tpixmon5 -e
CEPH_USE_RANDOM_NONCE=1 -v
/var/log/ceph/33ca8009-79d6-45cf-a67e-9753ab4dc861:/var/log/ceph:z -v
/var/lib/ceph/33ca8009-79d6-45cf-a67e-9753ab4dc861/mon.tpixmon5:/var/lib/cep
h/mon/ceph-tpixmon5:z -v /tmp/ceph-tmp7xmra8lk:/tmp/keyring:z -v
/tmp/ceph-tmp7mid2k57:/tmp/config:z docker.io/ceph/ceph:v15 --mkfs -i
tpixmon5 --fsid 33ca8009-79d6-45cf-a67e-9753ab4dc861 -c /tmp/config
--keyring /tmp/keyring --setuser ceph --setgroup ceph
--default-log-to-file=false --default-log-to-journald=true
--default-log-to-stderr=false --default-mon-cluster-log-to-file=false
--default-mon-cluster-log-to-journald=true
--default-mon-cluster-log-to-stderr=false

2022-06-25 21:51:59,798 7f4748727b80 INFO /usr/bin/ceph-mon: stderr too many
arguments:
[--default-log-to-journald=true,--default-mon-cluster-log-to-journald=true]



Podman Images:

REPOSITORY   TAG IMAGE ID  CREATEDSIZE

quay.io/ceph/ceph  e1d6a67b021e  2 weeks ago1.32 GB

docker.io/ceph/ceph  v15 2cf504fded39  13 months ago  1.05 GB



I don't even know what that top one is, because it's not tagged and it keeps
pulling it.  Why would it be pulling a docker.io image (is that the only place
to get octopus images?)?



I also tried to force upgrade the older failed monitor, but the cephadm tool
says that the OS is too old.  It's just odd to me that we would say "go to
containers because the OS won't matter" and then it actually still matters,
because the podman versions are tied to newer images.



-Brent

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io




___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph recovery network speed

2022-06-27 Thread Curt
Hello,

I had already increased/changed those variables previously.  I increased
the pg_num to 128, which increased the number of PGs backfilling, but the
speed is still only about 30 MiB/s on average and it has been backfilling 23
PGs for the last several hours.  Should I increase it higher than 128?
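For reference, the kind of changes I mean are along these lines (the values below
are only placeholders, not what is currently applied):

ceph config set osd osd_max_backfills 4
ceph config set osd osd_recovery_max_active_hdd 8
ceph osd pool set EC-22-Pool pg_num 128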

I'm still trying to figure out if this is just how ceph is or if there is a
bottleneck somewhere.  For example, if I sftp a 10G file between servers it's
done in a couple of minutes or less.  Am I thinking about this wrong?

Thanks,
Curt

On Mon, Jun 27, 2022 at 12:33 PM Frank Schilder  wrote:

> Hi Curt,
>
> as far as I understood, a 2+2 EC pool is recovering, which makes 1 OSD per
> host busy. My experience is that the algorithm for selecting PGs to
> backfill/recover is not very smart. It could simply be that it doesn't find
> more PGs without violating some of these settings:
>
> osd_max_backfills
> osd_recovery_max_active
>
> I have never observed the second parameter to change anything (try it
> anyway). However, the first one has a large impact. You could try increasing
> this slowly until recovery moves faster. Another parameter you might want
> to try is
>
> osd_recovery_sleep_[hdd|ssd]
>
> Be careful as this will impact client IO. I could reduce the sleep for my
> HDDs to 0.05. With your workload pattern, this might be something you can
> tune as well.
>
> Having said that, I think you should increase your PG count on the EC pool
> as soon as the cluster is healthy. You have only about 20 PGs per OSD and
> large PGs will take unnecessarily long to recover. A higher PG count will
> also make it easier for the scheduler to find PGs for recovery/backfill.
> Aim for a number between 100 and 200. Give the pool(s) with most data
> (#objects) the most PGs.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Curt 
> Sent: 24 June 2022 19:04
> To: Anthony D'Atri; ceph-users@ceph.io
> Subject: [ceph-users] Re: Ceph recovery network speed
>
> 2 PG's shouldn't take hours to backfill in my opinion.  Just 2TB enterprise
> HD's.
>
> Take this log entry below, 72 minutes and still backfilling undersized?
> Should it be that slow?
>
> pg 12.15 is stuck undersized for 72m, current state
> active+undersized+degraded+remapped+backfilling, last acting
> [34,10,29,NONE]
>
> Thanks,
> Curt
>
>
> On Fri, Jun 24, 2022 at 8:53 PM Anthony D'Atri 
> wrote:
>
> > Your recovery is slow *because* there are only 2 PGs backfilling.
> >
> > What kind of OSD media are you using?
> >
> > > On Jun 24, 2022, at 09:46, Curt  wrote:
> > >
> > > Hello,
> > >
> > > I'm trying to understand why my recovery is so slow with only 2 pg
> > > backfilling.  I'm only getting speeds of 3-4/MiB/s on a 10G network.  I
> > > have tested the speed between machines with a few tools and all confirm
> > 10G
> > > speed.  I've tried changing various settings of priority and recovery
> > sleep
> > > hdd, but still the same. Is this a configuration issue or something
> else?
> > >
> > > It's just a small cluster right now with 4 hosts, 11 osd's per.  Please
> > let
> > > me know if you need more information.
> > >
> > > Thanks,
> > > Curt
> > > ___
> > > ceph-users mailing list -- ceph-users@ceph.io
> > > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: bunch of " received unsolicited reservation grant from osd" messages in log

2022-06-27 Thread Neha Ojha
This issue should be addressed by https://github.com/ceph/ceph/pull/46860.

Thanks,
Neha

On Fri, Jun 24, 2022 at 2:53 AM Kenneth Waegeman
 wrote:
>
> Hi,
>
> I’ve updated the cluster to 17.2.0, but the log is still filled with these 
> entries:
>
> 2022-06-24T11:45:12.408944+02:00 osd031 ceph-osd[22024]: osd.508 pg_epoch: 
> 685367 pg[5.166s0( v 685201'4130317 (680710'4123328,685201'4130317] 
> local-lis/les=683269/683270 n=375262 ec=1104/1104 lis/c=683269/683269 
> les/c/f=683270/683270/19430 sis=683269) 
> [508,181,357,22,592,228,250,383,28,213,586]p508(0) r=0 lpr=683269 
> crt=685201'4130317 lcod 685201'4130316 mlcod 685201'4130316 active+clean 
> TIME_FOR_DEEP 
> ps=[591~1,5a3~1,5a5~1,5a7~1,5a9~1,5ab~1,5ad~1,5af~1,5b1~2,5bb~1,5c5~14,5ed~c,5fd~1f]]
>  scrubber : handle_scrub_reserve_grant: received unsolicited 
> reservation grant from osd 357(2) (0x55e16d92f600)
> 2022-06-24T11:45:12.412196+02:00 osd031 ceph-osd[22024]: osd.508 pg_epoch: 
> 685367 pg[5.166s0( v 685201'4130317 (680710'4123328,685201'4130317] 
> local-lis/les=683269/683270 n=375262 ec=1104/1104 lis/c=683269/683269 
> les/c/f=683270/683270/19430 sis=683269) 
> [508,181,357,22,592,228,250,383,28,213,586]p508(0) r=0 lpr=683269 
> crt=685201'4130317 lcod 685201'4130316 mlcod 685201'4130316 active+clean 
> TIME_FOR_DEEP 
> ps=[591~1,5a3~1,5a5~1,5a7~1,5a9~1,5ab~1,5ad~1,5af~1,5b1~2,5bb~1,5c5~14,5ed~c,5fd~1f]]
>  scrubber : handle_scrub_reserve_grant: received unsolicited 
> reservation grant from osd 586(10) (0x55e1b57354a0)
> 2022-06-24T11:45:12.417867+02:00 osd031 ceph-osd[21674]: osd.560 pg_epoch: 
> 685367 pg[5.6e2s0( v 685198'4133308 (680724'4126463,685198'4133308] 
> local-lis/les=675991/675992 n=375710 ec=1104/1104 lis/c=675991/675991 
> les/c/f=675992/675992/19430 sis=675991) 
> [560,259,440,156,324,358,338,218,191,335,256]p560(0) r=0 lpr=675991 
> crt=685198'4133308 lcod 685198'4133307 mlcod 685198'4133307 active+clean 
> TIME_FOR_DEEP 
> ps=[591~1,5a3~1,5a5~1,5a7~1,5a9~1,5ab~1,5ad~1,5af~1,5b1~2,5bb~1,5c5~14,5ed~c,5fd~1f]]
>  scrubber : handle_scrub_reserve_grant: received unsolicited 
> reservation grant from osd 259(1) (0x559a5f371080)
> 2022-06-24T11:45:12.453294+02:00 osd031 ceph-osd[22024]: osd.508 pg_epoch: 
> 685367 pg[5.166s0( v 685201'4130317 (680710'4123328,685201'4130317] 
> local-lis/les=683269/683270 n=375262 ec=1104/1104 lis/c=683269/683269 
> les/c/f=683270/683270/19430 sis=683269) 
> [508,181,357,22,592,228,250,383,28,213,586]p508(0) r=0 lpr=683269 
> crt=685201'4130317 lcod 685201'4130316 mlcod 685201'4130316 active+clean 
> TIME_FOR_DEEP 
> ps=[591~1,5a3~1,5a5~1,5a7~1,5a9~1,5ab~1,5ad~1,5af~1,5b1~2,5bb~1,5c5~14,5ed~c,5fd~1f]]
>  scrubber : handle_scrub_reserve_grant: received unsolicited 
> reservation grant from osd 213(9) (0x55e1a9e922c0)
>
> Is the bug still there, or is this something else?
>
> Thanks!!
>
> Kenneth
>
>
>
> On 19 Dec 2021, at 11:05, Ronen Friedman  wrote:
>
>
>
> On Sat, Dec 18, 2021 at 7:06 PM Ronen Friedman  wrote:
>>
>> Hi all,
>>
>> This was indeed a bug, which I've already fixed in 'master'.
>> I'll look for the backporting status tomorrow.
>>
>> Ronen
>>
>
> The fix is part of a larger change (which fixes a more severe issue). Pending 
> (non-trivial) backport.
> I'll try to speed this up.
>
> Ronen
>
>
>
>
>>
>> On Fri, Dec 17, 2021 at 1:49 PM Kenneth Waegeman  
>> wrote:
>>>
>>> Hi all,
>>>
>>> I'm also seeing these messages spamming the logs after update from
>>> octopus to pacific 16.2.7.
>>>
>>> Any clue yet what this means?
>>>
>>> Thanks!!
>>>
>>> Kenneth
>>>
>>> On 29/10/2021 22:21, Alexander Y. Fomichev wrote:
>>> > Hello.
>>> > After upgrading to 'pacific' I found log spammed by messages like this:
>>> > ... active+clean]  scrubber pg(46.7aas0) handle_scrub_reserve_grant:
>>> > received unsolicited reservation grant from osd 138(1) (0x560e77c51600)
>>> >
>>> > If I understand it correctly this is exactly what it looks, and this is 
>>> > not
>>> > good. Running with debug osd 1/5 don't help much  and google bring me
>>> > nothing and I stuck. Could anybody give a hint what's happening or where
>>> >   to dig.
>>> >
>>> ___
>>> ceph-users mailing list -- ceph-users@ceph.io
>>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>>
>

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph recovery network speed

2022-06-27 Thread Curt
On Mon, Jun 27, 2022 at 8:52 PM Frank Schilder  wrote:

> I think this is just how ceph is. Maybe you should post the output of
> "ceph status", "ceph osd pool stats" and "ceph df" so that we can get an
> idea whether what you look at is expected or not. As I wrote before, object
> recovery is throttled and the recovery bandwidth depends heavily on object
> size. The interesting question is, how many objects per second are
> recovered/rebalanced
>
 data:
pools:   11 pools, 369 pgs
objects: 2.45M objects, 9.2 TiB
usage:   20 TiB used, 60 TiB / 80 TiB avail
pgs: 512136/9729081 objects misplaced (5.264%)
 343 active+clean
 22  active+remapped+backfilling

  io:
client:   2.0 MiB/s rd, 344 KiB/s wr, 142 op/s rd, 69 op/s wr
recovery: 34 MiB/s, 8 objects/s

Pool 12 is the only one with any stats.

pool EC-22-Pool id 12
  510048/9545052 objects misplaced (5.344%)
  recovery io 36 MiB/s, 9 objects/s
  client io 1.8 MiB/s rd, 404 KiB/s wr, 86 op/s rd, 72 op/s wr

--- RAW STORAGE ---
CLASSSIZE   AVAILUSED  RAW USED  %RAW USED
hdd80 TiB  60 TiB  20 TiB20 TiB  25.45
TOTAL  80 TiB  60 TiB  20 TiB20 TiB  25.45

--- POOLS ---
POOLID  PGS   STORED  OBJECTS USED  %USED  MAX
AVAIL
.mgr 11  152 MiB   38  457 MiB  0
 9.2 TiB
21BadPool3   328 KiB1   12 KiB  0
18 TiB
.rgw.root4   32  1.3 KiB4   48 KiB  0
 9.2 TiB
default.rgw.log  5   32  3.6 KiB  209  408 KiB  0
 9.2 TiB
default.rgw.control  6   32  0 B8  0 B  0
 9.2 TiB
default.rgw.meta 78  6.7 KiB   20  203 KiB  0
 9.2 TiB
rbd_rep_pool 8   32  2.0 MiB5  5.9 MiB  0
 9.2 TiB
default.rgw.buckets.index98  2.0 MiB   33  5.9 MiB  0
 9.2 TiB
default.rgw.buckets.non-ec  10   32  1.4 KiB0  4.3 KiB  0
 9.2 TiB
default.rgw.buckets.data11   32  232 GiB   61.02k  697 GiB   2.41
 9.2 TiB
EC-22-Pool  12  128  9.8 TiB2.39M   20 TiB  41.55
14 TiB



> Maybe provide the output of the first two commands for
> osd_recovery_sleep_hdd=0.05 and osd_recovery_sleep_hdd=0.1 each (wait a bit
> after setting these and then collect the output). Include the applied
> values for osd_max_backfills* and osd_recovery_max_active* for one of the
> OSDs in the pool (ceph config show osd.ID | grep -e osd_max_backfills -e
> osd_recovery_max_active).
>

I didn't notice any speed difference with sleep values changed, but I'll
grab the stats between changes when I have a chance.

ceph config show osd.19 | egrep 'osd_max_backfills|osd_recovery_max_active'
osd_max_backfills            1000  override  mon[5]
osd_recovery_max_active      1000  override
osd_recovery_max_active_hdd  1000  override  mon[5]
osd_recovery_max_active_ssd  1000  override

>
> I don't really know if on such a small cluster one can expect more than
> what you see. It has nothing to do with network speed if you have a 10G
> line. However, recovery is something completely different from a full
> link-speed copy.
>
> I can tell you that boatloads of tiny objects are a huge pain for
> recovery, even on SSD. Ceph doesn't raid up sections of disks against each
> other, but object for object. This might be a feature request: that PG
> space allocation and recovery should follow the model of LVM extends
> (ideally match with LVM extends) to allow recovery/rebalancing larger
> chunks of storage in one go, containing parts of a large or many small
> objects.
>
> Best regards,
> =
> Frank Schilder
> AIT Risø Campus
> Bygning 109, rum S14
>
> 
> From: Curt 
> Sent: 27 June 2022 17:35:19
> To: Frank Schilder
> Cc: ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: Ceph recovery network speed
>
> Hello,
>
> I had already increased/changed those variables previously.  I increased
> the pg_num to 128. Which increased the number of PG's backfilling, but
> speed is still only at 30 MiB/s avg and has been backfilling 23 pg for the
> last several hours.  Should I increase it higher than 128?
>
> I'm still trying to figure out if this is just how ceph is or if there is
> a bottleneck somewhere.  Like if I sftp a 10G file between servers it's
> done in a couple min or less.  Am I thinking of this wrong?
>
> Thanks,
> Curt
>
> On Mon, Jun 27, 2022 at 12:33 PM Frank Schilder  fr...@dtu.dk>> wrote:
> Hi Curt,
>
> as far as I understood, a 2+2 EC pool is recovering, which makes 1 OSD per
> host busy. My experience is, that the algorithm for selecting PGs to
> backfill/recover is not very smart. It could simply be that it doesn't find
> more PGs without 

[ceph-users] Re: Ceph recovery network speed

2022-06-27 Thread Robert Gallop
I saw a major boost after having the sleep_hdd set to 0.  Only after that
did I start staying at around 500MiB to 1.2GiB/sec and 1.5k obj/sec to 2.5k
obj/sec.

Eventually it tapered back down, but for me sleep was the key, and
specifically in my case:

osd_recovery_sleep_hdd
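Setting it is roughly just the following (and worth reverting once the backfill
is done):

ceph config set osd osd_recovery_sleep_hdd 0
ceph config rm osd osd_recovery_sleep_hdd     # revert to the default afterwards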

On Mon, Jun 27, 2022 at 11:17 AM Curt  wrote:

> On Mon, Jun 27, 2022 at 8:52 PM Frank Schilder  wrote:
>
> > I think this is just how ceph is. Maybe you should post the output of
> > "ceph status", "ceph osd pool stats" and "ceph df" so that we can get an
> > idea whether what you look at is expected or not. As I wrote before,
> object
> > recovery is throttled and the recovery bandwidth depends heavily on
> object
> > size. The interesting question is, how many objects per second are
> > recovered/rebalanced
> >
>  data:
> pools:   11 pools, 369 pgs
> objects: 2.45M objects, 9.2 TiB
> usage:   20 TiB used, 60 TiB / 80 TiB avail
> pgs: 512136/9729081 objects misplaced (5.264%)
>  343 active+clean
>  22  active+remapped+backfilling
>
>   io:
> client:   2.0 MiB/s rd, 344 KiB/s wr, 142 op/s rd, 69 op/s wr
> recovery: 34 MiB/s, 8 objects/s
>
> Pool 12 is the only one with any stats.
>
> pool EC-22-Pool id 12
>   510048/9545052 objects misplaced (5.344%)
>   recovery io 36 MiB/s, 9 objects/s
>   client io 1.8 MiB/s rd, 404 KiB/s wr, 86 op/s rd, 72 op/s wr
>
> --- RAW STORAGE ---
> CLASSSIZE   AVAILUSED  RAW USED  %RAW USED
> hdd80 TiB  60 TiB  20 TiB20 TiB  25.45
> TOTAL  80 TiB  60 TiB  20 TiB20 TiB  25.45
>
> --- POOLS ---
> POOLID  PGS   STORED  OBJECTS USED  %USED  MAX
> AVAIL
> .mgr 11  152 MiB   38  457 MiB  0
>  9.2 TiB
> 21BadPool3   328 KiB1   12 KiB  0
> 18 TiB
> .rgw.root4   32  1.3 KiB4   48 KiB  0
>  9.2 TiB
> default.rgw.log  5   32  3.6 KiB  209  408 KiB  0
>  9.2 TiB
> default.rgw.control  6   32  0 B8  0 B  0
>  9.2 TiB
> default.rgw.meta 78  6.7 KiB   20  203 KiB  0
>  9.2 TiB
> rbd_rep_pool 8   32  2.0 MiB5  5.9 MiB  0
>  9.2 TiB
> default.rgw.buckets.index98  2.0 MiB   33  5.9 MiB  0
>  9.2 TiB
> default.rgw.buckets.non-ec  10   32  1.4 KiB0  4.3 KiB  0
>  9.2 TiB
> default.rgw.buckets.data11   32  232 GiB   61.02k  697 GiB   2.41
>  9.2 TiB
> EC-22-Pool  12  128  9.8 TiB2.39M   20 TiB  41.55
> 14 TiB
>
>
>
> > Maybe provide the output of the first two commands for
> > osd_recovery_sleep_hdd=0.05 and osd_recovery_sleep_hdd=0.1 each (wait a
> bit
> > after setting these and then collect the output). Include the applied
> > values for osd_max_backfills* and osd_recovery_max_active* for one of the
> > OSDs in the pool (ceph config show osd.ID | grep -e osd_max_backfills -e
> > osd_recovery_max_active).
> >
>
> I didn't notice any speed difference with sleep values changed, but I'll
> grab the stats between changes when I have a chance.
>
> ceph config show osd.19 | egrep 'osd_max_backfills|osd_recovery_max_active'
> osd_max_backfills            1000  override  mon[5]
> osd_recovery_max_active      1000  override
> osd_recovery_max_active_hdd  1000  override  mon[5]
> osd_recovery_max_active_ssd  1000  override
>
> >
> > I don't really know if on such a small cluster one can expect more than
> > what you see. It has nothing to do with network speed if you have a 10G
> > line. However, recovery is something completely different from a full
> > link-speed copy.
> >
> > I can tell you that boatloads of tiny objects are a huge pain for
> > recovery, even on SSD. Ceph doesn't raid up sections of disks against
> each
> > other, but object for object. This might be a feature request: that PG
> > space allocation and recovery should follow the model of LVM extends
> > (ideally match with LVM extends) to allow recovery/rebalancing larger
> > chunks of storage in one go, containing parts of a large or many small
> > objects.
> >
> > Best regards,
> > =
> > Frank Schilder
> > AIT Risø Campus
> > Bygning 109, rum S14
> >
> > 
> > From: Curt 
> > Sent: 27 June 2022 17:35:19
> > To: Frank Schilder
> > Cc: ceph-users@ceph.io
> > Subject: Re: [ceph-users] Re: Ceph recovery network speed
> >
> > Hello,
> >
> > I had already increased/changed those variables previously.  I increased
> > the pg_num to 128. Which increased the number of PG's backfilling, but
> > speed is still only at 30 MiB/s avg and has been backfilling 23 pg for
> the
> > last several hours.  Should I increase it higher than 128?
> >
> 

[ceph-users] Re: Ceph recovery network speed

2022-06-27 Thread Curt
I would love to see those kinds of speeds. I tried setting it all the way
to 0 and saw no change; I did that before I sent the first email, maybe it was
your old post I got it from.

osd_recovery_sleep_hdd   0.00   override  (mon[0.00])

On Mon, Jun 27, 2022 at 9:27 PM Robert Gallop 
wrote:

> I saw a major boost after having the sleep_hdd set to 0.  Only after that
> did I start staying at around 500MiB to 1.2GiB/sec and 1.5k obj/sec to 2.5k
> obj/sec.
>
> Eventually it tapered back down, but for me sleep was the key, and
> specifically in my case:
>
> osd_recovery_sleep_hdd
>
> On Mon, Jun 27, 2022 at 11:17 AM Curt  wrote:
>
>> On Mon, Jun 27, 2022 at 8:52 PM Frank Schilder  wrote:
>>
>> > I think this is just how ceph is. Maybe you should post the output of
>> > "ceph status", "ceph osd pool stats" and "ceph df" so that we can get an
>> > idea whether what you look at is expected or not. As I wrote before,
>> object
>> > recovery is throttled and the recovery bandwidth depends heavily on
>> object
>> > size. The interesting question is, how many objects per second are
>> > recovered/rebalanced
>> >
>>  data:
>> pools:   11 pools, 369 pgs
>> objects: 2.45M objects, 9.2 TiB
>> usage:   20 TiB used, 60 TiB / 80 TiB avail
>> pgs: 512136/9729081 objects misplaced (5.264%)
>>  343 active+clean
>>  22  active+remapped+backfilling
>>
>>   io:
>> client:   2.0 MiB/s rd, 344 KiB/s wr, 142 op/s rd, 69 op/s wr
>> recovery: 34 MiB/s, 8 objects/s
>>
>> Pool 12 is the only one with any stats.
>>
>> pool EC-22-Pool id 12
>>   510048/9545052 objects misplaced (5.344%)
>>   recovery io 36 MiB/s, 9 objects/s
>>   client io 1.8 MiB/s rd, 404 KiB/s wr, 86 op/s rd, 72 op/s wr
>>
>> --- RAW STORAGE ---
>> CLASSSIZE   AVAILUSED  RAW USED  %RAW USED
>> hdd80 TiB  60 TiB  20 TiB20 TiB  25.45
>> TOTAL  80 TiB  60 TiB  20 TiB20 TiB  25.45
>>
>> --- POOLS ---
>> POOLID  PGS   STORED  OBJECTS USED  %USED  MAX
>> AVAIL
>> .mgr 11  152 MiB   38  457 MiB  0
>>  9.2 TiB
>> 21BadPool3   328 KiB1   12 KiB  0
>> 18 TiB
>> .rgw.root4   32  1.3 KiB4   48 KiB  0
>>  9.2 TiB
>> default.rgw.log  5   32  3.6 KiB  209  408 KiB  0
>>  9.2 TiB
>> default.rgw.control  6   32  0 B8  0 B  0
>>  9.2 TiB
>> default.rgw.meta 78  6.7 KiB   20  203 KiB  0
>>  9.2 TiB
>> rbd_rep_pool 8   32  2.0 MiB5  5.9 MiB  0
>>  9.2 TiB
>> default.rgw.buckets.index98  2.0 MiB   33  5.9 MiB  0
>>  9.2 TiB
>> default.rgw.buckets.non-ec  10   32  1.4 KiB0  4.3 KiB  0
>>  9.2 TiB
>> default.rgw.buckets.data11   32  232 GiB   61.02k  697 GiB   2.41
>>  9.2 TiB
>> EC-22-Pool  12  128  9.8 TiB2.39M   20 TiB  41.55
>> 14 TiB
>>
>>
>>
>> > Maybe provide the output of the first two commands for
>> > osd_recovery_sleep_hdd=0.05 and osd_recovery_sleep_hdd=0.1 each (wait a
>> bit
>> > after setting these and then collect the output). Include the applied
>> > values for osd_max_backfills* and osd_recovery_max_active* for one of
>> the
>> > OSDs in the pool (ceph config show osd.ID | grep -e osd_max_backfills -e
>> > osd_recovery_max_active).
>> >
>>
>> I didn't notice any speed difference with sleep values changed, but I'll
>> grab the stats between changes when I have a chance.
>>
>> ceph config show osd.19 | egrep 'osd_max_backfills|osd_recovery_max_active'
>> osd_max_backfills            1000  override  mon[5]
>> osd_recovery_max_active      1000  override
>> osd_recovery_max_active_hdd  1000  override  mon[5]
>> osd_recovery_max_active_ssd  1000  override
>>
>> >
>> > I don't really know if on such a small cluster one can expect more than
>> > what you see. It has nothing to do with network speed if you have a 10G
>> > line. However, recovery is something completely different from a full
>> > link-speed copy.
>> >
>> > I can tell you that boatloads of tiny objects are a huge pain for
>> > recovery, even on SSD. Ceph doesn't raid up sections of disks against
>> each
>> > other, but object for object. This might be a feature request: that PG
>> > space allocation and recovery should follow the model of LVM extends
>> > (ideally match with LVM extends) to allow recovery/rebalancing larger
>> > chunks of storage in one go, containing parts of a large or many small
>> > objects.
>> >
>> > Best regards,
>> > =
>> > Frank Schilder
>> > AIT Risø Campus
>> > Bygning 109, rum S14
>> >
>> > 

[ceph-users] calling ceph command from a crush_location_hook - fails to find sys.stdin.isatty()

2022-06-27 Thread Wyll Ingersoll


[ceph pacific 16.2.9]

I have a crush_location_hook script, a small python3 script that figures out 
the correct root/chassis/host location for a particular OSD.  Our map has 2 
roots, one for all-SSD devices and another for HDDs, hence the need for the 
location hook. Without it, the SSD devices end up in the wrong crush location.  
Prior to the 16.2.9 release the hook wasn't being used, because of a bug that 
was causing the OSDs to crash with it.  Now that we've upgraded to 16.2.9 we 
want to use our location hook script again, but it fails in a different way.

The script works correctly when testing it standalone with the right 
parameters, but when it is called by the OSD process it fails: when the ceph 
command references 'sys.stdin.isatty()' (at line 538 in /usr/bin/ceph), the 
call blows up because sys.stdin is None.  I suspect this is because of how the 
OSD spawns the crush hook script, which then forks the ceph command.  Somehow 
python (3.8) is not initializing the stdin, stdout and stderr members of the 
'sys' module.

Looking for guidance on how to get my location hook script to successfully use 
the "ceph" command to get the output of "ceph osd tree --format json"

thanks,
   Wyllys Ingersoll


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Ceph recovery network speed

2022-06-27 Thread Curt
On Mon, Jun 27, 2022 at 11:08 PM Frank Schilder  wrote:

> Do you, by any chance have SMR drives? This may not be stated on the
> drive, check what the internet has to say. I also would have liked to see
> the beginning of the ceph status: number of hosts, number of OSDs, up and
> down, whatever. Can you also send the result of ceph osd df tree?
>
 As far as I can tell none of the drives are SMR drives.
I did have some inconsistent PGs pop up; scrubs are still running.

cluster:
id: 1684fe88-aae0-11ec-9593-df430e3982a0
health: HEALTH_ERR
10 scrub errors
Possible data damage: 4 pgs inconsistent

  services:
mon: 5 daemons, quorum cephmgr,cephmon1,cephmon2,cephmon3,cephmgr2 (age
8w)
mgr: cephmon1.fxtvtu(active, since 2d), standbys: cephmon2.wrzwwn,
cephmgr2.hzsrdo, cephmgr.bazebq
osd: 44 osds: 44 up (since 3d), 44 in (since 3d); 28 remapped pgs
rgw: 2 daemons active (2 hosts, 1 zones)

  data:
pools:   11 pools, 369 pgs
objects: 2.45M objects, 9.2 TiB
usage:   21 TiB used, 59 TiB / 80 TiB avail
pgs: 503944/9729081 objects misplaced (5.180%)
 337 active+clean
 28  active+remapped+backfilling
 4   active+clean+inconsistent

  io:
client:   1000 KiB/s rd, 717 KiB/s wr, 81 op/s rd, 57 op/s wr
recovery: 34 MiB/s, 8 objects/s

ID  CLASS  WEIGHTREWEIGHT  SIZE RAW USE   DATA  OMAP META
  AVAIL%USE   VAR   PGS  STATUS  TYPE NAME
-1 80.05347 -   80 TiB21 TiB21 TiB   32 MiB   69
GiB   59 TiB  26.23  1.00-  root default
-5 20.01337 -   20 TiB   5.3 TiB   5.3 TiB  1.4 MiB   19
GiB   15 TiB  26.47  1.01-  host hyperion01
 1hdd   1.81940   1.0  1.8 TiB   749 GiB   747 GiB  224 KiB  2.2
GiB  1.1 TiB  40.19  1.53   36  up  osd.1
 3hdd   1.81940   1.0  1.8 TiB   531 GiB   530 GiB3 KiB  1.9
GiB  1.3 TiB  28.52  1.09   31  up  osd.3
 5hdd   1.81940   1.0  1.8 TiB   167 GiB   166 GiB   36 KiB  1.2
GiB  1.7 TiB   8.98  0.34   18  up  osd.5
 7hdd   1.81940   1.0  1.8 TiB   318 GiB   316 GiB   83 KiB  1.2
GiB  1.5 TiB  17.04  0.65   26  up  osd.7
 9hdd   1.81940   1.0  1.8 TiB  1017 GiB  1014 GiB  139 KiB  2.6
GiB  846 GiB  54.59  2.08   38  up  osd.9
11hdd   1.81940   1.0  1.8 TiB   569 GiB   567 GiB4 KiB  2.1
GiB  1.3 TiB  30.56  1.17   29  up  osd.11
13hdd   1.81940   1.0  1.8 TiB   293 GiB   291 GiB  338 KiB  1.5
GiB  1.5 TiB  15.72  0.60   23  up  osd.13
15hdd   1.81940   1.0  1.8 TiB   368 GiB   366 GiB  641 KiB  1.6
GiB  1.5 TiB  19.74  0.75   23  up  osd.15
17hdd   1.81940   1.0  1.8 TiB   369 GiB   367 GiB2 KiB  1.5
GiB  1.5 TiB  19.80  0.75   26  up  osd.17
19hdd   1.81940   1.0  1.8 TiB   404 GiB   403 GiB7 KiB  1.1
GiB  1.4 TiB  21.69  0.83   31  up  osd.19
45hdd   1.81940   1.0  1.8 TiB   639 GiB   637 GiB2 KiB  2.0
GiB  1.2 TiB  34.30  1.31   32  up  osd.45
-3 20.01337 -   20 TiB   5.2 TiB   5.2 TiB  2.0 MiB   18
GiB   15 TiB  26.15  1.00-  host hyperion02
 0hdd   1.81940   1.0  1.8 TiB   606 GiB   604 GiB  302 KiB  2.0
GiB  1.2 TiB  32.52  1.24   33  up  osd.0
 2hdd   1.81940   1.0  1.8 TiB58 GiB58 GiB  112 KiB  249
MiB  1.8 TiB   3.14  0.12   14  up  osd.2
 4hdd   1.81940   1.0  1.8 TiB   254 GiB   252 GiB   14 KiB  1.6
GiB  1.6 TiB  13.63  0.52   28  up  osd.4
 6hdd   1.81940   1.0  1.8 TiB   574 GiB   572 GiB1 KiB  1.8
GiB  1.3 TiB  30.81  1.17   26  up  osd.6
 8hdd   1.81940   1.0  1.8 TiB   201 GiB   200 GiB  618 KiB  743
MiB  1.6 TiB  10.77  0.41   23  up  osd.8
10hdd   1.81940   1.0  1.8 TiB   628 GiB   626 GiB4 KiB  2.2
GiB  1.2 TiB  33.72  1.29   37  up  osd.10
12hdd   1.81940   1.0  1.8 TiB   355 GiB   353 GiB  361 KiB  1.2
GiB  1.5 TiB  19.03  0.73   30  up  osd.12
14hdd   1.81940   1.0  1.8 TiB   1.1 TiB   1.1 TiB1 KiB  2.7
GiB  708 GiB  62.00  2.36   38  up  osd.14
16hdd   1.81940   1.0  1.8 TiB   240 GiB   239 GiB4 KiB  1.2
GiB  1.6 TiB  12.90  0.49   20  up  osd.16
18hdd   1.81940   1.0  1.8 TiB   300 GiB   298 GiB  542 KiB  1.6
GiB  1.5 TiB  16.08  0.61   21  up  osd.18
32hdd   1.81940   1.0  1.8 TiB   989 GiB   986 GiB   45 KiB  2.7
GiB  874 GiB  53.09  2.02   36  up  osd.32
-7 20.01337 -   20 TiB   5.2 TiB   5.2 TiB  2.9 MiB   17
GiB   15 TiB  26.06  0.99-  host hyperion03
22hdd   1.81940   1.0  1.8 TiB   449 GiB   448 GiB  443 KiB  1.5
GiB  1.4 TiB  24.10  0.92   31  up  osd.22
23hdd   1.81940   1.0  1.8 TiB   299 GiB   298 GiB5 Ki