[ceph-users] pg wait too long when osd restart

2023-03-10 Thread yite gu
Hi all, osd_heartbeat_grace = 20 and osd_pool_default_read_lease_ratio = 0.8 by default, so a PG will wait up to 16s when an OSD restarts in the worst case. This wait time is too long; such a stall is unacceptable for client I/O. I think lowering osd_pool_default_read_lease_ratio is a good way to address this. Have any
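For reference, the 16s figure is just the product of those two defaults (20s * 0.8). A minimal sketch of inspecting and lowering the ratio at runtime, assuming the cluster config database is used and with 0.4 as a purely illustrative value:

  # read lease interval = osd_heartbeat_grace * osd_pool_default_read_lease_ratio
  #                     = 20 * 0.8 = 16s by default
  ceph config get osd osd_pool_default_read_lease_ratio
  ceph config set osd osd_pool_default_read_lease_ratio 0.4   # illustrative value only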

[ceph-users] Re: pg wait too long when osd restart

2023-03-10 Thread Josh Baergen
Hello, When you say "osd restart", what sort of restart are you referring to - planned (e.g. for upgrades or maintenance) or unplanned (OSD hang/crash, host issue, etc.)? If it's the former, then these parameters shouldn't matter provided that you're running a recent enough Ceph with default setti
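As context for the "planned" case: a commonly documented maintenance sequence (a hedged sketch, not necessarily the mechanism Josh is referring to; the OSD id placeholder and the cephadm-managed variant are assumptions) is:

  ceph osd set noout                       # avoid rebalancing during the restart
  systemctl restart ceph-osd@<id>          # classic deployments
  # or: ceph orch daemon restart osd.<id>  # cephadm-managed clusters
  ceph osd unset noout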

[ceph-users] Re: upgrading from 15.2.17 to 16.2.11 - Health ERROR

2023-03-10 Thread xadhoom76
I cannot find anything interesting in cephadm.log. Now the error is HEALTH_ERR Module 'cephadm' has failed: 'cephadm' Any idea how to fix it?
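A hedged sketch of generic first steps for a failed mgr module (standard commands, not a confirmed fix for this particular error):

  ceph health detail        # shows the traceback recorded for the failed module
  ceph log last cephadm     # recent messages on the cephadm log channel
  ceph mgr fail             # fail over to a standby mgr, which reloads the module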

[ceph-users] Re: upgrading from 15.2.17 to 16.2.11 - Health ERROR

2023-03-10 Thread xadhoom76
I found this with ceph orch ps: cephadm.8d0364fef6c92fc3580b0d022e32241348e6f11a7694d2b957cdafcb9d059ff2 srvcephprod04 stopped 4m ago - cephadm.8d0364fef6c92fc3580b0d022e32241348e6f11a7694d2b

[ceph-users] Re: upgrading from 15.2.17 to 16.2.11 - Health ERROR

2023-03-10 Thread xadhoom76
Looking at ceph orch upgrade check, I found }, "cephadm.8d0364fef6c92fc3580b0d022e32241348e6f11a7694d2b957cdafcb9d059ff2": { "current_id": null, "current_name": null, "current_version": null }, Could this lead to the issue?
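For anyone reproducing this, the output above comes from the upgrade pre-check; a minimal invocation (the 16.2.11 target is taken from the thread subject, and the image path is an assumption) would be:

  ceph orch upgrade check --ceph-version 16.2.11
  # or: ceph orch upgrade check --image quay.io/ceph/ceph:v16.2.11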

[ceph-users] CephFS thrashing through the page cache

2023-03-10 Thread Ashu Pachauri
We have an internal use case where we back the storage of a proprietary database by a shared file system. We noticed something very odd when testing some workload with a local block device backed file system vs cephfs. We noticed that the amount of network IO done by cephfs is almost double compare

[ceph-users] Re: CephFS thrashing through the page cache

2023-03-10 Thread Ashu Pachauri
Also, I am able to reproduce the network read amplification when I try to do very small reads from larger files, e.g.: for i in $(seq 1 1); do dd if=test_${i} of=/dev/null bs=5k count=10; done This piece of code generates network traffic of 3.3 GB while it actually reads approx 500 MB of d

[ceph-users] Re: upgrading from 15.2.17 to 16.2.11 - Health ERROR

2023-03-10 Thread Adam King
The things in "ceph orch ps" output are gathered by checking the contents of the /var/lib/ceph// directory on the host. Those "cephadm." files get deployed normally though, and aren't usually reported in "ceph orch ps" as it should only report things that are directories rather than files. You coul