[ceph-users] Re: Slow recovery and inaccurate recovery figures since Quincy upgrade

2023-10-03 Thread Sridhar Seshasayee
Hello Iain,


Does anyone have any ideas of what could be the issue here or anywhere we
> can check what is going on??
>
>
You could be hitting the slow backfill/recovery issue with
mclock_scheduler.
Could you please provide the output of the following commands?

1. ceph versions
2. ceph config get osd.<id> osd_op_queue
3. ceph config show osd.<id> | grep osd_max_backfills
4. ceph config show osd.<id> | grep osd_recovery_max_active
5. ceph config show-with-defaults osd.<id> | grep osd_mclock
where <id> can be any valid OSD id

With the mclock_scheduler enabled and with 17.2.5, it is not possible to
override
recovery settings like 'osd_max_backfills' and other recovery related
config options.

To improve the recovery rate, you can temporarily switch the mClock profile
to 'high_recovery_ops'
on all the OSDs by issuing:

ceph config set osd osd_mclock_profile high_recovery_ops

During recovery with this profile, you may notice a dip in the client ops
performance which is expected.
Once the recovery is done, you can switch the mClock profile back to the
default 'high_client_ops' profile.
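
For example, the full round trip described above would look something like this
(a sketch only; adjust to your environment):

ceph config set osd osd_mclock_profile high_recovery_ops   # favour recovery ops
# ... wait for recovery/backfill to finish ...
ceph config set osd osd_mclock_profile high_client_ops     # back to the default profile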

Please note that the upcoming Quincy release will address the slow backfill
issues along with other
usability improvements.

-Sridhar
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] S3 user with more than 1000 buckets

2023-10-03 Thread Thomas Bennett
Hi,

I'm running a Ceph 17.2.5 Rados Gateway and I have a user with more than
1000 buckets.

When the client tries to list all their buckets using s3cmd, rclone and
python boto3, they all three only ever return the first 1000 bucket names.
I can confirm the buckets are all there (and more than 1000) by checking
with the radosgw-admin command.
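
For reference, this is roughly how I compared the two counts (a sketch; the
user id is a placeholder and jq is assumed to be available):

s3cmd ls | wc -l                                        # buckets the S3 API returns
radosgw-admin bucket list --uid=someuser | jq length    # buckets RGW actually has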

Have I missed a pagination limit for listing user buckets in the rados
gateway?

Thanks,
Tom
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: S3 user with more than 1000 buckets

2023-10-03 Thread Jonas Nemeiksis
Hi,

You should increase these default settings:

rgw_list_buckets_max_chunk // for buckets
rgw_max_listing_results // for objects
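
For example (a sketch only; the 5000 is just an illustrative value):

ceph config set client.rgw rgw_list_buckets_max_chunk 5000
ceph config set client.rgw rgw_max_listing_results 5000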

On Tue, Oct 3, 2023 at 12:59 PM Thomas Bennett  wrote:

> Hi,
>
> I'm running a Ceph 17.2.5 Rados Gateway and I have a user with more than
> 1000 buckets.
>
> When the client tries to list all their buckets using s3cmd, rclone and
> python boto3, they all three only ever return the first 1000 bucket names.
> I can confirm the buckets are all there (and more than 1000) by checking
> with the radosgw-admin command.
>
> Have I missed a pagination limit for listing user buckets in the rados
> gateway?
>
> Thanks,
> Tom
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 
Jonas
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Performance drop and retransmits with CephFS

2023-10-03 Thread Tom Wezepoel
Hi all,

Have a question regarding CephFS and write performance. Possibly I am
overlooking a setting.

We recently started using Ceph, and we want to use CephFS as shared
storage for a Sync-and-Share solution.
We are still in a testing phase, mainly looking at the performance of the
system, and we are seeing some strange issues.
We are using Ceph Quincy release 17.2.6, with a replica 3 data policy
across 21 hosts spread across 3 locations.

When I write multiple 1G files, the write performance drops from
400 MiB/s to 18 MiB/s, with multiple retries as well.
However, when I empty the page caches every minute on the client, the
performance remains good. But that's not really a solution of course.
Have already played a lot with the sysctl settings, like vm.dirty etc, but
it makes no difference at all.

When I enable the fuse_disable_pagecache, the write performance does stay
reasonable at 70MiB/s,
but the read performance completely collapses from 600 MiB/s to 40 MiB/s.
There is no difference in behavior between the kernel and fuse clients.

Have already played around with client_oc_max_dirty, client_oc_max_objects,
client_oc_size, etc., but haven't found the right setting.
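
For reference, what I have tried so far looks roughly like this (the values are
examples only, not recommendations):

sync; echo 1 > /proc/sys/vm/drop_caches               # the page-cache workaround on the client
ceph config set client client_oc_size 536870912       # object cacher size, in bytes
ceph config set client client_oc_max_dirty 268435456  # max dirty bytes in the object cacher
ceph config set client client_oc_max_objects 2000     # max objects in the object cacher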
Anyone familiar with this who can give me some hints?

Thanks for your help! :-)

Kind regards, Tom
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Quarterly (CQ) - Issue #2

2023-10-03 Thread Zac Dover
The second issue of "Ceph Quarterly" is attached to this email. Ceph Quarterly 
(or "CQ") is an overview of the past three months of upstream Ceph development. 
We provide CQ in three formats: A4, letter, and plain text wrapped at 80 
columns.

Two news items arrived after the deadline for typesetting this issue. They are 
included here:

Grace Hopper Open Source Day 2023:

- On 22 Sep 2023, Ceph participated in Grace Hopper Open Source Day, an all-day 
hackathon for women and nonbinary developers. Laura Flores led the Ceph 
division, and Yaarit Hatuka, Shreyansh Sancheti, and Aishwarya Mathuria 
participated as mentors. From 12pm EST to 7:30pm EST, Laura showed more than 40 
attendees how to run a Ceph vstart cluster in an Ubuntu Docker container. 
Yaarit, Shreyansh, and Aishwarya spent the day working one-on-one with 
attendees, helping them troubleshoot and work through a curated list of 
low-hanging-fruit issues. By the day's end, Grace Hopper attendees submitted 
eight pull requests. As of the publication of this sentence, two have been 
merged and the others are expected to be merged soon.

- For more information about GHC Open Source Day, see 
https://ghc.anitab.org/awards-programs/open-source-day/

Ceph partners with RCOS:

- Ceph has partnered for the first time with the Rensselaer Center for Open 
Source (RCOS), an organization at Rensselaer Polytechnic Institute that helps 
students jumpstart their careers in software by giving them the opportunity to 
work on various open source projects for class credit.

- Laura Flores, representing Ceph, is mentoring three RPI students on a project 
to improve the output of the `ceph balancer status` command.

- For more information about RCOS, see https://rcos.io/

Zac Dover
Upstream Documentation, Ceph Foundation

Ceph Quarterly
October 2023

Summary of Developments in Q3
-

CephFS:

A non-blocking I/O API for libcephfs has been added to Ceph:
https://github.com/ceph/ceph/pull/48038

A cause of potential deadlock in Python libcephfs has been fixed, which also
affected the mgr modules using it: https://github.com/ceph/ceph/pull/52290

MDS: acquisition throttles have been adjusted to more sensible defaults:
https://github.com/ceph/ceph/pull/52577

MDS: Buggy clients are now evicted in order to keep MDS available:
https://github.com/ceph/ceph/pull/52944


Cephadm:

Support for init containers has been added. Init containers allow custom
actions to run before the daemon container starts:
https://github.com/ceph/ceph/pull/52178

We announce the deployment of the NVMe-oF gateway:
https://github.com/ceph/ceph/pull/50423,
https://github.com/ceph/ceph/pull/52691

LV devices are now reported by ceph-volume in the inventory list, and can be
prepared as OSDs: https://github.com/ceph/ceph/pull/52877

cephadm is now split into multiple files in order to make it easier for humans
to read and understand.  A new procedure has been added to the documentation
that describes how to acquire this new version of cephadm:
https://github.com/ceph/ceph/pull/53052
https://docs.ceph.com/en/latest/cephadm/install/#install-cephadm


Crimson:

Support for multicore has been added to Crimson:
https://github.com/ceph/ceph/pull/51147,
https://github.com/ceph/ceph/pull/51770,
https://github.com/ceph/ceph/pull/51916,
https://github.com/ceph/ceph/pull/52306

Infrastructure to support erasure coding has been added to Crimson:
https://github.com/ceph/ceph/pull/52211

We announce the introduction of a performance test suite for Crimson:
https://github.com/ceph/ceph/pull/50458


Dashboard:

RGW multisite configuration can now be imported from a secondary cluster or
exported to a secondary cluster: https://github.com/ceph/ceph/pull/50706

We announce several upgrades to the Cluster User Interface and the Cluster API:
https://github.com/ceph/ceph/pull/52351,
https://github.com/ceph/ceph/pull/52395,
https://github.com/ceph/ceph/pull/52903,
https://github.com/ceph/ceph/pull/52919,
https://github.com/ceph/ceph/pull/5,
https://github.com/ceph/ceph/pull/53022

More detail has been added to the RGW overview. This includes more granular
information about daemons, zoning, buckets, users, and used capacity (the capacity
used by all the pools in the cluster). Cards detailing these assets have been
added to the RGW overview dashboard: https://github.com/ceph/ceph/pull/52317,
https://github.com/ceph/ceph/pull/52405,
https://github.com/ceph/ceph/pull/52915

It is now possible to manage CephFS subvolumes from the dashboard. This
includes creating subvolumes, editing subvolumes, removing subvolumes, creating
subvolume groups, editing subvolume groups, removing subvolume groups, removing
subvolume groups with snapshots, and displaying subvolume groups in the CephFS
subvolume tab: https://github.com/ceph/ceph/pull/52786,
https://github.com/ceph/ceph/pull/52861,
https://github.com/ceph/ceph/pull/52869,
https://github.com/ceph/ceph/pull/52886,
https://github.com/ceph/ceph/pull/52898,
https://git

[ceph-users] Re: S3 user with more than 1000 buckets

2023-10-03 Thread Thomas Bennett
Hi Jonas,

Thanks :) that solved my issue.

It would seem to me that this is heading towards something that the S3
clients should paginate, but I couldn't find any documentation on how to
paginate bucket listings. All the information points to paginating object
listing - which makes sense.

Just for completion of this thread:

The rgw parameters are found at: Quincy radosgw config ref


I ran the following command to update the parameter for all running rgw
daemons:
ceph config set client.rgw rgw_list_buckets_max_chunk 1

And then confirmed the running daemons were configured:
ceph daemon /var/run/ceph/ceph-client.rgw.xxx.xxx.asok config show | grep
rgw_list_buckets_max_chunk
"rgw_list_buckets_max_chunk": "1",

Kind regards,
Tom

On Tue, 3 Oct 2023 at 13:30, Jonas Nemeiksis  wrote:

> Hi,
>
> You should increase these default settings:
>
> rgw_list_buckets_max_chunk // for buckets
> rgw_max_listing_results // for objects
>
> On Tue, Oct 3, 2023 at 12:59 PM Thomas Bennett  wrote:
>
>> Hi,
>>
>> I'm running a Ceph 17.2.5 Rados Gateway and I have a user with more than
>> 1000 buckets.
>>
>> When the client tries to list all their buckets using s3cmd, rclone and
>> python boto3, they all three only ever return the first 1000 bucket names.
>> I can confirm the buckets are all there (and more than 1000) by checking
>> with the radosgw-admin command.
>>
>> Have I missed a pagination limit for listing user buckets in the rados
>> gateway?
>>
>> Thanks,
>> Tom
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
>
> --
> Jonas
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: S3 user with more than 1000 buckets

2023-10-03 Thread Janne Johansson
On Tue, 3 Oct 2023 at 11:59, Thomas Bennett wrote:

> Hi,
>
> I'm running a Ceph 17.2.5 Rados Gateway and I have a user with more than
> 1000 buckets.
>
> When the client tries to list all their buckets using s3cmd, rclone and
> python boto3, they all three only ever return the first 1000 bucket names.
> I can confirm the buckets are all there (and more than 1000) by checking
> with the radosgw-admin command.
>
> Have I missed a pagination limit for listing user buckets in the rados
> gateway?
>
>
There is/was this bug that made the list not tell clients that there are
more than 1000 buckets, so the clients would not ask for next list of
pagination:

https://tracker.ceph.com/issues/57901

For Quincy, it was in 17.2.6 so upgrading to that version would also fix it.

https://docs.ceph.com/en/latest/releases/quincy/
and search for:

   - rgw: Fix truncated ListBuckets response (pr#49525, Joshua Baergen)


-- 
May the most significant bit of your life be positive.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: S3 user with more than 1000 buckets

2023-10-03 Thread Casey Bodley
On Tue, Oct 3, 2023 at 9:06 AM Thomas Bennett  wrote:
>
> Hi Jonas,
>
> Thanks :) that solved my issue.
>
> It would seem to me that this is heading towards something that the clients
> s3 should paginate, but I couldn't find any documentation on how to
> paginate bucket listings.

the s3 ListBuckets API
(https://docs.aws.amazon.com/AmazonS3/latest/API/API_ListBuckets.html)
doesn't support pagination, so there's no way for clients to do that

but rgw itself should be able to paginate over the 'chunks' to return
more than rgw_list_buckets_max_chunk entries in a single ListBuckets
request. i opened a bug report for this at
https://tracker.ceph.com/issues/63080

> All the information points to paginating object
> listing - which makes sense.
>
> Just for completion of this thread:
>
> The rgw parameters are found at: Quincy radosgw config ref
> 
>
> I ran the following command to update the parameter for all running rgw
> daemons:
> ceph config set client.rgw rgw_list_buckets_max_chunk 1
>
> And then confirmed the running daemons were configured:
> ceph daemon /var/run/ceph/ceph-client.rgw.xxx.xxx.asok config show | grep
> rgw_list_buckets_max_chunk
> "rgw_list_buckets_max_chunk": "1",
>
> Kind regards,
> Tom
>
> On Tue, 3 Oct 2023 at 13:30, Jonas Nemeiksis  wrote:
>
> > Hi,
> >
> > You should increase these default settings:
> >
> > rgw_list_buckets_max_chunk // for buckets
> > rgw_max_listing_results // for objects
> >
> > On Tue, Oct 3, 2023 at 12:59 PM Thomas Bennett  wrote:
> >
> >> Hi,
> >>
> >> I'm running a Ceph 17.2.5 Rados Gateway and I have a user with more than
> >> 1000 buckets.
> >>
> >> When the client tries to list all their buckets using s3cmd, rclone and
> >> python boto3, they all three only ever return the first 1000 bucket names.
> >> I can confirm the buckets are all there (and more than 1000) by checking
> >> with the radosgw-admin command.
> >>
> >> Have I missed a pagination limit for listing user buckets in the rados
> >> gateway?
> >>
> >> Thanks,
> >> Tom
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> >
> >
> > --
> > Jonas
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: S3 user with more than 1000 buckets

2023-10-03 Thread Matt Benjamin
Hi Thomas,

If I'm not mistaken, the RGW will paginate ListBuckets essentially like
ListObjectsv1 if the S3 client provides the appropriate "marker" parameter
values.  COS does this too, I noticed.  I'm not sure which S3 clients can
be relied on to do this, though.

Matt

On Tue, Oct 3, 2023 at 9:06 AM Thomas Bennett  wrote:

> Hi Jonas,
>
> Thanks :) that solved my issue.
>
> It would seem to me that this is heading towards something that the clients
> s3 should paginate, but I couldn't find any documentation on how to
> paginate bucket listings. All the information points to paginating object
> listing - which makes sense.
>
> Just for completion of this thread:
>
> The rgw parameters are found at: Quincy radosgw config ref
> 
>
> I ran the following command to update the parameter for all running rgw
> daemons:
> ceph config set client.rgw rgw_list_buckets_max_chunk 1
>
> And then confirmed the running daemons were configured:
> ceph daemon /var/run/ceph/ceph-client.rgw.xxx.xxx.asok config show | grep
> rgw_list_buckets_max_chunk
> "rgw_list_buckets_max_chunk": "1",
>
> Kind regards,
> Tom
>
> On Tue, 3 Oct 2023 at 13:30, Jonas Nemeiksis  wrote:
>
> > Hi,
> >
> > You should increase these default settings:
> >
> > rgw_list_buckets_max_chunk // for buckets
> > rgw_max_listing_results // for objects
> >
> > On Tue, Oct 3, 2023 at 12:59 PM Thomas Bennett  wrote:
> >
> >> Hi,
> >>
> >> I'm running a Ceph 17.2.5 Rados Gateway and I have a user with more than
> >> 1000 buckets.
> >>
> >> When the client tries to list all their buckets using s3cmd, rclone and
> >> python boto3, they all three only ever return the first 1000 bucket
> names.
> >> I can confirm the buckets are all there (and more than 1000) by checking
> >> with the radosgw-admin command.
> >>
> >> Have I missed a pagination limit for listing user buckets in the rados
> >> gateway?
> >>
> >> Thanks,
> >> Tom
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >>
> >
> >
> > --
> > Jonas
> >
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>


-- 

Matt Benjamin
Red Hat, Inc.
315 West Huron Street, Suite 140A
Ann Arbor, Michigan 48103

http://www.redhat.com/en/technologies/storage

tel.  734-821-5101
fax.  734-769-8938
cel.  734-216-5309
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Slow recovery and inaccurate recovery figures since Quincy upgrade

2023-10-03 Thread Iain Stott
Hi Sridhar,

Thanks for the response. I have added the output you requested below; I have
attached the output from the last command in a file as it was rather long. We
did try to set high_recovery_ops but it didn't seem to have any visible effect.

root@gb4-li-cephgw-001 ~ # ceph versions
{
"mon": {
"ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy 
(stable)": 3
},
"mgr": {
"ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy 
(stable)": 3
},
"osd": {
"ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy 
(stable)": 72
},
"mds": {},
"rgw": {
"ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy 
(stable)": 3
},
"overall": {
"ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5) quincy 
(stable)": 81
}
}

root@gb4-li-cephgw-001 ~ # ceph config get osd.0 osd_op_queue
mclock_scheduler

root@gb4-li-cephgw-001 ~ # ceph config show osd.0 | grep osd_max_backfills
osd_max_backfills            3    override  (mon[3]), default[10]

root@gb4-li-cephgw-001 ~ # ceph config show osd.0 | grep osd_recovery_max_active
osd_recovery_max_active      9    override  (mon[9]), default[0]
osd_recovery_max_active_hdd  10   default
osd_recovery_max_active_ssd  20   default

Thanks
Iain

From: Sridhar Seshasayee 
Sent: 03 October 2023 09:07
To: Iain Stott 
Cc: ceph-users@ceph.io ; dl-osadmins 

Subject: Re: [ceph-users] Slow recovery and inaccurate recovery figures since 
Quincy upgrade





Hello Iain,


Does anyone have any ideas of what could be the issue here or anywhere we can 
check what is going on??


You could be hitting the slow backfill/recovery issue with mclock_scheduler.
Could you please provide the output of the following commands?

1. ceph versions
2. ceph config get osd.<id> osd_op_queue
3. ceph config show osd.<id> | grep osd_max_backfills
4. ceph config show osd.<id> | grep osd_recovery_max_active
5. ceph config show-with-defaults osd.<id> | grep osd_mclock
where <id> can be any valid OSD id

With the mclock_scheduler enabled and with 17.2.5, it is not possible to 
override
recovery settings like 'osd_max_backfills' and other recovery related config 
options.

To improve the recovery rate, you can temporarily switch the mClock profile to 
'high_recovery_ops'
on all the OSDs by issuing:

ceph config set osd osd_mclock_profile high_recovery_ops

During recovery with this profile, you may notice a dip in the client ops 
performance which is expected.
Once the recovery is done, you can switch the mClock profile back to the 
default 'high_client_ops' profile.

Please note that the upcoming Quincy release will address the slow backfill 
issues along with other
usability improvements.

-Sridhar

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] is the rbd mirror journal replayed on primary after a crash?

2023-10-03 Thread Scheurer François
Hello



Short question regarding journal-based rbd mirroring.


▪ IO path with journaling w/o cache:

a. Create an event to describe the update
b. Asynchronously append event to journal object
c. Asynchronously update image once event is safe
d. Complete IO to client once update is safe


[cf. 
https://events.static.linuxfound.org/sites/events/files/slides/Disaster%20Recovery%20and%20Ceph%20Block%20Storage-%20Introducing%20Multi-Site%20Mirroring_0.pdf]


If a client crashes between b. and c., is there a mechanism to replay the IO 
from the journal on the primary image?

If not, then the primary and secondary images would get out-of-sync (because of 
the extra write(s) on secondary) and subsequent writes to the primary would 
corrupt the secondary. Is that correct?
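
As a side note, in case it helps: the journal state of a mirrored image can be
inspected with the rbd journal subcommands (pool and image names below are
placeholders):

rbd journal info --pool mypool --image myimage      # journal layout and id
rbd journal status --pool mypool --image myimage    # registered clients and positions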



Cheers

Francois Scheurer




--


EveryWare AG
François Scheurer
Senior Systems Engineer
Zurlindenstrasse 52a
CH-8003 Zürich

tel: +41 44 466 60 00
fax: +41 44 466 60 10
mail: francois.scheu...@everyware.ch
web: http://www.everyware.ch


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Impacts on doubling the size of pgs in a rbd pool?

2023-10-03 Thread Hervé Ballans

Hi all,

Sorry for the reminder, but does anyone have any advice on how to deal 
with this?


Many thanks!
Hervé

On 29/09/2023 at 11:34, Hervé Ballans wrote:

Hi all,

I have a Ceph cluster on Quincy (17.2.6), with 3 pools (1 rbd and 1 
CephFS volume), each configured with 3 replicas.


$ sudo ceph osd pool ls detail
pool 7 'cephfs_data_home' replicated size 3 min_size 2 crush_rule 1 
object_hash rjenkins pg_num 512 pgp_num 512 autoscale_mode on 
last_change 6287147 lfor 0/5364613/5364611 flags hashpspool 
stripe_width 0 application cephfs
pool 8 'cephfs_metadata_home' replicated size 3 min_size 2 crush_rule 
3 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on 
last_change 641 lfor 0/641/639 flags hashpspool 
stripe_width 0 application cephfs
pool 9 'rbd_backup_vms' replicated size 3 min_size 2 crush_rule 2 
object_hash rjenkins pg_num 1024 pgp_num 1024 autoscale_mode on 
last_change 6365131 lfor 0/211948/249421 flags 
hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 10 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash 
rjenkins pg_num 1 pgp_num 1 autoscale_mode warn last_change 6365131 
flags hashpspool stripe_width 0 pg_num_min 1 application 
mgr,mgr_devicehealth


$ sudo ceph df
--- RAW STORAGE ---
CLASS SIZE    AVAIL USED  RAW USED  %RAW USED
hdd    306 TiB  186 TiB  119 TiB   119 TiB  39.00
nvme   4.4 TiB  4.3 TiB  118 GiB   118 GiB   2.63
TOTAL  310 TiB  191 TiB  119 TiB   119 TiB  38.49

--- POOLS ---
POOL  ID   PGS  STORED  OBJECTS    USED  %USED MAX AVAIL
cephfs_data_home   7   512  12 TiB   28.86M  12 TiB 12.85 27 TiB
cephfs_metadata_home   8    32  33 GiB    3.63M  33 GiB 0.79 1.3 TiB
rbd_backup_vms 9  1024  24 TiB    6.42M  24 TiB 58.65 5.6 TiB
.mgr  10 1  35 MiB    9  35 MiB 0 12 TiB

I am going to extend the rbd pool (rbd_backup_vms), currently used at 
60%.
This pool contains 60 disks, i.e. 20 disks by rack in the crushmap. 
This pool is used for storing VM disk images (available to a separate 
ProxmoxVE cluster)


For this purpose, I am going to add 42 disks of the same size as those 
currently in the pool, i.e. 14 additional disks on each rack.


Currently, this pool is configured with 1024 pgs.
Before this operation, I would like to extend the number of pgs, let's 
say 2048 (i.e. double).


I wonder about the overall impact of this change on the cluster. I 
guess that the heavy moves in the pgs will have a strong impact 
regarding the iops?


I have two questions:

1) Is it useful to make this modification before adding the new OSDs? 
(I'm afraid of warnings about full or nearfull pgs if not)


2) are there any configuration recommendations in order to minimize 
these anticipated impacts?


Thank you!

Cheers,
Hervé
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Impacts on doubling the size of pgs in a rbd pool?

2023-10-03 Thread Michel Jouvin

Hi Herve,

Why don't you use the automatic adjustment of the number of PGs? It 
makes life much easier and works well.


Cheers,

Michel

On 03/10/2023 at 17:06, Hervé Ballans wrote:

Hi all,

Sorry for the reminder, but does anyone have any advice on how to deal 
with this?


Many thanks!
Hervé

On 29/09/2023 at 11:34, Hervé Ballans wrote:

Hi all,

I have a Ceph cluster on Quincy (17.2.6), with 3 pools (1 rbd and 1 
CephFS volume), each configured with 3 replicas.


$ sudo ceph osd pool ls detail
pool 7 'cephfs_data_home' replicated size 3 min_size 2 crush_rule 1 
object_hash rjenkins pg_num 512 pgp_num 512 autoscale_mode on 
last_change 6287147 lfor 0/5364613/5364611 flags hashpspool 
stripe_width 0 application cephfs
pool 8 'cephfs_metadata_home' replicated size 3 min_size 2 crush_rule 
3 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on 
last_change 641 lfor 0/641/639 flags hashpspool 
stripe_width 0 application cephfs
pool 9 'rbd_backup_vms' replicated size 3 min_size 2 crush_rule 2 
object_hash rjenkins pg_num 1024 pgp_num 1024 autoscale_mode on 
last_change 6365131 lfor 0/211948/249421 flags 
hashpspool,selfmanaged_snaps stripe_width 0 application rbd
pool 10 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash 
rjenkins pg_num 1 pgp_num 1 autoscale_mode warn last_change 6365131 
flags hashpspool stripe_width 0 pg_num_min 1 application 
mgr,mgr_devicehealth


$ sudo ceph df
--- RAW STORAGE ---
CLASS SIZE    AVAIL USED  RAW USED  %RAW USED
hdd    306 TiB  186 TiB  119 TiB   119 TiB  39.00
nvme   4.4 TiB  4.3 TiB  118 GiB   118 GiB   2.63
TOTAL  310 TiB  191 TiB  119 TiB   119 TiB  38.49

--- POOLS ---
POOL  ID   PGS  STORED  OBJECTS    USED  %USED MAX AVAIL
cephfs_data_home   7   512  12 TiB   28.86M  12 TiB 12.85 27 TiB
cephfs_metadata_home   8    32  33 GiB    3.63M  33 GiB 0.79 1.3 TiB
rbd_backup_vms 9  1024  24 TiB    6.42M  24 TiB 58.65 5.6 TiB
.mgr  10 1  35 MiB    9  35 MiB 0 12 TiB

I am going to extend the rbd pool (rbd_backup_vms), currently used at 
60%.
This pool contains 60 disks, i.e. 20 disks by rack in the crushmap. 
This pool is used for storing VM disk images (available to a separate 
ProxmoxVE cluster)


For this purpose, I am going to add 42 disks of the same size as 
those currently in the pool, i.e. 14 additional disks on each rack.


Currently, this pool is configured with 1024 pgs.
Before this operation, I would like to extend the number of pgs, 
let's say 2048 (i.e. double).


I wonder about the overall impact of this change on the cluster. I 
guess that the heavy moves in the pgs will have a strong impact 
regarding the iops?


I have two questions:

1) Is it useful to make this modification before adding the new OSDs? 
(I'm afraid of warnings about full or nearfull pgs if not)


2) are there any configuration recommendations in order to minimize 
these anticipated impacts?


Thank you!

Cheers,
Hervé
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs health warn

2023-10-03 Thread Ben
Yes, I am: 8 active + 2 standby, no subtree pinning. What if I restart the
MDS daemons with the trimming issue? I'm trying to figure out what would happen
on a restart.
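
For context, subtree pinning (as asked about below) is done with an extended
attribute on a directory; the path and rank here are just examples:

setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/some/dir    # pin this subtree to MDS rank 1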

On Tue, 3 Oct 2023 at 12:39, Venky Shankar wrote:

> Hi Ben,
>
> Are you using multimds without subtree pinning?
>
> On Tue, Oct 3, 2023 at 10:00 AM Ben  wrote:
> >
> > Dear cephers:
> > more log captures (see below) show the full segments list (more than 3
> > to be trimmed, stuck and growing over time). Any ideas to get out of this?
> >
> > Thanks,
> > Ben
> >
> >
> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
> > expiring segment 195341004/893374309813, 180 events
> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
> > expired segment 195341184/893386318445, 145 events
> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
> > expired segment 195341329/893386757388, 1024 events
> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
> > expiring segment 195342353/893388361174, 1024 events
> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
> > expiring segment 195343377/893389870480, 790 events
> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
> > expiring segment 195344167/893390955408, 1024 events
> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
> > expiring segment 195345191/893392321470, 1024 events
> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
> > expiring segment 195346215/893393752928, 1024 events
> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
> > expired segment 195347239/893395131457, 2 events
> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
> > expiring segment 195347241/893395212055, 1024 events
> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
> > expiring segment 195348265/893396582755, 1024 events
> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
> > expiring segment 195349289/893398132191, 860 events
> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
> > expiring segment 195350149/893399338619, 42 events
> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
> > expired segment 195350192/893408004655, 33 events
> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
> > expired segment 195350226/893412331017, 23 events
> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
> > expired segment 195350249/893416563419, 20 events
> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
> > expired segment 195350269/893420702085, 244 events
> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
> > expired segment 195350513/893424694857, 74 events
> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
> > expired segment 195350587/893428947395, 843 events
> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
> > expired segment 195351430/893432893900, 1019 events
> > .
> > . (all expired items abbreviated)
> > .
> > debug 2023-09-30T15:10:56.521+ 7fc752167700 5 mds.3.log trim already
> > expired segment 216605661/827226016068, 100 events
> > debug 2023-09-30T15:10:56.521+ 7fc752167700 5 mds.3.log trim already
> > expired segment 216605761/827230263164, 153 events
> > debug 2023-09-30T15:10:56.521+ 7fc752167700 5 mds.3.log trim already
> > expired segment 216605914/827234408294, 35 events
> > debug 2023-09-30T15:10:56.521+ 7fc752167700 5 mds.3.log trim already
> > expired segment 216605949/827238527911, 1024 events
> > debug 2023-09-30T15:10:56.521+ 7fc752167700 5 mds.3.log trim already
> > expired segment 216606973/827241813316, 344 events
> > debug 2023-09-30T15:10:56.521+ 7fc752167700 5 mds.3.log trim already
> > expired segment 216607317/827242580233, 1024 events
> > debug 2023-09-30T15:10:56.521+ 7fc752167700 6 mds.3.journal
> LogSegment(
> > 216608341/827244781542).try_to_expire
> > debug 2023-09-30T15:10:56.521+ 7fc752167700 4 mds.3.sessionmap
> > save_if_dirty: writing 0
> > debug 2023-09-30T15:10:56.521+ 7fc752167700 6 mds.3.journal
> LogSegment(
> > 216608341/827244781542).try_to_expire success
> > debug 2023-09-30T15:10:56.521+ 7fc752167700 5 mds.3.log _expired
> > segment 216608341/827244781542, 717 events
> >
> > > On Wed, 27 Sep 2023 at 23:53, Ben wrote:
> >
> > > some further investigation about three mds with trimming behind
> problem:
> > > logs captured over two days show that, some log segments are stuck in
> > > trimming process. It looks like a bug with trimming log segment? Any
> > > thoughts?
> > > ==log capture
> > >
> > > 9/26:
> > >
> > > debug 2023-09-26T16:50:59.004+ 7fc74d95e700 10 mds.3.log
> > > _trim_expired_segments waiting for 197465903/720757956586 to expire
> > >
> > >
> > > debug 20

[ceph-users] Re: S3 user with more than 1000 buckets

2023-10-03 Thread Thomas Bennett
Thanks for all the responses, much appreciated.

Upping the chunk size fixes my problem in the short term, until I upgrade to
17.2.6 :)

Kind regards,
Tom

On Tue, 3 Oct 2023 at 15:28, Matt Benjamin  wrote:

> Hi Thomas,
>
> If I'm not mistaken, the RGW will paginate ListBuckets essentially like
> ListObjectsv1 if the S3 client provides the appropriate "marker" parameter
> values.  COS does this too, I noticed.  I'm not sure which S3 clients can
> be relied on to do this, though.
>
> Matt
>
> On Tue, Oct 3, 2023 at 9:06 AM Thomas Bennett  wrote:
>
>> Hi Jonas,
>>
>> Thanks :) that solved my issue.
>>
>> It would seem to me that this is heading towards something that the
>> clients
>> s3 should paginate, but I couldn't find any documentation on how to
>> paginate bucket listings. All the information points to paginating object
>> listing - which makes sense.
>>
>> Just for completion of this thread:
>>
>> The rgw parameters are found at: Quincy radosgw config ref
>> 
>>
>> I ran the following command to update the parameter for all running rgw
>> daemons:
>> ceph config set client.rgw rgw_list_buckets_max_chunk 1
>>
>> And then confirmed the running daemons were configured:
>> ceph daemon /var/run/ceph/ceph-client.rgw.xxx.xxx.asok config show | grep
>> rgw_list_buckets_max_chunk
>> "rgw_list_buckets_max_chunk": "1",
>>
>> Kind regards,
>> Tom
>>
>> On Tue, 3 Oct 2023 at 13:30, Jonas Nemeiksis 
>> wrote:
>>
>> > Hi,
>> >
>> > You should increase these default settings:
>> >
>> > rgw_list_buckets_max_chunk // for buckets
>> > rgw_max_listing_results // for objects
>> >
>> > On Tue, Oct 3, 2023 at 12:59 PM Thomas Bennett  wrote:
>> >
>> >> Hi,
>> >>
>> >> I'm running a Ceph 17.2.5 Rados Gateway and I have a user with more
>> than
>> >> 1000 buckets.
>> >>
>> >> When the client tries to list all their buckets using s3cmd, rclone and
>> >> python boto3, they all three only ever return the first 1000 bucket
>> names.
>> >> I can confirm the buckets are all there (and more than 1000) by
>> checking
>> >> with the radosgw-admin command.
>> >>
>> >> Have I missed a pagination limit for listing user buckets in the rados
>> >> gateway?
>> >>
>> >> Thanks,
>> >> Tom
>> >> ___
>> >> ceph-users mailing list -- ceph-users@ceph.io
>> >> To unsubscribe send an email to ceph-users-le...@ceph.io
>> >>
>> >
>> >
>> > --
>> > Jonas
>> >
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
>>
>
>
> --
>
> Matt Benjamin
> Red Hat, Inc.
> 315 West Huron Street, Suite 140A
> Ann Arbor, Michigan 48103
>
> http://www.redhat.com/en/technologies/storage
>
> tel.  734-821-5101
> fax.  734-769-8938
> cel.  734-216-5309
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: rgw: disallowing bucket creation for specific users?

2023-10-03 Thread Matthias Ferdinand
On Sun, Oct 01, 2023 at 12:00:58PM +0200, Peter Goron wrote:
> Hi Matthias,
> 
> One possible way to achieve this is to set a quota on the number of
> buckets at the user level (see
> https://docs.ceph.com/en/reef/radosgw/admin/#quota-management). Quotas are
> under admin control.

thanks a lot, rather an elegant solution.

Matthias

> 
> Rgds,
> Peter
> 
> 
> On Sun, 1 Oct 2023 at 10:51, Matthias Ferdinand wrote:
> 
> > Hi,
> >
> > I am still evaluating ceph rgw for specific use cases.
> >
> > My question is about keeping the realm of bucket names under control of
> > rgw admins.
> >
> > Normal S3 users have the ability to create new buckets as they see fit.
> > This opens opportunities for creating excessive amounts of buckets, or
> > for blocking nice bucket names for other uses, or even using
> > bucketname-typosquatting as an attack vector.
> >
> > In AWS, I can create some IAM users and provide per-bucket access to
> > them via bucket or IAM user policies. These IAM users can't create new
> > buckets on their own. Giving out only those IAM credentials to users and
> > applications, I can ensure no bucket namespace pollution occurs.
> >
> > Ceph rgw does not have IAM users (yet?). What could I use here to not
> > allow certain S3 users to create buckets on their own?
> >
> >
> > Regards
> > Matthias
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> >
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph osd down doesn't seem to work

2023-10-03 Thread Simon Oosthoek

Hi

I'm trying to mark one OSD as down, so we can clean it out and replace 
it. It keeps getting medium read errors, so it's bound to fail sooner 
rather than later. When I command ceph from the mon to mark the osd 
down, it doesn't actually do it. When the service on the osd stops, it 
is also marked out and I'm thinking (but perhaps incorrectly?) that it 
would be good to keep the OSD down+in, to try to read from it as long as 
possible. Why doesn't it get marked down and stay that way when I 
command it?


Context: Our cluster is in a bit of a less optimal state (see below), 
this is after one of OSD nodes had failed and took a week to get back up 
(long story). Due to a seriously unbalanced filling of our OSDs we kept 
having to reweight OSDs to keep below the 85% threshold. Several disks 
are starting to fail now (they're 4+ years old and failures are expected 
to occur more frequently).


I'm open to suggestions to help get us back to health_ok more quickly, 
but I think we'll get there eventually anyway...


Cheers

/Simon



# ceph -s
  cluster:
health: HEALTH_ERR
1 clients failing to respond to cache pressure
1/843763422 objects unfound (0.000%)
noout flag(s) set
14 scrub errors
Possible data damage: 1 pg recovery_unfound, 1 pg inconsistent
Degraded data redundancy: 13795525/7095598195 objects 
degraded (0.194%), 13 pgs degraded, 12 pgs undersized

70 pgs not deep-scrubbed in time
65 pgs not scrubbed in time

  services:
mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 11h)
mgr: cephmon3(active, since 35h), standbys: cephmon1
mds: 1/1 daemons up, 1 standby
osd: 264 osds: 264 up (since 2m), 264 in (since 75m); 227 remapped pgs
 flags noout
rgw: 8 daemons active (4 hosts, 1 zones)

  data:
volumes: 1/1 healthy
pools:   15 pools, 3681 pgs
objects: 843.76M objects, 1.2 PiB
usage:   2.0 PiB used, 847 TiB / 2.8 PiB avail
pgs: 13795525/7095598195 objects degraded (0.194%)
 54839263/7095598195 objects misplaced (0.773%)
 1/843763422 objects unfound (0.000%)
 3374 active+clean
 195  active+remapped+backfill_wait
 65   active+clean+scrubbing+deep
 20   active+remapped+backfilling
 11   active+clean+snaptrim
 10   active+undersized+degraded+remapped+backfill_wait
 2active+undersized+degraded+remapped+backfilling
 2active+clean+scrubbing
 1active+recovery_unfound+degraded
 1active+clean+inconsistent

  progress:
Global Recovery Event (8h)
  [==..] (remaining: 2h)
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Impacts on doubling the size of pgs in a rbd pool?

2023-10-03 Thread David C.
 Hi,

Michel,

the pool already appears to be in automatic autoscale ("autoscale_mode on").
If you're worried (if, for example, the platform is having trouble handling
a large data shift) then you can set the parameter to warn (like the
rjenkis pool).

If not, as Hervé says, the transition to 2048 pg will be smoother if it's
automatic.

To answer your questions:

1/ There's not much point in doing it before adding the OSDs. In any case,
there will be a significant but gradual replacement of the data. Even if
it's unlikely to see nearfull with the data you've notified.

2/ The recommendation would be to leave the default settings (pg autoscale,
osd_max_backfills, recovery, ...). If there really is a concern, then leave
it at 1024 and set autoscale_mode to warn.
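
A minimal sketch of that last suggestion, using the pool name from the earlier
'ceph osd pool ls detail' output:

ceph osd pool set rbd_backup_vms pg_autoscale_mode warn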


On Tue, 3 Oct 2023 at 17:13, Michel Jouvin wrote:

> Hi Herve,
>
> Why don't you use the automatic adjustment of the number of PGs? It
> makes life much easier and works well.
>
> Cheers,
>
> Michel
>
> On 03/10/2023 at 17:06, Hervé Ballans wrote:
> > Hi all,
> >
> > Sorry for the reminder, but does anyone have any advice on how to deal
> > with this?
> >
> > Many thanks!
> > Hervé
> >
> > On 29/09/2023 at 11:34, Hervé Ballans wrote:
> >> Hi all,
> >>
> >> I have a Ceph cluster on Quincy (17.2.6), with 3 pools (1 rbd and 1
> >> CephFS volume), each configured with 3 replicas.
> >>
> >> $ sudo ceph osd pool ls detail
> >> pool 7 'cephfs_data_home' replicated size 3 min_size 2 crush_rule 1
> >> object_hash rjenkins pg_num 512 pgp_num 512 autoscale_mode on
> >> last_change 6287147 lfor 0/5364613/5364611 flags hashpspool
> >> stripe_width 0 application cephfs
> >> pool 8 'cephfs_metadata_home' replicated size 3 min_size 2 crush_rule
> >> 3 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on
> >> last_change 641 lfor 0/641/639 flags hashpspool
> >> stripe_width 0 application cephfs
> >> pool 9 'rbd_backup_vms' replicated size 3 min_size 2 crush_rule 2
> >> object_hash rjenkins pg_num 1024 pgp_num 1024 autoscale_mode on
> >> last_change 6365131 lfor 0/211948/249421 flags
> >> hashpspool,selfmanaged_snaps stripe_width 0 application rbd
> >> pool 10 '.mgr' replicated size 3 min_size 2 crush_rule 0 object_hash
> >> rjenkins pg_num 1 pgp_num 1 autoscale_mode warn last_change 6365131
> >> flags hashpspool stripe_width 0 pg_num_min 1 application
> >> mgr,mgr_devicehealth
> >>
> >> $ sudo ceph df
> >> --- RAW STORAGE ---
> >> CLASS SIZEAVAIL USED  RAW USED  %RAW USED
> >> hdd306 TiB  186 TiB  119 TiB   119 TiB  39.00
> >> nvme   4.4 TiB  4.3 TiB  118 GiB   118 GiB   2.63
> >> TOTAL  310 TiB  191 TiB  119 TiB   119 TiB  38.49
> >>
> >> --- POOLS ---
> >> POOL  ID   PGS  STORED  OBJECTSUSED  %USED MAX AVAIL
> >> cephfs_data_home   7   512  12 TiB   28.86M  12 TiB 12.85 27 TiB
> >> cephfs_metadata_home   832  33 GiB3.63M  33 GiB 0.79 1.3 TiB
> >> rbd_backup_vms 9  1024  24 TiB6.42M  24 TiB 58.65 5.6 TiB
> >> .mgr  10 1  35 MiB9  35 MiB 0 12 TiB
> >>
> >> I am going to extend the rbd pool (rbd_backup_vms), currently used at
> >> 60%.
> >> This pool contains 60 disks, i.e. 20 disks by rack in the crushmap.
> >> This pool is used for storing VM disk images (available to a separate
> >> ProxmoxVE cluster)
> >>
> >> For this purpose, I am going to add 42 disks of the same size as
> >> those currently in the pool, i.e. 14 additional disks on each rack.
> >>
> >> Currently, this pool is configured with 1024 pgs.
> >> Before this operation, I would like to extend the number of pgs,
> >> let's say 2048 (i.e. double).
> >>
> >> I wonder about the overall impact of this change on the cluster. I
> >> guess that the heavy moves in the pgs will have a strong impact
> >> regarding the iops?
> >>
> >> I have two questions:
> >>
> >> 1) Is it useful to make this modification before adding the new OSDs?
> >> (I'm afraid of warnings about full or nearfull pgs if not)
> >>
> >> 2) are there any configuration recommendations in order to minimize
> >> these anticipated impacts?
> >>
> >> Thank you!
> >>
> >> Cheers,
> >> Hervé
> >> ___
> >> ceph-users mailing list -- ceph-users@ceph.io
> >> To unsubscribe send an email to ceph-users-le...@ceph.io
> >
> > ___
> > ceph-users mailing list -- ceph-users@ceph.io
> > To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph osd down doesn't seem to work

2023-10-03 Thread Josh Baergen
Hi Simon,

If the OSD is actually up, using 'ceph osd down` will cause it to flap
but come back immediately. To prevent this, you would want to 'ceph
osd set noup'. However, I don't think this is what you actually want:

> I'm thinking (but perhaps incorrectly?) that it would be good to keep the OSD 
> down+in, to try to read from it as long as possible

In this case, you actually want it up+out ('ceph osd out XXX'), though
if it's replicated then marking it out will switch primaries around so
that it's not actually read from anymore. It doesn't look like you
have that much recovery backfill left, so hopefully you'll be in a
clean state soon, though you'll have to deal with those 'inconsistent'
and 'recovery_unfound' PGs.
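
For clarity, the sequence described above would look something like this (the
OSD id is a placeholder):

ceph osd out 123        # keep the OSD up, but stop placing data on it
# ... wait for backfill to drain it, then stop the daemon and replace the disk ...
ceph osd purge 123 --yes-i-really-mean-it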

Josh

On Tue, Oct 3, 2023 at 10:14 AM Simon Oosthoek  wrote:
>
> Hi
>
> I'm trying to mark one OSD as down, so we can clean it out and replace
> it. It keeps getting medium read errors, so it's bound to fail sooner
> rather than later. When I command ceph from the mon to mark the osd
> down, it doesn't actually do it. When the service on the osd stops, it
> is also marked out and I'm thinking (but perhaps incorrectly?) that it
> would be good to keep the OSD down+in, to try to read from it as long as
> possible. Why doesn't it get marked down and stay that way when I
> command it?
>
> Context: Our cluster is in a bit of a less optimal state (see below),
> this is after one of OSD nodes had failed and took a week to get back up
> (long story). Due to a seriously unbalanced filling of our OSDs we kept
> having to reweight OSDs to keep below the 85% threshold. Several disks
> are starting to fail now (they're 4+ years old and failures are expected
> to occur more frequently).
>
> I'm open to suggestions to help get us back to health_ok more quickly,
> but I think we'll get there eventually anyway...
>
> Cheers
>
> /Simon
>
> 
>
> # ceph -s
>cluster:
>  health: HEALTH_ERR
>  1 clients failing to respond to cache pressure
>  1/843763422 objects unfound (0.000%)
>  noout flag(s) set
>  14 scrub errors
>  Possible data damage: 1 pg recovery_unfound, 1 pg inconsistent
>  Degraded data redundancy: 13795525/7095598195 objects
> degraded (0.194%), 13 pgs degraded, 12 pgs undersized
>  70 pgs not deep-scrubbed in time
>  65 pgs not scrubbed in time
>
>services:
>  mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 11h)
>  mgr: cephmon3(active, since 35h), standbys: cephmon1
>  mds: 1/1 daemons up, 1 standby
>  osd: 264 osds: 264 up (since 2m), 264 in (since 75m); 227 remapped pgs
>   flags noout
>  rgw: 8 daemons active (4 hosts, 1 zones)
>
>data:
>  volumes: 1/1 healthy
>  pools:   15 pools, 3681 pgs
>  objects: 843.76M objects, 1.2 PiB
>  usage:   2.0 PiB used, 847 TiB / 2.8 PiB avail
>  pgs: 13795525/7095598195 objects degraded (0.194%)
>   54839263/7095598195 objects misplaced (0.773%)
>   1/843763422 objects unfound (0.000%)
>   3374 active+clean
>   195  active+remapped+backfill_wait
>   65   active+clean+scrubbing+deep
>   20   active+remapped+backfilling
>   11   active+clean+snaptrim
>   10   active+undersized+degraded+remapped+backfill_wait
>   2active+undersized+degraded+remapped+backfilling
>   2active+clean+scrubbing
>   1active+recovery_unfound+degraded
>   1active+clean+inconsistent
>
>progress:
>  Global Recovery Event (8h)
>[==..] (remaining: 2h)
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph osd down doesn't seem to work

2023-10-03 Thread Anthony D'Atri
And unless you *need* a given ailing OSD to be up because it's the only copy of 
data, you may get better recovery/backfill results by stopping the service for 
that OSD entirely, so that the recovery reads all go to healthier OSDs.
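
For example (the exact unit name depends on how the OSD was deployed; on a
cephadm cluster you would use 'ceph orch daemon stop osd.123' instead):

systemctl stop ceph-osd@123     # package-based / non-cephadm deployment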

> On Oct 3, 2023, at 12:21, Josh Baergen  wrote:
> 
> Hi Simon,
> 
> If the OSD is actually up, using 'ceph osd down` will cause it to flap
> but come back immediately. To prevent this, you would want to 'ceph
> osd set noup'. However, I don't think this is what you actually want:
> 
>> I'm thinking (but perhaps incorrectly?) that it would be good to keep the 
>> OSD down+in, to try to read from it as long as possible
> 
> In this case, you actually want it up+out ('ceph osd out XXX'), though
> if it's replicated then marking it out will switch primaries around so
> that it's not actually read from anymore. It doesn't look like you
> have that much recovery backfill left, so hopefully you'll be in a
> clean state soon, though you'll have to deal with those 'inconsistent'
> and 'recovery_unfound' PGs.
> 
> Josh
> 
> On Tue, Oct 3, 2023 at 10:14 AM Simon Oosthoek  
> wrote:
>> 
>> Hi
>> 
>> I'm trying to mark one OSD as down, so we can clean it out and replace
>> it. It keeps getting medium read errors, so it's bound to fail sooner
>> rather than later. When I command ceph from the mon to mark the osd
>> down, it doesn't actually do it. When the service on the osd stops, it
>> is also marked out and I'm thinking (but perhaps incorrectly?) that it
>> would be good to keep the OSD down+in, to try to read from it as long as
>> possible. Why doesn't it get marked down and stay that way when I
>> command it?
>> 
>> Context: Our cluster is in a bit of a less optimal state (see below),
>> this is after one of OSD nodes had failed and took a week to get back up
>> (long story). Due to a seriously unbalanced filling of our OSDs we kept
>> having to reweight OSDs to keep below the 85% threshold. Several disks
>> are starting to fail now (they're 4+ years old and failures are expected
>> to occur more frequently).
>> 
>> I'm open to suggestions to help get us back to health_ok more quickly,
>> but I think we'll get there eventually anyway...
>> 
>> Cheers
>> 
>> /Simon
>> 
>> 
>> 
>> # ceph -s
>>   cluster:
>> health: HEALTH_ERR
>> 1 clients failing to respond to cache pressure
>> 1/843763422 objects unfound (0.000%)
>> noout flag(s) set
>> 14 scrub errors
>> Possible data damage: 1 pg recovery_unfound, 1 pg inconsistent
>> Degraded data redundancy: 13795525/7095598195 objects
>> degraded (0.194%), 13 pgs degraded, 12 pgs undersized
>> 70 pgs not deep-scrubbed in time
>> 65 pgs not scrubbed in time
>> 
>>   services:
>> mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 11h)
>> mgr: cephmon3(active, since 35h), standbys: cephmon1
>> mds: 1/1 daemons up, 1 standby
>> osd: 264 osds: 264 up (since 2m), 264 in (since 75m); 227 remapped pgs
>>  flags noout
>> rgw: 8 daemons active (4 hosts, 1 zones)
>> 
>>   data:
>> volumes: 1/1 healthy
>> pools:   15 pools, 3681 pgs
>> objects: 843.76M objects, 1.2 PiB
>> usage:   2.0 PiB used, 847 TiB / 2.8 PiB avail
>> pgs: 13795525/7095598195 objects degraded (0.194%)
>>  54839263/7095598195 objects misplaced (0.773%)
>>  1/843763422 objects unfound (0.000%)
>>  3374 active+clean
>>  195  active+remapped+backfill_wait
>>  65   active+clean+scrubbing+deep
>>  20   active+remapped+backfilling
>>  11   active+clean+snaptrim
>>  10   active+undersized+degraded+remapped+backfill_wait
>>  2active+undersized+degraded+remapped+backfilling
>>  2active+clean+scrubbing
>>  1active+recovery_unfound+degraded
>>  1active+clean+inconsistent
>> 
>>   progress:
>> Global Recovery Event (8h)
>>   [==..] (remaining: 2h)
>> ___
>> ceph-users mailing list -- ceph-users@ceph.io
>> To unsubscribe send an email to ceph-users-le...@ceph.io
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: ceph osd down doesn't seem to work

2023-10-03 Thread Simon Oosthoek

Hoi Josh,

thanks for the explanation, I want to mark it out, not down :-)

Most use of our cluster is in EC 8+3 or 5+4 pools, so one missing osd 
isn't bad, but if some of the blocks can still be read it may help to 
move them to safety. (This is how I imagine things anyway ;-)


I'll have to look into the manually correcting of those inconsistent PGs 
if they don't recover by ceph-magic alone...


Cheers

/Simon

On 03/10/2023 18:21, Josh Baergen wrote:

Hi Simon,

If the OSD is actually up, using 'ceph osd down` will cause it to flap
but come back immediately. To prevent this, you would want to 'ceph
osd set noup'. However, I don't think this is what you actually want:


I'm thinking (but perhaps incorrectly?) that it would be good to keep the OSD 
down+in, to try to read from it as long as possible


In this case, you actually want it up+out ('ceph osd out XXX'), though
if it's replicated then marking it out will switch primaries around so
that it's not actually read from anymore. It doesn't look like you
have that much recovery backfill left, so hopefully you'll be in a
clean state soon, though you'll have to deal with those 'inconsistent'
and 'recovery_unfound' PGs.

Josh

On Tue, Oct 3, 2023 at 10:14 AM Simon Oosthoek  wrote:


Hi

I'm trying to mark one OSD as down, so we can clean it out and replace
it. It keeps getting medium read errors, so it's bound to fail sooner
rather than later. When I command ceph from the mon to mark the osd
down, it doesn't actually do it. When the service on the osd stops, it
is also marked out and I'm thinking (but perhaps incorrectly?) that it
would be good to keep the OSD down+in, to try to read from it as long as
possible. Why doesn't it get marked down and stay that way when I
command it?

Context: Our cluster is in a bit of a less optimal state (see below),
this is after one of OSD nodes had failed and took a week to get back up
(long story). Due to a seriously unbalanced filling of our OSDs we kept
having to reweight OSDs to keep below the 85% threshold. Several disks
are starting to fail now (they're 4+ years old and failures are expected
to occur more frequently).

I'm open to suggestions to help get us back to health_ok more quickly,
but I think we'll get there eventually anyway...

Cheers

/Simon


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] VM hangs when overwriting a file on erasure coded RBD

2023-10-03 Thread Peter Linder

Dear all,

I have a problem: after an OSD host lost connection to the 
sync/cluster rear network for many hours (the public network was 
online), a test VM using RBD can't overwrite its files. I can create a 
new file inside it just fine, but not overwrite one; the process just hangs.


The VM's disk is on an erasure-coded data pool with a replicated pool in 
front of it. EC overwrites are enabled for the pool.


The cluster consists of 5 hosts with 4 OSDs on each, and separate hosts 
for compute. There are separate public and cluster networks. 
In this case, the AOC cable to the cluster network went link-down on a 
host; it had to be replaced and the host was rebooted. Recovery took 
about a week to complete. The host was half-down for about 12 hours like 
this.


I have some other VMs as well with images in the same pool (4 in total), 
and they seem to work fine; it is just this one that can't overwrite.


I'm thinking there is somehow something wrong with just this image?
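
In case it is useful, a few things I can check on this image (pool/image names
are placeholders for mine):

rbd status mypool/myimage    # watchers on the image
rbd info mypool/myimage      # features and the data pool in use
ceph health detail           # any slow or blocked ops reported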

Regards,

Peter
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD

2023-10-03 Thread Patrick Bégou

Hi all,

still stuck with this problem.

I've deployed Octopus and all my HDDs have been set up as OSDs. Fine.
I've upgraded to Pacific and 2 OSDs have failed. They have been 
automatically removed and the upgrade finished. Cluster health is finally OK, 
no data loss.


But now I cannot re-add these OSDs with Pacific (I had trouble with
these old HDDs before: I lost one OSD in Octopus and was able to reset
and re-add it).


I've tried to add the first OSD manually on the node where it is
located, following
https://docs.ceph.com/en/pacific/rados/operations/bluestore-migration/
(not sure it's the best idea...), but it fails too. This node was the
one used for deploying the cluster.


[ceph: root@mostha1 /]# *ceph-volume lvm zap /dev/sdc*
--> Zapping: /dev/sdc
--> --destroy was not specified, but zapping a whole device will remove 
the partition table
Running command: /usr/bin/dd if=/dev/zero of=/dev/sdc bs=1M count=10 
conv=fsync

 stderr: 10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.663425 s, 15.8 MB/s
--> Zapping successful for: 


[ceph: root@mostha1 /]# *ceph-volume lvm create --bluestore --data 
/dev/sdc*

Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name 
client.bootstrap-osd --keyring /var/lib/ceph/bootstrap-osd/ceph.keyring 
-i - osd new 9f1eb8ee-41e6-4350-ad73-1be21234ec7c
 stderr: 2023-10-02T16:09:29.855+ 7fb4eb8c0700 -1 auth: unable to 
find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such 
file or directory
 stderr: 2023-10-02T16:09:29.855+ 7fb4eb8c0700 -1 
AuthRegistry(0x7fb4e405c4d8) no keyring found at 
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2023-10-02T16:09:29.856+ 7fb4eb8c0700 -1 auth: unable to 
find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such 
file or directory
 stderr: 2023-10-02T16:09:29.856+ 7fb4eb8c0700 -1 
AuthRegistry(0x7fb4e40601d0) no keyring found at 
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2023-10-02T16:09:29.857+ 7fb4eb8c0700 -1 auth: unable to 
find a keyring on /var/lib/ceph/bootstrap-osd/ceph.keyring: (2) No such 
file or directory
 stderr: 2023-10-02T16:09:29.857+ 7fb4eb8c0700 -1 
AuthRegistry(0x7fb4eb8bee90) no keyring found at 
/var/lib/ceph/bootstrap-osd/ceph.keyring, disabling cephx
 stderr: 2023-10-02T16:09:29.858+ 7fb4e965c700 -1 
monclient(hunting): handle_auth_bad_method server allowed_methods [2] 
but i only support [1]
 stderr: 2023-10-02T16:09:29.858+ 7fb4e9e5d700 -1 
monclient(hunting): handle_auth_bad_method server allowed_methods [2] 
but i only support [1]
 stderr: 2023-10-02T16:09:29.858+ 7fb4e8e5b700 -1 
monclient(hunting): handle_auth_bad_method server allowed_methods [2] 
but i only support [1]
 stderr: 2023-10-02T16:09:29.858+ 7fb4eb8c0700 -1 monclient: 
authenticate NOTE: no keyring found; disabled cephx authentication
 stderr: [errno 13] RADOS permission denied (error connecting to the 
cluster)

-->  RuntimeError: Unable to create a new OSD id
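
Looking at the stderr above, ceph-volume cannot find
/var/lib/ceph/bootstrap-osd/ceph.keyring inside the shell. I guess (just a
guess on my side) that recreating it first would let the create proceed,
along these lines:

mkdir -p /var/lib/ceph/bootstrap-osd
ceph auth get client.bootstrap-osd -o /var/lib/ceph/bootstrap-osd/ceph.keyring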

Any idea what is wrong?

Thanks

Patrick
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ceph luminous client connect to ceph reef always permission denied

2023-10-03 Thread Pureewat Kaewpoi
Hi All!

We have a newly installed cluster running Ceph Reef, but our old clients
are still using Ceph Luminous.
The problem is that any command against the cluster hangs and produces no
output.

This is the output of the command ceph osd pool ls --debug-ms 1:
2023-10-02 23:35:22.727089 7fc93807c700  1  Processor -- start
2023-10-02 23:35:22.729256 7fc93807c700  1 -- - start start
2023-10-02 23:35:22.729790 7fc93807c700  1 -- - --> MON-1:6789/0 -- auth(proto 
0 34 bytes epoch 0) v1 -- 0x7fc930174cb0 con 0
2023-10-02 23:35:22.730724 7fc935e72700  1 -- CLIENT:0/187462963 learned_addr 
learned my addr CLIENT:0/187462963
2023-10-02 23:35:22.732091 7fc927fff700  1 -- CLIENT:0/187462963 <== mon.0 
MON-1:6789/0 1  auth_reply(proto 2 0 (0) Success) v1  33+0+0 
(2762451217 0 0) 0x7fc920002310 con 0x7fc93017d0f0
2023-10-02 23:35:22.732228 7fc927fff700  1 -- CLIENT:0/187462963 --> 
MON-1:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- 0x7fc914000fc0 con 0
2023-10-02 23:35:22.733237 7fc927fff700  1 -- CLIENT:0/187462963 <== mon.0 
MON-1:6789/0 2  auth_reply(proto 2 0 (0) Success) v1  206+0+0 
(3693167043 0 0) 0x7fc920002830 con 0x7fc93017d0f0
2023-10-02 23:35:22.733428 7fc927fff700  1 -- CLIENT:0/187462963 --> 
MON-1:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- 0x7fc914002e10 con 0
2023-10-02 23:35:22.733451 7fc927fff700  1 -- CLIENT:0/187462963 <== mon.0 
MON-1:6789/0 3  mon_map magic: 0 v1  532+0+0 (3038142027 0 0) 
0x7fc92e50 con 0x7fc93017d0f0
2023-10-02 23:35:22.734365 7fc927fff700  1 -- CLIENT:0/187462963 <== mon.0 
MON-1:6789/0 4  auth_reply(proto 2 0 (0) Success) v1  580+0+0 
(3147563293 0 0) 0x7fc920001640 con 0x7fc93017d0f0
2023-10-02 23:35:22.734597 7fc927fff700  1 -- CLIENT:0/187462963 --> 
MON-1:6789/0 -- mon_subscribe({monmap=0+}) v2 -- 0x7fc9301755e0 con 0
2023-10-02 23:35:22.734678 7fc93807c700  1 -- CLIENT:0/187462963 --> 
MON-1:6789/0 -- mon_subscribe({mgrmap=0+}) v2 -- 0x7fc930180750 con 0
2023-10-02 23:35:22.734805 7fc93807c700  1 -- CLIENT:0/187462963 --> 
MON-1:6789/0 -- mon_subscribe({osdmap=0}) v2 -- 0x7fc930180f00 con 0
2023-10-02 23:35:22.734891 7fc935e72700  1 -- CLIENT:0/187462963 >> 
MON-1:6789/0 conn(0x7fc93017d0f0 :-1 s=STATE_OPEN pgs=754 cs=1 l=1).read_bulk 
peer close file descriptor 13
2023-10-02 23:35:22.734917 7fc935e72700  1 -- CLIENT:0/187462963 >> 
MON-1:6789/0 conn(0x7fc93017d0f0 :-1 s=STATE_OPEN pgs=754 cs=1 l=1).read_until 
read failed
2023-10-02 23:35:22.734922 7fc935e72700  1 -- CLIENT:0/187462963 >> 
MON-1:6789/0 conn(0x7fc93017d0f0 :-1 s=STATE_OPEN pgs=754 cs=1 l=1).process 
read tag failed
2023-10-02 23:35:22.734926 7fc935e72700  1 -- CLIENT:0/187462963 >> 
MON-1:6789/0 conn(0x7fc93017d0f0 :-1 s=STATE_OPEN pgs=754 cs=1 l=1).fault on 
lossy channel, failing
2023-10-02 23:35:22.734966 7fc927fff700  1 -- CLIENT:0/187462963 >> 
MON-1:6789/0 conn(0x7fc93017d0f0 :-1 s=STATE_CLOSED pgs=754 cs=1 l=1).mark_down
2023-10-02 23:35:22.735062 7fc927fff700  1 -- CLIENT:0/187462963 --> 
MON-2:6789/0 -- auth(proto 0 34 bytes epoch 3) v1 -- 0x7fc914005580 con 0
2023-10-02 23:35:22.735077 7fc927fff700  1 -- CLIENT:0/187462963 --> 
MON-3:6789/0 -- auth(proto 0 34 bytes epoch 3) v1 -- 0x7fc914005910 con 0
2023-10-02 23:35:22.737246 7fc927fff700  1 -- CLIENT:0/187462963 <== mon.2 
MON-3:6789/0 1  auth_reply(proto 2 0 (0) Success) v1  33+0+0 
(2138308960 0 0) 0x7fc920002fd0 con 0x7fc91400b0c0
2023-10-02 23:35:22.737443 7fc927fff700  1 -- CLIENT:0/187462963 --> 
MON-3:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- 0x7fc914014f10 con 0
2023-10-02 23:35:22.737765 7fc927fff700  1 -- CLIENT:0/187462963 <== mon.1 
MON-2:6789/0 1  auth_reply(proto 2 0 (0) Success) v1  33+0+0 
(3855879565 0 0) 0x7fc928002390 con 0x7fc91400f730
2023-10-02 23:35:22.737799 7fc927fff700  1 -- CLIENT:0/187462963 --> 
MON-2:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- 0x7fc914015850 con 0
2023-10-02 23:35:22.737966 7fc927fff700  1 -- CLIENT:0/187462963 <== mon.2 
MON-3:6789/0 2  auth_reply(proto 2 -13 (13) Permission denied) v1  
24+0+0 (2583972696 0 0) 0x7fc920003240 con 0x7fc91400b0c0
2023-10-02 23:35:22.737981 7fc927fff700  1 -- CLIENT:0/187462963 >> 
MON-3:6789/0 conn(0x7fc91400b0c0 :-1 s=STATE_OPEN pgs=464 cs=1 l=1).mark_down
2023-10-02 23:35:22.738096 7fc927fff700  1 -- CLIENT:0/187462963 <== mon.1 
MON-2:6789/0 2  auth_reply(proto 2 -13 (13) Permission denied) v1  
24+0+0 (2583972696 0 0) 0x7fc928002650 con 0x7fc91400f730
2023-10-02 23:35:22.738110 7fc927fff700  1 -- CLIENT:0/187462963 >> 
MON-2:6789/0 conn(0x7fc91400f730 :-1 s=STATE_OPEN pgs=344 cs=1 l=1).mark_down

By the way, using the same keyring with a Ceph Nautilus client works well
without any problem.

What should I do next? Where should I look to debug or fix this issue?
Thanks
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] ingress of haproxy is down after I specify the haproxy.cfg in quincy

2023-10-03 Thread wjsherry075
Hello,
I have a haproxy problem with Ceph Quincy 17.2.6 on Ubuntu 22.04.
The haproxy container won't come up after I specify my own haproxy.cfg, and
there is no error in the logs.
I set the haproxy.cfg with: ceph config-key set
mgr/cephadm/services/ingress/haproxy.cfg -i haproxy.cfg
If I remove my haproxy.cfg and let cephadm generate one automatically, it
works. I also tried creating a file identical to the one cephadm generated,
but haproxy was still down.
NAME                           HOST    PORTS      STATUS  REFRESHED  AGE  MEM USE  MEM LIM  VERSION  IMAGE ID  CONTAINER ID
haproxy.rgw.foo.test01.pfiiix  test01  *:80,9101  error   2m ago     6h   -        -
haproxy.rgw.foo.test02.fsnhnb  test02  *:80,9101  error   2m ago     6h   -        -
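
One more thing I am not sure about: does the ingress service also need an
explicit redeploy for a changed template to be picked up? I.e. something
like the following, where the service name is just a guess based on my spec:

ceph orch redeploy ingress.rgw.foo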

Any advice on what to do?

Thanks,
Jie
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw-admin sync error trim seems to do nothing

2023-10-03 Thread Matthew Darwin

Hello all,

Any solution to this? I want to trim the error log to get rid of the 
warnings:


Large omap object found. Object: 13:a24ff46e:::sync.error-log.2:head 
PG: 13.762ff245 (13.5) Key count: 236174 Size (bytes): 58472797


This seems to be a similar report: https://tracker.ceph.com/issues/62845


On 2023-08-22 08:00, Matthew Darwin wrote:

Thanks Rich,

On Quincy it seems that providing an end-date is an error. Any other
ideas from anyone?


$ radosgw-admin sync error trim --end-date="2023-08-20 23:00:00"
end-date not allowed.

On 2023-08-20 19:00, Richard Bade wrote:

Hi Matthew,
At least for Nautilus (14.2.22), I have discovered through trial and
error that you need to specify a beginning or end date. Something like
this:
radosgw-admin sync error trim --end-date="2023-08-20 23:00:00"
--rgw-zone={your_zone_name}

I specify the zone as there's an error list for each zone.
Hopefully that helps.

Rich

--

Date: Sat, 19 Aug 2023 12:48:55 -0400
From: Matthew Darwin 
Subject: [ceph-users] radosgw-admin sync error trim seems to do
   nothing
To: Ceph Users 
Message-ID: <95e7edfd-ca29-fc0e-a30a-987f1c43e...@mdarwin.ca>
Content-Type: text/plain; charset=UTF-8; format=flowed

Hello all,

"radosgw-admin sync error list" returns errors from 2022.  I want to
clear those out.

I tried "radosgw-admin sync error trim" but it seems to do nothing.
The man page seems to offer no suggestions
https://protect-au.mimecast.com/s/26o0CzvkGRhLoOXfXjZR3?domain=docs.ceph.com 



Any ideas what I need to do to remove old errors? (or at least I want
to see more recent errors)

ceph version 17.2.6 (quincy)

Thanks.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io

___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: cephfs health warn

2023-10-03 Thread Venky Shankar
Hi Ben,

On Tue, Oct 3, 2023 at 8:56 PM Ben  wrote:
>
> Yes, I am. 8 active + 2 standby, no subtree pinning. What if I restart the 
> mds with trimming issues? Trying to figure out what happens with restarting.

We have come across instances in the past where multimds without
subtree pinning can lead to accumulation of log segments which then
leads to trim warnings. This happens due to the default mds balancer
misbehaving. We have a change that's pending merge (and backport)
which switches off the default balancer for this very reason.

https://github.com/ceph/ceph/pull/52196

Suggest using single active mds or multimds with subtree pinning.
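
For example, pinning a few top-level directories to fixed ranks, or dropping
back to a single active mds, would look roughly like this (the mount path,
directory names and fs name below are placeholders):

setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/project_a   # pin this subtree to rank 0
setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/project_b   # pin this subtree to rank 1
ceph fs set cephfs max_mds 1                          # or: go back to a single active mds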

>
> Venky Shankar  wrote on Tue, Oct 3, 2023 at 12:39:
>>
>> Hi Ben,
>>
>> Are you using multimds without subtree pinning?
>>
>> On Tue, Oct 3, 2023 at 10:00 AM Ben  wrote:
>> >
>> > Dear cephers:
>> > more log captures(see below) show the full segments list(more than 3 to
>> > be trimmed stuck, growing over time). any ideas to get out of this?
>> >
>> > Thanks,
>> > Ben
>> >
>> >
>> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
>> > expiring segment 195341004/893374309813, 180 events
>> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
>> > expired segment 195341184/893386318445, 145 events
>> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
>> > expired segment 195341329/893386757388, 1024 events
>> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
>> > expiring segment 195342353/893388361174, 1024 events
>> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
>> > expiring segment 195343377/893389870480, 790 events
>> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
>> > expiring segment 195344167/893390955408, 1024 events
>> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
>> > expiring segment 195345191/893392321470, 1024 events
>> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
>> > expiring segment 195346215/893393752928, 1024 events
>> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
>> > expired segment 195347239/893395131457, 2 events
>> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
>> > expiring segment 195347241/893395212055, 1024 events
>> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
>> > expiring segment 195348265/893396582755, 1024 events
>> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
>> > expiring segment 195349289/893398132191, 860 events
>> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
>> > expiring segment 195350149/893399338619, 42 events
>> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
>> > expired segment 195350192/893408004655, 33 events
>> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
>> > expired segment 195350226/893412331017, 23 events
>> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
>> > expired segment 195350249/893416563419, 20 events
>> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
>> > expired segment 195350269/893420702085, 244 events
>> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
>> > expired segment 195350513/893424694857, 74 events
>> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
>> > expired segment 195350587/893428947395, 843 events
>> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim already
>> > expired segment 195351430/893432893900, 1019 events
>> > .
>> > . (all expired items abbreviated)
>> > .
>> > debug 2023-09-30T15:10:56.521+ 7fc752167700 5 mds.3.log trim already
>> > expired segment 216605661/827226016068, 100 events
>> > debug 2023-09-30T15:10:56.521+ 7fc752167700 5 mds.3.log trim already
>> > expired segment 216605761/827230263164, 153 events
>> > debug 2023-09-30T15:10:56.521+ 7fc752167700 5 mds.3.log trim already
>> > expired segment 216605914/827234408294, 35 events
>> > debug 2023-09-30T15:10:56.521+ 7fc752167700 5 mds.3.log trim already
>> > expired segment 216605949/827238527911, 1024 events
>> > debug 2023-09-30T15:10:56.521+ 7fc752167700 5 mds.3.log trim already
>> > expired segment 216606973/827241813316, 344 events
>> > debug 2023-09-30T15:10:56.521+ 7fc752167700 5 mds.3.log trim already
>> > expired segment 216607317/827242580233, 1024 events
>> > debug 2023-09-30T15:10:56.521+ 7fc752167700 6 mds.3.journal LogSegment(
>> > 216608341/827244781542).try_to_expire
>> > debug 2023-09-30T15:10:56.521+ 7fc752167700 4 mds.3.sessionmap
>> > save_if_dirty: writing 0
>> > debug 2023-09-30T15:10:56.521+ 7fc752167700 6 mds.3.journal LogSegment(
>> > 216608341/827244781542).try_to_expire success
>> > debug 2023-09-30T15:10:56.521+ 7fc752167700 5 mds

[ceph-users] Re: cephfs health warn

2023-10-03 Thread Ben
Hi Venky,

Thanks for the help on this. We will change to multimds with subtree pinning.

For the moment, though, the segment list items need to go through the loop of
expiring -> expired -> trimmed. What we observe is that each problematic mds
has a few expiring segments stuck on the way to being trimmed, and the
segment list continually grows over time. Any ideas for getting the segment
list processed normally again?

The issue has been around for weeks and we haven't seen complaints from the
storage client side so far.

Best wishes,
Ben

Venky Shankar  wrote on Wed, Oct 4, 2023 at 13:31:

> Hi Ben,
>
> On Tue, Oct 3, 2023 at 8:56 PM Ben  wrote:
> >
> > Yes, I am. 8 active + 2 standby, no subtree pinning. What if I restart
> the mds with trimming issues? Trying to figure out what happens with
> restarting.
>
> We have come across instances in the past where multimds without
> subtree pinning can lead to accumulation of log segments which then
> leads to trim warnings. This happens due to the default mds balancer
> misbehaving. We have a change that's pending merge (and backport)
> which switches off the default balancer for this very reason.
>
> https://github.com/ceph/ceph/pull/52196
>
> Suggest using single active mds or multimds with subtree pinning.
>
> >
> > Venky Shankar  wrote on Tue, Oct 3, 2023 at 12:39:
> >>
> >> Hi Ben,
> >>
> >> Are you using multimds without subtree pinning?
> >>
> >> On Tue, Oct 3, 2023 at 10:00 AM Ben  wrote:
> >> >
> >> > Dear cephers:
> >> > more log captures(see below) show the full segments list(more than
> 3 to
> >> > be trimmed stuck, growing over time). any ideas to get out of this?
> >> >
> >> > Thanks,
> >> > Ben
> >> >
> >> >
> >> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim
> already
> >> > expiring segment 195341004/893374309813, 180 events
> >> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim
> already
> >> > expired segment 195341184/893386318445, 145 events
> >> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim
> already
> >> > expired segment 195341329/893386757388, 1024 events
> >> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim
> already
> >> > expiring segment 195342353/893388361174, 1024 events
> >> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim
> already
> >> > expiring segment 195343377/893389870480, 790 events
> >> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim
> already
> >> > expiring segment 195344167/893390955408, 1024 events
> >> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim
> already
> >> > expiring segment 195345191/893392321470, 1024 events
> >> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim
> already
> >> > expiring segment 195346215/893393752928, 1024 events
> >> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim
> already
> >> > expired segment 195347239/893395131457, 2 events
> >> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim
> already
> >> > expiring segment 195347241/893395212055, 1024 events
> >> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim
> already
> >> > expiring segment 195348265/893396582755, 1024 events
> >> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim
> already
> >> > expiring segment 195349289/893398132191, 860 events
> >> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim
> already
> >> > expiring segment 195350149/893399338619, 42 events
> >> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim
> already
> >> > expired segment 195350192/893408004655, 33 events
> >> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim
> already
> >> > expired segment 195350226/893412331017, 23 events
> >> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim
> already
> >> > expired segment 195350249/893416563419, 20 events
> >> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim
> already
> >> > expired segment 195350269/893420702085, 244 events
> >> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim
> already
> >> > expired segment 195350513/893424694857, 74 events
> >> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim
> already
> >> > expired segment 195350587/893428947395, 843 events
> >> > debug 2023-09-30T14:34:14.557+ 7f9c29bb1700 5 mds.4.log trim
> already
> >> > expired segment 195351430/893432893900, 1019 events
> >> > .
> >> > . (all expired items abbreviated)
> >> > .
> >> > debug 2023-09-30T15:10:56.521+ 7fc752167700 5 mds.3.log trim
> already
> >> > expired segment 216605661/827226016068, 100 events
> >> > debug 2023-09-30T15:10:56.521+ 7fc752167700 5 mds.3.log trim
> already
> >> > expired segment 216605761/827230263164, 153 events
> >> > debug 2023-09-30T15:10:56.521+ 7fc752167700 5 mds.3.log trim
> already
> >> > expired segment 216605914/827234408294, 35 events
> >> > debug 2023-09-30T15:10:56.521+ 7fc752167700

[ceph-users] Re: Slow recovery and inaccurate recovery figures since Quincy upgrade

2023-10-03 Thread Sridhar Seshasayee
To help complete the recovery, you can temporarily try disabling scrub and
deep scrub
operations by running:

ceph osd set noscrub
ceph osd set nodeep-scrub

This should help speed up the recovery process. Once the recovery is done,
you can unset the above scrub flags and revert the mClock profile back to
'high_client_ops'.
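
For reference, the revert step afterwards would be:

ceph osd unset noscrub
ceph osd unset nodeep-scrub
ceph config set osd osd_mclock_profile high_client_ops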

If the above doesn't help, then there's something else that's causing the
slow recovery.
-Sridhar
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io