[ceph-users] Re: backfilling kills rbd performance

2022-11-20 Thread Frank Schilder
Hi Martin,

did you change from rep=2+min_size=1 to rep=3+min_size=2 in one go? I'm 
wondering if the missing extra shard could cause PGs to go read-only 
occasionally. Maybe keep min_size=1 until all PGs have 3 shards and then set 
min_size=2.
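
A possible command sequence for that staged change (the pool name "rbd" below 
is just a placeholder, use yours):

ceph osd pool set rbd size 3       # start creating the third copy
ceph osd pool set rbd min_size 1   # keep I/O possible while some PGs still have only 2 copies
# wait until backfill has finished and all PGs are active+clean, then:
ceph osd pool set rbd min_size 2   # restore the safer minimum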

You can set recovery_sleep to a non-zero value. It is zero by default, which 
means recovery can take over all IO. We set it to a small number between 0.0025 
and 0.05, depending on drive performance. The way we tuned it was to have a 
massive backfill operation going on, and then:

- set osd_recovery_sleep to 0 and take note of the average recovery throughput
- increase osd_recovery_sleep until ca. 30-50% of the IO capacity is used by 
recovery

Then, the remaining IO capacity is guaranteed to be available for clients. This 
works really well in our set-up.
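
Roughly, the commands for that tuning loop look like this (the 0.0025 below is 
just an example starting value; there are also the per-device-class variants 
osd_recovery_sleep_hdd/_ssd/_hybrid):

ceph tell 'osd.*' injectargs '--osd_recovery_sleep 0'       # baseline: note recovery MB/s in ceph -s
ceph tell 'osd.*' injectargs '--osd_recovery_sleep 0.0025'  # raise in small steps, re-check ceph -s
ceph config set osd osd_recovery_sleep 0.0025               # persist the value that leaves enough IO for clients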

Something specific to Quincy is the use of the mClock scheduler. You can try 
setting it back to wpq, or look at the mClock profiles that favour client IO 
(e.g. high_client_ops).

Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14


From: martin.kon...@konsec.com  on behalf of Konold, 
Martin 
Sent: 19 November 2022 18:06:54
To: ceph-users@ceph.io
Subject: [ceph-users] Re: backfilling kills rbd performance

Hi,

On 2022-11-19 17:32, Anthony D'Atri wrote:
> I’m not positive that the options work with hyphens in them.  Try
>
> ceph tell osd.* injectargs '--osd_max_backfills 1
> --osd_recovery_max_active 1 --osd_recovery_max_single_start 1
> --osd_recovery_op_priority=1'

Did so.

> With Quincy the following should already be set, but to be sure:
>
> ceph tell osd.* config set osd_op_queue_cut_off high

Did so too, and even restarted all OSDs as recommended.

I then stopped a single osd in order to cause some backfilling.

> What is network saturation like on that 1GE replication network?

Typically 100% saturated.

> Operations like yours that cause massive data movement could easily
> saturate a pipe that narrow.

Sure, but I am used to other setups where recovery can be slowed
down in order to keep the RBDs operating.

To me it looks like all backfilling happens in parallel, without any
pauses in between that would benefit the client traffic.

I would expect some of those PGs to be in
active+undersized+degraded+remapped+backfill_wait state instead of
backfilling.

2022-11-19T16:58:50.139390+ mgr.pve-02 (mgr.18134134) 61735 :
cluster [DBG] pgmap v60978: 576 pgs: 102
active+undersized+degraded+remapped+backfilling, 474 active+clean; 2.4
TiB data, 4.3 TiB used, 10 TiB / 15 TiB avail; 150 KiB/s wr, 10 op/s;
123337/1272524 objects degraded (9.692%); 228 MiB/s, 58 objects/s
recovering

Is this Quincy specific?

Regards
--martin
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Scheduled RBD volume snapshots without mirrioring (-schedule)

2022-11-20 Thread Ilya Dryomov
On Fri, Nov 18, 2022 at 3:46 PM Tobias Bossert  wrote:
>
> Dear List
>
> I'm searching for a way to automate the snapshot creation/cleanup of RBD 
> volumes. Ideally, there would be something like the "Snapshot Scheduler for 
> cephfs"[1] but I understand
> this is not as "easy" with RBD devices, since Ceph has no idea of the 
> filesystem sitting on top.
>
> So what I basically need is:
>
> - Create a snapshot of RBD image X, every N hours
> - Cleanup/thin-out snapshots which are older than Y
> - I do not care about mirroring
> - I do not care about possible filesystem inconsistencies
>
>
> Is there a way to accomplish such behavior within Ceph, or do I have to 
> rely on external scripts? (If so, is anyone aware of such a script?)

Hi Tobias,

I don't think there is a way to accomplish that within Ceph.  The
existing RBD snapshot scheduling functionality is limited to mirror
snapshots.

I have seen a few such scripts referenced on this mailing list -- typically
they are under a few dozen lines of Bash, so people tend to write them from
scratch, tailored specifically to their environment (OpenStack, etc.).
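
As a very rough, untested sketch of what such a script might look like (pool
name, snapshot naming scheme and retention are placeholders; it assumes jq and
GNU date are available, and would be run from cron every N hours):

#!/bin/bash
# Sketch: snapshot every image in a pool and prune snapshots older than KEEP_DAYS.
POOL=rbd
KEEP_DAYS=7
NOW=$(date +%Y-%m-%d_%H%M)
CUTOFF=$(date -d "-${KEEP_DAYS} days" +%Y-%m-%d)

for IMG in $(rbd ls "$POOL"); do
    rbd snap create "${POOL}/${IMG}@auto-${NOW}"
    # prune: snapshots named auto-YYYY-MM-DD_HHMM whose date part is older than CUTOFF
    for SNAP in $(rbd snap ls "${POOL}/${IMG}" --format json | jq -r '.[].name' | grep '^auto-' || true); do
        SNAPDAY=${SNAP#auto-}; SNAPDAY=${SNAPDAY%_*}
        if [[ "$SNAPDAY" < "$CUTOFF" ]]; then
            rbd snap rm "${POOL}/${IMG}@${SNAP}"
        fi
    done
done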

Thanks,

Ilya

>
>
> [1] https://docs.ceph.com/en/quincy/cephfs/snap-schedule/
>
>
> Many thanks in advance
>
> Tobias
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] RBD migration between pools looks to be stuck on commit

2022-11-20 Thread Jozef Matický

Hello,

I was migrating several RBDs between two pools - from replicated to EC.
I have managed to migrate about twenty images without any issues, all 
relatively the same size.
It took generally about an hour to execute and 10 minutes to commit for 
each one.
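
For context, the per-image procedure was roughly the standard live-migration
sequence (pool and image names below are illustrative, written from memory):

rbd migration prepare --data-pool ecpool besteffort/IMAGE vms/IMAGE
rbd migration execute vms/IMAGE
rbd migration commit vms/IMAGE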

The last image, however, got stuck on commit at 15% for about three days.
I checked the RBD and it showed as residing in the new pool. I 
was also able to mount it and use it.
I therefore decided to CTRL+C the commit, assuming the terminal 
session might have expired; I did not check the trash at that point.
Now I have this image sitting in the trash, and it can't be deleted 
because it says it is still being migrated.


# rbd status vms/-data
Watchers: none

# rbd info vms/-data
rbd image '-data':
    size 24 TiB in 6291456 objects
    order 22 (4 MiB objects)
    snapshot_count: 0
    id: f779ee47e87321
    data_pool: ecpool
    block_name_prefix: rbd_data.3.f779ee47e87321
    format: 2
    features: layering, exclusive-lock, object-map, fast-diff, 
deep-flatten, data-pool

    op_features:
    flags:
    create_timestamp: Sat Nov 19 10:11:30 2022
    access_timestamp: Sat Nov 19 10:11:30 2022
    modify_timestamp: Sat Nov 19 10:11:30 2022

# rbd trash ls besteffort --all
81cb67ba7aaa59 -data

# rbd --pool besteffort trash remove 81cb67ba7aaa59
2022-11-20T18:59:41.655+ 7f854f7fe640 -1 
librbd::image::RefreshRequest: image being migrated
2022-11-20T18:59:41.655+ 7f854f7fe640 -1 librbd::image::OpenRequest: 
failed to refresh image: (30) Read-only file system
2022-11-20T18:59:41.655+ 7f854f7fe640 -1 librbd::ImageState: 
0x7f8538046be0 failed to open image: (30) Read-only file system
2022-11-20T18:59:41.656+ 7f85467fc640 -1 
librbd::image::RemoveRequest: 0x7f8538000b80 handle_open_image: error 
opening image: (30) Read-only file system

rbd: remove error: (30) Read-only file system
Removing image: 0% complete...failed.

Has anyone experienced this, and if so, will waiting longer resolve it, or 
is there some workaround I can try?


Thank you.

Best regards,
Jozef.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: iscsi target lun error

2022-11-20 Thread Xiubo Li


On 15/11/2022 23:44, Randy Morgan wrote:
You are correct I am using the cephadm to create the iscsi portals. 
The cluster had been one I was learning a lot with and I wondered if 
it was because of the number of creations and deletions of things, so 
I rebuilt the cluster, now I am getting this response even when 
creating my first iscsi target.   Here is the output of the gwcli ls:


sh-4.4# gwcli ls
o- / ................................................................. [...]
  o- cluster ................................................. [Clusters: 1]
  | o- ceph ................................................. [HEALTH_WARN]
  |   o- pools ................................................... [Pools: 8]
  |   | o- .rgw.root ............... [(x3), Commit: 0.00Y/71588776M (0%), Used: 1323b]
  |   | o- cephfs_data ............. [(x3), Commit: 0.00Y/71588776M (0%), Used: 1639b]
  |   | o- cephfs_metadata ......... [(x3), Commit: 0.00Y/71588776M (0%), Used: 3434b]
  |   | o- default.rgw.control ..... [(x3), Commit: 0.00Y/71588776M (0%), Used: 0.00Y]
  |   | o- default.rgw.log ......... [(x3), Commit: 0.00Y/71588776M (0%), Used: 3702b]
  |   | o- default.rgw.meta ........ [(x3), Commit: 0.00Y/71588776M (0%), Used: 382b]
  |   | o- device_health_metrics ... [(x3), Commit: 0.00Y/71588776M (0%), Used: 0.00Y]
  |   | o- rhv-ceph-ssd ............ [(x3), Commit: 0.00Y/7868560896K (0%), Used: 511746b]
  |   o- topology ..................................... [OSDs: 36, MONs: 3]
  o- disks ....................................... [0.00Y, Disks: 0]
  o- iscsi-targets ............... [DiscoveryAuth: None, Targets: 1]
    o- iqn.2001-07.com.ceph:1668466555428 ...... [Auth: None, Gateways: 1]
      o- disks ................................................ [Disks: 0]
      o- gateways .................................. [Up: 1/1, Portals: 1]
      | o- host.containers.internal .............. [192.168.105.145 (UP)]


Please manually remove this gateway before doing further steps.

This looks like a bug in cephadm; please raise a tracker issue for it.

Thanks


      o- host-groups ............................................ [Groups : 0]
      o- hosts ................................ [Auth: ACL_ENABLED, Hosts: 0]

sh-4.4#

Randy

On 11/9/2022 6:36 PM, Xiubo Li wrote:


On 10/11/2022 02:21, Randy Morgan wrote:
I am trying to create a second iscsi target and I keep getting an 
error when I create the second target:



   Failed to update target 'iqn.2001-07.com.ceph:1667946365517'

disk create/update failed on host.containers.internal. LUN 
allocation failure


I think you were using cephadm to add the iSCSI targets, not gwcli or 
the REST APIs directly.


The other issues we hit before this were login failures, because there 
were two gateways using the same IP address. Please share your `gwcli 
ls` output so we can see the 'host.containers.internal' gateway's config.


Thanks!



I am running Ceph Pacific: version 16.2.7 
(dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable).

All of the information I can find on this problem is from 3 years 
ago and doesn't seem to apply any more.  Does anyone know how to 
correct this problem?


Randy







___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Recent ceph.io Performance Blog Posts

2022-11-20 Thread Stefan Kooman

On 11/9/22 14:30, Mark Nelson wrote:

On 11/9/22 4:48 AM, Stefan Kooman wrote:


On 11/8/22 21:20, Mark Nelson wrote:

Hi Folks,

I thought I would mention that I've released a couple of performance 
articles on the Ceph blog recently that might be of interest to people:


For sure, thanks a lot, it's really informative!

Can we also ask for special requests? One of the things that would 
help us (and CephFS users in general) is knowing how CephFS performance for 
small files (~512 bytes, 2 KiB, up to say 64 KiB) is affected by the number 
of PGs the CephFS metadata pool has.


That's an interesting question.  I wouldn't really expect the metadata 
pool PG count to have a dramatic effect here at counts that result in 
reasonable pseudo-random distribution.  Have you seen otherwise?


I can't tell for sure yet. I'm going to perform some tests myself to try 
to figure it out.
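
The rough idea is to vary the metadata pool's PG count between runs and repeat
a small-file workload, something along these lines (pool name, mount point and
fio parameters are just placeholders):

ceph osd pool set cephfs_metadata pg_num 128    # then 256, 512, ... between runs
fio --name=smallfiles --directory=/mnt/cephfs/test --rw=write --bs=4k \
    --size=40m --nrfiles=10000 --numjobs=4 --openfiles=1000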


Gr. Stefan


___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: backfilling kills rbd performance

2022-11-20 Thread Sridhar Seshasayee
Hi Martin,

In Quincy the osd_op_queue defaults to 'mclock_scheduler'. It was set to
'wpq' before Quincy.


> on a 3-node hyper-converged PVE cluster with 12 SSD OSD devices I
> experience stalls in RBD performance during normal backfill
> operations, e.g. moving a pool from 2/1 to 3/2.
>
> I was expecting that I could control the load caused by the backfilling
> using
>
> ceph tell 'osd.*' injectargs '--osd-max-backfills 1'
> or
> ceph tell 'osd.*' injectargs '--osd-recovery-max-active 1'
> even
> ceph tell 'osd.*' config set osd_recovery_sleep_ssd 2.1
> did not help.
>
> Any hints?
>

Due to the way the mClock scheduler works, the sleep options and the
backfill/recovery limits cannot be modified. This is documented here:
https://docs.ceph.com/en/quincy/rados/configuration/mclock-config-ref/#mclock-built-in-profiles
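
To confirm what an OSD is currently using, you can check, for example:

ceph config show osd.0 osd_op_queue         # mclock_scheduler on a default Quincy install
ceph config show osd.0 osd_mclock_profile   # built-in profiles: high_client_ops / balanced / high_recovery_ops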


> I am running Ceph Quincy 17.2.5 on a test system with a dedicated
> 1Gbit/9000MTU storage network, while the public Ceph network
> (1GBit/1500MTU) is shared with the VM network.
>
> I am looking forward to your suggestions.
>

The following optimizations, which modify the above behavior (especially the
backfill/recovery behavior you are observing), are slated to be merged:

1. Reduce the current high limit set for backfill/recovery operations, which
   could overwhelm client operations in some situations.

2. Allow users to modify the backfill/recovery limits if required, using
   another gating option.

3. Optimize the mClock profiles so that client and recovery operations get
   the desired IOPS allocations.

Until the next Quincy release, to avoid the backfill/recovery issue you can
switch to the 'wpq' scheduler by setting osd_op_queue = wpq and restarting
the OSDs.
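
That is, something along the lines of:

ceph config set osd osd_op_queue wpq
# then restart all OSDs, e.g. "systemctl restart ceph-osd.target" on each node
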
-Sridhar
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: 17.2.5 snap_schedule module error (cephsqlite: cannot open temporary database)

2022-11-20 Thread phandaal

On 2022-11-17 14:19, Milind Changire wrote:

On Thu, Nov 17, 2022 at 6:02 PM phandaal  wrote:

On 2022-11-17 12:58, Milind Changire wrote:
The error arrives when trying to restart old schedules
(schedule_client.py line 169) and trying to find the old store, which
does not exist; the schedules were created in Pacific. Can I just
wipe them out and recreate the schedules from scratch?



The error does show up when trying to restart old schedules, but the
ioctx.stat() at schedule_client.py:201 should have thrown a
rados.ObjectNotFound exception and been caught at line 205, which doesn't
seem to be the case. This implies that the backing RADOS object for the DB
dump was found, but there was a libcephsqlite error, as per your original
email:
2022-11-17T09:50:25.769+0100 7f7be20db6c0 -1 cephsqlite:
Open: (client.444215)  cannot open temporary database

Hence the error at line 203:
db.executescript(dump)

The problem seems to be due to these missing bits:
pybind/mgr: use memory temp_store #48449



Thanks for pointing to this patch. I just added the two lines, restarted 
the mgr, and everything went back to normal. All the schedules are back 
and working.
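
For anyone else hitting this before the next release: the change essentially
tells SQLite to keep its temporary storage in memory, so restoring the dump no
longer needs an on-disk temp database (which libcephsqlite cannot open). Purely
as an illustration of the idea, not the literal patch (see the PR for the exact
lines):

import sqlite3

# illustration only; the real change lives in the mgr's schedule_client.py
db = sqlite3.connect(":memory:")
db.execute("PRAGMA temp_store = memory")  # keep SQLite temp storage in RAM
db.executescript("CREATE TABLE t (x); INSERT INTO t VALUES (1);")  # stand-in for the schedule DB dump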


Regards,
Christian.

--
Christian Vilhelm : phand...@phandaal.net
Reality is for people who lack imagination
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io