[ceph-users] Understanding how crush works

2025-01-27 Thread Andre Tann

Hi list,

I have a problem understanding how CRUSH works when the crush map
changes.


Let's take a pool with some data in it, and a crush map that enables a
client to calculate for itself where a particular chunk is stored.


Now we add more OSDs, which means the crush map changes. Most
objects are now misplaced, given the new crush map.


If a client wants a particular chunk, it takes the modified map, but
as the chunk is misplaced, it won't find it where the CRUSH algorithm
points.


How can the client know which crush map to consider when doing the 
calculation?
Do the clients keep several versions of the map, and try them one after 
the other?


Thanks for some hints on this.
--
Andre Tann
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: unmatched rstat rbytes on single dirfrag

2025-01-27 Thread Eugen Block

Hi Frank,


It would just be great to have confirmation or a "no, it's critical".


Unfortunately, I'm not able to confirm that; I hope someone else can.

By the way, I have these on more than one rank, so it is probably  
not a fall-out of the recent recovery efforts.


In that case I would definitely avoid scrubbing at this time. ;-)


Quoting Frank Schilder :


Hi Eugen,

my hypothesis is that these recursive counters are not critical and are,
in fact, updated when the dir/file is modified/accessed. Attributes
like ceph.dir.rbytes will show somewhat incorrect values, but these
are approximate anyway (updates are propagated asynchronously).
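
(For reference, these recursive statistics can be read directly as CephFS
virtual xattrs; a minimal sketch, with a hypothetical mount point and path:

  getfattr -n ceph.dir.rbytes   /mnt/cephfs/some/dir
  getfattr -n ceph.dir.rentries /mnt/cephfs/some/dir

so it is easy to spot-check whether the values settle after the directory is
touched again.)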


It would just be great to have confirmation or a "no, it's critical".

There was something similar when a read error occurred with "fast
read" enabled. It also logged an "[ERR]" event even though it is
expected to happen every now and then. In recent releases this was
downgraded to a "[WRN]" event to reduce unwarranted panic amongst
admins. My suspicion is that we have a similar situation here; these
messages might better be "[WRN]" or even just "[DBG]".


By the way, I have these on more than one rank, so it is probably  
not a fall-out of the recent recovery efforts.


Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mix NVME's in a single cluster

2025-01-27 Thread Peter Grandi
 I have a Ceph Reef cluster with 10 hosts with 16 nvme slots
 but only half occupied with 15TB (2400 KIOPS) drives. 80
 drives in total. I want to add another 80 to fully populate
 the slots. The question: What would be the downside if I
 expand the cluster with 80 x 30TB (3300 KIOPS) drives?

Most previous replies have focused on potential capacity
bottlenecks even if some have mentioned PGs and balancing.

I reckon that balancing is by far the biggest issue you are
likely to have because most Ceph releases (I do not know about
Reef) have difficulty balancing across drives of different
sizes even with configuration changes.

Possible solutions/workarounds:

* Assign different CRUSH weights. This configuration change is
  "supposed" to work.

* Assign the 30TB drives to a different class and use them for
  new "pools".

* Split each 30TB drive into two OSDs. Not a good idea for HDDs
  of course but these are low latency SSDs.
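
For the first two workarounds, a rough sketch of the commands involved
(OSD ids, class, rule and pool names here are hypothetical):

  # assign a different CRUSH weight to an individual OSD
  ceph osd crush reweight osd.80 15.0

  # or give the 30TB drives their own device class and steer a pool at them
  ceph osd crush rm-device-class osd.80
  ceph osd crush set-device-class nvme30 osd.80
  ceph osd crush rule create-replicated big-nvme default host nvme30
  ceph osd pool set newpool crush_rule big-nvme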

The other main problem with large-capacity OSDs is the size of
PGs, which can become very large with the default target number
of PGs, as a previous commenter mentioned.
I think that the current configuration style, where one sets the
number of PGs rather than the size of PGs, leads people astray.
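
A rough way to sanity-check the implied PG size (illustrative numbers, not
taken from the original poster's cluster):

  ceph df                    # per-pool STORED bytes
  ceph osd pool ls detail    # per-pool pg_num

  # e.g. a pool storing 1 PiB with pg_num = 4096:
  #   average PG size ~ 1 PiB / 4096 ~ 256 GiB (before replication)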

In general my impression is that current Ceph defaults and its
very design (a single level of grouping: PGs) were meant to be
used with OSDs at most 1TB in size, and that larger OSDs are not
a good idea anyhow, but of course there are many people who know
better, and good luck to them.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] radosgw daemons with "stuck ops"

2025-01-27 Thread Reid Guyett
Hello,

We are experiencing slowdowns on one of our radosgw clusters. We restart
the radosgw daemons every 2 hours and things start getting slow after an
hour and a half. The avg get/put latencies go from 20ms/400ms to 1s/5s+
according to the metrics. When I stop traffic to one of the radosgw daemons
by setting it to DRAIN in HAProxy, and HAProxy reports 0 sessions to the
daemon, I still see the objecter_requests count going up and down 15 min
later. linger_ops seems to stay at a constant 10 the whole time.

> [root@rgw1 ~]# ceph daemon client.rgw.rgw1 objecter_requests | jq '.ops |
> length'
> 46
> [root@rgw1 ~]# ceph daemon client.rgw.rgw1 objecter_requests | jq '.ops |
> length'
> 211
> [root@rgw1 ~]# ceph daemon client.rgw.rgw1 objecter_requests | jq '.ops |
> length'
> 427
> [root@rgw1 ~]# ceph daemon client.rgw.rgw1 objecter_requests | jq '.ops |
> length'
> 0
> [root@rgw1 ~]# ceph daemon client.rgw.rgw1 objecter_requests | jq '.ops |
> length'
> 26
> [root@rgw1 ~]# ceph daemon client.rgw.rgw1 objecter_requests | jq '.ops |
> length'
> 190
>
The requests say they are "call rgw.bucket_list in=234b" and mostly
reference ".dir.5e9bc383-f7bd-4fd1-b607-1e563bfe0011.886814949.1.N" where N
is 1-149 (the shard count for the bucket).
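
In case it helps with reproducing this, the marker in those object names can
be mapped back to a bucket roughly as follows (the bucket name is a
placeholder):

  radosgw-admin metadata list bucket.instance | grep 886814949
  radosgw-admin metadata get bucket.instance:<bucket>:5e9bc383-f7bd-4fd1-b607-1e563bfe0011.886814949.1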

>   {
> "tid": 2583119,
> "pg": "8.579482b0",
> "osd": 208,
> "object_id":
> ".dir.5e9bc383-f7bd-4fd1-b607-1e563bfe0011.886814949.1.124",
> "object_locator": "@8",
> "target_object_id":
> ".dir.5e9bc383-f7bd-4fd1-b607-1e563bfe0011.886814949.1.124",
> "target_object_locator": "@8",
> "paused": 0,
> "used_replica": 0,
> "precalc_pgid": 0,
> "last_sent": "1796975.496664s",
> "age": 0.350012616,
> "attempts": 1,
> "snapid": "head",
> "snap_context": "0=[]",
> "mtime": "1970-01-01T00:00:00.00+",
> "osd_ops": [
>   "call rgw.bucket_list in=234b"
> ]
>   },
>
I don't think it should be other rgw processes because we have this
daemon's ceph.conf set to disable the other threads.

> rgw enable gc threads = false
> rgw enable lc threads = false
> rgw dynamic resharding = false
>

When I restart the service while it is still in a DRAINED state in HAProxy,
checking the objecter_requests yields 0 even a few minutes after it has
been up.

> [root@rgw1 ~]# ceph daemon client.rgw.rgw1 objecter_requests | jq '.ops |
> length'
> 0
> [root@rgw1 ~]# ceph daemon client.rgw.rgw1 objecter_requests | jq '.ops |
> length'
> 0
> [root@rgw1 ~]# ceph daemon client.rgw.rgw1 objecter_requests | jq '.ops |
> length'
> 0
> [root@rgw1 ~]# ceph daemon client.rgw.rgw1 objecter_requests | jq '.ops |
> length'
> 0
> [root@rgw1 ~]# ceph daemon client.rgw.rgw1 objecter_requests | jq '.ops |
> length'
> 0
> [root@rgw1 ~]# ceph daemon client.rgw.rgw1 objecter_requests | jq '.ops |
> length'
> 0
>

Any thoughts on why these ops appear to be stuck/recurring until restarting
the daemon? I think this is related to our performance issues but I don't
know what the fix is.

The rgws are 18.2.4 running as containers in Podman on Debian 11. Our other
clusters do not exhibit this behavior.

Thanks!

Reid
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Error ENOENT: Module not found

2025-01-27 Thread Frédéric Nass
Hi Dev, 

The config-history keys related to node3 are expected to remain even for nodes 
that you remove, as these keys are not cleaned up upon node deletion. 
These keys are used by the 'ceph config log' command to list all configuration 
changes that have occurred over time. 
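
For example, the history those keys back can be viewed with:

  ceph config log 10      # show the last 10 recorded configuration changes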

The 'ceph status' shows 132 PGs undersized - certainly all related to EC3+2
pools - and 685 other PGs related to replicated pools.
With 5 nodes and 22 OSDs per node, that's 132*5/(95+22) ≈ 5 PGs per OSD for EC
pools and 685*3/(95+22) ≈ 17 PGs per OSD for replicated pools.

With such a low PG per OSD ratio, it is very likely that some pools are not 
utilizing all OSDs. You might want to increase the number of PGs in these pools 
to better balance I/O and improve performance. 
Additionally, ensure that you do not have an even number of MONs. 
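
A minimal sketch of both adjustments (the pool name is a placeholder; the MON
count assumes a cephadm-managed cluster):

  ceph osd pool autoscale-status        # current pg_num and autoscaler targets per pool
  ceph osd pool set <pool> pg_num 128   # raise PGs on an under-split pool
  ceph orch apply mon 3                 # back to an odd number of MONs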

Regards, 
Frédéric. 

- On 25 Jan 25, at 20:34, Devender Singh  wrote:

> Hello Frédéric,

> Thanks for your reply. Yes, I also faced this issue after draining and
> removing the node. So I used the same command, removed "original_weight"
> using ceph config-key get mgr/cephadm/osd_remove_queue, and injected the
> file again, which resolved the orch issue.

> “ Error ENOENT: Module not found - ceph orch commands stopped working

> ceph config-key get mgr/cephadm/osd_remove_queue > osd_remove_queue.json

> Then only remove the "original_weight" key from that json and upload it back 
> to
> the config-key store:

> ceph config-key set mgr/cephadm/osd_remove_queue -i
> osd_remove_queue_modified.json

> Then fail the mgr:

> ceph mgr fail ”

> But now the issue is that my cluster shows misplaced objects, whereas I had 5 nodes
> with host failure domain, an R3 pool (size 3 and min 2), and EC 3+2.

> # ceph -s
>   cluster:
>     id:     384d7590-d018-11ee-b74c-5b2acfe0b35c
>     health: HEALTH_WARN
>             Degraded data redundancy: 2848547/29106793 objects degraded (9.787%), 105 pgs degraded, 132 pgs undersized
>
>   services:
>     mon: 4 daemons, quorum node1,node5,node4,node2 (age 12h)
>     mgr: node1.cvknae(active, since 12h), standbys: node4.foomun
>     mds: 2/2 daemons up, 2 standby
>     osd: 95 osds: 95 up (since 16h), 95 in (since 21h); 124 remapped pgs
>     rgw: 2 daemons active (2 hosts, 1 zones)
>
>   data:
>     volumes: 2/2 healthy
>     pools:   18 pools, 817 pgs
>     objects: 6.06M objects, 20 TiB
>     usage:   30 TiB used, 302 TiB / 332 TiB avail
>     pgs:     2848547/29106793 objects degraded (9.787%)
>              2617833/29106793 objects misplaced (8.994%)
>              561 active+clean
>              124 active+clean+remapped
>              105 active+undersized+degraded
>              27  active+undersized
>
>   io:
>     client: 1.4 MiB/s rd, 4.0 MiB/s wr, 25 op/s rd, 545 op/s wr

> And when using 'ceph config-key ls', it's still showing the old node and OSDs.

> # ceph config-key ls|grep -i 03n

> "config-history/135/+osd/host:node3/osd_memory_target",

> "config-history/14990/+osd/host:node3/osd_memory_target",

> "config-history/14990/-osd/host:node3/osd_memory_target",

> "config-history/15003/+osd/host:node3/osd_memory_target",

> "config-history/15003/-osd/host:node3/osd_memory_target",

> "config-history/15016/+osd/host:node3/osd_memory_target",

> "config-history/15016/-osd/host:node3/osd_memory_target",

> "config-history/15017/+osd/host:node3/osd_memory_target",

> "config-history/15017/-osd/host:node3/osd_memory_target",

> "config-history/15022/+osd/host:node3/osd_memory_target",

> "config-history/15022/-osd/host:node3/osd_memory_target",

> "config-history/15024/+osd/host:node3/osd_memory_target",

> "config-history/15024/-osd/host:node3/osd_memory_target",

> "config-history/15025/+osd/host:node3/osd_memory_target",

> "config-history/15025/-osd/host:node3/osd_memory_target",

> "config-history/153/+osd/host:node3/osd_memory_target",

> "config-history/153/-osd/host:node3/osd_memory_target",

> "config-history/165/+mon.node3/container_image",

> "config-history/171/-mon.node3/container_image",

> "config-history/176/+client.crash.node3/container_image",

> "config-history/182/-client.crash.node3/container_image",

> "config-history/4276/+osd/host:node3/osd_memory_target",

> "config-history/4276/-osd/host:node3/osd_memory_target",

> "config-history/433/+client.ceph-exporter.node3/container_image",

> "config-history/439/-client.ceph-exporter.node3/container_image",

> "config-history/459/+osd/host:node3/osd_memory_target",

> "config-history/459/-osd/host:node3/osd_memory_target",

> "config-history/465/+osd/host:node3/osd_memory_target",

> "config-history/465/-osd/host:node3/osd_memory_target",

> "config-history/4867/+osd/host:node3/osd_memory_target",

> "config-history/4867/-osd/host:node3/osd_memory_target",

> "config-history/4878/+mon.node3/container_image",

> "config-history/4884/-mon.node3/container_image",

> "config-history/4889/+client.crash.node3/container_image",

> "config-history/4895/-client.crash.node3/container_image",

> "config-history/5139/+mds.k8s-dev-cephfs.node3.iebxqn/container_image",

> "config-history/5142/-mds.k8s-dev-cephfs.node3.iebxqn/container_

[ceph-users] Re: Seeking Participation! Take the new Ceph User Stories Survey!

2025-01-27 Thread Laura Flores
Hi all,

Huge thanks to the 46 community members who have already taken the survey!
It's still open, so if you haven't taken it already, follow this link to do
so!
https://docs.google.com/forms/d/e/1FAIpQLSe66NedXh4gHLgk9G45eqP5V2wHlz4IKqRmGUJ074peaTGNKQ/viewform?usp=sf_link

We still plan to keep the survey open for a bit, and we will update the
thread in advance when we decide to close it. Stay tuned!

Thanks,
Laura

On Tue, Jan 21, 2025 at 10:43 AM Laura Flores  wrote:

> Hi all,
>
> The Ceph User Council is conducting a survey to gather insights from
> community members who actively use production Ceph clusters. We want to
> hear directly from you: *What is the use case of your production Ceph
> cluster?*
>
> Since its official Argonaut release in 2012, Ceph has grown significantly
> in features and user adoption. By learning about your use cases, we aim to
> understand Ceph’s strengths and limitations in performance, scalability,
> and usability. This feedback will help inform future improvements to Ceph.
>
> Our ultimate goal is to enhance the Ceph community's shared knowledge by
> publishing real-world user stories on our website. These stories will serve
> as valuable resources for both current and future users, demonstrating the
> diversity and potential of Ceph in production environments.
>
> All responses are anonymous unless the participant willingly shares their
> contact information. The Ceph User Council will contact participants who
> opt to share their contact information before anything is officially
> published. We have not yet determined when the survey will close, but we
> will give a heads up before doing so.
>
> Take the survey here!
> https://docs.google.com/forms/d/e/1FAIpQLSe66NedXh4gHLgk9G45eqP5V2wHlz4IKqRmGUJ074peaTGNKQ/viewform?usp=sf_link
>
> Thanks,
> Laura Flores
>
> --
>
> Laura Flores
>
> She/Her/Hers
>
> Software Engineer, Ceph Storage 
>
> Chicago, IL
>
> lflo...@ibm.com | lflo...@redhat.com 
> M: +17087388804
>
>
>

-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: radosgw daemons with "stuck ops"

2025-01-27 Thread Reid Guyett
Hi,
Looking at that tracker, I see that you were seeing some errors. I don't
really see any errors that stick out to me. When I turn the logs up to 20,
I see the following types of logs over and over:

> ...
> 2025-01-27T19:36:25.989+ 7f657bb17640 20 req 3636357367581543624
> 4066.843017578s s3:list_bucket list_objects_ordered: skipping past
> namespaced objects, including
> "_multipart_Data/2~a9lvduIp8rv3_btbiGEfOgAC-EvF85n.36"
> 2025-01-27T19:36:25.989+ 7f657bb17640 20 req 3636357367581543624
> 4066.843017578s s3:list_bucket list_objects_ordered: considering entry
> _multipart_Data/2~a9lvduIp8rv3_btbiGEfOgAC-EvF85n.39
> 2025-01-27T19:36:25.989+ 7f657bb17640 20 req 3636357367581543624
> 4066.843017578s s3:list_bucket list_objects_ordered: skipping past
> namespaced objects, including
> "_multipart_Data/2~a9lvduIp8rv3_btbiGEfOgAC-EvF85n.39"
> 2025-01-27T19:36:25.989+ 7f657bb17640 20 req 3636357367581543624
> 4066.843017578s s3:list_bucket list_objects_ordered: considering entry
> _multipart_Data/2~a9lvduIp8rv3_btbiGEfOgAC-EvF85n.5
> 2025-01-27T19:36:25.989+ 7f657bb17640 20 req 3636357367581543624
> 4066.843017578s s3:list_bucket list_objects_ordered: skipping past
> namespaced objects, including
> "_multipart_Data/2~a9lvduIp8rv3_btbiGEfOgAC-EvF85n.5"
> 2025-01-27T19:36:25.989+ 7f657bb17640 10 req 3636357367581543624
> 4066.843017578s s3:list_bucket list_objects_ordered: end of outer loop,
> truncated=1, count=0, attempt=4829
> 2025-01-27T19:36:25.989+ 7f657bb17640 20 req 3636357367581543624
> 4066.843017578s s3:list_bucket list_objects_ordered: starting attempt 4830
> 2025-01-27T19:36:25.989+ 7f657bb17640 10 req 3636357367581543624
> 4066.843017578s s3:list_bucket cls_bucket_list_ordered: request from each
> of 149 shard(s) for 1001 entries to get 1001 total entries
>
> ...
> 2025-01-27T19:40:53.704+ 7f64a716e640 20 req 8762048560071224804
> 5187.776367188s s3:list_bucket list_objects_ordered: skipping past
> namespaced objects, including
> "_multipart_Data/2~XSOCVgidHCMfywQ8-Z3kestTOwfqJ--.6"
> 2025-01-27T19:40:53.704+ 7f64a716e640 20 req 8762048560071224804
> 5187.776367188s s3:list_bucket list_objects_ordered: considering entry
> _multipart_Data/2~XSOCVgidHCMfywQ8-Z3kestTOwfqJ--.7
> 2025-01-27T19:40:53.704+ 7f64a716e640 20 req 8762048560071224804
> 5187.776367188s s3:list_bucket list_objects_ordered: skipping past
> namespaced objects, including
> "_multipart_Data/2~XSOCVgidHCMfywQ8-Z3kestTOwfqJ--.7"
> 2025-01-27T19:40:53.704+ 7f64a716e640 10 req 8762048560071224804
> 5187.776367188s s3:list_bucket list_objects_ordered: end of outer loop,
> truncated=1, count=0, attempt=9134
> 2025-01-27T19:40:53.704+ 7f64a716e640 20 req 8762048560071224804
> 5187.776367188s s3:list_bucket list_objects_ordered: starting attempt 9135
> 2025-01-27T19:40:53.704+ 7f64a716e640 10 req 8762048560071224804
> 5187.776367188s s3:list_bucket cls_bucket_list_ordered: request from each
> of 149 shard(s) for 1001 entries to get 1001 total entries
>
> ...
> 2025-01-27T19:40:53.093+ 7f657bb17640 20 req 3636357367581543624
> 4333.946777344s s3:list_bucket cls_bucket_list_ordered: currently
> processing
> _multipart_Data/.2~r5PVFujVZq_hkwZea-hrYgE13OI1MXP.19 from
> shard 8
> 2025-01-27T19:40:53.093+ 7f657bb17640 10 req 3636357367581543624
> 4333.946777344s s3:list_bucket cls_bucket_list_ordered: got
> _multipart_Data/.2~r5PVFujVZq_hkwZea-hrYgE13OI1MXP.19
> 2025-01-27T19:40:53.093+ 7f657bb17640 20 req 3636357367581543624
> 4333.946777344s s3:list_bucket cls_bucket_list_ordered: currently
> processing
> _multipart_Data/.2~r5PVFujVZq_hkwZea-hrYgE13OI1MXP.2 from
> shard 8
> 2025-01-27T19:40:53.093+ 7f657bb17640 10 req 3636357367581543624
> 4333.946777344s s3:list_bucket cls_bucket_list_ordered: got
> _multipart_Data/.2~r5PVFujVZq_hkwZea-hrYgE13OI1MXP.2
> 2025-01-27T19:40:53.093+ 7f657bb17640 10 req 3636357367581543624
> 4333.946777344s s3:list_bucket cls_bucket_list_ordered: stopped
> accumulating results at count=1001, dirent="", because its shard is
> truncated and exhausted
> 2025-01-27T19:40:53.093+ 7f657bb17640 20 req 3636357367581543624
> 4333.946777344s s3:list_bucket cls_bucket_list_ordered: returning,
> count=1001, is_truncated=1
>

The logs make it seem like it is attempting to list a bucket thousands of
times and 5187s is around 86 minutes. I only checked on one radosgw out of
6 but I suspect all are doing this in the background.

I am assuming that this isn't normal. Can anybody shed some light on what
the logs mean? Also, is there a way to link the req 3636357367581543624 to
logs at lower levels? I thought that I would just need to convert it to hex,
but it doesn't line up with the started/completed requests.
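
(For reference, a plain decimal-to-hex conversion of that id is just:

  printf '%x\n' 3636357367581543624

but as noted, the result doesn't line up with the started/completed request
lines at lower log levels.)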

Thanks,

Reid

On Mon, Jan 27, 2025 at 11:36 AM Joshua Baergen 
wrote:

> Hey Reid,
>
> This sounds similar to what we saw in
> https://tracker.ceph.com/issues/62256, in case that helps with your
> investigation.
>
> Josh
>
> On Mon, Jan 27, 2025 at 8:0

[ceph-users] Re: radosgw daemons with "stuck ops"

2025-01-27 Thread Joshua Baergen
Hey Reid,

This sounds similar to what we saw in
https://tracker.ceph.com/issues/62256, in case that helps with your
investigation.

Josh

On Mon, Jan 27, 2025 at 8:07 AM Reid Guyett  wrote:
>
> Hello,
>
> We are experiencing slowdowns on one of our radosgw clusters. We restart
> the radosgw daemons every 2 hours and things start getting slow after an
> hour and a half. The avg get/put latencies go from 20ms/400ms to 1s/5s+
> according to the metrics. When I stop traffic to one of the radosgw daemon
> by setting it to DRAIN in HAProxy and HAProxy reports 0 sessions to the
> daemon, I still see the objecter_requests still going up and down 15 min
> later. linger_ops seems to stay at a constant 10 the whole time.
>
> > [root@rgw1 ~]# ceph daemon client.rgw.rgw1 objecter_requests | jq '.ops |
> > length'
> > 46
> > [root@rgw1 ~]# ceph daemon client.rgw.rgw1 objecter_requests | jq '.ops |
> > length'
> > 211
> > [root@rgw1 ~]# ceph daemon client.rgw.rgw1 objecter_requests | jq '.ops |
> > length'
> > 427
> > [root@rgw1 ~]# ceph daemon client.rgw.rgw1 objecter_requests | jq '.ops |
> > length'
> > 0
> > [root@rgw1 ~]# ceph daemon client.rgw.rgw1 objecter_requests | jq '.ops |
> > length'
> > 26
> > [root@rgw1 ~]# ceph daemon client.rgw.rgw1 objecter_requests | jq '.ops |
> > length'
> > 190
> >
> The requests say they are "call rgw.bucket_list in=234b" and mostly
> reference ".dir.5e9bc383-f7bd-4fd1-b607-1e563bfe0011.886814949.1.N" where N
> is 1-149 (the shard count for the bucket).
>
> >   {
> > "tid": 2583119,
> > "pg": "8.579482b0",
> > "osd": 208,
> > "object_id":
> > ".dir.5e9bc383-f7bd-4fd1-b607-1e563bfe0011.886814949.1.124",
> > "object_locator": "@8",
> > "target_object_id":
> > ".dir.5e9bc383-f7bd-4fd1-b607-1e563bfe0011.886814949.1.124",
> > "target_object_locator": "@8",
> > "paused": 0,
> > "used_replica": 0,
> > "precalc_pgid": 0,
> > "last_sent": "1796975.496664s",
> > "age": 0.350012616,
> > "attempts": 1,
> > "snapid": "head",
> > "snap_context": "0=[]",
> > "mtime": "1970-01-01T00:00:00.00+",
> > "osd_ops": [
> >   "call rgw.bucket_list in=234b"
> > ]
> >   },
> >
> I don't think it should be other rgw processes because we have this
> daemon's ceph.conf set to disable the other threads.
>
> > rgw enable gc threads = false
> > rgw enable lc threads = false
> > rgw dynamic resharding = false
> >
>
> When I restart the service while it is still in a DRAINED state in HAProxy,
> checking the objecter_requests yields 0 even a few minutes after it has
> been up.
>
> > [root@rgw1 ~]# ceph daemon client.rgw.rgw1 objecter_requests | jq '.ops |
> > length'
> > 0
> > [root@rgw1 ~]# ceph daemon client.rgw.rgw1 objecter_requests | jq '.ops |
> > length'
> > 0
> > [root@rgw1 ~]# ceph daemon client.rgw.rgw1 objecter_requests | jq '.ops |
> > length'
> > 0
> > [root@rgw1 ~]# ceph daemon client.rgw.rgw1 objecter_requests | jq '.ops |
> > length'
> > 0
> > [root@rgw1 ~]# ceph daemon client.rgw.rgw1 objecter_requests | jq '.ops |
> > length'
> > 0
> > [root@rgw1 ~]# ceph daemon client.rgw.rgw1 objecter_requests | jq '.ops |
> > length'
> > 0
> >
>
> Any thoughts on why these ops appear to be stuck/recurring until restarting
> the daemon? I think this is related to our performance issues but I don't
> know what the fix is.
>
> The rgws are 18.2.4 running as containers in Podman on Debian 11. Our other
> clusters do not exhibit this behavior.
>
> Thanks!
>
> Reid
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Cephalocon 2024 Recordings Available

2025-01-27 Thread Matt Vandermeulen

Hi folks!

The Cephalocon 2024 recordings are available on the YouTube channel!

- Channel: https://www.youtube.com/@Cephstorage/videos
- Cephalocon 2024 playlist: 
https://www.youtube.com/watch?v=ECkgu2zZzeQ&list=PLrBUGiINAakPfVfFfPQ5wLMQJFsLKTQCv


Thanks,
Matt
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] link to grafana dashboard with osd / host % usage

2025-01-27 Thread Marc


Is there an existing grafana dashboard/panel that sort of shows the % used on 
disks and hosts?
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Ceph Steering Committee Notes 2025-01-27

2025-01-27 Thread Dan van der Ster
Hi all,

Here is a summary of the CSC meeting Jan 27, 2025. Full notes are
available at https://pad.ceph.com/p/csc-weekly-minutes

Component Leads Poll re: Workload and Bottlenecks

Dan proposes an informal poll for component leads to identify workload
distribution, bottlenecks, and areas where community contributions are
most needed. Venky shares that the CephFS team uses GitHub's
"Assigned" field to track PR reviews and suggests exploring good and
bad aspects of roles to involve more community support.

LRC On-Call Assistance

DavidG seeks improved mechanisms to involve developers and leads
during LRC outages. In a recent case, an outage was related (again) to
iSCSI, which is missing a lead maintainer. It was agreed to have a
follow up discussion on David's plans for modernizing the upstream
infra in the next meeting.

Tentacle Roadmap

Tentacle development freeze and release candidate timelines are not
finalized. Historically, dev freeze occurs by the end of January and
RC by the end of March. Discussions are to continue at the next CSC
meeting.

BOTO3 Issue

A BOTO3 change caused S3 client issues, with backports and fixes in
progress. Relevant links and PRs for tracking the problem are shared.

Strategy Working Group

Yehuda proposes forming a strategy working group to address high-level
Ceph planning over a 3–5 year horizon, as existing forums like CSC and
Dev Summit are too technical for such discussions. Discussion to
continue next meeting.

Cephalocon 2024 Recordings

Most recordings from Cephalocon 2024 are now available on YouTube.
Announcements to the mailing list are planned.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Understanding how crush works

2025-01-27 Thread Joshua Baergen
Hey Andre,

Clients actually have access to more than just the crushmap: the OSD map
also includes temporary PG mappings (pg_temp) generated when a backfill is
pending, as well as upmap entries which override CRUSH's placement
decision. You can see these in "ceph osd dump", for example.
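
A quick way to see those extra mappings alongside the CRUSH map (a sketch;
the PG id here is hypothetical):

  ceph osd dump | grep pg_temp    # temporary mappings while backfill is pending
  ceph osd dump | grep pg_upmap   # explicit overrides installed by the balancer
  ceph pg map 8.1f                # the authoritative up/acting set for one PG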

Josh

On Mon, Jan 27, 2025 at 6:00 AM Andre Tann  wrote:
>
> Hi list,
>
> I have a problem understanding on how crush works when the crush map
> changes.
>
> Let's take a pool with some data in it, and a crush map that enables a
> client to calculate itself where a particular chunk is stored.
>
> Now we add more OSDs, which means, the crush map changes. Now most
> objects are misplaced, given the new crush map.
>
> If the clients wants a particular chunk, it takes the modified map, but
> as the chunk is misplaced, it won't find it where the crush algorithm
> points to.
>
> How can the client know which crush map to consider when doing the
> calculation?
> Do the clients keep several versions of the map, and try them one after
> the other?
>
> Thanks for some hints on this.
> --
> Andre Tann
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Seeking Participation! Take the new Ceph User Stories Survey!

2025-01-27 Thread Marc
> https://docs.google.com/forms/d/e/1FAIpQLSe66NedXh4gHLgk9G45eqP5V2wHlz4I
> KqRmGUJ074peaTGNKQ/viewform?usp=sf_link
> 

FYI I see parts of the form in Polish, and there is no language switching...
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Seeking Participation! Take the new Ceph User Stories Survey!

2025-01-27 Thread Laura Flores
Hi Marc,

Can you clarify what you mean here? Is there a problem with the survey's
language setting? Not seeing anything wrong on my end, but if there is, I'd
appreciate it if someone can confirm.

Thanks,
Laura

On Mon, Jan 27, 2025 at 5:02 PM Marc  wrote:

> > https://docs.google.com/forms/d/e/1FAIpQLSe66NedXh4gHLgk9G45eqP5V2wHlz4I
> > KqRmGUJ074peaTGNKQ/viewform?usp=sf_link
> >
>
> FYI I have stuff in polish, and no language switching...
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>

-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Seeking Participation! Take the new Ceph User Stories Survey!

2025-01-27 Thread Marc


> 
> Can you clarify what you mean here? Is there a problem with the survey's
> language setting? Not seeing anything wrong on my end, but if there is,
> I'd appreciate it if someone can confirm.
> 

I guess it is some bug in this form, no idea. Not sure if attachments are
stripped here in the mailing list.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Seeking Participation! Take the new Ceph User Stories Survey!

2025-01-27 Thread Marc
I am quite sure it is not my browser.

> 
> I would suggest checking your browser's language settings. I have gotten
> confirmation from others that the form is working okay.
> 
>   >
>   > Can you clarify what you mean here? Is there a problem with the
> survey's
>   > language setting? Not seeing anything wrong on my end, but if
> there is,
>   > I'd appreciate it if someone can confirm.
>   >
> 
>   I guess it is some bug in this form no idea. Not sure if
> attachments are stripped here in the mailing list.
> 
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Slow initial boot of OSDs in large cluster with unclean state

2025-01-27 Thread Gregory Orange
On 24/1/25 06:45, Stillwell, Bryan wrote:
> ceph report 2>/dev/null | jq '(.osdmap_last_committed -
> .osdmap_first_committed)'
> 
> This number should be between 500-1000 on a healthy cluster.  I've seen
> this as high as 4.8 million before (roughly 50% of the data stored on
> the cluster ended up being osdmaps!)

Yes, ours is and has been healthy for a while... but we didn't start
monitoring it until a few months ago, so it may relate to those slower
startups.
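
For anyone who wants to watch this per OSD rather than cluster-wide, a small
sketch (run on the host where the OSD lives):

  ceph daemon osd.0 status | jq '{oldest_map, newest_map, kept: (.newest_map - .oldest_map)}'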

> This appears to be a bug that should be fixed in the latest releases of
> Ceph (Quincy 17.2.8 & Reef 18.2.4) based on this report:
> 
> https://tracker.ceph.com/issues/63883

Thanks, good to know! We'll get to 17.2.8 in the next couple of weeks,
then 18.x later this year.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Seeking Participation! Take the new Ceph User Stories Survey!

2025-01-27 Thread Marc


You should put this first:

By completing this survey, I agree to be contacted by the Ceph User Council. *
Yes I agree.


> I would suggest checking your browser's language settings. I have gotten
> confirmation from others that the form is working okay.
> 
> Thanks,
> Laura
> 
> 
> 
> 
>   >
>   > Can you clarify what you mean here? Is there a problem with the
> survey's
>   > language setting? Not seeing anything wrong on my end, but if
> there is,
>   > I'd appreciate it if someone can confirm.
>   >
> 
>   I guess it is some bug in this form no idea. Not sure if
> attachments are stripped here in the mailing list.
> 
> 
> 
> 
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Seeking Participation! Take the new Ceph User Stories Survey!

2025-01-27 Thread Laura Flores
Hey Marc,

I would suggest checking your browser's language settings. I have gotten
confirmation from others that the form is working okay.

Thanks,
Laura

On Mon, Jan 27, 2025 at 5:46 PM Marc  wrote:

>
> >
> > Can you clarify what you mean here? Is there a problem with the survey's
> > language setting? Not seeing anything wrong on my end, but if there is,
> > I'd appreciate it if someone can confirm.
> >
>
> I guess it is some bug in this form no idea. Not sure if attachments are
> stripped here in the mailing list.
>


-- 

Laura Flores

She/Her/Hers

Software Engineer, Ceph Storage 

Chicago, IL

lflo...@ibm.com | lflo...@redhat.com 
M: +17087388804
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Mix NVME's in a single cluster

2025-01-27 Thread Anthony D'Atri


> I reckon that balancing is by far the biggest issue you are
> likely to have because most Ceph releases (I do not know about
> Reef) have difficulty balancing across drives of different
> sizes even with configuration changes.

There were some bugs around Firefly-Hammer with failure domains having very 
different aggregate weights, but with some forethought I’m not aware of recent 
situations where CRUSH fails in this scenario.  The pg autoscaler and balancer 
module may have difficulty with complex CRUSH topology, but I doubt the OP has 
such, and if so the JJ Balancer is reputed to work well.

Now, it is *ideal* to have an OSD monoculture, but Ceph does pretty well with
heterogeneity.

Say one is doing 3x replication and has 3 failure domains; to keep it simple
we’ll say 3 hosts.  If those *host* CRUSH buckets have aggregate weights like
100TB, 100TB, and 200TB, then the usable raw capacity will be 100TB †, because 
CRUSH has to place one copy of data on each host, and once the smaller two 
hosts are full, game over.

Now say the larger and smaller drives and thus OSDs are spread more or less 
evenly across the hosts, so that all three have ~133TB aggregate weights.  All 
raw capacity can be used.

This is one reason why it is advantageous when feasible to have at least one 
more failure domain than demanded by replication policy, so that Ceph can do 
the right thing.  Say we have 4 hosts now, 100TB, 100TB, 100TB, and 150TB, Ceph 
will be able to use most or all of the raw capacity.  With however 100TB, 
100TB, 100TB, and 1000TB, that massive variance in failure domain weight 
probably would prevent all of the heaviest failure domain from being usable.

Note that the OP describes 10 hosts, only half populated with 15TB OSDs today.  
With 10 hosts, the failure domain for CRUSH rules is most likely *host*, so 
adding 8x 30TB OSDs to each results in all failure domains being equal in 
weight.  Ceph will be able to use all of the raw capacity.

The larger OSDs (and thus their drives) will naturally receive approximately 
double the number of PGs compared to the smaller OSDs.  Thus those drives will 
be ~ twice as busy.  With NVMe that probably isn’t an issue, especially if the 
hosts are PCI Gen 4 or later, and adequate RAM is available.  The 30TB SSDs 
almost certainly are Gen 4 or later.
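
This is easy to verify once the new OSDs are in (a quick check, not specific
to this cluster):

  ceph osd df tree    # WEIGHT and PGS per OSD; expect roughly a 2:1 PG ratio here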


> * Assign different CRUSH weights. This configuration change is
>  "supposed" to work.

Short-stroking?  Sure it’ll work, but you’d waste 1.2PB of raw capacity, so 
that isn’t a great solution.

> * Assign the 30TB drives to a different class and use them for
>  new "pools".

Very possible, but probably not necessary, unless say one of the drive models 
is TLC and the other is QLC, in which case one may wish to segregate the 
workloads with pools.  

> 
> * Split each 30TB drive into two OSDs. Not a good idea for HDDs
>  of course but these are low latency SSDs.

You could do that.  If the number of OSDs and hosts were very low this might 
have a certain appeal - I’ve done that myself.  In the OP’s case, I think that 
wouldn’t accomplish much other than using more RAM.
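
For completeness, if someone did want to split each drive, a sketch of the two
usual ways (the device path is hypothetical):

  ceph-volume lvm batch --osds-per-device 2 /dev/nvme0n1

or, with cephadm, an OSD service spec containing "osds_per_device: 2".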


> The main other problem with large capacity OSDs is the size of
> PGs, which can become very large with the default targets
> numbers of PGs, and a previous commenter mentioned that.

There are enough failure domains here that this wouldn’t be a showstopper, 
especially if pg_nums and/or the autoscaler’s target are raised to like 200-400.
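
A sketch of the corresponding knob (the value is illustrative):

  ceph config set global mon_target_pg_per_osd 200   # default is 100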

> I think that the current configuration style where one sets the
> number of PGs rather than the size of PGs leads people astray.

Ceph places PGs based on CRUSH weight, so as long as pg_num for a given pool is 
a power of two, and there are a halfway decent number of OSDs — which in this 
case is true — the above strategies would seem roughly equivalent.

> In general my impression is that current Ceph defaults and its
> very design (a single level of grouping: PGs) were meant to be
> used with OSDs at most 1TB in size and larger OSDs are anyhow
> not a good idea,

I don’t follow; I know of no intrinsic issue with larger OSDs.  Were one to
mix, say, 122TB OSDs and 1TB OSDs, or even 30TB OSDs and 1TB OSDs, the imbalance
could be detrimental to performance and one would need to pay close attention
to the aforementioned overdose guardrails.

> but of course there are many people who know
> better, and good luck to them.
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io


 † modulo base 2 vs 10, backfill/full ratios, etc.
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: link to grafana dashboard with osd / host % usage

2025-01-27 Thread Afreen Misbah
cc @Ankush Behl  @Aashish Sharma 
 ^^^

On Tue, Jan 28, 2025 at 12:57 AM Marc  wrote:

>
> Is there an existing grafana dashboard/panel that sort of shows the % used
> on disks and hosts?
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
>
>

-- 

Afreen
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Grafana certificates storage and Squid

2025-01-27 Thread Thorsten Fuchs
We recently migrated our cluster from 18.2.4 to 19.2.0 and started having
issues with Grafana.

Ceph gives out the warning "CEPHADM_CERT_ERROR: Invalid grafana certificate on
host cc-1: Invalid certificate key: [('PEM routines', '', 'no start line')]".

Looking at the certificates they contain a line '# generated by cephadm' and
are not the certificates we stored in the config, e.g.
'mgr/cephadm/cc-1/grafana_crt'.

After some investigation we found that there was a change to how certificates
are stored. Yet I could find no documentation on how to set up per-host
certificates the new way.

The commit that changed the certificate store is
https://github.com/ceph/ceph/commit/bb7e715320e41f5d6b6291769e2b6d230eec74cc

Maybe anyone can point us in the right direction on how to get our own certs
back into Grafana.

--

Thorsten Fuchs

abaut GmbH
Agnes-Pockels-Bogen 1, 80992 München



___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io


[ceph-users] Re: Grafana certificates storage and Squid

2025-01-27 Thread Laimis Juzeliūnas
Hi all,

Just adding weight - we are experiencing the same scenario with Squid.
We just set empty certs manually to get things working.

Best,
Laimis J.

> On 28 Jan 2025, at 09:00, Thorsten Fuchs  wrote:
> 
> We recently migrated our cluster from 18.2.4 to 19.2.0 and started having
> issues with Grafana.
> 
> Ceph gives out the warning "CEPHADM_CERT_ERROR: Invalid grafana certificate on
> host cc-1: Invalid certificate key: [('PEM routines', '', 'no start line')].
> 
> Looking at the certificates they contain a line '# generated by cephadm' and
> are not the certificates we stored in the config, e.g.
> 'mgr/cephadm/cc-1/grafana_crt'.
> 
> After some investigation we found that there was a change to how certificates
> are stored. Yet I could find no documentation on how to setup per hosts
> certificates with the new way.
> 
> The commit that changed the certificate store is
> https://github.com/ceph/ceph/commit/bb7e715320e41f5d6b6291769e2b6d230eec74cc
> 
> Maybe anyone can point us in the right direction on how to get our own certs
> back into Grafana.
> 
> --
> 
> Thorsten Fuchs
> 
> abaut GmbH
> Agnes-Pockels-Bogen 1, 80992 München
> 
> 
> 
> ___
> ceph-users mailing list -- ceph-users@ceph.io
> To unsubscribe send an email to ceph-users-le...@ceph.io
___
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io