[ceph-users] Re: unmatched rstat rbytes on single dirfrag

2025-01-25 Thread Frank Schilder
to a "[WRN]" event to reduce unwarranted panic amongst admins. My suspicion is that we have here a similar situation, these messages might better be "[WRN]" or even just "[DBG]". By the way, I have these on more than one rank, so it is probably not a fall-out of the rec

[ceph-users] Re: unmatched rstat rbytes on single dirfrag

2025-01-24 Thread Frank Schilder
rward scrub for a while. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eugen Block Sent: Friday, January 24, 2025 11:40 PM To: ceph-users@ceph.io Subject: [ceph-users] Re: unmatched rstat rbytes on single dirfrag

[ceph-users] unmatched rstat rbytes on single dirfrag

2025-01-24 Thread Frank Schilder
, inode has n(v194 rc2038-01-07T22:42:17.00+0100 b46230814357 2802=913+1889), directory fragments have n(v0 rc2032-04-29T16:39:38.00+0200 b46204251164 428=9+419) How critical are these? Everything seems to work normal. Best regards, = Frank Schilder AIT Risø Campus

[ceph-users] Re: Help needed, ceph fs down due to large stray dir

2025-01-23 Thread Frank Schilder
forward-scrub purge behaves the same). Our cluster managed to purge about 10K items/s and after a few hours everything was cleaned out. While purging it was serving client IO, so the FS is up right away. A big thank you to everyone who helped with this case. Best regards, ===== Fra

[ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache

2025-01-23 Thread Frank Schilder
forward-scrub purge behaves the same). Our cluster managed to purge about 10K items/s and after a few hours everything was cleaned out. While purging it was serving client IO, so the FS is up right away. A big thank you to everyone who helped with this case. Best regards, ===== Fra

[ceph-users] Re: Emergency support request for ceph MDS trouble shooting

2025-01-20 Thread Frank Schilder
Hi all, the job is taken. Thanks to anyone considering. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: Monday, January 20, 2025 11:53 AM To: ceph-users@ceph.io Subject: [ceph-users

[ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache

2025-01-20 Thread Frank Schilder
> which is 3758096384. I'm not even sure what the unit is, probably bytes? Sorry, it is bytes. Our items are about 100b on average, that's how we observe approximately 37462448 executions of purge_stale_snap_data until the queue is filled up. Best regards, ===== Frank

[ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache

2025-01-20 Thread Frank Schilder
> which is 3758096384. I'm not even sure what the unit is, probably bytes? As far as I understand the unit is "list items". They can have variable length. On our system about 400G are allocated while filling up the bufferlist. Best regards, ===== Frank Schild

[ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache

2025-01-20 Thread Frank Schilder
erent scalings if required. The class Throttle does have a reset_max method, but I'm not sure if it is called anywhere and if it is possible to call it and change the max at runtime via things like "ceph daemon" or "ceph tell" in some way. Best regard
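
For reference, a generic pattern for inspecting and changing a daemon setting at runtime — a sketch only, since it works solely for values that are backed by a config option the daemon re-reads (which is exactly what is in question for the Journaler throttle); daemon name, option, and value are placeholders:

  # Look for journaler-related options the MDS currently knows about
  ceph daemon mds.<name> config show | grep -i journaler
  # Generic runtime override; <option> and <value> are placeholders
  ceph tell mds.<name> config set <option> <value>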

[ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache

2025-01-20 Thread Frank Schilder
companies offering emergency support and hope this can be fixed with reasonable effort and time. Thanks for your help! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eugen Block Sent: Monday, January 20, 2025 12:40 PM To

[ceph-users] Emergency support request for ceph MDS trouble shooting

2025-01-20 Thread Frank Schilder
.html Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache

2025-01-20 Thread Frank Schilder
": 566691, "wait": { "avgcount": 0, "sum": 0.0, "avgtime": 0.0 } } } You might be on to something, we are also trying to find where this limit comes from. Please keep us posted. Best regards,
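
A small sketch of how to pull just the throttle counters out of a live MDS, assuming jq is available and "mds.<name>" is a placeholder for the actual daemon:

  # Dump all perf counters via the admin socket, keep only the throttle-* sections
  ceph daemon mds.<name> perf dump | jq 'with_entries(select(.key | startswith("throttle")))'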

[ceph-users] Re: Help needed, ceph fs down due to large stray dir

2025-01-19 Thread Frank Schilder
naler::write_buffer (https://github.com/ceph/ceph/blob/pacific/src/osdc/Journaler.h#L306) in the class definition of class Journaler? Increasing this limit should get us past the deadlock. Note that all the relevant code is identical to branch main, which means that all versions since pacific

[ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache

2025-01-19 Thread Frank Schilder
eph/blob/pacific/src/osdc/Journaler.h#L306) in the class definition of class Journaler? Increasing this limit should get us past the deadlock. Thanks for your help and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From:

[ceph-users] Re: Help needed, ceph fs down due to large stray dir

2025-01-18 Thread Frank Schilder
48.132:0/2119497078 conn(0x55e0f3d3e400 0x55e0f3d34000 :6801 s=OPENED pgs=828965 cs=1 l=0).handle_message process tag 14 Best regards and have a good weekend. = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ce

[ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache

2025-01-17 Thread Frank Schilder
mds_recall_max_caps 32768 mds advancedmds_session_blocklist_on_timeoutfalse Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Bailey Allison Sent: Thursday, January 16, 2025

[ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache

2025-01-16 Thread Frank Schilder
regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: MDS hung in purge_stale_snap_data after populating cache

2025-01-16 Thread Frank Schilder
, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Ceph symbols for v15_2_0 in pacific libceph-common

2025-01-15 Thread Frank Schilder
lib64/ceph/libceph-common.so.2 Why do we see v15 symbols here or am I interpreting the symbol name ceph::buffer::v15_2_0::list::iterator_impl::copy incorrectly? Thanks and best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___
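
The inline namespace ceph::buffer::v15_2_0 appears to be pinned for ABI stability rather than tracking the release version, which would explain v15 symbols in a pacific (v16) build. A quick way to confirm what the library exports (binutils nm and c++filt assumed installed):

  # List defined dynamic symbols, demangle, and filter for the buffer namespace
  nm -D --defined-only /usr/lib64/ceph/libceph-common.so.2 | c++filt | grep 'ceph::buffer::v15'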

[ceph-users] Re: MDS crashing on startup

2025-01-15 Thread Frank Schilder
st regards. ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Help needed, ceph fs down due to large stray dir

2025-01-15 Thread Frank Schilder
.io/thread/XLYHTZBD4AXNPFZOLG7NG24Z4DWXIARG/). Please take a look at this post. Thanks and best regards! ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frédéric Nass Sent: Tuesday, January 14, 2025 8:25 AM To: Frank Schilder;

[ceph-users] MDS hung in purge_stale_snap_data after populating cache

2025-01-15 Thread Frank Schilder
ue what is happening and how we can get out of it. Putting my hopes on tomorrow after sleeping a bit (well, at least trying to). Thanks for all your help with this critical situation. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 10

[ceph-users] Re: MDS crashing on startup

2025-01-14 Thread Frank Schilder
Hi Dan, forget what I wrote. I forgot the "-a" option for ulimit. It's still limited to 1024. I'm too tired to start a new test now. I will report back tomorrow afternoon/evening. Thanks for your hint and sorry for the many mails. ===== Frank Schilder AIT Risø Cam
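
For anyone following along, the relevant shell builtins (the 65536 value is just an example, not a recommendation from the thread):

  # Show all limits for the current shell, including the open-files limit
  ulimit -a
  # Raise the open-files soft limit before starting a daemon manually from this shell
  ulimit -n 65536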

[ceph-users] Re: MDS crashing on startup

2025-01-14 Thread Frank Schilder
executable, or `objdump -rdS ` is needed to interpret this. Are there other places where abort is called? Could it be a signal from another process? Thanks for helping! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users

[ceph-users] Re: MDS crashing on startup

2025-01-14 Thread Frank Schilder
, I misunderstood your message. You want to know what the MDS is doing right when it gets hung. It does have CPU load at this time and beyond so it should be visible in the backtraces. Thanks again and hopefully something more interesting tomorrow! = Frank Schil

[ceph-users] MDS crashing on startup

2025-01-14 Thread Frank Schilder
the MDS we followed the instructions for manual start here: https://docs.ceph.com/en/pacific/install/manual-deployment/#adding-mds . Thanks for any pointers! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io
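
Condensed from the linked pacific page, the manual bring-up is roughly the following — a sketch only; <id> and <mon-host> are placeholders and the authoritative steps are in the docs:

  mkdir -p /var/lib/ceph/mds/ceph-<id>
  ceph auth get-or-create mds.<id> mon 'profile mds' mgr 'profile mds' mds 'allow *' osd 'allow *' \
      > /var/lib/ceph/mds/ceph-<id>/keyring
  ceph-mds --cluster ceph -i <id> -m <mon-host>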

[ceph-users] Re: Help needed, ceph fs down due to large stray dir

2025-01-13 Thread Frank Schilder
for package hints and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: Help needed, ceph fs down due to large stray dir

2025-01-11 Thread Frank Schilder
on it should just work after adding swap. I wonder what is so special about our case. Thanks for your input and have a good night! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: Saturday, January 11, 2025

[ceph-users] Re: Help needed, ceph fs down due to large stray dir

2025-01-11 Thread Frank Schilder
Hi Eugen, thanks and yes, let's try one thing at a time. I will report back. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eugen Block Sent: Saturday, January 11, 2025 10:39 PM To: Frank Schilder Cc:

[ceph-users] Re: Help needed, ceph fs down due to large stray dir

2025-01-11 Thread Frank Schilder
both values in the next attempt. Otherwise, I will just increase beacon down grace. Thanks a lot again and have a nice Sunday! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eugen Block Sent: Saturday, January 11, 2025 7:59 PM To:

[ceph-users] Re: Help needed, ceph fs down due to large stray dir

2025-01-11 Thread Frank Schilder
any MDS/ceph-fs related timeout with a 60s default somewhere? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: Saturday, January 11, 2025 12:46 PM To: Dan van der Ster Cc: Bailey Allison; ceph

[ceph-users] Re: Help needed, ceph fs down due to large stray dir

2025-01-11 Thread Frank Schilder
cephx: verify_authorizer could not get service secret for service mds secret_id=51092 Looks like the auth key for the MDS expired and cannot be renewed. Is there a grace period for that as well? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, ru

[ceph-users] Re: Help needed, ceph fs down due to large stray dir

2025-01-11 Thread Frank Schilder
regards! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: Saturday, January 11, 2025 2:36 AM To: Dan van der Ster Cc: Bailey Allison; ceph-users@ceph.io Subject: [ceph-users] Re: Help needed, ceph fs down due to

[ceph-users] Re: Help needed, ceph fs down due to large stray dir

2025-01-10 Thread Frank Schilder
he MDS idle yet unresponsive". Thanks for your help so far! Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Dan van der Ster Sent: Saturday, January 11, 2025 3:04 AM To: Frank Schilder Cc: Bailey Allison; ceph

[ceph-users] Re: Help needed, ceph fs down due to large stray dir

2025-01-10 Thread Frank Schilder
there has to be another way. I would be really grateful for any help regarding getting the system in a stable state for further trouble shooting. I would really block all client access to the fs. In addition, any hints as to how to get the MDS to stay in the s

[ceph-users] Re: Help needed, ceph fs down due to large stray dir

2025-01-10 Thread Frank Schilder
oceed. Thanks so far and best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Bailey Allison Sent: Friday, January 10, 2025 10:23 PM To: Frank Schilder; ceph-users@ceph.io Subject: Re: [ceph-users] Re: Help needed, ceph f

[ceph-users] Re: Help needed, ceph fs down due to large stray dir

2025-01-10 Thread Frank Schilder
up to the point where it actually reports back a number for the stray items. However, some time after it becomes unresponsive and the heartbeat messages start showing up. I don't know how to get past this point. Best regards, ===== Frank Schilder AIT Ri

[ceph-users] Re: Help needed, ceph fs down due to large stray dir

2025-01-10 Thread Frank Schilder
9 ceph-10 ceph-23 ceph-17 MDS version: ceph version 16.2.15 (618f440892089921c3e944a991122ddc44e60516) pacific (stable) Did it mark itself out of the cluster and is waiting for the MON to fail it?? Please help. Best regards, ===== Frank Schilder AIT Risø Campus Bygni

[ceph-users] Re: Help needed, ceph fs down due to large stray dir

2025-01-10 Thread Frank Schilder
2 Skipping beacon heartbeat to monitors (last acked 156.027s ago); MDS internal heartbeat is not healthy! I hope it doesn't get failed by some kind of timeout now. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From:

[ceph-users] Re: Help needed, ceph fs down due to large stray dir

2025-01-10 Thread Frank Schilder
might try with RAM only first (we have 512G machines, just need to stop the OSDs on the server). I will report back what happens. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Patrick Donnelly Sent: Friday,

[ceph-users] Help needed, ceph fs down due to large stray dir

2025-01-10 Thread Frank Schilder
to another rank? Currently, the rank in question reports .mds_cache.num_strays=0 in perf dump. ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to cep
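
The counter referenced here can be read directly from the admin socket; a one-liner sketch (daemon name is a placeholder, jq assumed):

  ceph daemon mds.<name> perf dump | jq '.mds_cache.num_strays'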

[ceph-users] Re: MDSs report oversized cache during forward scrub

2025-01-10 Thread Frank Schilder
A further observation: after restarting rank 6 due to oversized cache, rank 6 is no longer shown in the task list of ceph status below. Is an instruction for scrub not sticky to the rank or is the status output incorrect? Best regards, = Frank Schilder AIT Risø Campus Bygning

[ceph-users] MDSs report oversized cache during forward scrub

2025-01-10 Thread Frank Schilder
ere a workaround apart from restarting MDSes all the time? Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: ceph tell throws WARN: the service id you provided does not exist.

2025-01-10 Thread Frank Schilder
"scrub_tag": "3b5e0bbc-1c8a-4896-bbae-f0b2902f599b", "mode": "asynchronous" } Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Schilder Sent: Friday, January 10, 2

[ceph-users] ceph tell throws WARN: the service id you provided does not exist.

2025-01-10 Thread Frank Schilder
though. We are on the latest pacific version. Is this expected or do we need to configure something? Thanks and best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To uns

[ceph-users] Re: Random ephemeral pinning, what happens to sub-tree under pin root dir

2025-01-09 Thread Frank Schilder
stant random access loads? Kind of any information that would help choosing a reasonable probability value for our home-dir sizes. The practical result of random pinning is kind of unintuitive and it would be great to have some examples with stats. Thanks and best regards, = F

[ceph-users] Random ephemeral pinning, what happens to sub-tree under pin root dir

2024-12-13 Thread Frank Schilder
or not. Thanks and best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: failed to load OSD map for epoch 2898146, got 0 bytes

2024-12-13 Thread Frank Schilder
s not exist. Creating a new epoch. [root@ceph-adm:ceph-13 ~]# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1004/ --op set-osdmap --file osd.map --force Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___
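
A plausible companion step for obtaining the map file injected above — the epoch is taken from the error in the subject, and the OSD must be stopped before running ceph-objectstore-tool against its store:

  # Export the missing epoch from the monitors
  ceph osd getmap 2898146 -o osd.map
  # Inject it into the offline OSD, as shown in the post
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-1004/ --op set-osdmap --file osd.map --force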

[ceph-users] Re: constant increase in osdmap epoch

2024-12-13 Thread Frank Schilder
the new osdmap having a diff. Not sure if this is intentional or a bug. I will add a note about this side effect to the script. A second reason for epoch increase are daily snapshots on our rbd images. That may or may not be intentional. Best regards, = Frank Schilder AIT Risø

[ceph-users] Dashboard redirection changed after upgrade octopus to pacific

2024-12-13 Thread Frank Schilder
octopus do that? Was it reverse DNS lookup and it was just on by default? If not, why is the simple "use result of hostname -f" that does not require DNS not an option? Thanks for answers to either or both of these questions! Best regards, = Frank

[ceph-users] How to list pg-upmap-items

2024-12-12 Thread Frank Schilder
.narkive.com/h7y24SDg/stale-pg-upmap-items-entries-after-pg-increase and https://gitlab.cern.ch/ceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py#L102). However, a good API is always symmetric to make it *easy* for users to check and fix screw-ups. Thanks and best regards, ==
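
Even without a dedicated "ls" counterpart to pg-upmap-items, the entries are visible in the JSON osdmap dump; a minimal sketch (jq assumed):

  ceph osd dump -f json | jq '.pg_upmap_items'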

[ceph-users] admin_socket: exception getting command descriptions: [Errno 2] No such file or directory

2024-12-12 Thread Frank Schilder
ewer versions as well. I wonder if this is a fall-out of https://docs.ceph.com/en/latest/releases/pacific/#id39 Point 3: "$pid expansion in config paths like admin_socket will now properly expand to the daemon pid for commands like ceph-mds or ceph-osd. Previously only ceph-fuse/rbd-nbd expanded $pid
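
For reference, a ceph.conf fragment illustrating the $pid metavariable the quoted release note is about — an illustrative example, not a config taken from the thread:

  [client]
      admin_socket = /var/run/ceph/$cluster-$name.$pid.asok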

[ceph-users] Re: failed to load OSD map for epoch 2898146, got 0 bytes

2024-12-01 Thread Frank Schilder
Hi Dan, I need to bring the affected OSDs up this week. Would be great if you could take a look at this case or let me know if you don't have time. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Frank Sch

[ceph-users] Re: [CephFS] Completely exclude some MDS rank from directory processing

2024-11-21 Thread Frank Schilder
standby daemons configured. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eugen Block Sent: Thursday, November 21, 2024 3:55 PM To: Александр Руденко Cc: ceph-users@ceph.io Subject: [ceph-users] Re: [CephFS

[ceph-users] Re: [CephFS] Completely exclude some MDS rank from directory processing

2024-11-21 Thread Frank Schilder
will stop waiting for some blocking request to the unhealthy MDS. There seems to be no such thing as IO on other healthy MDSes continues as usual. Specifically rank 0 is critical. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: Crush rule examples

2024-11-20 Thread Frank Schilder
. Hand-crafted crush rules for this purpose require 3 or more DCs. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Janne Johansson Sent: Wednesday, November 20, 2024 11:30 AM To: Andre Tann Cc: ceph-users@ceph.io

[ceph-users] Re: constant increase in osdmap epoch

2024-11-18 Thread Frank Schilder
200 epoch changes per day, how much changes do you see? I guess you could play with the pruning configs (many of the defaults don't necessarily fit a production load), but I would first try to find out what exactly is causing them. Regards, Eugen Zitat von Frank Schilder : > Hi all, >

[ceph-users] constant increase in osdmap epoch

2024-11-18 Thread Frank Schilder
s intentional. I don't see this redundant pgp_num_actual setting by the mgrs reported here: https://tracker.ceph.com/issues/51433 . I can't find a resolution anywhere. Any help would be very much appreciated. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: Ceph Octopus packages missing at download.ceph.com

2024-11-14 Thread Frank Schilder
These mirrors will sync very soon and delete the tree as well. This needs to be fixed on the ceph repo side. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Ben Zieglmeier Sent: Thursday, November 14, 2024 1:50

[ceph-users] Re: Ceph Octopus packages missing at download.ceph.com

2024-11-14 Thread Frank Schilder
Hi all, +1 from me this is a really bad issue. We need access to these packages very soon. Please restore this folder. In the meantime, is there a mirror somewhere? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14

[ceph-users] Re: failed to load OSD map for epoch 2898146, got 0 bytes

2024-11-08 Thread Frank Schilder
dm_container or just restart the adm-container) or can I ignore it? I guess in the future we need to execute an "osd deactivate" or "umount" to clean this up after OSD creation. Thanks for your help! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 __

[ceph-users] Re: Procedure for temporary evacuation and replacement

2024-10-28 Thread Frank Schilder
of data. Instead of just reading shard by shard from the out-OSDs, shards should also be reconstructed by recovery from all other OSDs. Our evacuation lasted for about 2 weeks. If recovery would kick in, this time would go down to 2-3 days. Best regards, ===== Frank Schilder AIT Risø Ca

[ceph-users] Re: pgs not deep-scrubbed in time and pgs not scrubbed in time

2024-10-25 Thread Frank Schilder
Hi, you might want to take a look here: https://github.com/frans42/ceph-goodies/blob/main/doc/TuningScrub.md Don't set max_scrubs > 1 on HDD OSDs, you will almost certainly regret it like I did. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109,
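
The corresponding knob, for anyone checking their own cluster (the default has historically been 1; newer releases may differ):

  # Inspect the current value
  ceph config get osd osd_max_scrubs
  # Revert to 1 if it was raised
  ceph config set osd osd_max_scrubs 1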

[ceph-users] Re: failed to load OSD map for epoch 2898146, got 0 bytes

2024-10-22 Thread Frank Schilder
xury of all data being healthy and we have a chance to experiment without any risk. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Dan van der Ster Sent: Tuesday, October 22, 2024 12:04 AM To: Frank Schilder Cc: ceph

[ceph-users] Re: failed to load OSD map for epoch 2898146, got 0 bytes

2024-10-21 Thread Frank Schilder
If absolutely necessary, I could start the OSD manually with logging to disk disabled. Thanks and best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Dan van der Ster Sent: Monday, October 21, 2024 9:03 PM To: Fr

[ceph-users] failed to load OSD map for epoch 2898146, got 0 bytes

2024-10-21 Thread Frank Schilder
7fad651ca700 4 rocksdb: EVENT_LOG_v1 {"time_micros": 1729525300521403, "cf_name": "default", "job": 12, "event": "table_file_creation", "file_number": 82880, "file_size": 67868330, "table_properties": {"data_size

[ceph-users] Re: Procedure for temporary evacuation and replacement

2024-10-18 Thread Frank Schilder
other operations as well? Essentially: can I leave it at its current value or should I reset it to default? Thanks and best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Joshua Baergen Sent: Thursday, October 17, 2024 3:

[ceph-users] Re: Procedure for temporary evacuation and replacement

2024-10-17 Thread Frank Schilder
we have this up - down - up in fast succession. I don't want to play with heartbeat graces etc as the cluster should still respond normally to actual fails. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: W

[ceph-users] Re: Procedure for temporary evacuation and replacement

2024-10-17 Thread Frank Schilder
regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ From: Frank Schilder Sent: Friday, October 11, 2024 12:18 PM To: Robert Sander; ceph-users@ceph.io Subject: [ceph-users] Re: Procedure for temporary evacuation and replacement Hi Rob

[ceph-users] Re: Reef osd_memory_target and swapping

2024-10-16 Thread Frank Schilder
titions on disk for emergency cases. We have swap off by default to avoid the memory "leak" issue and we actually have sufficient RAM to begin with - maybe that's a bit of a luxury. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 _

[ceph-users] Re: Procedure for temporary evacuation and replacement

2024-10-11 Thread Frank Schilder
Hi Robert, thanks, that solves it then. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Robert Sander Sent: Friday, October 11, 2024 10:20 AM To: ceph-users@ceph.io Subject: [ceph-users] Re: Procedure for

[ceph-users] Re: Procedure for temporary evacuation and replacement

2024-10-11 Thread Frank Schilder
was a config option that can prevent that (or a different osd out command that uses an ID). If you remember something like that, please let me know. If I find it, I will post it here. Thanks for your review! = Frank Schilder AIT Risø Campus Bygning 109, ru

[ceph-users] Re: Procedure for temporary evacuation and replacement

2024-10-10 Thread Frank Schilder
the first one won't. Does the OSD being formally UP+OUT make any difference compared with UP+IN for evacuation? My initial simplistic test says no, but I would like to be a bit more sure than that. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 _

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-10 Thread Frank Schilder
> But not, I suspect, nearly as many tentacles. No, that's the really annoying part. It just works. ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Anthony D'Atri Sent: Thursday, October 10, 2024 2:13 PM To: Fran

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-10 Thread Frank Schilder
0 per OSD fixed except that it was suitable for 1TB drives. - You also can't answer what will happen if one goes for 100-200 PGs per TB, meaning 1600-3200 PGs per 16TB drive. So my main question, the last one, is still looking for an answer. Thanks for your comment and best regards, ===

[ceph-users] Re: Erasure coding scheme 2+4 = good idea?

2024-10-10 Thread Frank Schilder
be up with a DC down. For example, it will make sure that with min_size=2 an ACK is only sent to a client if each DC has a shard. An ordinary crush rule will not do that. Stretch mode only works for replicated pools. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, ru

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-10 Thread Frank Schilder
of the cluster. That's how you get scale-out capability. A fixed PG count counteracts that with the insane increase of capacity per disk we have lately. That's why I actually lean towards that the recommendation was intended to keep PGs below 5-10G each (and or Sent: Thursday, October

[ceph-users] Procedure for temporary evacuation and replacement

2024-10-10 Thread Frank Schilder
after some time. I'm also wondering if UP+OUT OSDs participate in peering in case there is an OSD restart somewhere in the pool. Thanks for your input and best regards! ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-u
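
A minimal sketch of the evacuate-in-place idea discussed here — marking OSDs out without stopping them, so they remain UP and can serve their data while it drains; the IDs are placeholders:

  # Mark the OSDs out; they stay UP and act as sources during the drain
  ceph osd out 11 12 13
  # Watch the data move off them
  ceph osd df tree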

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-10 Thread Frank Schilder
any more. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Janne Johansson Sent: Thursday, October 10, 2024 8:51 AM To: Frank Schilder Cc: Anthony D'Atri; ceph-users@ceph.io Subject: Re: [ceph-users] Re: What is the probl

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-09 Thread Frank Schilder
That I would vaguely understand: to keep the average PG size constant at a max of about 10G. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ From: Anthony D'Atri Sent: Wednesday, October 9, 2024 3:52 PM To: Frank Sc

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-09 Thread Frank Schilder
hould expect. None of the discussions I have seen so far address this extreme weirdness of the recommendation. If there is an unsolved scaling problem, please anyone state what it is, why its there and what the critical threshold is. What part of the code will explode? Thanks and best regards, ===

[ceph-users] Re: Forced upgrade OSD from Luminous to Pacific

2024-10-09 Thread Frank Schilder
ng and OSD logs. Maybe they are corrupted? Do they manage to read the rocksdb and get to the state where they try to join the cluster? Do they crash? You can start an OSD daemon manually to see he complete startup log live in a terminal. Best regards, ===== Frank Schilder AIT Risø Cam

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-09 Thread Frank Schilder
that a dev drops by and can comment on that with background from the implementation. I just won't be satisfied with speculation this time around and will keep bugging. Thanks and best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 _

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-09 Thread Frank Schilder
nd best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Anthony D'Atri Sent: Wednesday, October 9, 2024 2:40 AM To: Frank Schilder Cc: ceph-users@ceph.io Subject: Re: [ceph-users] What is the problem with many PGs pe

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-09 Thread Frank Schilder
ou have performance metrics before/after? Did you actually observe any performance degradation? Was there an increased memory consumption? Anything that justifies making a statement alluding to (potential) negative performance impact? Thanks and best regards, = Frank Schilder AI

[ceph-users] What is the problem with many PGs per OSD

2024-10-08 Thread Frank Schilder
SD to large values>500? Thanks a lot for any clarifications in this matter! = Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: [Ceph incident] PG stuck in peering.

2024-09-26 Thread Frank Schilder
ry for the confusion and hopefully our experience reports here help other users. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to cep

[ceph-users] Re: [Ceph incident] PG stuck in peering.

2024-09-23 Thread Frank Schilder
g to avoid data loss on other PGs)." I hope you mean "waited for recovery" or what does a wipe here mean. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: HARROUIN Loan (PRESTATAIRE CA-GIP) Sent:

[ceph-users] Re: Ceph octopus version cluster not starting

2024-09-17 Thread Frank Schilder
r 2-5 minutes. NTP shouldn't take much time to come up under normal circumstances. I'm not a systemd wizard. If you do something like this, please post it here as a reply for others to find it. Best regards, ===== Frank Schilder AIT Risø Campus Bygni

[ceph-users] Re: Ceph octopus version cluster not starting

2024-09-16 Thread Frank Schilder
case if this happens again. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Amudhan P Sent: Monday, September 16, 2024 6:19 PM To: Frank Schilder Cc: Eugen Block; ceph-users@ceph.io Subject: Re: [ceph-users] Re: Ceph

[ceph-users] Re: Ceph octopus version cluster not starting

2024-09-16 Thread Frank Schilder
l log files to be written. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Amudhan P Sent: Monday, September 16, 2024 12:18 PM To: Frank Schilder Cc: Eugen Block; ceph-users@ceph.io Subject: Re: [ceph-users] Re: C

[ceph-users] Re: Ceph octopus version cluster not starting

2024-09-16 Thread Frank Schilder
lpful. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Amudhan P Sent: Monday, September 16, 2024 10:36 AM To: Eugen Block Cc: ceph-users@ceph.io Subject: [ceph-users] Re: Ceph octopus version cluster not starting No, I don'

[ceph-users] Re: Successfully using dm-cache

2024-09-12 Thread Frank Schilder
gards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Michael Lipp Sent: Wednesday, January 31, 2024 6:23 PM To: ceph-users@ceph.io Subject: [ceph-users] Successfully using dm-cache Just in case anybody is interested: Using dm-

[ceph-users] Re: Identify laggy PGs

2024-08-15 Thread Frank Schilder
ing this to 300PGs/OSD due to excessively long deep-scrub times per PG. Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Szabo, Istvan (Agoda) Sent: Wednesday, August 14, 2024 12:00 PM To: Eugen Block; ceph-us

[ceph-users] Re: Bluestore issue using 18.2.2

2024-08-14 Thread Frank Schilder
have damage). Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eugen Block Sent: Wednesday, August 14, 2024 9:05 AM To: ceph-users@ceph.io Subject: [ceph-users] Re: Bluestore issue using 18.2.2 Hi, it looks like y

[ceph-users] Re: 0 slow ops message stuck for down+out OSD

2024-07-29 Thread Frank Schilder
> Hi, would a mgr restart fix that? It did! The one thing we didn't try last time. We thought the message was stuck in the MONs. Thanks! ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Eugen Block Sent: Monday, July

[ceph-users] Re: snaptrim not making progress

2024-07-29 Thread Frank Schilder
bout it as well. However, this was at 9:30am but the snaptrim was hanging since 3am. Is there any event with an OSD/disk that can cause snaptrim to stall yet there is no health issue detected/reported? Thanks for any pointers! ===== Frank Schilder AIT Risø Campus Bygning 10

[ceph-users] Re: snaptrim not making progress

2024-07-29 Thread Frank Schilder
48.11:0/2422413806 client.370420944 cookie=140578306156832 Best regards, ===== Frank Schilder AIT Risø Campus Bygning 109, rum S14 ____ From: Frank Schilder Sent: Monday, July 29, 2024 10:24 AM To: ceph-users@ceph.io Subject: [ceph-users] snaptrim not maki

[ceph-users] Re: 0 slow ops message stuck for down+out OSD

2024-07-29 Thread Frank Schilder
Very funny, it was actually me who made this case some time ago: https://www.mail-archive.com/ceph-users@ceph.io/msg10095.html I will look into what we did last time. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14
