Hi,

I'm facing a strange problem: one of my multiple MDS daemons is stuck in the up:resolve state. The following log lines keep repeating:

```
...
2025-08-29 20:01:39.582 7f8656fe3700  1 mds.ceph-prod-60 Updating MDS map to version 1655087 from mon.3
2025-08-29 20:01:39.582 7f8656fe3700 10 mds.ceph-prod-60 my compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,7=mds uses inline data,8=no anchor table,9=file layout v2,10=snaprealm v2}
2025-08-29 20:01:39.582 7f8656fe3700 10 mds.ceph-prod-60 mdsmap compat compat={},rocompat={},incompat={1=base v0.20,2=client writeable ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout v2,10=snaprealm v2}
2025-08-29 20:01:39.582 7f8656fe3700 10 mds.ceph-prod-60 my gid is 91681643
2025-08-29 20:01:39.582 7f8656fe3700 10 mds.ceph-prod-60 map says I am mds.12.1646436 state up:resolve
2025-08-29 20:01:39.582 7f8656fe3700 10 mds.ceph-prod-60 msgr says I am [v2:7.33.104.23:6988/1418417955,v1:7.33.104.23:6989/1418417955]
2025-08-29 20:01:39.582 7f8656fe3700 10 mds.ceph-prod-60 handle_mds_map: handling map as rank 12
2025-08-29 20:01:39.606 7f86527da700 10 mds.12.cache cache not ready for trimming
2025-08-29 20:01:40.606 7f86527da700 10 mds.12.cache cache not ready for trimming
2025-08-29 20:01:40.698 7f86547de700  5 mds.beacon.ceph-prod-60 Sending beacon up:resolve seq 6940
2025-08-29 20:01:40.698 7f86597e8700  5 mds.beacon.ceph-prod-60 received beacon reply up:resolve seq 6940 rtt 0
2025-08-29 20:01:41.606 7f86527da700 10 mds.12.cache cache not ready for trimming
...
```

The MDS status (`ceph fs status`) looks like this:

```
=========
+------+---------+---------------------------+--------------+-------+-------+
| Rank | State   | MDS                       | Activity     | dns   | inos  |
+------+---------+---------------------------+--------------+-------+-------+
|  0   | active  | ceph-prod-45              | Reqs:  0 /s  | 113k  | 112k  |
|  1   | active  | ceph-prod-46              | Reqs:  0 /s  | 114k  | 113k  |
|  2   | active  | ceph-prod-47              | Reqs: 50 /s  | 3967k | 3924k |
|  3   | active  | ceph-prod-10              | Reqs: 14 /s  | 2402k | 2391k |
|  4   | active  | ceph-prod-02              | Reqs:  0 /s  | 31.6k | 27.0k |
|  5   | active  | ceph-prod-48              | Reqs:  0 /s  | 357k  | 356k  |
|  6   | active  | ceph-prod-11              | Reqs:  0 /s  | 1144k | 1144k |
|  7   | active  | ceph-prod-57              | Reqs:  0 /s  | 168k  | 168k  |
|  8   | active  | ceph-prod-44              | Reqs: 30 /s  | 5007k | 5007k |
|  9   | active  | ceph-prod-20              | Reqs:  0 /s  | 195k  | 195k  |
|  10  | active  | ceph-prod-43              | Reqs:  0 /s  | 1757k | 1750k |
|  11  | active  | ceph-prod-01              | Reqs:  0 /s  | 2879k | 2849k |
|  12  | resolve | ceph-prod-60              |              | 652   | 655   |
|  13  | active  | fuxi-aliyun-ceph-res-tmp3 | Reqs:  0 /s  | 79.9k | 59.6k |
+------+---------+---------------------------+--------------+-------+-------+
+-----------------+----------+-------+-------+
| Pool            | type     | used  | avail |
+-----------------+----------+-------+-------+
| cephfs_metadata | metadata | 2110G | 2738G |
| cephfs_data     | data     | 1457T | 205T  |
+-----------------+----------+-------+-------+
```

Can anyone help? Thanks.
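If more detail is needed, I can raise the debug level on the stuck daemon and dump its in-flight state, roughly like this (standard Ceph CLI; ceph-prod-60 / rank 12 is the stuck daemon from the output above, and the `ceph daemon` commands are run on the node hosting it). Happy to share the output:

```
# Raise MDS and messenger debug logging on the stuck daemon only
ceph config set mds.ceph-prod-60 debug_mds 20
ceph config set mds.ceph-prod-60 debug_ms 1

# Daemon-side view of its current state, via the admin socket on that host
ceph daemon mds.ceph-prod-60 status
ceph daemon mds.ceph-prod-60 dump_ops_in_flight

# Cluster-wide view
ceph fs status
ceph health detail

# Drop the debug overrides again afterwards
ceph config rm mds.ceph-prod-60 debug_mds
ceph config rm mds.ceph-prod-60 debug_ms
```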