The journalctl logs of a failing OSD start as in the link: https://pastes.io/osd11-restart-log
Oct 24 18:50:46 ank-backup01 systemd[1]: Started [email protected] - Ceph osd.11 for 4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f.
Oct 24 18:50:46 ank-backup01 ceph-osd[586118]: set uid:gid to 167:167 (ceph:ceph)
Oct 24 18:50:46 ank-backup01 ceph-osd[586118]: ceph version 19.2.3 (c92aebb279828e9c3c1f5d24613efca272649e62) squid (stable), process ceph-osd, pid 2
Oct 24 18:50:46 ank-backup01 ceph-osd[586118]: pidfile_write: ignore empty --pid-file

I can see that it runs Ceph version 19.2.3. After the activation process (which takes some time), it does not actually crash. AFAIR from the logs, it mounts bluefs, starts RocksDB version 7.9.2 and loads the tables. It even compacts the RocksDB. Afterwards, it triggers the boot process as follows.

Oct 24 18:53:41 ank-backup01 ceph-osd[586118]: osd.11 0 journal looks like hdd
Oct 24 18:53:41 ank-backup01 ceph-4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f-osd-11[586110]: 2025-10-24T15:53:41.705+0000 7e2ccf90a740 2 osd.11 0 journal looks like hdd
Oct 24 18:53:41 ank-backup01 ceph-osd[586118]: osd.11 0 boot
Oct 24 18:53:41 ank-backup01 ceph-osd[586118]: osd.11 0 configured osd_max_object_name[space]_len looks ok
Oct 24 18:53:41 ank-backup01 ceph-4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f-osd-11[586110]: 2025-10-24T15:53:41.705+0000 7e2ccf90a740 2 osd.11 0 boot
Oct 24 18:53:41 ank-backup01 ceph-4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f-osd-11[586110]: 2025-10-24T15:53:41.705+0000 7e2ccf90a740 20 osd.11 0 configured osd_max_object_name[space]_len looks ok

I noted that at this point it loads osdmap 70297:

Oct 24 18:53:41 ank-backup01 ceph-4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f-osd-11[586110]: 2025-10-24T15:53:41.706+0000 7e2ccf90a740 20 osd.11 0 get_map 70297 - loading and decoding 0x565a07bb0000

There are lots of read_log_and_missing entries like this one:

Oct 24 18:53:48 ank-backup01 ceph-4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f-osd-11[586110]: 2025-10-24T15:53:48.603+0000 7e2ccf90a740 20 read_log_and_missing 68276'8386071 (0'0) modify 23:bf0d9243:::48f964cb-12c5-4689-8aa4-0b8e573f41d5.37815204.2_001%2f184%2f288%2f052.zip:head by client.43525885.0:1306616953 2025-10-09T11:06:44.324360+0000 0 ObjectCleanRegions clean_offsets: [(48911, 18446744073709502704)], clean_omap: true, new_object: false

After all these read_log_and_missing entries, it starts and loads the PGs:

Oct 24 18:54:05 ank-backup01 ceph-osd[586118]: osd.11 70297 load_pgs opened 66 pgs
Oct 24 18:54:05 ank-backup01 ceph-osd[586118]: osd.11 70297 superblock: I am osd.11
Oct 24 18:54:05 ank-backup01 ceph-osd[586118]: osd.11 70297 heartbeat osd_stat(store_statfs(0x19ff4006000/0x0/0x74702400000, data 0x4b4a481b898/0x4b4a4e68000, compress 0x0/0x0/0x0, omap 0x1a50, meta 0xf26958e5b0), peers [] op hist [])
Oct 24 18:54:05 ank-backup01 ceph-osd[586118]: osd.11 70297 done with init, starting boot process
Oct 24 18:54:05 ank-backup01 ceph-osd[586118]: osd.11 70297 start_boot

It periodically dumps stats showing that the OSD has been running (up) for xxx secs:
Oct 24 19:03:41 ank-backup01 ceph-4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f-osd-11[586110]: Uptime(secs): 600.6 total, 600.0 interval
Oct 24 19:03:41 ank-backup01 ceph-4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f-osd-11[586110]: Flush(GB): cumulative 0.000, interval 0.000
Oct 24 19:03:41 ank-backup01 ceph-4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f-osd-11[586110]: AddFile(GB): cumulative 0.000, interval 0.000
Oct 24 19:03:41 ank-backup01 ceph-4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f-osd-11[586110]: AddFile(Total Files): cumulative 0, interval 0
Oct 24 19:03:41 ank-backup01 ceph-4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f-osd-11[586110]: AddFile(L0 Files): cumulative 0, interval 0
Oct 24 19:03:41 ank-backup01 ceph-4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f-osd-11[586110]: AddFile(Keys): cumulative 0, interval 0
Oct 24 19:03:41 ank-backup01 ceph-4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f-osd-11[586110]: Cumulative compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Oct 24 19:03:41 ank-backup01 ceph-4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f-osd-11[586110]: Interval compaction: 0.00 GB write, 0.00 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.0 seconds
Oct 24 19:03:41 ank-backup01 ceph-4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f-osd-11[586110]: Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_b>
Oct 24 19:03:41 ank-backup01 ceph-4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f-osd-11[586110]: Block cache BinnedLRUCache@0x5659dca05350#2 capacity: 1.06 GB usage: 100.56 MB table_size: 0 occupancy: 18446744073709551615 collections: 2 last_copies: 8>
Oct 24 19:03:41 ank-backup01 ceph-4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f-osd-11[586110]: Block cache entry stats(count,size,portion): DataBlock(24571,97.85 MB,8.99328%) FilterBlock(14,1.17 MB,0.107375%) IndexBlock(20,1.54 MB,0.141681%) Misc(1,>
Oct 24 19:03:41 ank-backup01 ceph-4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f-osd-11[586110]:

The problem is that it cannot register itself with the mon as UP and running. There are lots of tick and tick_without_osd_lock messages in the log file.

Oct 24 18:57:14 ank-backup01 ceph-4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f-osd-11[586110]: 2025-10-24T15:57:14.370+0000 7e2cc8eb9640 10 osd.11 70297 tick
Oct 24 18:57:14 ank-backup01 ceph-4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f-osd-11[586110]: 2025-10-24T15:57:14.370+0000 7e2cc8eb9640 20 osd.11 70297 tick last_purged_snaps_scrub 2025-10-24T08:38:08.454484+0000 next 2025-10-25T18:30:10.329484+0000
Oct 24 18:57:14 ank-backup01 ceph-osd[586118]: osd.11 70297 tick_without_osd_lock
Oct 24 18:57:14 ank-backup01 ceph-4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f-osd-11[586110]: 2025-10-24T15:57:14.729+0000 7e2cc86b8640 10 osd.11 70297 tick_without_osd_lock
Oct 24 18:57:15 ank-backup01 ceph-osd[586118]: osd.11 70297 reports for 0 queries
Oct 24 18:57:15 ank-backup01 ceph-osd[586118]: osd.11 70297 collect_pg_stats

It looks like it stays in a loop, sending ticks with the old epoch (70297).
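Since the daemon keeps ticking with epoch 70297, my next step is to compare the epoch the OSD itself reports with the epoch the rest of the cluster is on. A minimal sketch of how I intend to check this (assuming the admin socket is reachable through cephadm shell; osd.11 and the fsid below are from this cluster):

# Enter the daemon's context on ank-backup01 and query its admin socket;
# oldest_map/newest_map should show how far this OSD has caught up (apparently 70297 here).
cephadm shell --fsid 4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f --name osd.11
ceph daemon osd.11 status

# From any node with the admin keyring: the current osdmap epoch of the cluster.
ceph osd dump | head -n 1
ceph osd stat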
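If the OSD is indeed sitting on a stale map, the real question is why the monitors never mark it up after start_boot. What I plan to try next is just the standard debug knobs, nothing specific to this failure, so treat it as a sketch: raise messenger/monclient debug on the stuck OSD and mon debug on the monitors, then watch whether the osd_boot message arrives and what the mon does with it. Since this started right after the upgrade, I will also re-check the advertised daemon versions and require_osd_release (<mon-host> below is a placeholder):

# Bump debug on the stuck OSD (revert with "ceph config rm" afterwards)
ceph config set osd.11 debug_ms 1
ceph config set osd.11 debug_monc 20

# Follow the monitor log on its host and look for the boot message from osd.11
ceph config set mon debug_mon 20
journalctl -u ceph-4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f@mon.<mon-host>.service -f | grep -i 'osd.11\|osd_boot'

# Post-upgrade sanity checks: do all daemons report 19.2.3, and what does the osdmap require?
ceph versions
ceph osd dump | grep require_osd_release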
Thanks for your help.

BR,
Huseyin Cotuk
[email protected]


> On 24 Oct 2025, at 18:36, Kirby Haze <[email protected]> wrote:
> 
> if you look at the osd’s logs and follow it while triggering a restart through
> systemctl
> 
> systemctl list-units | grep osd
> 
> journalctl -u <osd unit>
> 
> When the osd boots up it spits out the version, I would be curious to see how
> it crashes and the stack trace
> 
> On Fri, Oct 24, 2025 at 8:31 AM Huseyin Cotuk <[email protected]> wrote:
>> Hi Kirby,
>> 
>> Is there any way to check whether these OSDs have completed the upgrade
>> successfully or not?
>> 
>> Any help will be appreciated.
>> 
>> BR,
>> Huseyin
>> [email protected]
>> 
>>> On 24 Oct 2025, at 18:14, Kirby Haze <[email protected]> wrote:
>>> 
>>> Osds usually try to load the last clean map on boot, but if these osds
>>> finished the upgrade it must have hit an assert I think
>>> 
>>> On Fri, Oct 24, 2025 at 8:06 AM Huseyin Cotuk <[email protected]> wrote:
>>>> Hi Can,
>>>> 
>>>> No, I did not add or recreate these OSDs. I tried cephadm to run the failing
>>>> OSDs with the help of the following commands, without success:
>>>> 
>>>> cephadm --image quay.io/ceph/ceph:v19 unit --name osd.$i --fsid 4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f restart
>>>> 
>>>> cephadm --image aade1b12b8e6 run --name osd.$i --fsid 4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f
>>>> 
>>>> BR,
>>>> Huseyin
>>>> [email protected]
>>>> 
>>>> > On 24 Oct 2025, at 17:55, Can Özyurt <[email protected]> wrote:
>>>> > 
>>>> > Hi Huseyin,
>>>> > 
>>>> > Did you add or recreate these OSDs after the upgrade?
>>>> > 
>>>> > On Fri, 24 Oct 2025 at 16:42, Huseyin Cotuk <[email protected]> wrote:
>>>> >> More debug logs from a failing OSD:
>>>> >> 
>>>> >> Oct 24 17:39:52 ank-backup01 ceph-4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f-osd-11[4100566]: 2025-10-24T14:39:52.271+0000 77f47e35c640 20 osd.11 70297 reports for 0 queries
>>>> >> Oct 24 17:39:52 ank-backup01 ceph-4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f-osd-11[4100566]: 2025-10-24T14:39:52.271+0000 77f47e35c640 15 osd.11 70297 collect_pg_stats
>>>> >> Oct 24 17:39:52 ank-backup01 ceph-osd[4100574]: osd.11 pg_epoch: 70297 pg[23.b9s0( v 69083'8392506 (68272'8387884,69083'8392506] local-lis/les=69076/69077 n=8314181 ec=21807/21807 lis/c=69076/68833 les/c/f=69077/68834/39143 sis=70280) [11,NONE,NONE,12,31,21,7,NONE,NONE,26,0]p11(0) r=0 lpr=70297 pi=[68833,70280)/5 crt=69083'8392506 lcod 0'0 mlcod 0'0 unknown mbc={}] PeeringState::prepare_stats_for_publish reporting purged_snaps []
>>>> >> Oct 24 17:39:52 ank-backup01 ceph-4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f-osd-11[4100566]: 2025-10-24T14:39:52.271+0000 77f47e35c640 20 osd.11 pg_epoch: 70297 pg[23.b9s0( v 69083'8392506 (68272'8387884,69083'8392506] local-lis/les=69076/69077 n=8314181 ec=21807/21807 lis/c=69076/68833 les/c/f=69077/68834/39143 sis=70280) [11,NONE,NONE,12,31,21,7,NONE,NONE,26,0]p11(0) r=0 lpr=70297 pi=[68833,70280)/5 crt=69083'8392506 lcod 0'0 mlcod 0'0 unknown mbc={}] PeeringState::prepare_stats_for_publish reporting purged_snaps []
>>>> >> Oct 24 17:39:52 ank-backup01 ceph-osd[4100574]: osd.11 pg_epoch: 70297 pg[23.b9s0( v 69083'8392506 (68272'8387884,69083'8392506] local-lis/les=69076/69077 n=8314181 ec=21807/21807 lis/c=69076/68833 les/c/f=69077/68834/39143 sis=70280) [11,NONE,NONE,12,31,21,7,NONE,NONE,26,0]p11(0) r=0 lpr=70297 pi=[68833,70280)/5 crt=69083'8392506 lcod 0'0 mlcod 0'0 unknown mbc={}] PeeringState::prepare_stats_for_publish publish_stats_to_osd 70297:52872808
>>>> >> Oct 24 17:39:52 ank-backup01 ceph-4e7e7d1c-22db-49c7-9f24-5a75cd3a3b9f-osd-11[4100566]: 2025-10-24T14:39:52.271+0000 77f47e35c640 15 osd.11 pg_epoch: 70297 pg[23.b9s0( v 69083'8392506 (68272'8387884,69083'8392506] local-lis/les=69076/69077 n=8314181 ec=21807/21807 lis/c=69076/68833 les/c/f=69077/68834/39143 sis=70280) [11,NONE,NONE,12,31,21,7,NONE,NONE,26,0]p11(0) r=0 lpr=70297 pi=[68833,70280)/5 crt=69083'8392506 lcod 0'0 mlcod 0'0 unknown mbc={}] PeeringState::prepare_stats_for_publish publish_stats_to_osd 70297:52872808
>>>> >> Oct 24 17:39:52 ank-backup01 ceph-osd[4100574]: osd.11 pg_epoch: 70297 pg[23.5fs2( v 69081'8395094 (68276'8390628,69081'8395094] local-lis/les=69078/69079 n=8316950 ec=21807/21807 lis/c=69078/68833 les/c/f=69079/68834/39143 sis=70280) [NONE,NONE,11,8,14,16,0,20,17,18,NONE]p11(2) r=2 lpr=70297 pi=[68833,70280)/7 crt=69081'8395094 lcod 0'0 mlcod 0'0 unknown mbc={}] PeeringState::prepare_stats_for_publish reporting purged_snaps []
>>>> >> 
>>>> >> 
>>>> >> Regards,
>>>> >> Huseyin Cotuk
>>>> >> [email protected]
_______________________________________________
ceph-users mailing list -- [email protected]
To unsubscribe send an email to [email protected]
