Hello Everyone,

@Frank Schilder: It's normal that OSD.500 is indicated as blocking the peering. 
While a PG is in the peering state, it's not possible to query its information 
from the mon: we need to restart one of the OSDs hosting the PG to unfreeze the 
query command. And... OSD.500 was the one we restarted to get the output of the cmd.

We have investigated a lot since last week and resolved the issue a few days ago.
Here is a big summary of the issue!

Documentation that we used:
A tracker issue similar to ours: https://tracker.ceph.com/issues/50637
A similar bug encountered by CERN, reported on the ceph-users ML: 
https://www.mail-archive.com/ceph-users@lists.ceph.com/msg35914.html
CERN's report on the bug they encountered: 
https://indico.cern.ch/event/617118/contributions/2490930/attachments/1422793/2181063/ceph_hep_stuck_pg.pdf
Useful commands: 
https://github.com/TheJJ/ceph-cheatsheet/blob/master/README.md#data-corruption
Low-level manipulation with ceph-objectstore-tool: 
https://croit.io/blog/recover-inactive-pgs

Quick overview of our platform:
We have a Ceph cluster using erasure coding, maintained by 3 mons and 3 mgrs.
The Ceph version is 16.2.10 Pacific (stable).
Our Ceph cluster is used by OpenStack Cinder/Glance as RBD devices. 
Here is the config of the erasure profile:

########################################################################################################
        crush-device-class=ssd
        crush-failure-domain=host
        crush-root=dal-vrack-2
        jerasure-per-chunk-alignment=false
        k=4
        m=2
        plugin=jerasure
        technique=reed_sol_van
        w=8
########################################################################################################
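As a refresher on what this profile implies (our own summary, not part of the profile output): with k=4 data chunks and m=2 coding chunks, every object is cut into 6 shards spread across 6 hosts, any 4 shards are enough to rebuild an object, and at most 2 shards can be lost. A minimal sketch of the arithmetic, using a hypothetical 4 MiB RBD object:

```python
# Erasure-coding arithmetic for the jerasure profile above (k=4, m=2).
k, m = 4, 2                      # data chunks / coding chunks
size = k + m                     # shards per object -> matches pool size 6
min_shards = k                   # any k shards suffice to reconstruct
tolerated_losses = m             # shards that may be lost without data loss
overhead = size / k              # raw-space overhead vs. logical data -> 1.5x

object_bytes = 4 * 1024 * 1024   # hypothetical 4 MiB object
chunk_bytes = object_bytes // k  # bytes each shard stores for that object

print(size, min_shards, tolerated_losses, overhead, chunk_bytes)
```

This is also why the pool below uses size 6 and min_size 4.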

The issue happened on the pool cinder-dal-2-data, which spans 12 hosts (each 
host contains 18 OSDs of 7 TB).
Here is the config of the pool cinder-dal-2-data (the pg_num/pgp_num size is 
wrongly configured but... that's a story for another day):

########################################################################################################
        size: 6
        min_size: 4
        pg_num: 512
        pgp_num: 512
        crush_rule: cinder-dal-2-data
        hashpspool: true
        allow_ec_overwrites: true
        nodelete: false
        nopgchange: false
        nosizechange: false
        write_fadvise_dontneed: false
        noscrub: false
        nodeep-scrub: false
        use_gmt_hitset: 1
        erasure_code_profile: dal-vrack-2
        fast_read: 0
        pg_autoscale_mode: warn
        bulk: false
########################################################################################################

Here is the config of the replicated pool cinder-dal-2 that stores the metadata 
of the EC pool cinder-dal-2-data:

########################################################################################################
        size: 3
        min_size: 1
        pg_num: 128
        pgp_num: 128
        crush_rule: repl_Rack_02_HDD_Dal
        hashpspool: true
        nodelete: false
        nopgchange: false
        nosizechange: false
        write_fadvise_dontneed: false
        noscrub: false
        nodeep-scrub: false
        use_gmt_hitset: 1
        fast_read: 0
        pg_autoscale_mode: warn
        bulk: false
########################################################################################################

What happened and what we did to solve the issue:

15/09:
Actions:
Restarted the OSDs linked to PG 13.6a, which was stuck in the peering state.
Restarted all Ceph monitors.

OSD.11 was indicated as the primary OSD of PG 13.6a. Because of that, we 
decided to wipe it:
        mark OSD.11 as lost (ceph osd lost 11)
        connect to the hypervisor hosting OSD.11
                ceph osd destroy 11
                ceph-volume lvm create /dev/xxx --osd-ids 11

Result:
=> cluster rebuilding, but no change in cluster state (PG still stuck peering)
=> pg query hangs (except when secondary OSDs restart)
New primary: from 11 to 148


16/09 morning:
Actions:
Tried to force a repair: ceph pg repair 13.6a
        No effect on the peering itself; it only changed the acting set (only 
the primary).
Found a case reported by CERN that looked like our problem. We decided to apply 
the fix that they performed 
(https://www.mail-archive.com/ceph-users@lists.ceph.com/msg35914.html)
        The ceph pg query showed that peering was "blocked by" OSD.56.
        We wiped the OSD and put it back into the cluster:
                connect to the hypervisor hosting OSD.56
                ceph osd destroy 56
                ceph-volume lvm create /dev/xxx --osd-ids 56

The command 'ceph pg 13.6a query' doesn't respond while the PG is in the peering 
state. To unfreeze it, we had to restart one of the OSDs hosting the PG.
After wiping OSD.56, we restarted it to retrieve the query output; that's why 
the result initially says that peering is blocked by this OSD.
Here is the result of the query:

########################################################################################################
ceph pg 13.6a query
{
    "snap_trimq": "[]",
    "snap_trimq_len": 0,
    "state": "down",
    "epoch": 793941,
    "up": [
        280,
        253,
        254,
        84,
        500,
        2147483647
    ],
    "acting": [
        280,
        253,
        254,
        84,
        500,
        2147483647
    ],
    "info": {
        "pgid": "13.6as0",
        "last_update": "0'0",
        "last_complete": "0'0",
        "log_tail": "0'0",
        "last_user_version": 0,
        "last_backfill": "MAX",
        "purged_snaps": [],
        "history": {
            "epoch_created": 34867,
            "epoch_pool_created": 4002,
            "last_epoch_started": 792996,
            "last_interval_started": 792995,
            "last_epoch_clean": 791031,
            "last_interval_clean": 791030,
            "last_epoch_split": 90048,
            "last_epoch_marked_full": 0,
            "same_up_since": 793941,
            "same_interval_since": 793941,
            "same_primary_since": 793908,
            "last_scrub": "791841'1303832763",
            "last_scrub_stamp": "2024-09-14T15:47:43.149821+0200",
            "last_deep_scrub": "781724'1296647995",
            "last_deep_scrub_stamp": "2024-09-09T03:41:56.778209+0200",
            "last_clean_scrub_stamp": "2024-09-14T15:47:43.149821+0200",
            "prior_readable_until_ub": 0
        },
        "stats": {
            "version": "0'0",
            "reported_seq": 114,
            "reported_epoch": 793941,
            "state": "down",
            "last_fresh": "2024-09-16T01:48:41.620475+0200",
            "last_change": "2024-09-16T01:48:41.620475+0200",
            "last_active": "0.000000",
            "last_peered": "0.000000",
            "last_clean": "0.000000",
            "last_became_active": "0.000000",
            "last_became_peered": "0.000000",
            "last_unstale": "2024-09-16T01:48:41.620475+0200",
            "last_undegraded": "2024-09-16T01:48:41.620475+0200",
            "last_fullsized": "2024-09-16T01:48:41.620475+0200",
            "mapping_epoch": 793941,
            "log_start": "0'0",
            "ondisk_log_start": "0'0",
            "created": 34867,
            "last_epoch_clean": 791031,
            "parent": "0.0",
            "parent_split_bits": 0,
            "last_scrub": "791841'1303832763",
            "last_scrub_stamp": "2024-09-14T15:47:43.149821+0200",
            "last_deep_scrub": "781724'1296647995",
            "last_deep_scrub_stamp": "2024-09-09T03:41:56.778209+0200",
            "last_clean_scrub_stamp": "2024-09-14T15:47:43.149821+0200",
            "log_size": 0,
            "ondisk_log_size": 0,
            "stats_invalid": false,
            "dirty_stats_invalid": false,
            "omap_stats_invalid": false,
            "hitset_stats_invalid": false,
            "hitset_bytes_stats_invalid": false,
            "pin_stats_invalid": false,
            "manifest_stats_invalid": false,
            "snaptrimq_len": 0,
            "stat_sum": {
                "num_bytes": 0,
                "num_objects": 0,
                "num_object_clones": 0,
                "num_object_copies": 0,
                "num_objects_missing_on_primary": 0,
                "num_objects_missing": 0,
                "num_objects_degraded": 0,
                "num_objects_misplaced": 0,
                "num_objects_unfound": 0,
                "num_objects_dirty": 0,
                "num_whiteouts": 0,
                "num_read": 0,
                "num_read_kb": 0,
                "num_write": 0,
                "num_write_kb": 0,
                "num_scrub_errors": 0,
                "num_shallow_scrub_errors": 0,
                "num_deep_scrub_errors": 0,
                "num_objects_recovered": 0,
                "num_bytes_recovered": 0,
                "num_keys_recovered": 0,
                "num_objects_omap": 0,
                "num_objects_hit_set_archive": 0,
                "num_bytes_hit_set_archive": 0,
                "num_flush": 0,
                "num_flush_kb": 0,
                "num_evict": 0,
                "num_evict_kb": 0,
                "num_promote": 0,
                "num_flush_mode_high": 0,
                "num_flush_mode_low": 0,
                "num_evict_mode_some": 0,
                "num_evict_mode_full": 0,
                "num_objects_pinned": 0,
                "num_legacy_snapsets": 0,
                "num_large_omap_objects": 0,
                "num_objects_manifest": 0,
                "num_omap_bytes": 0,
                "num_omap_keys": 0,
                "num_objects_repaired": 0
            },
            "up": [
                280,
                253,
                254,
                84,
                500,
                2147483647
            ],
            "acting": [
                280,
                253,
                254,
                84,
                500,
                2147483647
            ],
            "avail_no_missing": [],
            "object_location_counts": [],
            "blocked_by": [
                11,
                56
            ],
            "up_primary": 280,
            "acting_primary": 280,
            "purged_snaps": []
        },
        "empty": 1,
        "dne": 0,
        "incomplete": 0,
        "last_epoch_started": 0,
        "hit_set_history": {
            "current_last_update": "0'0",
            "history": []
        }
    },
    "peer_info": [],
    "recovery_state": [
        {
            "name": "Started/Primary/Peering/Down",
            "enter_time": "2024-09-16T01:48:41.620452+0200",
            "comment": "not enough up instances of this PG to go active"
        },
        {
            "name": "Started/Primary/Peering",
            "enter_time": "2024-09-16T01:48:41.620318+0200",
            "past_intervals": [
                {
                    "first": "791030",
                    "last": "793940",
                    "all_participants": [
                        {
                            "osd": 11,
                            "shard": 0
                        },
                        {
                            "osd": 56,
                            "shard": 5
                        },
                        {
                            "osd": 84,
                            "shard": 3
                        },
                        {
                            "osd": 253,
                            "shard": 1
                        },
                        {
                            "osd": 254,
                            "shard": 2
                        },
                        {
                            "osd": 280,
                            "shard": 0
                        },
                        {
                            "osd": 500,
                            "shard": 4
                        }
                    ],
                    "intervals": [
                        {
                            "first": "792995",
                            "last": "792997",
                            "acting": "11(0),56(5),253(1),254(2),500(4)"
                        },
                        {
                            "first": "793671",
                            "last": "793673",
                            "acting": "56(5),84(3),254(2),500(4)"
                        },
                        {
                            "first": "793752",
                            "last": "793754",
                            "acting": "11(0),84(3),253(1),254(2),500(4)"
                        },
                        {
                            "first": "793855",
                            "last": "793858",
                            "acting": "56(5),84(3),253(1),254(2),280(0)"
                        },
                        {
                            "first": "793870",
                            "last": "793874",
                            "acting": "56(5),253(1),254(2),280(0),500(4)"
                        },
                        {
                            "first": "793884",
                            "last": "793887",
                            "acting": "56(5),84(3),253(1),280(0),500(4)"
                        },
                        {
                            "first": "793906",
                            "last": "793907",
                            "acting": "56(5),84(3),253(1),254(2),500(4)"
                        },
                        {
                            "first": "793908",
                            "last": "793940",
                            "acting": "56(5),84(3),253(1),254(2),280(0),500(4)"
                        }
                    ]
                }
            ],
            "probing_osds": [
                "84(3)",
                "253(1)",
                "254(2)",
                "280(0)",
                "500(4)"
            ],
            "blocked": "peering is blocked due to down osds",
            "down_osds_we_would_probe": [
                11,
                56
            ],
            "peering_blocked_by": [
                {
                    "osd": 56,
                    "current_lost_at": 0,
                    "comment": "starting or marking this osd lost may let us 
proceed"
                }
            ]
        },
        {
            "name": "Started",
            "enter_time": "2024-09-16T01:48:41.620253+0200"
        }
    ],
    "agent_state": {}
}
########################################################################################################
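For readers digging through dumps like the one above: the fields that matter for a stuck peering are state, blocked_by, down_osds_we_would_probe and peering_blocked_by, and they can be pulled out in a few lines instead of reading the whole JSON. A small sketch (it assumes the dump was saved with 'ceph pg 13.6a query > query.json'; the inline sample below just mirrors the structure):

```python
import json

# Inline sample mirroring the structure of 'ceph pg <pgid> query' output;
# with a real dump, use: data = json.load(open("query.json"))
data = json.loads("""
{
  "state": "down",
  "info": {"stats": {"blocked_by": [11, 56]}},
  "recovery_state": [
    {"name": "Started/Primary/Peering",
     "down_osds_we_would_probe": [11, 56],
     "peering_blocked_by": [
       {"osd": 56,
        "comment": "starting or marking this osd lost may let us proceed"}]}
  ]
}
""")

print("state:", data["state"])
print("blocked_by:", data["info"]["stats"]["blocked_by"])
for state in data["recovery_state"]:
    for blocker in state.get("peering_blocked_by", []):
        print(f"peering blocked by osd.{blocker['osd']}: {blocker['comment']}")
```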

Result:
Because we wiped OSD.11 and OSD.56, we have already lost 2 shards (shard 0 was 
on 11 and shard 5 was on 56). With k=4/m=2 that is the maximum the pool can 
tolerate: the 4 remaining shards are still enough to reconstruct the data. (The 
2147483647 entries in the up/acting sets above are CRUSH's placeholder for "no 
OSD mapped to this shard".)
The PG is still stuck in the peering state.


16/09 afternoon:
OSD.280 is now tagged as primary for PG 13.6a.
But because the PG is stuck in the peering state, there is no data on it.
We decided to put this OSD in debug mode to see whether relevant logs would appear.
Unfortunately, we didn't see any relevant information in the logs.

Result:
The cluster continues to block client I/O because the PG remains in the peering 
state. Shard 0 and shard 5 have been wiped; no data lost at this point.

17/09 morning:
After some research on the web, we discovered the ceph-objectstore-tool CLI, 
which allows retrieving information directly from an OSD 
(https://docs.ceph.com/en/pacific/man/8/ceph-objectstore-tool/).
We decided to retrieve the info of PG 13.6a on each OSD that hosts or has 
hosted the PG: OSD.280 (shard 0), OSD.253 (shard 1), OSD.254 (shard 2), 
OSD.84 (shard 3), OSD.500 (shard 4), OSD.56 (shard 5).
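Since EC PG ids must carry the shard suffix (13.6as0 ... 13.6as5), we ran one ceph-objectstore-tool invocation per shard. A sketch of that mapping (the data paths assume the default /var/lib/ceph/osd/ceph-&lt;id&gt; layout, and the tool requires the target OSD daemon to be stopped):

```python
# Build one ceph-objectstore-tool command per EC shard of PG 13.6a.
# Data paths assume the default /var/lib/ceph/osd/ceph-<id> layout; the
# OSD daemon must be stopped before running the tool against its store.
pg = "13.6a"
shard_to_osd = {0: 280, 1: 253, 2: 254, 3: 84, 4: 500, 5: 56}

commands = [
    f"ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-{osd} "
    f"--op info --pgid {pg}s{shard} --no-mon-config --type bluestore"
    for shard, osd in sorted(shard_to_osd.items())
]
for cmd in commands:
    print(cmd)
```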
Here is the output from each OSD:

########################################################################################################
ceph@cos1-dal-ceph-02:/root$ ceph-objectstore-tool --data-path 
/var/lib/ceph/osd/ceph-280 --op info --pgid 13.6as0 --no-mon-config --type 
bluestore
PG '13.6as0' not found
########################################################################################################

########################################################################################################
ceph@cos1-dal-ceph-08:/root$ ceph-objectstore-tool --data-path 
/var/lib/ceph/osd/ceph-253 --op info --pgid 13.6as1 --no-mon-config --type 
bluestore
{
    "pgid": "13.6as1",
    "last_update": "792997'1304859031",
    "last_complete": "792997'1304859031",
    "log_tail": "792986'1304849009",
    "last_user_version": 1304859031,
    "last_backfill": "MAX",
    "purged_snaps": [],
    "history": {
        "epoch_created": 34867,
        "epoch_pool_created": 4002,
        "last_epoch_started": 792996,
        "last_interval_started": 792995,
        "last_epoch_clean": 791031,
        "last_interval_clean": 791030,
        "last_epoch_split": 90048,
        "last_epoch_marked_full": 0,
        "same_up_since": 795885,
        "same_interval_since": 795885,
        "same_primary_since": 795772,
        "last_scrub": "791841'1303832763",
        "last_scrub_stamp": "2024-09-14T15:47:43.149821+0200",
        "last_deep_scrub": "781724'1296647995",
        "last_deep_scrub_stamp": "2024-09-09T03:41:56.778209+0200",
        "last_clean_scrub_stamp": "2024-09-14T15:47:43.149821+0200",
        "prior_readable_until_ub": 0
    },
    "stats": {
        "version": "792997'1304859031",
        "reported_seq": 1487907026,
        "reported_epoch": 795771,
        "state": "peering",
        "last_fresh": "2024-09-17T12:05:44.343820+0200",
        "last_change": "2024-09-17T12:04:15.968213+0200",
        "last_active": "2024-09-15T23:55:04.038838+0200",
        "last_peered": "2024-09-15T08:13:35.443051+0200",
        "last_clean": "2024-09-15T08:11:21.052407+0200",
        "last_became_active": "2024-09-15T08:11:25.577367+0200",
        "last_became_peered": "2024-09-15T08:11:25.577367+0200",
        "last_unstale": "2024-09-17T12:05:44.343820+0200",
        "last_undegraded": "2024-09-17T12:05:44.343820+0200",
        "last_fullsized": "2024-09-17T12:05:44.343820+0200",
        "mapping_epoch": 795885,
        "log_start": "792986'1304849009",
        "ondisk_log_start": "792986'1304849009",
        "created": 34867,
        "last_epoch_clean": 791031,
        "parent": "0.0",
        "parent_split_bits": 7,
        "last_scrub": "791841'1303832763",
        "last_scrub_stamp": "2024-09-14T15:47:43.149821+0200",
        "last_deep_scrub": "781724'1296647995",
        "last_deep_scrub_stamp": "2024-09-09T03:41:56.778209+0200",
        "last_clean_scrub_stamp": "2024-09-14T15:47:43.149821+0200",
        "log_size": 10022,
        "ondisk_log_size": 10022,
        "stats_invalid": false,
        "dirty_stats_invalid": false,
        "omap_stats_invalid": false,
        "hitset_stats_invalid": false,
        "hitset_bytes_stats_invalid": false,
        "pin_stats_invalid": false,
        "manifest_stats_invalid": false,
        "snaptrimq_len": 0,
        "stat_sum": {
            "num_bytes": 1150405518336,
            "num_objects": 275089,
            "num_object_clones": 2,
            "num_object_copies": 1650534,
            "num_objects_missing_on_primary": 0,
            "num_objects_missing": 0,
            "num_objects_degraded": 0,
            "num_objects_misplaced": 0,
            "num_objects_unfound": 0,
            "num_objects_dirty": 275089,
            "num_whiteouts": 0,
            "num_read": 152074641,
            "num_read_kb": 19905452430,
            "num_write": 773240119,
            "num_write_kb": 37150859934,
            "num_scrub_errors": 0,
            "num_shallow_scrub_errors": 0,
            "num_deep_scrub_errors": 0,
            "num_objects_recovered": 1132336,
            "num_bytes_recovered": 4705340429824,
            "num_keys_recovered": 0,
            "num_objects_omap": 0,
            "num_objects_hit_set_archive": 0,
            "num_bytes_hit_set_archive": 0,
            "num_flush": 0,
            "num_flush_kb": 0,
            "num_evict": 0,
            "num_evict_kb": 0,
            "num_promote": 0,
            "num_flush_mode_high": 0,
            "num_flush_mode_low": 0,
            "num_evict_mode_some": 0,
            "num_evict_mode_full": 0,
            "num_objects_pinned": 0,
            "num_legacy_snapsets": 0,
            "num_large_omap_objects": 0,
            "num_objects_manifest": 0,
            "num_omap_bytes": 0,
            "num_omap_keys": 0,
            "num_objects_repaired": 0
        },
        "up": [
            20,
            253,
            254,
            84,
            500,
            56
        ],
        "acting": [
            20,
            253,
            254,
            84,
            500,
            56
        ],
        "avail_no_missing": [],
        "object_location_counts": [],
        "blocked_by": [
            84
        ],
        "up_primary": 20,
        "acting_primary": 20,
        "purged_snaps": []
    },
    "empty": 0,
    "dne": 0,
    "incomplete": 0,
    "last_epoch_started": 792996,
    "hit_set_history": {
        "current_last_update": "0'0",
        "history": []
    }
}
########################################################################################################

########################################################################################################
ceph@cos1-dal-ceph-05:/root$ ceph-objectstore-tool --data-path 
/var/lib/ceph/osd/ceph-254 --op info --pgid 13.6as2 --no-mon-config --type 
bluestore
{
    "pgid": "13.6as2",
    "last_update": "792997'1304859031",
    "last_complete": "792997'1304859031",
    "log_tail": "792986'1304849009",
    "last_user_version": 1304859031,
    "last_backfill": "MAX",
    "purged_snaps": [],
    "history": {
        "epoch_created": 34867,
        "epoch_pool_created": 4002,
        "last_epoch_started": 792996,
        "last_interval_started": 792995,
        "last_epoch_clean": 791031,
        "last_interval_clean": 791030,
        "last_epoch_split": 90048,
        "last_epoch_marked_full": 0,
        "same_up_since": 795877,
        "same_interval_since": 795877,
        "same_primary_since": 795772,
        "last_scrub": "791841'1303832763",
        "last_scrub_stamp": "2024-09-14T15:47:43.149821+0200",
        "last_deep_scrub": "781724'1296647995",
        "last_deep_scrub_stamp": "2024-09-09T03:41:56.778209+0200",
        "last_clean_scrub_stamp": "2024-09-14T15:47:43.149821+0200",
        "prior_readable_until_ub": 0
    },
    "stats": {
        "version": "792997'1304859031",
        "reported_seq": 1487906882,
        "reported_epoch": 793673,
        "state": "down",
        "last_fresh": "2024-09-16T00:01:26.449646+0200",
        "last_change": "2024-09-15T23:59:36.025152+0200",
        "last_active": "2024-09-15T23:59:36.024851+0200",
        "last_peered": "2024-09-15T08:13:35.443051+0200",
        "last_clean": "2024-09-15T08:11:21.052407+0200",
        "last_became_active": "2024-09-15T08:11:25.577367+0200",
        "last_became_peered": "2024-09-15T08:11:25.577367+0200",
        "last_unstale": "2024-09-16T00:01:26.449646+0200",
        "last_undegraded": "2024-09-16T00:01:26.449646+0200",
        "last_fullsized": "2024-09-16T00:01:26.449646+0200",
        "mapping_epoch": 795877,
        "log_start": "792986'1304849009",
        "ondisk_log_start": "792986'1304849009",
        "created": 34867,
        "last_epoch_clean": 791031,
        "parent": "0.0",
        "parent_split_bits": 7,
        "last_scrub": "791841'1303832763",
        "last_scrub_stamp": "2024-09-14T15:47:43.149821+0200",
        "last_deep_scrub": "781724'1296647995",
        "last_deep_scrub_stamp": "2024-09-09T03:41:56.778209+0200",
        "last_clean_scrub_stamp": "2024-09-14T15:47:43.149821+0200",
        "log_size": 10022,
        "ondisk_log_size": 10022,
        "stats_invalid": false,
        "dirty_stats_invalid": false,
        "omap_stats_invalid": false,
        "hitset_stats_invalid": false,
        "hitset_bytes_stats_invalid": false,
        "pin_stats_invalid": false,
        "manifest_stats_invalid": false,
        "snaptrimq_len": 0,
        "stat_sum": {
            "num_bytes": 1150405518336,
            "num_objects": 275089,
            "num_object_clones": 2,
            "num_object_copies": 1650534,
            "num_objects_missing_on_primary": 0,
            "num_objects_missing": 0,
            "num_objects_degraded": 0,
            "num_objects_misplaced": 0,
            "num_objects_unfound": 0,
            "num_objects_dirty": 275089,
            "num_whiteouts": 0,
            "num_read": 152074641,
            "num_read_kb": 19905452430,
            "num_write": 773240119,
            "num_write_kb": 37150859934,
            "num_scrub_errors": 0,
            "num_shallow_scrub_errors": 0,
            "num_deep_scrub_errors": 0,
            "num_objects_recovered": 1132336,
            "num_bytes_recovered": 4705340429824,
            "num_keys_recovered": 0,
            "num_objects_omap": 0,
            "num_objects_hit_set_archive": 0,
            "num_bytes_hit_set_archive": 0,
            "num_flush": 0,
            "num_flush_kb": 0,
            "num_evict": 0,
            "num_evict_kb": 0,
            "num_promote": 0,
            "num_flush_mode_high": 0,
            "num_flush_mode_low": 0,
            "num_evict_mode_some": 0,
            "num_evict_mode_full": 0,
            "num_objects_pinned": 0,
            "num_legacy_snapsets": 0,
            "num_large_omap_objects": 0,
            "num_objects_manifest": 0,
            "num_omap_bytes": 0,
            "num_omap_keys": 0,
            "num_objects_repaired": 0
        },
        "up": [
            20,
            253,
            254,
            84,
            500,
            56
        ],
        "acting": [
            20,
            253,
            254,
            84,
            500,
            56
        ],
        "avail_no_missing": [],
        "object_location_counts": [],
        "blocked_by": [
            11,
            253
        ],
        "up_primary": 20,
        "acting_primary": 20,
        "purged_snaps": []
    },
    "empty": 0,
    "dne": 0,
    "incomplete": 0,
    "last_epoch_started": 792996,
    "hit_set_history": {
        "current_last_update": "0'0",
        "history": []
    }
}
########################################################################################################

########################################################################################################
ceph@cos1-dal-ceph-17:/root$ ceph-objectstore-tool --data-path 
/var/lib/ceph/osd/ceph-84 --op info --pgid 13.6as3 --no-mon-config --type 
bluestore
{
    "pgid": "13.aas3",
    "last_update": "792994'1304857017",
    "last_complete": "792994'1304857017",
    "log_tail": "792984'1252261577",
    "last_user_version": 1304857017,
    "last_backfill": "MAX",
    "purged_snaps": [],
    "history": {
        "epoch_created": 90048,
        "epoch_pool_created": 4002,
        "last_epoch_started": 792996,
        "last_interval_started": 792995,
        "last_epoch_clean": 792646,
        "last_interval_clean": 792645,
        "last_epoch_split": 90048,
        "last_epoch_marked_full": 0,
        "same_up_since": 795869,
        "same_interval_since": 795869,
        "same_primary_since": 795772,
        "last_scrub": "791841'1303832763",
        "last_scrub_stamp": "2024-09-14T15:47:43.149821+0200",
        "last_deep_scrub": "791651'1250745157",
        "last_deep_scrub_stamp": "2024-09-14T11:44:10.435568+0200",
        "last_clean_scrub_stamp": "2024-09-14T15:47:43.149821+0200",
        "prior_readable_until_ub": 0
    },
    "stats": {
        "version": "792994'1304857016",
        "reported_seq": 1487904260,
        "reported_epoch": 792994,
        "state": "active+clean",
        "last_fresh": "2024-09-15T08:11:21.052407+0200",
        "last_change": "2024-09-15T01:57:34.656131+0200",
        "last_active": "2024-09-15T08:11:21.052407+0200",
        "last_peered": "2024-09-15T08:11:21.052407+0200",
        "last_clean": "2024-09-15T08:11:21.052407+0200",
        "last_became_active": "2024-09-15T01:57:34.655123+0200",
        "last_became_peered": "2024-09-15T01:57:34.655123+0200",
        "last_unstale": "2024-09-15T08:11:21.052407+0200",
        "last_undegraded": "2024-09-15T08:11:21.052407+0200",
        "last_fullsized": "2024-09-15T08:11:21.052407+0200",
        "mapping_epoch": 795869,
        "log_start": "792984'1252261577",
        "ondisk_log_start": "792984'1252261577",
        "created": 90048,
        "last_epoch_clean": 792646,
        "parent": "0.0",
        "parent_split_bits": 9,
        "last_scrub": "791651'1250745157",
        "last_scrub_stamp": "2024-09-14T11:44:10.435568+0200",
        "last_deep_scrub": "791651'1250745157",
        "last_deep_scrub_stamp": "2024-09-14T11:44:10.435568+0200",
        "last_clean_scrub_stamp": "2024-09-14T11:44:10.435568+0200",
        "log_size": 10040,
        "ondisk_log_size": 10040,
        "stats_invalid": false,
        "dirty_stats_invalid": false,
        "omap_stats_invalid": false,
        "hitset_stats_invalid": false,
        "hitset_bytes_stats_invalid": false,
        "pin_stats_invalid": false,
        "manifest_stats_invalid": false,
        "snaptrimq_len": 0,
        "stat_sum": {
            "num_bytes": 1150405514240,
            "num_objects": 275089,
            "num_object_clones": 0,
            "num_object_copies": 1650534,
            "num_objects_missing_on_primary": 0,
            "num_objects_missing": 0,
            "num_objects_degraded": 0,
            "num_objects_misplaced": 0,
            "num_objects_unfound": 0,
            "num_objects_dirty": 275089,
            "num_whiteouts": 0,
            "num_read": 152074050,
            "num_read_kb": 19905362034,
            "num_write": 773238105,
            "num_write_kb": 37150690143,
            "num_scrub_errors": 0,
            "num_shallow_scrub_errors": 0,
            "num_deep_scrub_errors": 0,
            "num_objects_recovered": 1969547,
            "num_bytes_recovered": 8208403499520,
            "num_keys_recovered": 0,
            "num_objects_omap": 0,
            "num_objects_hit_set_archive": 0,
            "num_bytes_hit_set_archive": 0,
            "num_flush": 0,
            "num_flush_kb": 0,
            "num_evict": 0,
            "num_evict_kb": 0,
            "num_promote": 0,
            "num_flush_mode_high": 0,
            "num_flush_mode_low": 0,
            "num_evict_mode_some": 0,
            "num_evict_mode_full": 0,
            "num_objects_pinned": 0,
            "num_legacy_snapsets": 0,
            "num_large_omap_objects": 0,
            "num_objects_manifest": 0,
            "num_omap_bytes": 0,
            "num_omap_keys": 0,
            "num_objects_repaired": 0
        },
        "up": [
            20,
            253,
            254,
            84,
            500,
            56
        ],
        "acting": [
            20,
            253,
            254,
            84,
            500,
            56
        ],
        "avail_no_missing": [],
        "object_location_counts": [],
        "blocked_by": [],
        "up_primary": 20,
        "acting_primary": 20,
        "purged_snaps": []
    },
    "empty": 0,
    "dne": 0,
    "incomplete": 0,
    "last_epoch_started": 792646,
    "hit_set_history": {
        "current_last_update": "0'0",
        "history": []
    }
}
########################################################################################################

########################################################################################################
ceph@cos1-dal-ceph-32:/root$ ceph-objectstore-tool --data-path 
/var/lib/ceph/osd/ceph-500 --op info --pgid 13.6as4 --no-mon-config --type 
bluestore
{
    "pgid": "13.6as4",
    "last_update": "792997'1304859031",
    "last_complete": "792997'1304859031",
    "log_tail": "792986'1304849009",
    "last_user_version": 1304859031,
    "last_backfill": "MAX",
    "purged_snaps": [],
    "history": {
        "epoch_created": 34867,
        "epoch_pool_created": 4002,
        "last_epoch_started": 792996,
        "last_interval_started": 792995,
        "last_epoch_clean": 791031,
        "last_interval_clean": 791030,
        "last_epoch_split": 90048,
        "last_epoch_marked_full": 0,
        "same_up_since": 795860,
        "same_interval_since": 795860,
        "same_primary_since": 795772,
        "last_scrub": "791841'1303832763",
        "last_scrub_stamp": "2024-09-14T15:47:43.149821+0200",
        "last_deep_scrub": "781724'1296647995",
        "last_deep_scrub_stamp": "2024-09-09T03:41:56.778209+0200",
        "last_clean_scrub_stamp": "2024-09-14T15:47:43.149821+0200",
        "prior_readable_until_ub": 0
    },
    "stats": {
        "version": "792997'1304859031",
        "reported_seq": 1487906877,
        "reported_epoch": 792997,
        "state": "active+undersized+degraded",
        "last_fresh": "2024-09-15T08:13:35.443051+0200",
        "last_change": "2024-09-15T08:11:25.577367+0200",
        "last_active": "2024-09-15T08:13:35.443051+0200",
        "last_peered": "2024-09-15T08:13:35.443051+0200",
        "last_clean": "2024-09-15T08:11:21.052407+0200",
        "last_became_active": "2024-09-15T08:11:25.577367+0200",
        "last_became_peered": "2024-09-15T08:11:25.577367+0200",
        "last_unstale": "2024-09-15T08:13:35.443051+0200",
        "last_undegraded": "2024-09-15T08:11:25.558559+0200",
        "last_fullsized": "2024-09-15T08:11:25.558431+0200",
        "mapping_epoch": 795860,
        "log_start": "792986'1304849009",
        "ondisk_log_start": "792986'1304849009",
        "created": 34867,
        "last_epoch_clean": 791031,
        "parent": "0.0",
        "parent_split_bits": 7,
        "last_scrub": "791841'1303832763",
        "last_scrub_stamp": "2024-09-14T15:47:43.149821+0200",
        "last_deep_scrub": "781724'1296647995",
        "last_deep_scrub_stamp": "2024-09-09T03:41:56.778209+0200",
        "last_clean_scrub_stamp": "2024-09-14T15:47:43.149821+0200",
        "log_size": 10022,
        "ondisk_log_size": 10022,
        "stats_invalid": false,
        "dirty_stats_invalid": false,
        "omap_stats_invalid": false,
        "hitset_stats_invalid": false,
        "hitset_bytes_stats_invalid": false,
        "pin_stats_invalid": false,
        "manifest_stats_invalid": false,
        "snaptrimq_len": 0,
        "stat_sum": {
            "num_bytes": 1150405518336,
            "num_objects": 275089,
            "num_object_clones": 2,
            "num_object_copies": 1650534,
            "num_objects_missing_on_primary": 0,
            "num_objects_missing": 0,
            "num_objects_degraded": 275089,
            "num_objects_misplaced": 0,
            "num_objects_unfound": 0,
            "num_objects_dirty": 275089,
            "num_whiteouts": 0,
            "num_read": 152074641,
            "num_read_kb": 19905452430,
            "num_write": 773240119,
            "num_write_kb": 37150859934,
            "num_scrub_errors": 0,
            "num_shallow_scrub_errors": 0,
            "num_deep_scrub_errors": 0,
            "num_objects_recovered": 1132336,
            "num_bytes_recovered": 4705340429824,
            "num_keys_recovered": 0,
            "num_objects_omap": 0,
            "num_objects_hit_set_archive": 0,
            "num_bytes_hit_set_archive": 0,
            "num_flush": 0,
            "num_flush_kb": 0,
            "num_evict": 0,
            "num_evict_kb": 0,
            "num_promote": 0,
            "num_flush_mode_high": 0,
            "num_flush_mode_low": 0,
            "num_evict_mode_some": 0,
            "num_evict_mode_full": 0,
            "num_objects_pinned": 0,
            "num_legacy_snapsets": 0,
            "num_large_omap_objects": 0,
            "num_objects_manifest": 0,
            "num_omap_bytes": 0,
            "num_omap_keys": 0,
            "num_objects_repaired": 0
        },
        "up": [
            20,
            253,
            254,
            84,
            500,
            56
        ],
        "acting": [
            20,
            253,
            254,
            84,
            500,
            56
        ],
        "avail_no_missing": [
            "11(0)",
            "56(5)",
            "253(1)",
            "254(2)",
            "500(4)"
        ],
        "object_location_counts": [
            {
                "shards": "11(0),56(5),253(1),254(2),500(4)",
                "objects": 275089
            }
        ],
        "blocked_by": [],
        "up_primary": 20,
        "acting_primary": 20,
        "purged_snaps": []
    },
    "empty": 0,
    "dne": 0,
    "incomplete": 0,
    "last_epoch_started": 792996,
    "hit_set_history": {
        "current_last_update": "0'0",
        "history": []
    }
}
########################################################################################################

########################################################################################################
ceph@cos1-dal-ceph-11:/root$ ceph-objectstore-tool --data-path 
/var/lib/ceph/osd/ceph-56 --op info --pgid 13.6as5 --no-mon-config --type 
bluestore
PG '13.6as5' not found
########################################################################################################


What we saw from these outputs:
It seems that shard 3, hosted on OSD.84, wants to move to OSD.278.
But on OSD.84 the output of the CLI is inconsistent!
We ran 'ceph-objectstore-tool --pgid 13.6as3 ...' but in the output we got 
'"pgid": "13.aas3"'!
We now suspect that this shard is what prevents the PG from peering correctly.
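For anyone hitting the same symptom, this check can be scripted. A minimal 
sketch (the helper name and the sample JSON are ours, not a Ceph tool) that 
compares the pgid requested from ceph-objectstore-tool against the pgid it 
actually reports:

```python
import json

def check_pgid(requested_pgid: str, info_json: str) -> bool:
    """Return True if the pgid reported by 'ceph-objectstore-tool --op info'
    matches the pgid that was requested; False signals metadata corruption
    like the 13.6as3 -> 13.aas3 mismatch we hit on OSD.84."""
    info = json.loads(info_json)
    return info.get("pgid") == requested_pgid

# Sample mimicking the inconsistent output we got from OSD.84:
sample = '{"pgid": "13.aas3", "last_update": "792997\'1304859031"}'
print(check_pgid("13.6as3", sample))  # False -> the shard metadata is suspect
```

Running this against '--op info' output captured from every OSD hosting the PG 
would have pointed at the bad shard much faster than eyeballing the dumps.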

So we decided to move shard 3 manually from OSD.84 to OSD.278 (both on the 
same host):

########################################################################################################
        # Set flags on the cluster to avoid data migration during the copy of 
the shard 3
        ceph osd set noout
        ceph osd set norebalance
        ceph osd set noup

        # Stop both OSDs to perform the copy
        systemctl stop ceph-osd@84.service
        systemctl stop ceph-osd@278.service
        
        # Copy the PG shard.
        # We didn't have enough free space on the host to export to a file and 
then import it, so we piped stdout into stdin as a small hack.
        ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-84 
--no-mon-config --pgid 13.6as3 --op export --file /dev/stdout | 
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-278 --no-mon-config 
--pgid 13.6as3 --op import --file /dev/stdin
########################################################################################################
        

After the copy, we used 'ceph-objectstore-tool --data-path 
/var/lib/ceph/osd/ceph-278 --op info --pgid 13.6as3' to confirm that the pgid 
in the output matched and... finally there was no more inconsistency.
We then used ceph-objectstore-tool to remove the "corrupted" shard 3 on 
OSD.84:
        ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-84 
--no-mon-config --pgid 13.6as3 --op remove

We then restarted both OSD processes and removed the flags on the cluster.
And the PG finally peered!!!!
We waited for the rediscovery of this PG.
After a few minutes, Ceph reported that we may have lost 127 objects 
(certainly because of the 2 wiped shards and the strange state of shard 3).
We then deleted the 127 lost objects permanently (on OSD 84) to finish the 
recovery:
        ceph pg 13.6a mark_unfound_lost delete
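Before running a destructive 'mark_unfound_lost delete', it is worth archiving 
the exact list of unfound objects with 'ceph pg <pgid> list_unfound'. A 
minimal sketch (the helper name and the truncated sample data are ours; we 
assume the usual 'num_unfound'/'objects' layout of that command's JSON output) 
to summarize such a dump:

```python
import json

def summarize_unfound(list_unfound_json: str):
    """Return (count, object ids) from a 'ceph pg <pgid> list_unfound' dump
    so the list can be archived before running mark_unfound_lost delete."""
    data = json.loads(list_unfound_json)
    oids = [obj["oid"]["oid"] for obj in data.get("objects", [])]
    return data.get("num_unfound", len(oids)), oids

# Hypothetical, truncated sample of what the cluster could report:
sample = json.dumps({
    "num_unfound": 2,
    "objects": [
        {"oid": {"oid": "rbd_data.abc.000000000001"}, "need": "792997'1"},
        {"oid": {"oid": "rbd_data.abc.000000000002"}, "need": "792997'2"},
    ],
    "more": False,
})
count, oids = summarize_unfound(sample)
print(count, oids)
```

With RBD objects, the object ids at least tell you which images were touched, 
so you know where to run a filesystem check afterwards.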

What we learned:
        When the usual actions do not correct the problem, analyzing the PG in 
detail makes it possible to identify the root cause and correct it.
        If 'ceph pg <PG ID> query' does not return, you have to find the 
OSD(s) that are at the origin of the problem.
        For low-level actions on the OSDs, the cluster must be put in 
maintenance (noout/norebalance/noup) to avoid CRUSH remapping data in the 
middle of the operation.
        For imports/exports, use stdin and stdout when there is not enough 
free space for an intermediate file.
        Peering problem:
                Multiple PGs stuck: probable network problem?
                Only one PG stuck in peering: metadata corruption somewhere?
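The last diagnostic above can be partly automated. A minimal sketch (the 
helper name is ours and the field nesting is simplified; 2147483647 is Ceph's 
marker for an unassigned shard, as Frank pointed out below) that scans a 
'ceph pg <pgid> query' dump for missing shards and blocking OSDs:

```python
NONE_OSD = 2147483647  # Ceph's sentinel value for "no OSD assigned here"

def peering_hints(query: dict) -> dict:
    """Scan the 'up'/'acting' sets and 'blocked_by' list of a
    'ceph pg <pgid> query' dump and report which shard positions
    have no OSD assigned and which OSDs are reported as blocking."""
    up = query.get("up", [])
    acting = query.get("acting", [])
    return {
        "missing_up_shards": [i for i, o in enumerate(up) if o == NONE_OSD],
        "missing_acting_shards": [i for i, o in enumerate(acting) if o == NONE_OSD],
        "blocked_by": query.get("blocked_by", []),
    }

# Sample built from the 'up' set quoted in this thread (shard 4 unassigned):
sample = {
    "up": [20, 253, 254, 84, 2147483647, 56],
    "acting": [20, 253, 254, 84, 2147483647, 56],
    "blocked_by": [],
}
print(peering_hints(sample))
# {'missing_up_shards': [4], 'missing_acting_shards': [4], 'blocked_by': []}
```

With k=4, m=2 erasure coding, any single missing shard leaves the PG 
undersized; two missing shards plus one corrupted one is exactly the situation 
that left us without a quorum of healthy shards.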

Conclusion:
=> We suspect that the peering process of PG 13.6a was blocked by corruption 
of the pgid of the PG on OSD.84.
=> Root cause of the issue is still unknown...
=> At each stage, ask yourself what data you have, what data your actions may 
lose, and where the still-existing data is located.
=> We plan to create a bug report in order to deep dive into the issue and 
maybe understand/find the root cause. Even if it's an EOL version, the bug may 
still be present in more recent ones.


-----Original Message-----
From: Frank Schilder <fr...@dtu.dk> 
Sent: Monday, September 23, 2024 9:27 AM
To: HARROUIN Loan (PRESTATAIRE CA-GIP) <loan.harrouin-prestata...@ca-gip.fr>; 
ceph-users@ceph.io
Subject: Re: [Ceph incident] PG stuck in peering.


Hi, the ceph query states:

    "recovery_state": [
        {
            "name": "Started/Primary/Peering/Down",
            "enter_time": "2024-09-16T17:48:13.572414+0200",
            "comment": "not enough up instances of this PG to go active"
        },

It's missing an OSD (shard 4 = 2147483647 means "none"):

            "up": [
                20,
                253,
                254,
                84,
                2147483647,
                56
            ],

            "acting": [
                20,
                253,
                254,
                84,
                2147483647,
                56
            ],

The OSD in this place is down and for some reason ceph cannot find another OSD 
to take its place. Fastest way forward is probably to get this OSD up again and 
then look why ceph couldn't assign a replacement.

Depending on number of hosts and how your crush rule is defined, you might be 
in the "ceph gives up too soon" situation or similar.

PS: "We wipe the OSD 11, 148 and 280 (one by one and waiting of course the 
peering to avoid data loss on other PGs)."

I hope you mean "waited for recovery" or what does a wipe here mean.

Best regards,
=================
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14

________________________________________
From: HARROUIN Loan (PRESTATAIRE CA-GIP) <loan.harrouin-prestata...@ca-gip.fr>
Sent: Monday, September 16, 2024 7:33 PM
To: ceph-users@ceph.io
Cc: CAGIP_DEVOPS_OPENSTACK
Subject: [ceph-users] [Ceph incident] PG stuck in peering.

Hello dear ceph community,

We are facing a strange issue this weekend with a pg (13.6a) that is stuck in 
peering. Because of that we got lot of ops stuck of course.
We are running a ceph in Pacific version 16.2.10, we have only SSD disk and are 
using erasure coding.

  cluster:
    id:     f5c69b4a-89e0-4055-95f7-eddc6800d4fe
    health: HEALTH_WARN
            Reduced data availability: 1 pg inactive, 1 pg peering
            256 slow ops, oldest one blocked for 5274 sec, osd.20 has slow ops
  services:
    mon: 3 daemons, quorum 
cos1-dal-ceph-mon-01,cos1-dal-ceph-mon-02,cos1-dal-ceph-mon-03 (age 17h)
    mgr: cos1-dal-ceph-mon-02(active, since 17h), standbys: 
cos1-dal-ceph-mon-03, cos1-dal-ceph-mon-01
    osd: 647 osds: 646 up (since 27m), 643 in (since 2h)
  data:
    pools:   7 pools, 1921 pgs
    objects: 432.65M objects, 1.6 PiB
    usage:   2.4 PiB used, 2.0 PiB / 4.4 PiB avail
    pgs:     0.052% pgs not active
             1916 active+clean
             2    active+clean+scrubbing
             2    active+clean+scrubbing+deep
             1    peering
The 'ceph pg 13.6a query' hung, so we had to restart one of the OSDs that are 
part of this PG to temporarily unblock the query (because for a few seconds 
the pg isn't peering yet). In that case, the query only retrieves the 
information about the shard that was hosted on the OSD that we restarted.
The result of the query is in attachment (shard 0).

When the issue first occurred, we checked the logs and restarted all the OSDs 
linked to this PG.
Sadly, it didn't fix anything. We tried to investigate the peering state to 
understand what was going on with the primary OSD. We put the OSD in debug 
mode, but at first glance nothing seemed strange (we are not used to deep 
diving that much into ceph).

We found that CERN faced something similar a long time ago: 
https://indico.cern.ch/event/617118/contributions/2490930/attachments/1422793/2181063/ceph_hep_stuck_pg.pdf
After reading it, we tried the empty-OSD method they describe (slide 7). 
We identified that shard 0 seemed to be in a weird state (and was primary), so 
it was our candidate. We wiped OSDs 11, 148 and 280 (one by one, of course 
waiting for peering each time to avoid data loss on other PGs).
After that, OSD.20 was elected as the new primary, but the PG was still stuck 
in peering, and all OPS were now stuck on OSD.20.

We are now in the dark. We plan to dig deeper into the logs of this new 
primary OSD.20, and to see if we can upgrade our ceph in order to have the 
most recent version.
Any help or suggestion is welcome 😊


_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
