On 7/25/19 6:49 AM, Sangwhan Moon wrote:
> Hello,
>
> I've inherited a Ceph cluster from someone who has left zero documentation or
> any handover. A couple of days ago it decided to show the entire company what it
> is capable of...
>
> The health report looks like this:
>
> [root@host mnt]# ceph -s
> cluster:
> id: 809718aa-3eac-4664-b8fa-38c46cdbfdab
> health: HEALTH_ERR
> 1 MDSs report damaged metadata
> 1 MDSs are read only
> 2 MDSs report slow requests
> 6 MDSs behind on trimming
> Reduced data availability: 2 pgs stale
> Degraded data redundancy: 2593/186803520 objects degraded (0.001%), 2 pgs degraded, 2 pgs undersized
> 1 slow requests are blocked > 32 sec. Implicated osds
> 716 stuck requests are blocked > 4096 sec. Implicated osds 25,31,38
I would start here, with the stuck requests and the implicated OSDs (25, 31, 38).
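For example, something like this (just a sketch; the OSD IDs are taken from the implicated list above, and on a Rook cluster the daemon commands have to be run on the node/pod hosting that OSD, since they use the admin socket):

$ ceph health detail
$ ceph daemon osd.25 dump_ops_in_flight   # on the node/pod hosting osd.25

health detail lists the slow/stuck requests per OSD, and dump_ops_in_flight shows what osd.25 is currently blocked on. Repeat for osd.31 and osd.38.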
>
> services:
> mon: 3 daemons, quorum f,rook-ceph-mon2,rook-ceph-mon0
> mgr: a(active)
> mds: ceph-fs-2/2/2 up odd-fs-2/2/2 up
> {[ceph-fs:0]=ceph-fs-5b997cbf7b-5tjwh=up:active,[ceph-fs:1]=ceph-fs-5b997cbf7b-nstqz=up:active,[user-fs:0]=odd-fs-5668c75f9f-hflps=up:active,[user-fs:1]=odd-fs-5668c75f9f-jf59x=up:active},
> 4 up:standby-replay
> osd: 39 osds: 39 up, 38 in
>
> data:
> pools: 5 pools, 706 pgs
> objects: 91212k objects, 4415 GB
> usage: 10415 GB used, 13024 GB / 23439 GB avail
> pgs: 2593/186803520 objects degraded (0.001%)
> 703 active+clean
> 2 stale+active+undersized+degraded
This is a problem! Can you check:
$ ceph pg dump_stuck
The PGs will start with a number like 8.1a where '8' is the pool ID.
Then check:
$ ceph df
To which pools do those PGs belong?
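If the ceph df output is hard to map, this also prints each pool's numeric ID next to its name, so you can match the '8' in 8.1a to a pool:

$ ceph osd lspools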
Then check:
$ ceph pg <PGID> query
The bottom of the output (the recovery_state section) should show why these PGs
are not active. You might even want to try a restart of the OSDs involved with those two PGs.
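How you restart an OSD depends on how it was deployed; a rough sketch (the unit, namespace and label names below are assumptions, adjust them to your setup):

# classic systemd deployment, e.g. for osd.25:
$ systemctl restart ceph-osd@25

# Rook (your mon names suggest Rook): delete the OSD pod and let its
# deployment recreate it:
$ kubectl -n rook-ceph delete pod -l app=rook-ceph-osd,ceph-osd-id=25

Do one OSD at a time and let the cluster settle before the next one.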
Wido
> 1 active+clean+scrubbing+deep
>
> io:
> client: 168 kB/s rd, 6336 B/s wr, 10 op/s rd, 1 op/s wr
>
> The offending broken MDS entry (damaged metadata) seems to be this:
>
> mds.ceph-fs-5b997cbf7b-5tjwh: [
>     {
>         "damage_type": "dir_frag",
>         "id": 1190692215,
>         "ino": 2199023258131,
>         "frag": "*",
>         "path": "/f/01/59"
>     }
> ]
>
> Is there any way I can diagnose this and find out what is wrong? For the
> other issues I'm not even sure what or where I need to look.
>
> Cheers,
> Sangwhan
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com