Hi,
On 11/19/18 12:49 PM, Thomas Klute wrote:
Hi,
we have a production cluster (3 nodes) stuck unclean after we had to
replace one OSD.
The cluster recovered fine, except for some PGs that have been stuck
unclean for about 2-3 days now:
*snipsnap*
[root@ceph1 ~]# fgrep remapp /tmp/pgdump.txt
3.83 5423 0 0 5423 0 22046870528 3065
3065 active+remapped 2018-11-16 04:08:22.365825 85711'8469810
85711:8067280 [5,11] 5 [5,11,13] 5 83827'8450839
This PG is currently served by OSDs 5, 11 and 13 (the acting set), and
the reshuffling caused by replacing the OSD has led to a problem with
CRUSH selecting three OSDs that satisfy the CRUSH rule. CRUSH only came
up with OSDs 5 and 11 for this PG; a third OSD is missing, which is why
the PG stays active+remapped on its old acting set.
You only have three nodes, so this is a corner case of the CRUSH
algorithm and its pseudo-random nature. To solve this problem you can
either add more nodes or change some of the CRUSH parameters, e.g. the
number of tries (rough sketch below).
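A rough sketch of the tunables route, assuming you go through the usual
export/decompile/edit/re-inject cycle (the exact value and the rule
number for the --test run depend on your map, so check with crushtool
before injecting):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# in crushmap.txt, raise e.g. "tunable choose_total_tries 50" to a
# higher value (or add "step set_choose_tries ..." to the rule), then:
crushtool -c crushmap.txt -o crushmap-new.bin
crushtool -i crushmap-new.bin --test --show-bad-mappings --rule 0 --num-rep 3
ceph osd setcrushmap -i crushmap-new.bin

If --show-bad-mappings reports nothing for the new map, CRUSH should be
able to pick a third OSD for this PG once the map is injected.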
Regards,
Burkhard