Hi,
On 11/19/18 12:49 PM, Thomas Klute wrote:
Hi,
we have a production cluster (3 nodes) stuck unclean after we had to
replace one OSD.
The cluster recovered fine, except for some PGs that have been stuck
unclean for about 2-3 days now:
*snipsnap*
[root@ceph1 ~]# fgrep remapp /tmp/pgdump.txt
3.83 5423 0 0 5423 0 22046870528 3065
3065 active+remapped 2018-11-16 04:08:22.365825 85711'8469810
85711:8067280 [5,11] 5 [5,11,13] 5 83827'8450839
This PG is currently served by OSDs 5, 11 and 13 (the acting set), and
the reshuffling caused by replacing the OSD has led to a problem with
CRUSH selecting three OSDs that satisfy the CRUSH rule. CRUSH only came
up with OSDs 5 and 11 for this PG; a third OSD is missing, which is why
the PG stays active+remapped on its old acting set.
You only have three nodes, so this is a corner case of the CRUSH
algorithm and its pseudo-random nature. To solve this problem you can
either add more nodes or change some of the CRUSH parameters, e.g. the
number of tries (rough sketch below).
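A rough sketch of the tunables route, assuming you go through the usual
export/decompile/edit/re-inject cycle (the exact value and the rule
number for the --test run depend on your map, so check with crushtool
before injecting):

ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt
# in crushmap.txt, raise e.g. "tunable choose_total_tries 50" to a
# higher value (or add "step set_choose_tries ..." to the rule), then:
crushtool -c crushmap.txt -o crushmap-new.bin
crushtool -i crushmap-new.bin --test --show-bad-mappings --rule 0 --num-rep 3
ceph osd setcrushmap -i crushmap-new.bin

If --show-bad-mappings reports nothing for the new map, CRUSH should be
able to pick a third OSD for this PG once the map is injected.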
Regards,
Burkhard