Hi,

I am a complete n00b to Ceph and cannot figure out why my cluster isn't
working as expected. We have 39 OSDs on volumes managed under AWS EC2: 36
are 100 GB volumes and 3 are 2 TB volumes.

Yesterday I replaced one of the 100 GB volumes with a new 2 TB volume, which
involved creating a snapshot, detaching the old volume, attaching the new
volume, and then using parted to set the start/end of the data partition
correctly. This all went smoothly, and no issues were reported by AWS or the
server.

However, when I started reweighting the OSDs, the health status went to
HEALTH_WARN with over 500 PGs stuck unclean and about 14% of objects
misplaced. I am adding the health detail, crushmap, and OSD tree here:

Crushmap: https://pastebin.com/HxiAChP3
Health Detail: https://pastebin.com/K7ZqLQH9
OSD Tree: https://pastebin.com/qGRk3R8S

We use Ceph to store our image inventory, which is about 5 million images.
If you do a search on our site, https://iconfinder.com, none of the images
show up.

This all started after doing the reweights when the new volume was added. I
tried setting all of the weights back to their original settings but this
did not help.
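
For illustration, here is a rough sketch of the kind of reweight commands
involved. It only prints the commands rather than running them. The OSD ids
are made up, and it assumes the common convention that a CRUSH weight is the
volume size in TiB (with sizes given in GiB), which may not match the weights
actually in the crushmap linked above.

```shell
#!/bin/sh
# Sketch only: prints the reweight commands instead of running them.
# Assumption: CRUSH weight = volume size in TiB, with sizes given in GiB.

# Convert a size in GiB to a CRUSH weight with three decimals.
crush_weight_for_gib() {
  awk -v gib="$1" 'BEGIN { printf "%.3f\n", gib / 1024 }'
}

# Print (not execute) the reweight command for one OSD.
# $1 = OSD id (hypothetical), $2 = volume size in GiB.
print_reweight() {
  echo "ceph osd crush reweight osd.$1 $(crush_weight_for_gib "$2")"
}

print_reweight 0 100     # one of the 100 GB volumes
print_reweight 38 2048   # the new 2 TB volume, assuming it is 2048 GiB
```

Printing the commands first makes it possible to compare the intended
weights against the OSD tree before anything is changed.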

The only other thing I changed was to raise kernel.pid_max to the maximum
allowed value. I reset it to the original setting with the command below,
but that didn't help either:

sudo sysctl -w kernel.pid_max=32768

Thanks in advance for any help.

Scott Lewis
Sr. Developer & Head of Content
Iconfinder Aps

http://iconfinder.com
http://twitter.com/iconfinder

"Helping Designers Make a Living Doing What They Love"
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
