So we were forced out of our datacenter and had to move all our OSD nodes to 
new racks. Accordingly, we updated the CRUSH map to reflect our OSD nodes' new 
rack positions, which triggered a huge rebalance.
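

For reference, the moves were something along these lines ("rack-new" and 
"osd-node-01" are placeholder names here, not our real ones):

$ ceph osd crush add-bucket rack-new rack        # new rack bucket (placeholder name)
$ ceph osd crush move rack-new root=default      # place it under the default root
$ ceph osd crush move osd-node-01 rack=rack-new  # re-parent each host under its new rack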


We're now getting OSD nearfull warnings on OSDs across all the racks. It 
started off with 1 nearfull OSD and is up to 5 now. OSDs within the same node 
show a wide variance in capacity used: within a single node there are OSDs at 
85% full and others at 49% full. We tried ceph osd reweight-by-utilization, but 
it didn't appear to do anything, and the nearfull OSDs kept filling up. We have 
also observed that the utilization of the nearfull OSDs fluctuates, going up 
and then back down.
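

The reweight attempt was something like this (the threshold shown is just the 
default; the exact value we used may have differed), with ceph osd df tree to 
watch per-OSD usage afterwards:

$ ceph osd test-reweight-by-utilization 120   # dry run, shows proposed reweights
$ ceph osd reweight-by-utilization 120
$ ceph osd df tree                            # per-OSD %USE and PG counts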


We're also seeing one backfillfull warning, even though that OSD is only 77% 
utilized. We're not exactly sure why it would warn when it's not near the 
backfillfull_ratio.
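

To see which OSD is flagged, we check with something like:

$ ceph health detail | grep -i backfill   # which OSD carries the backfillfull flag
$ ceph osd df tree                        # compare that OSD's %USE against its peers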


$ ceph osd dump | grep ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
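

(We assume these could be raised temporarily with something like the commands 
below while the backfill finishes, but we'd rather understand the imbalance 
first; the values are only examples.)

$ ceph osd set-nearfull-ratio 0.87        # example value only
$ ceph osd set-backfillfull-ratio 0.92    # example value only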


Our total capacity used is at 64%.


Performance seems alright, just a little slower than normal. There was a period 
when CephFS was very slow, with file operations taking between 10 and 30 
seconds to complete (rgw was fine during that time). That seems to have cleared 
up now.


Does any of the behavior described seem normal? Should we be concerned about 
anything?


Thanks!

--

Vincent Chu

A-4: Advanced Research in Cyber Systems

Los Alamos National Laboratory
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
