Hi Guys,

I've got a funny one I'm hoping someone can point me in the right direction 
with:

We've got three identical(?) Ceph nodes, each running 4 OSDs, a Mon, a Mgr, and 
an iSCSI G/W (we're only a small shop), on Rocky Linux 8 / Ceph Quincy. 
Everything is running fine, with no bottlenecks (as far as we can see), and the 
cluster is holding up very well.

However, one of the boxes is constantly running out of space on the /var mount. 
It's 16 GiB in size, and it only takes a day or three to fill up, thus taking 
its monitor service out of quorum.

The thing is, I can't find *what's* taking up all the space. At first we 
thought it was an overly large log file, but I've done searches to find the 
largest files, etc, and nothing is showing up (that I can find), i.e. the log 
files on this box are comparable with the log files on the other two boxes. 
The other two boxes are sitting at around 10% full (via a df -H), while the 
problem box is at around 85% and growing (at time of posting).
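
A minimal Python sketch of this kind of check, in case it helps frame the 
question: it compares what the filesystem reports as used with what can 
actually be found by walking the directory tree. The names tree_usage and 
unaccounted_bytes are made up for illustration, and it assumes a Linux box 
and enough privileges to read everything under the mount.

```python
import os
import shutil


def _same_dev(path, dev):
    """True if path can be stat'ed and lives on the device `dev`."""
    try:
        return os.lstat(path).st_dev == dev
    except OSError:
        return False


def tree_usage(root):
    """Return (apparent_bytes, allocated_bytes) for the non-directory
    entries found under `root`, staying on root's filesystem."""
    root_dev = os.lstat(root).st_dev
    seen = set()
    apparent = allocated = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into other filesystems mounted below root.
        dirnames[:] = [d for d in dirnames
                       if _same_dev(os.path.join(dirpath, d), root_dev)]
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # vanished or unreadable, skip it
            key = (st.st_dev, st.st_ino)
            if st.st_dev != root_dev or key in seen:
                continue  # other filesystem, or a hard link already counted
            seen.add(key)
            apparent += st.st_size
            allocated += st.st_blocks * 512  # st_blocks is in 512-byte units
    return apparent, allocated


def unaccounted_bytes(mount):
    """Space the filesystem reports as used minus what the walk found."""
    used = shutil.disk_usage(mount).used
    _, allocated = tree_usage(mount)
    return used - allocated
```

Calling unaccounted_bytes("/var") as root gives a rough figure only: 
directories and filesystem metadata are not counted, so a small gap is 
normal. A gap of several GiB would mean the space is held by something that 
is not visible in the directory tree.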

Another interesting point is that the problem box was rebooted recently (for 
reasons unrelated to this issue), and when it came back on-line the space issue 
was gone, i.e. the /var mount was back down to around the 10% mark.

This suggests to me it's some sort of "temporary" journal/log/dump/whatever/? 
that was "reset" (cleaned-up?) via the reboot.
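
One thing that would fit this pattern (and this is only a guess) is a file 
that has been deleted but is still held open by a running process: on Linux 
its blocks stay allocated until the process closes it or exits, and it no 
longer shows up in a search for large files. A minimal Python sketch of how 
that could be checked via /proc, assuming it is run as root; the function 
name deleted_open_files is made up for illustration.

```python
import os


def deleted_open_files(prefix="/var", proc="/proc"):
    """Return (size_bytes, pid, target) tuples, largest first, for open file
    descriptors that point at deleted files under `prefix`."""
    prefix = prefix.rstrip("/") + "/"
    results = []
    try:
        pids = [p for p in os.listdir(proc) if p.isdigit()]
    except OSError:
        return results  # no /proc available on this system
    for pid in pids:
        fd_dir = os.path.join(proc, pid, "fd")
        try:
            fds = os.listdir(fd_dir)
        except OSError:
            continue  # process exited, or we lack permission
        for fd in fds:
            link = os.path.join(fd_dir, fd)
            try:
                target = os.readlink(link)
                # The kernel appends " (deleted)" to unlinked files.
                if not (target.startswith(prefix)
                        and target.endswith(" (deleted)")):
                    continue
                # stat() follows the link to the still-open file itself.
                size = os.stat(link).st_size
            except OSError:
                continue
            results.append((size, int(pid), target))
    # A file open on several descriptors appears once per descriptor.
    return sorted(results, reverse=True)
```

If deleted_open_files("/var") returned large entries, the PID would show 
which service is holding the space, and restarting just that service 
(rather than the whole box) should release it. An empty result would rule 
this guess out.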

I've had a look at the logs but I'm not sure what I should be looking for - so 
I don't even know if I'm looking in the *correct* logs...

Anyone got any ideas? I mean, rebooting the server every couple of days is not 
really a practical solution, and neither is turning off the monitor service on 
the box, and increasing the size of the /var mount just seems like it'll 
postpone the issue.

Any help would be greatly appreciated.

Cheers

Dulux-Oz
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io