[ceph-users] Re: OSD reboot loop after running out of memory

2020-12-13 Thread Kalle Happonen
Hi Stefan,

we had been seeing OSDs OOMing on 14.2.13, but on a larger scale. In our case we hit some bugs with pg_log memory growth and buffer_anon memory growth. Can you check what's taking up the memory on the OSD with the following command?

    ceph daemon osd.123 dump_mempools

Cheers,
Kalle
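To make the output of that command easier to read, here is a minimal Python sketch that ranks mempools by size. It assumes the JSON has the shape `{"mempool": {"by_pool": {name: {"items": ..., "bytes": ...}}}}`, which may differ between Ceph releases. The function name `top_mempools` and the sample numbers are made up for illustration; they are not taken from this thread.

```python
import json


def top_mempools(dump_json, n=5):
    """Return the n largest mempools as (name, bytes) pairs, largest first."""
    data = json.loads(dump_json)
    # Assumed layout; adjust the keys if your release nests the pools differently.
    by_pool = data["mempool"]["by_pool"]
    ranked = sorted(
        ((name, stats["bytes"]) for name, stats in by_pool.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:n]


# Hypothetical sample, NOT real output from any cluster.
SAMPLE = json.dumps({
    "mempool": {
        "by_pool": {
            "bluestore_cache_data": {"items": 10, "bytes": 1000},
            "buffer_anon": {"items": 50, "bytes": 900000},
            "osd_pglog": {"items": 70, "bytes": 500000},
        },
        "total": {"items": 130, "bytes": 1401000},
    }
})

if __name__ == "__main__":
    for name, size in top_mempools(SAMPLE, n=3):
        print(f"{name:24s} {size:>12d} bytes")
```

In practice one would feed it the text saved from `ceph daemon osd.123 dump_mempools` instead of `SAMPLE`. If pg_log or buffer_anon growth is the problem, `osd_pglog` or `buffer_anon` would be expected near the top.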

[ceph-users] Re: PGs down

2020-12-13 Thread Jeremy Austin
OSD 12 looks much the same. I don't have logs back to the original date, but this looks very similar: db/sst corruption. The standard fsck approaches couldn't fix it. I believe it was a form of ATA failure; OSD 11 and 12, if I recall correctly, did not actually experience SMARTD-reportable errors.

[ceph-users] Re: pool nearfull, 300GB rbd image occupies 11TB!

2020-12-13 Thread mk
Removing the journaling feature flag from the pool ran for 1h and the pool is shrinking interactively. thx2u & thx2all Max > On 13. Dec 2020, at 16:49, Anthony D'Atri wrote: > > I suspect so, if rbd-mirror is fully disabled. If it’s still enabled for the > pool or image, removing it may fail. > > Tu

[ceph-users] Re: pool nearfull, 300GB rbd image occupies 11TB!

2020-12-13 Thread Anthony D'Atri
I suspect so, if rbd-mirror is fully disabled. If it’s still enabled for the pool or image, removing it may fail. Turn it off and we’ll both find out for sure. > On Dec 13, 2020, at 7:36 AM, mk wrote: > > In fact journaling was enabled, is it enough to disable feature and pool > shrin

[ceph-users] Re: pool nearfull, 300GB rbd image occupies 11TB!

2020-12-13 Thread mk
In fact journaling was enabled. Is it enough to disable the feature, and will the pool then shrink automatically again? Or are any additional actions still required? -- Max > On 13. Dec 2020, at 15:53, Anthony D'Atri wrote: > > rbd status > rbd info > > If the ‘journaling’ flag is enabled, use ‘rbd feature’ to
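As a rough illustration of the check suggested in the quoted reply, the Python sketch below tests whether an image still has the journaling feature enabled. It assumes the JSON from `rbd info --format json <pool>/<image>` carries a `features` list of strings. The helper name `has_journaling` and the sample document are hypothetical.

```python
import json


def has_journaling(rbd_info_json):
    """Return True if 'journaling' appears in the image's feature list."""
    info = json.loads(rbd_info_json)
    # Assumed field name; an absent "features" key is treated as "no features".
    return "journaling" in info.get("features", [])


# Hypothetical sample documents, NOT real output from the pool in this thread.
WITH_JOURNAL = json.dumps({
    "name": "example-image",
    "features": ["layering", "exclusive-lock", "journaling"],
})
WITHOUT_JOURNAL = json.dumps({
    "name": "example-image",
    "features": ["layering", "exclusive-lock"],
})

if __name__ == "__main__":
    print(has_journaling(WITH_JOURNAL))
    print(has_journaling(WITHOUT_JOURNAL))
```

If the check returns True, the feature can be turned off with `rbd feature disable <pool>/<image> journaling`, subject to the rbd-mirror caveat raised elsewhere in the thread.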

[ceph-users] Re: pool nearfull, 300GB rbd image occupies 11TB!

2020-12-13 Thread mk
Yes, a few months ago I enabled mirroring for a few weeks and then disabled it again. Do any additional actions have to be taken regarding journaling as well? FYI, I also copied the rbd image into a newly created pool, but after a few weeks the new pool grew to 11TB again, which is the current state --

[ceph-users] Re: pool nearfull, 300GB rbd image occupies 11TB!

2020-12-13 Thread Jason Dillaman
On Sun, Dec 13, 2020 at 6:03 AM mk wrote:
>
> rados ls -p ssdshop
> outputs 20MB of lines without any bench prefix
> ...
> rbd_data.d4993cc3c89825.74ec
> rbd_data.d4993cc3c89825.1634
> journal_data.83.d4993cc3c89825.333485
> journal_data.83.d4993cc3c89825.380648
> journal_d

[ceph-users] Re: OSD reboot loop after running out of memory

2020-12-13 Thread Stefan Wild
Hi Igor, Full osd logs from startup to failed exit: https://tiltworks.com/osd.1.log In other news, can I expect osd.10 to go down next? Dec 13 07:40:14 ceph-tpa-server1 bash[1825010]: debug 2020-12-13T12:40:14.823+ 7ff37c2e1700 -1 osd.7 13375 heartbeat_check: no reply from 172.18.189.20:68

[ceph-users] Re: pool nearfull, 300GB rbd image occupies 11TB!

2020-12-13 Thread mk
rados ls -p ssdshop
outputs 20MB of lines without any bench prefix
...
rbd_data.d4993cc3c89825.74ec
rbd_data.d4993cc3c89825.1634
journal_data.83.d4993cc3c89825.333485
journal_data.83.d4993cc3c89825.380648
journal_data.83.d4993cc3c89825.503838
...
> On 13. Dec 2020, at 11:0

[ceph-users] Re: pool nearfull, 300GB rbd image occupies 11TB!

2020-12-13 Thread Anthony D'Atri
Any chance you might have orphaned `rados bench` objects? This happens more than one might think. `rados ls > /tmp/out` Inspect the result. You should see a few administrative objects, plus some header and data objects for the RBD volume. If you see a zillion with names like `bench*` there’s y
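Inspecting a 20MB listing by eye is tedious, so here is a minimal Python sketch that tallies object names by prefix. It assumes one object name per line, as in the `rados ls` excerpts quoted in this thread. The grouping rule (text before the first `.`, with anything starting with `bench` folded into `bench*`) is an illustrative choice, and `count_prefixes` is a hypothetical helper.

```python
from collections import Counter


def count_prefixes(lines):
    """Count object names by prefix; names starting with 'bench' share one bucket."""
    counts = Counter()
    for line in lines:
        name = line.strip()
        if not name:
            continue  # skip blank lines
        if name.startswith("bench"):
            counts["bench*"] += 1
        else:
            # e.g. "rbd_data.d4993cc3c89825.74ec" -> "rbd_data"
            counts[name.split(".", 1)[0]] += 1
    return counts


# Object names copied from the excerpts quoted in this thread.
SAMPLE_LINES = [
    "rbd_data.d4993cc3c89825.74ec",
    "rbd_data.d4993cc3c89825.1634",
    "journal_data.83.d4993cc3c89825.333485",
    "journal_data.83.d4993cc3c89825.380648",
    "journal_data.83.d4993cc3c89825.503838",
]

if __name__ == "__main__":
    for prefix, n in count_prefixes(SAMPLE_LINES).most_common():
        print(f"{n:8d}  {prefix}")
```

To run it against a real listing, replace `SAMPLE_LINES` with `open("/tmp/out")`. A large `bench*` count would point at leftover benchmark objects, while a large `journal_data` count would point at RBD journaling.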