Hi Stefan,
We had been seeing OSDs OOMing on 14.2.13 too, but on a larger scale. In our
case we hit some bugs with pg_log memory growth and buffer_anon memory growth.
Can you check what's taking up the memory on the OSD with the following command?
ceph daemon osd.123 dump_mempools
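If jq is available, something like this should surface the biggest consumers
(a sketch; the .mempool.by_pool layout is what our Nautilus OSDs report, so
adjust if yours differs):

ceph daemon osd.123 dump_mempools | \
  jq '.mempool.by_pool | to_entries | sort_by(-.value.bytes) | .[0:5]'

If osd_pglog or buffer_anon sit at the top, it is likely the same growth we saw.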
Cheers,
Kalle
OSD 12 looks much the same. I don't have logs back to the original date, but
this looks very similar: db/sst corruption. The standard fsck approaches
couldn't fix it. I believe it was a form of ATA failure; OSD 11 and 12, if
I recall correctly, did not actually experience SMARTD-reportable errors.
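For anyone following along, the usual first attempts look something like this
(a sketch; the OSD must be stopped first, and the OSD id and data path are
just examples):

ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-12
ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-12

Neither got past the corrupted db/sst files in my case.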
Removing the journaling feature flag from the pool has been running for 1h and
the pool is shrinking as it goes.
thx2u & thx2all
Max
> On 13. Dec 2020, at 16:49, Anthony D'Atri wrote:
>
> I suspect so, if rbd-mirror is fully disabled. If it’s still enabled for the
> pool or image, removing it may fail.
>
> Turn it off and we’ll both find out for sure.
I suspect so, if rbd-mirror is fully disabled. If it’s still enabled for the
pool or image, removing it may fail.
Turn it off and we’ll both find out for sure.
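Something like this, assuming the pool is ssdshop and substituting your image
name (placeholder below):

rbd info ssdshop/<image>        # look for 'journaling' in the features line
rbd feature disable ssdshop/<image> journaling

If it takes, the journal_data objects should get cleaned up behind it.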
> On Dec 13, 2020, at 7:36 AM, mk wrote:
>
> In fact journaling was enabled. Is it enough to disable the feature so that
> the pool shrinks automatically again, or are any additional actions required?
In fact journaling was enabled. Is it enough to disable the feature so that
the pool shrinks automatically again, or are any additional actions required?
—
Max
> On 13. Dec 2020, at 15:53, Anthony D'Atri wrote:
>
> rbd status
> rbd info
>
> If the ‘journaling’ flag is enabled, use ‘rbd feature’ to disable it.
Yes, a few months ago I enabled mirroring for a few weeks and then disabled it
again. Are there any additional actions that have to be taken regarding
journaling as well? FYI, I also copied the rbd image into a newly created pool,
but after a few weeks the new pool grew to 11TB again, which is the current
state.
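To double-check that nothing is still half-enabled, something along these lines
(ssdshop as the pool, image name is a placeholder):

rbd mirror pool info ssdshop        # should report that mirroring is disabled
rbd info ssdshop/<image> | grep features

If 'journaling' still shows up in the features, that would explain the
journal_data objects hanging around.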
--
On Sun, Dec 13, 2020 at 6:03 AM mk wrote:
>
> rados ls -p ssdshop
> outputs 20MB of lines without any bench prefix
> ...
> rbd_data.d4993cc3c89825.74ec
> rbd_data.d4993cc3c89825.1634
> journal_data.83.d4993cc3c89825.333485
> journal_data.83.d4993cc3c89825.380648
> journal_data.83.d4993cc3c89825.503838
> ...
Hi Igor,
Full osd logs from startup to failed exit:
https://tiltworks.com/osd.1.log
In other news, can I expect osd.10 to go down next?
Dec 13 07:40:14 ceph-tpa-server1 bash[1825010]: debug
2020-12-13T12:40:14.823+ 7ff37c2e1700 -1 osd.7 13375 heartbeat_check: no
reply from 172.18.189.20:68
rados ls -p ssdshop
outputs 20MB of lines without any bench prefix
...
rbd_data.d4993cc3c89825.74ec
rbd_data.d4993cc3c89825.1634
journal_data.83.d4993cc3c89825.333485
journal_data.83.d4993cc3c89825.380648
journal_data.83.d4993cc3c89825.503838
...
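A rough way to see how much of that listing is journal data versus actual image
data (object counts only, not sizes):

rados -p ssdshop ls | grep -c '^journal_data.'
rados -p ssdshop ls | grep -c '^rbd_data.'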
> On 13. Dec 2020, at 11:0
Any chance you might have orphaned `rados bench` objects? This happens more
often than one might think.
`rados ls > /tmp/out`
Inspect the result. You should see a few administrative objects and some header
and data objects for the RBD volume. If you see a zillion with names like
`bench*`, there’s your answer.