Hi Jan,
w.r.t. osd.0 - if this is the only occurrence then I'd propose simply
redeploying the OSD. This looks like some BlueStore metadata
inconsistency which could have occurred long before the upgrade;
likely the upgrade just revealed the issue. And honestly I can hardly
imagine how to investigate it
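On a plain (non-containerized) deployment the redeploy would look
roughly like this - the OSD id and device path are just examples, and
a cephadm-managed cluster would do the equivalent via the orchestrator:

    # take the OSD out and stop it
    ceph osd out 0
    systemctl stop ceph-osd@0
    # destroy it, keeping the OSD id for reuse
    ceph osd destroy 0 --yes-i-really-mean-it
    # wipe the backing device and recreate the OSD on it
    ceph-volume lvm zap /dev/sdb --destroy
    ceph-volume lvm create --osd-id 0 --data /dev/sdb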
Hi Igor,
many thanks for advice!
I've tried to start osd.1 and it started; it's now
resynchronizing data.
I will start the daemons one by one.
What do you suggest for osd.0, which has a problem with
bluestore fsck? Is there a way to repair it?
Sincerely
Jan
On Tue, Jan 16, 2024 at 08:15:
Hi Jan,
I've just filed an upstream ticket for your case; see
https://tracker.ceph.com/issues/64053 for more details.
You might want to tune (or preferably just remove) your custom
bluestore_cache_.*_ratio settings to fix the issue.
The issue is reproducible in my lab and is fixable this way.
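For example (the option names below are the usual ones - adjust to
whatever you have actually overridden):

    # list any cache ratio overrides currently in effect
    ceph config dump | grep bluestore_cache
    # drop the custom overrides so the defaults apply again
    ceph config rm osd bluestore_cache_meta_ratio
    ceph config rm osd bluestore_cache_kv_ratio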
Hope this helps,
Hi Jan,
unfortunately this wasn't very helpful. Moreover, the log looks a bit
messy - it looks like a mixture of outputs from multiple running
instances or something. I'm not an expert in containerized setups,
though.
Could you please simplify things by running the ceph-osd process
manually, like this:
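    # a sketch - OSD id, log path and debug level are just examples
    # (runs the OSD in the foreground with its own log file)
    ceph-osd -f -i 1 --log-file /tmp/osd.1.log --debug-bluestore 5/20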
Hi Jan,
indeed this looks like some memory allocation problem - maybe the
OSD's RAM usage threshold was reached or something?
Curious whether you have any custom OSD settings, or perhaps memory
caps on the Ceph containers?
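You could check with something like this (osd_memory_target is the
standard memory knob; the OSD id is just an example):

    ceph config get osd.1 osd_memory_target
    ceph config show osd.1 | grep -i -e memory -e cache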
Could you please set debug_bluestore to 5/20 and debug_prioritycache to
10 and try again?
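These can be set persistently with something like:

    ceph config set osd.1 debug_bluestore 5/20
    ceph config set osd.1 debug_prioritycache 10

or passed for a single manual run via CEPH_ARGS on the command line.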
Hi Marek,
I haven't looked through those upgrade logs yet, but here are some
comments regarding the last OSD startup attempt.
First of all, answering your question:
_init_alloc::NCB::restore_allocator() failed! Run Full Recovery from ONodes
(might take a while)
Is it a mandatory part of fsck?
Thi
Hi Jan,
indeed the fsck logs for the OSDs other than osd.0 look good, so it
would be interesting to see the OSD startup logs for them - preferably
for multiple (e.g. 3-4) OSDs, to get the pattern.
The original upgrade log(s) would be nice to see as well.
You might want to use Google Drive or a
Hi Igor,
I've tried to start only osd.1, which seems to have fsck'd OK, but
it crashed :-(
I searched the logs and found that I have logs from 22.12.2023,
when I did the upgrade (I have logging set to journald).
Would you be interested in those logs? The file is 30 MB in
bzip2 format; how can I
Hi Jan,
may I see the fsck logs from all the failing OSDs, to look for a
pattern?
IIUC the whole node is suffering from the issue, right?
Thanks,
Igor
On 1/2/2024 10:53 AM, Jan Marek wrote:
Hello once again,
I've tried this:
export CEPH_ARGS="--log-file /tmp/osd.0.log --debug-bluestore 5/20"
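and then ran fsck roughly like this (the OSD data path below is the
default location - an assumption):

    ceph-bluestore-tool fsck --deep yes --path /var/lib/ceph/osd/ceph-0
    # if fsck reports repairable errors, a repair pass also exists:
    # ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-0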
Hi Jan,
this doesn't look like RocksDB corruption, but rather like some
BlueStore metadata inconsistency. Also, the assertion backtrace in the
new log looks completely different from the original one. So, in an
attempt to find any systematic pattern, I'd suggest running fsck with
verbose logging for e
Hi Jan,
IIUC the attached log is from ceph-kvstore-tool, right?
Can you please share the full OSD startup log as well?
Thanks,
Igor
On 12/27/2023 4:30 PM, Jan Marek wrote:
Hello,
I have a problem: my Ceph cluster (3x mon nodes, 6x osd nodes; every
osd node has 12 rotational disks and one NVMe device