[ceph-users] Re: Stuck in upgrade process to reef

2024-01-17 Thread Igor Fedotov
Hi Jan, w.r.t. osd.0 - if this is the only occurrence then I'd propose simply redeploying the OSD. This looks like some BlueStore metadata inconsistency which could have occurred long before the upgrade; likely the upgrade just revealed the issue. And honestly, I can hardly imagine how to investigate it…
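
For context, redeploying a single OSD on a cephadm-managed Reef cluster typically looks something like the sketch below; the OSD id and cephadm orchestration are assumptions, and data safety should be confirmed first.

    # Hypothetical sequence, assuming cephadm and that osd.0's PGs are fully replicated elsewhere
    ceph osd safe-to-destroy 0            # confirm the OSD can be removed without data loss
    ceph orch osd rm 0 --replace --zap    # drain, remove and zap the device; a replacement OSD is redeployed on it
    ceph orch ps --daemon-type osd        # watch for the replacement daemon coming up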

[ceph-users] Re: Stuck in upgrade process to reef

2024-01-17 Thread Jan Marek
Hi Igor, many thanks for the advice! I've tried to start osd.1 and it came up; it is now resynchronizing data. I will start the daemons one by one. What do you mean about osd.0, which has a problem with the BlueStore fsck? Is there a way to repair it? Sincerely, Jan On Tue, Jan 16, 2024 at 08:15…
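
For reference, an offline fsck/repair of a single OSD can be attempted roughly as sketched below; the OSD id and data path are assumptions, and under cephadm the path differs.

    systemctl stop ceph-osd@0                                     # or: ceph orch daemon stop osd.0
    ceph-bluestore-tool fsck   --path /var/lib/ceph/osd/ceph-0
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-0   # only if fsck reports repairable errors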

[ceph-users] Re: Stuck in upgrade process to reef

2024-01-16 Thread Igor Fedotov
Hi Jan, I've just filed an upstream ticket for your case; see https://tracker.ceph.com/issues/64053 for more details. You might want to tune (or preferably just remove) your custom bluestore_cache_.*_ratio settings to fix the issue. This is reproducible in my lab and fixable this way. Hop…
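
Assuming the ratios were set via the central config (the option names below are the common ones, not confirmed from the thread), removing them could look like this:

    ceph config dump | grep bluestore_cache        # see which cache options are actually customized
    ceph config rm osd bluestore_cache_meta_ratio
    ceph config rm osd bluestore_cache_kv_ratio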

[ceph-users] Re: Stuck in upgrade process to reef

2024-01-11 Thread Igor Fedotov
Hi Jan, unfortunately this wasn't very helpful. Moreover, the log looks a bit messy - it looks like a mixture of outputs from multiple running instances or something. I'm not an expert in containerized setups, though. Could you please simplify things by running the ceph-osd process manually, lik…
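
Igor's exact command is cut off above; a plausible sketch of running a single OSD in the foreground with its own log file would be:

    # assumed OSD id and log path; run on the host or inside a `cephadm shell`
    ceph-osd -f -i 1 --log-file /tmp/osd.1.log --debug-bluestore 5/20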

[ceph-users] Re: Stuck in upgrade process to reef

2024-01-10 Thread Igor Fedotov
Hi Jan, indeed this looks like some memory allocation problem - maybe the OSD's RAM usage threshold was reached or something? I'm curious whether you have any custom OSD settings or perhaps memory caps on the Ceph containers? Could you please set debug_bluestore to 5/20 and debug_prioritycache to 10 and t…
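
One way to apply the requested debug levels (the OSD id is an assumption) and to check the memory target against any container limit:

    ceph config set osd.0 debug_bluestore 5/20
    ceph config set osd.0 debug_prioritycache 10
    ceph config get osd.0 osd_memory_target       # compare against any container memory cap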

[ceph-users] Re: Stuck in upgrade process to reef

2024-01-09 Thread Igor Fedotov
Hi Marek, I haven't looked through those upgrade logs yet, but here are some comments regarding the last OSD startup attempt. First, answering your question about "_init_alloc::NCB::restore_allocator() failed! Run Full Recovery from ONodes (might take a while)" - is it a mandatory part of fsck? Thi…

[ceph-users] Re: Stuck in upgrade process to reef

2024-01-08 Thread Igor Fedotov
Hi Jan, indeed the fsck logs for the OSDs other than osd.0 look good, so it would be interesting to see the OSD startup logs for them - preferably for multiple (e.g. 3-4) OSDs, to get the pattern. The original upgrade log(s) would be nice to see as well. You might want to use Google Drive or a…
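
On a cephadm/journald setup, per-OSD startup logs can be captured roughly like this (daemon names are assumptions):

    cephadm logs --name osd.1 > osd.1.startup.log
    cephadm logs --name osd.2 > osd.2.startup.log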

[ceph-users] Re: Stuck in upgrade process to reef

2024-01-04 Thread Jan Marek
Hi Igor, I've tried to start only osd.1, which seems to have passed fsck OK, but it crashed :-( I searched the logs and found that I have logs from 22.12.2023, when I did the upgrade (I have logging set to journald). Would you be interested in those logs? The file is 30 MB in bzip2 format; how can I…
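
A possible way to carve just the upgrade-day log out of journald and compress it (unit name and date range are assumptions; under cephadm the unit is ceph-<fsid>@osd.N):

    journalctl -u ceph-osd@1 --since "2023-12-22" --until "2023-12-23" | bzip2 > osd.1-upgrade.log.bz2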

[ceph-users] Re: Stuck in upgrade process to reef

2024-01-04 Thread Igor Fedotov
Hi Jan, may I see the fsck logs from all the failing OSDs, to look for a pattern? IIUC the whole node is suffering from the issue, right? Thanks, Igor On 1/2/2024 10:53 AM, Jan Marek wrote: Hello once again, I've tried this: export CEPH_ARGS="--log-file /tmp/osd.0.log --debug-bluestore 5/20"…
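
The quoted command is truncated; presumably the export was followed by a fsck run along these lines (path assumed):

    export CEPH_ARGS="--log-file /tmp/osd.0.log --debug-bluestore 5/20"
    ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-0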

[ceph-users] Re: Stuck in upgrade process to reef

2023-12-30 Thread Igor Fedotov
Hi Jan, this doesn't look like RocksDB corruption, but rather like some BlueStore metadata inconsistency. Also, the assertion backtrace in the new log looks completely different from the original one. So, in an attempt to find any systematic pattern, I'd suggest running fsck with verbose logging for e…
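
A loop for collecting verbose fsck logs from every OSD on the node could look roughly like this (ids, paths and log options are assumptions):

    for id in 0 1 2 3; do
        ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-$id \
            --log-file /tmp/osd.$id.fsck.log --log-level 20
    done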

[ceph-users] Re: Stuck in upgrade process to reef

2023-12-27 Thread Igor Fedotov
Hi Jan, IIUC the attached log is from ceph-kvstore-tool, right? Can you please share the full OSD startup log as well? Thanks, Igor On 12/27/2023 4:30 PM, Jan Marek wrote: Hello, I have a problem: my Ceph cluster (3x mon nodes, 6x OSD nodes; every OSD node has 12 rotational disks and one NVMe devic…
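
For reference, a typical ceph-kvstore-tool invocation against a stopped OSD's BlueStore-backed RocksDB looks like this (path assumed):

    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-0 stats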