[ceph-users] ./install-deps.sh takes several hours

2023-03-31 Thread Arvid Picciani
Hi again, something is very wrong with my hardware it seems and I'm slowly turning insane. I'm trying to debug why Ceph has incredibly poor performance for us. We've got - 3 EPYC 7713 dual-CPU systems - datacenter NVMe drives (3 GB/s top) - 100G InfiniBand. Ceph does 800 MB/s read max, CPU is i

[ceph-users] Re: Unexpected slow read for HDD cluster (good write speed)

2023-03-27 Thread Arvid Picciani
Yes, during my last adventure of trying to get any reasonable performance out of Ceph, I realized my testing methodology was wrong. Both the kernel client and qemu have queues everywhere that make the numbers hard to understand. fio has rbd support, which gives more useful values. https://subscri
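A minimal fio invocation along those lines, assuming an existing test image named fio-test in the rbd pool and the default admin client (both names are just examples, not from the original mail), could look like:
> fio --name=rbd-read --ioengine=rbd --clientname=admin --pool=rbd --rbdname=fio-test --rw=randread --bs=4k --iodepth=32 --runtime=60 --time_based
This drives random 4k reads through librbd directly, so the numbers are not filtered through the kernel client or qemu queues.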

[ceph-users] Re: Almalinux 9

2023-03-27 Thread Arvid Picciani
On Rocky, which should be identical to Alma (?), I had to do this: https://almalinux.discourse.group/t/nothing-provides-python3-pecan-in-almalinux-9/2017/4 because the rpm has a broken dependency on pecan. But switching from Debian to the official Ceph rpm packages was worth it. The systemd unit

[ceph-users] handle_read_frame_preamble_main read frame preamble failed r=-1 ((1) Operation not permitted)

2023-03-14 Thread Arvid Picciani
Since Quincy I'm randomly getting authentication issues from clients to OSDs. The symptom is that qemu hangs, but when it happens, I can reproduce it using: > ceph tell osd.\* version Some - but only some - OSDs will never respond, and only to clients on _some_ hosts. The client gets stuck in a loop w
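One way to see where such a tell call gets stuck (a generic debugging aid, not something suggested in the original mail) is to raise the messenger debug level for a single call against one of the unresponsive OSDs; osd.7 below is just a placeholder id:
> ceph --debug-ms=1 tell osd.7 version
With debug_ms at 1 the client prints each message it sends and receives, which shows whether the connection to the OSD is even being established.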

[ceph-users] quincy: test cluster on nvme: fast write, slow read

2023-03-12 Thread Arvid Picciani
Hi, doing some lab tests to understand why Ceph isn't working for us, and here's the first puzzle. Setup: a completely fresh Quincy cluster, 64-core EPYC 7713, 2 NVMe drives. > ceph osd crush rule create-replicated osd default osd ssd > ceph osd pool create rbd replicated osd --size 2 > dd if=/d
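One way to compare the read and write paths on such a pool without qemu or a kernel mount in between is rbd bench against a scratch image (the image name below is only an example):
> rbd create rbd/bench-test --size 10G
> rbd bench --io-type write --io-size 4M --io-threads 16 --io-total 10G rbd/bench-test
> rbd bench --io-type read --io-size 4M --io-threads 16 --io-total 10G rbd/bench-test
Writing first fills the image, so the subsequent read pass actually touches allocated objects instead of returning zeros.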

[ceph-users] tools to debug librbd / qemu

2023-02-25 Thread Arvid Picciani
Heya, ever since we had that one OSD causing the entire cluster to hang (it's been removed since), we keep having hard-to-debug issues. For example, sometimes on start qemu just hangs forever. When I kill it manually, the next start works fine. When I map the same volume using krbd on another hos
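A generic way to get visibility into a hung librbd client (standard librbd options, assumed here rather than taken from the thread) is to give the qemu client a log and an admin socket via ceph.conf on the hypervisor:
[client]
    admin socket = /var/run/ceph/$cluster-$name.$pid.asok
    log file = /var/log/ceph/$cluster-$name.$pid.log
    debug rbd = 20
While the guest is hung, the outstanding RADOS operations of that client can then be dumped through the socket (the path contains the actual pid):
> ceph daemon /var/run/ceph/ceph-client.admin.12345.asok objecter_requests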

[ceph-users] forever stuck "slow ops" osd

2023-02-16 Thread Arvid Picciani
Hi, today our entire cluster froze - or anything that uses librbd, to be specific. Ceph version 16.2.10. The message that saved me was "256 slow ops, oldest one blocked for 2893 sec, osd.7 has slow ops", because it makes it immediately clear that this OSD is the issue. I stopped the OSD, which mad
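Once the offending OSD is identified like that, its stuck and recent slow operations can be dumped through the admin socket on the node running it (osd.7 taken from the health message above), which usually shows what the ops are waiting on:
> ceph daemon osd.7 dump_ops_in_flight
> ceph daemon osd.7 dump_historic_slow_ops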