[ceph-users] Grafana host overview -- "no data"?

2022-05-12 Thread Harry G. Coin
I've a 'healthy' cluster with a dashboard where Grafana correctly reports the number of OSDs on a host and the correct raw capacity -- and 'no data' for any time period, for any of the OSDs (dockerized Quincy). Meanwhile the top-level cluster dashboard reports reasonable client throughput …
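The host panels in that dashboard are driven by per-OSD series from the mgr Prometheus module, so a first check is whether those series are being exported at all. A minimal sketch in Python, assuming the module listens on its default port 9283; the mgr hostname is a placeholder, and the metric names are the ones recent releases export (adjust if yours differ):

# Minimal sketch: verify that the ceph-mgr Prometheus module is actually
# exporting per-OSD metrics (the Grafana host panels have nothing to draw
# if these series are missing). MGR_HOST is a placeholder.
import urllib.request

MGR_HOST = "mgr-host.example.com"   # hypothetical hostname -- replace with yours
URL = f"http://{MGR_HOST}:9283/metrics"

with urllib.request.urlopen(URL, timeout=10) as resp:
    metrics = resp.read().decode("utf-8", errors="replace")

# Count the OSD-level series the dashboards typically query.
for prefix in ("ceph_osd_op_w", "ceph_osd_op_r", "ceph_osd_stat_bytes"):
    hits = [line for line in metrics.splitlines()
            if line.startswith(prefix) and not line.startswith("#")]
    print(f"{prefix}: {len(hits)} series")

# Zero series here points at the exporter; plenty of series but 'no data'
# in Grafana points at the Prometheus scrape config or datasource instead.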

[ceph-users] Re: The last 15 'degraded' items take as many hours as the first 15K?

2022-05-12 Thread Harry G. Coin
On 5/12/22 02:05, Janne Johansson wrote: On Thu, 12 May 2022 at 00:03, Harry G. Coin wrote: Might someone explain why the count of degraded items can drop by thousands, sometimes tens of thousands, in the same number of hours it takes to go from 10 to 0? For example, when an OSD or a host with a few OSDs …
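When only a handful of degraded objects remain, it helps to see which placement groups they live in. A rough sketch, assuming the ceph CLI is available on the host and using the JSON layout of recent `ceph pg dump` output (older releases put the stats at the top level, which the code also handles):

# Rough sketch: list the PGs that still carry degraded objects, to see
# where the long tail of recovery is sitting.
import json
import subprocess

raw = subprocess.run(["ceph", "pg", "dump", "--format", "json"],
                     capture_output=True, text=True, check=True).stdout
dump = json.loads(raw)
# Newer releases wrap the stats in "pg_map"; older ones do not.
pg_stats = dump.get("pg_map", dump).get("pg_stats", [])

for pg in pg_stats:
    degraded = pg.get("stat_sum", {}).get("num_objects_degraded", 0)
    if degraded > 0:
        print(f"{pg['pgid']:>8}  {degraded:>6} degraded  state={pg['state']}")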

[ceph-users] Re: How much IOPS can be expected on NVME OSDs

2022-05-12 Thread Mark Nelson
Hi Felix, Those are pretty good drives and shouldn't have too much trouble with O_DSYNC writes, which can often be a bottleneck for lower-end NVMe drives. Usually, if the drives are fast enough, it comes down to clock speed and cores. Clock speed helps the kv sync thread write metadata to the …
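The O_DSYNC behaviour Mark mentions is easy to measure directly with fio: 4k synchronous writes at queue depth 1, roughly what the bluestore WAL does. A sketch that shells out to fio, assuming fio is installed; the device path is a placeholder and the test overwrites whatever is on it:

# Sketch of a 4k sync-write test at queue depth 1. WARNING: this writes to
# the device, so only point it at a disk whose contents you can destroy.
import json
import subprocess

DEVICE = "/dev/nvme0n1"   # placeholder -- data on this device will be overwritten

result = subprocess.run(
    ["fio", "--name=sync-write", f"--filename={DEVICE}",
     "--ioengine=libaio", "--direct=1", "--sync=1",
     "--rw=randwrite", "--bs=4k", "--iodepth=1", "--numjobs=1",
     "--time_based", "--runtime=30", "--output-format=json"],
    capture_output=True, text=True, check=True)

job = json.loads(result.stdout)["jobs"][0]
print(f"sync 4k write IOPS: {job['write']['iops']:.0f}")

# A low number here, relative to the vendor's plain random-write spec, is the
# classic sign of a drive that struggles with flush/sync writes.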

[ceph-users] Re: rbd mirroring - journal growing and snapshot high io load

2022-05-12 Thread Arthur Outhenin-Chalandre
On 5/12/22 14:31, ronny.lippold wrote: > many thanks, we will check the slides ... they look great > ok, you mean the journal grows because replication is too slow? > strange ... i thought our cluster was not that big ... but ok. > so we cannot use the journal ... > maybe someone else has the same result? …

[ceph-users] Re: rbd mirroring - journal growing and snapshot high io load

2022-05-12 Thread ronny.lippold
many thanks, we will check the slides ... they look great. ok, you mean the journal grows because replication is too slow? strange ... i thought our cluster was not that big ... but ok. so we cannot use the journal ... maybe someone else has the same result? If you want a bit more details on this …
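If the journal really cannot keep up, the usual fallback on Octopus and later is snapshot-based mirroring on a schedule. A minimal sketch of the switch for one image, assuming the rbd CLI, a pool whose mirror mode is 'image', and purely illustrative pool/image names:

# Minimal sketch: move one image from journal- to snapshot-based mirroring
# and put the pool on a 10-minute mirror-snapshot schedule.
import subprocess

POOL = "vm-pool"          # hypothetical pool name
IMAGE = "vm-100-disk-0"   # hypothetical image name

def rbd(*args):
    """Run an rbd command and return its stdout."""
    return subprocess.run(["rbd", *args], capture_output=True,
                          text=True, check=True).stdout

# Switch the image's mirroring mode (this triggers a fresh sync on the peer).
rbd("mirror", "image", "disable", f"{POOL}/{IMAGE}")
rbd("mirror", "image", "enable", f"{POOL}/{IMAGE}", "snapshot")

# Take a mirror snapshot every 10 minutes for the whole pool.
rbd("mirror", "snapshot", "schedule", "add", "--pool", POOL, "10m")
print(rbd("mirror", "snapshot", "schedule", "ls", "--pool", POOL))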

[ceph-users] Re: MDS rejects clients causing hanging mountpoint on linux kernel client

2022-05-12 Thread Esther Accion
Hi, Sorry to revive this old thread, but we are seeing the same problem reported by Florian and Dan. In our case, the MDSs are running 14.2.21 and the clients have kernel 3.10.0-1160.62.1.el7.x86_64. Did you manage to solve this issue in el7 kernels? Thanks! Esther On Mon, 8 Feb 2021 at 11: …

[ceph-users] Re: rbd mirroring - journal growing and snapshot high io load

2022-05-12 Thread Arthur Outhenin-Chalandre
On 5/12/22 13:25, ronny.lippold wrote: > hi arthur and thanks for answering, > On 2022-05-12 13:06, Arthur Outhenin-Chalandre wrote: >> Hi Ronny >> Yes, according to my tests we were not able to get a good replication speed on a single image (I think it was 30Mb/s per image, something like that). …
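A quick back-of-the-envelope check of that point: whatever a busy image writes above the per-image replay speed has to accumulate in its journal. A tiny illustration, where the replay figure is the rough per-image number quoted in the thread and the client write rate is an assumed example value:

# Illustration only: journal backlog grows by (client write rate - replay rate).
replay_mb_s = 30          # approx. per-image replay speed mentioned in the thread
client_write_mb_s = 80    # hypothetical sustained write rate of one busy VM

backlog_mb_per_hour = max(0, client_write_mb_s - replay_mb_s) * 3600
print(f"journal grows by roughly {backlog_mb_per_hour / 1000:.0f} GB per hour")

# Anything written above the replay speed accumulates; only an idle period
# (or switching to snapshot-based mirroring) lets the journal drain again.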

[ceph-users] Re: rbd mirroring - journal growing and snapshot high io load

2022-05-12 Thread ronny.lippold
hi arthur and thanks for answering, On 2022-05-12 13:06, Arthur Outhenin-Chalandre wrote: Hi Ronny. Yes, according to my tests we were not able to get a good replication speed on a single image (I think it was 30Mb/s per image, something like that). So you probably have a few images that write …

[ceph-users] Re: rbd mirroring - journal growing and snapshot high io load

2022-05-12 Thread Arthur Outhenin-Chalandre
Hi Ronny, On 5/12/22 12:47, ronny.lippold wrote: > hi to all here > we tried a lot and now we need your help ... > we are using 5 Proxmox 7.2-3 servers with kernel 5.15.30-2-pve and Ceph 16.2.7. > per server, we use 9 OSDs (8x 2 TB, 1x 8 TB, both SAS SSD, connected via a SAS HBA) > the second cluster …

[ceph-users] How much IOPS can be expected on NVME OSDs

2022-05-12 Thread Stolte, Felix
Hi guys, we recently got new hardware with NVMe disks (Samsung MZPLL3T2HAJQ) and I am trying to figure out how to get the most out of them. The vendor states 180k IOPS for 4k random writes and my fio testing showed 160k (fine by me). I built a bluestore OSD on top of that (WAL, DB, and data all on the same disk) …
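Besides fio against the raw device, the OSD itself can be benchmarked through the full bluestore stack with `ceph tell osd.<id> bench`. A small sketch, assuming the ceph CLI and an example OSD id; the 12288000/4096 arguments (about 12 MB in 4 KiB writes) stay under the default cap the OSD places on small-block benchmarks, and the JSON field names are those of recent releases:

# Sketch: write 4 KiB objects through the OSD and report the resulting IOPS.
import json
import subprocess

OSD_ID = 0   # example OSD id

out = subprocess.run(
    ["ceph", "tell", f"osd.{OSD_ID}", "bench", "12288000", "4096",
     "--format", "json"],
    capture_output=True, text=True, check=True).stdout
res = json.loads(out)

iops = res["bytes_written"] / res["blocksize"] / res["elapsed_sec"]
print(f"osd.{OSD_ID}: {iops:.0f} IOPS at {res['blocksize']}-byte writes")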

[ceph-users] rbd mirroring - journal growing and snapshot high io load

2022-05-12 Thread ronny.lippold
hi to all here, we tried a lot and now we need your help ... we are using 5 Proxmox 7.2-3 servers with kernel 5.15.30-2-pve and Ceph 16.2.7. per server, we use 9 OSDs (8x 2 TB, 1x 8 TB, both SAS SSD, connected via a SAS HBA). the second cluster for replication is the same hardware. at first, we tried …
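To see which images are actually falling behind, the per-image mirroring status is usually the quickest indicator (in journal mode the description typically carries an entries_behind_primary counter). A sketch assuming the rbd CLI and a placeholder pool name; the JSON field names may differ slightly between releases:

# Sketch: dump the mirroring status of every image in a pool and print
# which ones are lagging according to their status description.
import json
import subprocess

POOL = "vm-pool"   # hypothetical pool name

out = subprocess.run(
    ["rbd", "mirror", "pool", "status", POOL, "--verbose", "--format", "json"],
    capture_output=True, text=True, check=True).stdout
status = json.loads(out)

print("summary:", status.get("summary", {}))
for img in status.get("images", []):
    print(f"{img['name']:<30} {img['state']:<20} {img.get('description', '')}")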

[ceph-users] libceph in kernel stack trace prior to ceph client's crash

2022-05-12 Thread Alejo Aragon
Hi list, We have some Ceph clients that reboot intermittently. We always see this stack dump from dmesg prior to the hosts rebooting: Jan 10 06:52:33 xhostnamex kernel: [38386170.332063] [ cut here ] Jan 10 06:52:33 xhostnamex kernel: [38386170.3…
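If journald keeps persistent logs, the kernel messages from the boot that ended in the reboot can be pulled back afterwards and filtered for the libceph trace. A small sketch, assuming systemd-journald with persistent storage (otherwise the previous boot's log is not retained):

# Sketch: print libceph/ceph kernel messages (with a little context) from
# the previous boot, i.e. the one that ended in the unexpected reboot.
import subprocess

log = subprocess.run(["journalctl", "-k", "-b", "-1", "--no-pager"],
                     capture_output=True, text=True, check=True).stdout

lines = log.splitlines()
for i, line in enumerate(lines):
    if "cut here" in line or "libceph" in line or "ceph:" in line:
        for ctx in lines[max(0, i - 2): i + 3]:
            print(ctx)
        print("-" * 60)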

[ceph-users] Re: The last 15 'degraded' items take as many hours as the first 15K?

2022-05-12 Thread Janne Johansson
On Thu, 12 May 2022 at 00:03, Harry G. Coin wrote: > Might someone explain why the count of degraded items can drop by thousands, sometimes tens of thousands, in the same number of hours it takes to go from 10 to 0? For example, when an OSD or a host with a few OSDs goes offline for a while, …
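One way to make that long tail visible is simply to log the degraded-object count over time. A throwaway sketch polling `ceph status`, using the JSON field names of recent releases (the counters disappear from pgmap once recovery finishes):

# Sketch: log the degraded-object count once a minute until recovery is done.
import json
import subprocess
import time

while True:
    raw = subprocess.run(["ceph", "status", "--format", "json"],
                         capture_output=True, text=True, check=True).stdout
    pgmap = json.loads(raw).get("pgmap", {})
    degraded = pgmap.get("degraded_objects", 0)
    total = pgmap.get("degraded_total", 0)
    print(f"{time.strftime('%H:%M:%S')}  degraded {degraded}/{total}")
    if degraded == 0:
        break
    time.sleep(60)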