[ceph-users] Re: MDS replay questions

2021-10-06 Thread Daniel Persson
Hi Brian, I'm not sure if it applies to your application, and I'm not an expert. However, we have been running our solution for about a year now, and we have one of our MDSs in standby-replay. Sadly we have found a bug with extensive memory usage, and when we needed to replay, it took up to a min
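For readers following the thread: standby-replay is enabled per filesystem; a minimal sketch of the relevant commands, assuming a filesystem named cephfs:

    # let a standby daemon continuously replay the active MDS journal
    ceph fs set cephfs allow_standby_replay true
    # check which daemon is acting as standby-replay
    ceph fs status cephfs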

[ceph-users] Re: CEPH 16.2.x: disappointing I/O performance

2021-10-06 Thread Zakhar Kirpichenko
Hi, Thanks for this! As I mentioned in my original message, the latency is rather low, under 0.15 ms RTT. HDD write caches are disabled (I disabled them when setting the cluster up and verified just now with sdparm). /Z On Wed, Oct 6, 2021 at 9:18 AM Christian Wuerdig < christian.wuer...@gmail.
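For reference, a hedged sketch of checking and clearing a drive's write cache with sdparm; the device name is a placeholder:

    # show whether the write cache enable (WCE) bit is set
    sdparm --get=WCE /dev/sdX
    # clear the volatile write cache and persist the setting
    sdparm --clear=WCE --save /dev/sdX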

[ceph-users] Re: CEPH 16.2.x: disappointing I/O performance

2021-10-06 Thread Zakhar Kirpichenko
Hi, Indeed, that's a lot of CPU and RAM, the idea was to put sufficient resources in case we want to expand the nodes with more storage and do EC. I guess having excessive resources shouldn't hurt performance? :-) /Z On Wed, Oct 6, 2021 at 9:26 AM Stefan Kooman wrote: > On 10/5/21 17:06, Zakha

[ceph-users] Re: CEPH 16.2.x: disappointing I/O performance

2021-10-06 Thread Anthony D'Atri
> I guess having excessive resources shouldn't hurt performance? :-) You’d think so — but I’ve seen a situation where it seemed to. Dedicated mon nodes with dual CPUs far in excess of what they needed. C-state flapping appeared to negatively impact the NIC driver and network (and mon) perfor
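Not part of the original thread, but as an illustration of one common way to keep CPUs out of deep C-states, assuming an Intel platform and a RHEL-style system with tuned available; a sketch, not a recommendation:

    # apply a low-latency profile that caps C-states via cpu_dma_latency
    tuned-adm profile latency-performance
    # or limit idle states on the kernel command line (requires a reboot):
    #   intel_idle.max_cstate=1 processor.max_cstate=1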

[ceph-users] Re: 1 MDS report slow metadata IOs

2021-10-06 Thread Eugen Block
> what is causing the slow MDS metadata IOs? Your flapping OSDs. > currently, there are 2 MDS and 3 monitors deployed .. would it help to have just one MDS and one monitor? No, you need to figure out why your OSDs crash. More details about your setup (ceph version, deployment method, hardware res
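As a first pass at diagnosing flapping OSDs, a minimal sketch of the usual commands, assuming a systemd-based deployment (unit names differ under cephadm) and osd.X as a placeholder:

    ceph health detail                    # which OSDs are flapping and what the cluster complains about
    ceph crash ls                         # recent daemon crashes recorded by the crash module
    ceph osd tree | grep -w down          # which OSDs are currently down
    journalctl -u ceph-osd@X -r           # OSD log on the affected host, newest entries first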

[ceph-users] Re: CEPH 16.2.x: disappointing I/O performance

2021-10-06 Thread Zakhar Kirpichenko
I've initially disabled power-saving features, which nicely improved the network latency. Btw, the first interesting find: I enabled 'rbd_balance_parent_reads' on the clients, and single-thread reads now scale much better, I routinely get similar readings from a single disk doing 4k reads with 1 t
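For context, a hedged sketch of how a client-side librbd option like this can be set; whether clients pick it up from the mon config store depends on the client version, so treat the exact mechanism as an assumption:

    # centrally, for librbd clients that read the mon config store
    ceph config set client rbd_balance_parent_reads true
    # or per client, in the [client] section of ceph.conf:
    #   rbd balance parent reads = true

Either way, existing clients typically need to be restarted or have their images remapped to pick up the change.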

[ceph-users] Re: Leader election loop reappears

2021-10-06 Thread Manuel Holtgrewe
Dear all, again answering my own emails... It turns out that the connection resets are not problematic. I took the liberty to document this here in the tracker in the hope that users with similar issues can find my results. https://tracker.ceph.com/issues/52825 As for my original issue: As descr

[ceph-users] Re: CEPH 16.2.x: disappointing I/O performance

2021-10-06 Thread Janne Johansson
Den ons 6 okt. 2021 kl 10:11 skrev Zakhar Kirpichenko : > > I've initially disabled power-saving features, which nicely improved the > network latency. > > Btw, the first interesting find: I enabled 'rbd_balance_parent_reads' on > the clients, and single-thread reads now scale much better, I routin

[ceph-users] Re: Orchestrator is internally ignoring applying a spec against SSDs, apparently determining they're rotational.

2021-10-06 Thread Eugen Block
You should either zap the devices with ceph orch device zap my_hostname my_path --force or with ceph-volume directly on that host: cephadm ceph-volume lvm zap --destroy /dev/sdX IIRC there's a backup of the partition table at the end of the partition. I would expect ceph-volume to identify t
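Spelled out as a block for readability, with placeholder host and device names:

    # zap via the orchestrator
    ceph orch device zap my_hostname /dev/sdX --force
    # or with ceph-volume directly on that host
    cephadm ceph-volume lvm zap --destroy /dev/sdX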

[ceph-users] Re: CEPH 16.2.x: disappointing I/O performance

2021-10-06 Thread Zakhar Kirpichenko
That's true, but we do intend to scale this cluster as necessary. Some new nodes are already being prepared, the only thing I'm really worried about is further expansion: we're now getting 3+ months of lead time on some components and vendors suggest that it will get worse :-( /Z On Wed, Oct 6, 2

[ceph-users] Re: CEPH 16.2.x: disappointing I/O performance

2021-10-06 Thread Zakhar Kirpichenko
These are valid points, thank you for the input! /Z On Wed, Oct 6, 2021 at 11:39 AM Stefan Kooman wrote: > On 10/6/21 09:23, Zakhar Kirpichenko wrote: > > Hi, > > > > Indeed, that's a lot of CPU and RAM, the idea was to put sufficient > > resources in case we want to expand the nodes with more

[ceph-users] bluefs _allocate unable to allocate

2021-10-06 Thread José H . Freidhof
Hello everyone, we have a running Ceph Pacific 16.2.5 cluster and I found these messages in the service logs of the OSD daemons. We have three OSD nodes; each node has 20 OSDs as BlueStore with NVMe/SSD/HDD. Is this a bug, or do I maybe have some settings wrong? cd88-ceph-osdh-01 bash[6283]: debug 2

[ceph-users] Re: bluefs _allocate unable to allocate

2021-10-06 Thread Igor Fedotov
Hey Jose, it looks like your WAL volume is out of space, which looks weird given its capacity = 48 GB. Could you please share the output of the following commands: "ceph daemon osd.N bluestore bluefs device info" and "ceph daemon osd.N bluefs stats"? Thanks, Igor On 10/6/2021 12:24 PM, José H. Fre
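The same commands as a block, assuming osd.N is replaced with the affected OSD id and the commands are run where the OSD's admin socket is reachable (for a containerized deployment, e.g. inside 'cephadm enter --name osd.N'):

    # layout and usage of the bluefs devices (WAL/DB/slow)
    ceph daemon osd.N bluestore bluefs device info
    # per-device space usage inside bluefs
    ceph daemon osd.N bluefs stats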

[ceph-users] Re: CEPH 16.2.x: disappointing I/O performance

2021-10-06 Thread Stefan Kooman
On 10/5/21 17:06, Zakhar Kirpichenko wrote: Hi, I built a CEPH 16.2.x cluster with relatively fast and modern hardware, and its performance is kind of disappointing. I would very much appreciate an advice and/or pointers :-) The hardware is 3 x Supermicro SSG-6029P nodes, each equipped with: 2

[ceph-users] Re: bluefs _allocate unable to allocate

2021-10-06 Thread José H . Freidhof
Hello Igor, yes, the volume is NVMe; the WAL partitions for the BlueStore device groups are only 48 GB each. On each OSD node there is 1 NVMe of 1 TB split into 20 LVs of 48 GB (WAL); 4 SSDs of 1 TB, each split into 5 LVs of 175 GB (RocksDB DB); and 20 HDDs of 5.5 TB, each with 1 LV
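For readers unfamiliar with that layout, a hedged sketch of how one such OSD could be created with ceph-volume; the VG/LV names are hypothetical:

    # one HDD as data, one SSD LV for the RocksDB DB, one NVMe LV for the WAL
    ceph-volume lvm create --bluestore \
        --data /dev/sdX \
        --block.db vg_ssd/db_lv_1 \
        --block.wal vg_nvme/wal_lv_1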

[ceph-users] Re: MDS replay questions

2021-10-06 Thread Stefan Kooman
On 10/6/21 00:06, Brian Kim wrote: Dear ceph-users, We have a ceph cluster with 3 MDS's and recently had to replay our cache which is taking an extremely long time to complete. Is there some way to speed up this process as well as apply some checkpoint so it doesn't have to start all the way fro

[ceph-users] Re: bluefs _allocate unable to allocate

2021-10-06 Thread Igor Fedotov
Jose, In fact 48 GB is way too much for a WAL drive - usually the write-ahead log tends to be 2-4 GB. But in your case it's ~150 GB, while the DB itself is very small (146 MB!!!):
WAL    45 GiB    111 GiB    0 B    0 B    0 B    154 GiB    2400
DB     0 B       164 Mi

[ceph-users] Re: bluefs _allocate unable to allocate

2021-10-06 Thread José H . Freidhof
Hi Igor, yes, I have some OSD settings set :-) here is my ceph config dump. Those settings are from a Red Hat document for BlueStore devices. Maybe it is that setting causing this problem? "advanced mon_compact_on_trim false"??? I will test it this afternoon... at the moment everything is semi

[ceph-users] Re: bluefs _allocate unable to allocate

2021-10-06 Thread Igor Fedotov
On 10/6/2021 2:16 PM, José H. Freidhof wrote: Hi Igor, yes, I have some OSD settings set :-) here is my ceph config dump. Those settings are from a Red Hat document for BlueStore devices. Maybe it is that setting causing this problem? "advanced mon_compact_on_trim false"??? OMG!!! No - mon

[ceph-users] Re: CEPH 16.2.x: disappointing I/O performance

2021-10-06 Thread Stefan Kooman
On 10/6/21 09:23, Zakhar Kirpichenko wrote: Hi, Indeed, that's a lot of CPU and RAM, the idea was to put sufficient resources in case we want to expand the nodes with more storage and do EC. I guess having excessive resources shouldn't hurt performance? :-) That was also my take. Until an (

[ceph-users] Re: bluefs _allocate unable to allocate

2021-10-06 Thread José H . Freidhof
hi, no risk no fun 😂 okay, I have reset the settings you mentioned to their defaults. What exactly do you mean by taking the OSD offline? ceph orch daemon stop osd.2? Or mark it down? And for the command, which path do I use? You mean: bluestore-kv /var/lib/ceph/$fsid/osd.2 compact??? Igor Fedotov wrote on

[ceph-users] Re: bluefs _allocate unable to allocate

2021-10-06 Thread Igor Fedotov
On 10/6/2021 4:25 PM, José H. Freidhof wrote: hi, no risk no fun 😂 okay, I have reset the settings you mentioned to their defaults. What exactly do you mean by taking the OSD offline? ceph orch daemon stop osd.2? Or mark it down? "daemon stop" is enough. You might want to set the noout flag before that t
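Pieced together from this thread, a sketch of the offline compaction sequence being discussed; the data path is the one José quoted and depends on the deployment, so treat both the path and the exact tooling as assumptions:

    ceph osd set noout                     # avoid rebalancing while the OSD is briefly down
    ceph orch daemon stop osd.2
    # compact the RocksDB of the stopped OSD
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/$fsid/osd.2 compact
    ceph orch daemon start osd.2
    ceph osd unset noout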

[ceph-users] Cephfs + inotify

2021-10-06 Thread nORKy
Hi, inotify does not work with CephFS. How can I make inotify work, or build an alternative in my C program? Thank you, Joffrey ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: bluefs _allocate unable to allocate

2021-10-06 Thread José H . Freidhof
Hi Igor, today I repaired one OSD node and all OSDs on the node, creating them anew. After that I waited for the rebalance/recovery process, and the cluster was healthy after some hours. I noticed that osd.2 no longer shows this error in the log, but I now noticed it on the sa

[ceph-users] data objects lost but RGW objects still listed

2021-10-06 Thread Mara Sophie Grosch
Hi, some weeks ago I had a mix of operator error and suspected rook bug (not sure on that, other topic) leading to the loss of quite some objects on my two EC pools - which are the data pools for a RGW zone/realm/whatever each. The problem I'm currently facing is RGW still listing the lost objec
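Not part of the original message, but one common starting point for bucket index entries that point at missing data is the bucket check subcommand; a sketch with a placeholder bucket name, and no guarantee it handles objects lost from an EC data pool:

    # compare the bucket index with the backing RADOS objects and report problems
    radosgw-admin bucket check --bucket=mybucket --check-objects
    # the same, additionally attempting to repair the index
    radosgw-admin bucket check --bucket=mybucket --check-objects --fix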

[ceph-users] Cache tiers hit_set values

2021-10-06 Thread Daniel Persson
Hi everyone. I added a lot more storage to our cluster, and we now have a lot of slower hard drives that could contain archival data. So I thought setting up a cache tier for the fast drives should be a good idea. We want to retain data for about a week in the cache pool as the data could be inte
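For orientation, a hedged sketch of the hit_set and age knobs involved, assuming a cache pool named cache_pool is already layered over the slow pool; the concrete values are only illustrative choices for a roughly one-week retention:

    ceph osd pool set cache_pool hit_set_type bloom
    ceph osd pool set cache_pool hit_set_period 43200         # 12 hours per hit set
    ceph osd pool set cache_pool hit_set_count 14             # 14 x 12 h = about one week of history
    ceph osd pool set cache_pool cache_min_flush_age 604800   # keep objects at least ~1 week before flushing
    ceph osd pool set cache_pool cache_min_evict_age 604800   # keep objects at least ~1 week before evicting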

[ceph-users] Re: MDS: corrupted header/values: decode past end of struct encoding: Malformed input

2021-10-06 Thread Stefan Kooman
On 10/6/21 15:54, von Hoesslin, Volker wrote: okay, I ran this command: rados rm -p cephfs_metadata mds0_openfiles.0, started the MDS daemons, and the "Malformed input" error seems to be fixed. I have 3 MDS daemons, two of them are running in "standby" mode and the active one (pve04) is restarting

[ceph-users] Re: mds openfiles table shards

2021-10-06 Thread Stefan Kooman
On 1/21/21 16:51, Dan van der Ster wrote: Hi all, During rejoin an MDS can sometimes go OOM if the openfiles table is too large. The workaround has been described by ceph devs as "rados rm -p cephfs_metadata mds0_openfiles.0". On our cluster we have several such objects for rank 0: mds0_openfi
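A sketch of the workaround being referenced, assuming the rank 0 MDS is down or stuck in rejoin and the metadata pool is named cephfs_metadata; the open file table is only a hint, which is why removing it is considered safe, but check the documentation for your release first:

    # list the open-file table shards for rank 0
    rados -p cephfs_metadata ls | grep '^mds0_openfiles'
    # remove the shards, e.g. the first two
    rados -p cephfs_metadata rm mds0_openfiles.0
    rados -p cephfs_metadata rm mds0_openfiles.1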

[ceph-users] Re: CEPH 16.2.x: disappointing I/O performance

2021-10-06 Thread Fyodor Ustinov
Hi! > Btw, the first interesting find: I enabled 'rbd_balance_parent_reads' on > the clients, and single-thread reads now scale much better, I routinely get > similar readings from a single disk doing 4k reads with 1 thread: It seems to me that this function should not give any gain in "real" loa

[ceph-users] Multi-MDS CephFS upgrades limitation

2021-10-06 Thread Bryan Stillwell
One of the main limitations of using CephFS is the requirement to reduce the number of active MDS daemons to one during upgrades. As far as I can tell this has been a known problem since Luminous (~2017). This issue essentially requires downtime during upgrades for any CephFS cluster that needs m
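For reference, the usually documented sequence around this limitation looks roughly like the sketch below, assuming a filesystem named cephfs that normally runs with two active MDS daemons:

    # before upgrading the MDS daemons: shrink to a single active MDS
    ceph fs set cephfs max_mds 1
    # wait until only rank 0 remains active
    ceph fs status cephfs
    # ... upgrade and restart the MDS daemons ...
    # afterwards, restore the original number of active MDS daemons
    ceph fs set cephfs max_mds 2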

[ceph-users] Re: CEPH 16.2.x: disappointing I/O performance

2021-10-06 Thread Zakhar Kirpichenko
Hi, I tried experimenting with the RBD striping feature:
rbd image 'volume-bd873c3f-c8c7-4270-81f8-951f65fc860c':
        size 50 GiB in 12800 objects
        order 22 (4 MiB objects)
        snapshot_count: 0
        id: 187607b9ebf28a
        block_name_prefix: rbd_data.187607b9ebf28a
        format: 2
        features: layering, striping, exclusive-loc
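For anyone reproducing this, a sketch of how an image with such striping can be created; the pool name, image name and stripe parameters are placeholders:

    # 4 MiB objects, striped in 64 KiB units across 8 objects
    rbd create volumes/stripetest --size 50G --object-size 4M --stripe-unit 64K --stripe-count 8
    rbd info volumes/stripetest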