Hi Brian
I'm not sure if it applies to your application, and I'm not an expert.
However, we have been running our solution for about a year now, and we
have one of our MDS's in standby-replay.
Sadly, we found a bug with excessive memory usage, and when we needed
to replay, it took up to a min
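For context, standby-replay is enabled per filesystem; a minimal sketch, assuming a filesystem named "cephfs":

ceph fs set cephfs allow_standby_replay true   # a standby MDS continuously follows the active rank's journal

With it enabled, a standby-replay daemon can usually take over much faster than a cold standby.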
Hi,
Thanks for this!
As I mentioned in my original message, the latency is rather low, under
0.15 ms RTT. HDD write caches are disabled (I disabled them when setting
the cluster up and verified just now with sdparm).
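For reference, checking and disabling the cache with sdparm looks roughly like this (/dev/sdX is a placeholder for each data disk):

sdparm --get=WCE /dev/sdX     # WCE = 0 means the volatile write cache is off
sdparm --clear=WCE /dev/sdX   # turn the write cache off if it is still enabled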
/Z
On Wed, Oct 6, 2021 at 9:18 AM Christian Wuerdig <
christian.wuer...@gmail.
Hi,
Indeed, that's a lot of CPU and RAM, the idea was to put sufficient
resources in case we want to expand the nodes with more storage and do EC.
I guess having excessive resources shouldn't hurt performance? :-)
/Z
On Wed, Oct 6, 2021 at 9:26 AM Stefan Kooman wrote:
> On 10/5/21 17:06, Zakha
> I guess having excessive resources shouldn't hurt performance? :-)
You’d think so — but I’ve seen a situation where it seemed to.
Dedicated mon nodes with dual CPUs far in excess of what they needed. C-state
flapping appeared to negatively impact the NIC driver and network (and mon)
perfor
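If someone wants to rule that out, a rough sketch of pinning C-states (values are examples, not a recommendation for your platform):

# at boot, via the kernel command line:
#   intel_idle.max_cstate=1 processor.max_cstate=1
# or at runtime:
cpupower idle-set --disable-by-latency 10   # disable idle states with exit latency above 10 us
tuned-adm profile latency-performance       # alternatively, a low-latency tuned profile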
what is causing the slow MDS metadata IOs?
Your flapping OSDs.
currently, there are 2 MDS daemons and 3 monitors deployed ..
would it help to run just one MDS and one monitor?
No, you need to figure out why your OSDs crash. More details about
your setup (ceph version, deployment method, hardware res
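A quick way to get that picture (N and the crash id are placeholders):

ceph versions          # confirm all daemons run the same release
ceph -s                # overall health; flapping OSDs show up here
ceph osd tree          # which OSDs/hosts are affected
ceph crash ls          # crashes recorded by the crash module
ceph crash info <id>   # backtrace of a specific crash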
I've initially disabled power-saving features, which nicely improved the
network latency.
Btw, the first interesting find: I enabled 'rbd_balance_parent_reads' on
the clients, and single-thread reads now scale much better, I routinely get
similar readings from a single disk doing 4k reads with 1 t
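A minimal sketch of enabling it, assuming the librbd clients pick up options from the cluster configuration database (the exact mechanism may differ per setup):

ceph config set client rbd_balance_parent_reads true   # spread reads of the parent image across replicas
# or per client host in ceph.conf:
# [client]
#     rbd balance parent reads = true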
Dear all,
again answering my own emails... It turns out that the connection resets
are not problematic. I took the liberty to document this here in the
tracker in the hope that users with similar issues can find my results.
https://tracker.ceph.com/issues/52825
As for my original issue: As descr
On Wed, Oct 6, 2021 at 10:11, Zakhar Kirpichenko wrote:
>
> I've initially disabled power-saving features, which nicely improved the
> network latency.
>
> Btw, the first interesting find: I enabled 'rbd_balance_parent_reads' on
> the clients, and single-thread reads now scale much better, I routin
You should either zap the devices with
ceph orch device zap my_hostname my_path --force
or with ceph-volume directly on that host:
cephadm ceph-volume lvm zap --destroy /dev/sdX
IIRC there's a backup of the partition table at the end of the
partition. I would expect ceph-volume to identify t
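If zapping alone does not clear it, wiping the signatures, including the backup GPT at the end of the device, usually does (/dev/sdX is a placeholder):

sgdisk --zap-all /dev/sdX   # destroys both the primary and the backup GPT
wipefs -a /dev/sdX          # clears any remaining filesystem/LVM signatures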
That's true, but we do intend to scale this cluster as necessary. Some new
nodes are already being prepared, the only thing I'm really worried about
is further expansion: we're now getting 3+ months of lead time on some
components and vendors suggest that it will get worse :-(
/Z
On Wed, Oct 6, 2
These are valid points, thank you for the input!
/Z
On Wed, Oct 6, 2021 at 11:39 AM Stefan Kooman wrote:
> On 10/6/21 09:23, Zakhar Kirpichenko wrote:
> > Hi,
> >
> > Indeed, that's a lot of CPU and RAM, the idea was to put sufficient
> > resources in case we want to expand the nodes with more
Hello everyone,
we have a running Ceph Pacific 16.2.5 cluster, and I found these messages in
the service logs of the OSD daemons.
We have three OSD nodes; each node has 20 OSDs as BlueStore with
NVMe/SSD/HDD.
Is this a bug, or do I maybe have some settings wrong?
cd88-ceph-osdh-01 bash[6283]: debug 2
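To see which non-default settings the OSDs are actually running with, something like this helps (N is a placeholder for an OSD id):

ceph config dump               # settings stored centrally in the cluster configuration database
ceph daemon osd.N config diff  # options on this OSD that differ from the built-in defaults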
Hey Jose,
it looks like your WAL volume is out of space, which looks weird given
its capacity of 48 GB.
Could you please share the output of the following commands:
ceph daemon osd.N bluestore bluefs device info
ceph daemon osd.N bluefs stats
Thanks,
Igor
On 10/6/2021 12:24 PM, José H. Fre
On 10/5/21 17:06, Zakhar Kirpichenko wrote:
Hi,
I built a Ceph 16.2.x cluster with relatively fast and modern hardware, and
its performance is kind of disappointing. I would very much appreciate any
advice and/or pointers :-)
The hardware is 3 x Supermicro SSG-6029P nodes, each equipped with:
2
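When chasing numbers like this, a raw RADOS baseline is useful before tuning anything; a hedged sketch (pool name, block size and runtimes are only examples):

ceph osd pool create testbench 64 64
rados bench -p testbench 30 write -b 4M -t 16 --no-cleanup   # 30 s of 4 MiB writes, 16 in flight
rados bench -p testbench 30 rand -t 16                       # random reads of the data written above
rados -p testbench cleanup
ceph osd pool delete testbench testbench --yes-i-really-really-mean-it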
Hello Igor,
yes, the volume is NVMe; the WAL partitions for the BlueStore device groups
are only 48 GB each.
On each OSD node there is 1 NVMe with 1 TB, split into 20 LVs of 48 GB (WAL).
On each OSD node there are 4 SSDs with 1 TB, each split into 5 LVs of 175 GB (RocksDB).
On each OSD node there are 20 HDDs with 5.5 TB, each with 1 LV
On 10/6/21 00:06, Brian Kim wrote:
Dear ceph-users,
We have a ceph cluster with 3 MDS's and recently had to replay our cache
which is taking an extremely long time to complete. Is there some way to
speed up this process as well as apply some checkpoint so it doesn't have
to start all the way fro
Jose,
In fact, 48 GB is way too much for a WAL drive - usually the write-ahead
log tends to be 2-4 GB.
But in your case it's ~150 GB, while the DB itself is very small (146 MB!!!):
WAL   45 GiB   111 GiB   0 B   0 B   0 B   154 GiB   2400
DB    0 B      164 Mi
Hi Igor,
yes, I have some OSD settings set :-) Here is my ceph config dump. Those
settings are from a Red Hat document for BlueStore devices.
Maybe it is that setting causing this problem? "advanced
mon_compact_on_trim false"???
I will test it this afternoon... at the moment everything is semi
On 10/6/2021 2:16 PM, José H. Freidhof wrote:
Hi Igor,
yes, I have some OSD settings set :-) Here is my ceph config dump. Those
settings are from a Red Hat document for BlueStore devices.
Maybe it is that setting causing this problem? "advanced
mon_compact_on_trim false"???
OMG!!!
No - mon
On 10/6/21 09:23, Zakhar Kirpichenko wrote:
Hi,
Indeed, that's a lot of CPU and RAM, the idea was to put sufficient
resources in case we want to expand the nodes with more storage and do
EC. I guess having excessive resources shouldn't hurt performance? :-)
That was also my take. Until an (
hi,
no risk no fun 😂 okay
I have reset the settings you mentioned to standard.
What exactly do you mean by taking the OSD offline? ceph orch daemon stop
osd.2? Or mark it down?
For the command, which path do I use? Do you mean:
bluestore-kv /var/lib/ceph/$fsid/osd.2 compact???
Igor Fedotov wrote on
On 10/6/2021 4:25 PM, José H. Freidhof wrote:
hi,
no risk no fun 😂 okay
I have reset the settings you mentioned to standard.
What exactly do you mean by taking the OSD offline? ceph orch daemon stop
osd.2? Or mark it down?
"daemon stop" is enough. You might want to set noout flag before that
t
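A rough sketch of the whole sequence (cephadm deployment assumed, osd.2 as the example; the data path inside the shell may differ on your system):

ceph osd set noout                                    # avoid rebalancing while the OSD is down
ceph orch daemon stop osd.2
cephadm shell --name osd.2                            # container with the OSD's data dir mounted
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-2 compact
exit
ceph orch daemon start osd.2
ceph osd unset noout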
Hi,
inotify does not work with CephFS. How can I make inotify work, or build
an alternative in my C program?
Thank you
'Joffrey
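As far as I know, inotify only reports changes made through the local kernel, so modifications done by other CephFS clients never generate events; polling the tree is the usual workaround. An illustrative shell sketch (path and interval are placeholders) that a C program could mirror with stat()/readdir():

touch /tmp/last_scan
while true; do
    sleep 10
    touch /tmp/this_scan
    # report regular files modified since the previous pass
    find /mnt/cephfs/watched -type f -newer /tmp/last_scan -print
    mv /tmp/this_scan /tmp/last_scan
done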
Hi Igor,
today I repaired one OSD node and all OSDs on the node, creating them new
again.
After that I waited for the rebalance/recovery process, and the cluster was
healthy after some hours.
I noticed that osd.2 no longer has this error in the log,
but I noticed it now on the sa
Hi,
some weeks ago I had a mix of operator error and suspected rook bug (not
sure on that, other topic) leading to the loss of quite some objects on
my two EC pools - which are the data pools for an RGW zone/realm/whatever
each.
The problem I'm currently facing is RGW still listing the lost objec
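To check which of the listed objects still have RADOS backing, something along these lines might help (bucket and object names are placeholders):

radosgw-admin object stat --bucket=mybucket --object=myobject   # fails if the head object is gone
radosgw-admin bucket radoslist --bucket=mybucket                # RADOS objects the bucket index points to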
Hi everyone.
I added a lot more storage to our cluster, and we now have a lot of slower
hard drives that could contain archival data. So I thought setting up a
cache tier on the fast drives would be a good idea.
We want to retain data for about a week in the cache pool as the data could
be inte
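The wiring would be roughly the following, going by the cache-tiering docs (pool names are placeholders, and the retention/eviction values are only examples):

ceph osd tier add archive-pool cache-pool
ceph osd tier cache-mode cache-pool writeback
ceph osd tier set-overlay archive-pool cache-pool
ceph osd pool set cache-pool hit_set_type bloom
ceph osd pool set cache-pool hit_set_count 12
ceph osd pool set cache-pool hit_set_period 14400            # 4-hour hit sets
ceph osd pool set cache-pool target_max_bytes 1099511627776  # cap the cache pool size (1 TiB here)
ceph osd pool set cache-pool cache_min_evict_age 604800      # do not evict objects younger than a week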
On 10/6/21 15:54, von Hoesslin, Volker wrote:
okay, I ran this command: rados rm -p cephfs_metadata mds0_openfiles.0,
started the MDS daemons, and the "Malformed input" error seems to be fixed. I
have 3 MDS daemons, two of them are running in "standby" mode and the
active one (pve04) is restarting
On 1/21/21 16:51, Dan van der Ster wrote:
Hi all,
During rejoin an MDS can sometimes go OOM if the openfiles table is too large.
The workaround has been described by ceph devs as "rados rm -p
cephfs_metadata mds0_openfiles.0".
On our cluster we have several such objects for rank 0:
mds0_openfi
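They can be inspected directly in the metadata pool (pool name as in a default CephFS deployment):

rados -p cephfs_metadata ls | grep mds0_openfiles    # one object per shard of rank 0's openfiles table
rados -p cephfs_metadata stat mds0_openfiles.0       # size gives a rough idea of the table's footprint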
Hi!
> Btw, the first interesting find: I enabled 'rbd_balance_parent_reads' on
> the clients, and single-thread reads now scale much better, I routinely get
> similar readings from a single disk doing 4k reads with 1 thread:
It seems to me that this function should not give any gain in "real" loa
One of the main limitations of using CephFS is the requirement to reduce the
number of active MDS daemons to one during upgrades. As far as I can tell this
has been a known problem since Luminous (~2017). This issue essentially
requires downtime during upgrades for any CephFS cluster that needs m
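For reference, the dance in question looks roughly like this (filesystem name and rank count are placeholders):

ceph fs set cephfs max_mds 1    # before the upgrade: shrink to a single active MDS
ceph status                     # wait until only rank 0 remains active
# ... upgrade and restart the MDS daemons ...
ceph fs set cephfs max_mds 2    # afterwards: restore the original number of active ranks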
Hi,
I tried experimenting with the RBD striping feature:
rbd image 'volume-bd873c3f-c8c7-4270-81f8-951f65fc860c':
        size 50 GiB in 12800 objects
        order 22 (4 MiB objects)
        snapshot_count: 0
        id: 187607b9ebf28a
        block_name_prefix: rbd_data.187607b9ebf28a
        format: 2
        features: layering, striping, exclusive-loc
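For comparison, an image with a non-default striping pattern can be created like this (pool/image names and the 64K x 8 pattern are just examples):

rbd create rbd/striped-vol --size 50G --object-size 4M --stripe-unit 64K --stripe-count 8
rbd info rbd/striped-vol   # now also reports stripe_unit and stripe_count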