I would not call a Ceph page a random tuning tip. At least I hope they
are not. NVMe-only with 100 Gbit is not really a standard setup. I assume
that with such a setup you have the luxury of not noticing many optimizations.
What I mostly read is that changing to MTU 9000 will allow you to better
sat
To elaborate on some aspects that have been mentioned already and add
some others:
* Test using iperf3 (see the sketch after this list).
* Don't try to use jumbos on networks where you don't have complete
control over every host. This usually includes the main ceph
network. It's just too much grief. You can consider us
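A minimal sketch of the kind of end-to-end check I mean, assuming Linux hosts,
an interface named eth0 and a peer at 10.0.0.13 (both placeholders):

  ip link show dev eth0 | grep -o 'mtu [0-9]*'   # confirm the MTU on both ends
  ping -c 3 -M do -s 8972 10.0.0.13              # 8972 + 28 bytes of headers = 9000; fails if any hop drops jumbos
  iperf3 -s                                      # on the receiving host
  iperf3 -c 10.0.0.13 -P 1 -t 30                 # on the sending host; compare throughput at MTU 1500 vs 9000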
Hi,
(un)fortunately I can't test it because I managed to repair the pg.
snaptrim and snaptrim_wait have been a part of this particular pg's
status. As I was trying to look deeper into the case I had a watch on
ceph health detail and noticed that snaptrim/snaptrim_wait was suddenly
not a part of t
Interesting table. I have this on a production 10 Gbit cluster at a
datacenter (which is obviously not doing that much).
[@]# iperf3 -c 10.0.0.13 -P 1 -M 9000
Connecting to host 10.0.0.13, port 5201
[ 4] local 10.0.0.14 port 52788 connected to 10.0.0.13 port 5201
[ ID] Interval Transfer
Can anyone share their table with other MTU values?
I'm also interested in switch CPU load.
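In case it helps with building such a table, a rough sketch of a sweep over a
few TCP MSS values (addresses and values are only examples; iperf3's -M caps
the MSS, so the useful upper bound depends on the interface MTU):

  for mss in 1460 2960 4460 8960; do
      echo "=== MSS $mss ==="
      iperf3 -c 10.0.0.13 -P 1 -M $mss -t 10 | tail -n 3   # keep only the summary lines
  done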
KR,
Manuel
-Original message-
From: Marc Roos
Sent: Wednesday, 27 May 2020 12:01
To: chris.palmer ; paul.emmerich
CC: amudhan83 ; anthony.datri ;
ceph-users ; doustar ; kdhall
; ss
hi there -
On 5/19/20 3:11 PM, thoralf schulze wrote:
> […] and report back …
i tried to reproduce the issue with osds each using 37gb of ssd storage
for db and wal. everything went fine - so yes, spillovers are to be avoided.
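for anyone who wants to check their own cluster, a quick sketch (osd.0 is just
an example id, and the second command has to run on that osd's host):

  ceph health detail | grep -i spillover                           # BLUEFS_SPILLOVER warnings, if any
  ceph daemon osd.0 perf dump | grep -E '"(db|slow)_used_bytes"'   # non-zero slow_used_bytes means the db spilled to the slow device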
thank you very much & with kind regards,
thoralf.
Hi,
We experienced random and relatively high latency spikes (around 0.5-10 sec)
in our Ceph cluster, which consists of 6 OSD nodes; each OSD node has 6 OSDs.
Each OSD is built from one spinning disk and two NVMe devices.
We use a bcache device for the OSD back end (an HDD mixed with an NVMe
partition as cachi
Hi,
since this bug may lead to data loss when several OSDs crash at the same
time (e.g., after a power outage): can we pull the release from the mirrors
and docker hub?
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
8
This is a common problem with FileStore, and there is really no point in
debugging it: upgrade everything to a recent version and migrate to BlueStore.
99% of random latency spikes are fixed simply by doing that.
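For reference, a rough sketch of the per-OSD mark-out-and-replace migration
(id and device are placeholders; whole-host migration is another option):

  ID=12                                                           # placeholder OSD id
  DEV=/dev/sdX                                                    # placeholder data device
  ceph osd out $ID
  while ! ceph osd safe-to-destroy osd.$ID; do sleep 60; done     # wait until the data is fully re-replicated
  systemctl stop ceph-osd@$ID
  ceph-volume lvm zap $DEV
  ceph osd destroy $ID --yes-i-really-mean-it
  ceph-volume lvm create --bluestore --data $DEV --osd-id $ID     # recreate the OSD with BlueStore, reusing the id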
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit G
Hello Dan, all.
My attempt with ceph-bluestore-tool did not lead to a working OSD.
So I decided to re-create all OSDs, as there were quite a few of them and my
cluster was rather unbalanced.
Too bad I could not get any insight into what caused the issue on the
OSDs for object storage: however, I will up
Hi,
I'm not sure if the repair waits for snaptrim; but it does need a
scrub reservation on all the related OSDs, hence our script. And I've
also observed that the repair req isn't queued up -- if the OSDs are
busy with other scrubs, the repair req is forgotten.
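For illustration only (not the actual script), a minimal sketch of the idea,
assuming jq is installed and using a made-up pg id:

  PGID=2.3f                                                  # hypothetical pg id
  for osd in $(ceph pg $PGID query | jq -r '.acting[]'); do
      ceph config set osd.$osd osd_max_scrubs 2              # free up an extra scrub slot on the acting OSDs
  done
  until ceph pg $PGID query | jq -r '.state' | grep -q repair; do
      ceph pg repair $PGID                                   # keep re-issuing until the pg actually enters a repair state
      sleep 30
  done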
-- Dan
On Wed, May 27, 2020 at 11:
On Wed, May 27, 2020 at 5:28 AM Daniel Aberger - Profihost AG <
d.aber...@profihost.ag> wrote:
> Hi,
>
> (un)fortunately I can't test it because I managed to repair the pg.
>
> snaptrim and snaptrim_wait have been a part of this particular pg's
> status. As I was trying to look deeper into the cas
Hi, trying to migrate a second Ceph cluster to cephadm. All the hosts
successfully migrated from "legacy" except one of the OSD hosts (cephadm kept
duplicating OSD ids, e.g. two "osd.5"; still not sure why). To make things
easier, we re-provisioned the node (reinstalled from netinstall, applied th
Hello,
if I understand correctly:
if we upgrade a running Nautilus cluster to Octopus, we will have
downtime during the MDS update.
Is this correct?
Mit freundlichen Grüßen / Kind regards
Andreas Schiefer
Leiter Systemadministration / Head of systemadministration
---
HOME OF LOYALTY
CRM- &
I noticed the LUKS volumes were open, even though luksOpen hung. I killed
cryptsetup (once per disk) and ceph-volume continued and eventually created the
OSDs for the host (yes, this node will be slated for another reinstall once
cephadm has stabilized).
Is there a way to remove an osd service
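For anyone else hitting this, what I plan to try (hedged, since I'm not sure of
the exact behaviour; the spec name below is made up):

  ceph orch ls                         # list the service specs cephadm is tracking
  ceph orch rm osd.my-spec             # hypothetical spec name: drop the service spec itself
  ceph orch daemon rm osd.5 --force    # remove a stray duplicate daemon, if one is left behind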
FYI. Hope to see some awesome CephFS submissions for our virtual IO500 BoF!
Thanks,
John
-- Forwarded message -
From: committee--- via IO-500
Date: Fri, May 22, 2020 at 1:53 PM
Subject: [IO-500] IO500 ISC20 Call for Submission
To:
*Deadline*: 08 June 2020 AoE
The IO500
On 5/27/20 8:43 PM, Andreas Schiefer wrote:
if I understand correctly:
if we upgrade a running Nautilus cluster to Octopus, we will have
downtime during the MDS update.
Is this correct?
This always happens when upgrading a major or minor version of the MDS. It
hangs during the restart; actually, clients will
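The usual recommendation is to reduce to a single active MDS for the upgrade;
roughly (assuming a filesystem named cephfs and an original max_mds of 2):

  ceph fs set cephfs max_mds 1    # drop to a single active MDS before upgrading
  ceph status                     # wait until only rank 0 remains active
  # upgrade and restart the MDS daemons; a standby takes over rank 0, so
  # clients see a short failover rather than an extended outage
  ceph fs set cephfs max_mds 2    # restore the original value afterwards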
Hi all,
The single active MDS on one of our Ceph clusters is close to running out of
RAM.
MDS total system RAM = 528GB
MDS current free system RAM = 4GB
mds_cache_memory_limit = 451GB
current mds cache usage = 426GB
Presumably we need to reduce our mds_cache_memory_limit and/or
mds_max_caps_pe
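If it helps, a sketch of how one might check and lower the limit at runtime
(the new value is only an example; the limit is a target, and the MDS can
overshoot it, which is presumably what is eating the remaining headroom):

  ceph config get mds mds_cache_memory_limit                 # current limit, in bytes
  ceph config set mds mds_cache_memory_limit 429496729600    # example: ~400 GiB, to leave more headroom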