Hi Paul!
Thanks for your answer. Yep, bluestore_min_alloc_size and your
calculation sound very reasonable to me :)
Am 29.03.2019 um 23:56 schrieb Paul Emmerich:
Are you running on HDDs? The minimum allocation size is 64kb by
default here. You can control that via the parameter
bluestore_min_alloc_size.
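For reference, a minimal sketch of where this is tuned (the 4k value and osd.0 below are placeholders; the option only takes effect when an OSD is created, so existing OSDs keep the value they were built with):

  # in ceph.conf on the OSD host, before (re)creating the OSD:
  [osd]
  bluestore_min_alloc_size_hdd = 4096
  bluestore_min_alloc_size_ssd = 16384

  # check what a running OSD is currently configured with:
  ceph daemon osd.0 config get bluestore_min_alloc_size_hdd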
As we fixed the failed node the next day, the cluster rebalanced to its original state
without any issues, so a crush dump would be irrelevant at this point I
guess. I will have to wait for the next occurrence.
Here's a tunables part, maybe it will help to shed some light:
"tunables": {
"choose_local_trie
"
"
Thanks for this advice. It helped me to identify a subset of devices (only 3
in the whole cluster) where this problem was happening. The SAS adapter (LSI
SAS 3008) on my Supermicro board was the issue: there is a RAID mode enabled
by default. I have flashed the latest firmware (v
Hi all,
We have been benchmarking a hyperconverged cephfs cluster (kernel
clients + osd on the same machines) for a while. Over the weekend (for the
first time) we had one cephfs mount deadlock while some clients were
running ior.
All the ior processes are stuck in D state with this stack:
[] wait_on
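A sketch of how such stacks can be collected, assuming root on the client node (the pid is a placeholder):

  # list tasks stuck in uninterruptible sleep and where they are blocked
  ps -eo pid,stat,wchan:32,comm | awk '$2 ~ /^D/'
  # full kernel stack of one stuck ior process
  cat /proc/<pid>/stack
  # or, if sysrq is enabled, dump all blocked tasks to the kernel log
  echo w > /proc/sysrq-trigger && dmesg | tail -n 100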
There are no problems with mixed bluestore_min_alloc_size; that's an
abstraction layer lower than the concept of multiple OSDs. (Also, you
always have that when mixing SSDs and HDDs)
I'm not sure about the real-world impacts of a lower min alloc size or
the rationale behind the default values for HDDs (64kb) and SSDs (16kb).
Which kernel version are you using? We've had lots of problems with
random cephfs deadlocks on older kernels, but 4.19 seems to be pretty
stable.
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
It's the latest CentOS 7.6 kernel. Known pain there?
The user was running a 1.95TiB ior benchmark -- i.e., trying to do
parallel writes to a single 1.95TiB file.
We have
max_file_size 2199023255552 (exactly 2 TiB),
so it should fit.
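(For reference, checking and raising the limit is roughly the following; the filesystem name "cephfs" and the 4 TiB value are placeholders:)

  ceph fs get cephfs | grep max_file_size
  ceph fs set cephfs max_file_size 4398046511104   # 4 TiB, in bytes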
Thanks!
Dan
On Mon, Apr 1, 2019 at 1:06 PM Paul Emmerich wrote:
Hi Fabian,
We've just started building a cluster using the PM983 for the bucket index. Let
me know if you want us to run any tests on them.
Thanks,
Martin
> -----Original Message-----
> From: ceph-users On Behalf Of
> Fabian Figueredo
> Sent: 30. marts 2019 07:55
> To: ceph-users@lists.ceph.com
Hello Ceph Users,
I am finding that the write latency across my ceph clusters isn't great and I
wanted to see what other people are getting for op_w_latency. Generally I am
getting 70-110ms latency.
I am using: ceph --admin-daemon /var/run/ceph/ceph-osd.102.asok perf dump |
grep -A3 '\"op_w_latency'
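A sketch of turning those counters into an average in ms, assuming the usual perf-dump layout ("osd" -> "op_w_latency" with "sum" in seconds and "avgcount" as the number of ops):

  ceph --admin-daemon /var/run/ceph/ceph-osd.102.asok perf dump |
    python -c 'import json,sys; l = json.load(sys.stdin)["osd"]["op_w_latency"]; print("avg op_w_latency: %.1f ms" % (1000.0 * l["sum"] / max(l["avgcount"], 1)))'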
On Mon, Apr 1, 2019 at 6:45 PM Dan van der Ster wrote:
>
> Hi all,
>
> We have been benchmarking a hyperconverged cephfs cluster (kernel
> clients + osd on same machines) for awhile. Over the weekend (for the
> first time) we had one cephfs mount deadlock while some clients were
> running ior.
>
>
I haven't had any issues with 4k allocation size in a cluster holding 189M files.
April 1, 2019 2:04 PM, "Paul Emmerich" wrote:
> I'm not sure about the real-world impacts of a lower min alloc size or
> the rationale behind the default values for HDDs (64kb) and SSDs (16kb).
>
> Paul
Hi,
Please let us know how this ended for you!
--
Mark Schouten
Tuxis, Ede, https://www.tuxis.nl
T: +31 318 200208
- Original message -
From: Stadsnet (jwil...@stads.net)
Date: 26-03-2019 16:42
To: Ashley Merrick (singap...@amerrick.co.uk)
Cc: ceph-users@lists.ceph.com
Subject: R
Hello
We are experiencing an issue where our ceph MDS gobbles up 500G of RAM, is
killed by the kernel OOM killer, restarts, then repeats. We have 3 MDS daemons on different
machines, and all are exhibiting this behavior. We are running the following
versions (from Docker):
* ceph/daemon:v3.2.1-stable-3
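(A first check that might help narrow this down; the mds name is a placeholder and this assumes the admin socket is reachable inside the container:)

  # what the daemon is allowed to cache vs. what it reports using
  ceph daemon mds.<name> config get mds_cache_memory_limit
  ceph daemon mds.<name> cache status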
What happens when you run "rados -p rbd lock list gateway.conf"?
On Fri, Mar 29, 2019 at 12:19 PM Matthias Leopold
wrote:
>
> Hi,
>
> I upgraded my test Ceph iSCSI gateways to
> ceph-iscsi-3.0-6.g433bbaa.el7.noarch.
> I'm trying to use the new parameter "cluster_client_name", which - to me
> - so
We decided to go ahead and try truncating the journal, but before we did, we
wanted to back it up. However, there are ridiculous values in the header: it
can't write a journal this large because (I presume) my ext4 filesystem can't
seek to that position in the (sparse) file.
I would not be
Since my problem is going to be archived on the Internet, I'll keep following
up so the next person with this problem might save some time.
The seek failure was because ext4 can't seek to 23TB; switching to an xfs mount to
create this file resulted in success.
Here is what I wound up doing to fix
These steps pretty well correspond to
http://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/
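(For reference, the journal backup/recovery sequence from that document is roughly the following; depending on your version you may need to add --rank=<fs name>:0 to each command:)

  # take a backup of the journal before touching anything
  cephfs-journal-tool journal export backup.bin
  # write recoverable dentries from the journal back into the metadata store
  cephfs-journal-tool event recover_dentries summary
  # only then truncate/reset the journal
  cephfs-journal-tool journal reset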
Were you able to replay the journal manually with no issues? IIRC,
"cephfs-journal-tool recover_dentries" would lead to OOM in case of the MDS doing
Hi all,
I've been having an issue with the dashboard being unable to list block images.
In the mimic and luminous dashboards it would take a very long time to load,
eventually telling me it was showing a cached list, and after a few auto
refreshes it would finally show all rbd images and their
Sorry, slight correction: the nautilus dashboard has finally listed the images, it
just took even longer. It's also reporting the same "Warning Displaying
previously cached data for pools rbd, rbd_repl_ssd." messages as before and is
clearly struggling.
Thanks,
Wes Cilldhaire
Sol1
- Or
Hi,
Our ceph production cluster went down while updating the crushmap. Now we can't
get our monitors to come online, and when they do come online for a fraction of
a second we see crushmap errors in the logs. How can we update the crushmap when
the monitors are down, given that none of the ceph commands are working?
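(For context, the usual edit/inject cycle, which does need a working mon quorum, looks roughly like this; the rule id and replica count below are placeholders:)

  ceph osd getcrushmap -o crush.bin
  crushtool -d crush.bin -o crush.txt
  # edit crush.txt ...
  crushtool -c crush.txt -o crush.new
  # sanity-check the new map before injecting it
  crushtool -i crush.new --test --show-bad-mappings --rule 0 --num-rep 3
  ceph osd setcrushmap -i crush.new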
Thanks,
Pa
Can you provide detailed error logs from when the mon crashes?
Pardhiv Karri wrote on Tue, Apr 2, 2019 at 9:02 AM:
>
> Hi,
>
> Our ceph production cluster is down when updating crushmap. Now we can't get
> out monitors to come online and when they come online for a fraction of a
> second we see crush map errors in logs.
Hi Huang,
We are on ceph Luminous 12.2.11
The primary is sh1ora1300 but that is not coming up at all. sh1ora1301 and
sh1ora1302 are coming up and are in quorum as per the log, but we are still not
able to run any ceph commands. Below is part of the log.
2019-04-02 00:48:51.644339 mon.sh1ora1302 mon.2 10.15.
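(Note for anyone in a similar state: the mon admin sockets still respond without a quorum, so something like the following shows each mon's own view; this assumes the mon ids match the hostnames:)

  ceph daemon mon.sh1ora1301 mon_status
  ceph daemon mon.sh1ora1301 quorum_status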
Hi,
This happens after we restart the active MDS, and somehow the standby MDS
daemon cannot take over successfully and is stuck at up:replaying. It is
showing the following log. Any idea on how to fix this?
2019-04-02 12:54:00.985079 7f6f70670700 1 mds.WXS0023 respawn
2019-04-02 12:54:00.985095
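(For anyone debugging a similar stuck up:replaying state, the usual first checks are roughly the following; the mds name is taken from the log above:)

  # which rank is in which state, and which daemon holds it
  ceph fs status
  # the daemon's own view of its state
  ceph daemon mds.WXS0023 status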