Summary: I am having problems with inconsistent PG's that the 'ceph pg repair'
command does not fix. Below are the details. Any help would be appreciated.
# Find the inconsistent PG's
~# ceph pg dump | grep inconsistent
dumped all in format plain
2.439   4208    0       0       0       0       17279507143     3103    3103    active+clean+inconsistent
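(For anyone digging through this later: on Jewel and newer releases there is a
command that shows exactly what scrub found in such a PG; it does not exist on
the 0.94.x cluster in this thread, so treat the sketch below as a pointer
rather than the fix that was applied here. 2.439 is the PG from the dump above.)

  $ rados list-inconsistent-obj 2.439 --format=json-pretty
  # on Hammer, check the log of the primary OSD for the PG instead;
  # the acting set can be found with:
  $ ceph pg map 2.439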
f ceph-osd -v).
>-Sam
>
>On Mon, Aug 3, 2015 at 12:34 PM, Andras Pataki
> wrote:
>> Summary: I am having problems with inconsistent PG's that the 'ceph pg
>> repair' command does not fix. Below are the details. Any help would be
>> appreciated.
>>
"ceph version 0.94.3
(95cefea9fd9ab740263bf8bb4796fd864d9afe2b)"
Could you have another look?
Thanks,
Andras
________
From: Andras Pataki
Sent: Monday, August 3, 2015 4:09 PM
To: Samuel Just
Cc: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org
Subject: Re: [ceph-users]
Cool, thanks!
Andras
From: Sage Weil
Sent: Tuesday, September 8, 2015 2:07 PM
To: Andras Pataki
Cc: Samuel Just; ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org
Subject: Re: [ceph-users] Inconsistent PGs that ceph pg repair does not fix
On Tue, 8
Hi ceph users,
I am using CephFS for file storage and I have noticed that the data gets
distributed very unevenly across OSDs.
* I have about 90 OSDs across 8 hosts, and 4096 PGs for the cephfs_data
pool with 2 replicas, which is in line with the total PG recommendation of
"Total PGs = (OSDs * 100) / pool size".
ment groups.
>
>How many data pools are being used for storing objects?
>
>'ceph osd dump |grep pool'
>
>Also how are these 90 OSD's laid out across the 8 hosts and is there any
>discrepancy between disk sizes and weight?
>
>'ceph osd tree'
>
>
Hi,
Is there a way to find out which rados objects a file in cephfs is mapped to
from the command line? Or vice versa, which file a particular rados object
belongs to?
Our ceph cluster has some inconsistencies/corruptions and I am trying to find
out which files are impacted in cephfs.
Thank
Thanks, that worked. Is there a mapping in the other direction easily
available, i.e. to find where all the 4MB pieces of a file are?
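(The recipe that falls out of this thread, sketched; the pool name cephfs_data
and the path below are illustrative, adjust for your setup. CephFS data
objects are named <inode-in-hex>.<object-index-in-hex>, one object per 4MB
stripe by default.)

  # inode of the file, printed in hex -- this is the object name prefix
  $ printf '%x\n' $(stat -c '%i' /mnt/cephfs/some/file)
  # the file's pieces are then <prefix>.00000000, <prefix>.00000001, ...
  # ask ceph which PG/OSDs hold a given piece:
  $ ceph osd map cephfs_data <prefix>.00000000
  # and inspect an object directly:
  $ rados -p cephfs_data stat <prefix>.00000000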
On 9/28/15, 4:56 PM, "John Spray" wrote:
>On Mon, Sep 28, 2015 at 9:46 PM, Andras Pataki
> wrote:
>> Hi,
>>
>> Is there a way
d to run crush, by making use of the crushtool or
>similar.
>-Greg
>
>On Tue, Sep 29, 2015 at 6:29 AM, Andras Pataki
> wrote:
>> Thanks, that worked. Is there a mapping in the other direction easily
>> available, i.e. to find where all the 4MB pieces of a file are?
Hi ceph users,
We’ve upgraded to 0.94.4 (all ceph daemons got restarted) – and are in the
middle of doing some rebalancing due to crush changes (removing some disks).
During the rebalance, I see that some placement groups get stuck in
‘active+clean+replay’ for a long time (essentially until I
, "Gregory Farnum" wrote:
>On Tue, Oct 27, 2015 at 11:03 AM, Gregory Farnum
>wrote:
>> On Thu, Oct 22, 2015 at 3:58 PM, Andras Pataki
>> wrote:
>>> Hi ceph users,
>>>
>>> We've upgraded to 0.94.4 (all ceph daemons got restarted) and are in
centos RPMs) before a planned larger rebalance.
Andras
On 10/27/15, 2:36 PM, "Gregory Farnum" wrote:
>On Tue, Oct 27, 2015 at 11:22 AM, Andras Pataki
> wrote:
>> Hi Greg,
>>
>> No, unfortunately I haven't found any resolution to it. We are using
>> cephfs, th
Hi Greg,
I’ve tested the patch below on top of the 0.94.5 hammer sources, and it
works beautifully. No more active+clean+replay stuck PGs.
Thanks!
Andras
On 10/27/15, 4:46 PM, "Andras Pataki" wrote:
>Yes, this definitely sounds plausible (the peering/activating process does
On 06/30/2017 04:38 PM, Gregory Farnum wrote:
On Wed, Jun 21, 2017 at 6:57 AM Andras Pataki
mailto:apat...@flatironinstitute.org>>
wrote:
Hi cephers,
I noticed something I don't understand about ceph's behavior when
adding an OSD. When I start with a clean cluster
All pools are now with replicated size 3 and min size 2. Let me know if
any other info would be helpful.
Andras
On 07/06/2017 02:30 PM, Andras Pataki wrote:
Hi Greg,
At the moment our cluster is all in balance. We have one failed drive
that will be replaced in a few days (the OSD has been removed fr
We are having some difficulties with cephfs access to the same file from
multiple nodes concurrently. After debugging some large-ish
applications with noticeable performance problems using CephFS (with the
fuse client), I have a small test program to reproduce the problem.
The core of the pro
the issue down further.
Andras
On 07/21/2017 05:41 AM, John Spray wrote:
On Thu, Jul 20, 2017 at 9:19 PM, Andras Pataki
wrote:
We are having some difficulties with cephfs access to the same file from
multiple nodes concurrently. After debugging some large-ish applications
with noticeable p
I've filed a tracker bug for this: http://tracker.ceph.com/issues/20938
Andras
On 08/01/2017 10:26 AM, Andras Pataki wrote:
Hi John,
Sorry for the delay, it took a bit of work to set up a luminous test
environment. I'm sorry to have to report that the 12.1.1 RC version
also su
After upgrading to the latest Luminous RC (12.1.3), all our OSD's are
crashing with the following assert:
0> 2017-08-15 08:28:49.479238 7f9b7615cd00 -1
/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/rel
development packages that should resolve the issue in the
meantime.
[1] http://tracker.ceph.com/issues/20985
On Tue, Aug 15, 2017 at 9:08 AM, Andras Pataki
wrote:
After upgrading to the latest Luminous RC (12.1.3), all our OSD's are
crashing with the following assert:
0> 2017-08-
Is there any guidance on the sizes for the WAL and DB devices when they
are separated to an SSD/NVMe? I understand that probably there isn't a
one size fits all number, but perhaps something as a function of
cluster/usage parameters like OSD size and usage pattern (amount of
writes, number/size of objects, etc.).
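(Not a sizing formula, but one way to ground the numbers empirically on an
existing bluestore OSD; run on the OSD's host, osd.0 is just an example id:)

  $ ceph daemon osd.0 perf dump bluefs
  # db_used_bytes vs db_total_bytes shows how much of the DB device is in use;
  # wal_used_bytes likewise for the WAL; a non-zero slow_used_bytes means
  # metadata has spilled over onto the main (slow) device, i.e. the DB
  # partition was too small for this workload.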
We've been running into a strange problem with Ceph using ceph-fuse and
the filesystem. All the back end nodes are on 10.2.10, the fuse clients
are on 10.2.7.
After some hours of runs, some processes get stuck waiting for fuse like:
[root@worker1144 ~]# cat /proc/58193/stack
[] wait_answer_int
-fuse? This does sound vaguely familiar
and is an issue I'd generally expect to have the fix backported for,
once it was identified.
On Thu, Nov 2, 2017 at 11:40 AM Andras Pataki
mailto:apat...@flatironinstitute.org>>
wrote:
We've been running into a strange problem wit
r ought to work fine.
On Thu, Nov 2, 2017 at 4:58 PM Andras Pataki
mailto:apat...@flatironinstitute.org>>
wrote:
I'm planning to test the newer ceph-fuse tomorrow. Would it be
better to stay with the Jewel 10.2.10 client, or would the 12.2.1
Luminous client be better (eve
Hello ceph users,
I've seen threads about Luminous OSDs using more memory than they should
due to some memory accounting bugs. Does this issue apply to ceph-fuse
also?
After upgrading to the latest ceph-fuse luminous client (12.2.1), we see
some ceph-fuse processes using excessive memory.
Dear ceph users,
After upgrading to the Luminous 12.2.1 ceph-fuse client, we've seen
clients on various nodes randomly crash at the assert
FAILED assert(0 == "failed to remount for kernel dentry trimming")
with the stack:
ceph version 12.2.1 (3e7492b9ada8bdc9a5cd0feafd42fbca27f9c38e)
rash to be
definite about it.
Andras
On 11/27/2017 06:06 PM, Patrick Donnelly wrote:
Hello Andras,
On Mon, Nov 27, 2017 at 2:31 PM, Andras Pataki
wrote:
After upgrading to the Luminous 12.2.1 ceph-fuse client, we've seen clients
on various nodes randomly crash at the assert
Dear Cephers,
We've upgraded the back end of our cluster from Jewel (10.2.10) to
Luminous (12.2.2). The upgrade went smoothly for the most part, except
we seem to be hitting an issue with cephfs. After about a day or two of
use, the MDS start complaining about clients failing to respond to c
18 09:50 PM, Andras Pataki wrote:
Dear Cephers,
*snipsnap*
We are running with a larger MDS cache than usual, we have
mds_cache_size set to 4 million. All other MDS configs are the
defaults.
AFAIK the MDS cache management in luminous has changed, focusing on
memory size instead of number of inodes.
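(For reference, the Luminous-era knob is mds_cache_memory_limit, in bytes; a
minimal sketch with an illustrative 16 GiB value, matching what comes up later
in the thread:)

  # ceph.conf on the MDS hosts
  [mds]
      mds_cache_memory_limit = 17179869184

  # or at runtime, without a restart:
  $ ceph tell mds.* injectargs '--mds_cache_memory_limit=17179869184'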
06:09 AM, John Spray wrote:
On Tue, Jan 16, 2018 at 8:50 PM, Andras Pataki
wrote:
Dear Cephers,
We've upgraded the back end of our cluster from Jewel (10.2.10) to Luminous
(12.2.2). The upgrade went smoothly for the most part, except we seem to be
hitting an issue with cephfs. After ab
PM, John Spray wrote:
On Wed, Jan 17, 2018 at 3:36 PM, Andras Pataki
wrote:
Hi John,
All our hosts are CentOS 7 hosts, the majority are 7.4 with kernel
3.10.0-693.5.2.el7.x86_64, with fuse 2.9.2-8.el7. We have some hosts that
have slight variations in kernel versions, the oldest one are a handful
ache memory limit' of 16GB and
bounced them, we have had no performance or cache pressure issues, and
as expected they hover around 22-23GB of RSS.
Thanks everyone for the help,
Andras
On 01/18/2018 12:34 PM, Patrick Donnelly wrote:
Hi Andras,
On Thu, Jan 18, 2018 at 3:38 AM, Andras Pataki
wr
There is a config option "mon osd min up ratio" (defaults to 0.3) - and
if too many OSDs are down, the monitors will not mark further OSDs
down. Perhaps that's the culprit here?
Andras
On 01/31/2018 02:21 PM, Marc Roos wrote:
Maybe the process is still responding on an active session?
If
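(A quick way to check whether that ratio is what is keeping the monitors from
marking further OSDs down; mon.$(hostname -s) assumes this is run on a monitor
host:)

  # current value of the ratio (default 0.3, i.e. 30% of the OSDs)
  $ ceph daemon mon.$(hostname -s) config get mon_osd_min_up_ratio
  # how many OSDs are currently up vs. total
  $ ceph osd stat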
Hi everyone,
Yesterday scrubbing turned up an inconsistency in one of our placement
groups. We are running ceph 10.2.3, using CephFS and RBD for some VM
images.
[root@hyperv017 ~]# ceph -s
cluster d7b33135-0940-4e48-8aa6-1d2026597c2f
health HEALTH_ERR
1 pgs inconsistent
s not match object info size (3014656) adjusted for
ondisk to (3014656)
2016-12-20 16:27:35.885496 7f3e17cac700 -1 log_channel(cluster) log
[ERR] : 6.92c repair 1 errors, 0 fixed
Any help/hints would be appreciated.
Thanks,
Andras
On 12/15/2016 10:13 AM, Andras Pataki wrote:
Hi eve
ou could have a look on this
object on each related osd for the pg, compare them and delete the
different object. I assume you have size = 3.
Then again pg repair.
But be careful, iirc the replica will be recovered from the primary pg.
Hth
On 20 December 2016 at 22:39:44 CET, Andra
"omap_stats_invalid": false,
"hitset_stats_invalid": false,
"hitset_bytes_stats_invalid": false,
"pin_stats_invalid": true,
"stat_sum": {
"num_bytes"
out in
https://www.spinics.net/lists/ceph-devel/msg16519.html to set the size
value to zero. Your target value is 1c0000.
$ printf '%x\n' 1835008
1c0000
Make sure you check it is right before injecting it back in with "attr -s"
What version is this? Did you look for a simila
# ceph pg debug unfound_objects_exist
FALSE
Andras
On 01/03/2017 11:38 PM, Shinobu Kinjo wrote:
Would you run:
# ceph pg debug unfound_objects_exist
On Wed, Jan 4, 2017 at 5:31 AM, Andras Pataki
wrote:
Here is the output of ceph pg query for one of the active+clean+inconsistent
PGs
ects). But
it'd be perhaps good to do some searching on how/why this problem came
about before doing this.
andras
On 01/07/2017 06:48 PM, Shinobu Kinjo wrote:
Sorry for the late reply.
Are you still facing inconsistent pg status?
On Wed, Jan 4, 2017 at 11:39 PM, Andras Pataki
Hi cephers,
Is there a way to see what a crush map change does to the PG mappings
(i.e. what placement groups end up on what OSDs) without actually
setting the crush map (and have the map take effect)? I'm looking for
some way I could test hypothetical crush map changes without any effect
on the running cluster.
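(The crushtool-based approach mentioned earlier in these threads, sketched;
rule id 0 and 3 replicas are illustrative:)

  # export and decompile the current crush map
  $ ceph osd getcrushmap -o crush.bin
  $ crushtool -d crush.bin -o crush.txt
  # edit crush.txt with the hypothetical change, recompile, and test the
  # resulting mappings offline, without touching the live cluster
  $ crushtool -c crush.txt -o crush-new.bin
  $ crushtool -i crush-new.bin --test --show-mappings --rule 0 --num-rep 3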
We are also running some fairly dense nodes with CentOS 7.4 and ran into
similar problems. The nodes ran filestore OSDs (Jewel, then Luminous).
Sometimes a node would be so unresponsive that one couldn't even ssh to
it (even though the root disk was a physically separate drive on a
separate c
From: Tyler Bishop [mailto:tyler.bis...@beyondhosting.net]
Sent: Friday, 24 August 2018 3:11
To: Andras Pataki
Cc: ceph-users@lists.ceph.com
Subject: Re: [ceph-users] Stability Issue with 52 OSD hosts
Thanks for the info. I was investigating bluestore as well. My hosts
don't go unresponsive but I do see
Hi cephers,
Every so often we have a ceph-fuse process that grows to rather large
size (up to eating up the whole memory of the machine). Here is an
example of a 200GB RSS size ceph-fuse instance:
# ceph daemon /var/run/ceph/ceph-client.admin.asok dump_mempools
{
"bloom_filter": {
,
"omap_wr": 0,
"omap_rd": 0,
"omap_del": 0
},
"throttle-msgr_dispatch_throttler-client": {
"val": 0,
"max": 104857600,
"get_started": 0,
"get": 673934,
"get
increased memory related settings in
/etc/ceph.conf, but based on my understanding of the parameters, they
shouldn't amount to such high memory usage.
Andras
On 09/05/2018 10:15 AM, Andras Pataki wrote:
Below are the performance counters. Some scientific workflows trigger
this - some parts of
{
"items": 16782690,
"bytes": 68783148465
}
}
Andras
On 9/6/18 11:58 PM, Yan, Zheng wrote:
Could you please try make ceph-fuse use simple messenger (add "ms type
= simple" to client section of ceph.conf).
Regards
Yan, Zheng
On Wed, Sep 5, 2018
06 PM, Andras Pataki wrote:
Hi Zheng,
It looks like the memory growth happens even with the simple messenger:
[root@worker1032 ~]# ceph daemon /var/run/ceph/ceph-client.admin.asok
config get ms_type
{
"ms_type": "simple"
}
[root@worker1032 ~]# ps -auxwww | grep ceph-fuse
r
The whole cluster, including ceph-fuse is version 12.2.7.
Andras
On 9/24/18 6:27 AM, Yan, Zheng wrote:
On Fri, Sep 21, 2018 at 5:40 AM Andras Pataki
wrote:
I've done some more experiments playing with client config parameters,
and it seems like the the client_oc_size parameter is
running. I can reproduce
this issue at will in about 6 to 8 hours of running one of our
scientific jobs - and I can also run a more instrumented/patched/etc.
code to try.
Andras
On 9/24/18 10:06 PM, Yan, Zheng wrote:
On Tue, Sep 25, 2018 at 2:23 AM Andras Pataki
wrote:
The whole cluster
We have so far been using ceph-fuse for mounting cephfs, but the small
file performance of ceph-fuse is often problematic. We've been testing
the kernel client, and have seen some pretty bad crashes/hangs.
What is the policy on fixes to the kernel client? Is only the latest
stable kernel upd
x
-Original Message-
From: Andras Pataki [mailto:apat...@flatironinstitute.org]
Sent: Monday, 1 October 2018 20:10
To: ceph-users
Subject: [ceph-users] cephfs kernel client stability
We have so far been using ceph-fuse for mounting cephfs, but the small
file performance of ceph-fuse
logs there are no entries around
the problematic times either. And after all this, the mount is unusable:
[root@worker1004 ~]# ls -l /mnt/cephtest
ls: cannot access /mnt/cephtest: Permission denied
Andras
On 10/1/18 3:02 PM, Andras Pataki wrote:
These hangs happen during random I/O fio benchma
After replacing a failing drive I'd like to recreate the OSD with the same
osd-id using ceph-volume (now that we've moved to ceph-volume from
ceph-disk). However, I seem to not be successful. The command I'm using:
ceph-volume lvm prepare --bluestore --osd-id 747 --data H901D44/H901D44
--block
oval procedure, osd crush remove, auth del, osd rm)?
Thanks,
Andras
On 10/3/18 10:36 AM, Alfredo Deza wrote:
On Wed, Oct 3, 2018 at 9:57 AM Andras Pataki
wrote:
After replacing failing drive I'd like to recreate the OSD with the same
osd-id using ceph-volume (now that we've move
en ID from scratch
would be nice (given that the underlying raw ceph commands already
support it).
Andras
On 10/3/18 11:41 AM, Alfredo Deza wrote:
On Wed, Oct 3, 2018 at 11:23 AM Andras Pataki
wrote:
Thanks - I didn't realize that was such a recent fix.
I've now tried 12.2.8, and perh
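(A sketch of the id-reuse flow as documented upstream, not necessarily the
exact steps used here; osd id 747 and the H901D44 volume group are taken from
the command quoted above, and the --osd-id behavior needs the recent
ceph-volume fix discussed in this thread, i.e. 12.2.8 or newer:)

  # mark the old OSD destroyed, keeping its id and crush entry reusable
  $ ceph osd destroy 747 --yes-i-really-mean-it
  # recreate it on the replacement drive, reusing the same id
  $ ceph-volume lvm prepare --bluestore --osd-id 747 --data H901D44/H901D44
  $ ceph-volume lvm activate --all
  # (or use "ceph-volume lvm create ..." to prepare and activate in one step)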
Dear ceph users,
I've been experimenting setting up a new node with ceph-volume and
bluestore. Most of the setup works right, but I'm running into a
strange interaction between ceph-volume and systemd when starting OSDs.
After preparing/activating the OSD, a systemd unit instance is created
Done: tracker #24152
Thanks,
Andras
On 05/16/2018 04:58 PM, Alfredo Deza wrote:
On Wed, May 16, 2018 at 4:50 PM, Andras Pataki
wrote:
Dear ceph users,
I've been experimenting setting up a new node with ceph-volume and
bluestore. Most of the setup works right, but I'm runn
I've been trying to wrap my head around crush rules, and I need some
help/advice. I'm thinking of using erasure coding instead of
replication, and trying to understand the possibilities for planning for
failure cases.
For a simplified example, consider a 2 level topology, OSDs live on
hosts,
can't put any erasure code with more than 9 chunks.
Andras
On 05/18/2018 06:30 PM, Gregory Farnum wrote:
On Thu, May 17, 2018 at 9:05 AM Andras Pataki
mailto:apat...@flatironinstitute.org>>
wrote:
I've been trying to wrap my head around crush rules, and I need some
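(A sketch of the two-step crush rule this kind of question usually leads to:
pick a set of hosts first, then several OSDs inside each, so a k+m that
exceeds the host count can still be placed. The numbers below, 3 hosts with 3
chunks each for a 9-chunk profile, and the rule name/id are illustrative only,
not the thread's final answer:)

  rule ec_by_host_then_osd {
          id 2
          type erasure
          min_size 3
          max_size 9
          step set_chooseleaf_tries 5
          step set_choose_tries 100
          step take default
          step choose indep 3 type host
          step chooseleaf indep 3 type osd
          step emit
  }

  # compile and check the mappings offline before loading it into the cluster
  $ crushtool -c crush.txt -o crush.bin
  $ crushtool -i crush.bin --test --show-mappings --rule 2 --num-rep 9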
We're using CephFS with Luminous 12.2.5 and the fuse client (on CentOS
7.4, kernel 3.10.0-693.5.2.el7.x86_64). Performance has been very good
generally, but we're currently running into some strange performance
issues with one of our applications. The client in this case is on a
higher latenc
Hi Greg,
The docs say that client_cache_size is the number of inodes that are
cached, not bytes of data. Is that incorrect?
Andras
On 06/06/2018 11:25 AM, Gregory Farnum wrote:
On Wed, Jun 6, 2018 at 5:52 AM, Andras Pataki
wrote:
We're using CephFS with Luminous 12.2.5 and the
't look at the caches any longer of the client.
Andras
On 06/06/2018 12:22 PM, Andras Pataki wrote:
Hi Greg,
The docs say that client_cache_size is the number of inodes that are
cached, not bytes of data. Is that incorrect?
Andras
On 06/06/2018 11:25 AM, Gregory Farnum wrote:
On Wed
Dear Cephers,
We're using the ceph file system with the fuse client, and lately some
of our processes are getting stuck seemingly waiting for fuse
operations. At the same time, the cluster is healthy, no slow requests,
all OSDs up and running, and both the MDS and the fuse client think that
I've also tried kick_stale_sessions on the
fuse client, which didn't help (I guess since it doesn't think the
session is stale).
Let me know if there is anything else I can do to help.
Andras
On 03/13/2017 06:08 PM, John Spray wrote:
On Mon, Mar 13, 2017 at 8:15 PM, Andras Pataki
On 03/14/2017 12:55 PM, John Spray wrote:
On Tue, Mar 14, 2017 at 2:10 PM, Andras Pataki
wrote:
Hi John,
I've checked the MDS session list, and the fuse client does appear on that
with 'state' as 'open'. So both the fuse client and the MDS agree on an
open connection.
A
Below is a crash we had on a few machines with the ceph-fuse client on
the latest Jewel release 10.2.6. A total of 5 ceph-fuse processes
crashed more or less the same way at different times. The full logs are
at
http://voms.simonsfoundation.org:50013/9SXnEpflYPmE6UhM9EgOR3us341eqym/ceph-20170331/
Some help/advice with this would very much be appreciated. Thanks in
advance!
Andras
On 03/14/2017 12:55 PM, John Spray wrote:
On Tue, Mar 14, 2017 at 2:10 PM, Andras Pataki
wrote:
Hi John,
I've checked the MDS session list, and the fuse client does appear on that
with 'stat
27 PM, Andras Pataki
wrote:
Hi John,
It took a while but I believe now I have a reproducible test case for the
capabilities being lost issue in CephFS I wrote about a couple of weeks ago.
The quick summary of problem is that often processes hang using CephFS
either for a while or sometimes indefin
ation is
trying to achieve, that'd be super helpful.
Andras
On 03/31/2017 02:07 PM, Andras Pataki wrote:
Several clients on one node also works well for me (I guess the fuse
client arbitrates then and the MDS perhaps doesn't need to do so
much). So the clients need to be on diff
Hi Dan,
I don't have a solution to the problem, I can only second that we've
also been seeing strange problems when more than one node accesses the
same file in ceph and at least one of them opens it for writing. I've
tried verbose logging on the client (fuse), and it seems that the fuse
cli
Hi cephers,
I noticed something I don't understand about ceph's behavior when adding
an OSD. When I start with a clean cluster (all PG's active+clean) and
add an OSD (via ceph-deploy for example), the crush map gets updated and
PGs get reassigned to different OSDs, and the new OSD starts gett
Moving data between pools when a file is moved to a different directory
is most likely problematic - for example an inode can be hard linked to
two different directories that are in two different pools - then what
happens to the file? Unix/posix semantics don't really specify a parent
director
Dear ceph users,
We have a large-ish ceph cluster with about 3500 osds. We run 3 mons on
dedicated hosts, and the mons typically use a few percent of a core, and
generate about 50Mbits/sec network traffic. They are connected at
20Gbits/sec (bonded dual 10Gbit) and are running on 2x14 core se
Forgot to mention: all nodes are on Luminous 12.2.8 currently on CentOS 7.5.
On 12/19/18 5:34 PM, Andras Pataki wrote:
Dear ceph users,
We have a large-ish ceph cluster with about 3500 osds. We run 3 mons
on dedicated hosts, and the mons typically use a few percent of a
core, and generate
luggish on a 3400 osd cluster, perhaps
taking a few 10s of seconds). the pgs should be active+clean at this
point.
5. unset nodown, noin, noout. which should change nothing provided all
went well.
Hope that helps for next time!
Dan
On Wed, Dec 19, 2018 at 11:39 PM Andras Pataki
mailto:ap
We've been using ceph-fuse with a pretty good stability record (against
the Luminous 12.2.8 back end). Unfortunately ceph-fuse has extremely
poor small file performance (understandably), so we've been testing the
kernel client. The latest RedHat kernel 3.10.0-957.1.3.el7.x86_64 seems
to work
ne of these mon
communication sessions only lasts half a second. Then it reconnects to
another mon, and gets the same result, etc. Any way around this?
Andras
On 12/26/18 7:55 PM, Andras Pataki wrote:
We've been using ceph-fuse with a pretty good stability record
(against the Luminous 1
maps need to be
reencoded (to jewel), or how to improve this behavior would be much
appreciated. We would really be interested in using the kernel client
instead of fuse, but this seems to be a stumbling block.
Thanks,
Andras
On 1/3/19 6:49 AM, Andras Pataki wrote:
I wonder if anyone could
Hi Ilya/Kjetil,
I've done some debugging and tcpdump-ing to see what the interaction
between the kernel client and the mon looks like. Indeed -
CEPH_MSG_MAX_FRONT defined as 16Mb seems low for the default mon
messages for our cluster (with osd_mon_messages_max at 100). We have
about 3500 os
at 7:12 PM Andras Pataki
wrote:
Hi Ilya/Kjetil,
I've done some debugging and tcpdump-ing to see what the interaction
between the kernel client and the mon looks like. Indeed -
CEPH_MSG_MAX_FRONT defined as 16Mb seems low for the default mon
messages for our cluster (with osd_mon_messages_max at
Hi ceph users,
As I understand, cephfs in Mimic had significant issues up to and
including version 13.2.2. With some critical patches in Mimic 13.2.4,
is cephfs now production quality in Mimic? Are there folks out there
using it in a production setting? If so, could you share your
experien
Hi cephers,
I'm working through our testing cycle to upgrade our main ceph cluster
from Luminous to Mimic, and I ran into a problem with ceph_fuse. With
Luminous, a single client can pretty much max out a 10Gbps network
connection writing sequentially on our cluster with Luminous ceph_fuse.
Dear ceph users,
After upgrading our ceph-fuse clients to 14.2.2, we've been seeing
sporadic segfaults with not super revealing stack traces:
in thread 7fff5a7fc700 thread_name:ceph-fuse
ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be)
nautilus (stable)
1: (()+0x
We have a new ceph Nautilus setup (Nautilus from scratch - not upgraded):
# ceph versions
{
"mon": {
"ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba)
nautilus (stable)": 3
},
"mgr": {
"ceph version 14.2.4 (75f4de193b3ea58512f204623e6c5a16e6c1e1ba)
nau
Hi Philipp,
Given 256 PG's triple replicated onto 4 OSD's you might be encountering
the "PG overdose protection" of OSDs. Take a look at 'ceph osd df' and
see the number of PG's that are mapped to each OSD (last column or near
the last). The default limit is 200, so if any OSD exceeds that, PGs mapped
to it can get stuck and never activate.
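(To check and, if you are deliberately over-provisioned on PGs, loosen that
limit; the option is mon_max_pg_per_osd and the 300 below is only an
illustrative value, assuming a Mimic-or-newer cluster with the centralized
config commands available:)

  # PGs currently mapped per OSD (one of the last columns)
  $ ceph osd df
  # the configured limit
  $ ceph config get mon mon_max_pg_per_osd
  # raise it cluster-wide
  $ ceph config set global mon_max_pg_per_osd 300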
Dear cephers,
We've had a few (dozen or so) rather odd scrub errors in our Mimic
(13.2.6) cephfs:
2019-11-15 07:52:52.614 7fffcc41f700 0 log_channel(cluster) log [DBG] :
2.b5b scrub starts
2019-11-15 07:52:55.190 7fffcc41f700 -1 log_channel(cluster) log [ERR] :
2.b5b shard 599 soid 2:dad015