On Thu, Dec 5, 2019 at 00:28 Milan Kupcevic <milan_kupce...@harvard.edu> wrote:
>
>
> There is plenty of space to take more than a few failed nodes. But the
> question was about what is going on inside a node with a few failed
> drives. Current Ceph behavior keeps increasing the number of placement
On Thu, Dec 5, 2019 at 4:40 AM Stefan Kooman wrote:
>
> Quoting Stefan Kooman (ste...@bit.nl):
> > and it crashed again (and again) ... until we stopped the mds and
> > deleted the mds0_openfiles.0 from the metadata pool.
> >
> > Here is the (debug) output:
> >
> > A specific workload that *might* have triggered this: recursively deleting a long list of
On 2019-12-04 04:11, Janne Johansson wrote:
> On Wed, Dec 4, 2019 at 01:37 Milan Kupcevic <milan_kupce...@harvard.edu> wrote:
>
> This cluster can handle this case at this moment as it has got plenty of
> free space. I wonder how this is going to play out when we get to 90% of
> usage on the whole cluster. A single backplane failure in a node
Quoting Stefan Kooman (ste...@bit.nl):
> and it crashed again (and again) ... until we stopped the mds and
> deleted the mds0_openfiles.0 from the metadata pool.
>
> Here is the (debug) output:
>
> A specific workload that *might* have triggered this: recursively deleting a long list of
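For reference, the by-hand removal Stefan describes would look roughly like this (a sketch only; the MDS unit name and the metadata pool name "cephfs_metadata" are assumptions, adjust to your setup):

    # stop the MDS rank that keeps crashing (unit name is just an example)
    systemctl stop ceph-mds@mds1

    # confirm the object name in the CephFS metadata pool
    rados -p cephfs_metadata ls | grep openfiles

    # remove the open file table object for rank 0
    rados -p cephfs_metadata rm mds0_openfiles.0

As far as I understand, the open file table only carries hints, so the MDS rebuilds it afterwards; no filesystem metadata is lost.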
1 pool per storage class (e.g., SSD and HDD), at least one RBD per
gateway per pool for load balancing (failover-only load balancing
policy).
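Roughly, assuming two gateways and made-up pool names, image names and sizes, that could look like:

    # one replicated pool per device class, pinned with a CRUSH rule
    ceph osd crush rule create-replicated rule-ssd default host ssd
    ceph osd crush rule create-replicated rule-hdd default host hdd
    ceph osd pool create vmware-ssd 128 128 replicated rule-ssd
    ceph osd pool create vmware-hdd 512 512 replicated rule-hdd
    ceph osd pool application enable vmware-ssd rbd
    ceph osd pool application enable vmware-hdd rbd

    # at least one image per gateway per pool, so each gateway has an
    # active LUN to serve under a failover-only path selection policy
    rbd create vmware-ssd/gw1-lun0 --size 4T
    rbd create vmware-ssd/gw2-lun0 --size 4T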
Paul
--
Paul Emmerich
Looking for help with your Ceph cluster? Contact us at https://croit.io
croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Te
Hi,
Quoting Stefan Kooman (ste...@bit.nl):
> > please apply following patch, thanks.
> >
> > diff --git a/src/mds/OpenFileTable.cc b/src/mds/OpenFileTable.cc
> > index c0f72d581d..2ca737470d 100644
> > --- a/src/mds/OpenFileTable.cc
> > +++ b/src/mds/OpenFileTable.cc
> > @@ -470,7 +470,11 @@ voi
Let's say that you had roughly 60 OSDs that you wanted to use to provide storage
for VMware, through RBDs served over iSCSI.
Target VM types are completely mixed: web front ends, app tiers, a few
databases, and the kitchen sink.
Estimated number of VMs: 50-200
b
How would people recommend th
We'll get https://github.com/ceph/ceph/pull/32000 out in 13.2.8 as
quickly as possible.
Neha
On Wed, Dec 4, 2019 at 6:56 AM Dan van der Ster wrote:
>
> My advice is to wait.
>
> We built 13.2.7 with https://github.com/ceph/ceph/pull/26448 cherry-picked
> and the OSDs no longer crash.
>
> My vote would be for a quick 13.2.8.
My advice is to wait.
We built 13.2.7 with https://github.com/ceph/ceph/pull/26448 cherry-picked
and the OSDs no longer crash.
My vote would be for a quick 13.2.8.
-- Dan
On Wed, Dec 4, 2019 at 2:41 PM Frank Schilder wrote:
>
> Is this issue now a no-go for updating to 13.2.7 or are there only some
> specific unsafe scenarios?
Is this issue now a no-go for updating to 13.2.7 or are there only some
specific unsafe scenarios?
Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: ceph-users on behalf of Dan van der Ster
Sent: 03 December 201
Hi all,
I tried to dig in the mailing list archives but couldn't find a clear answer
to the following situation:
Ceph encountered a scrub error resulting in HEALTH_ERR.
Two PGs are active+clean+inconsistent. When investigating the PG I see a
"read_error" on the primary OSD. Both PGs are replicated PGs.
>
> >> There's a bug in the current stable Nautilus release that causes a loop
> and/or crash in get_obj_data::flush (you should be able to see it gobbling
> up CPU in perf top). This is the related issue:
> https://tracker.ceph.com/issues/39660 -- it should be fixed as soon as
> 14.2.5 is released
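A quick way to confirm that symptom on a gateway (just a sketch; assumes perf is installed and radosgw is the process in question):

    # see where radosgw burns CPU; get_obj_data::flush near the top
    # matches the looping behaviour described in the tracker issue
    perf top -p $(pidof radosgw)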
The version is Nautilus. There is a small mismatch in some of the OSD
version numbers, but this has been running for a long time and we have not
seen this behaviour.
It is also worth saying that I removed (ahem) and then replaced the key for an
OSD yesterday. Thanks to Wido for suggesting the fix to
On Wed, Dec 4, 2019 at 09:57 Marc Roos wrote:
>
> But I guess that in 'ceph osd tree' the SSDs were then also displayed
> as hdd?
>
Probably, and the difference in performance would come from the different
defaults HDD OSDs get vs SSD OSDs with regard to BlueStore caches.
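If that is what happened, something along these lines shows and fixes it (a sketch; osd.12 is just an example id):

    # check which device class the OSDs were auto-detected as
    ceph osd tree | grep -w osd.12

    # reclassify an OSD whose SSDs sit behind a RAID controller
    # (note: this can move data if your CRUSH rules use device classes)
    ceph osd crush rm-device-class osd.12
    ceph osd crush set-device-class ssd osd.12

    # and/or raise the BlueStore cache from the hdd default (1 GiB)
    # to the ssd default (3 GiB)
    ceph config set osd.12 bluestore_cache_size 3221225472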
--
May the most significant bit of yo
On Wed, Dec 4, 2019 at 01:37 Milan Kupcevic <milan_kupce...@harvard.edu> wrote:
> This cluster can handle this case at this moment as it has got plenty of
> free space. I wonder how this is going to play out when we get to 90% of
> usage on the whole cluster. A single backplane failure in a node
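For the 90% scenario the relevant guard rails are the full ratios; checking where the cluster stands is roughly (a sketch):

    # per-OSD utilisation, to spot the fullest drives in the affected node
    ceph osd df tree

    # the thresholds: nearfull warns (default 0.85), backfillfull stops
    # backfill (0.90), full stops client writes (0.95)
    ceph osd dump | grep -i ratio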
But I guess that in 'ceph osd tree' the SSDs were then also displayed
as hdd?
-----Original Message-----
From: Stolte, Felix [mailto:f.sto...@fz-juelich.de]
Sent: Wednesday, 4 December 2019 9:12
To: ceph-users
Subject: [ceph-users] SSDs behind Hardware Raid
Hi guys,
maybe this is common
Quoting John Hearns (j...@kheironmed.com):
> And me again for the second time in one day.
>
> ceph -w is now showing messages like this:
>
> 2019-12-03 15:17:22.426988 osd.6 [WRN] failed to encode map e28961 with
> expected crc
I have seen messages like this when there are daemons running with
d
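If the suspicion is daemons running mismatched versions, this is quick to check (a sketch; needs Luminous or newer):

    # summarise which versions the mons/mgrs/osds/mds are running
    ceph versions

    # or ask the daemon from the warning directly
    ceph tell osd.6 version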