Re: [ceph-users] Shall host weight auto reduce on hdd failure?

2019-12-04 Thread Janne Johansson
On Thu 5 Dec 2019 at 00:28, Milan Kupcevic <milan_kupce...@harvard.edu> wrote: > > > There is plenty of space to take more than a few failed nodes. But the > question was about what is going on inside a node with a few failed > drives. Current Ceph behavior keeps increasing the number of placement
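
A minimal sketch of trimming the host weight by hand after a drive failure (osd.12 is a placeholder for the failed OSD; marking it "out" only zeroes its reweight, while a CRUSH reweight of 0 also removes its share from the host bucket weight):

    ceph osd tree                      # host weight = sum of its OSDs' CRUSH weights
    ceph osd out osd.12                # stop placing PGs on the failed OSD
    ceph osd crush reweight osd.12 0   # shrink the host bucket weight accordingly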

Re: [ceph-users] Is a scrub error (read_error) on a primary osd safe to repair?

2019-12-04 Thread Konstantin Shalygin
I tried to dig through the mailing list archives but couldn't find a clear answer to the following situation: Ceph encountered a scrub error resulting in HEALTH_ERR. Two PGs are active+clean+inconsistent. When investigating the PG I see a "read_error" on the primary OSD. Both PGs are replicated PGs

Re: [ceph-users] MDS crash - FAILED assert(omap_num_objs <= MAX_OBJECTS)

2019-12-04 Thread Yan, Zheng
On Thu, Dec 5, 2019 at 4:40 AM Stefan Kooman wrote: > > Quoting Stefan Kooman (ste...@bit.nl): > > and it crashed again (and again) ... until we stopped the mds and > > deleted the mds0_openfiles.0 from the metadata pool. > > > > Here is the (debug) output: > > > > A specific workload that *m

Re: [ceph-users] Shall host weight auto reduce on hdd failure?

2019-12-04 Thread Milan Kupcevic
On 2019-12-04 04:11, Janne Johansson wrote: > On Wed 4 Dec 2019 at 01:37, Milan Kupcevic <milan_kupce...@harvard.edu> wrote: > > This cluster can handle this case at this moment as it has got plenty of > free space. I wonder how this is going to play out when we get to 90% of >

Re: [ceph-users] MDS crash - FAILED assert(omap_num_objs <= MAX_OBJECTS)

2019-12-04 Thread Stefan Kooman
Quoting Stefan Kooman (ste...@bit.nl): > and it crashed again (and again) ... until we stopped the mds and > deleted the mds0_openfiles.0 from the metadata pool. > > Here is the (debug) output: > > A specific workload that *might* have triggered this: recursively deleting a > long > list of
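
A rough sketch of the workaround described above, with the metadata pool name (cephfs_metadata) and the MDS systemd unit name as placeholders for whatever the local deployment uses:

    systemctl stop ceph-mds@mds-host                # stop the crashing MDS first
    rados -p cephfs_metadata rm mds0_openfiles.0    # drop the openfiles object for rank 0
    systemctl start ceph-mds@mds-host               # the MDS recreates the table as needed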

Re: [ceph-users] best pool usage for vmware backing

2019-12-04 Thread Paul Emmerich
1 pool per storage class (e.g., SSD and HDD), at least one RBD per gateway per pool for load balancing (failover-only load balancing policy). Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Te
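
A minimal sketch of that layout, assuming device classes ssd and hdd, two iSCSI gateways, and placeholder pool names, PG counts and image sizes:

    ceph osd crush rule create-replicated rule-ssd default host ssd
    ceph osd crush rule create-replicated rule-hdd default host hdd
    ceph osd pool create vmware-ssd 128 128 replicated rule-ssd
    ceph osd pool create vmware-hdd 512 512 replicated rule-hdd
    rbd pool init vmware-ssd && rbd pool init vmware-hdd
    rbd create vmware-ssd/gw1-img1 --size 4T    # at least one image per gateway per pool
    rbd create vmware-ssd/gw2-img1 --size 4T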

Re: [ceph-users] MDS crash - FAILED assert(omap_num_objs <= MAX_OBJECTS)

2019-12-04 Thread Stefan Kooman
Hi, Quoting Stefan Kooman (ste...@bit.nl): > > please apply following patch, thanks. > > > > diff --git a/src/mds/OpenFileTable.cc b/src/mds/OpenFileTable.cc > > index c0f72d581d..2ca737470d 100644 > > --- a/src/mds/OpenFileTable.cc > > +++ b/src/mds/OpenFileTable.cc > > @@ -470,7 +470,11 @@ voi

[ceph-users] best pool usage for vmware backing

2019-12-04 Thread Philip Brown
Let's say that you had roughly 60 OSDs that you wanted to use to provide storage for VMware, through RBDs served through iSCSI. Target VM types are completely mixed: web front ends, app tier, a few databases, and the kitchen sink. Estimated number of VMs: 50-200. How would people recommend th

Re: [ceph-users] v13.2.7 osds crash in build_incremental_map_msg

2019-12-04 Thread Neha Ojha
We'll get https://github.com/ceph/ceph/pull/32000 out in 13.2.8 as quickly as possible. Neha On Wed, Dec 4, 2019 at 6:56 AM Dan van der Ster wrote: > > My advice is to wait. > > We built a 13.2.7 + https://github.com/ceph/ceph/pull/26448 cherry > picked and the OSDs no longer crash. > > My vote

Re: [ceph-users] v13.2.7 osds crash in build_incremental_map_msg

2019-12-04 Thread Dan van der Ster
My advice is to wait. We built a 13.2.7 + https://github.com/ceph/ceph/pull/26448 cherry picked and the OSDs no longer crash. My vote would be for a quick 13.2.8. -- Dan On Wed, Dec 4, 2019 at 2:41 PM Frank Schilder wrote: > > Is this issue now a no-go for updating to 13.2.7 or are there only
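
A sketch of how such a test build can be assembled, assuming a GitHub remote (the pull/NNN/head refs are a generic GitHub feature; packaging steps are omitted):

    git clone https://github.com/ceph/ceph.git && cd ceph
    git checkout -b mimic-crash-fix v13.2.7
    git fetch origin pull/26448/head:pr-26448    # fetch the PR branch
    git merge pr-26448                           # or cherry-pick its commits
    # then build packages the usual way for your distribution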

Re: [ceph-users] v13.2.7 osds crash in build_incremental_map_msg

2019-12-04 Thread Frank Schilder
Is this issue now a no-go for updating to 13.2.7 or are there only some specific unsafe scenarios? Best regards, Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: ceph-users on behalf of Dan van der Ster Sent: 03 December 201

[ceph-users] Is a scrub error (read_error) on a primary osd safe to repair?

2019-12-04 Thread Caspar Smit
Hi all, I tried to dig through the mailing list archives but couldn't find a clear answer to the following situation: Ceph encountered a scrub error resulting in HEALTH_ERR. Two PGs are active+clean+inconsistent. When investigating the PG I see a "read_error" on the primary OSD. Both PGs are replicat
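
A minimal sketch of the usual inspection and repair steps for this situation, with 2.1ab standing in for the inconsistent PG id:

    ceph health detail                                       # lists the inconsistent PGs
    rados list-inconsistent-obj 2.1ab --format=json-pretty   # shows which OSD reports the read_error
    ceph pg repair 2.1ab                                     # repairs from an authoritative replica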

Re: [ceph-users] RGW performance with low object sizes

2019-12-04 Thread Christian
> > >> There's a bug in the current stable Nautilus release that causes a loop > and/or crash in get_obj_data::flush (you should be able to see it gobbling > up CPU in perf top). This is the related issue: > https://tracker.ceph.com/issues/39660 -- it should be fixed as soon as > 14.2.5 is released
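
A quick way to confirm the symptom described above, assuming the gateway runs as a process named radosgw on the host being checked:

    perf top -p $(pidof radosgw)    # get_obj_data::flush near the top indicates the looping bug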

Re: [ceph-users] Failed to encode map errors

2019-12-04 Thread John Hearns
The version is Nautilus. There is a small mismatch in some of the OSD version numbers, but this has been running for a long time and we have not seen this behaviour. It is also worth saying that I removed (ahem) then replaced the key for an OSD yesterday. Thanks to Wido for suggesting the fix to

Re: [ceph-users] SSDs behind Hardware Raid

2019-12-04 Thread Stolte, Felix
smime.p7m Description: S/MIME encrypted message

Re: [ceph-users] SSDs behind Hardware Raid

2019-12-04 Thread Janne Johansson
On Wed 4 Dec 2019 at 09:57, Marc Roos wrote: > > But I guess that in 'ceph osd tree' the SSDs were then also displayed > as hdd? > Probably, and the difference in performance would be down to the different bluestore cache defaults an hdd OSD gets versus an ssd OSD. -- May the most significant bit of yo
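
A sketch of the two knobs involved here (osd.3 and the cache size are placeholders; 1 GB for hdd and 3 GB for ssd are the usual bluestore defaults): the CRUSH device class can be overridden by hand, while the bluestore cache size can simply be set explicitly when rotational detection gets it wrong behind a RAID controller:

    ceph osd tree                                            # shows the class ceph detected per OSD
    ceph osd crush rm-device-class osd.3
    ceph osd crush set-device-class ssd osd.3                # affects CRUSH rule placement only
    ceph config set osd.3 bluestore_cache_size 3221225472    # match the 3 GB ssd default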

Re: [ceph-users] Shall host weight auto reduce on hdd failure?

2019-12-04 Thread Janne Johansson
On Wed 4 Dec 2019 at 01:37, Milan Kupcevic <milan_kupce...@harvard.edu> wrote: > This cluster can handle this case at this moment as it has got plenty of > free space. I wonder how this is going to play out when we get to 90% of > usage on the whole cluster. A single backplane failure in a node
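
A quick way to see how much headroom a cluster actually has before a failure pushes it over the line (the ratios in the comment are the usual defaults):

    ceph df                       # raw and per-pool usage
    ceph osd dump | grep ratio    # nearfull/backfillfull/full: 0.85 / 0.90 / 0.95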

Re: [ceph-users] SSDs behind Hardware Raid

2019-12-04 Thread Marc Roos
But I guess that in 'ceph osd tree' the SSDs were then also displayed as hdd? -Original Message- From: Stolte, Felix <f.sto...@fz-juelich.de> Sent: Wednesday 4 December 2019 9:12 To: ceph-users Subject: [ceph-users] SSDs behind Hardware Raid Hi guys, maybe this is common

Re: [ceph-users] Failed to encode map errors

2019-12-04 Thread Stefan Kooman
Quoting John Hearns (j...@kheironmed.com): > And me again for the second time in one day. > > ceph -w is now showing messages like this: > > 2019-12-03 15:17:22.426988 osd.6 [WRN] failed to encode map e28961 with > expected crc I have seen messages like this when there are daemons running with d
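
A quick sketch of confirming such a version mismatch, using osd.6 from the log line above as the example daemon:

    ceph versions              # counts of mon/mgr/osd/mds daemons per running version
    ceph tell osd.6 version    # version of one specific suspect daemon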

[ceph-users] SSDs behind Hardware Raid

2019-12-04 Thread Stolte, Felix
smime.p7m Description: S/MIME encrypted message