Re: [ceph-users] understanding the bluestore blob, chunk and compression params

2019-06-20 Thread Dan van der Ster
Fedotov wrote: > > I'd like to see more details (preferably backed with logs) on this... > > On 6/20/2019 6:23 PM, Dan van der Ster wrote: > > P.S. I know this has been discussed before, but the > > compression_(mode|algorithm) pool options [1] seem completely bro

Re: [ceph-users] understanding the bluestore blob, chunk and compression params

2019-06-21 Thread Dan van der Ster
http://tracker.ceph.com/issues/40480 On Thu, Jun 20, 2019 at 9:12 PM Dan van der Ster wrote: > > I will try to reproduce with logs and create a tracker once I find the > smoking gun... > > It's very strange -- I had the osd mode set to 'passive', and pool > optio
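For context on the options being discussed, this is roughly how pool-level and OSD-level compression settings are applied (a sketch; the pool name `mypool` is a placeholder, and the mode/algorithm values are just examples):

```shell
# Pool-level compression options (pool name "mypool" is a placeholder).
# Valid modes: none, passive, aggressive, force.
ceph osd pool set mypool compression_mode aggressive
ceph osd pool set mypool compression_algorithm snappy

# OSD-level equivalents, which the thread contrasts with the pool options:
ceph config set osd bluestore_compression_mode passive
ceph config set osd bluestore_compression_algorithm snappy

# Verify what is currently set on the pool:
ceph osd pool get mypool compression_mode
```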

Re: [ceph-users] Erasure Coding performance for IO < stripe_width

2019-07-08 Thread Dan van der Ster
Hi Lars, Is there a specific bench result you're concerned about? I would think that small write perf could be kept reasonable thanks to bluestore's deferred writes. FWIW, our bench results (all flash cluster) didn't show a massive performance difference between 3 replica and 4+2 EC. I agree abou
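A small-block comparison like the one discussed can be reproduced with `rados bench` (a sketch; the pool names `ec42pool` and `replpool` are placeholders for an EC 4+2 pool and a 3-replica pool):

```shell
# 30 seconds of 4 KiB writes, 16 concurrent ops, against each pool.
# --no-cleanup keeps the objects so read benches can follow.
rados bench -p ec42pool 30 write -b 4096 -t 16 --no-cleanup
rados bench -p replpool 30 write -b 4096 -t 16 --no-cleanup
```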

Re: [ceph-users] Erasure Coding performance for IO < stripe_width

2019-07-08 Thread Dan van der Ster
On Mon, Jul 8, 2019 at 1:02 PM Lars Marowsky-Bree wrote: > > On 2019-07-08T12:25:30, Dan van der Ster wrote: > > > Is there a specific bench result you're concerned about? > > We're seeing ~5800 IOPS, ~23 MiB/s on 4 KiB IO (stripe_width 8192) on a > pool that

[ceph-users] how to power off a cephfs cluster cleanly

2019-07-25 Thread Dan van der Ster
Hi all, In September we'll need to power down a CephFS cluster (currently mimic) for a several-hour electrical intervention. Having never done this before, I thought I'd check with the list. Here's our planned procedure: 1. umount /cephfs from all hpc clients. 2. ceph osd set noout 3. wait unti
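The first steps of the procedure above look like this as commands (a sketch only; the preview is truncated, so everything past step 3 is an assumption, not the poster's actual plan):

```shell
# 1. Unmount the filesystem on every HPC client:
umount /cephfs

# 2. Prevent OSDs from being marked out (no rebalancing while down):
ceph osd set noout

# 3. Wait for the cluster to settle before proceeding:
ceph status

# The remaining steps are not shown in the preview; a typical
# continuation (assumption) would cleanly stop MDS ranks, e.g.:
#   ceph fs set cephfs down true
# then stop OSDs and mons before powering off.
```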

[ceph-users] loaded dup inode (but no mds crash)

2019-07-26 Thread Dan van der Ster
Hi all, Last night we had 60 ERRs like this: 2019-07-26 00:56:44.479240 7efc6cca1700 0 mds.2.cache.dir(0x617) _fetched badness: got (but i already had) [inode 0x10006289992 [...2,head] ~mds2/stray1/10006289992 auth v14438219972 dirtyparent s=116637332 nl=8 n(v0 rc2019-07-26 00:56:17.199090 b116

Re: [ceph-users] loaded dup inode (but no mds crash)

2019-07-29 Thread Dan van der Ster
On Mon, Jul 29, 2019 at 2:52 PM Yan, Zheng wrote: > > On Fri, Jul 26, 2019 at 4:45 PM Dan van der Ster wrote: > > > > Hi all, > > > > Last night we had 60 ERRs like this: > > > > 2019-07-26 00:56:44.479240 7efc6cca1700 0 mds.2.cache.dir(0x617) >

Re: [ceph-users] loaded dup inode (but no mds crash)

2019-07-29 Thread Dan van der Ster
On Mon, Jul 29, 2019 at 3:47 PM Yan, Zheng wrote: > > On Mon, Jul 29, 2019 at 9:13 PM Dan van der Ster wrote: > > > > On Mon, Jul 29, 2019 at 2:52 PM Yan, Zheng wrote: > > > > > > On Fri, Jul 26, 2019 at 4:45 PM Dan van der Ster > > > wrote: >

Re: [ceph-users] cephfs full, 2/3 Raw capacity used

2019-08-26 Thread Dan van der Ster
Hi, Which version of ceph are you using? Which balancer mode? The balancer score isn't a percent-error or anything humanly usable. `ceph osd df tree` can better show you exactly which osds are over/under utilized and by how much. You might be able to manually fix things by using `ceph osd reweigh
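The commands referenced above, spelled out (a sketch; `osd.10` and the `0.8` factor are examples taken from later in the thread):

```shell
# Inspect per-OSD utilization to spot over/under-utilized OSDs:
ceph osd df tree

# Temporarily down-weight an over-full OSD (0.0-1.0 override):
ceph osd reweight osd.10 0.8

# Note: this is distinct from the permanent crush weight, which
# would instead be changed with:
#   ceph osd crush reweight osd.10 <crush-weight>
```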

Re: [ceph-users] cephfs full, 2/3 Raw capacity used

2019-08-26 Thread Dan van der Ster
Thanks. The version and balancer config look good. So you can try `ceph osd reweight osd.10 0.8` to see if it helps to get you out of this. -- dan On Mon, Aug 26, 2019 at 11:35 AM Simon Oosthoek wrote: > > On 26-08-19 11:16, Dan van der Ster wrote: > > Hi, > > > > Which

Re: [ceph-users] ceph mdss keep on crashing after update to 14.2.3

2019-09-19 Thread Dan van der Ster
You were running v14.2.2 before? It seems that that ceph_assert you're hitting was indeed added between v14.2.2. and v14.2.3 in this commit https://github.com/ceph/ceph/commit/12f8b813b0118b13e0cdac15b19ba8a7e127730b There's a comment in the tracker for that commit which says the original fix wa

[ceph-users] v13.2.7 osds crash in build_incremental_map_msg

2019-12-03 Thread Dan van der Ster
Hi all, We're midway through an update from 13.2.6 to 13.2.7 and started getting OSDs crashing regularly like this [1]. Does anyone obviously know what the issue is? (Maybe https://github.com/ceph/ceph/pull/26448/files ?) Or is it some temporary problem while we still have v13.2.6 and v13.2.7 osds

Re: [ceph-users] v13.2.7 osds crash in build_incremental_map_msg

2019-12-03 Thread Dan van der Ster
I created https://tracker.ceph.com/issues/43106 and we're downgrading our osds back to 13.2.6. -- dan On Tue, Dec 3, 2019 at 4:09 PM Dan van der Ster wrote: > > Hi all, > > We're midway through an update from 13.2.6 to 13.2.7 and started > getting OSDs crashing regula

Re: [ceph-users] v13.2.7 osds crash in build_incremental_map_msg

2019-12-04 Thread Dan van der Ster
e only some > specific unsafe scenarios? > > Best regards, > > = > Frank Schilder > AIT Risø Campus > Bygning 109, rum S14 > > > From: ceph-users on behalf of Dan van der > Ster > Sent: 03 December

Re: [ceph-users] Acting sets sometimes may violate crush rule ?

2020-01-13 Thread Dan van der Ster
Hi, One way this can happen is if you change the crush rule of a pool after the balancer has been running awhile. This is because the balancer upmaps are only validated when they are initially created. ceph osd dump | grep upmap Does it explain your issue? .. Dan On Tue, 14 Jan 2020, 04:17 Yi
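Checking for and clearing stale upmap exceptions can be sketched as follows (the pg id `1.2f` is a placeholder; verify before removing anything):

```shell
# List upmap exceptions; entries created before a crush-rule change
# are not re-validated against the new rule:
ceph osd dump | grep upmap

# Remove a stale exception so the PG maps purely by crush again:
ceph osd rm-pg-upmap-items 1.2f
```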

Re: [ceph-users] OSD's hang after network blip

2020-01-16 Thread Dan van der Ster
Hi Nick, We saw the exact same problem yesterday after a network outage -- a few of our down OSDs were stuck down until we restarted their processes. -- Dan On Wed, Jan 15, 2020 at 3:37 PM Nick Fisk wrote: > Hi All, > > Running 14.2.5, currently experiencing some network blips isolated to a >

Re: [ceph-users] OSD's hang after network blip

2020-01-16 Thread Dan van der Ster
t 12:08 PM Nick Fisk wrote: > On Thursday, January 16, 2020 09:15 GMT, Dan van der Ster < > d...@vanderster.com> wrote: > > > Hi Nick, > > > > We saw the exact same problem yesterday after a network outage -- a few > of > > our down OSDs were stuc

Re: [ceph-users] MDS: obscene buffer_anon memory use when scanning lots of files

2020-01-21 Thread Dan van der Ster
On Wed, Jan 22, 2020 at 12:24 AM Patrick Donnelly wrote: > On Tue, Jan 21, 2020 at 8:32 AM John Madden wrote: > > > > On 14.2.5 but also present in Luminous, buffer_anon memory use spirals > > out of control when scanning many thousands of files. The use case is > > more or less "look up this fi
