Re: [ceph-users] cls_rbd ops on rbd_id.$name objects in EC pool

2016-02-11 Thread Robert LeBlanc
Is this only a problem with EC base tiers or would replicated base tiers see this too? - Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Thu, Feb 11, 2016 at 6:09 PM, Sage Weil wrote: > On

Re: [ceph-users] Reducing the impact of OSD restarts (noout ain't uptosnuff)

2016-02-12 Thread Robert LeBlanc
the code to resolve this by the time I'm finished with the queue optimizations I'm doing (hopefully in a week or two), I plan on looking into this to see if there is something that can be done to prevent the OPs from being accepted until the OSD is ready for them. - Robert LeBl

Re: [ceph-users] Reducing the impact of OSD restarts (noout ain't uptosnuff)

2016-02-12 Thread Robert LeBlanc
in a few weeks time I can have a report on what I find. Hopefully we can have it fixed for Jewel and Hammer. Fingers crossed. Robert LeBlanc Sent from a mobile device please excuse any typos. On Feb 12, 2016 10:32 PM, "Christian Balzer" wrote: > > Hello, > > for the record

Re: [ceph-users] Reducing the impact of OSD restarts (noout ain't uptosnuff)

2016-02-13 Thread Robert LeBlanc
Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Sat, Feb 13, 2016 at 8:51 PM, Tom Christensen wrote: >> > Next this : > --- > 2016-02-12 01:35:33.915981 7f75be4d57c0

Re: [ceph-users] Ceph and its failures

2016-02-23 Thread Robert LeBlanc
Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Tue

Re: [ceph-users] Incorrect output from ceph osd map command

2016-02-23 Thread Robert LeBlanc
Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Tue, Feb 23, 2016 at 3:33 PM, Vickey Singh wrote

Re: [ceph-users] Crush map customization for production use

2016-02-24 Thread Robert LeBlanc
Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Wed, Feb 24, 2016 at 4:09 AM, Vickey Singh wrote: > Hello Geeks > > Can someone pleas

Re: [ceph-users] Can not disable rbd cache

2016-02-24 Thread Robert LeBlanc
Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Wed, Feb 24, 2016 at 4:29 AM, Oliver Dzombic wrote: > Hi Esta, > > how do you know, that its still active ? > > -- > Mit freundlichen Gruessen / Best regards > > Olive

Re: [ceph-users] ceph hammer : rbd info/Status : operation not supported (95) (EC+RBD tier pools)

2016-02-24 Thread Robert LeBlanc
We have not seen this issue, but we don't run EC pools yet (we are waiting for multiple layers to be available). We are not running 0.94.6 in production yet either. We have adopted the policy to only run released versions in production unless there is a really pressing need to have a patch. We are

Re: [ceph-users] List of SSDs

2016-02-24 Thread Robert LeBlanc
We are moving to the Intel S3610; from our testing it is a good balance between price, performance and longevity. But as with all things, do your testing ahead of time. This will be our third model of SSDs for our cluster. The S3500s didn't have enough life and performance tapers off as it gets fu

Re: [ceph-users] Observations with a SSD based pool under Hammer

2016-02-24 Thread Robert LeBlanc
With my S3500 drives in my test cluster, the latest master branch gave me an almost 2x increase in performance compared to just a month or two ago. There look to be some really nice things coming in Jewel around SSD performance. My drives are now 80-85% busy doing about 10-12K IOPS when doing 4K fi

Re: [ceph-users] Can not disable rbd cache

2016-02-25 Thread Robert LeBlanc
My guess would be that if you are already running hammer on the client it is already using the new watcher API. This would be a fix on the OSDs to allow the object to be moved because the current client is smart enough to try again. It would be watchers per object. Sent from a mobile device, pleas
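A quick way to check the watchers on an image's header object (pool and image names here are hypothetical; for format 2 images the header object is rbd_header.<id>, where <id> comes from the block_name_prefix shown by rbd info):

rbd info rbd/myimage | grep block_name_prefix     # e.g. rbd_data.5e7a6b8b4567
rados -p rbd listwatchers rbd_header.5e7a6b8b4567 # list clients watching the header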

Re: [ceph-users] Observations with a SSD based pool under Hammer

2016-02-25 Thread Robert LeBlanc
erall, but there was some. Sent from a mobile device, please excuse any typos. On Feb 25, 2016 9:15 PM, "Christian Balzer" wrote: > > Hello, > > On Wed, 24 Feb 2016 23:01:43 -0700 Robert LeBlanc wrote: > > > With my S3500 drives in my test cluster, the latest master

Re: [ceph-users] List of SSDs

2016-02-25 Thread Robert LeBlanc
benchmarks. Some of the data about the S3500s is from my test cluster that has them. Sent from a mobile device, please excuse any typos. On Feb 25, 2016 9:20 PM, "Christian Balzer" wrote: > > Hello, > > On Wed, 24 Feb 2016 22:56:15 -0700 Robert LeBlanc wrote: > >

Re: [ceph-users] List of SSDs

2016-02-26 Thread Robert LeBlanc
Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Fri, Feb 26, 2016 at 4:05 PM, Shinobu Kinjo

Re: [ceph-users] List of SSDs

2016-02-26 Thread Robert LeBlanc
Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Fri, Feb 26, 2016 at 5:41 PM, Shinobu Kinjo wrote: >

Re: [ceph-users] Replacing OSD drive without rempaping pg's

2016-03-01 Thread Robert LeBlanc
Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Mon, Feb 29, 2016 at 10:29 PM, Lindsay Mathieson wrote: > I was looking at replacing an osd drive in place as per the procedure here: > > http

Re: [ceph-users] Cache Pool and EC: objects didn't flush to a cold EC storage

2016-03-07 Thread Robert LeBlanc
Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1

Re: [ceph-users] how to downgrade when upgrade from firefly to hammer fail

2016-03-07 Thread Robert LeBlanc
Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Sun

Re: [ceph-users] data corruption with hammer

2016-03-15 Thread Robert LeBlanc

Re: [ceph-users] data corruption with hammer

2016-03-18 Thread Robert LeBlanc
Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Thu, Mar 17, 2016 at 11:55 AM, Robert LeBlanc wrote: > Cherry-picking that commit onto

Re: [ceph-users] data corruption with hammer

2016-03-19 Thread Robert LeBlanc
Also, is this ceph_test_rados rewriting objects quickly? I think that the issue is with rewriting objects so if we can tailor the ceph_test_rados to do that, it might be easier to reproduce. Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Thu
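For reference, a rough ceph_test_rados invocation biased toward rewrites of a small object set (pool name and all values are illustrative; check ceph_test_rados --help on your build for the exact flags):

ceph_test_rados --pool test --objects 50 --max-in-flight 16 --max-ops 100000 --op read 50 --op write 100 --op delete 10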

Re: [ceph-users] data corruption with hammer

2016-03-19 Thread Robert LeBlanc
e executable, or `objdump -rdS <executable>` is needed to interpret this. terminate called after throwing an instance of 'ceph::FailedAssertion' Aborted Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Thu, Mar 17, 2016 at 10:39 AM, Sage Weil wrote: >

Re: [ceph-users] data corruption with hammer

2016-03-19 Thread Robert LeBlanc
Cherry-picking that commit onto v0.94.6 wasn't clean so I'm just building your branch. I'm not sure what the difference between your branch and 0.94.6 is, I don't see any commits against osd/ReplicatedPG.cc in the last 5 months other than the one you did today. -------

Re: [ceph-users] data corruption with hammer

2016-03-19 Thread Robert LeBlanc
> What would be *really* great is if you could reproduce this with a > ceph_test_rados workload (from ceph-tests). I.e., get ceph_test_rados > running, and then find the sequence of operations that are sufficient to > trigger a failure. > > sage

Re: [ceph-users] data corruption with hammer

2016-03-19 Thread Robert LeBlanc
Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Wed, Mar 16, 2016 at 1:40 PM, Gregory Farnum wrote: > This tracker ticket happened to go by my eyes

Re: [ceph-users] data corruption with hammer

2016-03-19 Thread Robert LeBlanc
Yep, let me pull and build that branch. I tried installing the dbg packages and running it in gdb, but it didn't load the symbols. Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Thu, Mar 17, 2016 at 11:36 AM, Sage Weil wrote: > On

Re: [ceph-users] data corruption with hammer

2016-03-20 Thread Robert LeBlanc
Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Thu, Mar 1

Re: [ceph-users] Ceph InfiniBand Cluster - Jewel - Performance

2016-04-07 Thread Robert LeBlanc
Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Thu, Apr 7,

[ceph-users] CephFS object mapping.

2019-05-21 Thread Robert LeBlanc
Thank you, Robert LeBlanc -------- Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] CephFS object mapping.

2019-05-22 Thread Robert LeBlanc
On Wed, May 22, 2019 at 12:22 AM Burkhard Linke < burkhard.li...@computational.bio.uni-giessen.de> wrote: > Hi, > > On 5/21/19 9:46 PM, Robert LeBlanc wrote: > > I'm at a new job working with Ceph again and am excited to be back in the > > community! > > > >

Re: [ceph-users] Major ceph disaster

2019-05-22 Thread Robert LeBlanc
proceed. 1. Do a deep-scrub on each PG that is inconsistent. (This may fix some of them.) 2. Print out the inconsistent report for each inconsistent PG. `rados list-inconsistent-obj <pgid> --format=json-pretty` 3. You will want to look at the error messages and see if all
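A minimal sketch of that sequence (pool and PG ids are placeholders):

ceph health detail | grep inconsistent                    # find the inconsistent PGs
rados list-inconsistent-pg <pool>                         # or list them per pool
ceph pg deep-scrub <pgid>                                 # step 1: re-scrub
rados list-inconsistent-obj <pgid> --format=json-pretty   # step 2: dump the error report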

Re: [ceph-users] Major ceph disaster

2019-05-24 Thread Robert LeBlanc
You need to use the first stripe of the object as that is the only one with the metadata. Try "rados -p ec31 getxattr 10004dfce92. parent" instead. Robert LeBlanc Sent from a mobile device, please excuse any typos. On Fri, May 24, 2019, 4:42 AM Kevin Flöh wrote: > Hi, &
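A hedged sketch of decoding that backtrace once you have the first stripe (the .00000000 suffix is the standard first-object name; the pool and inode are reused from the thread):

rados -p ec31 getxattr 10004dfce92.00000000 parent > parent.bin
ceph-dencoder type inode_backtrace_t import parent.bin decode dump_json   # shows the ancestor dentries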

Re: [ceph-users] Major ceph disaster

2019-05-24 Thread Robert LeBlanc
tory > > Does this mean that the lost object isn't even a file that appears in the > ceph directory? Maybe a leftover of a file that has not been deleted > properly? It wouldn't be an issue to mark the object as lost in that case. > On 24.05.19 5:08 p.m., Robert LeBlanc wro

Re: [ceph-users] CephFS object mapping.

2019-05-24 Thread Robert LeBlanc
On Fri, May 24, 2019 at 2:14 AM Burkhard Linke < burkhard.li...@computational.bio.uni-giessen.de> wrote: > Hi, > On 5/22/19 5:53 PM, Robert LeBlanc wrote: > > When you say 'some', is it a fixed offset at which the file data starts? Is the > first stripe just metadata? >

Re: [ceph-users] performance in a small cluster

2019-05-24 Thread Robert LeBlanc
h short tests are small amounts of data, but once the drive started getting full, the performance dropped off a cliff. Considering that Ceph is really hard on drives, it's good to test the extreme. Robert LeBlanc ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
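As an illustration, a sustained sync-write test of the kind implied above (device path is hypothetical and the run is destructive to data on that device):

fio --filename=/dev/sdX --direct=1 --sync=1 --rw=write --bs=4k --numjobs=1 --iodepth=1 --runtime=3600 --time_based --name=write-latency
# run long enough (or pre-fill the drive) so the results reflect steady state rather than a fresh drive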

Re: [ceph-users] slow requests are blocked > 32 sec. Implicated osds 0, 2, 3, 4, 5 (REQUEST_SLOW)

2019-06-07 Thread Robert LeBlanc
and making backfills not so disruptive. -------- Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Thu, Jun 6, 2019 at 1:43 AM BASSAGET Cédric wrote: > Hello, > > I see messages related to REQUEST_SLOW a few times per day. > > here's my ceph

Re: [ceph-users] slow requests are blocked > 32 sec. Implicated osds 0, 2, 3, 4, 5 (REQUEST_SLOW)

2019-06-10 Thread Robert LeBlanc
> help in this case? > Regards > Your disk times look okay, just a lot more unbalanced than I would expect. I'd give wpq a try; I use it all the time. Just be sure to also include the op_cutoff setting or it doesn't have much effect. Let me know how it

Re: [ceph-users] slow requests are blocked > 32 sec. Implicated osds 0, 2, 3, 4, 5 (REQUEST_SLOW)

2019-06-10 Thread Robert LeBlanc
EQUEST_SLOW) warning, even if my OSD disk usage goes above 95% (fio >> ran from 4 diffrent hosts) >> >> On my prod cluster, release 12.2.9, as soon as I run fio on a single >> host, I see a lot of REQUEST_SLOW warninr gmessages, but "iostat -xd 1" >> does

Re: [ceph-users] rebalancing ceph cluster

2019-06-25 Thread Robert LeBlanc
t {osd-num} {weight}``` -------- Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Mon, Jun 24, 2019 at 2:25 AM jinguk.k...@ungleich.ch < jinguk.k...@ungleich.ch> wrote: > Hello everyone, > > We have some osd on the ceph. > Some osd&
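The truncated command above is presumably one of the reweight variants; a sketch of both (OSD id and weights are illustrative):

ceph osd crush reweight osd.12 1.81920   # permanent CRUSH weight, usually the disk size in TiB
ceph osd reweight 12 0.95                # temporary override weight between 0 and 1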

Re: [ceph-users] CephFS : Kernel/Fuse technical differences

2019-06-25 Thread Robert LeBlanc
There may also be more memory copying involved instead of just passing pointers around, but I'm not 100% sure. Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Mon, Jun 24, 2019 at 10:28 AM Jeff Layton wrote: > On Mon, 2019-

Re: [ceph-users] How does monitor know OSD is dead?

2019-06-28 Thread Robert LeBlanc
llow IO to continue. Then when the down timeout expires it will start backfilling and recovering the PGs that were affected. Double check that size != min_size for your pools. -------- Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Thu, Jun 27, 2019 at 5:2

Re: [ceph-users] Migrating a cephfs data pool

2019-06-28 Thread Robert LeBlanc
ugh downtime to move hundreds of terabytes, we need something that can be done online, and a minute or two of downtime would be okay. -------- Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Fri, Jun 28, 2019 at 9:02 AM Marc Roos wrote: >

Re: [ceph-users] Migrating a cephfs data pool

2019-06-28 Thread Robert LeBlanc
at is done and the eviction is done, then you can remove the pool from cephfs and the overlay. That way the OSDs are the one doing the data movement. I don't know that part of the code, so I can't quickly propose any patches. Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4

Re: [ceph-users] How does monitor know OSD is dead?

2019-06-29 Thread Robert LeBlanc
" I mention is the "mon osd down out interval". The rest of what I wrote is correct. Just to make sure I don't confuse anyone else. Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] How does monitor know OSD is dead?

2019-06-29 Thread Robert LeBlanc
if 600 seconds pass with the monitor not hearing from the OSD, it will mark it down. It 'should' only take 20 seconds to detect a downed OSD. Usually, the problem is that an OSD gets too busy and misses heartbeats so other OSDs wrongly mark them d
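The timings mentioned correspond to a couple of settings; a sketch of checking them via the admin socket (daemon ids are illustrative):

ceph daemon osd.0 config get osd_heartbeat_grace        # default 20s without heartbeats before peers report an OSD down
ceph daemon mon.a config get mon_osd_down_out_interval  # default 600s before a down OSD is marked out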

Re: [ceph-users] increase pg_num error

2019-07-01 Thread Robert LeBlanc
I believe he needs to increase the pgp_num first, then pg_num. Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Mon, Jul 1, 2019 at 7:21 AM Nathan Fish wrote: > I ran into this recently. Try running "ceph osd require-osd-release &g

Re: [ceph-users] increase pg_num error

2019-07-01 Thread Robert LeBlanc
On Mon, Jul 1, 2019 at 11:57 AM Brett Chancellor wrote: > In Nautilus just pg_num is sufficient for both increases and decreases. > > Good to know, I haven't gotten to Nautilus yet. -------- Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654
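For reference, the two knobs involved (pool name and target count are placeholders; the required ordering differs by release, as discussed above):

ceph osd pool set <pool> pg_num 256
ceph osd pool set <pool> pgp_num 256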

[ceph-users] ceph-ansible with docker

2019-07-01 Thread Robert LeBlanc
cker have their own IP address and are bridges created like LXD or does it share the host IP? Thank you, Robert LeBlanc -------- Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 ___ ceph-users mailing

Re: [ceph-users] cannot add fuse options to ceph-fuse command

2019-07-05 Thread Robert LeBlanc
Is this a Ceph specific option? If so, you may need to prefix it with "ceph.", at least I had to for FUSE to pass it to the Ceph module/code portion. -------- Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Thu, Jul 4, 2019 at 7:35 AM s

Re: [ceph-users] To backport or not to backport

2019-07-05 Thread Robert LeBlanc
xtremely well so it "Just Works". By not back porting new features, I think it gives more time to bake the features into the new version and frees up the developers to focus on the forward direction of the product. If I want a new feature, then

Re: [ceph-users] enterprise support

2019-07-15 Thread Robert LeBlanc
We recently used Croit (https://croit.io/) and they were really good. Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Mon, Jul 15, 2019 at 12:53 PM Void Star Nill wrote: > Hello, > > Other than Redhat and SUSE, are there other

[ceph-users] Allocation recommendations for separate blocks.db and WAL

2019-07-17 Thread Robert LeBlanc
/Leveled-Compaction Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Investigating Config Error, 300x reduction in IOPs performance on RGW layer

2019-07-17 Thread Robert LeBlanc
I'm pretty new to RGW, but I'm needing to get max performance as well. Have you tried moving your RGW metadata pools to nvme? Carve out a bit of NVMe space and then pin the pool to the SSD class in CRUSH, that way the small metadata ops aren't on slow media. -------- Rob
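A hedged sketch of pinning the RGW metadata pools to a device class (the rule name is made up; this assumes the OSDs already report an nvme class and the pools use default names):

ceph osd crush rule create-replicated rgw-meta-nvme default host nvme
ceph osd pool set default.rgw.meta crush_rule rgw-meta-nvme
ceph osd pool set default.rgw.buckets.index crush_rule rgw-meta-nvme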

[ceph-users] Mark CephFS inode as lost

2019-07-22 Thread Robert LeBlanc
d_ops, how can we tell MDS that the inode is lost and to forget about it without trying to do any checks on it (checking the RADOS objects may be part of the problem)? Once the inode is out of CephFS, we can clean up the RADOS objects manually or leave them there to rot. Thanks, Robert Le

Re: [ceph-users] Mark CephFS inode as lost

2019-07-23 Thread Robert LeBlanc
Thanks, I created a ticket. http://tracker.ceph.com/issues/40906 Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Mon, Jul 22, 2019 at 11:45 PM Yan, Zheng wrote: > please create a ticket at http://tracker.ceph.com/projects/cephfs

Re: [ceph-users] How to add 100 new OSDs...

2019-08-02 Thread Robert LeBlanc
s lots of client I/O, but the clients haven't noticed that huge backfills have been going on. Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Problems understanding 'ceph-features' output

2019-08-02 Thread Robert LeBlanc
On Tue, Jul 30, 2019 at 2:06 AM Janne Johansson wrote: > Someone should make a webpage where you can enter that hex-string and get > a list back. > Providing a minimum bitmap would allow someone to do so, and someone like me to do it manually until then. ---- Robert Le

Re: [ceph-users] How to add 100 new OSDs...

2019-08-03 Thread Robert LeBlanc
Alex Gorbachev wrote: > On Fri, Aug 2, 2019 at 6:57 PM Robert LeBlanc > wrote: > > > > On Fri, Jul 26, 2019 at 1:02 PM Peter Sabaini wrote: > >> > >> On 26.07.19 15:03, Stefan Kooman wrote: > >> > Quoting Peter Sabaini (pe...@sabaini.at): > >

Re: [ceph-users] Built-in HA?

2019-08-05 Thread Robert LeBlanc
Routing and bind the source port on the connection (not the easiest, but allows you to have multiple NICs in the same broadcast domain). I don't have experience with Ceph in this type of configuration. Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62

[ceph-users] New CRUSH device class questions

2019-08-06 Thread Robert LeBlanc
h_location_hook (potentially using a file with a list of partition UUIDs that should be in the metadata pool).? Any other options I may not be considering? Thank you, Robert LeBlanc -------- Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654

Re: [ceph-users] New CRUSH device class questions

2019-08-06 Thread Robert LeBlanc
On Tue, Aug 6, 2019 at 11:11 AM Paul Emmerich wrote: > On Tue, Aug 6, 2019 at 7:45 PM Robert LeBlanc > wrote: > > We have a 12.2.8 luminous cluster with all NVMe and we want to take some > of the NVMe OSDs and allocate them strictly to metadata pools (we have a > problem wi

Re: [ceph-users] New CRUSH device class questions

2019-08-06 Thread Robert LeBlanc
of the pool capacity and sets the quota if the current quota is 1% out of balance. This is run by cron every 5 minutes. If there is a way to reserve some capacity for a pool that no other pool can use, please provide an example. Think of reserved inode space in ext4/XFS/etc. Thank you. --
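For reference, the calls such a script would boil down to (pool name and size are illustrative):

ceph df --format=json                                           # read per-pool usage and cluster capacity
ceph osd pool set-quota cephfs_metadata max_bytes 107374182400  # e.g. cap the pool at 100 GiB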

Re: [ceph-users] New CRUSH device class questions

2019-08-07 Thread Robert LeBlanc
On Wed, Aug 7, 2019 at 12:08 AM Konstantin Shalygin wrote: > On 8/7/19 1:40 PM, Robert LeBlanc wrote: > > > Maybe it's the lateness of the day, but I'm not sure how to do that. > > Do you have an example where all the OSDs are of class ssd? > Can't parse wh

[ceph-users] Replay MDS server stuck

2019-08-09 Thread Robert LeBlanc
improve that would be appreciated. Thank you, Robert LeBlanc -------- Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] New CRUSH device class questions

2019-08-12 Thread Robert LeBlanc
the space of a pool, which is not what I'm looking for. Thank you, Robert LeBlanc Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] How does CephFS find a file?

2019-08-19 Thread Robert LeBlanc
pe size to know how many objects to fetch for the whole file. The objects are named by the inode (in hex) followed by the object offset. The inode corresponds to the value shown by `ls -li` in CephFS, converted to hex. I hope that is correct and useful as a starting point for you. -------- Robe
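A short sketch of walking from a file to its RADOS objects (mount point, pool, and inode are hypothetical):

ino=$(ls -li /mnt/cephfs/somefile | awk '{print $1}')
printf '%x\n' "$ino"                              # e.g. 10000000000
rados -p cephfs_data stat 10000000000.00000000    # first object; later stripes are .00000001, .00000002, ...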

Re: [ceph-users] MDSs report damaged metadata

2019-08-22 Thread Robert LeBlanc
one. When I deleted the directories with the damage the active MDS crashed, but the replay took over just fine. I haven't had the messages now for almost a week. Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Mon, Aug 19, 2019 at 10:30 PM
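For anyone hitting the same messages, a hedged sketch of inspecting the damage table before deleting anything (the MDS rank is illustrative):

ceph tell mds.0 damage ls               # list damage entries with their ids
ceph tell mds.0 damage rm <damage-id>   # clear one entry once the underlying cause is fixed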

[ceph-users] Failure to start ceph-mon in docker

2019-08-28 Thread Robert LeBlanc
rw-r--r-- 1 167 167 37 Jul 30 22:15 IDENTITY -rw-r--r-- 1 167 167 0 Jul 30 22:15 LOCK -rw-r--r-- 1 167 167 1.3M Aug 28 19:16 MANIFEST-027846 -rw-r--r-- 1 167 167 4.7K Aug 1 23:38 OPTIONS-002825 -rw-r--r-- 1 167 167 4.7K Aug 16 07:40 OPTIONS-027849

Re: [ceph-users] Failure to start ceph-mon in docker

2019-08-28 Thread Robert LeBlanc
Turns out /var/lib/ceph was owned by ceph:ceph and not 167:167; chowning it made things work. I guess only the monitor needs that ownership; rgw, mgr, and osd are all happy without it being 167:167. Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Wed
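In other words, the fix amounts to matching the UID/GID the containerized daemons run as (167 in the upstream images):

chown -R 167:167 /var/lib/ceph   # the monitor store is what actually needs it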

[ceph-users] Specify OSD size and OSD journal size with ceph-ansible

2019-08-28 Thread Robert LeBlanc
' - data: '/dev/sdk' db: '/dev/nvme0n1' crush_device_class: 'hdd' - data: '/dev/sdl' db: '/dev/nvme0n1' crush_device_class: 'hdd'

Re: [ceph-users] Failure to start ceph-mon in docker

2019-08-29 Thread Robert LeBlanc
grading to the Ceph distributed packages didn't change the UID. Thanks, Robert LeBlanc -------- Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Thu, Aug 29, 2019 at 12:33 AM Frank Schilder wrote: > Hi Robert, > > this is a bit less tr

Re: [ceph-users] cephfs performance issue MDSs report slow requests and osd memory usage

2019-09-23 Thread Robert LeBlanc
ncluded below; oldest blocked for > 62497.675728 secs > > 2019-09-19 08:53:47.528891 mds.icadmin007 [WRN] 3 slow requests, 0 included > > below; oldest blocked for > 62501.243214 secs > > 2019-09-19 08:53:52.529021 mds.icadmin007 [WRN] 3 slow requests, 0 included > >

Re: [ceph-users] ceph; pg scrub errors

2019-09-23 Thread Robert LeBlanc
m is repaired and when it deep-scrubs to check it, the problem has reappeared or another problem was found and the disk needs to be replaced. Try running: rados list-inconsistent-obj ${PG} --format=json and see what the exact problems are. -------- Robert LeBlanc PGP Fingerprint 79A2 9CA

Re: [ceph-users] hanging slow requests: failed to authpin, subtree is being exported

2019-09-23 Thread Robert LeBlanc
t: 141 KiB/s rd, 54 MiB/s wr, 62 op/s rd, 577 op/s wr > > > > > [root@mds02 ~]# ceph health detail > > HEALTH_WARN 1 MDSs report slow requests; 2 MDSs behind on trimming > > MDS_SLOW_REQUEST 1 MDSs report slow requests > > mdsmds02(mds.1): 2 slow reque

Re: [ceph-users] cephfs performance issue MDSs report slow requests and osd memory usage

2019-09-24 Thread Robert LeBlanc
I wanted all my config in a single file, so I put it in my inventory file, but it looks like you have the right idea. Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Commit and Apply latency on nautilus

2019-10-01 Thread Robert LeBlanc
lse I can > try. > > Any suggestions? If you haven't already tried this, add this to your ceph.conf and restart your OSDs, this should help bring down the variance in latency (It will be the default in Octopus): osd op queue = wpq osd op queue cut off = high Rober
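For completeness, the settings quoted above go under [osd] in /etc/ceph/ceph.conf on every OSD host (the restart method below is an assumption; restart OSDs gradually):

[osd]
osd op queue = wpq
osd op queue cut off = high

systemctl restart ceph-osd.target   # or restart OSDs one host at a time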

Re: [ceph-users] Commit and Apply latency on nautilus

2019-10-01 Thread Robert LeBlanc
On Tue, Oct 1, 2019 at 7:54 AM Robert LeBlanc wrote: > > On Mon, Sep 30, 2019 at 5:12 PM Sasha Litvak > wrote: > > > > At this point, I ran out of ideas. I changed nr_requests and readahead > > parameters to 128->1024 and 128->4096, tuned nodes to > > pe

Re: [ceph-users] Openstack VM IOPS drops dramatically during Ceph recovery

2019-10-14 Thread Robert LeBlanc
ontrol? > > best regards, > > Samuel Not sure which version of Ceph you are on, but add these to your /etc/ceph/ceph.conf on all your OSDs and restart them. osd op queue = wpq osd op queue cut off = high That should really help and make backfills and recovery be non-impactful. This wi
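To confirm the options took effect after the restart (daemon id is illustrative):

ceph daemon osd.0 config get osd_op_queue
ceph daemon osd.0 config get osd_op_queue_cut_off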

Re: [ceph-users] Openstack VM IOPS drops dramatically during Ceph recovery

2019-10-16 Thread Robert LeBlanc
settings on and it really helped both of them. ---- Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Openstack VM IOPS drops dramatically during Ceph recovery

2019-10-17 Thread Robert LeBlanc
eter? Wow! Dusting off the cobwebs here. I think this is what lead me to dig into the code and write the WPQ scheduler. I can't remember doing anything specific. I'm sorry I'm not much help in this regard. Robert LeBlanc PGP Fingerprint 79A2 9CA

Re: [ceph-users] Openstack VM IOPS drops dramatically during Ceph recovery

2019-10-17 Thread Robert LeBlanc
mpact for client traffic. Those would need to be set on all OSDs to be completely effective. Maybe go back to the defaults? Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 ___ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

Re: [ceph-users] Decreasing the impact of reweighting osds

2019-10-25 Thread Robert LeBlanc
You can try adding Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Tue, Oct 22, 2019 at 8:36 PM David Turner wrote: > > Most times you are better served with simpler settings like > osd_recovery_sleep, which has 3 variants if

Re: [ceph-users] Decreasing the impact of reweighting osds

2019-10-25 Thread Robert LeBlanc
You can try adding osd op queue = wpq osd op queue cut off = high to all the OSD ceph configs and restarting. That has made reweighting pretty painless for us. Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Tue, Oct 22, 2019 at 8:36 PM

Re: [ceph-users] Revert a CephFS snapshot?

2019-12-03 Thread Robert LeBlanc
y and roll back each file's content. The MDS could do this more > efficiently than rsync give what it knows about the snapped inodes > (skipping untouched inodes or, eventually, entire subtrees) but it's a > non-trivial amount of work to implement. > > Would it m

Re: [ceph-users] RGW performance with low object sizes

2019-12-03 Thread Robert LeBlanc
> | 8 | 196.3 MB/s | 2 1 2 2 3 3 5 5 | 2 1 2 2 3 3 5 5 | > [...section CLEAN

Re: [ceph-users] RGW performance with low object sizes

2019-12-03 Thread Robert LeBlanc
On Tue, Dec 3, 2019 at 9:11 AM Ed Fisher wrote: > > > On Dec 3, 2019, at 10:28 AM, Robert LeBlanc wrote: > > Did you make progress on this? We have a ton of < 64K objects as well and > are struggling to get good performance out of our RGW. Sometimes we have > RGW

[ceph-users] Cephfs metadata fix tool

2019-12-07 Thread Robert LeBlanc
Our Jewel cluster is exhibiting some similar issues to the one in this thread [0] and it was indicated that a tool would need to be written to fix that kind of corruption. Has the tool been written? How would I go about repairing these 16EB directories that won't delete? Thank you, Robert LeBlan

[ceph-users] Annoying PGs not deep-scrubbed in time messages in Nautilus.

2019-12-09 Thread Robert LeBlanc
33 GiB 0 1.8 PiB default.rgw.buckets.non-ec 8 8.1 MiB 22 8.1 MiB 0 1.8 PiB Please help me figure out what I'm doing wrong with these settings. Thanks, Robert LeBlanc -------- Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD
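The warning is driven by the deep-scrub interval and the warn ratio; a hedged sketch of adjusting them on Nautilus (values are illustrative; double-check the option names with ceph config help):

ceph config set osd osd_deep_scrub_interval 1209600               # e.g. two weeks
ceph config set global mon_warn_pg_not_deep_scrubbed_ratio 0.75   # warn after interval * (1 + ratio)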

Re: [ceph-users] Annoying PGs not deep-scrubbed in time messages in Nautilus.

2019-12-09 Thread Robert LeBlanc
> The solution for you is to simply put the option under global and restart > ceph-mgr (or use daemon config set; it doesn't support changing config via > ceph tell for some reason) > > > Paul > > On Mon, Dec 9, 2019 at 8:32 PM Paul Emmerich > wrote: >>

Re: [ceph-users] units of metrics

2020-01-13 Thread Robert LeBlanc
The link that you referenced above is no longer available; do you have a new link? We upgraded from 12.2.8 to 12.2.12 and the MDS metrics all changed, so I'm trying to map the old values to the new values. Might just have to look in the code. :( Thanks! Robert LeBlan

Re: [ceph-users] units of metrics

2020-01-14 Thread Robert LeBlanc
On Tue, Jan 14, 2020 at 12:30 AM Stefan Kooman wrote: > Quoting Robert LeBlanc (rob...@leblancnet.us): > > The link that you referenced above is no longer available, do you have a > > new link? We upgraded from 12.2.8 to 12.2.12 and the MDS metrics all > > changed, so I'
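One way to inspect the counters, and on newer releases their units, straight from a running daemon (daemon name is illustrative):

ceph daemon mds.a perf schema   # per-counter type and description
ceph daemon mds.a perf dump     # current values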
