Re: [ceph-users] jewel to luminous upgrade, chooseleaf_vary_r and chooseleaf_stable

2018-05-15 Thread Adrian
Thanks Dan, After talking it through we've decided to adopt your approach too and leave the tunables till after the upgrade. Regards, Adrian. On Mon, May 14, 2018 at 5:14 PM, Dan van der Ster wrote: > Hi Adrian, > > Is there a strict reason why you *must* upgrade the tunables? > > It is normal

Re: [ceph-users] Ceph Luminous - OSD constantly crashing caused by corrupted placement group

2018-05-15 Thread Gregory Farnum
Looks like something went a little wrong with the snapshot metadata in that PG. If the PG is still going active from the other copies, you're probably best off using the ceph-objectstore-tool to remove it on the OSD that is crashing. You could either replace it with an export from one of the other
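
For reference, the ceph-objectstore-tool workflow described above usually looks roughly like the sketch below; the OSD ids, PG id and data paths are placeholders, not values from this thread, and the affected OSD has to be stopped first (some releases also want --force on the remove):
  systemctl stop ceph-osd@<id>
  # export a good copy of the PG from one of the healthy OSDs
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<healthy-id> --pgid <pgid> --op export --file /tmp/<pgid>.export
  # remove the corrupted copy on the crashing OSD
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<crashing-id> --pgid <pgid> --op remove
  # optionally import the exported copy before restarting the OSD
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<crashing-id> --pgid <pgid> --op import --file /tmp/<pgid>.export
  systemctl start ceph-osd@<id>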

Re: [ceph-users] Too many active mds servers

2018-05-15 Thread Thomas Bennett
Hi Patrick, Thanks! Much appreciated. On Tue, 15 May 2018 at 14:52, Patrick Donnelly wrote: > Hello Thomas, > > On Tue, May 15, 2018 at 2:35 PM, Thomas Bennett wrote: > > Hi, > > > > I'm running Luminous 12.2.5 and I'm testing cephfs. > > > > However, I seem to have too many active mds servers o
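
On Luminous, lowering max_mds does not stop ranks that are already active; a minimal sketch of the usual follow-up, assuming the filesystem is named cephfs and rank 2 is the one to retire:
  ceph fs set cephfs max_mds 2
  # the extra rank has to be stopped explicitly on Luminous (ranks are numbered from 0)
  ceph mds deactivate cephfs:2
  ceph fs status cephfs   # the deactivated daemon should return to standby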

Re: [ceph-users] Too many active mds servers

2018-05-15 Thread Patrick Donnelly
Hello Thomas, On Tue, May 15, 2018 at 2:35 PM, Thomas Bennett wrote: > Hi, > > I'm running Luminous 12.2.5 and I'm testing cephfs. > > However, I seem to have too many active mds servers on my test cluster. > > How do I set one of my mds servers to become standby? > > I've run ceph fs set cephfs

[ceph-users] Too many active mds servers

2018-05-15 Thread Thomas Bennett
Hi, I'm running Luminous 12.2.5 and I'm testing cephfs. However, I seem to have too many active mds servers on my test cluster. How do I set one of my mds servers to become standby? I've run ceph fs set cephfs max_mds 2 which set the max_mds from 3 to 2 but has no effect on my running configura

Re: [ceph-users] slow requests are blocked

2018-05-15 Thread David Turner
I've been running into slow requests with my rgw metadata pools just this week. I tracked it down because the slow requests were on my nvme osds. I haven't solved the issue yet, but I can confirm that no resharding was taking place and that the auto-resharder is working as all of my larger bucket

Re: [ceph-users] which kernel support object-map, fast-diff

2018-05-15 Thread Paul Emmerich
The following RBD features are supported since these kernel versions: Kernel 3.8: RBD_FEATURE_LAYERING https://github.com/ceph/ceph-client/commit/d889140c4a1c5edb6a7bd90392b9d878bfaccfb6 Kernel 3.10: RBD_FEATURE_STRIPINGV2 https://github.com/ceph/ceph-client/commit/5cbf6f12c48121199cc214c93dea98cc
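
If an image has to be mapped with krbd on one of these kernels, a common approach is to create it with only the supported features; a sketch with illustrative pool/image names:
  rbd create --size 100G --image-feature layering,exclusive-lock rbd/myimage
  # or make this the default for new images in ceph.conf (bitmask: layering=1, exclusive-lock=4)
  #   rbd default features = 5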

Re: [ceph-users] slow requests are blocked

2018-05-15 Thread Paul Emmerich
Looks like it's mostly RGW metadata stuff; are you running your non-data RGW pools on SSDs (you should, that can help *a lot*)? Paul 2018-05-15 18:49 GMT+02:00 Grigory Murashov : > Hello guys! > > I collected output of ceph daemon osd.16 dump_ops_in_flight and ceph > daemon osd.16 dump_historic
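
With Luminous device classes, moving the non-data RGW pools onto SSDs is usually a matter of a dedicated CRUSH rule; a sketch where the rule name is invented and the pool names depend on your zone:
  ceph osd crush rule create-replicated rgw-meta-ssd default host ssd
  ceph osd pool set default.rgw.buckets.index crush_rule rgw-meta-ssd
  ceph osd pool set default.rgw.log crush_rule rgw-meta-ssd
  ceph osd pool set default.rgw.meta crush_rule rgw-meta-ssd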

Re: [ceph-users] Cephfs write fail when node goes down

2018-05-15 Thread Paul Emmerich
Kernel 4.4 is ancient in terms of Ceph support; we've also encountered a lot of similar hangs with older kernels and cephfs. Paul 2018-05-15 16:56 GMT+02:00 David C : > I've seen similar behavior with cephfs client around that age, try 4.14+ > > On 15 May 2018 1:57 p.m., "Josef Zelenka" > wrot

Re: [ceph-users] Node crash, filesytem not usable

2018-05-15 Thread Webert de Souza Lima
I'm sorry, I wouldn't know; I'm on Jewel. Is your cluster HEALTH_OK now? Regards, Webert Lima DevOps Engineer at MAV Tecnologia *Belo Horizonte - Brasil* *IRC NICK - WebertRLZ* On Sun, May 13, 2018 at 6:29 AM Marc Roos wrote: > > In luminous > osd_recovery_threads = osd_disk_threads ? > osd_re

Re: [ceph-users] slow requests are blocked

2018-05-15 Thread LOPEZ Jean-Charles
Hi Grigory, looks like osd.16 is having a hard time acknowledging the write request (for bucket resharding operations from what it looks like) as it takes about 15 seconds for osd.16 to receive the commit confirmation from osd.21 on subop communication. Have a go and check at the journal devic
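
A quick way to sanity-check that theory (nothing here is specific to this cluster):
  ceph osd perf    # per-OSD commit/apply latency; compare osd.16 and osd.21 against the rest
  iostat -x 2      # on the OSD hosts; high await/util on the journal device points at the disk itself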

Re: [ceph-users] Single ceph cluster for the object storage service of 2 OpenStack clouds

2018-05-15 Thread David Turner
Yeah, that's how we do multiple zones. I find following the documentation for multi-site (but not actually setting up a second site) to work well for setting up multiple realms in a single cluster. On Tue, May 15, 2018 at 9:29 AM Massimo Sgaravatto < massimo.sgarava...@gmail.com> wrote: > Hi > >
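
A rough sketch of what two independent realms in one cluster look like; the realm/zonegroup/zone names here are invented, and each cloud's RGW instances are then pointed at their own realm:
  radosgw-admin realm create --rgw-realm=cloud1 --default
  radosgw-admin zonegroup create --rgw-realm=cloud1 --rgw-zonegroup=cloud1-zg --master --default
  radosgw-admin zone create --rgw-realm=cloud1 --rgw-zonegroup=cloud1-zg --rgw-zone=cloud1-zone --master --default
  radosgw-admin period update --commit --rgw-realm=cloud1
  # repeat for cloud2 (omitting --default), then set rgw_realm / rgw_zonegroup / rgw_zone
  # in the ceph.conf section of each cloud's radosgw instances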

Re: [ceph-users] slow requests are blocked

2018-05-15 Thread Grigory Murashov
Hello guys! I collected output of ceph daemon osd.16 dump_ops_in_flight and ceph daemon osd.16 dump_historic_ops. Here is the output of ceph health detail at the moment of the problem HEALTH_WARN 20 slow requests are blocked > 32 sec REQUEST_SLOW 20 slow requests are blocked > 32 sec 20 ops a

Re: [ceph-users] RBD bench read performance vs rados bench

2018-05-15 Thread Jorge Pinilla López
rbd bench --io-type read 2tb/test --io-size 4M bench type read io_size 4194304 io_threads 16 bytes 1073741824 pattern sequential SEC OPS OPS/SEC BYTES/SEC 1 8 22.96 96306849.22 2 12 11.74 49250368.05 3 14 9.85 41294366.71 4

Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs

2018-05-15 Thread Wido den Hollander
On 05/15/2018 02:51 PM, Blair Bethwaite wrote: > Sorry, bit late to get back to this... > > On Wed., 2 May 2018, 06:19 Nick Fisk, > wrote: > > 4.16 required? > > > Looks like it - thanks for pointing that out. > > Wido, I don't think you are doing anything wrong

Re: [ceph-users] Cephfs write fail when node goes down

2018-05-15 Thread David C
I've seen similar behavior with cephfs client around that age, try 4.14+ On 15 May 2018 1:57 p.m., "Josef Zelenka" wrote: Client's kernel is 4.4.0. Regarding the hung osd request, i'll have to check, the issue is gone now, so i'm not sure if i'll find what you are suggesting. It's rather odd, be

Re: [ceph-users] RBD bench read performance vs rados bench

2018-05-15 Thread Jason Dillaman
On Tue, May 15, 2018 at 6:23 AM, Jorge Pinilla López wrote: > rbd bench --io-type read 2tb/test --io-size 4M > bench type read io_size 4194304 io_threads 16 bytes 1073741824 pattern > sequential > SEC OPS OPS/SEC BYTES/SEC > 1 23 36.13 151560621.45 > 2 43

Re: [ceph-users] RBD image-level permissions

2018-05-15 Thread Jason Dillaman
On Tue, May 15, 2018 at 6:27 AM, Jorge Pinilla López wrote: > Hey, I would like to know if there is any way on luminous to set > image-level permissions per user instead of pool-level. If I only have > pool level, then I could have 1 not-secured pool with clients accessing > any rbd or hundreds o
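
A commonly cited workaround on Luminous is to key the OSD caps on the image's internal id with object_prefix restrictions; a sketch with a placeholder client, pool, image name and id (the id is the suffix of block_name_prefix in "rbd info <pool>/<image>"):
  ceph auth get-or-create client.guest mon 'profile rbd' \
      osd 'allow rwx pool=rbd object_prefix rbd_data.<imageid>, allow rwx pool=rbd object_prefix rbd_header.<imageid>, allow rx pool=rbd object_prefix rbd_id.<imagename>'
This scales poorly with many images, which is essentially the trade-off the original mail describes.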

[ceph-users] Single ceph cluster for the object storage service of 2 OpenStack clouds

2018-05-15 Thread Massimo Sgaravatto
Hi For a while I have been using a single ceph cluster for the image and block storage services of two OpenStack clouds. Now I want to use this ceph cluster also for the object storage services of the two OpenStack clouds, and I want to implement that with a clear separation between the two clou

[ceph-users] RBD image-level permissions

2018-05-15 Thread Jorge Pinilla López
Hey, I would like to know if there is any way on luminous to set image-level permissions per user instead of pool-level. If I only have pool level, then I could have 1 not-secured pool with clients accessing any rbd or hundreds of little pools which are a mess. I have read that previously some

[ceph-users] RBD bench read performance vs rados bench

2018-05-15 Thread Jorge Pinilla López
rbd bench --io-type read 2tb/test --io-size 4M bench type read io_size 4194304 io_threads 16 bytes 1073741824 pattern sequential SEC OPS OPS/SEC BYTES/SEC 1 23 36.13 151560621.45 2 43 28.61 119988170.65 3 54 23.02 96555723.10

Re: [ceph-users] Cephfs write fail when node goes down

2018-05-15 Thread Josef Zelenka
Client's kernel is 4.4.0. Regarding the hung osd request, I'll have to check; the issue is gone now, so I'm not sure if I'll find what you are suggesting. It's rather odd, because Ceph's failover worked for us every time, so I'm trying to figure out whether it is a ceph or app issue. On 15/05

Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs

2018-05-15 Thread Blair Bethwaite
Sorry, bit late to get back to this... On Wed., 2 May 2018, 06:19 Nick Fisk, wrote: > 4.16 required? > Looks like it - thanks for pointing that out. Wido, I don't think you are doing anything wrong here, maybe this is a bug... I've got RHEL7 + Broadwell based Ceph nodes here for which the sam

Re: [ceph-users] rbd feature map fail

2018-05-15 Thread Ilya Dryomov
On Tue, May 15, 2018 at 10:07 AM, wrote: > Hi, all! > > I use rbd to do something and find below issue: > > when i create a rbd image with feature: > layering,exclusive-lock,object-map,fast-diff > > failed to map: > rbd: sysfs write failed > RBD image feature set mismatch. Try disabling features

Re: [ceph-users] rbd feature map fail

2018-05-15 Thread Jason Dillaman
I believe this is documented by this tracker ticket [1]. [1] http://tracker.ceph.com/issues/11418 On Tue, May 15, 2018 at 1:07 AM, wrote: > Hi, all! > > I use rbd to do something and find below issue: > > when i create a rbd image with feature: > layering,exclusive-lock,object-map,fast-diff > >

[ceph-users] Ceph Luminous - OSD constantly crashing caused by corrupted placement group

2018-05-15 Thread Siegfried Höllrigl
Hi ! We have upgraded our Ceph cluster (3 Mon Servers, 9 OSD Servers, 190 OSDs total) from 10.2.10 to Ceph 12.2.4 and then to 12.2.5. (A mixture of Ubuntu 14 and 16 with the repos from https://download.ceph.com/debian-luminous/) Now we have the problem that one OSD is crashing again and aga

Re: [ceph-users] Intel Xeon Scalable and CPU frequency scaling on NVMe/SSD Ceph OSDs

2018-05-15 Thread Wido den Hollander
On 05/14/2018 04:46 PM, Nick Fisk wrote: > Hi Wido, > > Are you trying this setting? > > /sys/devices/system/cpu/intel_pstate/min_perf_pct > Yes, but that doesn't help. I can set it to 80, 100 or any value I like; the CPUs keep clocking down to 800MHz. At first I was having some issues with
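
For what it's worth, the knobs usually tried in this situation (not necessarily what resolved it here) are the frequency governor and the C-state limits:
  cpupower frequency-set -g performance       # per boot, on each OSD node
  # or cap C-states on the kernel command line, which also keeps clocks up:
  #   intel_idle.max_cstate=1 processor.max_cstate=1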

Re: [ceph-users] which kernel support object-map, fast-diff

2018-05-15 Thread xiang....@sky-data.cn
Could you give a list of which features are supported and which are not? - Original Message - From: "Konstantin Shalygin" To: "ceph-users" Cc: "xiang dai" Sent: Tuesday, May 15, 2018 4:57:00 PM Subject: Re: [ceph-users] which kernel support object-map, fast-diff > So which kernel version support those feature? No one

Re: [ceph-users] which kernel support object-map, fast-diff

2018-05-15 Thread Konstantin Shalygin
So which kernel version support those feature? No kernel supports these features yet. k

[ceph-users] which kernel support object-map, fast-diff

2018-05-15 Thread xiang . dai
Hi, all! I use CentOS 7.4 and want to use ceph rbd. I found that object-map and fast-diff do not work. rbd image 'app': size 500 GB in 128000 objects order 22 (4096 kB objects) block_name_prefix: rbd_data.10a2643c9869 format: 2 features: layering, exclusive-lock, object-map, fast-diff <===

[ceph-users] Cache Tiering not flushing and evicting due to missing scrub

2018-05-15 Thread Micha Krause
Hi, increasing pg_num for a cache pool gives you a warning that pools must be scrubbed afterwards. It turns out that if you ignore this, flushing and evicting will not work. You really should do something like this: for pg in $(ceph pg dump | awk '$1 ~ "^." { print $1 }'); do ceph pg scrub $pg; done
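
A cleaned-up version of that loop, restricted to a single pool's PGs (<cachepool> is a placeholder for the cache pool name):
  for pg in $(ceph pg ls-by-pool <cachepool> | awk '$1 ~ /^[0-9]+\./ {print $1}'); do
      ceph pg scrub $pg
  done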

[ceph-users] rbd feature map fail

2018-05-15 Thread xiang . dai
Hi, all! I use rbd and ran into the issue below: when I create an rbd image with the features layering,exclusive-lock,object-map,fast-diff it fails to map: rbd: sysfs write failed RBD image feature set mismatch. Try disabling features unsupported by the kernel with "rbd feature disable".
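
As the error message suggests, the usual way out on an older kernel is to drop the unsupported features before mapping; a sketch with an illustrative pool/image:
  rbd feature disable rbd/myimage object-map fast-diff
  rbd map rbd/myimage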

Re: [ceph-users] ceph's UID/GID 65045 in conflict with user's UID/GID in a ldap

2018-05-15 Thread Yoann Moulin
Hello John, > Hello Yoann. I am working with similar issues at the moment in a biotech > company in Denmark. > > First of all what authentication setup are you using? ldap with sssd > If you are using sssd there is a very simple and useful utility called > sss_override > You can 'override' the

Re: [ceph-users] ceph's UID/GID 65045 in conflict with user's UID/GID in a ldap

2018-05-15 Thread John Hearns
Hello Yoann. I am working with similar issues at the moment in a biotech company in Denmark. First of all what authentication setup are you using? If you are using sssd there is a very simple and useful utility called sss_override You can 'override' the uid which you get from LDAP with the genuine
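
For reference, an sss_override override is a one-liner; the username and ids below are made up, and sssd has to be restarted before the first override takes effect:
  sss_override user-add alice -u 20001 -g 20001
  systemctl restart sssd
  id alice    # verify the overridden uid/gid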

Re: [ceph-users] a big cluster or several small

2018-05-15 Thread Piotr Dałek
On 18-05-14 06:49 PM, Marc Boisis wrote: Hi, Hello, Currently we have a 294 OSD (21 hosts/3 racks) cluster with RBD clients only, 1 single pool (size=3). We want to divide this cluster into several to minimize the risk in case of failure/crash. For example, a cluster for the mail, another f