[ceph-users] Orphaned objects after removing rbd image

2018-02-26 Thread Krzysztof Dajka
Hi, Recently I discovered that my pool doesn't reclaim all the space after deleting volumes from OpenStack. I haven't pinpointed whether the problem is caused by the client (Cinder volume) or whether it lies within the backend itself. For now I've come to the conclusion that the Ceph pool 'volumes' has orphaned objects: Total

Re: [ceph-users] Corrupted files on CephFS since Luminous upgrade

2018-02-26 Thread Jan Pekař - Imatic
I think I hit the same issue. I have corrupted data on CephFS and I don't remember seeing the same issue before Luminous (I did the same tests before). It is on my single-node test cluster with less memory than recommended (so the server is swapping), but it shouldn't lose data (it never did before). So slow

Re: [ceph-users] fast_read in EC pools

2018-02-26 Thread Oliver Freyermuth
On 27.02.2018 at 00:10, Gregory Farnum wrote: > On Mon, Feb 26, 2018 at 2:59 PM Oliver Freyermuth <freyerm...@physik.uni-bonn.de> wrote: > > > >     Does this match expectations? > > > > > > Can you get the output of eg "ceph pg 2.7cd query"? Want to make sure > the ba

Re: [ceph-users] fast_read in EC pools

2018-02-26 Thread Gregory Farnum
On Mon, Feb 26, 2018 at 2:59 PM Oliver Freyermuth < freyerm...@physik.uni-bonn.de> wrote: > > > Does this match expectations? > > > > > > Can you get the output of eg "ceph pg 2.7cd query"? Want to make sure > the backfilling versus acting sets and things are correct. > > You'll find attached:

Re: [ceph-users] fast_read in EC pools

2018-02-26 Thread Oliver Freyermuth
On 26.02.2018 at 23:48, Gregory Farnum wrote: > > > On Mon, Feb 26, 2018 at 2:30 PM Oliver Freyermuth <freyerm...@physik.uni-bonn.de> wrote: > > On 26.02.2018 at 23:15, Gregory Farnum wrote: > > > > > > On Mon, Feb 26, 2018 at 11:48 AM Oliver Freyermuth <frey

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-02-26 Thread Oliver Freyermuth
On 26.02.2018 at 23:29, Gregory Farnum wrote: > > > On Mon, Feb 26, 2018 at 2:23 PM Reed Dier wrote: > > Quick turnaround, > > Changing/injecting osd_recovery_sleep_hdd into the running SSD OSDs on > bluestore opened the floodgates. > > > Oh right,

Re: [ceph-users] fast_read in EC pools

2018-02-26 Thread Gregory Farnum
On Mon, Feb 26, 2018 at 2:30 PM Oliver Freyermuth <freyerm...@physik.uni-bonn.de> wrote: > On 26.02.2018 at 23:15, Gregory Farnum wrote: > > > > > > On Mon, Feb 26, 2018 at 11:48 AM Oliver Freyermuth <freyerm...@physik.uni-bonn.de> wrote: > > > > >

Re: [ceph-users] CephFS Single Threaded Performance

2018-02-26 Thread John Spray
On Mon, Feb 26, 2018 at 6:25 PM, Brian Woods wrote: > I have a small test cluster (just two nodes) and after rebuilding it several > times I found that my latest configuration, which SHOULD be the fastest, is by far > the slowest (per thread). > > > I have around 10 spindles that I have an erasure-encoded

Re: [ceph-users] fast_read in EC pools

2018-02-26 Thread Oliver Freyermuth
On 26.02.2018 at 23:15, Gregory Farnum wrote: > > > On Mon, Feb 26, 2018 at 11:48 AM Oliver Freyermuth <freyerm...@physik.uni-bonn.de> wrote: > > >     >     The EC pool I am considering is k=4 m=2 with failure domain > host, on 6 hosts. > >     >     So necessarily, there is

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-02-26 Thread Gregory Farnum
On Mon, Feb 26, 2018 at 2:23 PM Reed Dier wrote: > Quick turn around, > > Changing/injecting osd_recovery_sleep_hdd into the running SSD OSD’s on > bluestore opened the floodgates. > Oh right, the OSD does not (think it can) have anything it can really do if you've got a rotational journal and a

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-02-26 Thread Reed Dier
Quick turnaround: Changing/injecting osd_recovery_sleep_hdd into the running SSD OSDs on bluestore opened the floodgates. > pool objects-ssd id 20 > recovery io 1512 MB/s, 21547 objects/s > > pool fs-metadata-ssd id 16 > recovery io 0 B/s, 6494 keys/s, 271 objects/s > client io 82325 B/
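A minimal sketch of how such a sleep value can be inspected and injected into running OSDs without a restart (OSD id is a placeholder; syntax as used on Luminous):
  # inspect the current recovery-sleep values on one OSD via its admin socket
  ceph daemon osd.24 config show | grep osd_recovery_sleep
  # inject new values into all running OSDs
  ceph tell osd.* injectargs '--osd_recovery_sleep_hdd 0 --osd_recovery_sleep_hybrid 0'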

Re: [ceph-users] Significance of the us-east-1 region when using S3 clients to talk to RGW

2018-02-26 Thread David Turner
I just realized the difference between the internal realm, local realm, and local-atl realm. local-atl is a Luminous cluster while the other 2 are Jewel. It looks like that option was completely ignored in Jewel and now Luminous is taking it into account (which is better imo). I think you're rig

Re: [ceph-users] fast_read in EC pools

2018-02-26 Thread Gregory Farnum
On Mon, Feb 26, 2018 at 11:48 AM Oliver Freyermuth < freyerm...@physik.uni-bonn.de> wrote: > > > The EC pool I am considering is k=4 m=2 with failure domain > host, on 6 hosts. > > > So necessarily, there is one shard for each host. If one host > goes down for a prolonged time, > >

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-02-26 Thread Reed Dier
For the record, I am not seeing a demonstrable fix by injecting the value of 0 into the running OSDs. > osd_recovery_sleep_hybrid = '0.00' (not observed, change may require > restart) If it does indeed need a restart, I will need to wait for the current backfills to finish their proc

Re: [ceph-users] Proper procedure to replace DB/WAL SSD

2018-02-26 Thread Gregory Farnum
On Mon, Feb 26, 2018 at 3:23 AM Caspar Smit wrote: > 2018-02-24 7:10 GMT+01:00 David Turner : > >> Caspar, it looks like your idea should work. Worst case scenario seems >> like the osd wouldn't start, you'd put the old SSD back in and go back to >> the idea to weight them to 0, backfilling, then

Re: [ceph-users] How to correctly purge a "ceph-volume lvm" OSD

2018-02-26 Thread Alfredo Deza
On Mon, Feb 26, 2018 at 12:51 PM, David Turner wrote: > I don't follow what ceph-deploy has to do with the man page for ceph-volume. > Is ceph-volume also out-of-tree and as such the man pages aren't version > specific with its capabilities? It's very disconcerting to need to ignore > the man pag

Re: [ceph-users] erasure-code-profile: what's "w=" ?

2018-02-26 Thread Gregory Farnum
On Mon, Feb 26, 2018 at 5:09 AM Wolfgang Lendl < wolfgang.le...@meduniwien.ac.at> wrote: > hi, > > I have no idea what "w=8" means and can't find any hints in docs ... > maybe someone can explain > > > ceph 12.2.2 > > # ceph osd erasure-code-profile get ec42 > crush-device-class=hdd > crush-failur

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-02-26 Thread Gregory Farnum
On Mon, Feb 26, 2018 at 12:26 PM Reed Dier wrote: > I will try to set the hybrid sleeps to 0 on the affected OSDs as an > interim solution to getting the metadata configured correctly. > Yes, that's a good workaround as long as you don't have any actual hybrid OSDs (or aren't worried about them

Re: [ceph-users] Significance of the us-east-1 region when using S3 clients to talk to RGW

2018-02-26 Thread Yehuda Sadeh-Weinraub
I don't know why 'us' works for you, but it could be that s3cmd is just not sending any location constraint when 'us' is set. You can try looking at the capture for this. You can try using wireshark for the capture (assuming http endpoint and not https). Yehuda On Mon, Feb 26, 2018 at 1:21 PM, Da
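A hedged sketch of capturing that request on the RGW host for inspection in wireshark (the interface and the default civetweb port 7480 are assumptions; adjust to your endpoint):
  # record the plaintext S3 traffic around a bucket-create attempt
  tcpdump -i any -s 0 -w rgw-bucket-create.pcap 'tcp port 7480'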

Re: [ceph-users] Significance of the us-east-1 region when using S3 clients to talk to RGW

2018-02-26 Thread David Turner
I set it to that for randomness. I don't have a zonegroup named 'us' either, but that works fine. I don't see why 'cn' should be any different. The bucket_location that triggered me noticing this was 'gd1'. I don't know where that one came from, but I don't see why we should force people setting

Re: [ceph-users] Significance of the us-east-1 region when using S3 clients to talk to RGW

2018-02-26 Thread Yehuda Sadeh-Weinraub
If that's what you set in the config file, I assume that's what passed in. Why did you set that in your config file? You don't have a zonegroup named 'cn', right? On Mon, Feb 26, 2018 at 1:10 PM, David Turner wrote: > I'm also not certain how to do the tcpdump for this. Do you have any > pointer

Re: [ceph-users] Significance of the us-east-1 region when using S3 clients to talk to RGW

2018-02-26 Thread David Turner
I'm also not certain how to do the tcpdump for this. Do you have any pointers to how to capture that for you? On Mon, Feb 26, 2018 at 4:09 PM David Turner wrote: > That's what I set it to in the config file. I probably should have > mentioned that. > > On Mon, Feb 26, 2018 at 4:07 PM Yehuda Sa

Re: [ceph-users] Significance of the us-east-1 region when using S3 clients to talk to RGW

2018-02-26 Thread David Turner
That's what I set it to in the config file. I probably should have mentioned that. On Mon, Feb 26, 2018 at 4:07 PM Yehuda Sadeh-Weinraub wrote: > According to the log here, it says that the location constraint it got > is "cn", can you take a look at a tcpdump, see if that's actually > what's p

Re: [ceph-users] Significance of the us-east-1 region when using S3 clients to talk to RGW

2018-02-26 Thread Yehuda Sadeh-Weinraub
According to the log here, it says that the location constraint it got is "cn", can you take a look at a tcpdump, see if that's actually what's passed in? On Mon, Feb 26, 2018 at 12:02 PM, David Turner wrote: > I run with `debug rgw = 10` and was able to find these lines at the end of a > request

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-02-26 Thread Reed Dier
I will try to set the hybrid sleeps to 0 on the affected OSDs as an interim solution to getting the metadata configured correctly. For reference, here is the complete metadata for osd.24, bluestore SATA SSD with NVMe block.db. > { > "id": 24, > "arch": "x86_64", > "back_

Re: [ceph-users] planning a new cluster

2018-02-26 Thread David Turner
Depending on what your security requirements are, you may not have a choice. If your OpenStack deployment shouldn't be able to load the Kubernetes RBDs (or vice versa), then you need to keep them separate and maintain different keyrings for the 2 services. If that is going to be how you go about
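As an illustration only (client and pool names are not from the thread), separate keyrings scoped per service could look like this on Luminous, using the rbd cap profiles:
  # OpenStack client limited to its own RBD pools
  ceph auth get-or-create client.openstack mon 'profile rbd' osd 'profile rbd pool=volumes, profile rbd pool=images'
  # Kubernetes client limited to a dedicated pool
  ceph auth get-or-create client.kubernetes mon 'profile rbd' osd 'profile rbd pool=kube'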

Re: [ceph-users] Significance of the us-east-1 region when using S3 clients to talk to RGW

2018-02-26 Thread David Turner
I run with `debug rgw = 10` and was able to find these lines at the end of a request to create the bucket. Successfully creating a bucket with `bucket_location = US` looks like [1]this. Failing to create a bucket has "ERROR: S3 error: 400 (InvalidLocationConstraint): The specified location-constr

[ceph-users] planning a new cluster

2018-02-26 Thread Frank Ritchie
Hi all, I am planning a new Ceph cluster that will provide RBD storage for OpenStack and Kubernetes. Additionally, there may be a need for a small amount of RGW storage. Which option would be better: 1. Defining separate pools for OpenStack images/ephemeral VMs/volumes/backups (as seen her

Re: [ceph-users] fast_read in EC pools

2018-02-26 Thread Oliver Freyermuth
On 26.02.2018 at 20:42, Gregory Farnum wrote: > On Mon, Feb 26, 2018 at 11:33 AM Oliver Freyermuth <freyerm...@physik.uni-bonn.de> wrote: > > On 26.02.2018 at 20:23, Gregory Farnum wrote: > > > > > > On Mon, Feb 26, 2018 at 11:06 AM Oliver Freyermuth <freyerm..

Re: [ceph-users] fast_read in EC pools

2018-02-26 Thread Gregory Farnum
On Mon, Feb 26, 2018 at 11:33 AM Oliver Freyermuth < freyerm...@physik.uni-bonn.de> wrote: > Am 26.02.2018 um 20:23 schrieb Gregory Farnum: > > > > > > On Mon, Feb 26, 2018 at 11:06 AM Oliver Freyermuth < > freyerm...@physik.uni-bonn.de > > wrote: > > > >

Re: [ceph-users] Storage usage of CephFS-MDS

2018-02-26 Thread Oliver Freyermuth
On 26.02.2018 at 20:31, Gregory Farnum wrote: > On Mon, Feb 26, 2018 at 11:26 AM Oliver Freyermuth <freyerm...@physik.uni-bonn.de> wrote: > > On 26.02.2018 at 20:09, Oliver Freyermuth wrote: > > On 26.02.2018 at 19:56, Gregory Farnum wrote: > >> > >> > >> On Mon, F

Re: [ceph-users] Significance of the us-east-1 region when using S3 clients to talk to RGW

2018-02-26 Thread Yehuda Sadeh-Weinraub
I'm not sure if the rgw logs (debug rgw = 20) specify explicitly why a bucket creation is rejected in these cases, but it might be worth trying to look at these. If not, then a tcpdump of the specific failed request might shed some light (would be interesting to look at the generated LocationConstr

Re: [ceph-users] fast_read in EC pools

2018-02-26 Thread Oliver Freyermuth
On 26.02.2018 at 20:23, Gregory Farnum wrote: > > > On Mon, Feb 26, 2018 at 11:06 AM Oliver Freyermuth <freyerm...@physik.uni-bonn.de> wrote: > > On 26.02.2018 at 19:45, Gregory Farnum wrote: > > On Mon, Feb 26, 2018 at 10:35 AM Oliver Freyermuth <freyerm...@physik.u

Re: [ceph-users] Storage usage of CephFS-MDS

2018-02-26 Thread Gregory Farnum
On Mon, Feb 26, 2018 at 11:26 AM Oliver Freyermuth < freyerm...@physik.uni-bonn.de> wrote: > Am 26.02.2018 um 20:09 schrieb Oliver Freyermuth: > > Am 26.02.2018 um 19:56 schrieb Gregory Farnum: > >> > >> > >> On Mon, Feb 26, 2018 at 8:25 AM Oliver Freyermuth < > freyerm...@physik.uni-bonn.de

Re: [ceph-users] Significance of the us-east-1 region when using S3 clients to talk to RGW

2018-02-26 Thread David Turner
Our problem only appeared to be present in bucket creation. Listing, putting, etc objects in a bucket work just fine regardless of the bucket_location setting. I ran this test on a few different realms to see what would happen and only 1 of them had a problem. There isn't an obvious thing that s

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-02-26 Thread Gregory Farnum
On Mon, Feb 26, 2018 at 11:21 AM Reed Dier wrote: > The ‘good perf’ that I reported below was the result of beginning 5 new > bluestore conversions which results in a leading edge of ‘good’ > performance, before trickling off. > > This performance lasted about 20 minutes, where it backfilled a sm

Re: [ceph-users] Storage usage of CephFS-MDS

2018-02-26 Thread Oliver Freyermuth
On 26.02.2018 at 20:09, Oliver Freyermuth wrote: > On 26.02.2018 at 19:56, Gregory Farnum wrote: >> >> >> On Mon, Feb 26, 2018 at 8:25 AM Oliver Freyermuth >> <freyerm...@physik.uni-bonn.de> wrote: >> >> On 26.02.2018 at 16:59, Patrick Donnelly wrote: >> > On Sun, Feb 25, 2018 at

Re: [ceph-users] fast_read in EC pools

2018-02-26 Thread Gregory Farnum
On Mon, Feb 26, 2018 at 11:06 AM Oliver Freyermuth <freyerm...@physik.uni-bonn.de> wrote: > On 26.02.2018 at 19:45, Gregory Farnum wrote: > > On Mon, Feb 26, 2018 at 10:35 AM Oliver Freyermuth <freyerm...@physik.uni-bonn.de> wrote: > > > > On 26.02

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-02-26 Thread Reed Dier
The ‘good perf’ that I reported below was the result of beginning 5 new bluestore conversions which results in a leading edge of ‘good’ performance, before trickling off. This performance lasted about 20 minutes, where it backfilled a small set of PGs off of non-bluestore OSDs. Current perform

Re: [ceph-users] Storage usage of CephFS-MDS

2018-02-26 Thread Oliver Freyermuth
On 26.02.2018 at 19:56, Gregory Farnum wrote: > > > On Mon, Feb 26, 2018 at 8:25 AM Oliver Freyermuth <freyerm...@physik.uni-bonn.de> wrote: > > On 26.02.2018 at 16:59, Patrick Donnelly wrote: > > On Sun, Feb 25, 2018 at 10:26 AM, Oliver Freyermuth > > <freyerm...@p

Re: [ceph-users] fast_read in EC pools

2018-02-26 Thread Oliver Freyermuth
On 26.02.2018 at 19:45, Gregory Farnum wrote: > On Mon, Feb 26, 2018 at 10:35 AM Oliver Freyermuth <freyerm...@physik.uni-bonn.de> wrote: > > On 26.02.2018 at 19:24, Gregory Farnum wrote: > > I don’t actually know this option, but based on your results it’s clear > that “fast

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-02-26 Thread Gregory Farnum
On Mon, Feb 26, 2018 at 9:12 AM Reed Dier wrote: > After my last round of backfills completed, I started 5 more bluestore > conversions, which helped me recognize a very specific pattern of > performance. > > pool objects-ssd id 20 > recovery io 757 MB/s, 10845 objects/s > > pool fs-metadata-ss

Re: [ceph-users] Storage usage of CephFS-MDS

2018-02-26 Thread Gregory Farnum
On Mon, Feb 26, 2018 at 8:25 AM Oliver Freyermuth <freyerm...@physik.uni-bonn.de> wrote: > On 26.02.2018 at 16:59, Patrick Donnelly wrote: > > On Sun, Feb 25, 2018 at 10:26 AM, Oliver Freyermuth > > wrote: > >> Looking with: > >> ceph daemon osd.2 perf dump > >> I get: > >> "bluefs": { > >>

Re: [ceph-users] fast_read in EC pools

2018-02-26 Thread Gregory Farnum
On Mon, Feb 26, 2018 at 10:35 AM Oliver Freyermuth <freyerm...@physik.uni-bonn.de> wrote: > On 26.02.2018 at 19:24, Gregory Farnum wrote: > > I don’t actually know this option, but based on your results it’s clear > that “fast read” is telling the OSD it should issue reads to all k+m OSDs > stor

Re: [ceph-users] fast_read in EC pools

2018-02-26 Thread Oliver Freyermuth
On 26.02.2018 at 19:24, Gregory Farnum wrote: > I don’t actually know this option, but based on your results it’s clear that > “fast read” is telling the OSD it should issue reads to all k+m OSDs storing > data and then reconstruct the data from the first k to respond. Without the > fast read i

Re: [ceph-users] rados cppool, very low speed

2018-02-26 Thread Gregory Farnum
“rados cppool” is a toy. Please don’t use it for anything that matters. :) On Sun, Feb 25, 2018 at 10:16 PM Behnam Loghmani wrote: > Hi, > > I want to copy objects from one of my pools to another pool with "rados > cppool" but the speed of this operation is very low. On the other hand, the > speed

[ceph-users] CephFS Single Threaded Performance

2018-02-26 Thread Brian Woods
I have a small test cluster (just two nodes) and after rebuilding it several times I found that my latest configuration, which SHOULD be the fastest, is by far the slowest (per thread). I have around 10 spindles that I have an erasure-encoded CephFS on. When I installed several SSDs and recreated it with

Re: [ceph-users] fast_read in EC pools

2018-02-26 Thread Gregory Farnum
I don’t actually know this option, but based on your results it’s clear that “fast read” is telling the OSD it should issue reads to all k+m OSDs storing data and then reconstruct the data from the first k to respond. Without the fast read it simply asks the regular k data nodes to read it back str
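For reference, a minimal sketch of checking and toggling the option on the pool discussed in the thread:
  # see whether fast_read is currently enabled on the EC pool
  ceph osd pool get cephfs_data fast_read
  # enable it: reads go to all k+m shards and complete after the first k replies
  ceph osd pool set cephfs_data fast_read 1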

Re: [ceph-users] Luminous | PG split causing slow requests

2018-02-26 Thread David Turner
The slow requests are absolutely expected during filestore subfolder splitting. You can, however, stop an OSD, split its subfolders offline, and start it back up. I perform this maintenance once a month. I changed my settings to [1]these, but I only suggest doing something this drastic if you're committed to m
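A rough sketch of the stop/split/start maintenance described above, assuming a filestore OSD and the ceph-objectstore-tool apply-layout-settings operation (OSD id and pool name are placeholders; test on one OSD first):
  ceph osd set noout
  systemctl stop ceph-osd@12
  # pre-split subfolders offline using the configured filestore_merge_threshold / filestore_split_multiple
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-12 --op apply-layout-settings --pool cephfs_data
  systemctl start ceph-osd@12
  ceph osd unset noout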

Re: [ceph-users] How to correctly purge a "ceph-volume lvm" OSD

2018-02-26 Thread David Turner
I don't follow what ceph-deploy has to do with the man page for ceph-volume. Is ceph-volume also out-of-tree and as such the man pages aren't version specific with its capabilities? It's very disconcerting to need to ignore the man pages for CLI tools. On Mon, Feb 26, 2018 at 12:10 PM Alfredo De

Re: [ceph-users] CephFS very unstable with many small files

2018-02-26 Thread Oliver Freyermuth
On 26.02.2018 at 17:59, John Spray wrote: > On Mon, Feb 26, 2018 at 4:50 PM, Oliver Freyermuth > wrote: >> On 26.02.2018 at 17:15, John Spray wrote: >>> On Mon, Feb 26, 2018 at 4:06 PM, Oliver Freyermuth >>> wrote: On 26.02.2018 at 16:43, Patrick Donnelly wrote: > On Sun, Feb 25, 2018

Re: [ceph-users] Luminous | PG split causing slow requests

2018-02-26 Thread David C
Thanks, David. I think I've probably used the wrong terminology here, I'm not splitting PGs to create more PGs. This is the PG folder splitting that happens automatically, I believe it's controlled by the "filestore_split_multiple" setting (which is 8 on my OSDs, I believe that's the Luminous defau

Re: [ceph-users] SSD Bluestore Backfills Slow

2018-02-26 Thread Reed Dier
After my last round of backfills completed, I started 5 more bluestore conversions, which helped me recognize a very specific pattern of performance. > pool objects-ssd id 20 > recovery io 757 MB/s, 10845 objects/s > > pool fs-metadata-ssd id 16 > recovery io 0 B/s, 36265 keys/s, 1633 object

Re: [ceph-users] How to correctly purge a "ceph-volume lvm" OSD

2018-02-26 Thread Alfredo Deza
On Mon, Feb 26, 2018 at 11:24 AM, David Turner wrote: > If we're asking for documentation updates, the man page for ceph-volume is > incredibly outdated. In 12.2.3 it still says that bluestore is not yet > implemented and that it's planned to be supported. > '[--bluestore] filestore objectstore (

Re: [ceph-users] Proper procedure to replace DB/WAL SSD

2018-02-26 Thread David Turner
I'm glad that I was able to help out. I wanted to point out that the reason those steps worked for you as quickly as they did is likely that you configured your block.db to use /dev/disk/by-partuuid/{guid} instead of /dev/sdx#. Had you configured your OSDs with /dev/sdx#, then you would have
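A small sketch of how to check which device reference an OSD's block.db uses (standard bluestore paths; OSD id is a placeholder):
  # shows whether block.db points at a stable by-partuuid path or a raw /dev/sdx# node
  ls -l /var/lib/ceph/osd/ceph-4/block.db
  # bluestore also records the DB device in the OSD metadata
  ceph osd metadata 4 | grep -E 'bluefs_db|devices'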

Re: [ceph-users] CephFS very unstable with many small files

2018-02-26 Thread John Spray
On Mon, Feb 26, 2018 at 4:50 PM, Oliver Freyermuth wrote: > On 26.02.2018 at 17:15, John Spray wrote: >> On Mon, Feb 26, 2018 at 4:06 PM, Oliver Freyermuth >> wrote: >>> On 26.02.2018 at 16:43, Patrick Donnelly wrote: On Sun, Feb 25, 2018 at 3:49 PM, Oliver Freyermuth wrote: > On

Re: [ceph-users] Storage usage of CephFS-MDS

2018-02-26 Thread Patrick Donnelly
On Mon, Feb 26, 2018 at 7:59 AM, Patrick Donnelly wrote: > It seems in the above test you're using about 1KB per inode (file). > Using that you can extrapolate how much space the data pool needs s/data pool/metadata pool/ -- Patrick Donnelly
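As a rough worked example of that extrapolation (using the ~105,000,000 files mentioned elsewhere in the thread, and treating 1 KB/inode as an order-of-magnitude figure only): 105,000,000 inodes × ~1 KB ≈ 105 GB of metadata-pool capacity per copy, before replication overhead.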

Re: [ceph-users] CephFS very unstable with many small files

2018-02-26 Thread Oliver Freyermuth
On 26.02.2018 at 17:15, John Spray wrote: > On Mon, Feb 26, 2018 at 4:06 PM, Oliver Freyermuth > wrote: >> On 26.02.2018 at 16:43, Patrick Donnelly wrote: >>> On Sun, Feb 25, 2018 at 3:49 PM, Oliver Freyermuth >>> wrote: On 25.02.2018 at 21:50, John Spray wrote: > On Sun, Feb 25, 2018

Re: [ceph-users] Storage usage of CephFS-MDS

2018-02-26 Thread Oliver Freyermuth
On 26.02.2018 at 17:31, David Turner wrote: > That was a good way to check for the recovery sleep.  Does your `ceph status` > show 128 PGs backfilling (or a number near that at least)?  The PGs not > backfilling will say 'backfill+wait'. Yes: pgs: 37778254/593342240 objects degraded (6.

Re: [ceph-users] Storage usage of CephFS-MDS

2018-02-26 Thread David Turner
That was a good way to check for the recovery sleep. Does your `ceph status` show 128 PGs backfilling (or a number near that at least)? The PGs not backfilling will say 'backfill+wait'. On Mon, Feb 26, 2018 at 11:25 AM Oliver Freyermuth < freyerm...@physik.uni-bonn.de> wrote: > Am 26.02.2018 um

Re: [ceph-users] Install previous version of Ceph

2018-02-26 Thread Scottix
I have been trying the dpkg -i route but am hitting a lot of dependencies, so I'm still working on it. On Mon, Feb 26, 2018 at 7:36 AM David Turner wrote: > In the past I downloaded the packages for a version and configured it as a > local repo on the server. Basically it was a tar.gz that I would extr
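One possible alternative sketch for holding Ubuntu packages at 12.2.2 via apt pinning instead of dpkg -i (the version glob is an assumption; check it against `apt-cache policy ceph`, and related packages such as librados2/librbd1 may need the same pin):
  # /etc/apt/preferences.d/ceph.pref
  Package: ceph*
  Pin: version 12.2.2*
  Pin-Priority: 1001
  # then let apt resolve dependencies at the pinned version
  apt-get update && apt-get install ceph-mon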

Re: [ceph-users] Storage usage of CephFS-MDS

2018-02-26 Thread Oliver Freyermuth
On 26.02.2018 at 16:59, Patrick Donnelly wrote: > On Sun, Feb 25, 2018 at 10:26 AM, Oliver Freyermuth > wrote: >> Looking with: >> ceph daemon osd.2 perf dump >> I get: >> "bluefs": { >> "gift_bytes": 0, >> "reclaim_bytes": 0, >> "db_total_bytes": 84760592384, >>

Re: [ceph-users] How to correctly purge a "ceph-volume lvm" OSD

2018-02-26 Thread David Turner
If we're asking for documentation updates, the man page for ceph-volume is incredibly outdated. In 12.2.3 it still says that bluestore is not yet implemented and that it's planned to be supported. '[--bluestore] filestore objectstore (not yet implemented)' 'using a filestore setup (bluestore s

Re: [ceph-users] Luminous | PG split causing slow requests

2018-02-26 Thread David Turner
Splitting PGs is one of the most intensive and disruptive things you can, and should, do to a cluster. Tweaking recovery sleep, max backfills, and heartbeat grace should help with this. Heartbeat grace can be set high enough to mitigate the OSD flapping, which slows things down through peering and ad

Re: [ceph-users] CephFS very unstable with many small files

2018-02-26 Thread John Spray
On Mon, Feb 26, 2018 at 4:06 PM, Oliver Freyermuth wrote: > On 26.02.2018 at 16:43, Patrick Donnelly wrote: >> On Sun, Feb 25, 2018 at 3:49 PM, Oliver Freyermuth >> wrote: >>> On 25.02.2018 at 21:50, John Spray wrote: On Sun, Feb 25, 2018 at 4:45 PM, Oliver Freyermuth > Now, with about

[ceph-users] Luminous | PG split causing slow requests

2018-02-26 Thread David C
Hi All I have a 12.2.1 cluster, all filestore OSDs, OSDs are spinners, journals on NVME. Cluster primarily used for CephFS, ~20M objects. I'm seeing some OSDs getting marked down, it appears to be related to PG splitting, e.g: 2018-02-26 10:27:27.935489 7f140dbe2700 1 _created [C,D] has 5121 ob

Re: [ceph-users] CephFS very unstable with many small files

2018-02-26 Thread Oliver Freyermuth
On 26.02.2018 at 16:43, Patrick Donnelly wrote: > On Sun, Feb 25, 2018 at 3:49 PM, Oliver Freyermuth > wrote: >> On 25.02.2018 at 21:50, John Spray wrote: >>> On Sun, Feb 25, 2018 at 4:45 PM, Oliver Freyermuth Now, with about 100,000,000 objects written, we are in a disaster situation

Re: [ceph-users] Storage usage of CephFS-MDS

2018-02-26 Thread David Turner
Patrick's answer supersedes what I said about RocksDB usage. My knowledge was more general for actually storing objects, not the metadata inside of MDS. Thank you for sharing Patrick. On Mon, Feb 26, 2018 at 11:00 AM Patrick Donnelly wrote: > On Sun, Feb 25, 2018 at 10:26 AM, Oliver Freyermuth

Re: [ceph-users] reweight-by-utilization reverse weight after adding new nodes?

2018-02-26 Thread David Turner
I would recommend continuing from where you are now and running `ceph osd reweight-by-utilization` again. Your weights might be a little more odd, but your data distribution should be the same. If you were to reset the weights for the previous OSDs, you would only incur an additional round of rew
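A small sketch of the usual dry-run-then-apply pattern for that command (the 120% threshold is a common example, not from the thread):
  # preview which OSDs would be reweighted and by how much
  ceph osd test-reweight-by-utilization 120
  # apply it once the preview looks sane, then watch the data movement
  ceph osd reweight-by-utilization 120
  ceph osd df tree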

Re: [ceph-users] Storage usage of CephFS-MDS

2018-02-26 Thread Patrick Donnelly
On Sun, Feb 25, 2018 at 10:26 AM, Oliver Freyermuth wrote: > Looking with: > ceph daemon osd.2 perf dump > I get: > "bluefs": { > "gift_bytes": 0, > "reclaim_bytes": 0, > "db_total_bytes": 84760592384, > "db_used_bytes": 78920024064, > "wal_total_bytes":

Re: [ceph-users] How to "apply" and monitor bluestore compression?

2018-02-26 Thread Martin Emrich
Hi! On 26.02.18 at 16:26, Igor Fedotov wrote: I'm working on adding compression statistics to ceph/rados df reports. And AFAIK currently the only way to monitor the compression ratio is to inspect OSD performance counters. Awesome, looking forward to it :) Cheers, Martin

Re: [ceph-users] Storage usage of CephFS-MDS

2018-02-26 Thread David Turner
When a Ceph system is in recovery, it uses much more RAM than it does while running healthy. This increase is often on the order of 4x more memory (at least back in the days of filestore, I'm not 100% certain about bluestore, but I would assume the same applies). You have another thread on the ML

Re: [ceph-users] CephFS very unstable with many small files

2018-02-26 Thread Patrick Donnelly
On Sun, Feb 25, 2018 at 3:49 PM, Oliver Freyermuth wrote: > On 25.02.2018 at 21:50, John Spray wrote: >> On Sun, Feb 25, 2018 at 4:45 PM, Oliver Freyermuth >>> Now, with about 100,000,000 objects written, we are in a disaster situation. >>> First off, the MDS could not restart anymore - it requir

Re: [ceph-users] Install previous version of Ceph

2018-02-26 Thread David Turner
In the past I downloaded the packages for a version and configured it as a local repo on the server. basically it was a tar.gz that I would extract that would place the ceph packages in a folder for me and swap out the repo config file to a version that points to the local folder. I haven't neede

Re: [ceph-users] How to "apply" and monitor bluestore compression?

2018-02-26 Thread Igor Fedotov
Hi Martin, On 2/26/2018 6:19 PM, Martin Emrich wrote: Hi! I just migrated my backup cluster from filestore to bluestore (8 OSDs, one OSD at a time, took two weeks but went smoothly). I also enabled compression on a pool beforehand and am impressed by the compression ratio (snappy, agressiv

Re: [ceph-users] 【mon】Problem with mon leveldb

2018-02-26 Thread David Turner
Mons won't compact and clean up old maps while any PG is in a non-clean state. What is your `ceph status`? I would guess this isn't your problem, but thought I'd throw it out there just in case. Also in Hammer, OSDs started telling each other when they clean up maps and this caused a map pointer
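If the cluster is otherwise healthy and the store is still large, a hedged sketch of forcing a compaction (the mon ID is a placeholder):
  # ask a running monitor to compact its store
  ceph tell mon.mon-a compact
  # or compact automatically at startup via ceph.conf, [mon] section
  mon_compact_on_start = true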

[ceph-users] How to "apply" and monitor bluestore compression?

2018-02-26 Thread Martin Emrich
Hi! I just migrated my backup cluster from filestore to bluestore (8 OSDs, one OSD at a time, took two weeks but went smoothly). I also enabled compression on a pool beforehand and am impressed by the compression ratio (snappy, agressive, default parameters). So apparently during backfilling
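For reference, a minimal sketch of enabling per-pool compression and reading the current ratio from the OSD counters (pool name and OSD id are placeholders):
  ceph osd pool set backup compression_algorithm snappy
  ceph osd pool set backup compression_mode aggressive
  # the bluestore_compressed* counters give compressed vs. original bytes per OSD
  ceph daemon osd.3 perf dump | grep -i compress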

Re: [ceph-users] Migrating to new pools

2018-02-26 Thread Jason Dillaman
On Mon, Feb 26, 2018 at 9:56 AM, Eugen Block wrote: > I'm following up on the rbd export/import option with a little delay. > > The fact that the snapshot is not protected after the image is reimported is > not a big problem, you could deal with that or wait for a fix. > But there's one major prob

Re: [ceph-users] Migrating to new pools

2018-02-26 Thread Eugen Block
I'm following up on the rbd export/import option with a little delay. The fact that the snapshot is not protected after the image is reimported is not a big problem, you could deal with that or wait for a fix. But there's one major problem using this method: the VMs lose their rbd_children

Re: [ceph-users] Install previous version of Ceph

2018-02-26 Thread Ronny Aasen
On 23. feb. 2018 23:37, Scottix wrote: Hey, We had one of our monitor servers die on us and I have a replacement computer now. In between that time you have released 12.2.3 but we are still on 12.2.2. We are on Ubuntu servers I see all the binaries are in the repo but your package cache only

Re: [ceph-users] Linux Distribution: Is upgrade the kerner version a good idea?

2018-02-26 Thread Lenz Grimmer
On 02/25/2018 01:18 PM, Massimiliano Cuttini wrote: > Is upgrading the kernel to a major version on a distribution a bad idea? > Or is it just as safe as upgrading any other package? > I prefer ultra-stable releases instead of the latest package. In that case it's probably best to stick with

[ceph-users] erasure-code-profile: what's "w=" ?

2018-02-26 Thread Wolfgang Lendl
hi, I have no idea what "w=8" means and can't find any hints in docs ... maybe someone can explain ceph 12.2.2 # ceph osd erasure-code-profile get ec42 crush-device-class=hdd crush-failure-domain=host crush-root=default jerasure-per-chunk-alignment=false k=4 m=2 plugin=jerasure technique=reed_s
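For context, a sketch of how such a profile is typically created; k/m and the crush settings come from the listing above, while w= is reported back by the jerasure plugin (word size) rather than set explicitly (treat that reading as an assumption):
  ceph osd erasure-code-profile set ec42 k=4 m=2 crush-device-class=hdd crush-failure-domain=host plugin=jerasure technique=reed_sol_van
  # the plugin adds derived parameters such as w=8 to the stored profile
  ceph osd erasure-code-profile get ec42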

Re: [ceph-users] Linux Distribution: Is upgrade the kerner version a good idea?

2018-02-26 Thread Massimiliano Cuttini
Not good. I'm not worried about the time and effort. I'm worried about having to fix this when there is no time. Ceph is built to avoid downtime; it's not a good idea to run it on a system with availability issues. It is only with switching (when installing a node); subsequent kernel updates should be instal

[ceph-users] reweight-by-utilization reverse weight after adding new nodes?

2018-02-26 Thread Martin Palma
Hello, we got the "nearfull" warning message for some OSDs in our cluster, so we ran the "ceph osd reweight-by-utilization" command to better distribute the data. Now that we have expanded our cluster with new nodes, should we reset the weight of the changed OSDs to 1.0? Best, Martin

Re: [ceph-users] How to correctly purge a "ceph-volume lvm" OSD

2018-02-26 Thread Oliver Freyermuth
On 26.02.2018 at 13:02, Alfredo Deza wrote: > On Sat, Feb 24, 2018 at 1:26 PM, Oliver Freyermuth > wrote: >> Dear Cephalopodians, >> >> when purging a single OSD on a host (created via ceph-deploy 2.0, i.e. using >> ceph-volume lvm), I currently proceed as follows: >> >> On the OSD-host: >> $ sy

Re: [ceph-users] How to correctly purge a "ceph-volume lvm" OSD

2018-02-26 Thread Alfredo Deza
On Sat, Feb 24, 2018 at 1:26 PM, Oliver Freyermuth wrote: > Dear Cephalopodians, > > when purging a single OSD on a host (created via ceph-deploy 2.0, i.e. using > ceph-volume lvm), I currently proceed as follows: > > On the OSD-host: > $ systemctl stop ceph-osd@4.service > $ ls -la /var/lib/ceph
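The snippet cuts off after the first step; a hedged sketch of the full sequence commonly used on Luminous for ceph-volume lvm OSDs (OSD id and device are placeholders; verify against the ceph-volume docs for your release):
  # on the OSD host
  systemctl stop ceph-osd@4
  # on an admin node: remove the OSD from crush, auth and the osdmap in one step
  ceph osd purge 4 --yes-i-really-mean-it
  # back on the OSD host: wipe the LVM pieces that backed osd.4
  ceph-volume lvm zap /dev/sdd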

Re: [ceph-users] fast_read in EC pools

2018-02-26 Thread Oliver Freyermuth
Some additional information gathered from our monitoring: It seems fast_read does indeed become active immediately, but I do not understand the effect. With fast_read = 0, we see: ~ 5.2 GB/s total outgoing traffic from all 6 OSD hosts ~ 2.3 GB/s total incoming traffic to all 6 OSD hosts With fa

[ceph-users] fast_read in EC pools

2018-02-26 Thread Oliver Freyermuth
Dear Cephalopodians, in the few remaining days when we can still play at our will with parameters, we just now tried to set: ceph osd pool set cephfs_data fast_read 1 but did not notice any effect on sequential, large file read throughput on our k=4 m=2 EC pool. Should this become active immedi

Re: [ceph-users] Proper procedure to replace DB/WAL SSD

2018-02-26 Thread Caspar Smit
2018-02-24 7:10 GMT+01:00 David Turner : > Caspar, it looks like your idea should work. Worst case scenario seems > like the osd wouldn't start, you'd put the old SSD back in and go back to > the idea to weight them to 0, backfilling, then recreate the osds. > Definitely with a try in my opinion,

Re: [ceph-users] Storage usage of CephFS-MDS

2018-02-26 Thread Oliver Freyermuth
Dear Cephalopodians, I have to extend my question a bit - in our system with 105,000,000 objects in CephFS (mostly stabilized now after the stress-testing...), I observe the following data distribution for the metadata pool: # ceph osd df | head ID CLASS WEIGHT REWEIGHT SIZE USE AVAIL %USE

Re: [ceph-users] MDS crash Luminous

2018-02-26 Thread David C
Thanks for the tips, John. I'll increase the debug level as suggested. On 25 Feb 2018 20:56, "John Spray" wrote: > On Sat, Feb 24, 2018 at 10:13 AM, David C wrote: > > Hi All > > > > I had an MDS go down on a 12.2.1 cluster, the standby took over but I > don't > > know what caused the issue. Sc
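A small sketch of one way to raise MDS logging at runtime (the levels shown are common choices, not ones John specified; replace mds.a with your daemon name):
  ceph tell mds.a injectargs '--debug_mds 20 --debug_ms 1'
  # or persist under [mds] in ceph.conf and restart
  debug mds = 20
  debug ms = 1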

Re: [ceph-users] how to fix X is an unexpected clone

2018-02-26 Thread Stefan Priebe - Profihost AG
On 26.02.2018 at 09:54, Saverio Proto wrote: > Hello Stefan, > > ceph-object-tool does not exist on my setup, do you mean the command > /usr/bin/ceph-objectstore-tool that is installed with the ceph-osd package? Yes, sorry, I meant the ceph-objectstore-tool. With that you can remove objects.
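A very rough sketch of that kind of surgery with ceph-objectstore-tool (OSD id, PG id and object spec are placeholders; the OSD must be stopped, and removing the wrong object is destructive):
  systemctl stop ceph-osd@7
  # list objects in the affected PG to locate the unexpected clone
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --pgid 5.111f --op list
  # remove the offending clone using the JSON object spec printed by --op list
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-7 --pgid 5.111f '<object-json>' remove
  systemctl start ceph-osd@7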

Re: [ceph-users] how to fix X is an unexpected clone

2018-02-26 Thread Saverio Proto
Hello Stefan, ceph-object-tool does not exist on my setup, do you mean the command /usr/bin/ceph-objectstore-tool that is installed with the ceph-osd package? I have the following situation here in Ceph Luminous: 2018-02-26 07:15:30.066393 7f0684acb700 -1 log_channel(cluster) log [ERR] : 5.111f

Re: [ceph-users] CephFS very unstable with many small files

2018-02-26 Thread Oliver Freyermuth
I second Stijn's question for more details, also on the stress testing. Did you "only" have each node write 2M of files per directory, or each "job", i.e. nodes*(number of cores per node) processes? Do you have monitoring of the memory usage? Is the large amount of RAM actually used on the MDS

Re: [ceph-users] CephFS very unstable with many small files

2018-02-26 Thread Oliver Freyermuth
Hi Stijn, On 26.02.2018 at 07:58, Stijn De Weirdt wrote: > hi oliver, > in preparation for production, we have run very successful tests with large sequential data, and just now a stress-test creating many small files on CephFS. We use a replicated metadata pool (4 SS